AWS S3: listing directories with tailored AWS S3 API

SpAwN_gUy
Guest

We started writing partitioned datasets to S3, so they are now spread across a great number of folders, each usually containing a single file.

Now I want to download/delete these folders, and the "listing" stage takes too much time because it goes through the folders one by one.

AWS has the ListObjectsV2 API: https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListObjectsV2.html
Using it should speed things up a lot. I think it is also well suited to getting all the keys from "sub-folders" quite easily.
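
To illustrate what I mean, here is a minimal sketch in Python, assuming a boto3 S3 client is passed in (the function name `iter_keys` is made up for this post). Calling ListObjectsV2 with a Prefix and no Delimiter returns the keys from every nesting level, so there is no need to walk the folders one by one:

```python
from typing import Any, Iterator


def iter_keys(client: Any, bucket: str, prefix: str) -> Iterator[str]:
    """Yield every object key under `prefix`, across all "sub-folders"."""
    # `client` is assumed to be a boto3 S3 client (boto3.client("s3")).
    # Its paginator follows the ListObjectsV2 continuation tokens for us.
    paginator = client.get_paginator("list_objects_v2")
    # No Delimiter is passed, so keys from all nesting levels are returned.
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent when a page holds no objects.
        for obj in page.get("Contents", []):
            yield obj["Key"]
```

Usage would be something like `for key in iter_keys(boto3.client("s3"), "my-bucket", "dataset/"): print(key)`, where the bucket and prefix are placeholders. Each page carries up to 1000 keys, so a prefix with thousands of single-file folders needs only a handful of requests.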

Thanks for the great tool, btw.


martin
Site Admin
Posts: 41,262
Location: Prague, Czechia

Re: AWS S3: listing directories with tailored AWS S3 API

Thanks for your suggestion.
We will see if more people ask for this.
Btw, I do not see how ListObjectsV2 helps here, compared to ListObjects, which WinSCP is using now. Can you explain that?


SpAwN_gUy
Guest

Hey, Martin.

Good question about V1 vs. V2.
AWS suggests using V2 over V1:
"This action has been revised. We recommend that you use the newer version, ListObjectsV2, when developing applications. For backward compatibility, Amazon S3 continues to support ListObjects."
Apparently, there is also this:
"This (V1) operation is not supported by directory buckets."

In my code I use ListObjectsV2 everywhere, i.e. I follow the recommendation, and it is still supported with the new features such as directory buckets:
https://docs.aws.amazon.com/AmazonS3/latest/userguide/directory-buckets-overview.html
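
For reference, the most visible difference between the two versions on the client side is pagination: V1 continues from a key passed as Marker, while V2 hands back an opaque NextContinuationToken. A rough sketch of the V2 loop, assuming a boto3 S3 client (the function name `list_keys_v2` is mine, not part of any library):

```python
from typing import Any, Dict, List


def list_keys_v2(client: Any, bucket: str, prefix: str) -> List[str]:
    """Collect all keys under `prefix` by following V2 continuation tokens."""
    keys: List[str] = []
    kwargs: Dict[str, Any] = {"Bucket": bucket, "Prefix": prefix}
    while True:
        # `client` is assumed to be a boto3 S3 client.
        resp = client.list_objects_v2(**kwargs)
        keys.extend(obj["Key"] for obj in resp.get("Contents", []))
        # IsTruncated tells us whether more pages remain.
        if not resp.get("IsTruncated"):
            return keys
        # V2 resumes from an opaque token rather than from a key (V1 Marker).
        kwargs["ContinuationToken"] = resp["NextContinuationToken"]
```

So for plain "list everything under a prefix" the two versions behave much the same; the argument for V2 is mostly the AWS recommendation and directory bucket support.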

I don't use directory buckets (yet), but according to AWS they are not replicated across Availability Zones and are much faster (probably meant for temporary distributed storage and AI workloads).

Anyway, I don't mind whether it is V1 or V2, but listing all keys under a prefix in one go for a delete/download should speed things up significantly. For now, I switch to the AWS console to delete a hefty folder.
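
Roughly what I have in mind for the delete case, as a sketch only: list the prefix page by page and send each page to DeleteObjects, which accepts up to 1000 keys per request. It assumes a boto3 S3 client; the name `delete_prefix` is hypothetical:

```python
from typing import Any


def delete_prefix(client: Any, bucket: str, prefix: str) -> int:
    """Delete every object under `prefix`; return how many deletes were requested."""
    requested = 0
    # `client` is assumed to be a boto3 S3 client.
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        objects = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if not objects:
            continue
        # One listing page (at most 1000 keys) fits in one DeleteObjects call.
        client.delete_objects(
            Bucket=bucket,
            Delete={"Objects": objects, "Quiet": True},
        )
        requested += len(objects)
    return requested
```

This ignores the per-key "Errors" list in the DeleteObjects response, and on a versioned bucket it would only add delete markers, so real code would need more care.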


martin
Site Admin

Re: AWS S3: listing directories with tailored AWS S3 API

Sure, I understand that retrieving a "recursive" listing with a single call would make this much faster. But it is always difficult to use protocol-specific hacks/improvements/optimizations in a multi-protocol application like WinSCP (compared to a dedicated AWS tool).
