Multipart Upload

Intro

Multipart Upload is a way to upload large files to S3-compatible object storage such as Cubbit by splitting them into smaller parts and uploading the parts in parallel.

Uploading multiple pieces simultaneously improves upload speed and provides better reliability and resumability in case of network errors or interruptions. In fact, if the upload of a single part fails, the other parts remain unaffected and the process can resume at any time. The parts are then combined into a single object on the server.

This has several advantages:

  • Upload speed: uploading more parts in parallel makes the uploading process faster.
  • Failure recovery: if the connection is dropped while uploading, the object is still safe. Parts that have previously been uploaded are retained and the upload can resume with the missing parts.
  • Pause and resume: the upload of the object can be stopped and resumed as needed, without restarting the entire upload.

This feature is ideal for uploading large objects, maximizing network throughput, or for uploading files in an unstable network where failures are common.

Multipart Upload step by step

The multipart upload process involves the following steps:

  1. Initiate the upload: Start the upload process by sending a CreateMultipartUpload API request. The response contains an upload ID, which identifies the upload in subsequent API requests.
  2. Upload parts: Upload parts of the file in parallel by sending UploadPart API requests with the upload ID and a part number. The part number should start at 1 and increment for each part.
  3. Complete the upload: Once all parts are uploaded, send a CompleteMultipartUpload API request with the upload ID and a list of part numbers and their corresponding ETags (a hash of the data).
  4. Verify the upload: Confirm the successful completion of the upload by downloading the entire file using GetObject and comparing it to the original file.

How to stop an upload

In case of failure, it is possible to abort a multipart upload with an AbortMultipartUpload API request, which discards the uploaded parts and frees up storage.
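As a minimal sketch, the abort request can be sent with the abort-multipart-upload subcommand of the AWS CLI. The abort_upload wrapper and the ENDPOINT variable are illustrative names introduced here, not part of the CLI; --endpoint-url is the CLI's full name for the endpoint option.

```shell
# Endpoint used throughout this page; override ENDPOINT if yours differs.
ENDPOINT="${ENDPOINT:-https://s3.cubbit.eu}"

# abort_upload BUCKET KEY UPLOAD_ID
# Hypothetical helper: sends AbortMultipartUpload for a single upload.
abort_upload() {
  aws s3api abort-multipart-upload \
    --endpoint-url "$ENDPOINT" \
    --bucket "$1" --key "$2" --upload-id "$3"
}

# Example call (replace <UploadId> with the real value):
# abort_upload my-cubbit-bucket sergio.jpg "<UploadId>"
```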

Usage

Let's see how to work with this feature using the AWS s3api CLI commands.

Create a multipart upload

Let's start by initializing a multipart upload:

aws s3api create-multipart-upload --endpoint https://s3.cubbit.eu --bucket my-cubbit-bucket --key sergio.jpg

This prints an UploadId in the output. Take note of it, because the following commands need it.
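Instead of copying the ID by hand, it can be captured in a shell variable. The sketch below relies on the CLI's global --query and --output options to print only the UploadId field; the start_upload helper and the ENDPOINT variable are illustrative names.

```shell
# Endpoint used throughout this page; override ENDPOINT if yours differs.
ENDPOINT="${ENDPOINT:-https://s3.cubbit.eu}"

# start_upload BUCKET KEY
# Hypothetical helper: starts a multipart upload and prints only its UploadId.
start_upload() {
  aws s3api create-multipart-upload \
    --endpoint-url "$ENDPOINT" \
    --bucket "$1" --key "$2" \
    --query UploadId --output text
}

# Example: keep the ID in a variable for the later commands.
# UPLOAD_ID=$(start_upload my-cubbit-bucket sergio.jpg)
```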

Did you forget to write down the UploadId?

No worries: if you forget to write down the UploadId, you can send a ListMultipartUploads request, which returns the ongoing multipart uploads together with their IDs.

Show the ongoing multipart uploads

If at any point we need to check which multipart uploads are still ongoing:

aws s3api list-multipart-uploads --endpoint https://s3.cubbit.eu --bucket my-cubbit-bucket --prefix sergio.jpg
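To recover only the IDs that belong to one key, the listing can be filtered on the client side with a JMESPath expression passed to --query. This is a sketch under a few assumptions: find_upload_ids and ENDPOINT are illustrative names, and the key is assumed not to contain a single quote, which would break the expression.

```shell
# Endpoint used throughout this page; override ENDPOINT if yours differs.
ENDPOINT="${ENDPOINT:-https://s3.cubbit.eu}"

# find_upload_ids BUCKET KEY
# Hypothetical helper: prints the UploadId of each ongoing upload whose key
# is exactly KEY. --prefix narrows the listing on the server, --query keeps
# only exact matches.
find_upload_ids() {
  aws s3api list-multipart-uploads \
    --endpoint-url "$ENDPOINT" \
    --bucket "$1" --prefix "$2" \
    --query "Uploads[?Key=='$2'].UploadId" --output text
}

# Example:
# find_upload_ids my-cubbit-bucket sergio.jpg
```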

Upload a few parts

Then we can upload a few parts. Let's make two:

aws s3api upload-part --endpoint https://s3.cubbit.eu --bucket my-cubbit-bucket --key sergio.jpg --upload-id <UploadId> --part-number 1 --body ~/sergio-part1.log

aws s3api upload-part --endpoint https://s3.cubbit.eu --bucket my-cubbit-bucket --key sergio.jpg --upload-id <UploadId> --part-number 2 --body ~/sergio-part2.log

Each command prints an ETag value in the output. Take note of both, because they are needed to complete the upload.

Did you forget to write down the ETags?

No worries: if you forget to write down the ETags, you can send a ListParts request, which returns the uploaded parts with their part numbers and ETags.
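The part files passed to --body have to exist before they can be uploaded. One way to produce them is the standard split utility. The sketch below splits a local file into fixed-size pieces and uploads them one after another, printing each part number next to its ETag. The upload_parts helper and the ENDPOINT variable are illustrative names, and the loop is sequential for clarity; a real script would probably run several uploads in parallel.

```shell
# Endpoint used throughout this page; override ENDPOINT if yours differs.
ENDPOINT="${ENDPOINT:-https://s3.cubbit.eu}"

# upload_parts BUCKET KEY UPLOAD_ID FILE PART_SIZE_BYTES
# Hypothetical helper: splits FILE into pieces of PART_SIZE_BYTES and uploads
# them in order, printing "<part number> <ETag>" for each piece.
# The body runs in a subshell, so its variables stay local.
upload_parts() (
  tmpdir=$(mktemp -d) || exit 1
  # split names the pieces part.aaa, part.aab, ... so the glob below
  # visits them in the original byte order.
  split -a 3 -b "$5" "$4" "$tmpdir/part." || { rm -rf "$tmpdir"; exit 1; }
  n=0
  status=0
  for piece in "$tmpdir"/part.*; do
    [ -e "$piece" ] || continue   # empty input: nothing to upload
    n=$((n + 1))
    etag=$(aws s3api upload-part \
      --endpoint-url "$ENDPOINT" \
      --bucket "$1" --key "$2" --upload-id "$3" \
      --part-number "$n" --body "$piece" \
      --query ETag --output text) || { status=1; break; }
    printf '%s %s\n' "$n" "$etag"
  done
  rm -rf "$tmpdir"
  exit "$status"
)

# Example: 5 MiB pieces (5242880 bytes), the minimum part size.
# upload_parts my-cubbit-bucket sergio.jpg "<UploadId>" ~/sergio.jpg 5242880
```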

Upload a part by copying from another object

When uploading a part, we can also omit the body and specify an existing object as a source to copy from:

aws s3api upload-part-copy --endpoint https://s3.cubbit.eu --bucket my-cubbit-bucket --key sergio.jpg --upload-id <UploadId> --part-number 1 --copy-source "my-cubbit-bucket/my-source-object"

Alternatively, we can choose to copy only a portion of the source object. For example, we could copy only the first 1024 bytes:

aws s3api upload-part-copy --endpoint https://s3.cubbit.eu --bucket my-cubbit-bucket --key sergio.jpg --upload-id <UploadId> --part-number 1 --copy-source "my-cubbit-bucket/my-source-object" --copy-source-range bytes=0-1023

As with an ordinary part upload, take note of the ETag value printed in the output.
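When a large source object is copied in several parts, each part needs its own byte range. The following sketch computes the ranges for a given object size and part size; the last byte of each range is inclusive, as in the bytes=0-1023 example. The copy_ranges helper is an illustrative name, and both sizes are assumed to be known in bytes.

```shell
# copy_ranges OBJECT_SIZE_BYTES PART_SIZE_BYTES
# Hypothetical helper: prints one "<part number> bytes=<first>-<last>" line
# per part. The body runs in a subshell, so its variables stay local.
copy_ranges() (
  size=$1
  part=$2
  start=0
  n=1
  while [ "$start" -lt "$size" ]; do
    end=$((start + part - 1))
    # The last part may be shorter than the others.
    if [ "$end" -ge "$size" ]; then
      end=$((size - 1))
    fi
    printf '%s bytes=%s-%s\n' "$n" "$start" "$end"
    start=$((end + 1))
    n=$((n + 1))
  done
)

# Example: ranges for a 2500-byte object copied in 1024-byte pieces.
# copy_ranges 2500 1024
```

Each printed line supplies the values for --part-number and --copy-source-range of one upload-part-copy call.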

Possible error during the copy

Some very old objects might be considered ineligible for copying, which results in an "Invalid source object" error. Please contact us if you encounter this case.

Show the uploaded parts

If at any point we need to check which parts have already been uploaded:

aws s3api list-parts --endpoint https://s3.cubbit.eu --bucket my-cubbit-bucket --key sergio.jpg --upload-id <UploadId>
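The part listing is what makes resuming possible: comparing it with the part numbers you planned to upload shows what is still missing. Below is a small sketch of that comparison. The helper names and the ENDPOINT variable are illustrative, and the total number of planned parts is assumed to be known to the caller.

```shell
# Endpoint used throughout this page; override ENDPOINT if yours differs.
ENDPOINT="${ENDPOINT:-https://s3.cubbit.eu}"

# uploaded_part_numbers BUCKET KEY UPLOAD_ID
# Hypothetical helper: prints the number of each part already stored,
# one per line.
uploaded_part_numbers() {
  aws s3api list-parts \
    --endpoint-url "$ENDPOINT" \
    --bucket "$1" --key "$2" --upload-id "$3" \
    --query 'Parts[].[PartNumber]' --output text
}

# missing_parts TOTAL
# Reads uploaded part numbers on stdin and prints every number in 1..TOTAL
# that is not among them.
missing_parts() (
  # Flatten the input to " 1 3 " so each number can be matched as a word.
  uploaded=" $(tr '\n\t' '  ') "
  n=1
  while [ "$n" -le "$1" ]; do
    case "$uploaded" in
      *" $n "*) ;;          # already uploaded
      *) echo "$n" ;;       # still missing
    esac
    n=$((n + 1))
  done
)

# Example: which of the 2 planned parts still need to be uploaded?
# uploaded_part_numbers my-cubbit-bucket sergio.jpg "<UploadId>" | missing_parts 2
```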

Complete a multipart upload

Finally, once all the parts have been uploaded we can create the final object out of them, by completing the upload:

aws s3api complete-multipart-upload --endpoint https://s3.cubbit.eu --bucket my-cubbit-bucket --key sergio.jpg --upload-id <UploadId> --multipart-upload "Parts=[{ETag=<ETag first part>,PartNumber=1},{ETag=<ETag second part>,PartNumber=2}]"
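With many parts, typing the list by hand becomes impractical. One possible approach, sketched below, is to let list-parts produce the structure through a JMESPath --query, save it to a temporary file, and pass that file to --multipart-upload using the CLI's file:// syntax. The complete_from_listed_parts helper and the ENDPOINT variable are illustrative names. The sketch completes the upload with every part the server reports, so it assumes no unwanted part was uploaded under the same upload ID.

```shell
# Endpoint used throughout this page; override ENDPOINT if yours differs.
ENDPOINT="${ENDPOINT:-https://s3.cubbit.eu}"

# complete_from_listed_parts BUCKET KEY UPLOAD_ID
# Hypothetical helper: completes an upload using the parts reported by
# list-parts. The body runs in a subshell, so its variables stay local.
complete_from_listed_parts() (
  parts_file=$(mktemp) || exit 1
  # Shape the listing like the --multipart-upload argument:
  # {"Parts": [{"ETag": "...", "PartNumber": 1}, ...]}
  aws s3api list-parts \
    --endpoint-url "$ENDPOINT" \
    --bucket "$1" --key "$2" --upload-id "$3" \
    --query '{Parts: Parts[].{ETag: ETag, PartNumber: PartNumber}}' \
    --output json > "$parts_file" &&
  aws s3api complete-multipart-upload \
    --endpoint-url "$ENDPOINT" \
    --bucket "$1" --key "$2" --upload-id "$3" \
    --multipart-upload "file://$parts_file"
  status=$?
  rm -f "$parts_file"
  exit "$status"
)

# Example:
# complete_from_listed_parts my-cubbit-bucket sergio.jpg "<UploadId>"
```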

Limits

Multipart upload is subject to the limits summarized in the following table:

Item | Limit
Maximum object size | 5 TiB
Maximum number of parts per upload | 10,000
Part numbers | 1 to 10,000 (inclusive)
Minimum part size | 5 MiB
Maximum part size | 5 GiB
Maximum number of parts returned for a list parts request | 1,000
Maximum number of multipart uploads returned in a list multipart uploads request | 1,000
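These limits interact: a small part size applied to a very large file can exceed the maximum number of parts. The sketch below checks a planned upload against the table before any request is sent. The part_count helper and the constant names are illustrative, and the arithmetic assumes a shell with 64-bit integers.

```shell
MIB=$((1024 * 1024))
MIN_PART=$((5 * MIB))                 # 5 MiB
MAX_PART=$((5 * 1024 * MIB))          # 5 GiB
MAX_OBJECT=$((5 * 1024 * 1024 * MIB)) # 5 TiB
MAX_PARTS=10000

# part_count FILE_SIZE_BYTES PART_SIZE_BYTES
# Hypothetical helper: prints how many parts the upload needs, or fails
# with a message on stderr if the plan breaks one of the limits above.
part_count() (
  size=$1
  part=$2
  if [ "$size" -gt "$MAX_OBJECT" ]; then
    echo "object larger than 5 TiB" >&2
    exit 1
  fi
  if [ "$part" -lt "$MIN_PART" ] || [ "$part" -gt "$MAX_PART" ]; then
    echo "part size must be between 5 MiB and 5 GiB" >&2
    exit 1
  fi
  count=$(( (size + part - 1) / part ))   # ceiling division
  if [ "$count" -gt "$MAX_PARTS" ]; then
    echo "too many parts: $count" >&2
    exit 1
  fi
  echo "$count"
)

# Example: a 1 GiB file in 8 MiB parts needs 128 parts.
# part_count 1073741824 8388608
```

A 1 TiB file split into 5 MiB parts would need more than 10,000 parts, so the helper rejects it; a larger part size brings the count back under the limit.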