
Multipart Upload

Multipart Upload is a way to upload large files to S3-compatible object storage such as Cubbit. It splits each file into smaller parts and uploads the parts in parallel.

Uploading several parts at the same time increases upload speed. It also makes the upload more reliable and resumable after network errors or interruptions. If the upload of a single part fails, the other parts are not affected, and the process can resume at any moment by retrying only the failed part. Once all parts are uploaded, the server combines them into a single object.
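Before uploading, the file has to be cut into parts. The hypothetical split_into_parts helper below is a minimal sketch that uses the standard split utility in a POSIX shell. It cuts a file into fixed-size chunks whose names sort in part order. The 5 MiB default is an assumption borrowed from Amazon S3, where every part except the last must be at least 5 MiB. Cubbit's exact limits may differ.

```shell
# Hypothetical helper: split FILE into chunks of SIZE bytes (default 5 MiB)
# named OUTDIR/part-aaaa, OUTDIR/part-aaab, ... in part order.
# The 5 MiB default assumes the S3 minimum part size (all parts but the last).
split_into_parts() {
  local file=$1 outdir=$2 size=${3:-5242880}
  mkdir -p "$outdir" || return 1
  # -a 4 gives four-letter suffixes (456,976 possible names), and split
  # assigns them in lexical order, so sorted names match part numbers 1, 2, ...
  split -a 4 -b "$size" "$file" "$outdir/part-"
}
```

For example, split_into_parts ~/sergio.jpg ~/sergio-parts would create ~/sergio-parts/part-aaaa, ~/sergio-parts/part-aaab and so on. These files are then uploaded as parts 1, 2, and so on.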

It has several advantages:

  • Upload speed: uploading several parts in parallel makes the overall upload faster.

  • Failure recovery: if the connection drops during the upload, the work already done is not lost. Parts uploaded before the failure are kept, and the upload can continue with just the missing parts.

  • Pause and resume: an object's upload can be paused and resumed as needed without re-uploading the whole object.

These benefits make multipart upload a good fit for two cases: uploading large objects while making the most of the available network throughput, and uploading files over unstable networks where failures are likely.

The Multipart Upload steps

Here are the steps involved:

  • Initiate the upload: first, send a CreateMultipartUpload API request to start a new multipart upload. The response contains an upload ID, which identifies the upload in all subsequent API requests.

  • Upload parts: next, upload the parts of the file in parallel by sending UploadPart API requests that include the upload ID and a part number. Part numbers start at 1 and increase by one for each part.

  • Complete the upload: once all the parts have been uploaded, send a CompleteMultipartUpload API request. It carries the upload ID and the list of part numbers with their corresponding ETags (the hash of each part's data, returned in the UploadPart response).

  • Verify the upload: finally, you can check that the upload succeeded. Download the entire object with a GetObject request and compare it with the original file.
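The verification step can be sketched as follows, assuming the AWS CLI and the standard cmp utility are installed. The hypothetical verify_upload helper downloads the object with aws s3api get-object, which writes the object body to the file given as its last argument. It then compares that file byte by byte with the local original.

```shell
# Hypothetical helper: download KEY from BUCKET and check that it is
# byte-for-byte identical to the local ORIGINAL file.
# --endpoint-url is the full name of the AWS CLI endpoint option.
verify_upload() {
  local endpoint=$1 bucket=$2 key=$3 original=$4
  local downloaded status
  downloaded=$(mktemp) || return 1
  # get-object writes the object body to its last (positional) argument
  # and prints the response metadata as JSON, which is discarded here.
  if ! aws s3api get-object --endpoint-url "$endpoint" --bucket "$bucket" \
      --key "$key" "$downloaded" > /dev/null; then
    rm -f "$downloaded"
    return 1
  fi
  # cmp -s prints nothing and exits 0 only if the files are identical.
  cmp -s "$original" "$downloaded"
  status=$?
  rm -f "$downloaded"
  return "$status"
}
```

For example, verify_upload https://s3.cubbit.eu my-cubbit-bucket sergio.jpg ~/sergio.jpg && echo identical prints "identical" only when the stored object matches the local file.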

How to stop an upload

In case of a failure, you can abort a multipart upload with an AbortMultipartUpload API request. This discards the parts uploaded so far and frees up the storage they occupy.
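For a single upload, the direct command is aws s3api abort-multipart-upload --endpoint https://s3.cubbit.eu --bucket my-cubbit-bucket --key sergio.jpg --upload-id <UploadId>. To clean up several stale uploads at once, the hypothetical helper below combines list-multipart-uploads with abort-multipart-upload. It is a minimal sketch that assumes the AWS CLI and a POSIX shell, and it aborts every ongoing upload whose key starts with a given prefix. Aborting discards the uploaded parts, so choose a prefix that matches only the uploads you want to drop.

```shell
# Hypothetical helper: abort every ongoing multipart upload in BUCKET whose
# key starts with PREFIX. The parts of the aborted uploads are discarded.
abort_uploads_for_prefix() {
  local endpoint=$1 bucket=$2 prefix=$3
  local listing upload_id key
  # One "UploadId<TAB>Key" line per upload; with --output text an empty
  # listing can come back as the literal string None.
  listing=$(aws s3api list-multipart-uploads --endpoint-url "$endpoint" \
    --bucket "$bucket" --prefix "$prefix" \
    --query 'Uploads[].[UploadId,Key]' --output text) || return 1
  # The UploadId comes first, so read puts the rest of the line into key
  # (keys with inner spaces survive). The here-document keeps the loop in
  # this shell, so "return 1" leaves the function.
  while read -r upload_id key; do
    case $upload_id in
      '' | None) continue ;;
    esac
    aws s3api abort-multipart-upload --endpoint-url "$endpoint" \
      --bucket "$bucket" --key "$key" --upload-id "$upload_id" || return 1
  done <<EOF
$listing
EOF
  return 0
}
```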

Usage

Let's see how to work with this feature using the aws s3api commands of the AWS CLI.

Create a multipart upload

Let's start by initiating a multipart upload:

aws s3api create-multipart-upload --endpoint https://s3.cubbit.eu --bucket my-cubbit-bucket --key sergio.jpg

This prints an UploadId in the output. Take note of it, because the following commands need it.
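The AWS CLI's standard --query and --output options can extract the UploadId automatically, so you don't have to copy it by hand. The hypothetical start_upload helper below is a minimal sketch of this. It prints only the ID, so it can be captured in a shell variable.

```shell
# Hypothetical helper: start a multipart upload and print only its UploadId.
# Arguments: ENDPOINT BUCKET KEY. --query and --output are standard AWS CLI
# options; --endpoint-url is the full name of the endpoint option.
start_upload() {
  aws s3api create-multipart-upload --endpoint-url "$1" --bucket "$2" \
    --key "$3" --query UploadId --output text
}
```

For example, UPLOAD_ID=$(start_upload https://s3.cubbit.eu my-cubbit-bucket sergio.jpg) stores the ID for use in later commands.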

Did you forget to write down the UploadId?

No worries: a ListMultipartUploads request returns the list of ongoing multipart uploads together with their upload IDs.

Show the ongoing multipart uploads

If at any point we need to check which multipart uploads are still ongoing:

aws s3api list-multipart-uploads --endpoint https://s3.cubbit.eu --bucket my-cubbit-bucket --prefix sergio.jpg
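Several uploads can share a key prefix. The --query option can narrow the listing down to the upload IDs of one exact key. The hypothetical find_upload_ids helper below is a minimal sketch of this, assuming the AWS CLI. It relies on JMESPath filter syntax, which --query supports.

```shell
# Hypothetical helper: print the UploadIds of the ongoing uploads for KEY.
# --prefix narrows the listing on the server; the JMESPath filter then
# keeps exact key matches only. Keys containing a single quote would
# break the filter expression.
find_upload_ids() {
  local endpoint=$1 bucket=$2 key=$3
  aws s3api list-multipart-uploads --endpoint-url "$endpoint" \
    --bucket "$bucket" --prefix "$key" \
    --query "Uploads[?Key=='$key'].UploadId" --output text
}
```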

Upload a few parts

Then we can upload the parts. In this example, there are two:

aws s3api upload-part --endpoint https://s3.cubbit.eu --bucket my-cubbit-bucket --key sergio.jpg --upload-id <UploadId> --part-number 1 --body ~/sergio.jpg.part1

aws s3api upload-part --endpoint https://s3.cubbit.eu --bucket my-cubbit-bucket --key sergio.jpg --upload-id <UploadId> --part-number 2 --body ~/sergio.jpg.part2

Each command prints the ETag of the uploaded part in its output. Take note of them.
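The upload-part calls can also run in parallel, which is where the speed benefit comes from. This minimal POSIX-shell sketch starts each call in the background, waits for all of them, and collects the ETags. The helper name upload_parts_parallel is hypothetical. It launches one process per part with no concurrency limit, which is only reasonable for a modest number of parts.

```shell
# Hypothetical helper: upload the files given after UPLOAD_ID as parts
# 1, 2, 3, ... in parallel, then print one "PartNumber ETag" line per part.
# It starts one background aws process per part, with no concurrency cap.
upload_parts_parallel() {
  local endpoint=$1 bucket=$2 key=$3 upload_id=$4
  shift 4
  local workdir part_file n i pids pid failed
  workdir=$(mktemp -d) || return 1
  n=0
  pids=""
  for part_file in "$@"; do
    n=$((n + 1))
    # --query ETag --output text prints just this part's ETag
    # (S3 usually returns it wrapped in double quotes).
    aws s3api upload-part --endpoint-url "$endpoint" --bucket "$bucket" \
      --key "$key" --upload-id "$upload_id" --part-number "$n" \
      --body "$part_file" --query ETag --output text > "$workdir/$n" &
    pids="$pids $!"
  done
  # Wait for every background upload (pids is split on spaces on purpose)
  # and remember whether any of them failed.
  failed=0
  for pid in $pids; do
    wait "$pid" || failed=1
  done
  # Print the collected ETags in part-number order only if all succeeded.
  if [ "$failed" -eq 0 ]; then
    i=1
    while [ "$i" -le "$n" ]; do
      printf '%s %s\n' "$i" "$(cat "$workdir/$i")"
      i=$((i + 1))
    done
  fi
  rm -rf "$workdir"
  return "$failed"
}
```

For example, upload_parts_parallel https://s3.cubbit.eu my-cubbit-bucket sergio.jpg <UploadId> part1 part2 prints one line per part, containing the part number followed by that part's ETag.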

Did you forget to write down the ETags?

No worries: a ListParts request returns the list of uploaded parts with their part numbers and ETags.

Show the uploaded parts

If at any point we need to check which parts have already been uploaded:

aws s3api list-parts --endpoint https://s3.cubbit.eu --bucket my-cubbit-bucket --key sergio.jpg --upload-id <UploadId>
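The full list-parts response contains more fields than are needed to complete an upload. A --query expression can keep only the part numbers and ETags. The hypothetical show_parts helper below is a minimal sketch of this, assuming the AWS CLI. It prints one line per part, with the part number and its ETag separated by a tab.

```shell
# Hypothetical helper: print one "PartNumber<TAB>ETag" line per uploaded part.
# Arguments: ENDPOINT BUCKET KEY UPLOAD_ID.
show_parts() {
  aws s3api list-parts --endpoint-url "$1" --bucket "$2" --key "$3" \
    --upload-id "$4" --query 'Parts[].[PartNumber,ETag]' --output text
}
```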

Complete a multipart upload

Finally, once all the parts have been uploaded, we can create the final object out of them by completing the upload:

aws s3api complete-multipart-upload --endpoint https://s3.cubbit.eu --bucket my-cubbit-bucket --key sergio.jpg --upload-id <UploadId> --multipart-upload "Parts=[{ETag=<ETag first part>,PartNumber=1},{ETag=<ETag second part>,PartNumber=2}]"
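Copying ETags into the Parts=[...] argument by hand is error-prone. The AWS CLI can also read parameter values from a file through the file:// prefix. The hypothetical helper below is a sketch of one alternative: it reshapes the list-parts output with a JMESPath query into the JSON structure that --multipart-upload accepts, then passes that file to complete-multipart-upload. Because it completes the upload with every part the server currently holds, run it only after all parts have finished uploading.

```shell
# Hypothetical helper: complete an upload with every part the server lists.
# The JMESPath query reshapes the list-parts response into
# {"Parts": [{"PartNumber": ..., "ETag": ...}, ...]}, the structure that
# --multipart-upload accepts; file:// makes the CLI read it from a file.
complete_from_listed_parts() {
  local endpoint=$1 bucket=$2 key=$3 upload_id=$4
  local parts_file status
  parts_file=$(mktemp) || return 1
  if ! aws s3api list-parts --endpoint-url "$endpoint" --bucket "$bucket" \
      --key "$key" --upload-id "$upload_id" \
      --query '{Parts: Parts[].{PartNumber: PartNumber, ETag: ETag}}' \
      --output json > "$parts_file"; then
    rm -f "$parts_file"
    return 1
  fi
  aws s3api complete-multipart-upload --endpoint-url "$endpoint" \
    --bucket "$bucket" --key "$key" --upload-id "$upload_id" \
    --multipart-upload "file://$parts_file"
  status=$?
  rm -f "$parts_file"
  return "$status"
}
```

Writing the JSON to a file also sidesteps shell quoting of the ETags, which contain double quotes.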