Splitting and uploading extremely large files to Amazon S3

When dealing with extremely large files (10+ GB), uploading directly to Amazon S3 can be challenging due to network timeouts and bandwidth limitations. Amazon S3's multipart upload feature provides a robust solution by splitting large files into smaller chunks that can be uploaded independently and in parallel.

How Multipart Upload Works

The process involves three main steps:

  1. Initiate: Start a multipart upload session with S3
  2. Upload Parts: Split the file into chunks and upload each part independently
  3. Complete: Combine all parts into the final object
[Diagram: the client (browser or Node.js) splits a large file (e.g. 10 GB) into chunks of 5 MB to 5 GB each; the parts are uploaded to Amazon S3 in parallel, failed parts can be retried individually, and S3 automatically assembles them into the final object.]

Benefits of Multipart Upload

  • Improved throughput - parallel uploads
  • Resume capability - retry failed parts only
  • Better error handling - isolated failures
  • No EC2 storage costs - direct client-to-S3 upload

Client-Side Implementation (Browser)

Using the HTML5 File API and the AWS SDK for JavaScript (v2), with an S3 client `s3` already configured:

// Initialize multipart upload
async function uploadLargeFile(file) {
    const CHUNK_SIZE = 5 * 1024 * 1024; // 5MB chunks (the S3 minimum part size, except for the last part)
    const fileName = file.name;
    let uploadId; // declared outside try so the catch block can abort the upload
    
    try {
        // Step 1: Initiate multipart upload
        const initiateParams = {
            Bucket: 'your-bucket-name',
            Key: fileName,
            ContentType: file.type
        };
        
        const multipart = await s3.createMultipartUpload(initiateParams).promise();
        uploadId = multipart.UploadId;
        
        // Step 2: Split the file and upload each part
        const totalParts = Math.ceil(file.size / CHUNK_SIZE);
        const uploadPromises = [];
        
        for (let partNumber = 1; partNumber <= totalParts; partNumber++) {
            const start = (partNumber - 1) * CHUNK_SIZE;
            const end = Math.min(start + CHUNK_SIZE, file.size);
            const chunk = file.slice(start, end); // Blob slice; no data is copied until read
            
            uploadPromises.push(
                s3.uploadPart({
                    Bucket: 'your-bucket-name',
                    Key: fileName,
                    PartNumber: partNumber,
                    UploadId: uploadId,
                    Body: chunk
                }).promise().then(result => ({
                      ETag: result.ETag,
                      PartNumber: partNumber
                  }))
            );
        }
        
        // Wait for all parts to complete
        const completedParts = await Promise.all(uploadPromises);
        
        // Step 3: Complete multipart upload
        const completeParams = {
            Bucket: 'your-bucket-name',
            Key: fileName,
            UploadId: uploadId,
            MultipartUpload: {
                Parts: completedParts
            }
        };
        
        await s3.completeMultipartUpload(completeParams).promise();
        console.log('Upload completed successfully');
        
    } catch (error) {
        console.error('Upload failed:', error);
        // Cleanup: abort multipart upload
        if (uploadId) {
            await s3.abortMultipartUpload({
                Bucket: 'your-bucket-name',
                Key: fileName,
                UploadId: uploadId
            }).promise();
        }
    }
}
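The Promise.all call above starts every part at once, which can overwhelm the browser for very large files. A simple worker pool keeps only a few uploads in flight; this is a generic sketch (uploadWithLimit and the limit value are illustrative, not AWS SDK APIs):

```javascript
// Run async task functions with at most `limit` in flight at a time.
// `tasks` is an array of zero-argument functions returning promises.
async function uploadWithLimit(tasks, limit) {
    const results = new Array(tasks.length);
    let next = 0;
    
    // Each worker repeatedly claims the next task index until none remain.
    // JavaScript is single-threaded, so `next++` between awaits is race-free.
    async function worker() {
        while (next < tasks.length) {
            const i = next++;
            results[i] = await tasks[i]();
        }
    }
    
    await Promise.all(Array.from({ length: Math.min(limit, tasks.length) }, worker));
    return results; // results stay in task order
}
```

In the browser example this would replace the single Promise.all: build an array of thunks such as `() => s3.uploadPart(params).promise()` and pass it with a limit of, say, 4.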

Node.js Server Implementation

For server-side uploads with retry logic:

const AWS = require('aws-sdk');
const fs = require('fs');

async function uploadFileWithRetry(filePath, bucketName, key) {
    const s3 = new AWS.S3();
    const fileSize = fs.statSync(filePath).size;
    const CHUNK_SIZE = 10 * 1024 * 1024; // 10MB chunks
    const MAX_RETRIES = 3;
    
    // Initiate multipart upload
    const { UploadId } = await s3.createMultipartUpload({
        Bucket: bucketName,
        Key: key
    }).promise();
    
    const totalParts = Math.ceil(fileSize / CHUNK_SIZE);
    const parts = [];
    const fd = fs.openSync(filePath, 'r');
    
    for (let partNumber = 1; partNumber <= totalParts; partNumber++) {
        // Read just this part's byte range from disk
        const start = (partNumber - 1) * CHUNK_SIZE;
        const length = Math.min(CHUNK_SIZE, fileSize - start);
        const chunk = Buffer.alloc(length);
        fs.readSync(fd, chunk, 0, length, start);
        
        // Retry each part up to MAX_RETRIES times before giving up
        for (let attempt = 1; attempt <= MAX_RETRIES; attempt++) {
            try {
                const { ETag } = await s3.uploadPart({
                    Bucket: bucketName,
                    Key: key,
                    PartNumber: partNumber,
                    UploadId,
                    Body: chunk
                }).promise();
                parts.push({ ETag, PartNumber: partNumber });
                break;
            } catch (err) {
                if (attempt === MAX_RETRIES) throw err;
                console.warn(`Part ${partNumber} attempt ${attempt} failed, retrying...`);
            }
        }
    }
    
    fs.closeSync(fd);
    
    // Complete the upload; parts must be listed in ascending PartNumber order
    await s3.completeMultipartUpload({
        Bucket: bucketName,
        Key: key,
        UploadId,
        MultipartUpload: { Parts: parts.sort((a, b) => a.PartNumber - b.PartNumber) }
    }).promise();
    
    console.log('Large file upload completed');
}
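The inline retry loop above can be factored into a reusable helper that adds exponential backoff between attempts (withRetry is a hypothetical name, not an AWS SDK API):

```javascript
// Retry an async operation up to maxRetries times with exponential backoff.
async function withRetry(operation, maxRetries = 3, baseDelayMs = 500) {
    for (let attempt = 1; ; attempt++) {
        try {
            return await operation();
        } catch (err) {
            if (attempt >= maxRetries) throw err; // out of attempts, surface the error
            // Wait baseDelayMs, 2x, 4x, ... between attempts
            await new Promise(resolve => setTimeout(resolve, baseDelayMs * 2 ** (attempt - 1)));
        }
    }
}
```

Each part upload would then become `await withRetry(() => s3.uploadPart(params).promise())`, keeping the per-part loop free of error-handling noise.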

Key Advantages

  Feature                   Benefit                     Use Case
  Parallel Upload           Faster transfer speeds      Large files (1 GB+)
  Retry Failed Parts        Resume from failure point   Unstable networks
  No Intermediate Storage   Reduced EC2 costs           Direct client-to-S3
  Configurable Chunk Size   Optimize for network        5 MB to 5 GB per part

Best Practices

  • Chunk Size: Use 5-100MB chunks for optimal performance
  • Error Handling: Always abort incomplete uploads to avoid storage charges
  • Progress Tracking: Monitor upload progress for better user experience
  • Parallel Limits: Limit concurrent uploads to avoid overwhelming the client

Conclusion

Amazon S3 multipart upload enables efficient handling of large files by splitting them into manageable chunks. This approach provides better reliability, performance, and cost-effectiveness compared to single-part uploads, especially for files larger than 100MB.

Updated on: 2026-03-15T23:18:59+05:30
