Splitting and uploading extremely large files to Amazon S3
When dealing with extremely large files (10+ GB), uploading directly to Amazon S3 can be challenging due to network timeouts and bandwidth limitations. Amazon S3's multipart upload feature provides a robust solution by splitting large files into smaller chunks that can be uploaded independently and in parallel.
How Multipart Upload Works
The process involves three main steps:
- Initiate: Start a multipart upload session with S3
- Upload Parts: Split the file into chunks and upload each part independently
- Complete: Combine all parts into the final object
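The "upload parts" step boils down to splitting the file into fixed-size byte ranges. A minimal sketch of that arithmetic (the helper name `computePartRanges` is hypothetical, not an AWS SDK function):

```javascript
// Hypothetical helper: compute the byte range of each part for a
// multipart upload. S3 part numbers are 1-based; every part except
// the last must be the same size.
function computePartRanges(fileSize, chunkSize) {
  const ranges = [];
  const totalParts = Math.ceil(fileSize / chunkSize);
  for (let partNumber = 1; partNumber <= totalParts; partNumber++) {
    const start = (partNumber - 1) * chunkSize;
    const end = Math.min(start + chunkSize, fileSize); // exclusive end offset
    ranges.push({ partNumber, start, end });
  }
  return ranges;
}

// Example: a 12 MB file with 5 MB chunks yields three parts,
// the last one only 2 MB.
const MB = 1024 * 1024;
console.log(computePartRanges(12 * MB, 5 * MB));
```

Each range then becomes one `UploadPart` request, so a failure only forces re-sending that single range rather than the whole file.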
Client-Side Implementation (Browser)
Using the HTML5 File API and the AWS SDK for JavaScript (v2, which provides the `.promise()` style used below; an `s3` client is assumed to be already configured):
// Initialize multipart upload
async function uploadLargeFile(file) {
  const CHUNK_SIZE = 5 * 1024 * 1024; // 5MB chunks (S3's minimum part size)
  const fileName = file.name;
  let uploadId; // declared here so the catch block can abort the upload
  try {
    // Step 1: Initiate multipart upload
    const initiateParams = {
      Bucket: 'your-bucket-name',
      Key: fileName,
      ContentType: file.type
    };
    const multipart = await s3.createMultipartUpload(initiateParams).promise();
    uploadId = multipart.UploadId;

    // Step 2: Upload parts in parallel
    const totalParts = Math.ceil(file.size / CHUNK_SIZE);
    const uploadPromises = [];
    for (let partNumber = 1; partNumber <= totalParts; partNumber++) {
      const start = (partNumber - 1) * CHUNK_SIZE;
      const end = Math.min(start + CHUNK_SIZE, file.size);
      const chunk = file.slice(start, end);
      uploadPromises.push(
        s3.uploadPart({
          Bucket: 'your-bucket-name',
          Key: fileName,
          PartNumber: partNumber,
          UploadId: uploadId,
          Body: chunk
        }).promise().then((result) => ({
          ETag: result.ETag,
          PartNumber: partNumber
        }))
      );
    }

    // Wait for all parts to complete
    const completedParts = await Promise.all(uploadPromises);

    // Step 3: Complete multipart upload
    const completeParams = {
      Bucket: 'your-bucket-name',
      Key: fileName,
      UploadId: uploadId,
      MultipartUpload: {
        Parts: completedParts
      }
    };
    await s3.completeMultipartUpload(completeParams).promise();
    console.log('Upload completed successfully');
  } catch (error) {
    console.error('Upload failed:', error);
    // Cleanup: abort the multipart upload to avoid charges for orphaned parts
    if (uploadId) {
      await s3.abortMultipartUpload({
        Bucket: 'your-bucket-name',
        Key: fileName,
        UploadId: uploadId
      }).promise();
    }
  }
}
Node.js Server Implementation
For server-side uploads with retry logic:
const AWS = require('aws-sdk');
const fs = require('fs');

async function uploadFileWithRetry(filePath, bucketName, key) {
  const s3 = new AWS.S3();
  const fileSize = fs.statSync(filePath).size;
  const CHUNK_SIZE = 10 * 1024 * 1024; // 10MB chunks
  const MAX_RETRIES = 3;

  // Initiate multipart upload
  const { UploadId } = await s3.createMultipartUpload({
    Bucket: bucketName,
    Key: key
  }).promise();

  const totalParts = Math.ceil(fileSize / CHUNK_SIZE);
  const parts = [];

  for (let partNumber = 1; partNumber <= totalParts; partNumber++) {
    const start = (partNumber - 1) * CHUNK_SIZE;
    const end = Math.min(start + CHUNK_SIZE, fileSize) - 1; // inclusive byte offset
    // Retry each part up to MAX_RETRIES times before giving up
    for (let attempt = 1; attempt <= MAX_RETRIES; attempt++) {
      try {
        const { ETag } = await s3.uploadPart({
          Bucket: bucketName,
          Key: key,
          PartNumber: partNumber,
          UploadId,
          Body: fs.createReadStream(filePath, { start, end })
        }).promise();
        parts.push({ ETag, PartNumber: partNumber });
        break;
      } catch (err) {
        if (attempt === MAX_RETRIES) throw err;
      }
    }
  }

  // Complete the upload; parts must be listed in ascending PartNumber order
  await s3.completeMultipartUpload({
    Bucket: bucketName,
    Key: key,
    UploadId,
    MultipartUpload: { Parts: parts.sort((a, b) => a.PartNumber - b.PartNumber) }
  }).promise();
  console.log('Large file upload completed');
}
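The inline retry loop above can also be factored into a generic wrapper with exponential backoff, which spaces retries out instead of hammering a struggling connection. A sketch (the `withRetry` helper is hypothetical, not part of the AWS SDK):

```javascript
// Hypothetical helper: run an async operation, retrying up to
// maxRetries times with exponential backoff (baseDelayMs, 2x, 4x, ...).
async function withRetry(operation, maxRetries = 3, baseDelayMs = 100) {
  let lastError;
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      return await operation();
    } catch (err) {
      lastError = err;
      if (attempt < maxRetries) {
        // Wait before the next attempt: 100ms, then 200ms, then 400ms, ...
        await new Promise(r => setTimeout(r, baseDelayMs * 2 ** (attempt - 1)));
      }
    }
  }
  throw lastError;
}
```

In the upload loop, each part would then be sent as `await withRetry(() => s3.uploadPart(params).promise())`.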
Key Advantages
| Feature | Benefit | Use Case |
|---|---|---|
| Parallel Upload | Faster transfer speeds | Large files (1GB+) |
| Retry Failed Parts | Resume from failure point | Unstable networks |
| No Intermediate Storage | Reduced EC2 costs | Direct client-to-S3 |
| Configurable Chunk Size | Optimize for network | 5MB to 5GB per part |
Best Practices
- Chunk Size: Use 5-100MB chunks for optimal performance; S3 requires a minimum of 5MB for every part except the last, and allows at most 10,000 parts per upload
- Error Handling: Always abort incomplete uploads to avoid storage charges
- Progress Tracking: Monitor upload progress for better user experience
- Parallel Limits: Limit concurrent uploads to avoid overwhelming the client
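The browser example fires every part at once via `Promise.all`, which can saturate the client for files with many parts. One way to apply the parallel-limits practice is a small worker-pool scheduler; a sketch (the `runWithConcurrency` helper is hypothetical):

```javascript
// Hypothetical helper: run an array of async task functions with at
// most `limit` executing at any one time; results keep task order.
async function runWithConcurrency(tasks, limit) {
  const results = new Array(tasks.length);
  let next = 0; // index of the next task to claim
  async function worker() {
    while (next < tasks.length) {
      const i = next++; // claim a task index
      results[i] = await tasks[i]();
    }
  }
  // Spawn `limit` workers that pull tasks until none remain
  const workers = Array.from({ length: Math.min(limit, tasks.length) }, worker);
  await Promise.all(workers);
  return results;
}
```

With this in place, the part uploads become `tasks` (e.g. `() => s3.uploadPart(partParams).promise()`), and a limit of 4-6 concurrent parts is a reasonable starting point for browsers.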
Conclusion
Amazon S3 multipart upload enables efficient handling of large files by splitting them into manageable chunks. This approach provides better reliability, performance, and cost-effectiveness compared to single-part uploads, especially for files larger than 100MB.
