How to use the Boto3 library in Python to get a list of files from S3 based on the last modified date using an AWS resource?

This article uses the boto3 library in Python to retrieve the files under an AWS S3 path that were modified at or after a specific timestamp. The example creates a boto3 S3 resource and lists objects through its underlying client, then filters them by their LastModified date.

Prerequisites

Before running the code, ensure you have:

  • AWS credentials configured (via CLI, environment variables, or IAM roles)
  • Boto3 library installed: pip install boto3
  • Proper S3 bucket permissions

Approach

The solution involves these key steps:

  1. Validate the S3 path format
  2. Create AWS session and S3 resource
  3. List all objects in the specified prefix
  4. Compare each file's LastModified timestamp
  5. Return files modified after the given date
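
Step 1 can be tried on its own, without AWS access. The sketch below is one possible way to split an S3 path using the standard library's urlparse; the helper name split_s3_path is hypothetical and not part of boto3. It assumes keys contain no '?' or '#' characters, since urlparse would treat those as query or fragment separators.

```python
from urllib.parse import urlparse


def split_s3_path(s3_path):
    """Split 's3://bucket/prefix/' into (bucket, prefix). Hypothetical helper."""
    parsed = urlparse(s3_path)
    # Require the s3 scheme and a non-empty bucket name
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Invalid S3 path: {s3_path!r}")
    # The URL path starts with '/', which is not part of the S3 key prefix
    return parsed.netloc, parsed.path.lstrip("/")


print(split_s3_path("s3://my-bucket/uploads/"))  # ('my-bucket', 'uploads/')
```

Checking the scheme explicitly rejects inputs such as 'data/s3://bucket', which a plain substring test would accept.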

Implementation

import boto3
from botocore.exceptions import ClientError
from datetime import datetime

def list_files_by_last_modified(s3_path, last_modified_timestamp):
    """
    Get list of S3 files modified at or after a given timestamp
    
    Args:
        s3_path: S3 path in format 's3://bucket-name/prefix/'
        last_modified_timestamp: Timezone-aware ISO 8601 string or datetime object
    
    Returns:
        List of S3 file paths modified at or after the timestamp
    """
    # Validate S3 path format (the path must start with the s3:// scheme)
    if not s3_path.startswith('s3://'):
        raise ValueError('Invalid S3 path. Expected format: s3://bucket-name/prefix/')
    
    # Parse S3 path into bucket name and key prefix
    bucket_name, _, prefix = s3_path[len('s3://'):].partition('/')
    if not bucket_name:
        raise ValueError('Invalid S3 path. Bucket name is missing')
    
    # Add trailing slash if prefix exists
    if prefix and not prefix.endswith('/'):
        prefix += '/'
    
    # Create AWS session and S3 resource
    session = boto3.Session()
    s3_resource = session.resource('s3')
    
    # Convert timestamp to datetime if it's a string
    if isinstance(last_modified_timestamp, str):
        timestamp = datetime.fromisoformat(last_modified_timestamp)
    else:
        timestamp = last_modified_timestamp
    
    # S3 returns timezone-aware datetimes, which cannot be compared with naive ones
    if timestamp.tzinfo is None:
        raise ValueError('Timestamp must be timezone-aware, e.g. 2021-01-21T13:19:56+00:00')
    
    try:
        # list_objects_v2 returns at most 1,000 keys per call, so use a paginator
        paginator = s3_resource.meta.client.get_paginator('list_objects_v2')
        
        # Keep files last modified at or after the timestamp
        filtered_files = []
        for page in paginator.paginate(Bucket=bucket_name, Prefix=prefix):
            # 'Contents' is missing when a page holds no objects
            for obj in page.get('Contents', []):
                if obj['LastModified'] >= timestamp:
                    filtered_files.append(f"s3://{bucket_name}/{obj['Key']}")
        
        return filtered_files
        
    except ClientError as e:
        # Translate common AWS error codes into clearer messages, keeping the original cause
        error_code = e.response['Error']['Code']
        if error_code == 'NoSuchBucket':
            raise Exception(f"Bucket '{bucket_name}' does not exist") from e
        elif error_code == 'AccessDenied':
            raise Exception(f"Access denied to bucket '{bucket_name}'") from e
        else:
            raise Exception(f"AWS error: {e}") from e

# Example usage
if __name__ == "__main__":
    # Example 1: Find files modified after specific timestamp
    try:
        timestamp = "2021-01-21T13:19:56.986445+00:00"
        files = list_files_by_last_modified("s3://my-bucket/uploads/", timestamp)
        print(f"Files modified after {timestamp}:")
        for file in files:
            print(f"  {file}")
    except Exception as e:
        print(f"Error: {e}")
    
    # Example 2: Using datetime object
    try:
        from datetime import datetime, timezone
        cutoff_time = datetime(2021, 1, 21, 13, 19, 56, tzinfo=timezone.utc)
        files = list_files_by_last_modified("s3://my-bucket/data/", cutoff_time)
        print(f"\nFiles found: {len(files)}")
    except Exception as e:
        print(f"Error: {e}")
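
S3 reports LastModified as a timezone-aware datetime, so comparing it with a naive datetime raises TypeError. One option, sketched below, is to normalize the cutoff before comparing. The helper name to_aware_utc is hypothetical, and treating naive input as UTC is an assumption that may not suit every caller.

```python
from datetime import datetime, timezone


def to_aware_utc(value):
    """Return an aware datetime; naive input is ASSUMED to be in UTC."""
    dt = datetime.fromisoformat(value) if isinstance(value, str) else value
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return dt


# Stand-in for the aware LastModified value that S3 returns
last_modified = datetime(2021, 1, 22, 9, 0, tzinfo=timezone.utc)
naive_cutoff = datetime(2021, 1, 21, 13, 19, 56)

try:
    last_modified >= naive_cutoff
except TypeError as exc:
    print(f"Naive comparison fails: {exc}")

print(last_modified >= to_aware_utc(naive_cutoff))  # True
```

If the cutoff values come from local wall-clock time rather than UTC, attach the correct timezone at the source instead of relying on this default.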

Key Features

Feature          | Description                               | Benefit
Timezone Support | Handles timezone-aware timestamps         | Accurate date comparisons
Error Handling   | Specific error messages for common issues | Better debugging experience
Flexible Input   | Accepts string or datetime objects        | Easy integration
list_objects_v2  | Uses newer S3 API version                 | Better performance

Common Use Cases

  • Data Processing: Process only newly uploaded files
  • Backup Systems: Identify files that need backing up
  • ETL Pipelines: Filter datasets by modification date
  • Log Analysis: Analyze recent log files only

Best Practices

  • Use list_objects_v2 instead of the older list_objects
  • Handle pagination for buckets with many objects
  • Use timezone-aware datetime objects for accurate comparisons
  • Implement proper error handling for network and permission issues
  • Consider using S3 inventory for large-scale operations
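
The pagination advice above can be illustrated without AWS access. A single list_objects_v2 call returns at most 1,000 keys, so results arrive as a sequence of pages. The sketch below filters across any iterable of page dictionaries; the function name keys_from_pages and the sample pages are hypothetical stand-ins for real responses.

```python
from datetime import datetime, timezone


def keys_from_pages(pages, cutoff):
    """Yield keys whose LastModified is at or after cutoff, across all pages."""
    for page in pages:
        # A page with no objects has no 'Contents' key
        for obj in page.get("Contents", []):
            if obj["LastModified"] >= cutoff:
                yield obj["Key"]


# With boto3, the pages would typically come from a paginator, for example:
#   client = boto3.client("s3")
#   pages = client.get_paginator("list_objects_v2").paginate(
#       Bucket="my-bucket", Prefix="uploads/")
# Here hand-written pages stand in for real responses.
fake_pages = [
    {"Contents": [{"Key": "a.csv", "LastModified": datetime(2021, 1, 20, tzinfo=timezone.utc)}]},
    {"Contents": [{"Key": "b.csv", "LastModified": datetime(2021, 1, 22, tzinfo=timezone.utc)}]},
    {},
]
cutoff = datetime(2021, 1, 21, tzinfo=timezone.utc)
print(list(keys_from_pages(fake_pages, cutoff)))  # ['b.csv']
```

Because the function is a generator, matching keys are produced as each page arrives rather than after the whole listing has been read.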

Conclusion

Using boto3 with proper timestamp filtering allows efficient retrieval of recently modified S3 files. The approach combines S3's list_objects_v2 API with datetime comparison for robust file filtering based on modification dates.

Updated on: 2026-03-25T18:09:35+05:30
