How to use the Boto3 library in Python to get a list of files from S3 based on the last modified date using an AWS resource?


Problem Statement − Use the boto3 library in Python to get a list of files from S3 that were modified after a given timestamp.

Example − List test.zip from Bucket_1/testfolder in S3 if it was modified after 2021-01-21 13:19:56.986445+00:00.

Approach/Algorithm to solve this problem

Step 1 − Import boto3 and botocore exceptions to handle exceptions.

Step 2 − s3_files_path and last_modified_timestamp are the two parameters of the function list_all_objects_based_on_last_modified. last_modified_timestamp should be in the format "2021-01-22 13:19:56.986445+00:00". S3 reports LastModified in UTC irrespective of geographical location, so the timestamp should carry its UTC offset.

Step 3 − Validate that s3_files_path is passed in AWS format as s3://bucket_name/key.

Step 4 − Create an AWS session using boto3 library.

Step 5 − Create an AWS resource for S3.

Step 6 − List all the objects under the given prefix using the client function list_objects, and handle the exceptions, if any.

Step 7 − The result of the above function is a dictionary that holds the file-level information in a list under the key 'Contents'. Iterate over that list, taking each entry as an object.

Step 8 − Each object is itself a dictionary with all the details of one file. Fetch the LastModified value of each file and compare it with the given timestamp.

Step 9 − If LastModified is on or after the given timestamp, save the complete file name (the full s3:// path); otherwise ignore it.

Step 10 − Return the list of files that were modified after the given timestamp.
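Steps 3, 8 and 9 can be tried out without any AWS call. The following is a minimal sketch, not part of the original example: the helper names split_s3_path and is_modified_since are hypothetical, and it assumes the timestamp string always carries a UTC offset. It also shows why comparing real datetime values is safer than comparing the timestamps as strings.

```python
from datetime import datetime, timezone


def split_s3_path(s3_path):
    """Split 's3://bucket_name/key' into (bucket, prefix). Hypothetical helper."""
    if not s3_path.startswith("s3://"):
        raise ValueError("Given path is not a valid s3 path.")
    bucket, _, key = s3_path[len("s3://"):].partition("/")
    if not bucket:
        raise ValueError("Bucket name is missing in the s3 path.")
    # Normalise the key so that the prefix ends with exactly one '/'
    key = key.strip("/")
    return bucket, (key + "/" if key else "")


def is_modified_since(last_modified, cutoff_text):
    """Compare an aware datetime with a timestamp string. Hypothetical helper."""
    # fromisoformat keeps the UTC offset, so the comparison is timezone-aware
    cutoff = datetime.fromisoformat(cutoff_text)
    return last_modified >= cutoff


cutoff_text = "2021-01-21 13:19:56.986445+00:00"
# 15:00 at +02:00 is 13:00 UTC, which is earlier than the cutoff
other_zone = datetime.fromisoformat("2021-01-21 15:00:00+02:00")
print(split_s3_path("s3://Bucket_1/testfolder"))
print(is_modified_since(other_zone, cutoff_text))   # datetime comparison
print(str(other_zone) >= cutoff_text)               # string comparison disagrees
```

The last two lines disagree because a string comparison looks only at the characters, not at the UTC offset.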

Example

The following code gets the list of files from AWS S3 based on the last modified date timestamp −

import boto3
from datetime import datetime
from botocore.exceptions import ClientError

def list_all_objects_based_on_last_modified(s3_files_path, last_modified_timestamp):
   # The path must be in the form s3://bucket_name/key
   if not s3_files_path.startswith('s3://'):
      raise Exception('Given path is not a valid s3 path.')
   # Parse the timestamp so that it is compared as a date, not as text
   cutoff = datetime.fromisoformat(str(last_modified_timestamp))
   session = boto3.session.Session()
   s3_resource = session.resource('s3')
   bucket_token = s3_files_path.split('/')
   bucket = bucket_token[2]
   folder_path = bucket_token[3:]
   prefix = ""
   for path in folder_path:
      # Skip empty parts caused by a trailing '/'
      if path:
         prefix = prefix + path + '/'
   try:
      result = s3_resource.meta.client.list_objects(Bucket=bucket, Prefix=prefix)
   except ClientError as e:
      raise Exception("boto3 client error in list_all_objects_based_on_last_modified function: " + str(e))
   except Exception as e:
      raise Exception("Unexpected error in list_all_objects_based_on_last_modified function of s3 helper: " + str(e))
   filtered_file_names = []
   # 'Contents' is missing from the result when no object matches the prefix
   for obj in result.get('Contents', []):
      # LastModified is a timezone-aware datetime in UTC
      if obj["LastModified"] >= cutoff:
         full_s3_file = "s3://" + bucket + "/" + obj["Key"]
         filtered_file_names.append(full_s3_file)
   # Return only after every object has been checked
   return filtered_file_names

#give a timestamp to fetch test.zip
print(list_all_objects_based_on_last_modified("s3://Bucket_1/testfolder" , "2021-01-21 13:19:56.986445+00:00"))
#give a timestamp no file is modified after that
print(list_all_objects_based_on_last_modified("s3://Bucket_1/testfolder" , "2021-01-22 13:19:56.986445+00:00"))

Output

#give a timestamp to fetch test.zip
['s3://Bucket_1/testfolder/test.zip']
#give a timestamp no file is modified after that
[]
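A single list_objects call returns at most 1,000 keys, so a large folder may be only partly listed. One possible alternative, sketched below under assumptions, is the resource-level collection bucket.objects.filter, which fetches further pages as it is iterated. The function names filter_modified_since and list_modified_since are hypothetical and not part of the example above. The cutoff is expected to be a timezone-aware datetime.

```python
def filter_modified_since(object_summaries, bucket_name, cutoff):
    """Return s3:// paths of objects last modified at or after cutoff.

    object_summaries is any iterable whose items expose .key and
    .last_modified, as boto3 ObjectSummary items do.
    """
    return [
        "s3://" + bucket_name + "/" + obj.key
        for obj in object_summaries
        if obj.last_modified >= cutoff
    ]


def list_modified_since(bucket_name, prefix, cutoff):
    """Hypothetical wrapper that lists a prefix through the S3 resource."""
    # Imported here so that the filter above can be tested without boto3
    import boto3

    bucket = boto3.resource("s3").Bucket(bucket_name)
    # The collection requests additional pages transparently while iterating
    return filter_modified_since(
        bucket.objects.filter(Prefix=prefix), bucket_name, cutoff
    )
```

A call could look like list_modified_since("Bucket_1", "testfolder/", datetime.fromisoformat("2021-01-21 13:19:56.986445+00:00")), with datetime imported from the datetime module and AWS credentials already configured.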

Updated on: 22-Mar-2021
