DynamoDB - Data Pipeline


Data Pipeline lets you export and import data to and from a table, file, or S3 bucket. This proves useful for backups, testing, and similar scenarios.

In an export, you use the Data Pipeline console, which creates a new pipeline and launches an Amazon EMR (Elastic MapReduce) cluster to perform the export. The EMR cluster reads data from DynamoDB and writes it to the target. We discuss EMR in detail later in this tutorial.

In an import, you use the Data Pipeline console, which creates a pipeline and launches an EMR cluster to perform the import. The cluster reads data from the source and writes it to the destination.

Note − Export and import operations incur charges for the services they use, specifically EMR and S3.
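As a rough illustration of what the console assembles for an export, the following sketch builds pipeline objects in the key/value shape accepted by the boto3 `put_pipeline_definition` call. The table name, S3 path, and object ids are hypothetical. The definition is deliberately incomplete: the template the console uses also sets an EMR step and cluster sizing, which are omitted here, so treat this as a picture of the data flow rather than a definition you can activate.

```python
def string_field(key, value):
    """A literal-valued field of a pipeline object."""
    return {"key": key, "stringValue": value}


def ref_field(key, object_id):
    """A field that points at another pipeline object by its id."""
    return {"key": key, "refValue": object_id}


def build_export_objects(table_name, s3_directory):
    """Return a simplified object list: DynamoDB table -> EMR -> S3.

    Assumption: ids such as "SourceTable" are illustrative names only.
    """
    return [
        {"id": "Default", "name": "Default", "fields": [
            string_field("scheduleType", "ondemand"),
            string_field("role", "DataPipelineDefaultRole"),
            string_field("resourceRole", "DataPipelineDefaultResourceRole"),
        ]},
        # Where the export reads from.
        {"id": "SourceTable", "name": "SourceTable", "fields": [
            string_field("type", "DynamoDBDataNode"),
            string_field("tableName", table_name),
        ]},
        # Where the export writes to.
        {"id": "ExportTarget", "name": "ExportTarget", "fields": [
            string_field("type", "S3DataNode"),
            string_field("directoryPath", s3_directory),
        ]},
        # The cluster that the pipeline launches.
        {"id": "ExportCluster", "name": "ExportCluster", "fields": [
            string_field("type", "EmrCluster"),
        ]},
        # The activity that ties source, target, and cluster together.
        # A real export also needs a "step" field, left out of this sketch.
        {"id": "ExportActivity", "name": "ExportActivity", "fields": [
            string_field("type", "EmrActivity"),
            ref_field("input", "SourceTable"),
            ref_field("output", "ExportTarget"),
            ref_field("runsOn", "ExportCluster"),
        ]},
    ]
```

An import would use the same wiring with the two data nodes swapped, so that the S3 node is the input and the DynamoDB node is the output.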

Using Data Pipeline

You must specify action and resource permissions when using Data Pipeline. You can use an IAM role or policy to define them. Users who perform imports or exports also need an active access key ID and secret access key.

IAM Roles for Data Pipeline

You need two IAM roles to use Data Pipeline −

  • DataPipelineDefaultRole − This role contains all the actions you permit the pipeline to perform on your behalf.

  • DataPipelineDefaultResourceRole − This role contains the resources you permit the pipeline to provision on your behalf.

If you are new to Data Pipeline, you must create each role. Users who have used Data Pipeline before already have these roles.
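To make the difference between the two roles more concrete, here is a minimal sketch of the trust policy each one might carry. The service principals listed are an assumption based on the usual AWS defaults (the pipeline role is assumed by Data Pipeline and EMR, the resource role by the EC2 instances the pipeline launches). They are not stated in this tutorial, so verify them against your own account.

```python
import json

# Assumption: typical service principals for each role; confirm in your account.
ROLE_TRUSTED_SERVICES = {
    "DataPipelineDefaultRole": [
        "datapipeline.amazonaws.com",
        "elasticmapreduce.amazonaws.com",
    ],
    "DataPipelineDefaultResourceRole": [
        "ec2.amazonaws.com",
    ],
}


def trust_policy(role_name):
    """Build the IAM trust policy document (as a dict) for one role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": ROLE_TRUSTED_SERVICES[role_name]},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def trust_policy_json(role_name):
    """Serialize the trust policy in the form IAM expects."""
    return json.dumps(trust_policy(role_name), indent=2)
```

Calling `trust_policy_json("DataPipelineDefaultResourceRole")` returns a JSON document you can compare with the trust relationship shown for the role in the IAM console.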

Use the IAM console to create the IAM roles for Data Pipeline by performing the following four steps −

Step 1 − Log in to the IAM console located at https://console.aws.amazon.com/iam/

Step 2 − Select Roles from the dashboard.

Step 3 − Select Create New Role. Then enter DataPipelineDefaultRole in the Role Name field, and select Next Step. In the AWS Service Roles list in the Role Type panel, navigate to Data Pipeline, and choose Select. Select Create Role in the Review panel.

Step 4 − Select Create New Role.
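If you prefer scripting to the console, creating DataPipelineDefaultRole as in Step 3 could be sketched as below. This is an illustration, not the tutorial's own method. The function takes an IAM client object (for example one returned by `boto3.client("iam")`) and uses its `create_role` and `attach_role_policy` calls. The helper name, the trusted services, and the policy ARN you pass in are assumptions that you need to supply and verify yourself.

```python
import json


def create_pipeline_role(iam, role_name, trusted_services, policy_arn):
    """Create a role that the given services may assume, then attach a policy.

    iam              -- an IAM client, e.g. boto3.client("iam")
    role_name        -- e.g. "DataPipelineDefaultRole"
    trusted_services -- service principals allowed to assume the role (assumption)
    policy_arn       -- ARN of the permissions policy to attach (assumption)
    """
    trust = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": list(trusted_services)},
                "Action": "sts:AssumeRole",
            }
        ],
    }
    # Equivalent to choosing the role name and service role type in the console.
    iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(trust),
    )
    # Equivalent to the permissions the console attaches on the Review panel.
    iam.attach_role_policy(RoleName=role_name, PolicyArn=policy_arn)
    return role_name
```

Passing the client in as a parameter keeps the function easy to try against a stub before pointing it at a real account. Running it for real requires boto3, configured credentials, and permission to manage IAM roles.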