DynamoDB - Global Secondary Indexes


Applications that need to query a table on attributes other than its primary key can use one or more global secondary indexes to serve those queries.

For example − consider a system that keeps track of users, their login status, and their time logged in. As the table in this example grows, queries on its data become slower.

Global secondary indexes accelerate queries by organizing a selection of attributes from a table. They use their own primary key to sort the data, and that key does not need to include the table's key attributes or match the table's key schema.

Every global secondary index must have a partition key and can optionally have a sort key. The index key schema can differ from the table's, and index key attributes can be any top-level string, number, or binary attributes of the table.

You can project other table attributes into the index; however, index queries return only projected attributes and do not retrieve data from the parent table.

Attribute Projections

Projections consist of a set of attributes copied from a table to a secondary index. A projection always includes the table partition key and sort key. In queries, DynamoDB can access any attribute in the projection; the index essentially behaves as its own table.

When you create a secondary index, you must specify the attributes to project. DynamoDB offers three options for this −

  • KEYS_ONLY − All index items consist of table partition and sort key values, and index key values. This creates the smallest index.

  • INCLUDE − It includes KEYS_ONLY attributes and specified non-key attributes.

  • ALL − It includes all source table attributes, creating the largest possible index.
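
To make the three options concrete, here is a minimal sketch that simulates which attributes each projection type would copy into an index entry. It uses plain Java maps instead of the AWS SDK, and the class ProjectionSketch, the project method, and the sample weather item are hypothetical names chosen only for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only: simulates which attributes land in an index entry.
class ProjectionSketch {
    enum Type { KEYS_ONLY, INCLUDE, ALL }

    // keyAttrs: table key attributes plus index key attributes.
    // included: extra non-key attributes, consulted only for INCLUDE.
    static Map<String, Object> project(Map<String, Object> item, Type type,
                                       Set<String> keyAttrs, Set<String> included) {
        Map<String, Object> entry = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : item.entrySet()) {
            String name = e.getKey();
            boolean keep = type == Type.ALL
                    || keyAttrs.contains(name)
                    || (type == Type.INCLUDE && included.contains(name));
            if (keep) {
                entry.put(name, e.getValue());
            }
        }
        return entry;
    }

    public static void main(String[] args) {
        Map<String, Object> item = new LinkedHashMap<>();
        item.put("City", "Oslo");
        item.put("Date", "2016-05-15");
        item.put("Wind", 3);
        item.put("Humidity", 71);
        item.put("Notes", "light rain");

        // Table keys (City, Date) plus index keys (Date, Wind).
        Set<String> keys = new HashSet<>(Arrays.asList("City", "Date", "Wind"));
        Set<String> extra = new HashSet<>(Arrays.asList("Humidity"));

        System.out.println(project(item, Type.KEYS_ONLY, keys, extra).keySet()); // City, Date, Wind
        System.out.println(project(item, Type.INCLUDE, keys, extra).keySet());   // adds Humidity
        System.out.println(project(item, Type.ALL, keys, extra).keySet());       // all five attributes
    }
}
```

The sketch only models attribute selection; it says nothing about the storage or throughput cost of each choice.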

Note the tradeoffs in projecting attributes into a global secondary index, which relate to throughput and storage cost.

Consider the following points −

  • If you only need access to a few attributes, with low latency, project only those you need. This reduces storage and write costs.

  • If an application frequently accesses certain non-key attributes, project them; the added storage cost is small compared with the cost of frequent table scans.

  • You can project large sets of frequently accessed attributes; however, this carries a high storage cost.

  • Use KEYS_ONLY when the table is queried infrequently but written or updated frequently. This keeps the index small while still offering good query performance.

Global Secondary Index Queries and Scans

You can use Query to access one or more items in an index. You must specify the index name and table name, the attributes to return, and the query conditions; you can optionally return results in ascending or descending order.

You can also use Scan to read all the data in an index. It requires the table name and index name. Use a filter expression to retrieve specific data.

Table and Index Data Synchronization

DynamoDB automatically synchronizes indexes with their parent table. Each operation that modifies items triggers asynchronous index updates; applications never write to indexes directly.

You need to understand the impact of DynamoDB maintenance on indexes. When you create an index, you specify its key attributes and their data types, which means that on every write, the values of those attributes must match the data types declared in the key schema.

On item creation or deletion, indexes update in an eventually consistent manner. Updates normally propagate within a fraction of a second (unless a system failure of some type occurs). You must account for this delay in applications.
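
One common way to account for this delay is to retry an index read a few times before concluding that an item is missing. The sketch below shows the idea with a generic helper; readWithRetry, its parameters, and the simulated lookup are hypothetical, and a real application would run its own index query inside the supplier.

```java
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical helper: retries a read that may not yet see a recent table write.
class EventualReadRetry {
    static <T> Optional<T> readWithRetry(Supplier<Optional<T>> indexRead,
                                         int maxAttempts, long delayMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Optional<T> result = indexRead.get();
            if (result.isPresent()) {
                return result;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis); // give the index time to catch up
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt flag
                    return Optional.empty();
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Simulated index: the item becomes visible on the third read.
        int[] calls = {0};
        Optional<String> found = readWithRetry(
            () -> ++calls[0] < 3 ? Optional.<String>empty() : Optional.of("user-42"),
            5, 50);
        System.out.println(found.orElse("not found") + " after " + calls[0] + " attempts");
    }
}
```

The attempt count and delay are arbitrary here; suitable values depend on how stale a result the application can tolerate.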

Throughput Considerations in Global Secondary Indexes

Multiple global secondary indexes impact throughput. When you create an index, you specify its capacity units separately from the table's, so operations on the index consume index capacity units rather than table units.

This can result in throttling if a query or write exceeds provisioned throughput. View throughput settings by using DescribeTable.

Read Capacity

Global secondary indexes deliver eventual consistency. For queries, DynamoDB calculates provisioned read consumption the same way it does for tables, with the sole difference that it uses index entry size rather than item size. A query result remains limited to 1 MB, which includes the attribute name sizes and values across every returned item.
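
As a rough illustration of the read calculation, the sketch below estimates the read capacity units consumed by one index query. It assumes DynamoDB's general provisioned-capacity rules, which are not stated in this text: the sizes of all returned index entries are summed, the total is rounded up to the next 4 KB, and an eventually consistent read costs half a unit per 4 KB. The class and method names are hypothetical.

```java
// Hypothetical helper: estimates read capacity units for one index query.
class IndexReadCost {
    // Assumption: reads are metered in 4 KB blocks.
    static final long READ_BLOCK_BYTES = 4 * 1024;

    // entrySizesBytes: size of each index entry returned by the query.
    static double queryReadUnits(long[] entrySizesBytes) {
        if (entrySizesBytes.length == 0) {
            throw new IllegalArgumentException("expects at least one returned entry");
        }
        long totalBytes = 0;
        for (long size : entrySizesBytes) {
            totalBytes += size;
        }
        // Round the summed size up to whole 4 KB blocks.
        long blocks = (totalBytes + READ_BLOCK_BYTES - 1) / READ_BLOCK_BYTES;
        // Assumption: an eventually consistent read costs 0.5 units per block.
        return blocks * 0.5;
    }

    public static void main(String[] args) {
        long[] sizes = new long[10];
        java.util.Arrays.fill(sizes, 500L);
        // 10 entries of 500 bytes = 5,000 bytes -> 2 blocks -> prints 1.0
        System.out.println(queryReadUnits(sizes));
    }
}
```

Because the calculation uses index entry size, a KEYS_ONLY index would typically produce a lower estimate than an ALL projection for the same query.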

Write Capacity

When write operations occur, the affected index consumes write units. Write throughput costs are the sum of write capacity units consumed in table writes and units consumed in index updates. A successful write operation requires sufficient capacity, or it results in throttling.

Write costs also remain dependent on certain factors, some of which are as follows −

  • A new item that defines an indexed attribute, or an update that defines a previously undefined indexed attribute, uses a single write to add the item to the index.

  • An update that changes the value of an indexed key attribute uses two writes: one to delete the old index item and one to write the new one.

  • A table write that deletes an indexed attribute uses a single write to erase the old item projection from the index.

  • Items absent from the index both before and after an update operation use no writes.

  • An update that changes only projected attribute values, and not the value of an indexed key attribute, uses one write to update the projected attribute values in the index.

All these factors assume an item size of less than or equal to 1KB.
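
The cases above can be condensed into a small decision function. This is a sketch of the rules as listed here, not SDK code: the class IndexWriteCost and its parameters are hypothetical, a null key stands for "the item has no index key attribute", and the final case (nothing in the index entry changed) is an assumption that no index write is needed.

```java
// Hypothetical helper: index writes caused by one table write, for items of 1 KB or less.
class IndexWriteCost {
    // oldKey / newKey: index key value before and after the table write (null = undefined).
    // projectedChanged: whether any projected non-key attribute value changed.
    static int indexWrites(String oldKey, String newKey, boolean projectedChanged) {
        if (oldKey == null && newKey == null) {
            return 0; // absent from the index before and after
        }
        if (oldKey == null) {
            return 1; // indexed attribute newly defined: add the entry
        }
        if (newKey == null) {
            return 1; // indexed attribute deleted: erase the old entry
        }
        if (!oldKey.equals(newKey)) {
            return 2; // key value changed: delete the old entry, write the new one
        }
        // Same key: one write if projected attributes changed; otherwise assumed none.
        return projectedChanged ? 1 : 0;
    }

    public static void main(String[] args) {
        System.out.println(indexWrites(null, "2016-05-22", false));         // new entry
        System.out.println(indexWrites("2016-05-22", "2016-05-23", false)); // key changed
        System.out.println(indexWrites("2016-05-22", "2016-05-22", true));  // projection changed
    }
}
```

These counts are per index, so a table with several affected indexes pays the table write plus the index writes for each of them.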

Global Secondary Index Storage

On an item write, DynamoDB automatically copies the right set of attributes to any indexes where the attributes must exist. Your account is charged for both table item storage and index attribute storage. The space used by an index item is the sum of these quantities −

  • Byte size of table primary key
  • Byte size of index key attribute
  • Byte size of projected attributes
  • 100 bytes of overhead per index item

You can estimate storage needs by estimating the average index item size and multiplying it by the number of table items that have the global secondary index key attributes.

DynamoDB does not write an index entry for a table item that lacks an attribute defined as the index partition key or sort key.
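
The storage estimate described above is simple arithmetic, sketched here with hypothetical names and made-up sample sizes. Only table items that define the index key attributes should be counted, since the others produce no index entry.

```java
// Hypothetical helper: rough storage estimate for one global secondary index.
class IndexStorageEstimate {
    static final long OVERHEAD_BYTES = 100; // per index item, as listed above

    // All sizes are average byte counts per item.
    static long estimateBytes(long tableKeyBytes, long indexKeyBytes,
                              long projectedBytes, long itemsWithIndexKey) {
        long perEntry = tableKeyBytes + indexKeyBytes + projectedBytes + OVERHEAD_BYTES;
        return perEntry * itemsWithIndexKey;
    }

    public static void main(String[] args) {
        // Sample figures: 20-byte table key, 10-byte index key,
        // 200 bytes of projected attributes, 1,000,000 indexed items.
        long bytes = estimateBytes(20, 10, 200, 1_000_000L);
        System.out.println(bytes + " bytes"); // (20 + 10 + 200 + 100) * 1,000,000 = 330000000
    }
}
```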

Global Secondary Index CRUD

Create a table with global secondary indexes by using the CreateTable operation with the GlobalSecondaryIndexes parameter. You must specify an attribute to serve as the index partition key, and you can optionally specify another as the index sort key. All index key attributes must be string, number, or binary scalars. You must also provide throughput settings, consisting of ReadCapacityUnits and WriteCapacityUnits.

Use UpdateTable to add global secondary indexes to existing tables, this time through the GlobalSecondaryIndexUpdates parameter.

In this operation, you must provide the following inputs −

  • Index name
  • Key schema
  • Projected attributes
  • Throughput settings

Adding a global secondary index to a large table can take a substantial amount of time, depending on item volume, projected attribute volume, write capacity, and write activity. Use CloudWatch metrics to monitor the process.

Use DescribeTable to fetch status information for a global secondary index. It returns one of four IndexStatus values for each entry in GlobalSecondaryIndexes −

  • CREATING − It indicates the build stage of the index, and its unavailability.

  • ACTIVE − It indicates the readiness of the index for use.

  • UPDATING − It indicates the update status of throughput settings.

  • DELETING − It indicates the delete status of the index, and its permanent unavailability for use.

You can update the provisioned throughput settings of a global secondary index during the loading/backfilling stage (when DynamoDB writes attributes to the index and tracks added, deleted, and updated items). Use UpdateTable to perform this operation.

Remember that you cannot add or delete other indexes during the backfilling stage.

Use UpdateTable to delete global secondary indexes. Each operation can delete only one index; however, you can run up to five operations concurrently. The deletion process does not affect the read/write activity of the parent table, but you cannot add or delete other indexes until the operation completes.

Using Java to Work with Global Secondary Indexes

Create a table with an index through CreateTable. Create a DynamoDB class instance and a CreateTableRequest class instance holding the request information, then pass the request object to the createTable method.

The following program is a short example −

DynamoDB dynamoDB = new DynamoDB(new AmazonDynamoDBClient ( 
   new ProfileCredentialsProvider()));
   
// Attributes 
ArrayList<AttributeDefinition> attributeDefinitions = new 
   ArrayList<AttributeDefinition>();  
attributeDefinitions.add(new AttributeDefinition() 
   .withAttributeName("City") 
   .withAttributeType("S"));
   
attributeDefinitions.add(new AttributeDefinition() 
   .withAttributeName("Date") 
   .withAttributeType("S"));
   
attributeDefinitions.add(new AttributeDefinition() 
   .withAttributeName("Wind") 
   .withAttributeType("N"));
   
// Key schema of the table 
ArrayList<KeySchemaElement> tableKeySchema = new ArrayList<KeySchemaElement>(); 
tableKeySchema.add(new KeySchemaElement()
   .withAttributeName("City") 
   .withKeyType(KeyType.HASH));              //Partition key
   
tableKeySchema.add(new KeySchemaElement() 
   .withAttributeName("Date") 
   .withKeyType(KeyType.RANGE));             //Sort key
   
// Wind index 
GlobalSecondaryIndex windIndex = new GlobalSecondaryIndex() 
   .withIndexName("WindIndex") 
   .withProvisionedThroughput(new ProvisionedThroughput() 
   .withReadCapacityUnits((long) 10) 
   .withWriteCapacityUnits((long) 1)) 
   .withProjection(new Projection().withProjectionType(ProjectionType.ALL));
   
ArrayList<KeySchemaElement> indexKeySchema = new ArrayList<KeySchemaElement>(); 
indexKeySchema.add(new KeySchemaElement() 
   .withAttributeName("Date") 
   .withKeyType(KeyType.HASH));              //Partition key
   
indexKeySchema.add(new KeySchemaElement() 
   .withAttributeName("Wind") 
   .withKeyType(KeyType.RANGE));             //Sort key
   
windIndex.setKeySchema(indexKeySchema);  
CreateTableRequest createTableRequest = new CreateTableRequest() 
   .withTableName("ClimateInfo") 
   .withProvisionedThroughput(new ProvisionedThroughput() 
   .withReadCapacityUnits((long) 5) 
   .withWriteCapacityUnits((long) 1))
   .withAttributeDefinitions(attributeDefinitions) 
   .withKeySchema(tableKeySchema) 
   .withGlobalSecondaryIndexes(windIndex); 
Table table = dynamoDB.createTable(createTableRequest); 
System.out.println(table.getDescription());

Retrieve the index information with DescribeTable. First, create a DynamoDB class instance. Then create a Table class instance for the table that owns the index. Finally, call the table's describe method and read the index descriptions from the result.

Here is a short example −

DynamoDB dynamoDB = new DynamoDB(new AmazonDynamoDBClient ( 
   new ProfileCredentialsProvider()));
   
Table table = dynamoDB.getTable("ClimateInfo"); 
TableDescription tableDesc = table.describe();  
Iterator<GlobalSecondaryIndexDescription> gsiIter = 
   tableDesc.getGlobalSecondaryIndexes().iterator(); 

while (gsiIter.hasNext()) { 
   GlobalSecondaryIndexDescription gsiDesc = gsiIter.next(); 
   System.out.println("Index data " + gsiDesc.getIndexName() + ":");  
   Iterator<KeySchemaElement> kseIter = gsiDesc.getKeySchema().iterator(); 
   
   while (kseIter.hasNext()) { 
      KeySchemaElement kse = kseIter.next(); 
      System.out.printf("\t%s: %s\n", kse.getAttributeName(), kse.getKeyType()); 
   }
   Projection projection = gsiDesc.getProjection(); 
   System.out.println("\tProjection type: " + projection.getProjectionType()); 
   
   if (projection.getProjectionType().toString().equals("INCLUDE")) { 
      System.out.println("\t\tNon-key projected attributes: " 
         + projection.getNonKeyAttributes()); 
   } 
}

Use Query to query an index just as you would query a table. Create a DynamoDB class instance, a Table class instance for the parent table, and an Index class instance for the target index, then pass a QuerySpec object to the index's query method.

Take a look at the following code to understand better −

DynamoDB dynamoDB = new DynamoDB(new AmazonDynamoDBClient ( 
   new ProfileCredentialsProvider()));
   
Table table = dynamoDB.getTable("ClimateInfo"); 
Index index = table.getIndex("WindIndex");  
QuerySpec spec = new QuerySpec() 
   .withKeyConditionExpression("#d = :v_date and Wind = :v_wind") 
   .withNameMap(new NameMap() 
   .with("#d", "Date"))
   .withValueMap(new ValueMap() 
   .withString(":v_date","2016-05-15") 
   .withNumber(":v_wind",0));
   
ItemCollection<QueryOutcome> items = index.query(spec);
Iterator<Item> iter = items.iterator();

while (iter.hasNext()) {
   System.out.println(iter.next().toJSONPretty()); 
}

The following program is a bigger example for better understanding −

Note − The following program may assume a previously created data source. Before attempting to execute, acquire supporting libraries and create necessary data sources (tables with required characteristics, or other referenced sources).

This example also uses Eclipse IDE, an AWS credentials file, and the AWS Toolkit within an Eclipse AWS Java Project.

import java.util.ArrayList;
import java.util.Iterator;

import com.amazonaws.auth.profile.ProfileCredentialsProvider;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient;
import com.amazonaws.services.dynamodbv2.document.DynamoDB;
import com.amazonaws.services.dynamodbv2.document.Index;
import com.amazonaws.services.dynamodbv2.document.Item;
import com.amazonaws.services.dynamodbv2.document.ItemCollection;
import com.amazonaws.services.dynamodbv2.document.QueryOutcome;
import com.amazonaws.services.dynamodbv2.document.Table;
import com.amazonaws.services.dynamodbv2.document.spec.QuerySpec;
import com.amazonaws.services.dynamodbv2.document.utils.ValueMap;

import com.amazonaws.services.dynamodbv2.model.AttributeDefinition;
import com.amazonaws.services.dynamodbv2.model.CreateTableRequest;
import com.amazonaws.services.dynamodbv2.model.GlobalSecondaryIndex;
import com.amazonaws.services.dynamodbv2.model.KeySchemaElement;
import com.amazonaws.services.dynamodbv2.model.KeyType;
import com.amazonaws.services.dynamodbv2.model.Projection;
import com.amazonaws.services.dynamodbv2.model.ProvisionedThroughput;

public class GlobalSecondaryIndexSample {  
   static DynamoDB dynamoDB = new DynamoDB(new AmazonDynamoDBClient ( 
      new ProfileCredentialsProvider()));  
   public static String tableName = "Bugs";   
   public static void main(String[] args) throws Exception {  
      createTable(); 
      queryIndex("CreationDateIndex"); 
      queryIndex("NameIndex"); 
      queryIndex("DueDateIndex"); 
   }
   public static void createTable() {  
      // Attributes 
      ArrayList<AttributeDefinition> attributeDefinitions = new 
         ArrayList<AttributeDefinition>();  
      attributeDefinitions.add(new AttributeDefinition()
         .withAttributeName("BugID") 
         .withAttributeType("S")); 
         
      attributeDefinitions.add(new AttributeDefinition() 
         .withAttributeName("Name")
         .withAttributeType("S"));
         
      attributeDefinitions.add(new AttributeDefinition() 
         .withAttributeName("CreationDate")
         .withAttributeType("S"));
         
      attributeDefinitions.add(new AttributeDefinition() 
         .withAttributeName("DueDate") 
         .withAttributeType("S"));
         
      // Table Key schema
      ArrayList<KeySchemaElement> tableKeySchema = new ArrayList<KeySchemaElement>(); 
      tableKeySchema.add (new KeySchemaElement() 
         .withAttributeName("BugID") 
         .withKeyType(KeyType.HASH));              //Partition key 
      
      tableKeySchema.add (new KeySchemaElement() 
         .withAttributeName("Name") 
         .withKeyType(KeyType.RANGE));             //Sort key
         
      // Indexes' initial provisioned throughput
      ProvisionedThroughput ptIndex = new ProvisionedThroughput()
         .withReadCapacityUnits(1L)
         .withWriteCapacityUnits(1L);
         
      // CreationDateIndex 
      GlobalSecondaryIndex creationDateIndex = new GlobalSecondaryIndex() 
         .withIndexName("CreationDateIndex") 
         .withProvisionedThroughput(ptIndex) 
         .withKeySchema(new KeySchemaElement() 
         .withAttributeName("CreationDate") 
         .withKeyType(KeyType.HASH),               //Partition key 
         new KeySchemaElement()
         .withAttributeName("BugID") 
         .withKeyType(KeyType.RANGE))              //Sort key 
         .withProjection(new Projection() 
         .withProjectionType("INCLUDE") 
         .withNonKeyAttributes("Description", "Status"));
         
      // NameIndex 
      GlobalSecondaryIndex nameIndex = new GlobalSecondaryIndex() 
         .withIndexName("NameIndex") 
         .withProvisionedThroughput(ptIndex) 
         .withKeySchema(new KeySchemaElement()  
         .withAttributeName("Name")  
         .withKeyType(KeyType.HASH),                  //Partition key 
         new KeySchemaElement()  
         .withAttributeName("BugID")  
         .withKeyType(KeyType.RANGE))                 //Sort key 
         .withProjection(new Projection() 
         .withProjectionType("KEYS_ONLY"));
         
      // DueDateIndex 
      GlobalSecondaryIndex dueDateIndex = new GlobalSecondaryIndex() 
         .withIndexName("DueDateIndex") 
         .withProvisionedThroughput(ptIndex) 
         .withKeySchema(new KeySchemaElement() 
         .withAttributeName("DueDate") 
         .withKeyType(KeyType.HASH))               //Partition key 
         .withProjection(new Projection() 
         .withProjectionType("ALL"));
         
      CreateTableRequest createTableRequest = new CreateTableRequest() 
         .withTableName(tableName) 
         .withProvisionedThroughput( new ProvisionedThroughput() 
         .withReadCapacityUnits( (long) 1) 
         .withWriteCapacityUnits( (long) 1)) 
         .withAttributeDefinitions(attributeDefinitions)
         .withKeySchema(tableKeySchema)
         .withGlobalSecondaryIndexes(creationDateIndex, nameIndex, dueDateIndex);  
      System.out.println("Creating " + tableName + "..."); 
      dynamoDB.createTable(createTableRequest);  
      
      // Pause for active table state 
      System.out.println("Waiting for ACTIVE state of " + tableName); 
      try { 
         Table table = dynamoDB.getTable(tableName); 
         table.waitForActive(); 
      } catch (InterruptedException e) { 
         e.printStackTrace(); 
      } 
   }
   public static void queryIndex(String indexName) { 
      Table table = dynamoDB.getTable(tableName);  
      System.out.println 
      ("\n*****************************************************\n"); 
      System.out.print("Querying index " + indexName + "...");  
      Index index = table.getIndex(indexName);  
      ItemCollection<QueryOutcome> items = null; 
      QuerySpec querySpec = new QuerySpec();  
      
      // Compare strings with equals(); == only compares object references
      if ("CreationDateIndex".equals(indexName)) { 
         System.out.println("Issues filed on 2016-05-22"); 
         querySpec.withKeyConditionExpression(
            "CreationDate = :v_date and begins_with(BugID, :v_bug)") 
            .withValueMap(new ValueMap() 
            .withString(":v_date","2016-05-22")
            .withString(":v_bug","A-")); 
         items = index.query(querySpec); 
      } else if ("NameIndex".equals(indexName)) { 
         System.out.println("Compile error"); 
         // Name is a DynamoDB reserved word, so refer to it through the #n placeholder
         querySpec.withKeyConditionExpression(
            "#n = :v_name and begins_with(BugID, :v_bug)") 
            .withNameMap(new com.amazonaws.services.dynamodbv2.document.utils.NameMap()
            .with("#n", "Name"))
            .withValueMap(new ValueMap() 
            .withString(":v_name","Compile error") 
            .withString(":v_bug","A-")); 
         items = index.query(querySpec); 
      } else if ("DueDateIndex".equals(indexName)) { 
         System.out.println("Items due on 2016-10-15"); 
         querySpec.withKeyConditionExpression("DueDate = :v_date") 
         .withValueMap(new ValueMap() 
         .withString(":v_date","2016-10-15")); 
         items = index.query(querySpec); 
      } else { 
         System.out.println("\nInvalid index name"); 
         return; 
      }  
      Iterator<Item> iterator = items.iterator(); 
      System.out.println("Query: getting result..."); 
      
      while (iterator.hasNext()) { 
         System.out.println(iterator.next().toJSONPretty()); 
      } 
   } 
}

