Schema Design Considerations

When storing video metadata in DynamoDB, schema design directly affects query performance and scalability. Unlike relational databases, DynamoDB requires planning primary keys and indexes up front around the application's access patterns. A typical video metadata record includes fields such as videoId (partition key), title, duration, fileSize, format, resolution, createdAt, and status. Global secondary indexes (GSIs) enable querying by non-key attributes such as userId or category.

The partition key (videoId) should use a uniformly distributed value such as a UUID to avoid hot partitions. For time-series queries, a GSI that pairs userId (partition key) with createdAt (sort key) allows efficient filtering by user and date range. DynamoDB's 400 KB item size limit means large binary data (e.g., thumbnails) should be stored in S3, with only references kept in DynamoDB.
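As a minimal sketch of these two rules, the function below assigns a random UUID v4 as the partition key and stores only an S3 URI for the thumbnail. The `thumbnail` attribute, the `thumbnails/` key prefix, and the helper names are illustrative assumptions, not part of the schema described here. The size check uses the JSON encoding as a rough proxy only; DynamoDB's own item-size accounting (attribute names plus typed value sizes) differs, so keep a margin below the limit.

```python
import json
import time
import uuid

# DynamoDB's maximum item size (400 KB).
MAX_ITEM_BYTES = 400 * 1024

def new_video_item(user_id: str, title: str, bucket: str) -> dict:
    """Build a metadata item keyed by a random UUID v4.

    The thumbnail bytes stay in S3; the item stores only an S3 URI.
    The `thumbnail` attribute and `thumbnails/` prefix are hypothetical.
    """
    video_id = str(uuid.uuid4())  # uniformly distributed partition key
    return {
        "videoId": video_id,
        "userId": user_id,
        "createdAt": int(time.time()),  # Unix timestamp in seconds
        "title": title,
        "thumbnail": f"s3://{bucket}/thumbnails/{video_id}.jpg",  # reference, not bytes
    }

def approx_item_bytes(item: dict) -> int:
    """Rough size estimate from the UTF-8 JSON encoding (not DynamoDB's exact rule)."""
    return len(json.dumps(item).encode("utf-8"))

item = new_video_item("user_789", "Product Launch Demo", "video-bucket")
if approx_item_bytes(item) > MAX_ITEM_BYTES:
    raise ValueError("item too large; move bulky fields to S3")
```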

Core Attribute Structure

Each video metadata entry must support essential query and display operations. A typical schema includes:

| Attribute | Type | Description |
|---|---|---|
| videoId | String | Partition key (UUID v4) |
| userId | String | Partition key of the UserIdIndex GSI, for user-specific queries |
| createdAt | Number | Sort key of the UserIdIndex GSI (Unix timestamp, seconds) |
| title | String | Video title |
| duration | Number | Duration in milliseconds |
| resolutions | Map | S3 paths for different encodings |
| tags | String Set | Searchable keywords |
```json
{
  "videoId": "a1b2c3d4-5678-90ef-ghij-klmnopqrstuv",
  "userId": "user_789",
  "createdAt": 1717645200,
  "title": "Product Launch Demo",
  "duration": 354000,
  "resolutions": {
    "1080p": "s3://video-bucket/transcoded/a1b2c3d4/1080p.mp4",
    "720p": "s3://video-bucket/transcoded/a1b2c3d4/720p.m3u8"
  },
  "tags": ["product", "demo", "launch"]
}
```
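The record above is shown as plain JSON. DynamoDB's low-level API and Streams records use typed attribute values instead, which is where the String Set type of `tags` becomes visible. boto3's `TypeSerializer` (in `boto3.dynamodb.types`) performs this conversion for you; the hand-rolled sketch below only illustrates the mapping and covers a subset of types (strings, integers and `Decimal`, booleans, null, string sets, maps, lists).

```python
from decimal import Decimal

def to_attribute_value(value):
    """Convert a Python value into DynamoDB's typed (low-level API) form."""
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return {"BOOL": value}
    if value is None:
        return {"NULL": True}
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, (int, Decimal)):
        return {"N": str(value)}  # numbers are sent as strings
    # DynamoDB does not allow empty sets, so only non-empty string sets map to SS
    if isinstance(value, (set, frozenset)) and value and all(isinstance(v, str) for v in value):
        return {"SS": sorted(value)}  # sorted only to make the output stable
    if isinstance(value, dict):
        return {"M": {k: to_attribute_value(v) for k, v in value.items()}}
    if isinstance(value, list):
        return {"L": [to_attribute_value(v) for v in value]}
    # Floats land here too: use Decimal for fractional numbers
    raise TypeError(f"unsupported type: {type(value).__name__}")

record = {
    "videoId": "a1b2c3d4-5678-90ef-ghij-klmnopqrstuv",
    "duration": 354000,
    "resolutions": {"1080p": "s3://video-bucket/transcoded/a1b2c3d4/1080p.mp4"},
    "tags": {"product", "demo", "launch"},  # a Python set maps to a String Set
}
item = {k: to_attribute_value(v) for k, v in record.items()}
# item["tags"] → {"SS": ["demo", "launch", "product"]}
```

Like boto3's serializer, this sketch rejects Python floats, so fractional values such as a frame rate would need to be passed as `Decimal`.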

Access Patterns and Indexing

Primary Key Queries

Retrieve a video by its unique videoId using a simple GetItem operation. This operation is highly efficient and directly uses the partition key.

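```python
import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('VideoMetadata')

# GetItem fetches a single item by its full primary key (here, just videoId).
response = table.get_item(
    Key={'videoId': 'a1b2c3d4-5678-90ef-ghij-klmnopqrstuv'}
)
item = response.get('Item')  # None if no item has this key
```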

Global Secondary Index (GSI) for User-Specific Access

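To support user-based filtering, define a GSI with userId as the partition key. Queries against it list all videos uploaded by a user without scanning the base table. GSI reads are always eventually consistent, so a video written moments ago may not appear in the results immediately.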

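```python
# Query the UserIdIndex GSI instead of scanning the base table.
response = table.query(
    IndexName='UserIdIndex',
    KeyConditionExpression='userId = :uid',
    ExpressionAttributeValues={':uid': 'user_789'}
)
videos = response['Items']
```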
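For reference, the table and index these queries rely on could be created with parameters like the following. This is a sketch under assumptions: on-demand billing, an `ALL` projection, and `createdAt` as the index sort key (matching the attribute table above) are choices made here for illustration. Only key attributes appear in `AttributeDefinitions`; all other attributes are schemaless.

```python
# Hypothetical create_table parameters for the VideoMetadata schema.
TABLE_DEFINITION = {
    "TableName": "VideoMetadata",
    "BillingMode": "PAY_PER_REQUEST",  # on-demand: no ProvisionedThroughput needed
    "AttributeDefinitions": [
        {"AttributeName": "videoId", "AttributeType": "S"},
        {"AttributeName": "userId", "AttributeType": "S"},
        {"AttributeName": "createdAt", "AttributeType": "N"},
    ],
    # Base table: partition key only, so GetItem needs just videoId
    "KeySchema": [{"AttributeName": "videoId", "KeyType": "HASH"}],
    "GlobalSecondaryIndexes": [
        {
            "IndexName": "UserIdIndex",
            "KeySchema": [
                {"AttributeName": "userId", "KeyType": "HASH"},
                {"AttributeName": "createdAt", "KeyType": "RANGE"},
            ],
            # ALL copies every attribute into the index; KEYS_ONLY or INCLUDE store less
            "Projection": {"ProjectionType": "ALL"},
        }
    ],
}

# With credentials configured:
#   import boto3
#   boto3.client("dynamodb").create_table(**TABLE_DEFINITION)
```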

Time-Range Queries

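For applications where users need to filter videos by upload date, give the UserIdIndex GSI createdAt as its sort key and apply range operators (BETWEEN, <, >=, and so on) to it in the key condition. DynamoDB then reads only the items inside the requested time window, rather than reading everything for the user and filtering afterwards.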

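```python
# Key conditions run against the GSI: equality on userId, range on createdAt.
response = table.query(
    IndexName='UserIdIndex',
    KeyConditionExpression='userId = :uid AND createdAt BETWEEN :start AND :end',
    ExpressionAttributeValues={
        ':uid': 'user_789',
        ':start': 1717645000,
        ':end': 1717645400
    }
)
```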
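A Query returns at most 1 MB of data per call, so a busy user's time window may span several pages. The sketch below builds the query arguments from timezone-aware `datetime` values and follows `LastEvaluatedKey`/`ExclusiveStartKey` until every page has been read. `time_range_query_kwargs` and `query_all` are hypothetical helper names, and `query_fn` would normally be `table.query`.

```python
from datetime import datetime, timezone

def time_range_query_kwargs(user_id, start, end):
    """Build Table.query() arguments for a user's uploads in [start, end].

    Assumes UserIdIndex has userId as partition key and createdAt
    (Unix seconds) as sort key. Pass timezone-aware datetimes; naive
    ones are interpreted in local time by .timestamp().
    """
    return {
        "IndexName": "UserIdIndex",
        "KeyConditionExpression": "userId = :uid AND createdAt BETWEEN :start AND :end",
        "ExpressionAttributeValues": {
            ":uid": user_id,
            ":start": int(start.timestamp()),
            ":end": int(end.timestamp()),
        },
    }

def query_all(query_fn, **kwargs):
    """Call query_fn repeatedly, following LastEvaluatedKey across pages."""
    items = []
    while True:
        page = query_fn(**kwargs)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return items
        kwargs["ExclusiveStartKey"] = last_key  # resume after the last item returned

start = datetime(2024, 6, 6, 3, 0, tzinfo=timezone.utc)
end = datetime(2024, 6, 6, 4, 0, tzinfo=timezone.utc)
kwargs = time_range_query_kwargs("user_789", start, end)
# ':start' → 1717642800, ':end' → 1717646400
# videos = query_all(table.query, **kwargs)
```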

Performance Optimization

Provisioned Throughput Calculations

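Read/write capacity units (RCU/WCU) must account for item size and access frequency. One RCU provides one strongly consistent read per second, or two eventually consistent reads per second, for an item up to 4 KB; larger items are rounded up to the next 4 KB. One WCU provides one write per second for an item up to 1 KB, rounded up to the next 1 KB. For example, 100 reads/sec of 3 KB items require 100 RCU (strong) or 50 RCU (eventual), while 100 writes/sec of the same items require 300 WCU.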
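These sizing rules can be captured in two small helpers. This is a sketch for standard (non-transactional) reads and writes; transactional requests consume twice as many units and are not modeled here.

```python
import math

def read_capacity_units(reads_per_sec, item_size_bytes, strongly_consistent=True):
    """RCU needed: one RCU = one strongly consistent read/sec of up to 4 KB."""
    units_per_read = math.ceil(item_size_bytes / 4096)  # round up to 4 KB blocks
    total = reads_per_sec * units_per_read
    # Eventually consistent reads cost half; round up to whole units
    return total if strongly_consistent else math.ceil(total / 2)

def write_capacity_units(writes_per_sec, item_size_bytes):
    """WCU needed: one WCU = one write/sec of up to 1 KB."""
    return writes_per_sec * math.ceil(item_size_bytes / 1024)  # round up to 1 KB blocks

# 100 requests/sec of 3 KB items
print(read_capacity_units(100, 3 * 1024))                             # 100
print(read_capacity_units(100, 3 * 1024, strongly_consistent=False))  # 50
print(write_capacity_units(100, 3 * 1024))                            # 300
```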

Adaptive Capacity and Bursting

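DynamoDB's adaptive capacity automatically shifts throughput toward partitions that receive more traffic, and burst capacity lets a table briefly exceed its provisioned rate by drawing on up to 300 seconds of unused capacity. Neither lifts the per-partition ceiling of 3,000 RCU and 1,000 WCU, so sustained hot partitions are still throttled. Switching to on-demand capacity mode eliminates throughput planning for unpredictable workloads.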
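One way to reason about hot keys is to compare the traffic aimed at a single key against DynamoDB's per-partition ceilings of 3,000 RCU and 1,000 WCU. The check below is a rough sketch that assumes strongly consistent reads and a key that maps to a single item (as `videoId` does here), since one item always lives in one partition. The `update_table` arguments show how an existing table could be switched to on-demand mode; the helper and constant names are illustrative.

```python
import math

# Per-partition throughput ceilings
PARTITION_MAX_RCU = 3000
PARTITION_MAX_WCU = 1000

def hot_key_exceeds_partition(reads_per_sec, writes_per_sec, item_size_bytes):
    """True if traffic to one single-item key cannot fit in one partition.

    Assumes strongly consistent reads. Adaptive capacity cannot spread a
    single item across partitions, so such a key will be throttled.
    """
    rcu = reads_per_sec * math.ceil(item_size_bytes / 4096)
    wcu = writes_per_sec * math.ceil(item_size_bytes / 1024)
    return rcu > PARTITION_MAX_RCU or wcu > PARTITION_MAX_WCU

# Arguments for switching an existing table to on-demand capacity, e.g.
#   boto3.client("dynamodb").update_table(**ON_DEMAND_UPDATE)
ON_DEMAND_UPDATE = {"TableName": "VideoMetadata", "BillingMode": "PAY_PER_REQUEST"}
```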

Efficient Updates with Condition Expressions

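Use atomic counters and condition expressions for concurrent updates. An update expression that adds to viewCount increments it in place on the server, without a read-modify-write cycle in the client, and a ConditionExpression makes a write succeed only if the item is still in an expected state (for example, at a specific version number). Both prevent lost updates when several writers modify the same item.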

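```python
# Atomically increment viewCount on the server. if_not_exists starts the
# counter at 0 for items that do not have the attribute yet.
table.update_item(
    Key={'videoId': 'a1b2c3d4-5678-90ef-ghij-klmnopqrstuv'},
    UpdateExpression='SET viewCount = if_not_exists(viewCount, :zero) + :inc',
    ExpressionAttributeValues={':inc': 1, ':zero': 0},
    ReturnValues='UPDATED_NEW'
)
```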
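Conditional writes are often used for optimistic locking: each item carries a version number, and an update succeeds only if the version is still the one the writer read. The sketch below builds such an `update_item` call. The `version` attribute and the helper name are hypothetical, and attribute names go through `ExpressionAttributeNames` placeholders so they cannot clash with DynamoDB reserved words.

```python
def versioned_update_kwargs(video_id, new_title, expected_version):
    """Build Table.update_item() arguments for an optimistic-locking update.

    The write applies only if the item's (hypothetical) `version` attribute
    still equals `expected_version`; it then stores the new title and bumps
    the version, so a concurrent writer holding the old version fails.
    """
    return {
        "Key": {"videoId": video_id},
        "UpdateExpression": "SET #t = :title, #v = :next",
        "ConditionExpression": "#v = :expected",
        # Placeholders avoid conflicts with reserved words
        "ExpressionAttributeNames": {"#t": "title", "#v": "version"},
        "ExpressionAttributeValues": {
            ":title": new_title,
            ":expected": expected_version,
            ":next": expected_version + 1,
        },
    }

# Usage with boto3 (requires credentials and the table):
#   from botocore.exceptions import ClientError
#   try:
#       table.update_item(**versioned_update_kwargs(video_id, "New title", 3))
#   except ClientError as err:
#       if err.response["Error"]["Code"] != "ConditionalCheckFailedException":
#           raise
#       # Another writer updated the item first: re-read it and retry.
```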

Advanced Features

Stream Processing with Lambda Triggers

Enable DynamoDB Streams to trigger Lambda functions on item insertions or updates. This can be used to automate downstream tasks like thumbnail generation, transcoding job queuing, or audit logging.

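```python
def lambda_handler(event, context):
    # Each invocation receives a batch of stream records
    for record in event['Records']:
        if record['eventName'] == 'INSERT':
            # NewImage is present only if the stream view type includes new images
            # (NEW_IMAGE or NEW_AND_OLD_IMAGES); values are typed, e.g. {'S': '...'}
            new_image = record['dynamodb']['NewImage']
            video_id = new_image['videoId']['S']
            # Trigger thumbnail generation for video_id here
```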
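Stream images arrive in DynamoDB's typed format, so handlers usually convert them to plain Python first. boto3 provides `TypeDeserializer` in `boto3.dynamodb.types` for this; the dependency-free sketch below shows the idea for common types (binary types are not handled) and collects the new images of INSERT records. The function names are hypothetical.

```python
from decimal import Decimal

def from_attribute_value(av):
    """Convert one typed value such as {"S": "..."} into a Python value."""
    kind, value = next(iter(av.items()))  # each typed value is a single-key dict
    if kind in ("S", "BOOL"):
        return value
    if kind == "N":
        return Decimal(value)  # numbers arrive as strings; Decimal keeps precision
    if kind == "NULL":
        return None
    if kind == "SS":
        return set(value)
    if kind == "NS":
        return {Decimal(v) for v in value}
    if kind == "M":
        return {k: from_attribute_value(v) for k, v in value.items()}
    if kind == "L":
        return [from_attribute_value(v) for v in value]
    raise TypeError(f"unsupported attribute type: {kind}")

def inserted_videos(event):
    """Return plain-Python NewImages for the INSERT records in a Streams event."""
    return [
        {k: from_attribute_value(v) for k, v in record["dynamodb"]["NewImage"].items()}
        for record in event["Records"]
        if record["eventName"] == "INSERT"
    ]
```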

Time-to-Live (TTL) for Ephemeral Data

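Use a TTL attribute to expire outdated or temporary metadata entries automatically. The attribute must be a Number holding a Unix epoch timestamp in seconds, and TTL must be enabled on the table for that attribute name. This reduces storage costs and simplifies cleanup logic. TTL deletes run asynchronously in the background, typically within a few days of expiry, so reads may still return expired items until they are removed.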

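```python
import time

# TTL only takes effect once it is enabled on the table for 'expiryTime'.
# update_item sets just this attribute; put_item would replace the whole item.
table.update_item(
    Key={'videoId': 'a1b2c3d4-5678-90ef-ghij-klmnopqrstuv'},
    UpdateExpression='SET expiryTime = :exp',
    ExpressionAttributeValues={':exp': int(time.time()) + 86400}  # 24 hours from now
)
```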
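TTL has to be enabled on the table, naming the attribute to watch, before any expiry takes effect. The sketch below shows the `update_time_to_live` arguments plus two helpers: one computes an epoch-seconds expiry, and one filters out items that have expired but not yet been deleted. The constant and helper names are illustrative.

```python
import time

TTL_ATTRIBUTE = "expiryTime"

# Arguments for enabling TTL on the table, e.g.
#   boto3.client("dynamodb").update_time_to_live(**ENABLE_TTL)
ENABLE_TTL = {
    "TableName": "VideoMetadata",
    "TimeToLiveSpecification": {"Enabled": True, "AttributeName": TTL_ATTRIBUTE},
}

def expiry_epoch(ttl_seconds, now=None):
    """Expiry as Unix epoch seconds, the format TTL expects."""
    now = time.time() if now is None else now
    return int(now) + ttl_seconds

def is_expired(item, now=None):
    """True if the item's TTL has passed, even if TTL has not deleted it yet."""
    now = time.time() if now is None else now
    expiry = item.get(TTL_ATTRIBUTE)
    return expiry is not None and expiry < now

# Drop expired-but-undeleted items from a result set:
#   live = [i for i in response["Items"] if not is_expired(i)]
```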

Error Handling and Retries

The AWS SDK already retries throttled requests with exponential backoff; you can tune the number of attempts and the retry mode through botocore's `Config`:

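```python
import boto3
from botocore.config import Config

# 'adaptive' mode adds client-side rate limiting on top of the standard
# exponential-backoff retry behavior.
client = boto3.client('dynamodb', config=Config(
    retries={
        'max_attempts': 5,
        'mode': 'adaptive'
    }
))
```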

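Batch operations (BatchGetItem, BatchWriteItem) reduce network overhead for bulk operations, but a successful batch response can still contain work the service did not complete (UnprocessedKeys for BatchGetItem, UnprocessedItems for BatchWriteItem). The SDK's retry settings do not resubmit these, so retry them yourself with backoff; for writes, the boto3 resource's `Table.batch_writer()` handles this automatically.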
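For the low-level client, a retry loop for `UnprocessedItems` might look like the sketch below. It assumes the request already fits within BatchWriteItem's limit of 25 put or delete requests, uses exponential backoff with full jitter, and takes the batch function and `sleep` as parameters so it can be exercised without AWS. The function name and default delays are assumptions.

```python
import random
import time

def batch_write_with_retries(batch_write, request_items, max_retries=5,
                             base_delay=0.05, sleep=time.sleep):
    """Resubmit UnprocessedItems with exponential backoff and full jitter.

    `batch_write` should behave like a boto3 client's batch_write_item:
    called as batch_write(RequestItems=...) and returning a dict that may
    contain UnprocessedItems. Returns whatever is still unprocessed.
    """
    pending = request_items
    for attempt in range(max_retries + 1):
        response = batch_write(RequestItems=pending)
        pending = response.get("UnprocessedItems") or {}
        if not pending:
            return {}
        if attempt < max_retries:
            # Wait a random time up to base_delay * 2**attempt before retrying
            sleep(random.uniform(0, base_delay * 2 ** attempt))
    return pending

# leftover = batch_write_with_retries(client.batch_write_item, request_items)
```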

Cost Considerations

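  • Storage Costs: $0.25/GB-month for data in us-east-1 (Standard table class); storage used by global secondary indexes is billed at the same rate
  • On-Demand Pricing: $1.25 per million write request units and $0.25 per million read request units (us-east-1 list prices at the time of writing; AWS has since lowered on-demand throughput prices, so check the current pricing page)
  • Backup Costs: $0.20/GB-month for PITR (point-in-time recovery) and $0.10/GB-month for on-demand backups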
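These rates combine with the capacity rules from the throughput section into a back-of-the-envelope monthly estimate. The sketch below covers only storage and on-demand request units (no backups, free tier, or data transfer), and takes the rates as parameters because list prices change; `Rates` and `monthly_cost` are illustrative names.

```python
import math
from dataclasses import dataclass

@dataclass
class Rates:
    """Per-unit prices; fill in current values for your region."""
    storage_per_gb_month: float
    per_million_wru: float  # on-demand write request units
    per_million_rru: float  # on-demand read request units

def monthly_cost(rates, storage_gb, writes, reads, item_size_bytes, strongly_consistent=False):
    """Estimate one month of storage plus on-demand request costs.

    A write of up to 1 KB uses one WRU; a strongly consistent read of up
    to 4 KB uses one RRU, and an eventually consistent read uses half.
    """
    wru = writes * math.ceil(item_size_bytes / 1024)
    rru = reads * math.ceil(item_size_bytes / 4096)
    if not strongly_consistent:
        rru /= 2
    return (storage_gb * rates.storage_per_gb_month
            + wru / 1e6 * rates.per_million_wru
            + rru / 1e6 * rates.per_million_rru)

# us-east-1 figures listed above; 100 GB stored, 10M writes and
# 100M eventually consistent reads of 3 KB items per month
rates = Rates(storage_per_gb_month=0.25, per_million_wru=1.25, per_million_rru=0.25)
print(monthly_cost(rates, 100, 10_000_000, 100_000_000, 3 * 1024))  # 75.0
```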

DynamoDB's granular IAM permissions allow restricting access to specific attributes (e.g., deny UpdateItem on viewCount for non-admin users). Encryption at rest (AES-256) is enabled by default.
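As a starting point for the viewCount example, the policy below follows the fine-grained access control pattern in the DynamoDB documentation: attached to non-admin principals, it allows UpdateItem only when no attribute named in the request is `viewCount`, using the `dynamodb:Attributes` condition key. The account ID, region, and statement ID are hypothetical placeholders, and the policy should be verified (for example with the IAM policy simulator) before use.

```python
import json

# Hypothetical account ID and region; substitute your table's ARN.
TABLE_ARN = "arn:aws:dynamodb:us-east-1:123456789012:table/VideoMetadata"

NON_ADMIN_UPDATE_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "UpdateEverythingExceptViewCount",
            "Effect": "Allow",
            "Action": "dynamodb:UpdateItem",
            "Resource": TABLE_ARN,
            # Every attribute named in the request must differ from viewCount
            "Condition": {
                "ForAllValues:StringNotLike": {"dynamodb:Attributes": ["viewCount"]}
            },
        }
    ],
}

print(json.dumps(NON_ADMIN_UPDATE_POLICY, indent=2))
```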