Schema Design Considerations
When storing video metadata in DynamoDB, schema design directly affects query performance and scalability. Unlike relational databases, DynamoDB requires planning primary keys and indexes around known access patterns up front, because only key attributes can be queried efficiently. A typical video metadata record includes fields such as videoId (partition key), title, duration, fileSize, format, resolution, createdAt, and status. Global secondary indexes (GSIs) enable querying by non-key attributes such as userId or category.
The partition key (videoId) should be a uniformly distributed value such as a UUID to avoid hot partitions. For time-series queries, a GSI with userId as its partition key and createdAt as its sort key allows efficient filtering by user and date range. Because DynamoDB limits items to 400 KB, large binary data (e.g., thumbnails) belongs in S3, with only a reference stored in DynamoDB.
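A minimal sketch of building a new record under these rules. The thumbnailS3Uri attribute and the thumbnails/ key layout are hypothetical (neither appears in the schema below); the point is that the key comes from uuid.uuid4(), createdAt is stored as Unix seconds, and the thumbnail itself stays in S3.

```python
import time
import uuid


def build_video_item(user_id: str, title: str, duration_ms: int,
                     thumbnail_bucket: str) -> dict:
    """Build a VideoMetadata item; thumbnail bytes live in S3, not in DynamoDB."""
    video_id = str(uuid.uuid4())  # random UUID v4 spreads writes across partitions
    return {
        "videoId": video_id,
        "userId": user_id,
        "createdAt": int(time.time()),  # Unix timestamp in seconds
        "title": title,
        "duration": duration_ms,  # milliseconds
        # Hypothetical attribute: only a reference to the S3 object is stored
        "thumbnailS3Uri": f"s3://{thumbnail_bucket}/thumbnails/{video_id}.jpg",
    }


item = build_video_item("user_789", "Product Launch Demo", 354000, "video-bucket")
# The item could then be written with table.put_item(Item=item).
```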
Core Attribute Structure
Each video metadata entry must support essential query and display operations. A typical schema includes:
| Attribute | Type | Description |
|---|---|---|
| videoId | String | Partition key (UUID v4) |
| userId | String | GSI partition key for user-specific queries |
| createdAt | Number | GSI sort key (Unix timestamp, seconds) |
| title | String | Video title |
| duration | Number | Duration in milliseconds |
| resolutions | Map | S3 paths for each encoding, keyed by resolution |
| tags | String Set | Keywords (filterable with contains(), not indexed) |
```json
{
  "videoId": "a1b2c3d4-5678-90ef-ghij-klmnopqrstuv",
  "userId": "user_789",
  "createdAt": 1717645200,
  "title": "Product Launch Demo",
  "duration": 354000,
  "resolutions": {
    "1080p": "s3://video-bucket/transcoded/a1b2c3d4/1080p.mp4",
    "720p": "s3://video-bucket/transcoded/a1b2c3d4/720p.m3u8"
  },
  "tags": ["product", "demo", "launch"]
}
```
Access Patterns and Indexing
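The access patterns in this section assume a base table keyed only on videoId plus a GSI named UserIdIndex keyed on userId and createdAt. A sketch of matching create_table arguments follows; on-demand billing and the ALL projection are assumptions of this sketch, not requirements.

```python
# Arguments for boto3's DynamoDB client create_table call (sketch; the
# PAY_PER_REQUEST billing mode and ALL projection are assumptions).
TABLE_DEFINITION = {
    "TableName": "VideoMetadata",
    "BillingMode": "PAY_PER_REQUEST",
    # AttributeDefinitions lists only attributes used in a key schema;
    # title, duration, and the rest need no declaration.
    "AttributeDefinitions": [
        {"AttributeName": "videoId", "AttributeType": "S"},
        {"AttributeName": "userId", "AttributeType": "S"},
        {"AttributeName": "createdAt", "AttributeType": "N"},
    ],
    "KeySchema": [{"AttributeName": "videoId", "KeyType": "HASH"}],
    "GlobalSecondaryIndexes": [
        {
            "IndexName": "UserIdIndex",
            "KeySchema": [
                {"AttributeName": "userId", "KeyType": "HASH"},
                {"AttributeName": "createdAt", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "ALL"},
        }
    ],
}

# Usage (requires AWS credentials):
#   import boto3
#   boto3.client("dynamodb").create_table(**TABLE_DEFINITION)
```

The ALL projection copies every attribute into the index so index queries never need a second read; KEYS_ONLY or INCLUDE projections reduce index storage at the cost of follow-up GetItem calls.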
Primary Key Queries
Retrieve a single video by its videoId with a GetItem operation. GetItem addresses the item directly by its full primary key (here just the partition key), so no index lookup or scan is involved.
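```python
import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('VideoMetadata')

# GetItem needs the full primary key; for this table that is just videoId.
response = table.get_item(
    Key={'videoId': 'a1b2c3d4-5678-90ef-ghij-klmnopqrstuv'}
)
item = response.get('Item')  # 'Item' is absent when no matching video exists
```
Global Secondary Index (GSI) for User-Specific Access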
To list the videos a user has uploaded, define a GSI (named UserIdIndex in these examples) with userId as the partition key. Querying the index returns that user's videos without scanning the base table.
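A single Query call returns at most 1 MB of data, so listing every video for a prolific user can take several calls, each resuming from the previous response's LastEvaluatedKey. A minimal sketch of that loop; query_page is a placeholder for any callable that accepts Table.query's keyword arguments (for example table.query), which keeps the loop testable without AWS.

```python
from typing import Callable, Iterator


def query_all(query_page: Callable[..., dict], **query_kwargs) -> Iterator[dict]:
    """Yield every item from a paginated Query, following LastEvaluatedKey."""
    kwargs = dict(query_kwargs)
    while True:
        response = query_page(**kwargs)
        yield from response.get("Items", [])
        last_key = response.get("LastEvaluatedKey")
        if not last_key:  # absent on the final page
            return
        kwargs["ExclusiveStartKey"] = last_key  # resume after the last key read


# Usage against the GSI:
#   videos = list(query_all(table.query, IndexName='UserIdIndex',
#                           KeyConditionExpression='userId = :uid',
#                           ExpressionAttributeValues={':uid': 'user_789'}))
```

With the low-level client, client.get_paginator('query') offers the same behavior without a hand-written loop.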
```python
response = table.query(
    IndexName='UserIdIndex',
    KeyConditionExpression='userId = :uid',
    ExpressionAttributeValues={':uid': 'user_789'}
)
```
Time-Range Queries
For applications where users filter videos by upload date, make createdAt the sort key of the userId GSI and apply a range condition (BETWEEN, <, <=, >, >=) to it in the key condition expression. DynamoDB then reads only the items inside the requested time window instead of filtering them after the read.
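```python
# Range conditions apply to the sort key, so this query targets the GSI
# (userId partition key, createdAt sort key) rather than the base table.
response = table.query(
    IndexName='UserIdIndex',
    KeyConditionExpression='userId = :uid AND createdAt BETWEEN :start AND :end',
    ExpressionAttributeValues={
        ':uid': 'user_789',
        ':start': 1717645000,
        ':end': 1717645400
    }
)
```
Performance Optimization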
Provisioned Throughput Calculations
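Read and write capacity units (RCU/WCU) must account for both item size and request rate. One RCU supports one strongly consistent read per second (or two eventually consistent reads) of an item up to 4 KB; one WCU supports one write per second of an item up to 1 KB. Sizes are rounded up to the next 4 KB for reads and to the next 1 KB for writes. For example, 100 reads/sec of 3 KB items require 100 RCU (strong) or 50 RCU (eventual), while 100 writes/sec of the same items require 300 WCU.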
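The rounding rules above can be captured in two small functions. This sketch covers standard reads and writes only; transactional requests consume twice as many units and are not modeled here.

```python
import math


def required_rcu(reads_per_sec: float, item_kb: float,
                 strongly_consistent: bool = True) -> float:
    """RCU needed: item size rounds up to 4 KB units; eventual reads cost half."""
    units_per_read = math.ceil(item_kb / 4)
    rcu = reads_per_sec * units_per_read
    return rcu if strongly_consistent else rcu / 2


def required_wcu(writes_per_sec: float, item_kb: float) -> float:
    """WCU needed: item size rounds up to 1 KB units per write."""
    return writes_per_sec * math.ceil(item_kb)


# The worked example from the text: 100 ops/sec on 3 KB items.
print(required_rcu(100, 3), required_rcu(100, 3, strongly_consistent=False), required_wcu(100, 3))
```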
Adaptive Capacity and Bursting
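Adaptive capacity automatically shifts throughput toward partitions that receive more traffic, and burst capacity lets a table briefly consume up to 300 seconds of unused provisioned capacity. Neither raises the per-partition ceiling of 3,000 RCU and 1,000 WCU, so a sustained hot key can still be throttled. On-demand capacity mode removes throughput planning for unpredictable workloads, although the same per-partition limits still apply.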
Efficient Updates with Condition Expressions
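Use atomic counters for updates such as incrementing view counts, and condition expressions to make an update apply only when the item is in an expected state, for example still at a specific version. DynamoDB evaluates each of these server-side as part of a single write, so concurrent writers cannot silently overwrite each other's changes.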
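Optimistic locking with a version number is one way to implement version-checked updates. A minimal sketch, assuming a hypothetical numeric version attribute that every writer increments: the helper builds update_item arguments that succeed only if the stored version still matches the one the caller read. When it does not, boto3 raises a ClientError whose error code is ConditionalCheckFailedException, and the caller should re-read the item and retry.

```python
def versioned_title_update(video_id: str, new_title: str, expected_version: int) -> dict:
    """Build update_item kwargs that change the title only if the version is unchanged.

    'version' is a hypothetical attribute, not part of the schema above.
    """
    return {
        "Key": {"videoId": video_id},
        # Name placeholders (#t, #v) sidestep any clash with DynamoDB reserved words
        "UpdateExpression": "SET #t = :title, #v = #v + :one",
        "ConditionExpression": "#v = :expected",  # reject if someone else updated first
        "ExpressionAttributeNames": {"#t": "title", "#v": "version"},
        "ExpressionAttributeValues": {
            ":title": new_title,
            ":one": 1,
            ":expected": expected_version,
        },
    }


# Usage (requires AWS credentials):
#   table.update_item(**versioned_title_update(video_id, "New title", expected_version=3))
```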
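```python
# ADD starts viewCount at 0 if the attribute is missing, whereas
# SET viewCount = viewCount + :inc fails on a missing attribute.
# The condition stops UpdateItem from creating a new item for an unknown videoId.
table.update_item(
    Key={'videoId': 'a1b2c3d4-5678-90ef-ghij-klmnopqrstuv'},
    UpdateExpression='ADD viewCount :inc',
    ConditionExpression='attribute_exists(videoId)',
    ExpressionAttributeValues={':inc': 1},
    ReturnValues='UPDATED_NEW'
)
```
Advanced Features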
Stream Processing with Lambda Triggers
Enable DynamoDB Streams to trigger Lambda functions on item insertions or updates. This can be used to automate downstream tasks like thumbnail generation, transcoding job queuing, or audit logging.
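```python
from boto3.dynamodb.types import TypeDeserializer
deserializer = TypeDeserializer()  # converts {'S': 'x'}-style values to Python types

def lambda_handler(event, context):
    for record in event['Records']:
        if record['eventName'] == 'INSERT':
            # NewImage exists only with a NEW_IMAGE or NEW_AND_OLD_IMAGES stream view
            image = record['dynamodb']['NewImage']
            item = {k: deserializer.deserialize(v) for k, v in image.items()}
            # Trigger thumbnail generation for item['videoId'] here
```
Time-to-Live (TTL) for Ephemeral Data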
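Use TTL to expire outdated or temporary metadata entries automatically, which reduces storage costs and removes custom cleanup logic. TTL must be enabled on the table with the name of a Number attribute that holds the expiry time in Unix epoch seconds. Deletion runs asynchronously in the background and does not consume write throughput, so an expired item can remain readable for some time after it expires (AWS documents deletion as typically happening within a few days).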
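Because expired items can linger until the background process removes them, reads that must not return stale entries should check the expiry attribute themselves. A minimal sketch, assuming the expiryTime attribute name used in the example; a FilterExpression such as attribute_not_exists(expiryTime) OR expiryTime > :now performs the same check server-side.

```python
import time
from typing import Optional


def expiry_in(seconds: int, now: Optional[float] = None) -> int:
    """Return a TTL value: Unix epoch seconds, `seconds` from now."""
    base = time.time() if now is None else now
    return int(base) + seconds


def is_expired(item: dict, now: Optional[float] = None) -> bool:
    """True if expiryTime has passed, even if TTL has not deleted the item yet."""
    expiry = item.get("expiryTime")
    if expiry is None:
        return False  # items without the attribute never expire
    current = time.time() if now is None else now
    # boto3 returns Numbers as Decimal; int() handles both Decimal and int
    return int(expiry) < current
```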
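```python
import time

# update_item adds the TTL attribute to an existing item; put_item would
# replace the whole item, discarding title, duration, and other attributes.
table.update_item(
    Key={'videoId': 'a1b2c3d4-5678-90ef-ghij-klmnopqrstuv'},
    UpdateExpression='SET expiryTime = :exp',
    ExpressionAttributeValues={':exp': int(time.time()) + 86400}  # 24 hours from now
)
```
Error Handling and Retries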
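The AWS SDK already retries throttled requests with exponential backoff; in boto3 you can tune the retry mode and attempt count through botocore's Config. The adaptive mode adds client-side rate limiting on top of the standard retry behavior: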
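```python
import boto3
from botocore.config import Config

# max_attempts counts retries after the initial request (up to 6 calls here);
# 'adaptive' mode adds client-side rate limiting to standard retries.
dynamodb = boto3.client('dynamodb', config=Config(
    retries={
        'max_attempts': 5,
        'mode': 'adaptive'
    }
))
```
Batch operations (BatchGetItem, BatchWriteItem) cut round trips for bulk work, but items DynamoDB could not process, for example because of throttling, come back in UnprocessedKeys or UnprocessedItems rather than raising an error. The SDK does not resubmit these, so your code must retry them, ideally with backoff; for writes, the resource-level Table.batch_writer() helper does this automatically.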
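A minimal sketch of that retry loop for BatchWriteItem, assuming client is a low-level DynamoDB client (boto3.client('dynamodb')) or any object with the same batch_write_item method. The backoff constants are illustrative, and the sleep function is injectable so the loop can be exercised without AWS.

```python
import random
import time
from typing import Callable


def batch_write_with_retry(client, request_items: dict, max_retries: int = 5,
                           base_delay: float = 0.05,
                           sleep: Callable[[float], None] = time.sleep) -> None:
    """Send a BatchWriteItem request, resubmitting UnprocessedItems with backoff.

    request_items uses the RequestItems shape and must hold at most
    25 put/delete requests in total.
    """
    pending = request_items
    for attempt in range(max_retries + 1):
        response = client.batch_write_item(RequestItems=pending)
        pending = response.get("UnprocessedItems") or {}
        if not pending:
            return
        if attempt < max_retries:
            # Exponential backoff with full jitter: 0..base_delay * 2**attempt seconds
            sleep(random.uniform(0, base_delay * 2 ** attempt))
    remaining = sum(len(reqs) for reqs in pending.values())
    raise RuntimeError(f"{remaining} write requests still unprocessed")
```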
Cost Considerations
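- Storage Costs: $0.25/GB-month (Standard table class, us-east-1); GSI storage is billed at the same rate, because projected index items count as additional table storage
- On-Demand Pricing: billed per request unit (one write request unit per 1 KB written, one read request unit per strongly consistent read of up to 4 KB); since the November 2024 price reduction, us-east-1 rates are $0.625/million write request units and $0.125/million read request units
- Backup Costs: $0.20/GB-month for PITR (point-in-time recovery, continuous backups); $0.10/GB-month for on-demand backups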
Security Considerations
DynamoDB's fine-grained IAM permissions, using the dynamodb:Attributes condition key, can restrict access to specific attributes (e.g., deny UpdateItem on viewCount for non-admin users). Encryption at rest (AES-256) is enabled by default for all tables.

