Storing Video Metadata in DynamoDB

Schema Design Considerations

When storing video metadata in DynamoDB, schema design directly affects query performance and scalability. Unlike relational databases, DynamoDB requires primary keys and indexes to be planned around known access patterns. A typical video metadata record includes fields such as videoId (partition key), title, duration, fileSize, format, resolution, createdAt, and status. Global secondary indexes (GSIs) enable querying by non-key attributes like userId or category.

The partition key (videoId) should use a uniformly distributed value such as a UUID to avoid hot partitions. For time-series queries, a GSI keyed on userId with createdAt as its sort key (or a composite sort key like USERID#DATE) allows efficient filtering by user and date range. DynamoDB's 400 KB item size limit means large binary data (e.g., thumbnails) belongs in S3, with only references stored in DynamoDB.

Core Attribute Structure

Each video metadata entry must support essential query and display operations. A typical schema includes:

Attribute	Type	Description
videoId	String	Table partition key (UUID v4)
userId	String	GSI partition key for user-specific queries
createdAt	Number	GSI sort key (Unix timestamp in seconds)
title	String	Video title
duration	Number	Duration in milliseconds
resolutions	Map	S3 paths for different encodings
tags	String Set	Keywords (usable in filter expressions; sets cannot be index keys)

code

{
  "videoId": "a1b2c3d4-5678-90ef-ghij-klmnopqrstuv",
  "userId": "user_789",
  "createdAt": 1717645200,
  "title": "Product Launch Demo",
  "duration": 354000,
  "resolutions": {
    "1080p": "s3://video-bucket/transcoded/a1b2c3d4/1080p.mp4",
    "720p": "s3://video-bucket/transcoded/a1b2c3d4/720p.m3u8"
  },
  "tags": ["product", "demo", "launch"]
}
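A table definition matching this schema might look like the following. This is a minimal sketch, not a definitive layout: it assumes createdAt acts as the sort key of the user index rather than of the base table (so a lookup by videoId alone works), and the helper name video_table_definition, the index name UserIdIndex, and the on-demand billing mode are illustrative choices.

```python
def video_table_definition(table_name="VideoMetadata", index_name="UserIdIndex"):
    """Build keyword arguments for a DynamoDB create_table call."""
    return {
        "TableName": table_name,
        # Only attributes that appear in a key schema are declared up front.
        "AttributeDefinitions": [
            {"AttributeName": "videoId", "AttributeType": "S"},
            {"AttributeName": "userId", "AttributeType": "S"},
            {"AttributeName": "createdAt", "AttributeType": "N"},
        ],
        # Base table: videoId alone identifies an item.
        "KeySchema": [{"AttributeName": "videoId", "KeyType": "HASH"}],
        "GlobalSecondaryIndexes": [
            {
                "IndexName": index_name,
                # Assumed index layout: one user's videos ordered by creation time.
                "KeySchema": [
                    {"AttributeName": "userId", "KeyType": "HASH"},
                    {"AttributeName": "createdAt", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "ALL"},
            }
        ],
        "BillingMode": "PAY_PER_REQUEST",
    }


definition = video_table_definition()
# With boto3 installed and credentials configured, this could be passed as:
# boto3.client("dynamodb").create_table(**definition)
```

Non-key attributes such as title or tags are not declared here; DynamoDB only needs types for attributes used in the table or index key schemas.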

Access Patterns and Indexing

Primary Key Queries

Retrieve a video by its unique videoId using a simple GetItem operation. This operation is highly efficient and directly uses the partition key.

code

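import boto3

# Assumes AWS credentials and a default region are configured
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('VideoMetadata')

response = table.get_item(
    Key={'videoId': 'a1b2c3d4-5678-90ef-ghij-klmnopqrstuv'}
)
# 'Item' is absent from the response when no record matches the key
video = response.get('Item')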

Global Secondary Index (GSI) for User-Specific Access

To allow user-based filtering, define a GSI with userId as the partition key. This allows efficient queries to list all videos uploaded by a user without scanning the base table.

code

response = table.query(
    IndexName='UserIdIndex',
    KeyConditionExpression='userId = :uid',
    ExpressionAttributeValues={':uid': 'user_789'}
)
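A single Query call returns at most 1 MB of data, so a user with many videos needs pagination. The sketch below follows LastEvaluatedKey until the result set is exhausted; query_all and FakeTable are hypothetical names, and FakeTable only stands in for a real boto3 Table so the loop can be exercised without AWS access.

```python
def query_all(table, **query_kwargs):
    """Collect every item from a paginated Query by following LastEvaluatedKey."""
    items = []
    kwargs = dict(query_kwargs)
    while True:
        response = table.query(**kwargs)
        items.extend(response.get("Items", []))
        last_key = response.get("LastEvaluatedKey")
        if not last_key:
            return items
        # Resume the next page where the previous one stopped.
        kwargs["ExclusiveStartKey"] = last_key


class FakeTable:
    """Stand-in for a boto3 Table that returns two pages (illustration only)."""

    def __init__(self):
        self.calls = []

    def query(self, **kwargs):
        self.calls.append(kwargs)
        if "ExclusiveStartKey" not in kwargs:
            return {"Items": [{"videoId": "v1"}], "LastEvaluatedKey": {"videoId": "v1"}}
        return {"Items": [{"videoId": "v2"}]}


fake = FakeTable()
videos = query_all(
    fake,
    IndexName="UserIdIndex",
    KeyConditionExpression="userId = :uid",
    ExpressionAttributeValues={":uid": "user_789"},
)
```

With a real table, the same call would be query_all(table, IndexName=..., ...). For very large result sets, returning pages lazily instead of building one list may be preferable.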

Time-Range Queries

For applications where users need to filter videos by upload date, define the user index with userId as its partition key and createdAt as its sort key, then apply a range operator such as BETWEEN to the sort key. This structure enables efficient retrieval within a given time window without a scan.

code

response = table.query(
    IndexName='UserIdIndex',  # assumes the GSI uses createdAt as its sort key
    KeyConditionExpression='userId = :uid AND createdAt BETWEEN :start AND :end',
    ExpressionAttributeValues={
        ':uid': 'user_789',
        ':start': 1717645000,
        ':end': 1717645400
    }
)
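Callers usually think in dates rather than epoch seconds, so a small helper can build the query arguments. This is a sketch under the assumption that createdAt stores whole Unix seconds and that an index named UserIdIndex is keyed on userId and createdAt; the function name user_time_range_query is illustrative.

```python
from datetime import datetime, timezone


def user_time_range_query(user_id, start, end, index_name="UserIdIndex"):
    """Build Query kwargs for one user's videos created between two datetimes."""
    return {
        "IndexName": index_name,
        "KeyConditionExpression": "userId = :uid AND createdAt BETWEEN :start AND :end",
        "ExpressionAttributeValues": {
            ":uid": user_id,
            # createdAt is assumed to hold whole Unix seconds, so convert the bounds.
            ":start": int(start.timestamp()),
            ":end": int(end.timestamp()),
        },
        # Return newest videos first (descending sort-key order).
        "ScanIndexForward": False,
    }


june_2024 = user_time_range_query(
    "user_789",
    datetime(2024, 6, 1, tzinfo=timezone.utc),
    datetime(2024, 7, 1, tzinfo=timezone.utc),
)
# With a real table: table.query(**june_2024)
```

BETWEEN is inclusive at both ends, so an item created exactly at the end timestamp is included; subtract one second from the upper bound if a half-open range is needed.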

Performance Optimization

Provisioned Throughput Calculations

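Read and write capacity units (RCUs/WCUs) must account for item size and access frequency. One RCU covers one strongly consistent read per second of an item up to 4 KB, or two eventually consistent reads per second; one WCU covers one write per second of an item up to 1 KB. Item sizes round up to the next 4 KB for reads and the next 1 KB for writes. For example, 100 reads/sec of 3 KB items require 100 RCUs (strong) or 50 RCUs (eventual), while 100 writes/sec of the same items require 300 WCUs.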
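The capacity arithmetic can be sketched as a pair of helper functions. This is a minimal sketch assuming DynamoDB's documented unit sizes (one read unit per strongly consistent read of up to 4 KB, half that for eventually consistent reads, one write unit per write of up to 1 KB); the function names are illustrative.

```python
import math


def read_capacity_units(reads_per_sec, item_size_kb, strongly_consistent=True):
    """Provisioned RCUs needed for a steady read rate at a given item size."""
    # Each read is billed in 4 KB increments, rounded up.
    units_per_read = math.ceil(item_size_kb / 4)
    total = reads_per_sec * units_per_read
    # Eventually consistent reads cost half as much.
    return total if strongly_consistent else math.ceil(total / 2)


def write_capacity_units(writes_per_sec, item_size_kb):
    """Provisioned WCUs needed for a steady write rate at a given item size."""
    # Each write is billed in 1 KB increments, rounded up.
    return writes_per_sec * math.ceil(item_size_kb)


strong = read_capacity_units(100, 3)
eventual = read_capacity_units(100, 3, strongly_consistent=False)
writes = write_capacity_units(100, 3)
```

For 100 operations per second on 3 KB items, this gives 100 RCUs for strongly consistent reads, 50 RCUs for eventually consistent reads, and 300 WCUs for writes.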

Adaptive Capacity and Bursting

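Adaptive capacity automatically shifts throughput toward busier partitions, and burst capacity lets a table briefly exceed its provisioned rate by drawing on recently unused capacity. A sustained hot partition can still be throttled, because each partition has its own throughput ceiling. On-demand capacity mode removes the need for throughput planning on unpredictable workloads, although the per-partition limits still apply.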

Efficient Updates with Condition Expressions

Use atomic counters and conditional expressions for updates like incrementing view counts or ensuring safe updates to a specific version of an item. This avoids conflicts in concurrent write operations.

code

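table.update_item(
    Key={'videoId': 'a1b2c3d4-5678-90ef-ghij-klmnopqrstuv'},
    # if_not_exists lets the first increment succeed when viewCount is absent
    UpdateExpression='SET viewCount = if_not_exists(viewCount, :zero) + :inc',
    ExpressionAttributeValues={':inc': 1, ':zero': 0},
    ReturnValues='UPDATED_NEW'
)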
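For the version-checked case, a condition expression turns the update into an optimistic lock. The sketch below assumes a hypothetical numeric attribute named version on each item; the helper name versioned_title_update is also illustrative. Attribute names are passed through ExpressionAttributeNames placeholders so the expression stays valid even if a name collides with a DynamoDB reserved word.

```python
def versioned_title_update(video_id, new_title, expected_version):
    """Build update_item kwargs that succeed only if the stored version matches."""
    return {
        "Key": {"videoId": video_id},
        "UpdateExpression": "SET #title = :title, #ver = :next",
        # The write is rejected unless the item still carries the version we read.
        "ConditionExpression": "#ver = :expected",
        "ExpressionAttributeNames": {"#title": "title", "#ver": "version"},
        "ExpressionAttributeValues": {
            ":title": new_title,
            ":expected": expected_version,
            ":next": expected_version + 1,
        },
        "ReturnValues": "UPDATED_NEW",
    }


update = versioned_title_update(
    "a1b2c3d4-5678-90ef-ghij-klmnopqrstuv", "Launch Demo (Final)", 3
)
# table.update_item(**update) raises a ClientError with error code
# ConditionalCheckFailedException when another writer changed the item first.
```

On a failed condition, the usual approach is to re-read the item, reapply the change to the fresh copy, and retry with the new version number.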

Advanced Features

Stream Processing with Lambda Triggers

Enable DynamoDB Streams to trigger Lambda functions on item insertions or updates. This can be used to automate downstream tasks like thumbnail generation, transcoding job queuing, or audit logging.

code

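def lambda_handler(event, context):
    for record in event['Records']:
        if record['eventName'] == 'INSERT':
            # NewImage uses DynamoDB's typed JSON, e.g. {'videoId': {'S': '...'}}
            new_image = record['dynamodb']['NewImage']
            video_id = new_image['videoId']['S']
            # Trigger thumbnail generation for video_id here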
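Stream records carry attribute values in DynamoDB's typed JSON, and NewImage is only present when the stream view type includes new images. boto3 ships a TypeDeserializer for this conversion; the sketch below is a simplified stand-in covering a few common types so it runs without boto3. The names from_dynamodb_json and new_video_ids, and the sample event, are illustrative.

```python
from decimal import Decimal


def from_dynamodb_json(value):
    """Convert one DynamoDB-typed value (e.g. {"S": "x"}) to a plain Python value."""
    type_tag, raw = next(iter(value.items()))
    if type_tag == "S":
        return raw
    if type_tag == "N":
        return Decimal(raw)  # numbers arrive as strings
    if type_tag == "BOOL":
        return raw
    if type_tag == "NULL":
        return None
    if type_tag == "SS":
        return set(raw)
    if type_tag == "L":
        return [from_dynamodb_json(v) for v in raw]
    if type_tag == "M":
        return {k: from_dynamodb_json(v) for k, v in raw.items()}
    raise ValueError(f"unsupported DynamoDB type: {type_tag}")


def new_video_ids(event):
    """Return the videoId of every INSERT record in a stream event."""
    ids = []
    for record in event.get("Records", []):
        if record.get("eventName") != "INSERT":
            continue
        image = from_dynamodb_json({"M": record["dynamodb"]["NewImage"]})
        ids.append(image["videoId"])
    return ids


sample_event = {
    "Records": [
        {"eventName": "INSERT", "dynamodb": {"NewImage": {
            "videoId": {"S": "v1"},
            "duration": {"N": "354000"},
            "tags": {"SS": ["demo"]},
        }}},
        {"eventName": "MODIFY", "dynamodb": {"NewImage": {"videoId": {"S": "v2"}}}},
    ]
}
inserted = new_video_ids(sample_event)
```

Only the INSERT record contributes an ID here; the MODIFY record is skipped, which mirrors the handler's filter on eventName.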

Time-to-Live (TTL) for Ephemeral Data

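Set a TTL attribute, a Number holding a Unix epoch timestamp in seconds, to expire outdated or temporary metadata entries automatically. This reduces storage costs and simplifies cleanup logic. TTL deletes run asynchronously in the background: deletion typically occurs within a few days of expiry rather than at the exact timestamp, so expired items can still appear in reads until they are removed.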

code

import time

# Note: put_item replaces any existing item that has the same key
table.put_item(
    Item={
        'videoId': 'a1b2c3d4-5678-90ef-ghij-klmnopqrstuv',
        'expiryTime': int(time.time()) + 86400  # 24 hours from now
    }
)
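TTL only takes effect once it is enabled on the table for a named attribute, and reads may still return items whose expiry has passed. The sketch below builds the arguments for the client-level update_time_to_live call and adds a read-side check; the helper names ttl_specification and is_expired are illustrative, and expiryTime is assumed to be the TTL attribute.

```python
import time


def ttl_specification(table_name="VideoMetadata", attribute_name="expiryTime"):
    """Build kwargs for the client-level update_time_to_live call."""
    return {
        "TableName": table_name,
        "TimeToLiveSpecification": {"Enabled": True, "AttributeName": attribute_name},
    }


def is_expired(item, now=None, attribute_name="expiryTime"):
    """True if the item's TTL timestamp has passed, even if not yet deleted."""
    now = time.time() if now is None else now
    expiry = item.get(attribute_name)
    # Items without the attribute never expire.
    return expiry is not None and expiry <= now


spec = ttl_specification()
# With a real client: boto3.client("dynamodb").update_time_to_live(**spec)

stale = is_expired({"videoId": "v1", "expiryTime": 1717645200}, now=1717731600)
fresh = is_expired({"videoId": "v2", "expiryTime": 1717645200}, now=1717600000)
```

Filtering with is_expired after a read (or with an equivalent filter expression) keeps already-expired entries out of results during the window before DynamoDB removes them.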

Error Handling and Retries

Implement exponential backoff for throttled requests using the AWS SDK's built-in retry mechanism:

code

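import boto3
from botocore.config import Config

dynamodb = boto3.client('dynamodb', config=Config(
    retries={
        'max_attempts': 5,   # total attempts, including the initial request
        'mode': 'adaptive'   # standard retries plus client-side rate limiting
    }
))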

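Batch operations (BatchGetItem, BatchWriteItem) reduce network overhead for bulk work, but a batch can partially succeed. The response returns any leftovers in UnprocessedKeys (BatchGetItem) or UnprocessedItems (BatchWriteItem), and the caller must resubmit them, ideally with backoff.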
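Resubmitting unprocessed writes can be sketched as a loop with exponential backoff. This is an illustration under stated assumptions, not a definitive implementation: batch_write_with_retry and FakeClient are hypothetical names, FakeClient merely imitates the shape of a batch_write_item response, and the sleep function is injectable so the example runs instantly. A real BatchWriteItem call accepts at most 25 put or delete requests, so larger inputs need chunking first.

```python
import time


def batch_write_with_retry(client, request_items, max_attempts=5,
                           base_delay=0.1, sleep=time.sleep):
    """Call batch_write_item until UnprocessedItems is empty or attempts run out."""
    pending = request_items
    for attempt in range(max_attempts):
        if attempt:
            # Exponential backoff before each resubmission: 0.1s, 0.2s, 0.4s, ...
            sleep(base_delay * 2 ** (attempt - 1))
        response = client.batch_write_item(RequestItems=pending)
        pending = response.get("UnprocessedItems") or {}
        if not pending:
            break
    return pending  # empty on success, otherwise what is still unprocessed


class FakeClient:
    """Stand-in client that leaves one request unprocessed on the first call."""

    def __init__(self):
        self.calls = []

    def batch_write_item(self, RequestItems):
        self.calls.append(RequestItems)
        if len(self.calls) == 1:
            leftover = {t: reqs[-1:] for t, reqs in RequestItems.items()}
            return {"UnprocessedItems": leftover}
        return {"UnprocessedItems": {}}


puts = [{"PutRequest": {"Item": {"videoId": {"S": f"v{i}"}}}} for i in range(3)]
delays = []
client = FakeClient()
leftover = batch_write_with_retry(client, {"VideoMetadata": puts}, sleep=delays.append)
```

Only the leftover request is resent on the second call, which keeps retries from rewriting items that already succeeded.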

Cost Considerations

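Storage Costs: about $0.25/GB-month (us-east-1, Standard table class); index storage is billed in addition to base-table storage
On-Demand Pricing: $1.25/million write request units, $0.25/million read request units (us-east-1 list prices; rates change, so check the current pricing page)
Backup Costs: $0.10/GB-month for on-demand backups; point-in-time recovery (PITR) is priced separately per GB-month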
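A rough monthly estimate can be derived from the storage and on-demand figures listed above. This is a back-of-the-envelope sketch: the function name monthly_on_demand_cost is illustrative, the default rates are simply the figures quoted in this section, and the workload numbers are made up. It ignores free-tier allowances, backups, and data transfer, and it assumes each write fits in 1 KB and each read in 4 KB (larger items consume multiple request units).

```python
def monthly_on_demand_cost(storage_gb, writes_millions, reads_millions,
                           storage_per_gb=0.25, per_million_writes=1.25,
                           per_million_reads=0.25):
    """Estimate a monthly bill in USD from storage and request volumes."""
    return round(
        storage_gb * storage_per_gb
        + writes_millions * per_million_writes
        + reads_millions * per_million_reads,
        2,
    )


# Hypothetical workload: 50 GB stored, 10M writes and 200M reads per month.
estimate = monthly_on_demand_cost(storage_gb=50, writes_millions=10, reads_millions=200)
```

For this hypothetical workload the estimate is $75.00: $12.50 for storage, $12.50 for writes, and $50.00 for reads. Passing different rates as arguments adapts the estimate to another region or to updated pricing.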

Fine-grained IAM policies can restrict access to specific attributes through the dynamodb:Attributes condition key (e.g., deny UpdateItem on viewCount for non-admin users). Encryption at rest (AES-256) is enabled by default.
