Monitoring AWS MediaConvert transcoding jobs requires accurate visibility into job states, queue times, and failure conditions. By using CloudWatch metrics, alarms, and logs, developers can track operational health, detect errors early, and automate responses to failures or bottlenecks. This setup supports fine-grained diagnostics for media workflows.
CloudWatch Metrics for MediaConvert
AWS Elemental MediaConvert automatically publishes detailed metrics to CloudWatch, enabling real-time monitoring of transcoding jobs. Key metrics include JobsCompleted, JobsErrored, JobsWarning, and QueueTime, all available at one-minute granularity. Each metric is tagged with the job's Queue ARN and Status (COMPLETE, ERROR, PROGRESSING), allowing granular filtering. The OutputDuration metric tracks processed video seconds per job, useful for cost analysis.
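These metrics can be queried directly for quick health checks. A minimal sketch with boto3, assuming the metric and dimension names above and the example queue ARN used later in this article:

```python
from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client('cloudwatch')

# Sum of errored jobs on the Default queue over the last hour
stats = cloudwatch.get_metric_statistics(
    Namespace='AWS/MediaConvert',
    MetricName='JobsErrored',
    Dimensions=[{
        'Name': 'Queue',
        'Value': 'arn:aws:mediaconvert:us-east-1:123456789012:queues/Default',
    }],
    StartTime=datetime.now(timezone.utc) - timedelta(hours=1),
    EndTime=datetime.now(timezone.utc),
    Period=300,
    Statistics=['Sum'],
)
for point in stats['Datapoints']:
    print(point['Timestamp'], point['Sum'])
```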
MediaConvert also sends events to Amazon EventBridge (formerly CloudWatch Events) during state transitions. These events arrive with source aws.mediaconvert and detail-type MediaConvert Job State Change, with payloads containing job IDs, status messages, and error codes. Developers can parse these events to trigger Lambda functions for failure handling or post-processing workflows.
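As a sketch, an EventBridge rule that routes only failed-job events to a failure-handling Lambda might look like this (the rule name, function name, and ARN are placeholders):

```python
import json

import boto3

events = boto3.client('events')

# Match only MediaConvert jobs that ended in ERROR
events.put_rule(
    Name='MediaConvertJobErrors',
    EventPattern=json.dumps({
        'source': ['aws.mediaconvert'],
        'detail-type': ['MediaConvert Job State Change'],
        'detail': {'status': ['ERROR']},
    }),
)

# Route matching events to the handler Lambda (placeholder ARN)
events.put_targets(
    Rule='MediaConvertJobErrors',
    Targets=[{
        'Id': 'handleTranscodeFailure',
        'Arn': 'arn:aws:lambda:us-east-1:123456789012:function:handleTranscodeFailure',
    }],
)
```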
Creating Custom CloudWatch Alarms
Threshold-based alarms notify teams when transcoding jobs exceed expected parameters. For example, an alarm that fires when the JobsErrored count exceeds zero within a five-minute period can be defined with this CloudFormation template:
```yaml
Resources:
  TranscodingErrorAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmName: "MediaConvert-Job-Errors"
      ComparisonOperator: GreaterThanThreshold
      EvaluationPeriods: 1
      MetricName: JobsErrored
      Namespace: AWS/MediaConvert
      Period: 300
      Statistic: Sum
      Threshold: 0
      AlarmActions:
        - arn:aws:sns:us-east-1:123456789012:TranscodingAlerts
      Dimensions:
        - Name: Queue
          Value: "arn:aws:mediaconvert:us-east-1:123456789012:queues/Default"
```

Alarms can also monitor job duration anomalies. The QueueTime metric, compared against a baseline using CloudWatch Anomaly Detection, identifies jobs stuck in the queue:
```python
import boto3

cloudwatch = boto3.client('cloudwatch')

cloudwatch.put_metric_alarm(
    AlarmName='AbnormalQueueTime',
    ComparisonOperator='GreaterThanUpperThreshold',
    ThresholdMetricId='m1',
    EvaluationPeriods=2,
    TreatMissingData='notBreaching',
    Metrics=[
        {
            # Expected band: two standard deviations around the model of m2
            'Id': 'm1',
            'Expression': 'ANOMALY_DETECTION_BAND(m2, 2)',
            'Label': 'QueueTime (Expected)'
        },
        {
            # The raw queue-time metric that the alarm evaluates
            'Id': 'm2',
            'MetricStat': {
                'Metric': {
                    'Namespace': 'AWS/MediaConvert',
                    'MetricName': 'QueueTime',
                    'Dimensions': [
                        {
                            'Name': 'Queue',
                            'Value': 'arn:aws:mediaconvert:us-east-1:123456789012:queues/Default'
                        }
                    ]
                },
                'Period': 300,
                'Stat': 'Average'
            },
            'ReturnData': True
        }
    ]
)
```

Logging and Diagnostic Patterns
Each MediaConvert job generates structured logs in CloudWatch Logs under a job-specific path:

```
/aws/mediaconvert/job/<job-id>
```

These logs help developers isolate the cause of a failure or confirm that encoding settings were applied as expected. They contain:
- FFmpeg encoder output
- Input/output file validation results
- DRM packaging status
- Error stack traces
A Lambda function subscribed to these logs can extract critical errors using pattern matching:
```python
import base64
import re
import zlib

import boto3

sns = boto3.client('sns')
# Reuses the alert topic from the alarm examples above
TOPIC_ARN = 'arn:aws:sns:us-east-1:123456789012:TranscodingAlerts'

def lambda_handler(event, context):
    # Subscription payloads arrive base64-encoded and gzip-compressed
    payload = base64.b64decode(event['awslogs']['data'])
    logs = zlib.decompress(payload, 16 + zlib.MAX_WBITS).decode('utf-8')
    errors = re.findall(r'ERROR\s+#(\w+):\s(.+)', logs)
    if errors:
        sns.publish(TopicArn=TOPIC_ARN, Message=str(errors))
```
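Wiring the function to the log group is a one-time setup. A sketch using put_subscription_filter, assuming the /aws/mediaconvert/jobs group referenced later in this article (the function ARN is a placeholder, and the Lambda's resource policy must already allow invocation by logs.amazonaws.com):

```python
import boto3

logs = boto3.client('logs')

# Forward any log event containing "ERROR" to the parsing Lambda
logs.put_subscription_filter(
    logGroupName='/aws/mediaconvert/jobs',
    filterName='ErrorsToLambda',
    filterPattern='ERROR',
    destinationArn='arn:aws:lambda:us-east-1:123456789012:function:parseTranscodeErrors',
)
```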
Custom Metric Filters
CloudWatch Logs metric filters count specific log patterns as numeric metrics, which allows tracking of custom issues such as codec-specific encoding errors. For example, filtering logs for H.265 encoding failures creates a metric named H265ErrorCount that can be graphed, alarmed on, or used to identify trends in codec behavior:
```bash
aws logs put-metric-filter \
  --log-group-name "/aws/mediaconvert/jobs" \
  --filter-name "H265EncodeErrors" \
  --filter-pattern '[codec=H265, level=ERROR]' \
  --metric-transformations \
    '[
      {
        "metricName": "H265ErrorCount",
        "metricNamespace": "Custom/MediaConvert",
        "metricValue": "1"
      }
    ]'
```

Dashboard Configuration
CloudWatch dashboards provide a unified view of transcoding activity by combining metric graphs and log queries into a single interface. These dashboards can display job counts over time, job errors per queue, and filterable log outputs for error messages. Widgets are defined in JSON and support custom queries, periods, and dimensions. This allows teams to monitor real-time system status and investigate job history efficiently.
```json
{
  "widgets": [
    {
      "type": "metric",
      "x": 0,
      "y": 0,
      "width": 12,
      "height": 6,
      "properties": {
        "metrics": [
          ["AWS/MediaConvert", "JobsCompleted", "Queue", "arn:aws:mediaconvert:us-east-1:123456789012:queues/Default"],
          ["AWS/MediaConvert", "JobsErrored", "Queue", "arn:aws:mediaconvert:us-east-1:123456789012:queues/Default"],
          ["AWS/MediaConvert", "JobsWarning", "Queue", "arn:aws:mediaconvert:us-east-1:123456789012:queues/Default"]
        ],
        "period": 300,
        "stat": "Sum",
        "region": "us-east-1",
        "title": "Job Status Counts"
      }
    },
    {
      "type": "log",
      "x": 0,
      "y": 6,
      "width": 12,
      "height": 6,
      "properties": {
        "query": "SOURCE '/aws/mediaconvert/jobs' | filter @message like /ERROR/",
        "region": "us-east-1",
        "title": "Error Logs"
      }
    }
  ]
}
```
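To create or update the dashboard programmatically, the widget JSON above can be passed to put_dashboard. A minimal sketch, assuming the JSON is saved to a local file (the file and dashboard names are placeholders):

```python
import boto3

cloudwatch = boto3.client('cloudwatch')

# Placeholder file holding the widget JSON shown above
with open('mediaconvert-dashboard.json') as f:
    dashboard_body = f.read()

cloudwatch.put_dashboard(
    DashboardName='MediaConvertTranscoding',
    DashboardBody=dashboard_body,
)
```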
Cost Optimization
Monitoring MediaConvert with CloudWatch incurs costs for custom metrics, log ingestion, and dashboard queries:
| Item | Cost |
| --- | --- |
| Custom Metrics | $0.30 per metric per month (first 10,000) |
| Log Ingestion | $0.50 per GB |
| Dashboard Queries | $0.0001 per query (metrics older than 15 minutes) |
Reducing costs involves:
- Setting log retention policies (e.g., 7 days for debug logs, 1 year for errors), as sketched after this list
- Aggregating metrics at 5-minute resolution after 15 days
- Using metric math instead of separate custom metrics
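A minimal sketch of the retention settings from the first item, assuming the /aws/mediaconvert/jobs group used earlier and a hypothetical group for long-lived error logs:

```python
import boto3

logs = boto3.client('logs')

# Verbose job logs: keep one week
logs.put_retention_policy(
    logGroupName='/aws/mediaconvert/jobs',
    retentionInDays=7,
)

# Error logs routed to their own group (hypothetical name): keep one year
logs.put_retention_policy(
    logGroupName='/aws/mediaconvert/errors',
    retentionInDays=365,
)
```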
Integration with Incident Management
CloudWatch alarms can trigger predefined remediation steps using AWS Systems Manager Automation. For instance, when consecutive job failures are detected, an automation document can invoke a Lambda function to restart failed jobs.
```yaml
---
description: Restart failed MediaConvert jobs
schemaVersion: '0.3'
parameters:
  jobId:
    type: String
mainSteps:
  - name: restartJob
    action: aws:invokeLambdaFunction
    inputs:
      FunctionName: restartMediaConvertJob
      Payload: '{"jobId": "{{jobId}}"}'
```

This workflow executes when a CloudWatch alarm detects consecutive job failures, invoking a Lambda that restarts jobs through the MediaConvert API.
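The restartMediaConvertJob function referenced above might look like the following sketch, which fetches the failed job's settings and resubmits them as a new job (error handling and retry limits are omitted):

```python
import boto3

def lambda_handler(event, context):
    # MediaConvert calls go through an account-specific endpoint
    mc = boto3.client('mediaconvert')
    endpoint = mc.describe_endpoints()['Endpoints'][0]['Url']
    mc = boto3.client('mediaconvert', endpoint_url=endpoint)

    # Fetch the failed job and resubmit its settings as a new job
    job = mc.get_job(Id=event['jobId'])['Job']
    new_job = mc.create_job(Role=job['Role'], Settings=job['Settings'])
    return {'newJobId': new_job['Job']['Id']}
```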

