Benchmarking video encoding on NVIDIA GPUs involves measuring the performance of the NVENC hardware engine across different conditions. This includes evaluating throughput in frames per second, GPU encoder utilization, and encoding latency.
Results vary depending on resolution, bitrate, rate control modes, and selected NVENC presets. The process uses standardized input, consistent encoding commands, and system-level profiling tools to ensure repeatable and comparable results.
Benchmark Goals and Scope
The primary focus is to measure:
- Average encoded frames per second (FPS)
- Peak GPU NVENC engine utilization
- Encoding latency per frame
- Impact of resolution and bitrate on throughput
- Behavior across different presets and rate control modes
Only NVENC hardware encoding is considered; software encoders or decode steps are excluded.
Benchmark Setup and Tools
Hardware Requirements:
- NVIDIA GPU with NVENC support (Turing and newer preferred)
- Driver version ≥ 450.xx
- Sufficient VRAM for 4K workloads
- Dedicated system with no GUI rendering load
Software Stack:
- FFmpeg compiled with --enable-nvenc
- CUDA Toolkit (for profiling tools such as nvprof); nvidia-smi is installed with the NVIDIA driver
- Raw YUV test videos (e.g., NV12 format)
Test Video Preparation
Consistent benchmarking requires the use of standardized raw input files, typically in NV12 or YUV420p format. Synthetic test videos can be generated with FFmpeg, ensuring uniform content and frame counts for repeatable tests. Converting videos to NV12 ensures compatibility with NVENC and removes variability from source material.
ffmpeg -f lavfi -i testsrc=size=1920x1080:rate=30 -frames:v 300 -pix_fmt yuv420p test.yuv
Explanation:
- -f lavfi -i testsrc: Generates a color pattern test video.
- size=1920x1080: Target resolution (an option of the testsrc source, not a standalone flag).
- rate=30: Frame rate (also a testsrc option).
- -frames:v 300: Total number of frames.
- -pix_fmt yuv420p: Ensures format compatibility.
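Convert to NV12 if required:
ffmpeg -f rawvideo -pix_fmt yuv420p -s:v 1920x1080 -r 30 -i test.yuv -pix_fmt nv12 test_nv12.yuv
Explanation:
- -f rawvideo -pix_fmt yuv420p -s:v 1920x1080 -r 30: Describes the raw input, which has no header and therefore carries no pixel format, frame size, or frame rate of its own.
- -pix_fmt nv12 (after -i): Converts the video to an NVENC-compatible pixel format.
- NV12 (planar Y + interleaved UV) is NVENC's native 8-bit 4:2:0 input layout.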
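A quick sanity check on the generated files is to compare their size on disk with the size implied by the resolution. An 8-bit 4:2:0 frame (NV12 or yuv420p) holds one full-resolution luma plane plus chroma at half that size, so it occupies width × height × 3/2 bytes. The following is a minimal sketch; the helper names are illustrative and not part of FFmpeg.

```python
def raw_frame_bytes(width: int, height: int) -> int:
    """Bytes per 8-bit 4:2:0 frame (NV12 or yuv420p)."""
    if width % 2 or height % 2:
        raise ValueError("4:2:0 formats need even width and height")
    luma = width * height   # one byte per pixel in the Y plane
    chroma = luma // 2      # U and V together, each subsampled 2x2
    return luma + chroma


def expected_file_bytes(width: int, height: int, frames: int) -> int:
    """Size of a headerless raw file holding `frames` frames."""
    return raw_frame_bytes(width, height) * frames


if __name__ == "__main__":
    # 300 frames of 1920x1080, matching the testsrc command above
    print(expected_file_bytes(1920, 1080, 300))  # 933120000
```

If the file on disk is not an exact multiple of the per-frame size, the resolution or pixel format passed to the encoder probably does not match the file.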
FFmpeg Encoding Benchmark Command
The command below encodes raw video input with FFmpeg using NVENC, specifying the pixel format, resolution, preset, rate control mode, and bitrate. Output is discarded to the null muxer to eliminate disk I/O as a bottleneck, so that only encoding performance is measured. Tests should be repeated for different resolutions and presets to fully characterize GPU performance.
ffmpeg -f rawvideo -pix_fmt nv12 -s:v 1920x1080 -r 30 -i test_nv12.yuv \
  -c:v h264_nvenc -preset p1 -rc cbr -b:v 4M -f null -
Explanation:
- -f rawvideo: Indicates raw uncompressed input.
- -pix_fmt nv12: Specifies NVENC-compatible pixel format.
- -s:v 1920x1080: Sets frame size.
- -r 30: Input frame rate.
- -i test_nv12.yuv: Input file path.
- -c:v h264_nvenc: Selects NVENC encoder for H.264 output.
- -preset p1: Uses the fastest preset (p1 = maximum throughput, p7 = best quality).
- -rc cbr: Enables constant bitrate mode.
- -b:v 4M: Sets average target bitrate to 4 Mbps.
- -f null -: Discards output to the null sink, avoiding disk I/O interference.
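Because the same command is repeated with different resolutions, presets, and bitrates, it can help to build the argument list programmatically. The sketch below mirrors the flags explained above; the function name and its defaults are illustrative assumptions, not an established tool.

```python
import shlex


def nvenc_benchmark_cmd(input_path, width, height, fps=30,
                        preset="p1", rc="cbr", bitrate="4M",
                        codec="h264_nvenc"):
    """Return the FFmpeg argument list for one benchmark run (output discarded)."""
    return [
        "ffmpeg",
        "-f", "rawvideo", "-pix_fmt", "nv12",          # headerless NV12 input
        "-s:v", f"{width}x{height}", "-r", str(fps),   # frame size and input rate
        "-i", input_path,
        "-c:v", codec, "-preset", preset,              # NVENC encoder and preset
        "-rc", rc, "-b:v", bitrate,                    # rate control and target bitrate
        "-f", "null", "-",                             # discard the encoded output
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so no GPU is needed here.
    print(shlex.join(nvenc_benchmark_cmd("test_nv12.yuv", 1920, 1080)))
```

The list can be passed directly to `subprocess.run`, which avoids shell quoting problems when file names contain spaces.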
Metrics Collection
During encoding, FFmpeg reports the average frames per second, which serves as the primary throughput metric. NVENC engine utilization can be monitored with nvidia-smi dmon to assess how close the encoder is to saturation. Together, these two metrics show how fast the encoder runs and how fully it is used under each setting.
a. Frames Per Second (FPS)
FFmpeg will print the average speed after encoding:
frame= 300 fps= 850 q=31.0 Lsize= ...
Explanation:
- fps=850: Approximate frames encoded per second.
- speed=28.3x: Multiplier of real time, printed at the end of the same status line (30 fps × 28.3 ≈ 850 fps).
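When many runs are scripted, the fps and speed values can be pulled out of FFmpeg's status line automatically. The sketch below assumes the usual `key=value` layout of that line; the sample string is illustrative, with placeholder values in the fields that are elided in the example above.

```python
import re

# key=value pairs; FFmpeg pads some values with spaces after the "=".
_FIELD = re.compile(r"(\w+)=\s*(\S+)")


def parse_progress(line: str) -> dict:
    """Split an FFmpeg status line into a dict of raw strings."""
    return dict(_FIELD.findall(line))


def throughput(line: str, source_fps: float = 30.0) -> dict:
    """Extract fps and, if present, speed and the fps implied by it."""
    fields = parse_progress(line)
    result = {"fps": float(fields["fps"])}
    if "speed" in fields:
        speed = float(fields["speed"].rstrip("x"))
        result["speed"] = speed
        result["fps_from_speed"] = speed * source_fps  # cross-check of fps
    return result


if __name__ == "__main__":
    sample = "frame=  300 fps= 850 q=31.0 Lsize=N/A time=00:00:10.00 bitrate=N/A speed=28.3x"
    print(throughput(sample))
```

The two estimates should agree closely; a large gap usually means the `-r` value given to FFmpeg differs from the `source_fps` assumed here.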
b. NVENC Utilization
Use:
nvidia-smi dmon -s u
Explanation:
- -s u: Selects the utilization metric group, which includes SM, memory, encoder (NVENC), and decoder (NVDEC) utilization.
- The enc column reports % utilization of the encoder unit over time.
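To turn the monitor output into a peak or average figure, the enc column can be extracted from captured text. The sketch below looks the column up by name in the header line, since the exact set of columns may differ between driver versions; the sample text is illustrative rather than real output.

```python
def parse_dmon(text: str, column: str = "enc") -> list:
    """Return the integer samples of one column from `nvidia-smi dmon` output."""
    names = None
    samples = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("#"):
            # The first header line names the columns; later ones give units
            # or are periodic repeats of the header.
            if names is None:
                names = stripped.lstrip("#").split()
            continue
        parts = stripped.split()
        if names is None or len(parts) != len(names):
            continue  # skip anything that does not look like a data row
        value = parts[names.index(column)]
        if value != "-":  # "-" marks a metric that is not available
            samples.append(int(value))
    return samples


if __name__ == "__main__":
    sample = """\
# gpu     sm    mem    enc    dec
# Idx      %      %      %      %
    0     35     12     95      0
    0     36     12     97      0
"""
    enc = parse_dmon(sample)
    print("peak:", max(enc), "mean:", sum(enc) / len(enc))
```

In practice the text would come from redirecting the monitor output to a file while the encode runs.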
Testing Multiple Presets and Bitrates
To assess performance under different encoder workloads, vary:
- Presets: p1 (fastest) to p7 (highest quality)
- Bitrates: 2M, 4M, 8M, 16M
- Resolutions: 720p, 1080p, 4K
Example matrix:
| GPU | Resolution | Preset | Bitrate | FPS | NVENC Util |
| RTX 3060 | 1080p | p1 | 4M | 850 | 95% |
| RTX 3060 | 1080p | p7 | 4M | 400 | 90% |
| RTX 3060 | 4K | p1 | 8M | 300 | 98% |
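The full set of combinations can be enumerated rather than typed by hand. The sketch below generates every preset, bitrate, and resolution combination listed above and formats a result as a table row in the same layout. It assumes 4K means 3840x2160; the function names are illustrative.

```python
from itertools import product

PRESETS = [f"p{i}" for i in range(1, 8)]  # p1 (fastest) .. p7 (highest quality)
BITRATES = ["2M", "4M", "8M", "16M"]
RESOLUTIONS = {                            # assumption: 4K here means UHD
    "720p": (1280, 720),
    "1080p": (1920, 1080),
    "4K": (3840, 2160),
}


def benchmark_matrix():
    """Yield one dict per (resolution, preset, bitrate) combination."""
    for (name, (w, h)), preset, bitrate in product(
            RESOLUTIONS.items(), PRESETS, BITRATES):
        yield {"resolution": name, "width": w, "height": h,
               "preset": preset, "bitrate": bitrate}


def table_row(gpu: str, case: dict, fps: float, nvenc_util: float) -> str:
    """Format one measured result in the layout of the example matrix."""
    return (f"| {gpu} | {case['resolution']} | {case['preset']} | "
            f"{case['bitrate']} | {fps:.0f} | {nvenc_util:.0f}% |")


if __name__ == "__main__":
    cases = list(benchmark_matrix())
    print(len(cases), "combinations")  # 84 combinations
```

With 84 combinations and several repetitions each, keeping the clip short (as with the 300-frame test file) keeps the total run time manageable.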
Comparing Multi-Stream Encoding
The capacity for simultaneous encoding is tested by running multiple FFmpeg processes in parallel, each encoding a separate stream. Observing FPS drop-off and NVENC utilization across streams reveals the GPU's multi-session limits and helps identify potential bottlenecks in parallel workloads. Note that consumer (GeForce) GPUs cap the number of concurrent NVENC sessions; the cap is enforced by the driver and has changed across driver releases.
ffmpeg -re -stream_loop -1 -f rawvideo -pix_fmt nv12 -s 1920x1080 -i test_nv12.yuv \
  -c:v h264_nvenc -gpu 0 -preset p1 -b:v 4M -f null -
Explanation:
- -re: Simulates real-time input by reading at the native frame rate; omit it to measure maximum throughput instead.
- -stream_loop -1: Loops the input indefinitely.
- -gpu 0: Binds encoding to a specific GPU.
- Launch 2 to 4 instances and monitor FPS drop and nvidia-smi metrics.
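Starting the instances from a script keeps their start times close together and makes it easy to stop looping inputs after a fixed measurement window. The sketch below is one possible way to do this with the standard library; `run_parallel` is a hypothetical helper, and for brevity it discards each process's output, whereas a real harness would capture stderr to read the FPS values.

```python
import subprocess
import time


def run_parallel(cmd, instances, duration_s=None):
    """Start `instances` copies of `cmd` and return their exit codes.

    If `duration_s` is given, processes still running after that many
    seconds are terminated (needed for inputs looped with -stream_loop -1).
    """
    procs = [
        subprocess.Popen(cmd, stdout=subprocess.DEVNULL,
                         stderr=subprocess.DEVNULL)
        for _ in range(instances)
    ]
    if duration_s is not None:
        deadline = time.monotonic() + duration_s
        for proc in procs:
            remaining = max(deadline - time.monotonic(), 0)
            try:
                proc.wait(timeout=remaining)
            except subprocess.TimeoutExpired:
                proc.terminate()  # stop a still-running encoder
    # Collect exit codes; a terminated process reports a non-zero code.
    return [proc.wait() for proc in procs]
```

Here `cmd` would be the multi-stream FFmpeg command above expressed as an argument list, for example `run_parallel(cmd, instances=4, duration_s=60)`.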
Encoding Latency Measurement (SDK-only)
When using the Video Codec SDK directly, latency per frame can be measured by timestamping before and after each EncodeFrame call. Averaging these measurements over many frames quantifies real-time responsiveness, which is crucial for live streaming and low-latency applications.
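#include <chrono>  // std::chrono::steady_clock, std::chrono::duration

// steady_clock is monotonic, so the interval is not affected by system clock adjustments.
auto t_start = std::chrono::steady_clock::now();
encoder.EncodeFrame(...);  // encode one frame
auto t_end = std::chrono::steady_clock::now();
auto latency_ms = std::chrono::duration<double, std::milli>(t_end - t_start).count();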
Explanation:
- t_start and t_end: Mark timestamps before and after frame encoding.
- latency_ms: Time taken to encode a single frame in milliseconds.
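Once per-frame latencies have been logged, they can be summarized offline. A mean alone hides occasional slow frames, so the sketch below also reports the median, a nearest-rank 95th percentile, and the maximum. It assumes the measurements are available as a list of milliseconds; the function name and output keys are illustrative.

```python
import statistics


def summarize_latency(samples_ms):
    """Summarize per-frame encode latencies given in milliseconds."""
    if not samples_ms:
        raise ValueError("no latency samples")
    ordered = sorted(samples_ms)
    n = len(ordered)
    # Nearest-rank 95th percentile: 1-based rank ceil(0.95 * n),
    # computed with integers to avoid floating-point rounding.
    rank = (95 * n + 99) // 100
    return {
        "mean_ms": statistics.fmean(ordered),
        "median_ms": statistics.median(ordered),
        "p95_ms": ordered[rank - 1],
        "max_ms": ordered[-1],
    }


if __name__ == "__main__":
    print(summarize_latency([2.1, 2.0, 2.3, 2.2, 9.5]))
```

For live streaming, the tail values (95th percentile and maximum) are usually more informative than the mean, because a single slow frame can cause a visible stall.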
Recommendations for Consistent Benchmarking
- Disable desktop compositors or run headless (e.g., without an X server)
- Pin FFmpeg to specific CPU cores with taskset
- Enable GPU persistence mode: nvidia-smi -pm 1
- Repeat each test 3 to 5 times and average the results
- Use a dedicated SSD or RAMDISK to avoid I/O bottlenecks if outputting to disk
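When averaging repeated runs, it is worth recording the spread as well, since a large run-to-run variation suggests that something else on the system is interfering. The following is a minimal sketch, assuming the FPS of each repetition has already been collected into a list; the names are illustrative.

```python
import statistics


def summarize_runs(fps_runs):
    """Return mean, sample standard deviation, and coefficient of variation."""
    if not fps_runs:
        raise ValueError("no runs to summarize")
    mean = statistics.fmean(fps_runs)
    # Sample standard deviation needs at least two runs.
    stdev = statistics.stdev(fps_runs) if len(fps_runs) > 1 else 0.0
    cv = 100.0 * stdev / mean if mean else 0.0
    return {"mean_fps": mean, "stdev_fps": stdev, "cv_percent": cv}


if __name__ == "__main__":
    print(summarize_runs([848.0, 850.0, 852.0]))
```

A coefficient of variation of more than a few percent is a reasonable cue to recheck the setup before comparing configurations, although the acceptable threshold is a judgment call.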

