Benchmarking NVIDIA GPUs for Video Encoding

Benchmarking video encoding on NVIDIA GPUs involves measuring the performance of the NVENC hardware engine across different conditions. This includes evaluating throughput in frames per second, GPU encoder utilization, and encoding latency.

Results vary depending on resolution, bitrate, rate control modes, and selected NVENC presets. The process uses standardized input, consistent encoding commands, and system-level profiling tools to ensure repeatable and comparable results.

Benchmark Goals and Scope

The primary focus is to measure:

Average encoded frames per second (FPS)
Peak GPU NVENC engine utilization
Encoding latency per frame
Impact of resolution and bitrate on throughput
Behavior across different presets and rate control modes

Only NVENC hardware encoding is considered; software encoders or decode steps are excluded.

Benchmark Setup and Tools

Hardware Requirements:

NVIDIA GPU with NVENC support (Turing and newer preferred)
Driver version ≥ 450.xx
Sufficient VRAM for 4K workloads
Dedicated system with no GUI rendering load

Software Stack:

FFmpeg compiled with --enable-nvenc
CUDA Toolkit (for profiling with nvprof); nvidia-smi, installed with the driver, for monitoring
Raw YUV test videos (e.g., NV12 format)

Test Video Preparation

Consistent benchmarking requires the use of standardized raw input files, typically in NV12 or YUV420p format. Synthetic test videos can be generated with FFmpeg, ensuring uniform content and frame counts for repeatable tests. Converting videos to NV12 ensures compatibility with NVENC and removes variability from source material.

code

ffmpeg -f lavfi -i testsrc=size=1920x1080:rate=30 -frames:v 300 -pix_fmt yuv420p test.yuv

Explanation:

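-f lavfi -i testsrc: Generates a synthetic color test pattern using the lavfi testsrc source.
size=1920x1080: Target resolution (an option of testsrc, not a standalone flag).
rate=30: Frame rate (also an option of testsrc).
-frames:v 300: Total number of frames.
-pix_fmt yuv420p: Writes planar 8-bit 4:2:0 output.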

Convert to NV12 if required:

code

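ffmpeg -f rawvideo -pix_fmt yuv420p -s:v 1920x1080 -r 30 -i test.yuv -pix_fmt nv12 test_nv12.yuv

Explanation:

-f rawvideo -pix_fmt yuv420p -s:v 1920x1080 -r 30: Describes the input, because a raw .yuv file carries no header with its format, size, or frame rate.
-pix_fmt nv12 (after -i): Converts the video to the NV12 pixel format.
NVENC works with NV12 input (a Y plane followed by one interleaved UV plane), and the benchmark commands in this article read that format.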
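Because raw files have no header, a quick size check can confirm that a clip was written completely and help size a RAM disk. The sketch below is a minimal illustration, assuming 8-bit 4:2:0 content (yuv420p or nv12, both 12 bits per pixel); the function names are hypothetical and not part of FFmpeg.

```python
def raw_420_frame_bytes(width: int, height: int) -> int:
    """Return the size in bytes of one 8-bit 4:2:0 frame (yuv420p or nv12)."""
    if width % 2 or height % 2:
        raise ValueError("4:2:0 subsampling needs even width and height")
    luma = width * height                      # one byte per pixel
    chroma = 2 * (width // 2) * (height // 2)  # U and V, each a quarter of luma
    return luma + chroma


def raw_420_file_bytes(width: int, height: int, frames: int) -> int:
    """Return the expected size of a headerless clip with `frames` frames."""
    return frames * raw_420_frame_bytes(width, height)


# 300 frames of 1920x1080, matching the test clip in this section
print(raw_420_file_bytes(1920, 1080, 300))  # 933120000
```

Comparing this number with the size of the file on disk is a cheap way to catch a truncated or mis-sized input before benchmarking.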

FFmpeg Encoding Benchmark Command

The command below encodes raw video with FFmpeg's h264_nvenc encoder, specifying the pixel format, resolution, preset, rate control mode, and bitrate. Output is discarded to the null muxer so that disk writes do not become a bottleneck and the measurement reflects encoding performance. Repeat the test at different resolutions and presets to fully characterize GPU performance.

code

ffmpeg -f rawvideo -pix_fmt nv12 -s:v 1920x1080 -r 30 -i test_nv12.yuv \
  -c:v h264_nvenc -preset p1 -rc cbr -b:v 4M -f null -

Explanation:

-f rawvideo: Indicates raw uncompressed input.
-pix_fmt nv12: Specifies NVENC-compatible pixel format.
-s:v 1920x1080: Sets frame size.
-r 30: Input frame rate.
-i test_nv12.yuv: Input file path.
-c:v h264_nvenc: Selects NVENC encoder for H.264 output.
-preset p1: Uses the fastest preset (p1 = maximum throughput, p7 = best quality).
-rc cbr: Enables constant bitrate mode.
-b:v 4M: Sets average target bitrate to 4 Mbps.
-f null -: Discards output to the null muxer, which avoids disk I/O interference.
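When the same command is repeated with different settings, it can help to build the argument list from parameters rather than editing a shell line by hand. The following is a minimal sketch that uses only the flags explained above; build_nvenc_cmd is a hypothetical helper name, and the preset check assumes the p1 to p7 naming used in this article.

```python
import shlex

VALID_PRESETS = {f"p{i}" for i in range(1, 8)}  # p1 .. p7, as used in this article


def build_nvenc_cmd(input_path, width, height, fps,
                    preset="p1", bitrate="4M", rate_control="cbr"):
    """Return the FFmpeg argument list for one NVENC benchmark run."""
    if preset not in VALID_PRESETS:
        raise ValueError(f"unknown preset: {preset}")
    return [
        "ffmpeg",
        "-f", "rawvideo", "-pix_fmt", "nv12",       # headerless NV12 input
        "-s:v", f"{width}x{height}", "-r", str(fps),
        "-i", input_path,
        "-c:v", "h264_nvenc", "-preset", preset,
        "-rc", rate_control, "-b:v", bitrate,
        "-f", "null", "-",                          # discard the encoded output
    ]


cmd = build_nvenc_cmd("test_nv12.yuv", 1920, 1080, 30)
print(shlex.join(cmd))  # prints the command as a single shell line
```

The list can be passed directly to subprocess.run(cmd, stderr=subprocess.PIPE, text=True) so that FFmpeg's progress output is captured for later parsing.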

Metrics Collection

During encoding, FFmpeg reports the average frames per second, which serves as the primary throughput metric. NVENC engine utilization can be monitored using nvidia-smi dmon to allow the assessment of hardware saturation and efficiency. These metrics together provide a comprehensive view of encoding performance under various settings.

a. Frames Per Second (FPS)

FFmpeg will print the average speed after encoding:

code

frame=  300 fps=850 q=31.0 Lsize= ... speed=28.3x

Explanation:

fps=850: Approximate frames encoded per second.
speed=28.3x: Multiplier of real time (30 fps × 28.3 ≈ 850 fps).
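For automated runs, the fps and speed values can be pulled out of FFmpeg's stderr instead of being read by eye. This is a rough sketch, assuming the progress line keeps the fps= and speed= fields shown in this section; the sample text is illustrative and was not captured from a real run.

```python
import re

_FPS_RE = re.compile(r"fps=\s*(\d+(?:\.\d+)?)")
_SPEED_RE = re.compile(r"speed=\s*(\d+(?:\.\d+)?)x")


def parse_final_progress(stderr_text: str):
    """Return (fps, speed) from the last progress update in FFmpeg's stderr.

    A value that never appears in the log is returned as None.
    """
    fps = _FPS_RE.findall(stderr_text)
    speed = _SPEED_RE.findall(stderr_text)
    return (float(fps[-1]) if fps else None,
            float(speed[-1]) if speed else None)


# Illustrative log text: progress updates are separated by carriage returns.
sample = (
    "frame=  150 fps=840 q=31.0 size=N/A time=00:00:05.00 bitrate=N/A speed=28.0x\r"
    "frame=  300 fps=850 q=31.0 Lsize=N/A time=00:00:10.00 bitrate=N/A speed=28.3x\n"
)
fps, speed = parse_final_progress(sample)
print(fps, speed)  # 850.0 28.3
```

Taking the last match matters because earlier progress updates report a running average that has not yet settled.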

b. NVENC Utilization

Use:

code

nvidia-smi dmon -s u

Explanation:

-s u: Selects the utilization metrics, which include SM, memory, encoder (NVENC), and decoder (NVDEC) utilization.
The enc column reports % utilization of the encoder unit over time.
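To turn the sampled output into a peak and an average, the enc column can be extracted from captured dmon text. The sketch below assumes the header row names the columns (for example gpu, sm, mem, enc, dec); the exact column set may differ between driver versions, so the column is located by name rather than by position. The sample values are made up for illustration.

```python
def encoder_utilization(dmon_text: str):
    """Return the encoder utilization samples (percent) found in dmon output."""
    enc_col = None
    samples = []
    for line in dmon_text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("#"):
            # Header rows repeat periodically; the one naming "enc" gives the column.
            names = stripped.lstrip("#").split()
            if "enc" in names:
                enc_col = names.index("enc")
            continue
        tokens = stripped.split()
        if enc_col is None or enc_col >= len(tokens):
            continue
        if tokens[enc_col].isdigit():  # skips "-" placeholders
            samples.append(int(tokens[enc_col]))
    return samples


# Illustrative output, not captured from a real GPU.
sample = """\
# gpu     sm    mem    enc    dec
# Idx      %      %      %      %
    0     10      4     92      0
    0     12      5     98      0
    0     11      4     95      0
"""
values = encoder_utilization(sample)
print(max(values), sum(values) / len(values))  # 98 95.0
```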

Testing Multiple Presets and Bitrates

To assess performance under different encoder workloads, vary:

Presets: p1 (fastest) to p7 (highest quality)
Bitrates: 2M, 4M, 8M, 16M
Resolutions: 720p, 1080p, 4K

Example matrix:

GPU	Resolution	Preset	Bitrate	FPS	NVENC Util
RTX 3060	1080p	p1	4M	850	95%
RTX 3060	1080p	p7	4M	400	90%
RTX 3060	4K	p1	8M	300	98%
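The full set of combinations listed above can be enumerated programmatically so that no cell of the matrix is skipped. This is a small sketch; the pixel dimensions for 720p and 4K (1280x720 and 3840x2160) are assumptions, since the article names the resolutions without giving sizes.

```python
from itertools import product

PRESETS = [f"p{i}" for i in range(1, 8)]   # p1 (fastest) .. p7 (highest quality)
BITRATES = ["2M", "4M", "8M", "16M"]
RESOLUTIONS = {                            # assumed pixel sizes for each label
    "720p": (1280, 720),
    "1080p": (1920, 1080),
    "4K": (3840, 2160),
}


def build_matrix():
    """Return one dict per (resolution, preset, bitrate) combination."""
    return [
        {"resolution": name, "width": w, "height": h, "preset": p, "bitrate": b}
        for (name, (w, h)), p, b in product(RESOLUTIONS.items(), PRESETS, BITRATES)
    ]


matrix = build_matrix()
print(len(matrix))  # 84
```

Each entry holds the values needed to fill in one FFmpeg command and, after the run, one row of the results table.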

Comparing Multi-Stream Encoding

The capacity for simultaneous encoding is tested by running multiple FFmpeg processes in parallel, each encoding a separate stream. Observing how per-stream FPS drops and how NVENC utilization rises as streams are added reveals the GPU's multi-session limits and helps identify bottlenecks in parallel workloads. Note that consumer GPUs cap the number of concurrent NVENC sessions.

code

ffmpeg -re -stream_loop -1 -f rawvideo -pix_fmt nv12 -s 1920x1080 -r 30 -i test_nv12.yuv \
  -c:v h264_nvenc -gpu 0 -preset p1 -b:v 4M -f null -

Explanation:

-re: Simulates real-time input.
-stream_loop -1: Loops the input infinitely.
-gpu 0: Binds encoding to a specific GPU.
Launch 2 to 4 instances and monitor the FPS drop and nvidia-smi metrics.
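Launching the instances from a script keeps their start times close together and gives each stream its own log. The following is a minimal sketch, not a definitive harness: run_parallel is a hypothetical helper that starts several copies of a command, lets them run for a fixed time (the looping command above never exits on its own), then stops them.

```python
import subprocess
import time
from pathlib import Path


def run_parallel(cmd, streams, duration_s, log_dir):
    """Run `streams` copies of `cmd` for `duration_s` seconds.

    Each process writes its stderr to log_dir/stream_<i>.log.
    Returns the list of log file paths.
    """
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    procs, logs = [], []
    try:
        for i in range(streams):
            log = open(log_dir / f"stream_{i}.log", "w")
            logs.append(log)
            procs.append(subprocess.Popen(
                cmd,
                stdin=subprocess.DEVNULL,   # keep the child from reading the terminal
                stdout=subprocess.DEVNULL,
                stderr=log,                 # FFmpeg reports progress on stderr
            ))
        time.sleep(duration_s)
    finally:
        for proc in procs:
            if proc.poll() is None:
                proc.terminate()            # stop streams that are still running
        for proc in procs:
            proc.wait()
        for log in logs:
            log.close()
    return [log_dir / f"stream_{i}.log" for i in range(len(procs))]


# Example (requires FFmpeg and an NVENC-capable GPU, so it is left commented out):
# cmd = ["ffmpeg", "-re", "-stream_loop", "-1", "-f", "rawvideo", "-pix_fmt", "nv12",
#        "-s", "1920x1080", "-i", "test_nv12.yuv",
#        "-c:v", "h264_nvenc", "-gpu", "0", "-preset", "p1", "-b:v", "4M",
#        "-f", "null", "-"]
# run_parallel(cmd, streams=4, duration_s=60, log_dir="logs")
```

Running nvidia-smi dmon in a separate terminal during the same window shows how encoder utilization changes as the stream count grows.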

Encoding Latency Measurement (SDK-only)

When using the Video Codec SDK directly, latency per frame can be measured by timestamping before and after each EncodeFrame call. Averaging these measurements over many frames quantifies real-time responsiveness, which is crucial for live streaming and low-latency applications.

code

#include <chrono>

// steady_clock is monotonic, so the interval is unaffected by wall-clock adjustments.
auto t_start = std::chrono::steady_clock::now();
encoder.EncodeFrame(...);
auto t_end = std::chrono::steady_clock::now();
auto latency_ms = std::chrono::duration<double, std::milli>(t_end - t_start).count();

Explanation:

t_start and t_end: Mark timestamps before and after frame encoding.
latency_ms: Time taken to encode a single frame in milliseconds.
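Once per-frame values have been logged, a mean alone can hide occasional slow frames, so it is common to report a high percentile and the maximum as well. The sketch below assumes the latency_ms values were collected into a list; summarize_latency is a hypothetical helper, the nearest-rank percentile is one of several possible definitions, and the sample numbers are invented.

```python
import statistics


def summarize_latency(latencies_ms):
    """Return mean, 95th percentile (nearest rank), and max of per-frame latencies."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    rank = (95 * len(ordered) + 99) // 100   # ceil(0.95 * n) in integer arithmetic
    return {
        "mean_ms": statistics.fmean(ordered),
        "p95_ms": ordered[rank - 1],
        "max_ms": ordered[-1],
    }


# Hypothetical per-frame latencies in milliseconds
print(summarize_latency([1.0, 1.5, 1.0, 4.5]))
# {'mean_ms': 2.0, 'p95_ms': 4.5, 'max_ms': 4.5}
```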

Recommendations for Consistent Benchmarking

Stop the display server or desktop compositor (e.g., the X server) on the benchmark GPU
Pin FFmpeg to specific CPU cores with taskset
Enable GPU persistence mode: nvidia-smi -pm 1
Repeat each test 3 to 5 times and average the results
Use a dedicated SSD or RAM disk to avoid I/O bottlenecks if outputting to disk
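When averaging repeated runs, it is also worth recording how much the runs disagree, since a large spread suggests the system was not quiet. A minimal sketch, assuming the FPS of each run has already been collected; summarize_runs is a hypothetical helper and the sample values are invented.

```python
import statistics


def summarize_runs(fps_values):
    """Return the mean FPS, sample standard deviation, and spread in percent."""
    if len(fps_values) < 2:
        raise ValueError("need at least two runs to estimate the spread")
    mean = statistics.fmean(fps_values)
    stdev = statistics.stdev(fps_values)
    return {
        "mean_fps": mean,
        "stdev_fps": stdev,
        "cv_percent": 100.0 * stdev / mean,  # coefficient of variation
    }


# Hypothetical FPS results from three repetitions of the same test
result = summarize_runs([848, 850, 852])
print(result["mean_fps"], result["stdev_fps"])  # 850.0 2.0
```

A coefficient of variation of a few percent or more is a hint to recheck the recommendations above before trusting the average.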
