The MP4 container (ISO/IEC 14496-14) is a media file format based on the ISO Base Media File Format (ISOBMFF, ISO/IEC 14496-12) that stores video, audio, subtitles, and metadata. It exists in two primary variants, Standard MP4 and Fragmented MP4 (fMP4). Standard MP4 places metadata in a single moov atom. When that atom is located at the end of the file, playback cannot begin until the player has fetched it, which in a sequential download means waiting for the entire file. This limits its suitability for streaming applications.
Fragmented MP4 splits content into independently decodable segments (`moof` + `mdat`) with a separate initialization segment (`init.mp4`). This allows low-latency streaming, adaptive bitrate switching, and live broadcast compatibility.
While Standard MP4 (non-fragmented) suits offline playback with monolithic file organization, Fragmented MP4 (fMP4) adopts a segmented architecture optimized for adaptive streaming.
File Structure and Initialization Behavior
Standard MP4
Standard MP4 stores all metadata in a single moov atom at the file start or end. The mdat atom contains raw media samples in interleaved or contiguous form. Players must download the entire moov before playback can start, which introduces latency for large files.
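The box layout described above can be inspected with a few lines of Python. This is a minimal sketch, not a full parser: `iter_boxes` and `make_box` are illustrative names, the sample bytes are synthetic, and only top-level boxes are walked.

```python
import struct


def iter_boxes(data: bytes):
    """Yield (box_type, offset, size) for each top-level box in an ISOBMFF buffer."""
    offset = 0
    while offset + 8 <= len(data):
        # Every box starts with a 32-bit big-endian size and a 4-character type.
        size, raw_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # A size of 1 means a 64-bit "largesize" field follows the type.
            if offset + 16 > len(data):
                break
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:
            # A size of 0 means the box extends to the end of the file.
            size = len(data) - offset
        if size < header:
            raise ValueError(f"invalid box size {size} at offset {offset}")
        yield raw_type.decode("ascii", "replace"), offset, size
        offset += size


def make_box(box_type: bytes, payload: bytes = b"") -> bytes:
    """Build a box with a plain 32-bit size header (helper for the synthetic sample)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


# Synthetic file: ftyp, then a 28-byte moov, then mdat (a "moov first" layout).
sample = (
    make_box(b"ftyp", b"isom" + bytes(4))
    + make_box(b"moov", bytes(20))
    + make_box(b"mdat", bytes(64))
)
layout = list(iter_boxes(sample))
types = [box_type for box_type, _, _ in layout]
moov_first = types.index("moov") < types.index("mdat")
print(layout, "moov before mdat:", moov_first)
```

Checking whether moov precedes mdat is a quick way to tell whether a standard MP4 can start playing before the whole file has arrived.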
Example Structure:
ftyp → moov → mdat
Hex-level moov Header (Simplified):
0000001C 6D6F6F76 // moov atom (28 bytes)
Fragmented MP4
fMP4 splits content into segments, each with its moof (Movie Fragment) and mdat atoms. A separate initialization segment (init.mp4) contains the moov atom. This enables progressive playback:
init.mp4 (moov) → seg1.m4s (moof+mdat) → seg2.m4s (moof+mdat)
FFmpeg command to generate fMP4:
ffmpeg -i input.mp4 -movflags frag_keyframe+empty_moov -f mp4 output_fragmented.mp4
Buffering and Playback Start Time
Standard MP4
Requires full moov download before playback. For files with trailing moov, players issue a secondary HTTP request to fetch metadata, adding round-trip delays.
Fragmented MP4
Playback starts after receiving init.mp4 and the first fragment (moof+mdat). Subsequent segments need no further moov fetch, because each carries its own metadata in its moof, which keeps startup latency low.
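The difference in startup cost can be made concrete with a small model. The sketch below assumes a purely sequential download (no range requests) and uses made-up box sizes. `bytes_before_playback` is a hypothetical helper, and for standard MP4 it only reports when the metadata is complete, not when the first samples have arrived.

```python
def bytes_before_playback(boxes):
    """Return how many bytes a sequential download must receive before playback can begin.

    boxes: list of (box_type, size_in_bytes) for top-level boxes in file order.
    Standard MP4: the point where moov has fully arrived.
    Fragmented MP4: the point where moov plus the first moof+mdat pair has arrived.
    """
    fragmented = any(box_type == "moof" for box_type, _ in boxes)
    received = 0
    have_moov = False
    have_moof = False
    for box_type, size in boxes:
        received += size
        if box_type == "moov":
            have_moov = True
            if not fragmented:
                return received
        elif box_type == "moof":
            have_moof = True
        elif box_type == "mdat" and fragmented and have_moov and have_moof:
            return received
    raise ValueError("layout has no playable starting point")


# Hypothetical sizes, chosen only for illustration.
fast_start = [("ftyp", 32), ("moov", 5_000), ("mdat", 1_000_000)]
trailing_moov = [("ftyp", 32), ("mdat", 1_000_000), ("moov", 5_000)]
fragmented = [("ftyp", 32), ("moov", 800),
              ("moof", 200), ("mdat", 50_000),
              ("moof", 200), ("mdat", 50_000)]

for name, layout in [("fast start", fast_start),
                     ("trailing moov", trailing_moov),
                     ("fragmented", fragmented)]:
    print(name, bytes_before_playback(layout))
```

In this model the trailing-moov file needs every byte before playback, while the fragmented layout needs only the initialization data and one fragment.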
Use in Adaptive Bitrate (ABR) Streaming
Standard MP4
Poorly suited for ABR. Single moov and interleaved mdat complicate segment extraction. Because each quality level is a separate monolithic file, a player must load and parse that file's entire moov before it can locate a point to switch to.
Fragmented MP4
Designed for ABR. Each media segment can be decoded with only the matching initialization segment, and DASH/HLS manifests reference init.mp4 and the media segments. The sidx (Segment Index) atom enables byte-range addressing:
<!-- DASH manifest snippet -->
<Representation bandwidth="1000000" mimeType="video/mp4">
  <BaseURL>video_1M/</BaseURL>
  <SegmentBase indexRange="0-147">
    <Initialization sourceURL="init.mp4"/>
  </SegmentBase>
</Representation>
Explanation:
- Defines a DASH Representation element for a video stream with a 1 Mbps bitrate.
- Specifies mimeType="video/mp4" to indicate the media format.
- Uses <BaseURL> to point to the segment directory video_1M/.
- Includes a <SegmentBase> element whose indexRange="0-147" gives the byte range of the segment index (sidx) used for byte-range addressing.
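The fields explained above can be read programmatically. The following is a minimal sketch using Python's standard xml.etree.ElementTree. It assumes the snippet has no XML namespace (a full MPD normally declares one) and joins URLs by plain string concatenation rather than proper URL resolution. `describe_representation` is an illustrative name.

```python
import xml.etree.ElementTree as ET

SNIPPET = """\
<Representation bandwidth="1000000" mimeType="video/mp4">
  <BaseURL>video_1M/</BaseURL>
  <SegmentBase indexRange="0-147">
    <Initialization sourceURL="init.mp4"/>
  </SegmentBase>
</Representation>
"""


def describe_representation(xml_text: str) -> dict:
    """Extract the bitrate, init segment URL, and index byte range from a Representation."""
    rep = ET.fromstring(xml_text)
    base_url = rep.findtext("BaseURL", default="")
    segment_base = rep.find("SegmentBase")
    # indexRange is an inclusive "first-last" byte range.
    first, last = (int(part) for part in segment_base.get("indexRange").split("-"))
    init = segment_base.find("Initialization")
    return {
        "bandwidth": int(rep.get("bandwidth")),
        "init_url": base_url + init.get("sourceURL"),  # simplified URL join
        "index_range_header": f"bytes={first}-{last}",  # value for an HTTP Range header
        "index_length": last - first + 1,
    }


info = describe_representation(SNIPPET)
print(info)
```

A client would send the `index_range_header` value in an HTTP Range request to fetch only the index before deciding which media bytes to request.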
Segment Addressability and CDN Behavior
Standard MP4
Range requests are inefficient because the requested byte ranges do not align with any segment boundary. CDNs struggle to cache partial content when audio and video samples are interleaved in one large mdat.
Fragmented MP4
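Supports byte-range fetches per fragment. The sidx atom records the size of each referenced segment, so a client can derive every segment's byte range and request it in a form CDNs cache efficiently:
# Pseudocode: derive segment offsets from sidx (parse_box and its attributes are hypothetical)
sidx_box = parse_box(buffer)
offset = sidx_box.end + sidx_box.first_offset  # offsets count from the first byte after the sidx box
for index, entry in enumerate(sidx_box.entries):
    print(f"Segment {index} starts at {offset}")
    offset += entry.referenced_size
Explanation:
- Parses the sidx box from a media file buffer using parse_box.
- Starts from the first byte after the sidx box plus first_offset, because sidx entries store sizes rather than absolute offsets.
- Iterates through each entry in sidx_box.entries, printing the segment's starting offset and then advancing by entry.referenced_size.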
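For readers who want something runnable, here is a sketch of a sidx reader built on Python's struct module. It follows the field layout of the SegmentIndexBox in ISO/IEC 14496-12 as I understand it, ignores 64-bit box sizes, and is exercised on a synthetic box. `parse_sidx` and `build_sidx` are illustrative names, not part of any library.

```python
import struct


def parse_sidx(data: bytes, box_offset: int = 0):
    """Return (reference_id, segments) for the sidx box starting at box_offset."""
    size, box_type = struct.unpack_from(">I4s", data, box_offset)
    if box_type != b"sidx":
        raise ValueError("not a sidx box")
    version = data[box_offset + 8]
    pos = box_offset + 12  # skip size, type, version, and 24-bit flags
    reference_id, timescale = struct.unpack_from(">II", data, pos)
    pos += 8
    if version == 0:
        _earliest_pts, first_offset = struct.unpack_from(">II", data, pos)
        pos += 8
    else:
        _earliest_pts, first_offset = struct.unpack_from(">QQ", data, pos)
        pos += 16
    _reserved, reference_count = struct.unpack_from(">HH", data, pos)
    pos += 4
    # Offsets are measured from the first byte after the sidx box.
    offset = box_offset + size + first_offset
    segments = []
    for _ in range(reference_count):
        ref, duration, _sap = struct.unpack_from(">III", data, pos)
        pos += 12
        referenced_size = ref & 0x7FFFFFFF  # top bit is reference_type
        segments.append({"offset": offset, "size": referenced_size,
                         "duration_s": duration / timescale})
        offset += referenced_size
    return reference_id, segments


def build_sidx(reference_id: int, timescale: int, entries) -> bytes:
    """Build a version-0 sidx box from (size, duration) pairs (synthetic test data)."""
    body = struct.pack(">B3xIIIIHH", 0, reference_id, timescale, 0, 0, 0, len(entries))
    for size, duration in entries:
        body += struct.pack(">III", size, duration, 0x90000000)  # starts_with_SAP=1, SAP_type=1
    return struct.pack(">I4s", 8 + len(body), b"sidx") + body


sidx = build_sidx(reference_id=1, timescale=1000, entries=[(1000, 4000), (1500, 4000)])
ref_id, segments = parse_sidx(sidx)
for index, seg in enumerate(segments):
    last_byte = seg["offset"] + seg["size"] - 1
    print(f"Segment {index}: bytes={seg['offset']}-{last_byte} ({seg['duration_s']} s)")
```

Each printed range maps directly onto an HTTP Range request, which is what lets a CDN cache individual segments of a single file.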
Container Overhead and Compatibility
Standard MP4
Universally supported but inefficient for streaming. Editing or appending samples requires rewriting the moov.
Fragmented MP4
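Suits live workflows because no global index has to be rewritten as content grows, at the cost of a small moof header per fragment. Apple's HLS (including Safari) accepts fMP4 segments alongside MPEG-TS, and CMAF-compliant fMP4 lets the same segments serve both HLS and DASH. FFmpeg flags like -movflags frag_keyframe produce streaming-compatible output.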
Use in Live Streaming Workflows
Standard MP4
Unsuitable for live. The monolithic structure prevents incremental updates, because the moov sample tables can only be finalized once all samples have been written.
Fragmented MP4
Encoders append segments sequentially. A live HLS manifest references new moof+mdat pairs:
#EXTM3U
# Live HLS manifest
#EXT-X-VERSION:7
#EXT-X-MEDIA-SEQUENCE:123
#EXT-X-TARGETDURATION:4
#EXT-X-MAP:URI="init.mp4"
#EXTINF:4.000,
seg123.m4s
#EXTINF:4.000,
seg124.m4s
Explanation:
- Starts with #EXTM3U, which must be the first line of an HLS playlist.
- #EXT-X-VERSION:7 and #EXT-X-MAP:URI="init.mp4" declare the protocol version and point players at the initialization segment, which fMP4 media segments need before they can be decoded.
- #EXT-X-MEDIA-SEQUENCE:123 sets the sequence number of the first media segment to 123.
- #EXT-X-TARGETDURATION:4 defines the maximum segment duration as 4 seconds.
- #EXTINF:4.000, declares that the following segment has a duration of 4 seconds.
- Lists two media segments, seg123.m4s and seg124.m4s, each corresponding to a 4-second chunk of video.
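The tags explained above are simple enough to read with plain string handling. The sketch below is a deliberately small reader, not a complete HLS parser: it handles only the tags discussed here, and it assumes the playlist carries an #EXT-X-MAP tag pointing at init.mp4, as fMP4 playlists need. `parse_media_playlist` is an illustrative name.

```python
PLAYLIST = """\
#EXTM3U
#EXT-X-MEDIA-SEQUENCE:123
#EXT-X-TARGETDURATION:4
#EXT-X-MAP:URI="init.mp4"
#EXTINF:4.000,
seg123.m4s
#EXTINF:4.000,
seg124.m4s
"""


def parse_media_playlist(text: str) -> dict:
    """Extract target duration, init segment URI, and numbered segments from a media playlist."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or lines[0] != "#EXTM3U":
        raise ValueError("playlist must start with #EXTM3U")
    media_sequence = 0
    target_duration = None
    init_uri = None
    pending_duration = None
    segments = []
    for line in lines[1:]:
        if line.startswith("#EXT-X-MEDIA-SEQUENCE:"):
            media_sequence = int(line.split(":", 1)[1])
        elif line.startswith("#EXT-X-TARGETDURATION:"):
            target_duration = int(line.split(":", 1)[1])
        elif line.startswith("#EXT-X-MAP:"):
            init_uri = line.split('URI="', 1)[1].split('"', 1)[0]
        elif line.startswith("#EXTINF:"):
            # The duration is the value before the first comma.
            pending_duration = float(line.split(":", 1)[1].split(",", 1)[0])
        elif not line.startswith("#"):
            # A non-tag line is a segment URI; number it from the media sequence.
            segments.append((media_sequence + len(segments), line, pending_duration))
            pending_duration = None
    return {"target_duration": target_duration, "init_uri": init_uri, "segments": segments}


playlist = parse_media_playlist(PLAYLIST)
print(playlist)
```

In a live workflow a player re-fetches the playlist periodically and uses the sequence numbers to tell which segments are new since the last fetch.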
Comparison Table
| Aspect | Standard MP4 | Fragmented MP4 (fMP4) |
| --- | --- | --- |
| Structure | Single moov + mdat | init.mp4 (moov) + multiple moof + mdat segments |
| Playback Start | Requires full moov download before playback | Starts after init.mp4 and the first segment; lower startup latency |
| Adaptive Bitrate (ABR) | Not ABR-friendly; hard to extract segments | Optimized for ABR; segments are referenced individually in DASH/HLS manifests |
| CDN Efficiency | Poor byte-range caching due to interleaved data | Efficient byte-range addressing using the sidx atom |
| Live Streaming | Unsuitable; monolithic structure inhibits real-time use | Designed for live; supports sequential segment appending and HLS updates |
| Tooling & Compatibility | Widely supported; high overhead for edits | Basis of CMAF; usable in HLS (including Safari) and DASH; FFmpeg supports it via -movflags |
