Technical Blog

Content Creation Demystified: Open Source to the Rescue

CableLabs Admin

Jun 9, 2015

There are four main concepts involved in securing premium audio/visual content:

  1. Authentication
  2. Authorization
  3. Encryption
  4. Digital Rights Management (DRM)

Authentication is the act of verifying the identity of the individual requesting playback (e.g. username/password). Authorization involves ensuring that individual’s right to view that content (e.g. subscription, rental). Encryption is the scrambling of audio/video samples; decryption keys (licenses) must be acquired to enable successful playback on a device. Finally, DRM systems are responsible for the secure exchange of encryption keys and the association of various “rights” with those keys. Digital rights may include things such as:

  • Limited time period (expiration)
  • Supported playback devices
  • Output restrictions (e.g. HDMI, HDCP)
  • Ability to make copies of downloaded content
  • Limited number of simultaneous views

This article will focus on the topics of encryption and DRM and describe how open source software can be used to create protected, adaptive bitrate test content for use in evaluating web browsers and player applications. I will close this article by describing the steps for how protective streams can be created.

One Encryption to Rule Them All

Encryption is the process of applying a reversible mathematical operation to a block of data to disguise its contents. Numerous encryption algorithms have been developed, each with varying levels of security, inputs and outputs, and computational complexity. In most systems, a random sequence of bytes (the key) is combined with the original data (the plaintext) using the encryption algorithm to produce the scrambled version of the data (the ciphertext). In symmetric encryption systems, the same key used to encrypt the data is also used for decryption. In asymmetric systems, keys come in pairs; one key is used to encrypt or decrypt data while its mathematically related sibling is used to perform the inverse operation. For the purposes of protecting digital media, we will be focusing solely on symmetric key systems.

With so many encryption methods to choose from, one could reasonably expect that web browsers would only support a small subset. If the sets of algorithms implemented in different browsers do not overlap, multiple copies of the media would have to be stored on streaming servers to ensure successful playback on most devices. This is where the ISO Common Encryption standards come in. These specifications identify a single encryption algorithm (AES-128) and two block chaining modes (CTR, CBC) that all encryption and decryption engines must support. Metadata in the stream describes what keys and initialization vectors are required to decrypt media sample data. Common Encryption also supports the concept of “subsample encryption”. In this situation, unencrypted samples are interspersed within the total set of samples and Common Encryption metadata is defined to describe the byte offsets where encrypted samples begin and end.

In addition to encryption-based metadata, Common Encryption defines the Protection Specific System Header (PSSH). This bit of metadata is an ISOBMFF box structure that contains data proprietary to a particular DRM that will guide the DRM system in retrieving the keys that are needed to decrypt the media samples. Each PSSH box contains a “system ID” field that uniquely identifies the DRM system to which the contained data applies. Multiple PSSH boxes may appear in a media file indicating support for multiple DRM systems. Hence, the magic of Common Encryption; a standard encryption scheme and multi-DRM support for retrieval of decryption keys all within a single copy of the media.

The main Common Encryption standard (ISO 23001-7) specifies metadata for use in ISO BMFF media containers. ISO 23001-9 defines Common Encryption for the MPEG Transport Stream container. In many cases, the box structures defined in the main specification are simply inserted into MPEG TS packets in an effort avoid introducing completely new structures that essentially hold the same data.

Creating Protected Adaptive Bitrate Content

CableLabs has developed a process for creating encrypted MPEG-DASH adaptive bitrate media involving some custom software combined with existing open source tools. The following sections will introduce the software and go through the process of creating protected streams. These tools and some accompanying documentation are available on GitHub. A copy of the documentation is also hosted here.

EME Content Tools

Adaptive Bitrate Transcoding

The first step in the process is to transcode source media files into several, lower bitrate versions. This can be simply reducing the bitrate, but in most cases the resolution should be lowered. To accomplish this, we use the popular FFMpeg suite of utilities. FFMpeg is a multi-purpose audio/video recorder, converter, and streaming library with dozens of supported formats. An FFMpeg installation will need to have the x264 and fdk_aac codec libraries enabled. If the appropriate binaries are not available, it can be built .

CableLabs has provided an example script that can be used as a guide to generating multi-bitrate content. There are some important items in this script that should be noted.

One of the jobs of an ABR packager is to split the source stream up into “segments.” These segments are usually between 2 and 10 seconds in duration and are in frame-accurate alignment across all the bitrate representations of media. For bitrate switching to appear seamless to the user, the player must be able to switch between the different bitrate streams at any segment boundary and be assured that the video decoder will be able to accept the data. To ensure that the packager can split the stream at regular boundaries, we need to make sure that our transcoder is inserting I-Frames (video frames that have no dependencies on other frames) at regular intervals during the transcoding process. The following arguments to x264 in the script accomplish that task:

[highlighter line=0]

-x264opts "keyint=$framerate:min-keyint=$framerate:no-scenecut"

We use the framerate detected in the source media to instruct the encoder to insert new I-Frames at least once ever second. Assuming our packager will segment using an integral number of seconds, the stream will be properly conditioned. The no-scenecut argument tells the encoder not to insert random I-Frames when it detects a scene change in the source material. We detect the framerate of the source video using the ffprobe utility that is part of FFMpeg.

framerate=$((`./ffprobe $1 -select_streams v -show_entries stream=avg_frame_rate -v quiet -of csv="p=0"`))

At the bottom of the script, we see the commands that perform the transcoding using bitrates, resolutions, and codec profile/level selections that we require.

transcode_video "360k" "512:288" "main" 30 $2 $1
transcode_video "620k" "704:396" "main" 30 $2 $1
transcode_video "1340k" "896:504" "high" 31 $2 $1
transcode_video "2500k" "1280:720" "high" 32 $2 $1
transcode_video "4500k" "1920:1080" "high" 40 $2 $1

transcode_audio "128k" $2 $1
transcode_audio "192k" $2 $1

For example, the first video representation is 360kb/s with a resolution of 512x288 pixels using AVC Main Profile, Level 3.0. The other thing to note is that the script transcodes audio and video separately. This is due to the fact that the DASH-IF guidelines forbid multiplexed audio/video segements (see section 3.2.1 of the DASH-IF Interoperability Points v3.0 for DASH AVC/264).


Next, we must encrypt the video and/or audio representation files that we created. For this, we use the MP4Box utility from GPAC. MP4Box is perfect for this task because it supports the Common Encryption standard and is highly customizable. Not only will perform AES-128 CTR or CBC mode encryption, but it can do subsample encryption and insert multiple, user-specified PSSH boxes into the output media file.

To configure MP4Box for performing Common Encryption, the user creates a “cryptfile”. The cryptfile is an XML-based description of the encryption parameters and PSSH boxes. Here is an example cryptfile:

The top-level <GPACDRM> element indicates that we are performing AES-128 CTR mode Common Encryption. The <DRMInfo> elements describe the PSSH boxes we would like to include. Finally, a single <CrypTrack> elements is specified for each track in the source media we would like to encrypt. Multiple encryption keys for each track may be specified, in which case the number of samples to encrypt would be indicated with a single key before moving to the next key in the list .

Since PSSH boxes are really just containers for arbitrary data, MP4Box has defined a set of XML elements specifically to define bitstreams in the <DRMInfo> nodes. Please see the MP4Box site for a complete description of the bitstream description syntax.

Once the cryptfile has been generated, you simply pass it as an argument on the command line to MP4Box. We have created a simple script to help encrypt multiple files at once (since you may most likely be encrypting each bitrate representation file for your ABR media).

Creating MP4Box Cryptfiles with DRM Support

In any secure system, the keys necessary to decrypt protected media will most certainly be kept separate from the media itself. It is the DRM system’s responsibility to retrieve the decryption keys and any rights afforded to the user for that content by the content owner. If we wish to encrypt our own content, we will also need to ensure that the encryption keys are available on a per-DRM basis for retrieval when the content is played back.

CableLabs has developed custom software to generate MP4Box cryptfiles and ensure that the required keys are available on one or more license servers. The software is written in Java and can be run on any platform that supports a Java runtime. A simple Apache Ant buildfile is provided for compiling the software and generating executable JAR files. Our tools currently support the Google Widevine and Microsoft PlayReady DRM systems with a couple of license server choices for each. The first Adobe Access CDM is just now being released in Firefox and we expect to update the tools in the coming months to support Adobe DRM. Support for W3C ClearKey encryption is also available, but we will focus on the commercial DRM systems for the purposes of this article.

The base library for the software is CryptfileBuilder. This set of Java classes provides abstractions to facilitate the construction and output of MP4Box cryptfiles. All other modules in the toolset are dependent upon this library. Each DRM-specific tool has detailed documentation available on the command line (-h arg) and on our website.

Microsoft PlayReady Test Server

The code in our PlayReady software library provides 2 main pieces of functionality:

  1. PlayReady PSSH generator for MP4Box cryptfiles
  2. Encryption key generator for use with the Microsoft PlayReady test license server.

Instead of allowing clients to ingest their own keys, the license server uses an algorithm based on a “key seed” and a 128-bit key ID to derive decryption keys. The algorithm definition can be found in this document from Microsoft (in the section titled “Content Key Algorithm”). Using this algorithm, the key seed used by the test server, and a key ID of our choosing, we can derive the content key that will be returned by the server during playback of the content.

Widevine License Portal

Similar to PlayReady, our Widevine toolset provides a PSSH generator for the Widevine DRM system. Widevine, however, does not provide a generic test server like the one from Microsoft. Users will need to contact Widevine to get their own license portal on their servers. With that portal, you will get a signing key and initialization vector for signing your requests. You provide this information as input to the Widevine cryptfile generator.

The Widevine license server will generate encryption keys and key IDs based on a given “content ID” and media type (e.g. HD, SD, AUDIO, etc). Their API has been updated since our tools were developed and they now support ingest of “foreign keys” that our tool could generate itself, but we don’t currently support that.


The real power of Common Encryption is made apparent when you add support for multiple DRM systems in a single piece of content. With the license servers used in our previous examples, this was not possible because we were not able to select our own encryption keys (as explained earlier, Widevine has added support for “foreign keys”, but our tools have not been updated to make use of them). With that in mind, a new licensing system is required to provide the functionality we seek.

CableLabs has partnered with CastLabs to integrate support for their DRMToday multi-DRM licensing service in our content creation tools. DRMToday provides a full suite of content protection services including encryption and packaging. For our needs, we only rely on the multi-DRM licensing server capabilities. DRMToday provides a REST API that our software uses to ingest Common Encryption keys into their system for later retrieval by one of the supported DRM systems.

MPEG-DASH Segmenting and Packaging

The final step in the process is to segment our encrypted media files and generate a MPEG-DASH manifest (.mpd). For this, we once again use MP4Box, but this time we use the -dash argument. There are many options in MP4Box for segmenting media files, so please run MP4Box -h dash to see the full list of configurations.

For the purposes of this article, we will focus on generating content that meets the requirements of the DASH-IF DASH-AVC264 “OnDemand” profile. Our repository contains a helper script that will take a set of MP4 files and generate DASH content according to the DASH-IF guidelines. Run this script to generate your segmented media files and produce your manifest.

Greg Rutz is a Lead Architect at CableLabs working on several projects related to digital video encoding/transcoding and digital rights management for online video.

This post is part of a technical blog series, "Standards-Based, Premium Content for the Modern Web".

Technical Blog

Adaptive Bitrate and MPEG-DASH

CableLabs Admin

May 26, 2015

The basic solution that streaming video provides is that an entire video file does not need to be downloaded before it is viewed. Little chunks of media may be grabbed for playback in order to achieve the same effect. The one caveat is that the media must be received as fast of the player consumes it. Furthermore, all things being equal, the higher the quality of a particular piece of content, the more bytes it occupies on disk. So it holds true that to stream higher quality video a faster network connection is required. But all things are not equal when it comes to network speeds seen by the multitude of connected devices in the world today.

The introduction of Adaptive Bitrate (ABR) streaming technology has revolutionized the delivery of streaming media. There are two fundamental ideas behind ABR:

  1. Multiple copies of the content at various quality levels are stored on the server.
  2. The client device detects its current network conditions and requests lower quality content when network speeds are slower and higher quality content when network speeds are faster.

These principles are quite simple, but there are many technical challenges involved in producing a functional design for an ABR system. First, the media segments on the server must be created in such a way that the client application is allowed to switch between higher and lower quality versions at any time without seeing a disruptive change in the presentation (e.g. video “jumps” or audio “clicks, pops”). Second, there must be a way for the client to “discover” the characteristics of the ABR content so that it knows what sorts of quality choices are available. And finally, the client itself must be implemented so that it can smartly detect network speed changes and potentially switch to a different quality stream.

Today, a large portion of video streaming over the internet is using one of several ABR formats. Apple’s HTTP Live Streaming (HLS) and MPEG’s Dynamic Adaptive Streaming over HTTP (DASH) are the predominant technologies. Adobe’s HTTP Dynamic Streaming (HDS) and Microsoft’s Smooth Streaming (MSS) were once quite popular, but have fallen out of favor recently. As you can see, most ABR technologies rely on HTTP as the network protocol for serving and accessing data due to the near ubiquitous support in servers and clients. All ABR technologies specify some sort of descriptive file or “manifest” which describes the locations, quality, and types of content available to the client.

Of all the ABR formats described previously, only MPEG-DASH was developed through an open, standards-based process in an effort to incorporate input from all industries and organizations that plan to deploy it. CableLabs, once again, played a critical role in representing the needs of our members during this process. We will focus on the details of MPEG-DASH technology for the remainder of the article.


The MPEG-DASH specification (ISO/IEC 23009-1) was first published in early 2012 and has undergone several updates since. As in other formats, media segments are stored on a standard web server and downloaded using the HTTP or HTTPS protocols. While DASH is audio/video codec agnostic, there are profiles in the specification that indicate how media is to be segmented on the server for ISOBMFF and MPEG2 Transport Stream container formats. Additionally, both live and on-demand media types have been given special consideration.

The DASH manifest file is known as a Media Presentation Description, or MPD. It is XML-based and contains all the information necessary for the client to download and present a given piece of content.


The root element in the manifest is named MPD. This contains high-level information about the content as a whole.   MPDs can be “static” or “dynamic”. A static MPD is what would be used for a typical on-demand movie. The client can parse the manifest once and expect to have all the information it needs to present the content in its entirety. A “dynamic” MPD indicates that the contents of the manifest may change over time, such as would be expected for live or interactive content. For dynamic manifests, the MPD node indicates the maximum time the client should wait before it requests a new copy.

Within the root MPD element is one or more Period elements. A Period represents a window of time in which media is expected to be presented. A Period can reference an absolute point in time, as would be the case for live media. Alternatively, it can simply indicate duration for the media items contained within it. When multiple Periods are present in an MPD, it is not necessary to specify a start time for each Period in order for them to be played in the sequence that they appear. Periods may even appear in a manifest prior to its associated media segments being installed on the server. This allows clients to prepare themselves for upcoming presentations.

Each Period contains one or more AdaptationSet elements. An AdaptationSet describes a single media element available for selection in the presentation. There may be one AdaptationSet for HD video and another one for SD video. Another reason to use multiple AdaptationSets is when there are multiple copies of the media that use different video codecs, which would enable playback on clients that only support one codec or the other. Additionally, clients may want to be able to select between multiple language audio tracks or between multiple video viewpoints. Each AdaptationSet contains attributes that allow the client to identify the type and format of media available in the set so that it can make appropriate choices regarding which to present.

[highlighter line=0]


At the next level of the MPD is the Representation. Every AdaptationSet contains a Representation element for each quality-level (bandwidth value) of media available for selection. Different video resolutions and bitrates may be available for selection and the Representation element tells the client exactly how to find media segments for that quality level. There exists several different mechanisms to describe the exact duration and name of each media file in the Representation (SegmentTemplate, SegmentTimeline, etc.), but we won’t dive into that level of detail in this article.

Content Protection

Since this blog series is focused on the topic of premium subscription content that is required to be protected, I want to briefly discuss the features incorporated in the DASH specifications that describe support for encrypted media.

There are actually four separate documents that make up the DASH specification set. Part 1 (ISO/IEC 23009-1) is the base DASH specification.   Part 2 (ISO/IEC 23009-2) describes the requirements for conformance software to validate the specification. Part 3 (ISO/IEC 23009-3) provides guidelines for implementing the DASH spec. Finally, part 4 (ISO/IEC 23009-4) describes content protection for segment encryption and authentication. In segment encryption, the entire segment file is encrypted and hashed so that its contents can be protected and its integrity validated. This is different than the “sample encryption” that is the focus of this blog series. For sample encryption, only audio and video sample information is encrypted, leaving container and codec metadata “in the clear”.

The ContentProtection element indicates that one or more media components are protected in some way. ContentProtection elements can be specified at either the Representation or AdaptationSet level in the MPD. The schemeIdUri attribute of the ContentProtection element uniquely identifies the particular system used to protect the content. I will dive further into the usage of this element when we cover CommonEncryption in Part 4 of the blog series.





One of the drawbacks that come with MPEG-DASH being developed in an open standards body is the relative complexity of the final product. Individuals and organizations from around the globe have contributed their own requirements to the standardization effort. This leads to a specification that covers a vast array of use-cases and deployment scenarios. When it comes to producing an actual implementation, however, the standard becomes somewhat unwieldy. Several new standards bodies have now emerged to whittle down the base DASH spec to a subset that meets the needs of its members and reduce the complexity to the point that functioning workflows can actually be developed.

One such standards body is the DASH Industry Forum, or DASH-IF.    What started out as marketing and promotions group for MPEG-DASH, has grown into a full-blown standards organization with members such as Adobe, Microsoft, Qualcomm, and Samsung. The goal of this group is drive adoption of MPEG-DASH through the development of “interoperability points” that describe manifest restrictions, audio/video codecs, media containers, protection schemes, and more to produce a more implementable subset of the overall DASH spec.

The DASH-IF Interoperability Guidelines for DASH-AVC/264 (latest version is 3.0) sets forth the following requirements and restriction on the base DASH specification:

  • AVC/H.264 Video Codec (Main or High profiles)
  • HE-AACv2 Audio Codec
  • Fragmented ISOBMFF Segments
  • MPEG-DASH “live” and “ondemand” profiles
  • Segments are IDR-aligned across Representations and have specific tolerances with regard to variation in segment durations.
  • CommonEncryption
  • SMPTE-TT Closed Captions

Greg Rutz is a Lead Architect at CableLabs working on several projects related to digital video encoding/transcoding and digital rights management for online video.

This post is part of a technical blog series, "Standards-Based, Premium Content for the Modern Web".


Standards-Based, Premium Content for the Modern Web

CableLabs Admin

May 19, 2015

How Premium Video Content Streams over the Internet
These days, a person is just as likely to be watching a movie on their laptop, tablet, or mobile phone, as they are to be sitting in front of a television. Cable operators are eager to provide premium video content to these types of devices but there are high costs involved in supporting the wide array of devices owned by their customers. A multitude of technological obstacles stand in the way of delivering a secure, high-quality, reliable viewing experience to the small-screen. This four-part blog series describes an open, standards-based approach to providing premium, adaptive bitrate, audio/video content in HTML and how open source software can assist in the evaluation and deployment of these technologies.

Part 1 - Open Web Standards: Encrypted Media Extensions (EME)

HTML5 extension APIs enable JavaScript applications to facilitate encryption key requests between a DRM-specific Content Decryption Model (CDM) embedded in the browser and a remote license server. Using EME, the app may choose between multiple DRM systems available in the browser to meet the requirements of both the content and its legal distribution rights. The mechanism by which each DRM conducts its business is opaque to the application since it simply functions as a proxy for messages to and from the license server.



Part 2 - Streaming Media Formats: Adaptive Bitrate (ABR) Media and MPEG-DASH

In the world of streaming video, a provider’s worst nightmare is the infamous “buffering circle” animation. On slower network connections, devices will not be able to download media segments for high-resolution, high-bitrate video fast enough to prevent the player from buffering. Adaptive Bitrate (ABR) media formats were designed to alleviate this problem by chopping the media up into segments, providing multiple resolutions and bitrates for each segment, and then allowing the client application to choose which segment to download based on current network conditions. Multiple ABR formats exist today, but we will focus on the MPEG open standard, Dynamic Adaptive Streaming over HTTP (DASH). While the main DASH specification enables a vast array of media and manifest choices, several standards bodies (like the DASH Industry Forum) have been established to define subsets of DASH that make it easier to implement and test deployable solutions.



Part 3 - Web Media Playback: dash.js

The W3C Media Source Extensions (MSE) and Encrypted Media Extensions (EME) APIs provide all the tools necessary to play adaptive bitrate, premium video content in modern web browsers. However, we still need sophisticated HTML/JavaScript applications that can make use of these APIs. What began as a reference implementation for the Dash Industry Forum’s interoperability specification, the dash.js open source media player has since been adopted as the basis for several commercial applications. dash.js contains a configurable adaptation rules engine and full support for encrypted content playback using EME on a variety of browsers and operating systems.



Part 4 - Tools for Creating Premium Content: Content Creation with CommonEncryption

The ISO CommonEncryption standard specifies a single encryption mechanism (AES-128) and a limited selection of block modes for protected content. No matter the mechanism used to obtain decryption keys, a media engine that recognizes the CommonEncryption format and has access to the keys can decrypt the media samples contained within. In addition to the cipher algorithm and block mode, the CommonEncryption specifications indicate how DRM-specific data may be carried within the media to assist in the retrieval of decryption keys. CableLabs has developed open-source tools that can be used to create encrypted, MPEG-DASH content for several commercial DRM systems supported by EME browsers today.


Greg Rutz is a Lead Architect at CableLabs working on several projects related to digital video encoding/transcoding and digital rights management for online video.