Comments
Technical Blog

Content Creation Demystified: Open Source to the Rescue

Jun 9, 2015

There are four main concepts involved in securing premium audio/visual content:

  1. Authentication
  2. Authorization
  3. Encryption
  4. Digital Rights Management (DRM)

Authentication is the act of verifying the identity of the individual requesting playback (e.g. username/password). Authorization involves ensuring that individual’s right to view that content (e.g. subscription, rental). Encryption is the scrambling of audio/video samples; decryption keys (licenses) must be acquired to enable successful playback on a device. Finally, DRM systems are responsible for the secure exchange of encryption keys and the association of various “rights” with those keys. Digital rights may include things such as:

  • Limited time period (expiration)
  • Supported playback devices
  • Output restrictions (e.g. HDMI, HDCP)
  • Ability to make copies of downloaded content
  • Limited number of simultaneous views

This article will focus on the topics of encryption and DRM and describe how open source software can be used to create protected, adaptive bitrate test content for use in evaluating web browsers and player applications. I will close this article by describing the steps for how protective streams can be created.

One Encryption to Rule Them All

Encryption is the process of applying a reversible mathematical operation to a block of data to disguise its contents. Numerous encryption algorithms have been developed, each with varying levels of security, inputs and outputs, and computational complexity. In most systems, a random sequence of bytes (the key) is combined with the original data (the plaintext) using the encryption algorithm to produce the scrambled version of the data (the ciphertext). In symmetric encryption systems, the same key used to encrypt the data is also used for decryption. In asymmetric systems, keys come in pairs; one key is used to encrypt or decrypt data while its mathematically related sibling is used to perform the inverse operation. For the purposes of protecting digital media, we will be focusing solely on symmetric key systems.

With so many encryption methods to choose from, one could reasonably expect that web browsers would only support a small subset. If the sets of algorithms implemented in different browsers do not overlap, multiple copies of the media would have to be stored on streaming servers to ensure successful playback on most devices. This is where the ISO Common Encryption standards come in. These specifications identify a single encryption algorithm (AES-128) and two block chaining modes (CTR, CBC) that all encryption and decryption engines must support. Metadata in the stream describes what keys and initialization vectors are required to decrypt media sample data. Common Encryption also supports the concept of “subsample encryption”. In this situation, unencrypted samples are interspersed within the total set of samples and Common Encryption metadata is defined to describe the byte offsets where encrypted samples begin and end.

In addition to encryption-based metadata, Common Encryption defines the Protection Specific System Header (PSSH). This bit of metadata is an ISOBMFF box structure that contains data proprietary to a particular DRM that will guide the DRM system in retrieving the keys that are needed to decrypt the media samples. Each PSSH box contains a “system ID” field that uniquely identifies the DRM system to which the contained data applies. Multiple PSSH boxes may appear in a media file indicating support for multiple DRM systems. Hence, the magic of Common Encryption; a standard encryption scheme and multi-DRM support for retrieval of decryption keys all within a single copy of the media.

The main Common Encryption standard (ISO 23001-7) specifies metadata for use in ISO BMFF media containers. ISO 23001-9 defines Common Encryption for the MPEG Transport Stream container. In many cases, the box structures defined in the main specification are simply inserted into MPEG TS packets in an effort avoid introducing completely new structures that essentially hold the same data.

Creating Protected Adaptive Bitrate Content

CableLabs has developed a process for creating encrypted MPEG-DASH adaptive bitrate media involving some custom software combined with existing open source tools. The following sections will introduce the software and go through the process of creating protected streams. These tools and some accompanying documentation are available on GitHub. A copy of the documentation is also hosted here.

EME Content Tools

Adaptive Bitrate Transcoding

The first step in the process is to transcode source media files into several, lower bitrate versions. This can be simply reducing the bitrate, but in most cases the resolution should be lowered. To accomplish this, we use the popular FFMpeg suite of utilities. FFMpeg is a multi-purpose audio/video recorder, converter, and streaming library with dozens of supported formats. An FFMpeg installation will need to have the x264 and fdk_aac codec libraries enabled. If the appropriate binaries are not available, it can be built .

CableLabs has provided an example script that can be used as a guide to generating multi-bitrate content. There are some important items in this script that should be noted.

One of the jobs of an ABR packager is to split the source stream up into “segments.” These segments are usually between 2 and 10 seconds in duration and are in frame-accurate alignment across all the bitrate representations of media. For bitrate switching to appear seamless to the user, the player must be able to switch between the different bitrate streams at any segment boundary and be assured that the video decoder will be able to accept the data. To ensure that the packager can split the stream at regular boundaries, we need to make sure that our transcoder is inserting I-Frames (video frames that have no dependencies on other frames) at regular intervals during the transcoding process. The following arguments to x264 in the script accomplish that task:

[highlighter line=0]

-x264opts "keyint=$framerate:min-keyint=$framerate:no-scenecut"

We use the framerate detected in the source media to instruct the encoder to insert new I-Frames at least once ever second. Assuming our packager will segment using an integral number of seconds, the stream will be properly conditioned. The no-scenecut argument tells the encoder not to insert random I-Frames when it detects a scene change in the source material. We detect the framerate of the source video using the ffprobe utility that is part of FFMpeg.

framerate=$((`./ffprobe $1 -select_streams v -show_entries stream=avg_frame_rate -v quiet -of csv="p=0"`))

At the bottom of the script, we see the commands that perform the transcoding using bitrates, resolutions, and codec profile/level selections that we require.

transcode_video "360k" "512:288" "main" 30 $2 $1
transcode_video "620k" "704:396" "main" 30 $2 $1
transcode_video "1340k" "896:504" "high" 31 $2 $1
transcode_video "2500k" "1280:720" "high" 32 $2 $1
transcode_video "4500k" "1920:1080" "high" 40 $2 $1

transcode_audio "128k" $2 $1
transcode_audio "192k" $2 $1

For example, the first video representation is 360kb/s with a resolution of 512x288 pixels using AVC Main Profile, Level 3.0. The other thing to note is that the script transcodes audio and video separately. This is due to the fact that the DASH-IF guidelines forbid multiplexed audio/video segements (see section 3.2.1 of the DASH-IF Interoperability Points v3.0 for DASH AVC/264).

Encryption

Next, we must encrypt the video and/or audio representation files that we created. For this, we use the MP4Box utility from GPAC. MP4Box is perfect for this task because it supports the Common Encryption standard and is highly customizable. Not only will perform AES-128 CTR or CBC mode encryption, but it can do subsample encryption and insert multiple, user-specified PSSH boxes into the output media file.

To configure MP4Box for performing Common Encryption, the user creates a “cryptfile”. The cryptfile is an XML-based description of the encryption parameters and PSSH boxes. Here is an example cryptfile:


The top-level <GPACDRM> element indicates that we are performing AES-128 CTR mode Common Encryption. The <DRMInfo> elements describe the PSSH boxes we would like to include. Finally, a single <CrypTrack> elements is specified for each track in the source media we would like to encrypt. Multiple encryption keys for each track may be specified, in which case the number of samples to encrypt would be indicated with a single key before moving to the next key in the list .

Since PSSH boxes are really just containers for arbitrary data, MP4Box has defined a set of XML elements specifically to define bitstreams in the <DRMInfo> nodes. Please see the MP4Box site for a complete description of the bitstream description syntax.

Once the cryptfile has been generated, you simply pass it as an argument on the command line to MP4Box. We have created a simple script to help encrypt multiple files at once (since you may most likely be encrypting each bitrate representation file for your ABR media).

Creating MP4Box Cryptfiles with DRM Support

In any secure system, the keys necessary to decrypt protected media will most certainly be kept separate from the media itself. It is the DRM system’s responsibility to retrieve the decryption keys and any rights afforded to the user for that content by the content owner. If we wish to encrypt our own content, we will also need to ensure that the encryption keys are available on a per-DRM basis for retrieval when the content is played back.

CableLabs has developed custom software to generate MP4Box cryptfiles and ensure that the required keys are available on one or more license servers. The software is written in Java and can be run on any platform that supports a Java runtime. A simple Apache Ant buildfile is provided for compiling the software and generating executable JAR files. Our tools currently support the Google Widevine and Microsoft PlayReady DRM systems with a couple of license server choices for each. The first Adobe Access CDM is just now being released in Firefox and we expect to update the tools in the coming months to support Adobe DRM. Support for W3C ClearKey encryption is also available, but we will focus on the commercial DRM systems for the purposes of this article.

The base library for the software is CryptfileBuilder. This set of Java classes provides abstractions to facilitate the construction and output of MP4Box cryptfiles. All other modules in the toolset are dependent upon this library. Each DRM-specific tool has detailed documentation available on the command line (-h arg) and on our website.

Microsoft PlayReady Test Server

The code in our PlayReady software library provides 2 main pieces of functionality:

  1. PlayReady PSSH generator for MP4Box cryptfiles
  2. Encryption key generator for use with the Microsoft PlayReady test license server.

Instead of allowing clients to ingest their own keys, the license server uses an algorithm based on a “key seed” and a 128-bit key ID to derive decryption keys. The algorithm definition can be found in this document from Microsoft (in the section titled “Content Key Algorithm”). Using this algorithm, the key seed used by the test server, and a key ID of our choosing, we can derive the content key that will be returned by the server during playback of the content.

Widevine License Portal

Similar to PlayReady, our Widevine toolset provides a PSSH generator for the Widevine DRM system. Widevine, however, does not provide a generic test server like the one from Microsoft. Users will need to contact Widevine to get their own license portal on their servers. With that portal, you will get a signing key and initialization vector for signing your requests. You provide this information as input to the Widevine cryptfile generator.

The Widevine license server will generate encryption keys and key IDs based on a given “content ID” and media type (e.g. HD, SD, AUDIO, etc). Their API has been updated since our tools were developed and they now support ingest of “foreign keys” that our tool could generate itself, but we don’t currently support that.

DRMToday

The real power of Common Encryption is made apparent when you add support for multiple DRM systems in a single piece of content. With the license servers used in our previous examples, this was not possible because we were not able to select our own encryption keys (as explained earlier, Widevine has added support for “foreign keys”, but our tools have not been updated to make use of them). With that in mind, a new licensing system is required to provide the functionality we seek.

CableLabs has partnered with CastLabs to integrate support for their DRMToday multi-DRM licensing service in our content creation tools. DRMToday provides a full suite of content protection services including encryption and packaging. For our needs, we only rely on the multi-DRM licensing server capabilities. DRMToday provides a REST API that our software uses to ingest Common Encryption keys into their system for later retrieval by one of the supported DRM systems.

MPEG-DASH Segmenting and Packaging

The final step in the process is to segment our encrypted media files and generate a MPEG-DASH manifest (.mpd). For this, we once again use MP4Box, but this time we use the -dash argument. There are many options in MP4Box for segmenting media files, so please run MP4Box -h dash to see the full list of configurations.

For the purposes of this article, we will focus on generating content that meets the requirements of the DASH-IF DASH-AVC264 “OnDemand” profile. Our repository contains a helper script that will take a set of MP4 files and generate DASH content according to the DASH-IF guidelines. Run this script to generate your segmented media files and produce your manifest.

Greg Rutz is a Lead Architect at CableLabs working on several projects related to digital video encoding/transcoding and digital rights management for online video.

This post is part of a technical blog series, "Standards-Based, Premium Content for the Modern Web".

Comments
Technical Blog

Encrypted Media Extensions Provide a Common Ground

May 19, 2015

Background
Carriage agreements between content owners and Multichannel Video Programming Distributors (MVPDs) are likely to contain clauses that require the MVPD to provide ample protection against content theft. In the traditional, QAM-based delivery model of cable television networks, the desired level of protection was relatively easy to implement and manage due to the fact that the operator controlled all parts of the ecosystem. Tight integration of the headend, network, and client set-top-boxes with an off-the-shelf or homegrown Conditional Access System (CAS) provided the protection that was needed.

In the age of internet video, the client playback device is typically owned by the consumer and utilizes a variety of operating systems and hardware configurations. In the beginning, companies like Adobe and Microsoft developed native applications to support decryption and rendering of premium video content. These native apps were later converted to web-browser extensions, which enabled HTML-embedded encrypted content. With the advent of adaptive bitrate (ABR) streaming paradigms such as Apple’s HTTP Live Streaming, Microsoft’s Smooth Streaming, and MPEG-DASH, these “black-box” media players left many content distributors wanting even more control over the playback experience. In response, the World Wide Web Consortium (W3C) developed the Media Source Extensions API to allow JavaScript applications to provide individual audio, video, and data media samples to the browser. With this powerful new tool, web applications had the power to decide when and how to switch between various bitrates. While this solved the problem of putting adaptive bitrate control in the hands of the application, it did not provide a secure method of playing encrypted content.

In 2012, W3C began work on standardizing the Encrypted Media Extensions (EME). These new JavaScript APIs allow a web application to facilitate the exchange of decryption keys between a digital rights management (DRM) system embedded in the web browser (the Content Decryption Module or CDM) and a key source or license server located somewhere on the network. CableLabs has played an active role in the W3C working group to ensure the needs of the cable industry are met in the development of the EME specification. The EME APIs have undergone several significant transformations over its three-year history, but we are now seeing some stability in the architecture and browser vendors are beginning to produce some complete and robust implementations.

EME Workflow

The process by which a JavaScript web application utilizes the EME APIs goes something like this:

  1. (OPTIONAL) The browser media engine notifies the app that it has encountered encrypted media samples for which it has no appropriate decryption key.
  2. App requests access to a DRM system available in the browser that supports specific operational and technical requirements associated with the content.
  3. App assigns the selected DRM system to an HTMLMediaElement.
  4. App creates one or more key sessions associated with the selected DRM system, each of which will manage one or more decryption keys (licenses)
    1. The app instructs the key session to generate a license request message by providing it with initialization data. The browser may provide this data by means of the event in Step 1, or it may be acquired by the app through other means (i.e. in an ABR manifest file).
    2. The CDM for the selected DRM system will generate a data blob (license request) and deliver it to the app.
    3. The app sends the license request to a license server.
    4. Upon receiving a response to its license request, the app passes the response message back to the CDM. The CDM adds to the key session any decryption keys contained within the response.
  5. The CDM and/or browser media engine will use keys stored in the key session to decrypt media samples as they are encountered.

EME

Initialization Data

Protected content intended for playback in an EME-enabled web browser must be accompanied with data that instructs a particular DRM implementation how to fetch the licenses required to decrypt it. This may include information such as key IDs, license server URLs, and digital rights assigned to the content. The contents of the initialization data packet are, in most cases, not to be parsed by the application. However, it is necessary to specify the method of carrying initialization data in a variety of media containers so as to allow browser media engines to extract it from a stream for delivery to the application. The W3C maintains a registry of currently defined stream and initialization data formats.

Key System Attributes

The first step in the EME process is to find a key system that satisfies the requirements of the content and the application. The next sections describe the criteria available to the app to allow it to select from a set of multiple DRMs implemented in a browser via the Navigator.requestMediaKeySystemAccess() API.

DRM System

EME was designed with the understanding that a single browser may support one or more DRM systems. Additionally, with ISO CommonEncryption, a single piece of content could be protected with multiple DRM systems. In EME, Each DRM is associated with an identifying key system string (e.g. “com.microsoft.playready”, “org.w3.clearkey”) and a Universally Unique Identifier (UUID). While the key system string will be unique within a particular browser implementation, the UUID should be unique across all browser implementations. The DASH Industry Forum has created a registry of UUIDs to maintain this uniqueness across DRM vendors. The application must select a DRM system that is supported by both the content and the browser.

Content Types

Assessing content type support crosses the line between the CDM and the media engine in the browser. The content’s container type must certainly be supported by the browser since it will need to parse the container to learn about the content (i.e. is it encrypted? How may tracks? etc.). The audio and video codec information is also important and will require support by the browser and/or CDM. In certain DRM robustness models, decrypted media samples may not be allowed outside of the protected memory of the CDM or graphics drivers. In that case, it would be up to the CDM to coordinate decode and display of the media.

Key Session Persistence

When creating a key session, applications are able to indicate that licenses associated with that session are to be persisted across multiple loads of the application. In order to ensure that these types of sessions can be created, the application can request access only to CDMs that can support persistence.

Distinctive Identifiers

One of the big arguments against the inclusion of “black-box” CDMs in the world of open-source software is the possibility that the CDM would use unique or near-unique attributes of the user or device to “track” an individual or small groups of individuals. In attempt to address this privacy-related concern, a portion of the EME specification is dedicated to defining these distinctive identifiers and indicating when and where they might be used by a CDM. When requesting access to a particular key system, the application may choose to select only from CDMs in the browser that do not use distinctive identifiers. CDMs that have an explicit dependence on the use of distinctive identifiers may not be available for selection by an application (and thus, may prevent playback of certain content) if the app indicates them as off-limits.

MediaKeySession

Key sessions provide the means for initiating the license retrieval process and for storing the keys upon receipt. The application begins the process by providing initialization data to the CDM (MediaKeySession.generateRequest()). The CDM parses the data and generates a license request in its own secure, proprietary format and notifies the application (MediaKeyMessageEvent). Upon receipt of the license request, the application forwards it on to a license server that it knows can handle the request. In a production environment, it is possible that the license request will be packaged with other business-specific data such as requests for user authentication and/or authorization. When successful, the DRM server will respond with a license message which the application will forward on to the CDM (MediaKeySession.update()).

During the normal course of media playback, it is possible that the CDM will need to make an unsolicited request to the DRM server (e.g. to verify that a given license is still valid). The application simply continues to function as a proxy, sending the message to the license server and updating the CDM with the response.

Key Session Persistence

As described earlier, key sessions can be established as “persistent”. In this case, the CDM stores all keys (and other data) associated with the session to a private store on the device. The stored sessions are uniquely associated with the web application that created them. Each key session is assigned a unique identifier that the application can use to recall the session data at a later time. MediaKeySession provides several APIs to allow the application to manage the persistence of key sessions.

  • MediaKeySession.close() – Closes the key session and makes its keys unavailable for decrypting media, but leaves any persistent store of the session unaffected.
  • MediaKeySession.load()– Takes a sessionID and loads the data associated with that ID into an empty MediaKeySession object. The keys that were persisted with that session are once again available to decrypt content.
  • MediaKeySession.remove() – Closes the key session AND removes any persistent storage of that key session from the CDM. The associated session ID is now no longer valid.

MediaKeys

Once the application has found a key system that meets both its needs and the needs of the content, it can create MediaKeys. MediaKeys is a container for one or more key sessions. MediaKeys facilitates the association of decryption keys with the HTMLMediaElement that will be used to view the encrypted content. Even if keys have been fetched from a license server and stored in the CDM, the media will not be decrypted until those keys have been associated with the media element.

ClearKey

Also included in the EME specification are the details for a test DRM system known as ClearKey. ClearKey is exactly as its name implies: a system in which decryption keys are “in the clear” at some point during their journey to the CDM. Browser support for ClearKey is mandated by the EME spec. Its intended use is primarily as a means to evaluate an EME implementation in a browser when either content or a CDM for a “real” DRM system is not available. The formats for ClearKey license request and response messages are detailed in the spec. The mechanism by which an application attains ClearKey keys for a given piece of content is left up to the developer.

Greg Rutz is a Lead Architect at CableLabs working on several projects related to digital video encoding/transcoding and digital rights management for online video.

This post is part of a technical blog series, "Standards-Based, Premium Content for the Modern Web".

Comments