

The Cost of Codecs: Royalty-Bearing Video Compression Standards and the Road that Lies Ahead

Feb 1, 2016

Since the dawn of digital video, royalty-bearing codecs have been the gold standard for much of the distribution and consumption of our favorite TV shows, movies, and other personal and commercial content. From the early 90's onward, the Moving Picture Experts Group (MPEG) has defined the formats and technologies for video compression that allow all those pixels and frames to be squeezed into smaller and smaller files. For better or for worse, along with these technologies comes the intellectual property of the individuals and organizations that created them. In the United States and in many other countries, this means patents and royalties.


The MPEG Licensing Authority (MPEG-LA) was established after the development of the MPEG-2 codec to manage patents essential to the standard and to create a sensible, consolidated royalty structure through which intellectual property holders could be compensated. With over 600 patents from 27 different companies considered essential to MPEG-2, MPEG-LA spared anyone wishing to develop MPEG-2 products from negotiating a separate licensing agreement with each patent owner. The royalty structure that MPEG-LA and the patent owners adopted for MPEG-2 was fairly simple: a one-time flat fee per codec. See table below.

The successor codec to MPEG-2, MPEG-4 Advanced Video Coding (AVC), was completed in 2003. Shortly thereafter, MPEG-LA released its licensing terms, which contained per-unit royalties and a ceiling on the amount that any one licensee must pay in a year. However, not all parties claiming essential patents for AVC were satisfied with the royalty structure established by MPEG-LA. Thus, a new AVC patent pool, managed by Via Licensing, a spin-off of Dolby Laboratories, Inc., emerged in 2004 and announced its own licensing terms. This created a significant amount of confusion among those trying to license the technology. Would you end up having to license from both parties to protect yourself from future litigation? Some patent owners were in both pools; would you have to pay twice for those patents?

In the end, Via Licensing signed an agreement with MPEG-LA and its licensors merged with the MPEG-LA licensors to create a single licensing authority. The royalty structure established by the MPEG-LA group for MPEG-4 added new fees to the mix. In addition to a one-time device fee (covering both software and hardware implementations), per-subscriber and per-title fees (e.g., VOD, digital media) were added. These additional fees have deterred many cable operators and other MVPDs from adopting MPEG-4. Royalties were also established for over-the-air broadcast services, while free Internet broadcast video (e.g., YouTube, Facebook) remained exempt. See table below.

The latest MPEG video codec is called High Efficiency Video Coding (HEVC), and with it comes the latest licensing terms from MPEG-LA. A flat royalty per device is incurred by those shipping over 100,000 devices to end-users; companies with device volumes under 100,000 pay no royalties. In addition, the royalty cap for organizations was raised substantially. Most companies distributing HEVC encoders/decoders will not be charged royalties because they will easily fall under the 100,000-unit threshold. Larger companies that sell mobile devices (Samsung, LG, etc.), OS vendors (Microsoft), and large media software vendors (Adobe, Google) will likely be the ones paying HEVC royalties. Learning from MPEG-4's lackluster adoption by content distributors, MPEG-LA assesses no royalties on content distribution. This seems like a step in the right direction.

The following table details the royalty structures established from the days of MPEG-2 up to the proposed terms for HEVC.

Trouble on the Horizon

As one might expect, many companies holding patents they claim are essential to HEVC were not happy with the MPEG-LA terms and instead joined a new licensing organization, HEVCAdvance. In early 2015, when HEVCAdvance first announced its licensing terms, it caused quite a stir in the industry. Not only were the per-device rates much higher, but there was also a very vague "attributable revenue" royalty for content distributors and any other organization that uses HEVC to facilitate revenue generation. On top of all that, there were no royalty caps.

In December 2015, due in no small part to the media and industry backlash that followed the original license terms, HEVCAdvance released a modified terms-and-royalties summary. Royalties for devices were lowered slightly, but remain significantly higher than those established by MPEG-LA, and still with no yearly cap. Additionally, royalty rates now differ slightly depending on the type of device (e.g., mobile phone, set-top box, 4K UHD+ TV). No royalties are assessed for free over-the-air and Internet broadcast video; however, per-title and per-subscriber royalties remain. See table above. The HEVCAdvance terms also assess uncapped royalties on past sales of per-title and subscription-based units. Finally, there is an incentive program in which early adopters of the HEVCAdvance license can receive limited discounts on royalties for past and future sales.

It is important to note that the companies participating in each patent pool (HEVCAdvance, MPEG-LA) are not the same. If the two patent pools are not reconciled or merged, licensees may have to sign agreements with both organizations to better protect themselves against infringement claims. This will have a substantial financial impact on device vendors (TVs, browsers, operating systems, media players, Blu-ray players, set-top boxes, game consoles, etc.). Of course, content distributors are also unhappy with the HEVCAdvance terms, which revert to the content-based royalties of MPEG-4 with a higher cap of $5M per year. As a result, you can expect content distributors to pass a portion of their subscription- and title-based royalties on to their customers, resulting in higher prices for consumers like you and me.

Update (2/3/16): Technicolor announced that it will withdraw from HEVCAdvance and license its patents directly to device manufacturers. While this is an encouraging sign (and good news for content distributors), I'm curious as to why it did not simply join MPEG-LA. If more companies take the route of individual licensing, the process of acquiring licenses for HEVC may become even more complex.

The Royalty-Free Codec Revolution

In response to the growing concerns regarding the costs of using and deploying MPEG-based codecs, numerous companies and organizations have developed, or are planning to develop, alternatives that are starting to show some promise.

Google VPx

Google, through its acquisition of On2 Technologies, has developed the VP8 and VP9 codecs to compete with AVC and HEVC, respectively. Google has released the VP8 codec, and all IP contained therein, under a cross-license agreement that provides users royalty-free rights to use the technology. In 2013, Google signed an agreement with MPEG-LA and 11 MPEG-LA AVC licensors to acquire a license to VP8 essential patents from this group. This makes VP8 a relatively safe choice, legally speaking, for use as an alternative to AVC. The agreement between Google and MPEG-LA also covers one "next-generation" codec. According to my discussions with Google, this is meant to be the VP9 codec. The VP8 cross-license FAQ indicates that a VP9 cross-license is in progress, but nothing official exists to date.

Most of the content served by Google’s YouTube is compressed with VP8/VP9 technology. Additionally, numerous software and hardware vendors have added native support for VPx codecs. Chrome, Firefox, and Opera browsers all have software-based VP8 and VP9 decoders. Recently, Microsoft announced that they would be adding VP9 decode support to their Edge browser in an upcoming release. On the hardware side, Google has signed agreements with almost every major chipset vendor to add VP8/VP9 functionality to their designs.

From a performance standpoint, we feel that VP8/AVC and VP9/HEVC are very similar. Numerous experiments have been run to try to assess the relative performance of the codecs and, while it is difficult to do an apples-to-apples comparison, they seem very close. If you'd like to look for yourself, here is a sampling of the many tests we found:

  • Mukherjee, et al., Proceedings of the International Picture Coding Symposium, pp. 390-393, Dec. 2013, San Jose, CA, USA.
  • Grois, et al., Proceedings of the International Picture Coding Symposium, pp. 394-397, Dec. 2013, San Jose, CA, USA.
  • Bankoski, et al., Proceedings of the IEEE International Conference on Multimedia and Expo, pp. 1-6, July 2011, Barcelona, ES.
  • Rerabek, et al., Proceedings of the SPIE Applications of Digital Image Processing XXXVII, vol. 9217, Aug. 2014, San Diego, CA, USA.
  • Feller, et al., Proceedings of the IEEE International Conference on Consumer Electronics, pp. 57-61, Sept. 2011, Berlin, DE.
  • Kufa, et al., Proceedings of the 25th International Radioelektronika Conference, pp. 168-171, April 2015, Pardubice, CZ.
  • Uhrina, et al., Proceedings of the 22nd Telecommunications Forum (TELFOR), pp. 905-908, Nov. 2014, Belgrade, RS.

An Alliance of Titans

At least in part in reaction to the HEVCAdvance license announcement in 2015, a collection of the largest media and Internet companies in the world announced that they were forming a group with the sole goal of helping develop a royalty-free video codec for use on the web. The Alliance for Open Media (AOM), founded by Google, Cisco, Amazon, Netflix, Intel, Microsoft, and Mozilla, aims to combine their technological and legal strength to ensure the development of a next-generation compression technology that will be both superior in performance over HEVC and free of charge.

Several of the companies in AOM have already started work on a successor codec. Google has been steadily progressing on VP10, while Cisco has introduced a new codec called Thor. Finally, Mozilla has long been working on a codec named Daala, which takes a drastically different approach from the standard Discrete Cosine Transform (DCT) algorithm used in most compression systems. Cisco, Google, and Mozilla have all stated that they would like to combine the best of their separate technologies into the next-generation codec envisioned by AOM. It is likely that this work would be done in the IETF NetVC working group, but not much has been decided as of yet.

The Future is Free?

We cannot say so with complete confidence, but it seems clear that the confusing landscape of IP-encumbered video codecs is driving the industry toward a future where digital video, the dominant component of traffic on the Internet, may truly be free from patent royalties.

Greg Rutz is a Lead Architect in the Advanced Technology Group at CableLabs. 

Special thanks to Dr. Arianne Hinds, Principal Architect at CableLabs, and Jud Cary, Deputy General Counsel at CableLabs for their contributions to this article.

Technical Blog

Content Creation Demystified: Open Source to the Rescue

Jun 9, 2015

There are four main concepts involved in securing premium audio/visual content:

  1. Authentication
  2. Authorization
  3. Encryption
  4. Digital Rights Management (DRM)

Authentication is the act of verifying the identity of the individual requesting playback (e.g., username/password). Authorization involves verifying that individual's right to view the content (e.g., subscription, rental). Encryption is the scrambling of audio/video samples; decryption keys (licenses) must be acquired to enable successful playback on a device. Finally, DRM systems are responsible for the secure exchange of encryption keys and the association of various "rights" with those keys. Digital rights may include things such as:

  • Limited time period (expiration)
  • Supported playback devices
  • Output restrictions (e.g. HDMI, HDCP)
  • Ability to make copies of downloaded content
  • Limited number of simultaneous views

This article will focus on the topics of encryption and DRM and describe how open source software can be used to create protected, adaptive bitrate test content for use in evaluating web browsers and player applications. I will close by walking through the steps for creating protected streams.

One Encryption to Rule Them All

Encryption is the process of applying a reversible mathematical operation to a block of data to disguise its contents. Numerous encryption algorithms have been developed, each with varying levels of security, inputs and outputs, and computational complexity. In most systems, a random sequence of bytes (the key) is combined with the original data (the plaintext) using the encryption algorithm to produce the scrambled version of the data (the ciphertext). In symmetric encryption systems, the same key used to encrypt the data is also used for decryption. In asymmetric systems, keys come in pairs; one key is used to encrypt or decrypt data while its mathematically related sibling is used to perform the inverse operation. For the purposes of protecting digital media, we will be focusing solely on symmetric key systems.

With so many encryption methods to choose from, one could reasonably expect that web browsers would only support a small subset. If the sets of algorithms implemented in different browsers do not overlap, multiple copies of the media would have to be stored on streaming servers to ensure successful playback on most devices. This is where the ISO Common Encryption standards come in. These specifications identify a single encryption algorithm (AES-128) and two block chaining modes (CTR, CBC) that all encryption and decryption engines must support. Metadata in the stream describes what keys and initialization vectors are required to decrypt media sample data. Common Encryption also supports the concept of "subsample encryption," in which portions of each sample (e.g., video NAL unit headers) are left in the clear, and Common Encryption metadata describes the byte ranges where encrypted data begins and ends within each sample.
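To illustrate the CTR (counter) chaining mode mentioned above: a keystream is generated from the key, an initialization vector, and an incrementing block counter, then XORed with the media bytes. The sketch below demonstrates only the mode's structure; SHA-256 stands in for the AES-128 block cipher purely for illustration, so this is not a compliant Common Encryption implementation.

```python
import hashlib

def ctr_keystream(key: bytes, iv: bytes, length: int) -> bytes:
    """Generate `length` keystream bytes from (key, IV, counter) blocks.
    SHA-256 stands in for AES-128 here purely for illustration."""
    out = b""
    counter = 0
    while len(out) < length:
        block = hashlib.sha256(key + iv + counter.to_bytes(8, "big")).digest()[:16]
        out += block
        counter += 1
    return out[:length]

def ctr_crypt(key: bytes, iv: bytes, data: bytes) -> bytes:
    """In CTR mode, encryption and decryption are the same XOR operation."""
    ks = ctr_keystream(key, iv, len(data))
    return bytes(d ^ k for d, k in zip(data, ks))

key = bytes(16)      # placeholder 128-bit key
iv = bytes(8)        # placeholder 64-bit IV
sample = b"compressed video sample bytes"
ciphertext = ctr_crypt(key, iv, sample)
assert ctr_crypt(key, iv, ciphertext) == sample  # decryption round-trips
```

Because decryption is the same XOR as encryption, a decoder only needs the key, the IV, and the counter position to start decrypting anywhere in the stream, which is what makes CTR mode attractive for seekable media.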

In addition to encryption-based metadata, Common Encryption defines the Protection Specific System Header (PSSH). This bit of metadata is an ISOBMFF box structure that contains data proprietary to a particular DRM and guides the DRM system in retrieving the keys needed to decrypt the media samples. Each PSSH box contains a "system ID" field that uniquely identifies the DRM system to which the contained data applies. Multiple PSSH boxes may appear in a media file, indicating support for multiple DRM systems. Hence the magic of Common Encryption: a standard encryption scheme and multi-DRM support for retrieval of decryption keys, all within a single copy of the media.
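The version-0 PSSH box layout is simple enough to construct by hand: a 32-bit box size, the four-character type 'pssh', a version/flags word, the 16-byte system ID, and the length-prefixed DRM-specific payload. The helper below is a hypothetical sketch of that layout (the payload here is a placeholder; real DRM payloads are defined by each vendor):

```python
import struct

def build_pssh_v0(system_id: bytes, data: bytes) -> bytes:
    """Build a version-0 'pssh' box per the ISO/IEC 23001-7 layout.

    system_id -- 16-byte UUID identifying the DRM system
    data      -- opaque, DRM-specific payload
    """
    assert len(system_id) == 16
    body = struct.pack(">I", 0)           # version (0) and flags (0)
    body += system_id
    body += struct.pack(">I", len(data))  # DataSize
    body += data
    size = 8 + len(body)                  # 4-byte size + 4-byte type + body
    return struct.pack(">I", size) + b"pssh" + body

# Widevine's well-known system ID, with an 8-byte placeholder payload
widevine_id = bytes.fromhex("edef8ba979d64acea3c827dcd51d21ed")
box = build_pssh_v0(widevine_id, b"\x00" * 8)
assert box[4:8] == b"pssh"
```

A player scans the file for PSSH boxes, matches the system ID against the DRM systems available on the device, and hands the matching payload to that DRM for license acquisition.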

The main Common Encryption standard (ISO 23001-7) specifies metadata for use in ISO BMFF media containers. ISO 23001-9 defines Common Encryption for the MPEG Transport Stream container. In many cases, the box structures defined in the main specification are simply inserted into MPEG TS packets in an effort to avoid introducing completely new structures that essentially hold the same data.

Creating Protected Adaptive Bitrate Content

CableLabs has developed a process for creating encrypted MPEG-DASH adaptive bitrate media involving some custom software combined with existing open source tools. The following sections will introduce the software and go through the process of creating protected streams. These tools and some accompanying documentation are available on GitHub. A copy of the documentation is also hosted here.

EME Content Tools

Adaptive Bitrate Transcoding

The first step in the process is to transcode source media files into several lower-bitrate versions. This can be as simple as reducing the bitrate, but in most cases the resolution should be lowered as well. To accomplish this, we use the popular FFmpeg suite of utilities. FFmpeg is a multi-purpose audio/video recorder, converter, and streaming library with dozens of supported formats. The FFmpeg installation will need to have the x264 and fdk_aac codec libraries enabled. If appropriate binaries are not available, FFmpeg can be built from source with those libraries enabled.

CableLabs has provided an example script that can be used as a guide to generating multi-bitrate content. There are some important items in this script that should be noted.

One of the jobs of an ABR packager is to split the source stream up into “segments.” These segments are usually between 2 and 10 seconds in duration and are in frame-accurate alignment across all the bitrate representations of media. For bitrate switching to appear seamless to the user, the player must be able to switch between the different bitrate streams at any segment boundary and be assured that the video decoder will be able to accept the data. To ensure that the packager can split the stream at regular boundaries, we need to make sure that our transcoder is inserting I-Frames (video frames that have no dependencies on other frames) at regular intervals during the transcoding process. The following arguments to x264 in the script accomplish that task:

-x264opts "keyint=$framerate:min-keyint=$framerate:no-scenecut"

We use the framerate detected in the source media to instruct the encoder to insert new I-Frames at least once every second. Assuming our packager will segment using an integral number of seconds, the stream will be properly conditioned. The no-scenecut argument tells the encoder not to insert additional I-Frames when it detects a scene change in the source material. We detect the framerate of the source video using the ffprobe utility that is part of FFmpeg.

framerate=$((`./ffprobe $1 -select_streams v -show_entries stream=avg_frame_rate -v quiet -of csv="p=0"`))

At the bottom of the script, we see the commands that perform the transcoding using bitrates, resolutions, and codec profile/level selections that we require.

transcode_video "360k" "512:288" "main" 30 $2 $1
transcode_video "620k" "704:396" "main" 30 $2 $1
transcode_video "1340k" "896:504" "high" 31 $2 $1
transcode_video "2500k" "1280:720" "high" 32 $2 $1
transcode_video "4500k" "1920:1080" "high" 40 $2 $1

transcode_audio "128k" $2 $1
transcode_audio "192k" $2 $1

For example, the first video representation is 360kb/s with a resolution of 512x288 pixels using AVC Main Profile, Level 3.0. The other thing to note is that the script transcodes audio and video separately. This is because the DASH-IF guidelines forbid multiplexed audio/video segments (see section 3.2.1 of the DASH-IF Interoperability Points v3.0 for DASH-AVC/264).


Next, we must encrypt the video and/or audio representation files that we created. For this, we use the MP4Box utility from GPAC. MP4Box is perfect for this task because it supports the Common Encryption standard and is highly customizable. Not only will it perform AES-128 CTR or CBC mode encryption, but it can also do subsample encryption and insert multiple, user-specified PSSH boxes into the output media file.

To configure MP4Box for performing Common Encryption, the user creates a “cryptfile”. The cryptfile is an XML-based description of the encryption parameters and PSSH boxes. Here is an example cryptfile:
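A minimal cryptfile along these lines, based on GPAC's documented format, might look like the following sketch (the key, key ID, and IV values are placeholders, and the PSSH payload is reduced to just a DRM system ID):

```xml
<GPACDRM type="CENC AES-CTR">
  <!-- One DRMInfo element per PSSH box to insert -->
  <DRMInfo type="pssh" version="0">
    <!-- 16-byte system ID of the target DRM (Widevine's shown here) -->
    <BS ID128="edef8ba979d64acea3c827dcd51d21ed"/>
  </DRMInfo>
  <!-- One CrypTrack element per track to encrypt -->
  <CrypTrack trackID="1" IsEncrypted="1" IV_size="8"
             first_IV="0x0a610676cb88f302" saiSavedBox="senc">
    <key KID="0x10000000100010001000100000000001"
         value="0x3a2a1b68dd2bd9b2eeb25e84c4776668"/>
  </CrypTrack>
</GPACDRM>
```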

The top-level <GPACDRM> element indicates that we are performing AES-128 CTR mode Common Encryption. The <DRMInfo> elements describe the PSSH boxes we would like to include. Finally, a single <CrypTrack> element is specified for each track in the source media we would like to encrypt. Multiple encryption keys may be specified for each track, in which case the cryptfile also indicates the number of samples to encrypt with each key before moving to the next key in the list.

Since PSSH boxes are really just containers for arbitrary data, MP4Box has defined a set of XML elements specifically to define bitstreams in the <DRMInfo> nodes. Please see the MP4Box site for a complete description of the bitstream description syntax.

Once the cryptfile has been generated, you simply pass it as an argument on the command line to MP4Box. We have created a simple script to help encrypt multiple files at once (since you may most likely be encrypting each bitrate representation file for your ABR media).

Creating MP4Box Cryptfiles with DRM Support

In any secure system, the keys necessary to decrypt protected media will most certainly be kept separate from the media itself. It is the DRM system’s responsibility to retrieve the decryption keys and any rights afforded to the user for that content by the content owner. If we wish to encrypt our own content, we will also need to ensure that the encryption keys are available on a per-DRM basis for retrieval when the content is played back.

CableLabs has developed custom software to generate MP4Box cryptfiles and ensure that the required keys are available on one or more license servers. The software is written in Java and can be run on any platform that supports a Java runtime. A simple Apache Ant buildfile is provided for compiling the software and generating executable JAR files. Our tools currently support the Google Widevine and Microsoft PlayReady DRM systems, with a couple of license server choices for each. The first Adobe Access CDM is just now being released in Firefox, and we expect to update the tools in the coming months to support Adobe DRM. Support for W3C ClearKey encryption is also available, but we will focus on the commercial DRM systems for the purposes of this article.

The base library for the software is CryptfileBuilder. This set of Java classes provides abstractions to facilitate the construction and output of MP4Box cryptfiles. All other modules in the toolset are dependent upon this library. Each DRM-specific tool has detailed documentation available on the command line (-h arg) and on our website.

Microsoft PlayReady Test Server

The code in our PlayReady software library provides two main pieces of functionality:

  1. PlayReady PSSH generator for MP4Box cryptfiles
  2. Encryption key generator for use with the Microsoft PlayReady test license server.

Instead of allowing clients to ingest their own keys, the license server uses an algorithm based on a “key seed” and a 128-bit key ID to derive decryption keys. The algorithm definition can be found in this document from Microsoft (in the section titled “Content Key Algorithm”). Using this algorithm, the key seed used by the test server, and a key ID of our choosing, we can derive the content key that will be returned by the server during playback of the content.
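As a sketch of that derivation, per Microsoft's published "Content Key Algorithm," the key seed and key ID are hashed together in three arrangements and the results XOR-folded into a 16-byte content key. The values below are placeholders (not the real test-server seed), and details such as the GUID byte ordering of the key ID are omitted, so treat this as illustrative only:

```python
import hashlib

def playready_content_key(key_seed: bytes, key_id: bytes) -> bytes:
    """Derive a 16-byte content key from a key seed and 16-byte key ID,
    following the structure of Microsoft's published algorithm."""
    seed = key_seed[:30]                  # the algorithm uses 30 seed bytes
    sha_a = hashlib.sha256(seed + key_id).digest()
    sha_b = hashlib.sha256(seed + key_id + seed).digest()
    sha_c = hashlib.sha256(seed + key_id + seed + key_id).digest()
    # XOR-fold each 32-byte digest onto itself, then combine all three
    return bytes(
        sha_a[i] ^ sha_a[i + 16]
        ^ sha_b[i] ^ sha_b[i + 16]
        ^ sha_c[i] ^ sha_c[i + 16]
        for i in range(16)
    )

seed = b"\x01" * 30   # placeholder; the test server publishes its own seed
kid = b"\x02" * 16    # key ID of our choosing
key = playready_content_key(seed, kid)
assert len(key) == 16
```

Because the derivation is deterministic, anyone who knows the published test-server seed and picks a key ID can compute the exact key the server will hand out at playback time, which is what lets our tools encrypt content for that server without any key ingest step.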

Widevine License Portal

Similar to PlayReady, our Widevine toolset provides a PSSH generator for the Widevine DRM system. Widevine, however, does not provide a generic test server like the one from Microsoft. Users will need to contact Widevine to get their own license portal on their servers. With that portal, you will get a signing key and initialization vector for signing your requests. You provide this information as input to the Widevine cryptfile generator.

The Widevine license server will generate encryption keys and key IDs based on a given "content ID" and media type (e.g., HD, SD, AUDIO). Their API has been updated since our tools were developed; it now supports ingest of "foreign keys" that our tool could generate itself, but we do not currently support that.


The real power of Common Encryption is made apparent when you add support for multiple DRM systems in a single piece of content. With the license servers used in our previous examples, this was not possible because we were not able to select our own encryption keys (as explained earlier, Widevine has added support for “foreign keys”, but our tools have not been updated to make use of them). With that in mind, a new licensing system is required to provide the functionality we seek.

CableLabs has partnered with CastLabs to integrate support for their DRMToday multi-DRM licensing service in our content creation tools. DRMToday provides a full suite of content protection services including encryption and packaging. For our needs, we only rely on the multi-DRM licensing server capabilities. DRMToday provides a REST API that our software uses to ingest Common Encryption keys into their system for later retrieval by one of the supported DRM systems.

MPEG-DASH Segmenting and Packaging

The final step in the process is to segment our encrypted media files and generate an MPEG-DASH manifest (.mpd). For this, we once again use MP4Box, but this time with the -dash argument. There are many options in MP4Box for segmenting media files, so please run MP4Box -h dash to see the full list of configurations.

For the purposes of this article, we will focus on generating content that meets the requirements of the DASH-IF DASH-AVC264 “OnDemand” profile. Our repository contains a helper script that will take a set of MP4 files and generate DASH content according to the DASH-IF guidelines. Run this script to generate your segmented media files and produce your manifest.

Greg Rutz is a Lead Architect at CableLabs working on several projects related to digital video encoding/transcoding and digital rights management for online video.

This post is part of a technical blog series, "Standards-Based, Premium Content for the Modern Web".


Web Media Playback: Solved with dash.js

Jun 2, 2015

(Disclaimer: the author is a regular contributor to the dash.js project)
A crucial piece of any standards development process is the creation of a reference implementation to validate the feasibility of the many requirements set forth within the standard. After all, it makes no sense to create a standard if it is impossible to create fully compliant products. The DASH Industry Forum recognized this and created the dash.js project.

dash.js is an open-source, JavaScript media player library coupled with a few example applications. It relies on the W3C Media Source Extensions (MSE) for adaptive bitrate playback and Encrypted Media Extensions (EME) for protected content support. While dash.js started out as a reference implementation, it has been adopted by many organizations for use in commercial products. The DASH-IF Interoperability Points specification has achieved relative stability with respect to base functionality, so the development team is focused on adding features and improving performance to further increase the player's usefulness to companies producing web media players.

One of the benefits of dash.js is in its richness of features. It supports both live and on-demand content, multi-period manifests, and subtitles, to name a few. The player is highly extensible with an adaptive-bitrate rules engine, configurable buffer scheduling, and a metrics reporting system. The example application provided with the library source displays the current buffer levels and current representation index for both audio and video, and it allows the user to manually initiate bitrate changes or let the player rules handle it automatically. Finally, the app contains a buffer level graph, manifest display and debug logging window.

CableLabs has been an active contributor to the dash.js project. Much of the EME support in dash.js was designed and implemented by CableLabs to ensure that the application will support the wide variety of API implementations found in production desktop web browsers today. Additionally, CableLabs has created and hosted test content in the dash.js demo application to ensure that others can observe the dash.js EME implementation in action and evaluate support for protected media on their target browsers.

Content Protection

The media library contains extensive support for playback of protected content. The EME specification has seen many modifications and updates over the years and browser vendors have selected various points during its development to release their products. In order to support as many of these browsers as possible, dash.js has developed a set of APIs (MediaPlayer.models.ProtectionModel) as an abstraction layer to interface with the underlying implementation, whatever that may be. The APIs are designed to mimic the functionality of the most recent EME spec. Several implementations of this API have been developed to translate back to the EME versions that were found in production browsers. The media player will detect browser support and instantiate the proper JavaScript class automatically.

The MediaPlayer.models.ProtectionModel and MediaPlayer.controllers.ProtectionController classes provide the application with access to the EME system both inside and outside the player. ProtectionModel provides management of MediaKeySessions and protection-system event notification for a single piece of content. Most actions performed on the model are made using ProtectionController. Applications can use the media player library to instantiate these classes outside of the playback environment to pre-fetch licenses. Once licenses have been pre-fetched, the app can attach the protection objects to the player to associate the licenses with the HTMLMediaElement that will handle playback.

An EME demo app is provided with dash.js (samples/dash-if-reference-player/eme.html) that provides some visibility and control into the EME operations taking place in the browser. The app allows the user to pre-fetch licenses, manage key session persistence, and play back associated content. It also shows the selected key system and the status of all licenses associated with key sessions.


Greg Rutz is a Lead Architect at CableLabs working on several projects related to digital video encoding/transcoding and digital rights management for online video.

This post is part of a technical blog series, "Standards-Based, Premium Content for the Modern Web".


Adaptive Bitrate and MPEG-DASH

May 26, 2015

The basic solution that streaming video provides is that an entire video file does not need to be downloaded before it is viewed. Little chunks of media may be grabbed for playback to achieve the same effect. The one caveat is that the media must be received as fast as the player consumes it. Furthermore, all things being equal, the higher the quality of a particular piece of content, the more bytes it occupies on disk. So it holds that streaming higher-quality video requires a faster network connection. But all things are not equal when it comes to the network speeds seen by the multitude of connected devices in the world today.

The introduction of Adaptive Bitrate (ABR) streaming technology has revolutionized the delivery of streaming media. There are two fundamental ideas behind ABR:

  1. Multiple copies of the content at various quality levels are stored on the server.
  2. The client device detects its current network conditions and requests lower quality content when network speeds are slower and higher quality content when network speeds are faster.

These principles are quite simple, but there are many technical challenges involved in producing a functional design for an ABR system. First, the media segments on the server must be created in such a way that the client application is allowed to switch between higher and lower quality versions at any time without seeing a disruptive change in the presentation (e.g. video “jumps” or audio “clicks, pops”). Second, there must be a way for the client to “discover” the characteristics of the ABR content so that it knows what sorts of quality choices are available. And finally, the client itself must be implemented so that it can smartly detect network speed changes and potentially switch to a different quality stream.
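The client-side half of idea #2 can be sketched as a simple rule: estimate throughput from recent segment downloads and pick the highest representation whose bitrate fits under that estimate, with some safety headroom. Production players such as dash.js layer buffer-level and stability rules on top of this, and select_representation below is a hypothetical helper, not any player's actual API:

```python
def select_representation(bitrates_bps, measured_bps, safety=0.8):
    """Pick the highest bitrate that fits within the measured throughput,
    leaving (1 - safety) headroom; fall back to the lowest representation."""
    budget = measured_bps * safety
    candidates = [b for b in sorted(bitrates_bps) if b <= budget]
    return candidates[-1] if candidates else min(bitrates_bps)

# A five-rung video bitrate ladder (bits per second)
ladder = [360_000, 620_000, 1_340_000, 2_500_000, 4_500_000]
assert select_representation(ladder, 3_000_000) == 1_340_000  # 2.4 Mb/s budget
assert select_representation(ladder, 200_000) == 360_000      # below the lowest rung
```

The safety margin matters because throughput estimates lag reality; switching to a representation that exactly matches the measured rate risks a buffer underrun the moment conditions dip.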

Today, a large portion of the video streamed over the internet uses one of several ABR formats. Apple’s HTTP Live Streaming (HLS) and MPEG’s Dynamic Adaptive Streaming over HTTP (DASH) are the predominant technologies. Adobe’s HTTP Dynamic Streaming (HDS) and Microsoft’s Smooth Streaming (MSS) were once quite popular but have fallen out of favor in recent years. As the names suggest, most ABR technologies rely on HTTP as the network protocol for serving and accessing data due to its near-ubiquitous support in servers and clients. All ABR technologies specify some sort of descriptive file, or “manifest”, which describes the locations, quality levels, and types of content available to the client.

Of all the ABR formats described previously, only MPEG-DASH was developed through an open, standards-based process in an effort to incorporate input from all industries and organizations that plan to deploy it. CableLabs, once again, played a critical role in representing the needs of our members during this process. We will focus on the details of MPEG-DASH technology for the remainder of the article.


The MPEG-DASH specification (ISO/IEC 23009-1) was first published in early 2012 and has undergone several updates since. As with other formats, media segments are stored on a standard web server and downloaded using the HTTP or HTTPS protocols. While DASH is audio/video codec agnostic, there are profiles in the specification that indicate how media is to be segmented on the server for the ISOBMFF and MPEG-2 Transport Stream container formats. Additionally, both live and on-demand media types have been given special consideration.

The DASH manifest file is known as a Media Presentation Description, or MPD. It is XML-based and contains all the information necessary for the client to download and present a given piece of content.


The root element in the manifest is named MPD. It contains high-level information about the content as a whole. MPDs can be “static” or “dynamic”. A static MPD is what would be used for a typical on-demand movie: the client can parse the manifest once and expect to have all the information it needs to present the content in its entirety. A “dynamic” MPD indicates that the contents of the manifest may change over time, as would be expected for live or interactive content. For dynamic manifests, the MPD element indicates the maximum time the client should wait before requesting a fresh copy.

Within the root MPD element are one or more Period elements. A Period represents a window of time in which media is expected to be presented. A Period can reference an absolute point in time, as would be the case for live media. Alternatively, it can simply indicate a duration for the media items contained within it. When multiple Periods are present in an MPD, they are played in the sequence in which they appear, so a start time need not be specified for each. Periods may even appear in a manifest before their associated media segments are available on the server. This allows clients to prepare themselves for upcoming presentations.

Each Period contains one or more AdaptationSet elements. An AdaptationSet describes a single media element available for selection in the presentation; for example, there may be one AdaptationSet for HD video and another for SD video. Multiple AdaptationSets are also used when there are copies of the media in different video codecs, enabling playback on clients that support only one codec or the other. Additionally, clients may want to select between multiple audio language tracks or between multiple video viewpoints. Each AdaptationSet contains attributes that allow the client to identify the type and format of the media available in the set so that it can make appropriate choices about which to present.


At the next level of the MPD is the Representation. Every AdaptationSet contains a Representation element for each quality level (bandwidth value) of media available for selection. Different video resolutions and bitrates may be offered, and the Representation element tells the client exactly how to find the media segments for that quality level. Several different mechanisms exist to describe the exact duration and name of each media file in the Representation (SegmentTemplate, SegmentTimeline, etc.), but we won’t dive into that level of detail in this article.
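To make the MPD/Period/AdaptationSet/Representation hierarchy concrete, here is a minimal sketch of a static manifest. All IDs, URLs, codec strings, and durations are made up for illustration, and many required attributes are omitted for brevity:

```xml
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="static"
     mediaPresentationDuration="PT10M">
  <Period duration="PT10M">
    <!-- One AdaptationSet per selectable media element -->
    <AdaptationSet mimeType="video/mp4" codecs="avc1.4d401f">
      <!-- One Representation per quality level (bandwidth in bits/sec) -->
      <Representation id="video-hi" bandwidth="3000000" width="1920" height="1080">
        <BaseURL>video_hi/</BaseURL>
      </Representation>
      <Representation id="video-lo" bandwidth="800000" width="640" height="360">
        <BaseURL>video_lo/</BaseURL>
      </Representation>
    </AdaptationSet>
    <AdaptationSet mimeType="audio/mp4" codecs="mp4a.40.2" lang="en">
      <Representation id="audio-en" bandwidth="128000">
        <BaseURL>audio_en/</BaseURL>
      </Representation>
    </AdaptationSet>
  </Period>
</MPD>
```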

Content Protection

Since this blog series is focused on the topic of premium subscription content that is required to be protected, I want to briefly discuss the features incorporated in the DASH specifications that describe support for encrypted media.

There are actually four separate documents that make up the DASH specification set. Part 1 (ISO/IEC 23009-1) is the base DASH specification. Part 2 (ISO/IEC 23009-2) describes the requirements for conformance software to validate the specification. Part 3 (ISO/IEC 23009-3) provides guidelines for implementing the DASH spec. Finally, Part 4 (ISO/IEC 23009-4) describes content protection for segment encryption and authentication. In segment encryption, the entire segment file is encrypted and hashed so that its contents can be protected and its integrity validated. This is different from the “sample encryption” that is the focus of this blog series, in which only the audio and video sample data is encrypted, leaving container and codec metadata “in the clear”.

The ContentProtection element indicates that one or more media components are protected in some way. ContentProtection elements can be specified at either the Representation or AdaptationSet level in the MPD. The schemeIdUri attribute of the ContentProtection element uniquely identifies the particular system used to protect the content. I will dive further into the usage of this element when we cover CommonEncryption in Part 4 of the blog series.
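As an illustrative sketch, a protected AdaptationSet might carry descriptors like the following. The mp4protection URN signals that segments use CommonEncryption; the all-zero UUID in the second descriptor is a placeholder where a real DRM system’s registered UUID would appear:

```xml
<AdaptationSet mimeType="video/mp4" codecs="avc1.4d401f">
  <!-- Signals the common encryption scheme ("cenc") applied to the segments -->
  <ContentProtection schemeIdUri="urn:mpeg:dash:mp4protection:2011" value="cenc"/>
  <!-- One element per DRM system; the UUID below is a placeholder -->
  <ContentProtection schemeIdUri="urn:uuid:00000000-0000-0000-0000-000000000000"/>
</AdaptationSet>
```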





One of the drawbacks of MPEG-DASH being developed in an open standards body is the relative complexity of the final product. Individuals and organizations from around the globe have contributed their own requirements to the standardization effort, leading to a specification that covers a vast array of use cases and deployment scenarios. When it comes to producing an actual implementation, however, the standard becomes somewhat unwieldy. Several new standards bodies have emerged to whittle the base DASH spec down to a subset that meets the needs of their members and to reduce the complexity to the point that functioning workflows can actually be developed.

One such standards body is the DASH Industry Forum, or DASH-IF. What started out as a marketing and promotions group for MPEG-DASH has grown into a full-blown standards organization with members such as Adobe, Microsoft, Qualcomm, and Samsung. The goal of this group is to drive adoption of MPEG-DASH through the development of “interoperability points” that describe manifest restrictions, audio/video codecs, media containers, protection schemes, and more, producing a more implementable subset of the overall DASH spec.

The DASH-IF Interoperability Guidelines for DASH-AVC/264 (latest version is 3.0) set forth the following requirements and restrictions on the base DASH specification:

  • AVC/H.264 Video Codec (Main or High profiles)
  • HE-AACv2 Audio Codec
  • Fragmented ISOBMFF Segments
  • MPEG-DASH “live” and “ondemand” profiles
  • Segments are IDR-aligned across Representations and have specific tolerances with regard to variation in segment durations.
  • CommonEncryption
  • SMPTE-TT Closed Captions



Encrypted Media Extensions Provide a Common Ground

May 19, 2015

Carriage agreements between content owners and Multichannel Video Programming Distributors (MVPDs) are likely to contain clauses that require the MVPD to provide ample protection against content theft. In the traditional, QAM-based delivery model of cable television networks, the desired level of protection was relatively easy to implement and manage due to the fact that the operator controlled all parts of the ecosystem. Tight integration of the headend, network, and client set-top-boxes with an off-the-shelf or homegrown Conditional Access System (CAS) provided the protection that was needed.

In the age of internet video, the client playback device is typically owned by the consumer and utilizes a variety of operating systems and hardware configurations. In the beginning, companies like Adobe and Microsoft developed native applications to support decryption and rendering of premium video content. These native apps were later converted to web-browser extensions, which enabled HTML-embedded encrypted content. With the advent of adaptive bitrate (ABR) streaming paradigms such as Apple’s HTTP Live Streaming, Microsoft’s Smooth Streaming, and MPEG-DASH, these “black-box” media players left many content distributors wanting even more control over the playback experience. In response, the World Wide Web Consortium (W3C) developed the Media Source Extensions API to allow JavaScript applications to provide individual audio, video, and data media samples to the browser. With this powerful new tool, web applications had the power to decide when and how to switch between various bitrates. While this solved the problem of putting adaptive bitrate control in the hands of the application, it did not provide a secure method of playing encrypted content.

In 2012, W3C began work on standardizing the Encrypted Media Extensions (EME). These new JavaScript APIs allow a web application to facilitate the exchange of decryption keys between a digital rights management (DRM) system embedded in the web browser (the Content Decryption Module or CDM) and a key source or license server located somewhere on the network. CableLabs has played an active role in the W3C working group to ensure the needs of the cable industry are met in the development of the EME specification. The EME APIs have undergone several significant transformations over their three-year history, but we are now seeing some stability in the architecture, and browser vendors are beginning to produce complete and robust implementations.

EME Workflow

The process by which a JavaScript web application utilizes the EME APIs goes something like this:

  1. (OPTIONAL) The browser media engine notifies the app that it has encountered encrypted media samples for which it has no appropriate decryption key.
  2. App requests access to a DRM system available in the browser that supports specific operational and technical requirements associated with the content.
  3. App assigns the selected DRM system to an HTMLMediaElement.
  4. App creates one or more key sessions associated with the selected DRM system, each of which will manage one or more decryption keys (licenses).
    1. The app instructs the key session to generate a license request message by providing it with initialization data. The browser may provide this data via the event in Step 1, or the app may acquire it through other means (e.g., from an ABR manifest file).
    2. The CDM for the selected DRM system will generate a data blob (license request) and deliver it to the app.
    3. The app sends the license request to a license server.
    4. Upon receiving a response to its license request, the app passes the response message back to the CDM. The CDM adds to the key session any decryption keys contained within the response.
  5. The CDM and/or browser media engine will use keys stored in the key session to decrypt media samples as they are encountered.
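The numbered steps above can be sketched in JavaScript as follows. The browser objects (`nav`, `video`) are passed in as parameters purely to keep the flow visible, and `acquireLicense` is a stand-in for whatever transport the app uses to reach its license server; this is a sketch of the workflow, not a production player:

```javascript
// Sketch of the EME workflow described above. acquireLicense is an
// app-supplied function (e.g. a fetch() POST to the license server).
async function setupEme(nav, video, keySystem, ksConfig, acquireLicense) {
  // Step 2: request access to a DRM system that meets our requirements
  const access = await nav.requestMediaKeySystemAccess(keySystem, ksConfig);
  const mediaKeys = await access.createMediaKeys();

  // Step 3: assign the selected DRM system to the HTMLMediaElement
  await video.setMediaKeys(mediaKeys);

  // Step 1: the media engine reports initialization data found in the stream
  video.addEventListener('encrypted', async (event) => {
    // Step 4: a key session manages one or more decryption keys
    const session = mediaKeys.createSession();

    // Steps 4.2-4.4: proxy the CDM's license request to the license server
    session.addEventListener('message', async (msgEvent) => {
      const license = await acquireLicense(msgEvent.message);
      await session.update(license); // CDM stores the returned keys
    });

    // Step 4.1: hand the initialization data to the CDM to start the exchange
    await session.generateRequest(event.initDataType, event.initData);
  });
}
```

Step 5, decryption of media samples, then happens inside the browser with no further application involvement.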


Initialization Data

Protected content intended for playback in an EME-enabled web browser must be accompanied by data that instructs a particular DRM implementation how to fetch the licenses required to decrypt it. This may include information such as key IDs, license server URLs, and the digital rights assigned to the content. The contents of the initialization data packet are, in most cases, not meant to be parsed by the application. However, it is necessary to specify how initialization data is carried in a variety of media containers so that browser media engines can extract it from a stream for delivery to the application. The W3C maintains a registry of currently defined stream and initialization data formats.

Key System Attributes

The first step in the EME process is to find a key system that satisfies the requirements of the content and the application. The next sections describe the criteria available to the app to allow it to select from a set of multiple DRMs implemented in a browser via the Navigator.requestMediaKeySystemAccess() API.

DRM System

EME was designed with the understanding that a single browser may support one or more DRM systems. Additionally, with ISO CommonEncryption, a single piece of content can be protected with multiple DRM systems. In EME, each DRM system is associated with an identifying key system string (e.g., “org.w3.clearkey”) and a Universally Unique Identifier (UUID). While the key system string need only be unique within a particular browser implementation, the UUID should be unique across all browser implementations. The DASH Industry Forum maintains a registry of UUIDs to preserve this uniqueness across DRM vendors. The application must select a DRM system that is supported by both the content and the browser.
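A common pattern is to probe a prioritized list of key system strings and take the first one the browser supports. In this hypothetical helper, `requestAccess` is the browser’s `Navigator.requestMediaKeySystemAccess` function, injected as a parameter so the selection logic stands alone:

```javascript
// Probe candidate key systems in priority order and return the first
// MediaKeySystemAccess the browser grants. The candidate strings and
// config are supplied by the application.
async function selectKeySystem(requestAccess, candidates, config) {
  for (const keySystem of candidates) {
    try {
      // Resolves only if the browser has a matching CDM for this config
      return await requestAccess(keySystem, config);
    } catch (e) {
      // Not supported; try the next candidate
    }
  }
  throw new Error('No supported key system found');
}
```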

Content Types

Assessing content type support crosses the line between the CDM and the browser’s media engine. The content’s container type must certainly be supported by the browser, since the browser needs to parse the container to learn about the content (e.g., is it encrypted? how many tracks does it have?). The audio and video codec information is also important and must be supported by the browser and/or the CDM. In certain DRM robustness models, decrypted media samples may not be allowed outside the protected memory of the CDM or graphics drivers; in that case, it is up to the CDM to coordinate decode and display of the media.

Key Session Persistence

When creating a key session, applications are able to indicate that licenses associated with that session are to be persisted across multiple loads of the application. In order to ensure that these types of sessions can be created, the application can request access only to CDMs that can support persistence.

Distinctive Identifiers

One of the big arguments against the inclusion of “black-box” CDMs in the world of open-source software is the possibility that a CDM could use unique or near-unique attributes of the user or device to “track” an individual or small group of individuals. In an attempt to address this privacy concern, a portion of the EME specification is dedicated to defining these distinctive identifiers and indicating when and where they might be used by a CDM. When requesting access to a particular key system, the application may choose to select only from CDMs in the browser that do not use distinctive identifiers. CDMs that have an explicit dependence on distinctive identifiers may not be available for selection by an application (and thus may prevent playback of certain content) if the app indicates them as off-limits.


Key sessions provide the means for initiating the license retrieval process and for storing the keys upon receipt. The application begins the process by providing initialization data to the CDM (MediaKeySession.generateRequest()). The CDM parses the data and generates a license request in its own secure, proprietary format and notifies the application (MediaKeyMessageEvent). Upon receipt of the license request, the application forwards it on to a license server that it knows can handle the request. In a production environment, it is possible that the license request will be packaged with other business-specific data such as requests for user authentication and/or authorization. When successful, the DRM server will respond with a license message which the application will forward on to the CDM (MediaKeySession.update()).

During the normal course of media playback, it is possible that the CDM will need to make an unsolicited request to the DRM server (e.g. to verify that a given license is still valid). The application simply continues to function as a proxy, sending the message to the license server and updating the CDM with the response.

Key Session Persistence

As described earlier, key sessions can be established as “persistent”. In this case, the CDM stores all keys (and other data) associated with the session to a private store on the device. The stored sessions are uniquely associated with the web application that created them. Each key session is assigned a unique identifier that the application can use to recall the session data at a later time. MediaKeySession provides several APIs to allow the application to manage the persistence of key sessions.

  • MediaKeySession.close() – Closes the key session and makes its keys unavailable for decrypting media, but leaves any persistent store of the session unaffected.
  • MediaKeySession.load() – Takes a session ID and loads the data associated with that ID into an empty MediaKeySession object. The keys that were persisted with that session are once again available to decrypt content.
  • MediaKeySession.remove() – Closes the key session AND removes any persistent storage of that key session from the CDM. The associated session ID is now no longer valid.
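As a hedged sketch of how an app might use these APIs: `mediaKeys` is assumed to come from an earlier `requestMediaKeySystemAccess()`/`createMediaKeys()` flow, and `sessionId` is assumed to have been saved by the app (e.g., in localStorage) after the original license exchange:

```javascript
// Try to restore a previously persisted key session. Returns the loaded
// session, or null if the CDM no longer holds data for that ID (in which
// case the app must run the license workflow again).
async function loadPersistedSession(mediaKeys, sessionId) {
  const session = mediaKeys.createSession('persistent-license');
  // load() resolves to true when the CDM still holds data for this ID;
  // the persisted keys are then available for decryption once more
  if (await session.load(sessionId)) {
    return session;
  }
  return null;
}
```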


Once the application has found a key system that meets both its needs and the needs of the content, it can create MediaKeys. MediaKeys is a container for one or more key sessions. MediaKeys facilitates the association of decryption keys with the HTMLMediaElement that will be used to view the encrypted content. Even if keys have been fetched from a license server and stored in the CDM, the media will not be decrypted until those keys have been associated with the media element.


Also included in the EME specification are the details for a test DRM system known as ClearKey. ClearKey is exactly what its name implies: a system in which decryption keys are “in the clear” at some point during their journey to the CDM. Browser support for ClearKey is mandated by the EME spec. Its intended use is primarily as a means to evaluate an EME implementation in a browser when either content or a CDM for a “real” DRM system is unavailable. The formats for ClearKey license request and response messages are detailed in the spec, but the mechanism by which an application obtains ClearKey keys for a given piece of content is left up to the developer.
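For reference, the spec defines the ClearKey license response as a simple JSON structure carrying keys as JSON Web Keys. The kid (key ID) and k (key) values below are made-up base64url-encoded 128-bit values, shown only to illustrate the shape of the message:

```json
{
  "keys": [
    {
      "kty": "oct",
      "kid": "LwVHf8JLtPrv2GUXFW2v_A",
      "k": "0123456789abcdefghijkA"
    }
  ],
  "type": "temporary"
}
```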



Standards-Based, Premium Content for the Modern Web

May 19, 2015

How Premium Video Content Streams over the Internet
These days, a person is just as likely to be watching a movie on their laptop, tablet, or mobile phone, as they are to be sitting in front of a television. Cable operators are eager to provide premium video content to these types of devices but there are high costs involved in supporting the wide array of devices owned by their customers. A multitude of technological obstacles stand in the way of delivering a secure, high-quality, reliable viewing experience to the small-screen. This four-part blog series describes an open, standards-based approach to providing premium, adaptive bitrate, audio/video content in HTML and how open source software can assist in the evaluation and deployment of these technologies.

Part 1 - Open Web Standards: Encrypted Media Extensions (EME)

HTML5 extension APIs enable JavaScript applications to facilitate encryption key requests between a DRM-specific Content Decryption Module (CDM) embedded in the browser and a remote license server. Using EME, the app may choose between multiple DRM systems available in the browser to meet the requirements of both the content and its legal distribution rights. The mechanism by which each DRM conducts its business is opaque to the application, which simply functions as a proxy for messages to and from the license server.



Part 2 - Streaming Media Formats: Adaptive Bitrate (ABR) Media and MPEG-DASH

In the world of streaming video, a provider’s worst nightmare is the infamous “buffering circle” animation. On slower network connections, devices will not be able to download media segments for high-resolution, high-bitrate video fast enough to prevent the player from buffering. Adaptive Bitrate (ABR) media formats were designed to alleviate this problem by chopping the media up into segments, providing multiple resolutions and bitrates for each segment, and then allowing the client application to choose which segment to download based on current network conditions. Multiple ABR formats exist today, but we will focus on the MPEG open standard, Dynamic Adaptive Streaming over HTTP (DASH). While the main DASH specification enables a vast array of media and manifest choices, several standards bodies (like the DASH Industry Forum) have been established to define subsets of DASH that make it easier to implement and test deployable solutions.



Part 3 - Web Media Playback: dash.js

The W3C Media Source Extensions (MSE) and Encrypted Media Extensions (EME) APIs provide all the tools necessary to play adaptive bitrate, premium video content in modern web browsers. However, sophisticated HTML/JavaScript applications are still needed to make use of these APIs. The dash.js open-source media player began as a reference implementation for the DASH Industry Forum’s interoperability specification and has since been adopted as the basis for several commercial applications. dash.js contains a configurable adaptation rules engine and full support for encrypted content playback using EME on a variety of browsers and operating systems.



Part 4 - Tools for Creating Premium Content: Content Creation with CommonEncryption

The ISO CommonEncryption standard specifies a single encryption mechanism (AES-128) and a limited selection of block modes for protected content. No matter the mechanism used to obtain decryption keys, a media engine that recognizes the CommonEncryption format and has access to the keys can decrypt the media samples contained within. In addition to the cipher algorithm and block mode, the CommonEncryption specifications indicate how DRM-specific data may be carried within the media to assist in the retrieval of decryption keys. CableLabs has developed open-source tools that can be used to create encrypted, MPEG-DASH content for several commercial DRM systems supported by EME browsers today.




Video Coding Platforms: Then, Now, and On the Horizon

Jun 5, 2014

Up until a few years ago, the encoder/transcoder market was dominated by systems based on ASIC (Application Specific Integrated Circuit) architectures that provided excellent performance with minimal power consumption.  These audio/video compression workhorses have allowed cable operators to provide more and more high-quality video to their customers in an increasingly bandwidth-constrained head end environment.  Fast-forward to today, and what you will see is a dizzying array of encoder vendors with solutions running on a wide variety of platforms.  This technological evolution provides both opportunity and a source of apprehension for decision makers trying to determine which vendor solution fits best for the current and future needs of their video delivery organizations.

ASIC encoder platforms are the tried-and-true staples of the encoder industry.  Video compression involves the execution of numerous repetitive mathematical calculations for each and every frame of the video.  Once  your compression algorithms have been proven and hardened, laying them down on silicon is the fastest and most energy-efficient way to get bits in one end and produce far fewer bits out the other.  But what happens when you come up with an improvement to your algorithm?  The path from design to production at scale can be on the order of 6 months or more.  And what about the opportunity to choose between one algorithm or another depending on the characteristics of a particular frame of video?  Space on an integrated circuit is high value real estate and it can often be cost-prohibitive to embed logic that will only be used part of the time during execution.  With established, and in the eyes of some, aging compression standards such as MPEG-2/H.262, there are fewer opportunities for algorithmic improvements.  Most of the innovations within the bounds of that specification have already been realized and the risk to embedding those solutions is minimal.  However, more recent compression standards such as AVC/H.264 and HEVC/H.265 are still providing compression engineers with ample room to innovate.

With this in mind, encoder vendors are looking towards solutions that give them a little more freedom to improve their solutions without requiring a release of a brand new hardware platform.  Programmable hardware such as FPGAs (Field Programmable Gate Arrays) and GPUs (Graphics Processing Units) are an increasingly popular choice in encoder solutions.  These chips allow you the performance benefits of running your algorithms in hardware with the added flexibility of being able to tweak and improve your solutions with a simple software upgrade.  It still requires some pretty specialized talent to be able to program these devices to their maximum potential, but the advent of modern hardware-specific languages, such as OpenCL, has opened the doors for quicker and more accessible innovations using these silicon-based platforms.


Only in the last year or so has the next evolution in encoder technology really been enabled.  Moore’s Law continues to hold true and recent advancements in general-purpose computing CPUs have made it possible for encoding solutions to be delivered in pure software.  Couple that with the continued integration of GPU functionality on-die and you have the ability to deploy a powerful video encoding solution running on the same hardware that will power the data-centers of tomorrow.

One caveat of moving to general-purpose hardware is related to power consumption.  Most likely, these platforms will not be able to compete with ASIC-based systems when it comes to energy usage, but the algorithmic flexibility of software-based encoders, coupled with the cost benefits of running on mass-produced hardware may outweigh the cost to power it.  Another aspect of this evolution is the incredible sophistication of open source encoding libraries such as x264.  With the innovative power of the community behind the development of these codecs, it is possible that operators might find some use for them in certain areas of their production environments.

Finally, one of the most recent trends in video compression solutions is not related to codec technology itself, but in the virtualization of encoder/transcoder platforms.  Virtualization is not new to the computing industry, but its introduction into video workflows is definitely a novel, if not expected, approach.  The idea of dynamically associating hardware resources to encoding processes according to demand, complexity, or other business needs enables new levels of flexibility in meeting the large-scale encoding needs of today’s cable operators, both large and small.  Couple that with the perks of being able to integrate these general purpose computing platforms into your existing, IT-managed datacenter operations and you get significant opportunity for cost savings in your backend video infrastructure deployments.


It is an exciting time in the encoder field, with drastic changes in deployment architectures sweeping through the industry.  We here at CableLabs continue to apply solid science in our evaluations of these platforms to help our members make the most informed decisions possible.  Projects such as our recent MPEG2 encoder evaluation and our upcoming AVC/H.264 encoder shoot-out apply stringent testing methodologies to identify the absolute best products available on the market today.
