Watching streaming video is perhaps the most dominant activity on the Internet, both in bandwidth usage and in popularity with average Internet users. Both live streaming and the streaming of stored video content require an understanding of video formats, streaming servers, content distribution networks (CDNs), and streaming protocols.

This is a very confusing area, even for those of us who have been working in it for quite some time. Hopefully, the information below will help clarify some of the basic concepts.

In streaming, a client media player can begin playing the data (such as a movie) before the entire file has been transmitted. Live streaming, which refers to content delivered live over the Internet, requires a form of source media (e.g. a video camera, an audio interface, screen capture software), an encoder to digitize the content, a media publisher, and a content distribution network (CDN) to distribute and deliver the content.

Selecting a streaming technology requires an understanding of streaming protocols and formats, because the choice affects both the cost of the infrastructure and the cost of bandwidth.

Streaming Video Formats

When planning for streaming video, there are two things you need to understand: the video file format and the streaming method. In this article, I describe video file formats, and in the next article, I will cover streaming protocols and methods.

A file format is the structure in which information is stored (encoded) in a computer file. Because a large amount of data is required to store a video signal accurately, this information is often compressed and written into a container file.

There are many different formats for saving video, which can be confusing. Some formats are optimal for capture of video and some are best used for editing workflow.

Then there are quite a few video formats that are used for delivery and distribution of video, either on a CD or over the Internet.

Video files are significantly more complex than still image files. Unlike image files, video files have a complex structure that mixes and matches audio, video, and other data components, and a file can use different combinations of audio and video signals. You can tell a lot about most still image files from the file extension (names like .png, .gif or .jpeg), but that does not hold for video. The file type (such as .MOV) is just a container: it could be filled with really low-quality web video, or it might hold ultra-high-quality 3-D video and five channels of theater-quality audio.

Just looking at the file extension, you know almost nothing more than that it is a video file.

The digital video world depends on hundreds of different codecs to make and play files. And just because codecs make files with the same encoding family – such as H.264 – does not mean that they can be played interchangeably.

Most people confuse video container formats with video codec formats. It is important to understand the difference. Once you understand some of the basic structural elements of video containers, you will be able to work with any video format and its file contents.

The anatomy of a video file

While there is a wide range of video formats, in general, each format has the following characteristics.

A container type: The container is what we typically associate with the file format. Containers “contain” the various components of a video: the stream of images, the sound, metadata and anything else. For example, you could have multiple soundtracks and subtitles included in a video file, if the container format allows it. Examples of popular containers are OGG, Matroska, AVI, MPEG, and QuickTime MOV.

The video and audio signal: This is the actual video and audio data, which has the characteristics described in the next section.

A codec: Codec refers to the software that is used to encode and decode the video signal. Codecs are ways of “coding” and “decoding” streams. Their job is typically to compress data (and decompress it during playback) so that you can store and transmit smaller files. There are many codecs available, each with its own strengths, weaknesses and peculiarities, and choosing the right codec with the right settings for the right situation is close to an art form in itself.

Examples of popular codecs for both audio and video, along with methods to encode metadata, are listed below:

Video = H.264, H.263, VP6, Sorenson Spark, MPEG-4, MPEG-2, etc.

Audio = AAC, MP3, Vorbis, etc.

Metadata = XML, RDF, XMP, etc.
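The container/codec distinction can be sketched as a small data structure. This is only an illustrative model, not how any real container is laid out on disk; the class names `Stream` and `Container` and the two example files are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Stream:
    kind: str   # "video", "audio", or "metadata"
    codec: str  # e.g. "H.264", "AAC", "XMP"


@dataclass
class Container:
    name: str  # the container type, e.g. "MOV"
    streams: List[Stream] = field(default_factory=list)

    def codecs(self, kind: str) -> List[str]:
        """Return the codec of every stream of the given kind."""
        return [s.codec for s in self.streams if s.kind == kind]


# Two hypothetical files: same container type, very different contents.
web_clip = Container("MOV", [Stream("video", "H.263"), Stream("audio", "MP3")])
master = Container("MOV", [
    Stream("video", "H.264"),
    Stream("audio", "AAC"),
    Stream("audio", "AAC"),      # a second soundtrack
    Stream("metadata", "XMP"),
])

print(web_clip.codecs("video"))  # ['H.263']
print(master.codecs("audio"))    # ['AAC', 'AAC']
```

Both objects report the same container name, which is all a file extension would reveal; the codecs only become visible once you look at the streams inside.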

There are two types of codecs: lossy and lossless. These behave exactly as their names suggest. Lossy compression discards some of the original data in exchange for much smaller file sizes. Lossless compression is just the opposite: it preserves every bit of the original data, at the cost of larger files.

Most video compression formats are lossy, because lossy codecs can shrink the data substantially while maintaining acceptable signal quality. For example, you can stream a 1080p resolution video at various levels of quality. Lower quality shows up as a degraded image or sound (color fading, blurring, lack of clarity, etc.), but with the same number of pixels (which is all 1080p defines). This is true even of over-the-air and cable broadcasts. Some video broadcasters may try to squeeze more data down a smaller channel by compressing the data, while technically still meeting the 1080p requirements.
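A toy illustration of the difference, using Python's standard `zlib` module as the lossless compressor. The "lossy" step here is a deliberately crude stand-in (dropping the low 4 bits of each sample) and is not how any real video codec works; the sample data is made up.

```python
import zlib

# Hypothetical "signal": 1000 sample values in the range 0-255.
original = bytes((i * 7) % 256 for i in range(1000))

# Lossless: decompressing returns exactly the bytes that went in.
packed = zlib.compress(original)
restored = zlib.decompress(packed)
print(restored == original)  # True

# Lossy (toy): keep only the top 4 bits of each sample before compressing.
# The discarded low bits cannot be recovered when decoding.
quantized = bytes(b & 0xF0 for b in original)
decoded = zlib.decompress(zlib.compress(quantized))
print(decoded == original)   # False

# The decoded signal is close to the original, but not identical.
max_error = max(abs(a - b) for a, b in zip(original, decoded))
print(max_error)             # 15
```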

Understanding the characteristics of a video signal

Every video file has some attributes that describe what makes up the video signal. These characteristics include:

Frame size: This is the pixel dimensions of the frame. A bigger frame generally means more bandwidth and better quality.

Aspect ratio: This is the ratio of width to height. 16:9 is the common HD standard today.
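As a small sketch, the aspect ratio can be derived from a frame size by dividing both dimensions by their greatest common divisor. The function name `aspect_ratio` is just an illustrative choice.

```python
from math import gcd


def aspect_ratio(width: int, height: int) -> str:
    """Reduce a frame size in pixels to its simplest width:height ratio."""
    d = gcd(width, height)
    return f"{width // d}:{height // d}"


print(aspect_ratio(1920, 1080))  # 16:9
print(aspect_ratio(1280, 720))   # 16:9
print(aspect_ratio(640, 480))    # 4:3
```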

Frame rate: This is the speed at which the frames are captured and intended for playback. Common frame rates include: 15 fps (mostly used for screen capturing), 24 fps, 25 fps, 29.97 fps and 60 fps.
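Frame size and frame rate together show why video is almost always compressed. The sketch below estimates the data rate of uncompressed video, assuming 24 bits per pixel (8 bits each for red, green and blue); that figure and the function name are assumptions for illustration, since real capture formats vary.

```python
def raw_bitrate_mbps(width: int, height: int, fps: float,
                     bits_per_pixel: int = 24) -> float:
    """Data rate of uncompressed video in megabits per second (1 Mb = 10**6 bits)."""
    return width * height * bits_per_pixel * fps / 1_000_000


# 1080p at 30 frames per second, uncompressed.
rate = raw_bitrate_mbps(1920, 1080, 30)
print(f"{rate:.0f} Mbps")  # 1493 Mbps
```

At roughly 1.5 gigabits per second, uncompressed 1080p is far beyond what a typical Internet connection can carry, which is the job the codec takes on.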

Bitrate: Bitrate is a measurement of the number of bits that are transmitted over a set length of time. Your overall bitrate is a combination of your video stream and audio stream in your file with the majority coming from your video stream.

File size (megabits) = bitrate (megabits per second) x duration (seconds). Divide by 8 to convert megabits to megabytes.

In general, the higher the bitrate, the better the quality and bigger the file size will be.
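A minimal sketch of that calculation, converting the result to megabytes. It assumes decimal units (1 kbps = 1,000 bits per second, 1 MB = 1,000,000 bytes); the function name and the example bitrates are illustrative only.

```python
def file_size_mb(video_kbps: float, duration_s: float,
                 audio_kbps: float = 0) -> float:
    """Approximate file size in megabytes for the given bitrates and duration."""
    total_bits = (video_kbps + audio_kbps) * 1000 * duration_s
    return total_bits / 8 / 1_000_000  # bits -> bytes -> megabytes


# One minute of video at 2500 kbps, without and with a 128 kbps audio track.
print(file_size_mb(2500, 60))                  # 18.75
print(file_size_mb(2500, 60, audio_kbps=128))  # 19.71
```

Container overhead is ignored here, so real files come out slightly larger.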

The bitrate for an HD Blu-ray video is typically in the range of 20 Mbps, standard-definition DVD is usually about 6 Mbps, high-quality web video often runs at about 2 Mbps, and video for phones is typically measured in kilobits per second (kbps). For example, these are the targets we usually see for H.264 streaming:

LD 240p 3G Mobile @ H.264 baseline profile 350 kbps (3 MB/minute)
LD 360p 4G Mobile @ H.264 main profile 700 kbps (6 MB/minute)
SD 480p WiFi @ H.264 main profile 1200 kbps (10 MB/minute)
HD 720p @ H.264 high profile 2500 kbps (20 MB/minute)
HD 1080p @ H.264 high profile 5000 kbps (35 MB/minute)
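One way such a list of targets gets used is to pick the highest rendition that fits a viewer's connection. The sketch below is a simplified illustration, not the logic of any particular player; the 128 kbps audio allowance and the 80% headroom factor are assumptions chosen for the example.

```python
# (label, video bitrate in kbps), taken from the targets listed above.
LADDER = [
    ("240p", 350),
    ("360p", 700),
    ("480p", 1200),
    ("720p", 2500),
    ("1080p", 5000),
]


def pick_rendition(available_kbps: float, audio_kbps: float = 128,
                   headroom: float = 0.8):
    """Return the highest-bitrate rendition that fits, or None if none does.

    Only a fraction (headroom) of the measured bandwidth is budgeted,
    and the audio bitrate is subtracted before comparing.
    """
    budget = available_kbps * headroom - audio_kbps
    best = None
    for label, video_kbps in LADDER:  # the ladder is sorted low to high
        if video_kbps <= budget:
            best = label
    return best


print(pick_rendition(10000))  # 1080p
print(pick_rendition(4000))   # 720p
print(pick_rendition(1000))   # 240p
print(pick_rendition(400))    # None
```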

The audio sample rate: This is how often the audio is sampled when converted from an analog source to a digital file. The above bit rates are for video only. Audio adds bandwidth on the order of 128-256 kbps (assuming that you’re encoding as MP3 or AAC).

The quality of a video and the bandwidth it uses depend on the characteristics of the video signal. So, the same file type may hold either low-quality or high-quality video.
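To see how the sample rate feeds into bitrate, the sketch below computes the data rate of uncompressed (PCM) audio. The example values of 44,100 samples per second, 16 bits per sample and two channels are the familiar CD-audio parameters, used here as an assumption; the function name is illustrative.

```python
def pcm_bitrate_kbps(sample_rate_hz: int, bit_depth: int, channels: int) -> float:
    """Bitrate of uncompressed PCM audio in kilobits per second."""
    return sample_rate_hz * bit_depth * channels / 1000


cd_audio = pcm_bitrate_kbps(44_100, 16, 2)
print(cd_audio)                     # 1411.2

# Compared with a 128 kbps compressed stream:
print(f"{cd_audio / 128:.1f}x")     # 11.0x
```

So a 128 kbps MP3 or AAC track carries roughly one eleventh of the data of the uncompressed stereo signal it was encoded from.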

Codec compatibility

The large number of codecs in use makes video compatibility a very complicated arena. You can’t tell what codec is used from the file extension, and your system software may give you only partial information. Your video editing software may be able to tell you what codec was used to make the file, or you may need to get some specialized software.
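As one example of such specialized software, the sketch below uses ffprobe, the inspection tool that ships with FFmpeg. This is my choice for illustration and assumes ffprobe is installed and on your PATH; the helper names `probe` and `summarize` and the sample report are hypothetical, and the sample shows only the few JSON fields the code reads.

```python
import json
import subprocess


def probe(path: str) -> dict:
    """Run ffprobe on a file and return its JSON report as a dict."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-show_format", "-show_streams",
         "-of", "json", path],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def summarize(report: dict):
    """Return (container name, [(stream type, codec name), ...])."""
    container = report.get("format", {}).get("format_name", "unknown")
    streams = [(s.get("codec_type"), s.get("codec_name"))
               for s in report.get("streams", [])]
    return container, streams


# Hand-written sample in the shape of an ffprobe report (not real output).
sample_report = {
    "format": {"format_name": "mov,mp4,m4a,3gp,3g2,mj2"},
    "streams": [
        {"codec_type": "video", "codec_name": "h264"},
        {"codec_type": "audio", "codec_name": "aac"},
    ],
}
print(summarize(sample_report))

# With a real file you would call: summarize(probe("clip.mov"))
```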

H.264 codec

One of the most versatile codec families in use today is H.264 (also called MPEG-4 Part 10 and AVC). It offers excellent compression with high quality, and it is extremely versatile. When H.264 is used with a high bit rate, it provides really excellent quality, as you will see if you play a Blu-ray disc. It is also useful when compression is the most important feature: it is the codec used for web streaming by services such as Vimeo.

In addition to a great quality/compression ratio, H.264 offers a tremendous amount of flexibility as to the video and audio signal characteristics. It can support 3-D video if you wish, as well as a number of different audio encoding schemes.

In one variation or other, H.264 is supported by many different devices and services. Among these are:

Blu-ray discs, Apple iPhones, high-end digital cameras, and the Vimeo and YouTube web services.

Decoding for H.264 video is now built into Intel’s Sandy Bridge i3/i5/i7 processors, which should enable an even faster workflow with the format.

Container formats that support H.264

One reason that H.264 is so popular is that it works with so many container types. Among the types that support H.264 encoding are: MOV, AVCHD, MPEG, DivX, and Adobe Flash FLV.

The Next Generation Video Codecs: Looking Beyond 1080p

Increasing demand for high-quality 1080p HD and ultra-HD content such as 4K video has necessitated the development of a new generation of codecs. They need to provide higher-quality video at lower bit rates, since there is a limit to the size that we can reasonably stream or store on today’s handheld devices.

The two major players in this space are High Efficiency Video Coding (HEVC, otherwise known as H.265) and VP9.

HEVC was developed jointly by the ISO/IEC Moving Picture Experts Group (MPEG) and the ITU-T Video Coding Experts Group (VCEG). VP9 is a codec developed by Google and used in encoding YouTube videos. Apple, on the other hand, has included support for HEVC in iPhone 6 devices and has not announced any plan to support VP9, despite Google making VP9 available as open source and royalty free to all.

VP9 and HEVC are both excellent codecs, and their performance is deemed to be very comparable. Both formats are designed to make today’s video more efficient, but the bigger aim is to enable higher-resolution 4K video, aka Ultra HD or UHD. Today’s mainstream high-definition 1080p video has a resolution of 1,920×1,080, but 4K video has a resolution of 3,840×2,160. Because 4K video quadruples the number of pixels, a more efficient codec is essential to be able to stream these videos or store them on devices.

Only time will tell which codec most manufacturers will adopt. Google has the money to push whatever it wants these days, especially given its hold on YouTube and Android. But for now, HEVC (aka H.265) has a lead, since it was developed through a standards process supported by many leading electronics companies.
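The pixel counts behind that comparison can be checked with a few lines of arithmetic, using the two resolutions quoted above:

```python
hd_pixels = 1920 * 1080    # 1080p frame
uhd_pixels = 3840 * 2160   # 4K / Ultra HD frame

print(hd_pixels)               # 2073600
print(uhd_pixels)              # 8294400
print(uhd_pixels / hd_pixels)  # 4.0
```

Doubling both the width and the height gives four times as many pixels per frame, so at the same compression efficiency a 4K stream would need roughly four times the bitrate.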

I personally feel HEVC is the better option for ensuring compatibility and support across most playback devices and technologies.

Final Words

I hope that the above information will help you in planning your video streaming applications.

In the next article, I plan to cover the streaming technologies and protocols.

Brijesh Kumar, Ph.D., Chief Technology and Products Officer

Musings on Technology, Products and Strategy

Making Sense of Video Streaming Formats