Our low-latency encoders and decoders have native support for ST-2110. But how does it actually work? Unfortunately, most articles online dive straight into technical jargon and can’t see the forest for the trees. So let’s try to simplify things:
You’re at a dinner party with friends and family and someone asks you what you do. You talk about ST 2110 and get blank stares. How can you explain it in simple terms – so that even a six-year-old can understand what you’re saying?
Let’s take a step back from the technical details and have a look at the big picture. For your kids’ sake, but also for ours. It deepens our understanding, or at least makes us look at it in a slightly different way, when we step back and simplify.
The traditional way: Sending video using SDI
Live video – such as sports or news – needs to be sent around a building to the different places it’s needed. This transportation generally uses Serial Digital Interface (SDI) cables. These are old-fashioned cables that carry what was once considered a huge amount of data – gigabits per second. Nowadays a mobile phone can easily process that much data. One cable carries the pictures and sound for one TV signal. But sometimes you want to show different angles of the same event – so you have to use multiple cables. This requires expensive equipment that is specific to the TV industry – and you end up with a huge mess of cabling.
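To get a feel for the numbers, here is a rough back-of-the-envelope sketch. It assumes a 1080p50 picture with 10-bit colour – a common case, not a figure from this article – and counts only the picture itself:

```python
# Rough estimate of why a single HD picture needs "gigabits per second".
# Assumes 1080p50 with 10-bit 4:2:2 colour sampling (illustrative figures only).
width, height = 1920, 1080        # pixels per frame
frames_per_second = 50
bits_per_pixel = 20               # 10 bits of brightness + 10 bits of shared colour

bits_per_second = width * height * frames_per_second * bits_per_pixel
print(f"{bits_per_second / 1e9:.2f} Gbit/s")   # about 2.07 Gbit/s for the picture alone
# A real 3G-SDI cable runs at roughly 3 Gbit/s once blanking and audio are included.
```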
A better way: Sending video using the Internet Protocol
By today’s standards, a few gigabits per second is not so much. The Internet is used to dealing with much larger quantities of data, and is growing very quickly, with applications such as YouTube, Netflix, social media, big data, and AI. So the logical step would be to use Internet technology (not the Internet itself, which is a public network, but the same technology) to send pictures and sound around a facility. This is known as “Internet Protocol” or “IP” technology.
How does that work? You take a TV signal – which includes both pictures and sound – and chop it up into thousands of little pieces, known as packets, that you then send over a wired network.
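You can picture that chopping-up step with a small sketch like the one below. It is only an illustration – real ST 2110 puts the pieces into RTP packets with proper headers – and the packet and frame sizes are assumed, not prescribed:

```python
# A toy illustration of chopping one video frame into packets (not real ST 2110 code).
PAYLOAD_SIZE = 1400  # bytes of video carried in each packet (an assumed, typical size)

def packetize(frame_bytes, first_sequence_number=0):
    """Split one frame's worth of bytes into numbered packets."""
    packets = []
    for offset in range(0, len(frame_bytes), PAYLOAD_SIZE):
        packets.append({
            "sequence_number": first_sequence_number + offset // PAYLOAD_SIZE,
            "payload": frame_bytes[offset:offset + PAYLOAD_SIZE],
        })
    return packets

frame = bytes(5_184_000)   # one uncompressed 1080p frame is roughly 5 megabytes
print(len(packetize(frame)), "packets for a single frame")   # about 3,700 packets
```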
This offers two huge advantages. The first one: a single wire can take hundreds of TV signals instead of just one. The second one: you get to use equipment that is widely available, because it’s from a much bigger industry (indeed, some AI data centres are bigger than football fields!). Because of its wide availability, the equipment is also much cheaper.
The challenge: clock drift
But there’s a catch, and it’s known as “clock drift.”
What’s that?
Well, if you set the clock on your microwave and the clock on your oven at exactly the same time, after a few days there will be a slight difference – maybe just a few seconds. That’s known as “clock drift” and it happens for two reasons. One is that the clock inside each device is slightly different. The other is that each clock can be affected by external factors such as heat or altitude. The challenge of ST-2110 is to make sure that everyone uses the exact same time, so that you can connect packets of video with the corresponding packets of sound, and string those packets together in the correct sequence.
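To put a number on it, suppose a clock runs fast by 20 parts per million – a plausible figure for a cheap built-in clock, assumed here purely for illustration:

```python
# How quickly a small error adds up (20 parts per million is an assumed figure).
drift_ppm = 20
seconds_per_day = 24 * 60 * 60

drift_per_day = seconds_per_day * drift_ppm / 1_000_000
print(f"{drift_per_day:.2f} seconds of drift per day")   # about 1.73 seconds
print(f"{drift_per_day * 7:.1f} seconds after a week")   # about 12.1 seconds
```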
Let’s look at this a bit more closely. TV pictures are a series of still images that are put together in sequence, so that they look like one moving image. The same goes for TV sound: it is a series of separate sounds put together in sequence. But when the receiver gets all these packets of images and sounds, how does it know in what order they should be strung together? And how does it know which image should go with which sound? If you get it wrong, the viewer might end up seeing someone kick a ball but only hearing that kick a couple of seconds later.
To get the sequence right, you use the information from a clock on the camera or microphone that generates the image or sound.
But things get more complicated if you have two (or more) sequences – the same event filmed from different angles. When you switch from one angle to another, you need to cut at a specific time and match that time in the new source of images (and sounds). This means that the time indicated on one series of images (and sounds) must exactly match that indicated on the other series.
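Here is a toy sketch of why that matters. Because both cameras stamp their frames with the same shared clock, a cut is just a matter of picking frames by time; the names and numbers are made up for illustration:

```python
# A toy "vision mixer": cut from camera A to camera B at an agreed time.
camera_a = [{"time": t, "source": "A"} for t in range(10)]
camera_b = [{"time": t, "source": "B"} for t in range(10)]

def cut(sequence_a, sequence_b, cut_time):
    """Take camera A's frames before the cut, camera B's frames from the cut onwards."""
    return ([f for f in sequence_a if f["time"] < cut_time] +
            [f for f in sequence_b if f["time"] >= cut_time])

programme = cut(camera_a, camera_b, cut_time=5)
print([f["source"] for f in programme])   # ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B']
```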
How we measure time
To complicate matters further, the actual time that we all refer to is not measured as smoothly as many of us think.
You know that we have leap years to correct for the fact that the Earth does not take exactly 365 days to go around the sun – it takes about 365.2422 days. Adding a whole day every four years is actually a tiny bit too much, so the calendar needs further small corrections now and then. Our clocks need the same kind of tinkering, because the Earth’s rotation is not perfectly regular either: every few years an extra “leap second” is slipped in to keep clock time in line with the Earth’s spin. But when this happens, there is a glitch on all the TV receivers.
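How far off is “a day every four years”? A quick sum, purely for illustration:

```python
# How much the simple leap-year rule overshoots each year.
days_added_per_year = 0.25        # one extra day every four years
actual_extra_per_year = 0.2422    # the Earth's real overshoot per year

overshoot_days = days_added_per_year - actual_extra_per_year
print(f"{overshoot_days * 24 * 60:.1f} minutes too many each year")   # about 11.2 minutes
```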
How ST-2110 measures time
To avoid the glitches caused by leap seconds, ST-2110 takes its time from GPS signals. GPS time has no leap seconds and is extremely accurate.
How does it work?
It does not depend on the Earth’s rotation. Instead, it pretends that the world started on January 1st, 1970, and that since then, video has been produced and sent at regular intervals – like the regular beat of a piece of music. When your video packet is created, it must wait for the next “beat” to be sent. It is “time-stamped” with that beat (which represents the number of “beats” since January 1st, 1970) so that the receiver, which reads the time-stamp of each packet, knows:
- How to align the various packets, and
- How to match the image with the corresponding sound (the image and sound that have the same “time stamp”).
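Here is a small, purely illustrative sketch of that idea. It uses the standard clock rates of 90,000 “beats” per second for video and 48,000 for audio, with an assumed frame rate; a real ST 2110 receiver does considerably more than this:

```python
# A toy receiver: video and audio are stamped with "beats" counted on different
# clocks, but converting each stamp back into seconds lines them up.
VIDEO_CLOCK_HZ, AUDIO_CLOCK_HZ = 90_000, 48_000
FRAMES_PER_SECOND = 25   # assumed for illustration

video_packets = [{"ts": n * VIDEO_CLOCK_HZ // FRAMES_PER_SECOND} for n in range(5)]
audio_packets = [{"ts": n * AUDIO_CLOCK_HZ // FRAMES_PER_SECOND} for n in range(5)]

def as_seconds(time_stamp, clock_hz):
    return time_stamp / clock_hz

for video, audio in zip(video_packets, audio_packets):
    # The two stamps look different, but they describe the same moment in time.
    assert as_seconds(video["ts"], VIDEO_CLOCK_HZ) == as_seconds(audio["ts"], AUDIO_CLOCK_HZ)
    print(f"picture and sound both belong at t = {as_seconds(video['ts'], VIDEO_CLOCK_HZ):.2f} s")
```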
This system is known as “Precision Time Protocol”, or simply “PTP” for short. In contrast to SDI, where the time has to travel via yet another (even older) cable, the PTP signal travels over the same cable as the pictures and sound. This timing system is complicated, and it’s one of the reasons an ordinary device like a mobile phone can’t simply receive these streams.
One benefit of this system is that the receiver can choose whether it needs to receive only sound packets (if it’s only dealing with sound), or both sound and image packets, reducing the complexity of a device.
So there you have it. ST-2110 is all about chopping up images and sound into thousands of pieces, like a giant jigsaw puzzle, and then putting them together again in the correct sequence. And to do that, each piece of the puzzle needs to have a time stamp on it – which does not correspond to our normal way of measuring time.
The rest of ST-2110 is just an implementation detail.
Learn more about how we provide the densest encoding and decoding platform for ST-2110: https://www.obe.tv/portfolio/encoding-and-decoding/