Super User


Website URL: http://www.obe.tv/

By James Darnley

Introduction

As covered in previous blog posts, we work extensively on using SIMD instructions on Intel CPUs to speed up video processing in open source libraries such as FFmpeg and Upipe.

Recently we have been considering using Intel's new AVX-512 instruction set and its wider vector registers, the 512-bit (64-byte) ZMM registers, to see whether we can eke more speed out of the code anywhere. While we were gearing up to test this, incorporating a very new assembler and an update to the x264asm compat layer, Cloudflare published its own findings on using these features in "On the dangers of Intel's frequency scaling".

Briefly put, they showed that even a small amount of code using ZMM registers can slow everything else down. The processor reduces its operating frequency when it executes ZMM instructions, in order to limit power consumption and heat output.

Because of that, we decided not to test the ZMM registers at all. Like Cloudflare, we don’t spend enough time in assembly functions to be able to take the CPU clock speed hit. However, the new instructions and the EVEX prefix are also available for the narrower XMM and YMM registers, and EVEX increases the number of these registers to 32 in 64-bit mode. Specifically, this requires the AVX-512 Vector Length Extensions (VL) feature, which Skylake-X and the new Xeon processors have. If you can make use of the new features, they may provide you with some speed gains.

Where to Start

Where would one begin? There are so many new features that it can be hard to know where to start. There are op-masks, op-mask registers, blends, compares, permutes, conversions, scatters, and more.

I will start by covering a few instructions that I have emulated in the past: maximum and minimum of packed signed quadwords; arithmetic right shift of packed quadwords; and conversion of quadwords to doublewords. These now exist as single instructions. AVX-512 has added or extended many operations on quadwords; see section 15.13 of Intel's Optimization Reference Manual (pdf).

Arithmetic shift right of quadwords could be emulated with a pregenerated sign-extend mask, a spare register, and the sequence pxor; pcmpgtq; pand; psrlq; por. That is five instructions, only one of which could execute in parallel with the others, plus however many are needed to create the mask. In the function where I needed this, the shift was constant for the duration of the function, so creating the mask was a once-only cost. The five instructions could have a latency of 7 cycles, whereas vpsraq is 1, 4, or 8 cycles depending on the precise form used, according to Intel’s own documents about latency (pdf).

Maximum and minimum of packed signed quadwords can be emulated with pcmpgtq; pand; pandn; por and a spare register. That is four instructions (five if a memory operand is needed for the minimum), none of which can be done in parallel. The four emulating instructions could have a 6-cycle latency, whereas vpmaxsq is 3 cycles, or 10 with a memory operand.

Convert quadwords to doublewords: this now exists. AVX-512 adds many down-convert instructions for doublewords and quadwords, with truncation, signed saturation, or unsigned saturation. They are a bit like the reverse of the SSE4.1 pmovsx and pmovzx instructions (move with sign or zero extension). The min/max emulation mentioned above was a workaround for this particular limitation: I needed to pack and saturate the quadwords, so I was clipping them with min/max and then shuffling or blending the values back together.

Making good use of the new features would need a rewrite of the function, because its rather ugly logic is partly a result of the limitations of older instruction sets. A rewrite would also be needed because the older blend instructions have no EVEX-encoded form and so cannot use the 16 new registers. Because the x264asm compat layer, which Upipe and FFmpeg use, prefers the new registers, AVX-512 isn't a simple drop-in replacement here.

Op-masks

Which brings me onto op-masks. Op-masks are a feature that could see a great deal of use in code with run-time branching, conditionals, or masking. Blends can now be done with op-masks.

The EVEX encoding means instructions can now take a form like vpaddw m1 {k1}, m2, m3, in which k1 is the op-mask. k1 is one of eight dedicated op-mask registers. They are manipulated with dedicated instructions whose mnemonics begin with 'K'; see the instruction set reference in Intel’s Software Developer's Manuals. They can also be set from the result of the various compare instructions. In this example each word in m1 is only changed to the result of m2+m3 if the corresponding bit in k1 is set; otherwise it is left unchanged. The lowest word checks bit 0, up to the highest word of a YMM register, which checks bit 15.

It is similar for a move, which you can turn into a blend with an op-mask. New move instructions have been added: vmovdqu8, vmovdqu16, vmovdqu32, and vmovdqu64. With vmovdqu16 m1 {k1}, m2, each word in the destination is only changed to the source value if the corresponding bit is set. Either the destination or the source can also be a memory location, as with the older moves. This is a conditional move of packed values.

Another feature of these op-masks is the zeroing bit of the EVEX encoding. In the form vpaddw m1 {k1}{z}, m2, m3, the instruction changes each word of m1 to m2+m3 where the corresponding bit in k1 is set. Where the bit is not set, the corresponding word is set to zero. The benefit is that the result does not depend on the previous contents of m1. And if you can make use of the zero values themselves, it is useful in that way too.

These op-masks are probably the biggest reason to rewrite functions, because of the conditionals they let you use. With the op-mask registers freeing vector registers from holding masks, the new instructions freeing registers that may have been used for emulation, and the 16 additional registers, there are now more registers than I know what to do with. Most of the functions I've worked on were not short on registers, at least on x86-64. I could keep more constants in registers rather than loading them from memory, but in most cases that only gives a small speedup.

Summary

For those looking for a summary or a TL;DR of what to look at in their own code, I think you should focus on these areas:

  • Any function that stores intermediate data into memory because of register pressure.
  • Any function with conditionals, any function with a compare instruction.
  • Any function that uses quadwords, uint64_t, or int64_t data types.


2017 has certainly been a packed year, with a number of interesting projects, including an end-to-end IP, end-to-end IT broadcast, and a slew of events.

For quite some time, we have been championing the merits and feasibility of IP for contribution. Whilst many in the industry are still hanging back, unsure whether it is quite there yet, we have already proven that it is. This year, things started to take off in a big way, with many others embracing the change. This was obvious walking around the halls at IBC, and particularly noticeable at the Broadcast Tech IP Summit in October. Although the room was full of traditional broadcasters and the focus was on whether IP is feasible, it was clear that the rhetoric had shifted somewhat over recent months, even in that traditional broadcast space. Most broadcasters are looking to go IP, even if they are not quite there yet.

From our point of view, we have worked on quite a few interesting projects. This includes working with MSTV Live Broadcasting, which provides live coverage of a wide range of sports events across Europe. Uniquely, this small company is providing these feeds using a satellite service designed for newsgathering and an IP connection. Using a combination of low cost, high quality tools, MSTV is able to ensure a compelling experience for its entire, growing viewer base.

2017 also saw the completion of a long-term project we have been working on for Sky in the UK. We have, of course, been working with Sky for some time and it has been contributing a number of feeds over IP and using our encoders. However, this project went beyond all of that and saw the creation of an all IP, all IT master control facility. It really is using entirely off-the-shelf standard IT hardware. What is more, Sky is now using it on a daily basis to contribute live news coverage. As you can imagine, this was a massive project and involved a number of vendors. We already know that IP brings scalability, flexibility, and massive cost efficiencies, especially if you use standard off-the-shelf equipment. Above all, this project proves that IP is entirely possible and reliable.

All that said, we are not quite there yet, but I expect things to develop further as we move into 2018. These early examples will pave the way and I believe we will see many more developments throughout 2018. This will include more broadcasters switching to IP for contribution feeds and I think we will likely see more announcements of plans to build all IP facilities. I also believe we will start to see more and more IP OB trucks being used in the field. Current IP OB trucks have some legacy kit inside too. I believe over the coming months more of that will be replaced with IP technology.

We will in particular see some interesting innovations in IP for remote production. This is where IP connectivity is good enough to backhaul all traffic over an IP network to a facility, which naturally saves a great deal on travel expenses. Following on from that, we will begin to see the emergence of interactive, personalised live events. Currently, broadcasters sending feeds over the web simply show the same thing that is on TV, only with a 60-second delay because it is on the web. However, IP opens up the opportunity to create a much more immersive experience, as broadcasters now have more than the single world feed. We will start to see these versions become much more interesting and interactive, giving consumers a real reason to watch the web versions rather than TV.

So, whilst 2017 was the year of (almost) IP with a few broadcasters dipping their toes in the water, I expect 2018 may turn out to be the year we actually begin to see a shift, in mindsets and workflows.  


London, 18th December 2017 – Open Broadcast Systems will be demonstrating its latest encoding and decoding solutions for IP contribution at CABSAT from 14th to 16th January 2018, the company’s first time exhibiting in the region.

The company’s products allow for cost-effective encoding and decoding of video for broadcast contribution. All its solutions are software-based, running as apps on standard IT hardware. This means it can deliver a high level of cost efficiencies to its customers, as well as being able to build bespoke solutions in very short timeframes.

At CABSAT, Open Broadcast Systems will demonstrate the highest density Integrated Receiver Decoder (IRD) currently on the market. It allows for 16 channels to be simultaneously decoded on a 1U chassis - potentially replacing half a rack of equipment with a single server.

Kieran Kunhya, Managing Director, Open Broadcast Systems, said: “IP is not a future wish, it is already delivering vast amounts of video content every day. Not only does switching to IP deliver cost efficiencies, but it also reduces the complexity of broadcast workflows."

Open Broadcast Systems will be exhibiting on stand ZB6-C31 from 14th – 16th January 2018.

About Open Broadcast Systems
Open Broadcast Systems is revolutionising the provision of advanced broadcast technology, moving the industry towards a flexible, cost efficient, software-driven future. Its cutting-edge and end-to-end encoding and decoding software is accelerating the delivery of premium content over IP, improving quality at the same time as reducing costs. High quality solutions developed by Open Broadcast Systems deliver services to millions of people every day, including many major sporting and breaking news events.

Its products adapt to the pressures and challenges of the modern broadcast environment: agile solutions can be developed and installed in extremely short timeframes, without compromising on quality.

For more information, please visit http://www.obe.tv

Media Contact:
Helen Weedon
Radical Moves PR
Tel: +44 1570 434632
Mob: +44 7733 231922

IBC, Amsterdam, 21st August 2017 – Open Broadcast Systems, the leader in software-based broadcast technology, will be demonstrating its latest solutions for IP contribution at IBC 2017, at stand 7.J38u.

This includes its high density Integrated Receiver Decoder (IRD) which allows for 16 channels to be simultaneously decoded on a 1U chassis - potentially replacing half a rack of equipment with a single server. It is the highest density IRD currently on the market and is completely software based, running as apps on standard IT hardware. In addition to traditional SDI, the decoder now supports decoding to uncompressed IP outputs using a new 25GbE interface. This is the only encoder/decoder pair to support 25GbE, allowing for ultra-high density deployments to handle the content explosion.

Additionally, the company will be demonstrating its new Remote Production solution. This will allow broadcasters to reduce the complexity of off-site production by sending multiple video paths back to base over a variety of networks, managed or unmanaged.

The company is also showing ultra-low latency VC-2 compression, further demonstrating that software solutions can deliver latency comparable with legacy broadcast hardware.

“We are pioneering the evolution of broadcast. We are developing IP solutions which mean anyone can deploy content quickly and in a cost-efficient manner. Open Broadcast Systems is passionate about being the driving force of innovation in the industry, and we look forward to demonstrating this at IBC,” commented Kieran Kunhya, Managing Director, Open Broadcast Systems.

Open Broadcast Systems will be exhibiting on booth 7.J38u at IBC, from the 15th to 19th September.

Open Broadcast Systems has also been shortlisted for the CSI Awards in the Best Cable or Satellite IP Solution category. The awards are being presented on the 15th September.


Warning: like Part One of this series, these posts are very technical!

After converting FFmpeg's old MMX simple IDCT from inline assembly to external assembly (as described in Part One), I went on to look at making the IDCT faster. A naive approach is to convert the code from using the mm registers directly to using xmm registers. This can usually be done with minimal changes, paying attention only to packs, unpacks, and moves. This can make things faster on Skylake and related microarchitectures from Intel. A discussion of why is beyond the scope of this post; the point is that you can measure that functions are faster when they use xmm registers.

A lot of what we do with our C-100 and C-200 decoders involves decoding MPEG-2 video, which is still prevalent in spite of being over 25 years old. We often also use professional profiles like 4:2:2, which hardware decoders on CPUs and GPUs can't cope with. Looking at the decode process in perf shows a slow IDCT written in MMX (yes, really!), so we sent one of our engineers, James, to make it faster and modernise the code. Here’s how he did it in his own words (be warned, it is about to get very technical!):

London, 8th June 2017 – Open Broadcast Systems, an advanced broadcast technology visionary, has won the Rising Star Award at the TVB Awards 2017. The awards were held at the Millennium Mayfair Hotel, London on the 7th June.

The Rising Star accolade is awarded to any organisation or individual making a tangible difference within the industry and displaying standout values worthy of recognition. Recipients of the award must embody at least one of the following descriptions: A game-changer, disruptive force for good, problem solver, a bright young thing or an excellent idea/welcome addition to the broadcast fold. 

Open Broadcast Systems’ encoders and decoders have enabled the successful contribution and production of a number of broadcasters over IP. Recent projects include the delivery of live television broadcasts over the public internet for BBC Scotland and also other broadcasters. All products from Open Broadcast Systems run on standard ‘off-the-shelf’ IT hardware, something which is almost unheard of within the broadcast industry, yet drastically reduces costs and time-to-market, without reducing quality.

“It’s a huge honour to have been presented with the Rising Star award and acknowledged as a disruptive force within the industry, enabling our customers to deliver broadcast quality using IT,” commented Kieran Kunhya, Founder and Managing Director, Open Broadcast Systems. “The biggest barrier for our kind of solutions is the industry mindset and we look forward to continuing to change that.”

London, 2017 – Open Broadcast Systems today announced that its solutions have delivered more than fifty horse races over IP.

Globecast works with a major horse racing operator to transmit race feeds directly to Globecast’s MCR in Paris, from where they are distributed to consumers.

Due to an increase in demand from racecourses situated outside of France, Globecast worked to identify a solution which did not involve the need to manually install dedicated broadcast infrastructure.

Open Broadcast Systems’ French Partner Ekla Ingenierie was chosen by Globecast to provide over-the-internet solutions to transmit and distribute race feeds from numerous locations, both cost-effectively and swiftly. A real-time web monitoring platform for advanced IP network MPEG TS monitoring was also developed by Ekla Ingenierie.

“Since November of last year, three racecourses in Spain, the Netherlands and Austria have transmitted more than fifty races over IP completely error free, solidifying our belief that IP was the perfect solution to our problem,” commented Patrick Lorent, Ad hoc project manager, Globecast.

Open Broadcast Systems’ French Partner Ekla Ingenierie’s cost-effective solutions are enabling Globecast to continue to grow and provide good quality feeds to its customers through new contribution broadcast technology.

“These races attract over 3.3 million television viewers every month, so we’re thrilled to be proving that IP is a workable solution even for premium broadcasts”, explained Kieran Kunhya, Founder, Open Broadcast Systems.  


About Globecast

Part of the Orange Group, Globecast provides agile and seamless content acquisition, management and distribution services globally. The company constantly innovates in an evolving IP-centric environment to provide reliable and secure customer solutions.  Globecast has created the number one global hybrid fiber and satellite network for video contribution and distribution. This network enables multiplatform delivery including TV Everywhere OTT, Satellite, cable, Video on demand, CDN delivery as well as cloud-enabled media services. The company remains the trusted partner for coverage and international delivery of news, sports, and special events around the globe. Customers enjoy a seamless global experience on the ground from 12 interconnected Globecast owned facilities, including Los Angeles, London, Singapore, Paris, Rome, and Johannesburg. www.globecast.com 

Globecast Press contact:

Bazeli Mbo
Tel: +33 1 5595 2604
http://globecast.com 


The advent of low-cost, long-range wireless links such as those from Ubiquiti Networks has revolutionised internet connectivity for many, especially in remote areas. This case study will look at a 37 kilometre (23 mile) high data rate wireless link using the OBE C-100 Encoder and Decoder platform to deliver broadcast quality bidirectional video over IP.

Weeks 7 and 8 have been merged (again) owing to various trips abroad. One highlight was being able to visit the Mobile TV Group UHD/HDR truck. This truck was covering basketball for FOX Sports, and we learnt how they are testing HDR for various broadcasters. We also show how UHD streams were managed, and the challenges of cable overload and of managing 2SI (two-sample interleave) versus quadrants.


In Week 8 we started testing SMPTE 2022-6 with our colleagues @skynewstech:


We’re learning a lot about how to deploy software 2022-6 streams ourselves in a multivendor environment. More on this at our NAB BEITC speech "Don’t Just Go IP, Go IT".
