lifis | 19 days ago

Seems like the classic overengineered legacy thing that costs 100x what it should because it's a niche system, is 10x more complex than needed due to unnecessary perfectionism, and uses 10-100x more people than needed due to employment inertia.

A more reasonable thing is to just use high quality cameras, connect to the venue fiber Internet connection, use normal networked transport like H.265 with MPEG-TS over RTP (sports fans certainly don't care about recompression quality loss...), do time sync by having A/V sync and good clocks on each device and aligning based on audio loud enough to be recorded by all devices, then mix, reencode and distribute on normal GPU-equipped datacenter servers using GPU acceleration.

pjc50|19 days ago

The sort of systems which demand 100% reliability tend to be like that. "Disruption" in the middle of live sports broadcast is unpopular with customers.

TD-Linux|19 days ago

While I think you are oversimplifying the timing issue, you are not the first to think that about 2110.

https://stop2110.org/

geerlingguy|19 days ago

The engineer on the truck seemed to be most annoyed by the PTP aspect of 2110, but it seemed nobody questioned the move to 2110, and at least as far as broadcast equipment goes, they're all in on 2110. As a small(ish) YouTuber, NDI is more exciting to me, but I'm not mixing dozens or hundreds of sources for a real-time production, and can just re-record if I get a sync issue over the network.

Perfect is the enemy of the good, as always. Reading through that site, it seems like no solution is perfect, and the main tradeoff from that author's perspective is bandwidth requirements for UHD.

It looks like most places are only hitting 1080p still, however. And the truck I was looking at could do 1080, but runs the NHL games at 720p.

jacquesm|19 days ago

Sounds like you've got it made then: produce the equivalent that fits in a minivan and laugh all the way to the bank.

rezonant|19 days ago

We're going to need a lot of popcorn to keep us eating as we wait

_kb|18 days ago

That's certainly true to an extent. Other commenters have already highlighted necessary complexities. There are absolutely a lot of very entrenched "ways-of-working" that add unnecessary complexity, as in every domain. Not everything is a technical problem though, and the social / process side of this sort of setup is what can make it work at all.

The approach that you're hinting at mostly describes the general direction of remote production (https://video.matrox.com/en/media/guides-articles/what-is-re...). The big traditional players are already across that (https://www.grassvalley.com/ampp/, https://www.rossvideo.com/use-cases/remote-production/), AWS also has a plethora of services to lock you into their stack (https://aws.amazon.com/media-services/), and there are interesting new players too (https://www.tryiris.ai). There's a heap of different workflows out there, and OB trucks like the one highlighted here are just one of those.

amluto|19 days ago

> do time sync by having A/V sync and good clocks on each device and aligning based on audio loud enough to be recorded by all devices

Why do you need good clocks? For audio, even with simultaneously playing speakers, you only need to synchronize within a couple of ms unless you need coherence or are a serious audiophile. If you want to maintain sync for an hour I suppose you need a decently good clock.
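To put rough numbers on "decently good", here is a minimal sketch of how far a free-running clock wanders over an hour. The 50 ppm figure is an assumed example of a cheap crystal oscillator, and the function names are made up for illustration.

```python
def drift_ms(ppm: float, seconds: float) -> float:
    """Worst-case offset (ms) accumulated by a clock with the given frequency error."""
    return ppm * 1e-6 * seconds * 1e3


def max_ppm(budget_ms: float, seconds: float) -> float:
    """Largest frequency error (ppm) that stays inside budget_ms over the interval."""
    return budget_ms * 1e-3 / seconds * 1e6


if __name__ == "__main__":
    hour = 3600.0
    # Assumed example: an uncorrected 50 ppm oscillator
    print(f"50 ppm over 1 h drifts {drift_ms(50, hour):.0f} ms")      # 180 ms
    # Tolerance needed to hold a 2 ms budget for an hour with no resync
    print(f"2 ms over 1 h needs <= {max_ppm(2, hour):.2f} ppm")       # 0.56 ppm
```

Under those assumptions, "a couple of ms for an hour" without any resynchronization already asks for a sub-ppm oscillator, which is why some form of ongoing sync is usually cheaper than a better crystal.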

But as long as you have any sort of wire, basically any protocol can synchronize well enough. Although synchronizing based on visual and audible sources is certainly an interesting idea. (Audio only is a complete nonstarter for a sporting event: the speed of sound is low and the venues are large. You could easily miss by hundreds of ms.)
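The "hundreds of ms" figure follows directly from the speed of sound. A quick sketch, assuming roughly 343 m/s (dry air at about 20 °C) and some illustrative distances across a venue:

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed: dry air at roughly 20 degrees C


def acoustic_delay_ms(distance_m: float) -> float:
    """Time (ms) for sound to travel distance_m through air."""
    return distance_m / SPEED_OF_SOUND_M_S * 1e3


if __name__ == "__main__":
    # Distances are illustrative, not measurements of any real venue
    for d in (10, 50, 100):
        print(f"{d:>4} m -> {acoustic_delay_ms(d):6.1f} ms")
```

At 100 m the acoustic delay is about 292 ms, which is many video frames at any common frame rate.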

> then mix, reencode and distribute on normal GPU-equipped datacenter servers using GPU acceleration

Really? Even ignoring latency, we’re talking quite a few Gbps sustained. A hiccup would suck, and if you’re not careful, you could easily spend multiple millions of dollars per day in egress and data handling fees if you use a big cloud. Just use a handful of on-site commodity machines.
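For a sense of scale, a sustained bitrate can be converted to a daily data volume. This is only a sketch: the 10 Gbps figure is a hypothetical aggregate rate, and no pricing is modelled since fees vary by provider and contract.

```python
def terabytes_per_day(gbps: float) -> float:
    """Data volume in decimal terabytes moved in 24 h at a sustained rate in Gbps."""
    bits_per_day = gbps * 1e9 * 86_400
    return bits_per_day / 8 / 1e12


if __name__ == "__main__":
    # Hypothetical aggregate contribution rate
    print(f"10 Gbps sustained = {terabytes_per_day(10):.0f} TB/day")  # 108 TB/day
```

Multiplying the result by a per-GB transfer price gives a rough idea of the egress bill for a given provider.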

pjc50|19 days ago

Frame sync. In order to reduce latency, these systems tend to be unbuffered, which means that the frames have to arrive at a very specific time, and you can't afford significant jitter or (worse) phase drift. If you have one source at 25.000FPS and one at 25.001FPS eventually you're going to be a frame out between them.
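The 25.000 vs 25.001 FPS example can be worked out directly: the two sources slip apart by the rate difference in frames per second, so a full frame of error takes the reciprocal of that difference. A minimal sketch (function name is illustrative):

```python
def seconds_until_one_frame_slip(fps_a: float, fps_b: float) -> float:
    """Seconds until two free-running sources are a whole frame apart."""
    delta = abs(fps_a - fps_b)
    if delta == 0:
        return float("inf")  # identical rates never slip
    return 1.0 / delta


if __name__ == "__main__":
    t = seconds_until_one_frame_slip(25.000, 25.001)
    print(f"one frame out after about {t:.0f} s ({t / 60:.1f} min)")
```

That is roughly 1000 seconds, or under 17 minutes, for a rate mismatch of only 40 ppm.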

rezonant|19 days ago

> Why do you need good clocks? For audio, even with simultaneously playing speakers, you only need to synchronize within a couple of ms unless you need coherence or are a serious audiophile. If you want to maintain sync for an hour I suppose you need a decently good clock.

There are many microphones involved in a production, and humans are quite good at detecting desync between audio/video when watching a presenter talk. You cannot fix desynchronization further down the chain if the desynchronization is variable for each source.

lukeh|18 days ago

You also need synchronization to mix sources (common in any production) without incurring the latency and resampling of asynchronous sample rate conversion.

rezonant|19 days ago

As someone who's spent a lot of time in this space and is quite interested in lowering the cost of entry and finding ways to simplify it, I'm afraid you've vastly oversimplified the problem.

> sports fans certainly don't care about recompression quality loss...

I think that's quite an assumption. In a modern video chain you'd need to decompress and recompress the video from a camera many, many times on the way to distribution. Every filter or combining element would need onboard decoding and encoding, which would introduce significant latency, make it very difficult to maintain quality, and require even more energy than the systems we already deploy.

High quality cameras aren't any good if they throw away their quality at the source before they have an opportunity to be mixed in with the rest of the contribution elements. You certainly wouldn't compress the camera feeds down to what you'd expect to see on a consumer video feed (about 20Mbps for 4K on HEVC).
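To illustrate the gap between a contribution feed and a consumer feed, here is a rough sketch comparing uncompressed 2160p60 against the 20 Mbps figure above. It assumes 10-bit 4:2:2 sampling (20 bits per pixel on average) and counts active pixels only, ignoring blanking and packet overhead.

```python
def uncompressed_bps(width: int, height: int, fps: float, bits_per_pixel: int) -> float:
    """Active-picture bitrate in bits per second (no blanking or packet overhead)."""
    return width * height * fps * bits_per_pixel


if __name__ == "__main__":
    # Assumed format: 3840x2160 at 60 fps, 10-bit 4:2:2 -> 20 bits per pixel
    raw = uncompressed_bps(3840, 2160, 60, 20)
    consumer = 20e6  # about 20 Mbps, the consumer HEVC figure quoted above
    print(f"uncompressed: {raw / 1e9:.2f} Gbps")         # 9.95 Gbps
    print(f"ratio to consumer feed: {raw / consumer:.0f}:1")  # 498:1
```

Under these assumptions the consumer feed is carrying roughly one five-hundredth of the source data, which is fine for final delivery but leaves little headroom for further processing.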

> normal networked transport like H.265 with MPEG-TS over RTP

If you want to, you can do that already using SMPTE ST 2110-22 which loops in the RTP payload standards defined by the IETF. ST 2110 itself is already using RTP as its core protocol by the way (for everything).

> do time sync by having A/V sync

What do you mean by this? In order to synchronize multiple elements you need a common source of time. Having "good clocks" on each device is not enough: they need to be synchronized to the level that audio matches up correctly, which is much more precise than video, as audio uses sample frequencies in the 48 kHz-96 kHz range, whereas video of course is typically just 60 Hz. Each clock needs a way to _become_ good by aligning itself to some global standard. If you don't have a master clock like PTP, your options are... what... GPS? I mean you _could_ equip each device with its own GPS receiver, but if the cameras can't get a reliable GPS lock then you're out of luck.
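The difference in precision is easy to quantify by comparing the period of one audio sample with the period of one video frame. A small sketch using the rates mentioned above:

```python
def period_us(rate_hz: float) -> float:
    """Duration of one sample or frame in microseconds."""
    return 1e6 / rate_hz


def samples_per_frame(sample_rate_hz: float, frame_rate_hz: float) -> float:
    """How many audio samples elapse during one video frame."""
    return sample_rate_hz / frame_rate_hz


if __name__ == "__main__":
    print(f"48 kHz sample: {period_us(48_000):9.2f} us")   # 20.83 us
    print(f"96 kHz sample: {period_us(96_000):9.2f} us")   # 10.42 us
    print(f"60 Hz frame:   {period_us(60):9.2f} us")       # 16666.67 us
    print(f"samples per frame at 48 kHz: {samples_per_frame(48_000, 60):.0f}")  # 800
```

Aligning to within one 48 kHz sample means holding offsets to about 21 µs, roughly 800 times tighter than aligning to within one 60 Hz frame.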

> aligning based on audio loud enough to be recorded by all devices

Do you mean physically? Like actual audio being emitted into the space where the devices are? Because some of the devices will be in the stadium, where there's very, very loud noise on account of the crowd, and some of them will be in the back room where that audio is not audible. Then you need to factor in the speed of sound, which is absolutely significant in a stadium or other large venue. None of this is particularly practical.

If you mean an audio signal that is sent to each device over a cable, well, are we talking SDI (copper)? If so, we wouldn't use audio for that, we would use what's called Black Burst. But what generates the black burst? Typically, it's the grandmaster clock. The black bursts on SDI need to be very precise, and that requires a dedicated piece of real-time hardware.

If you mean sending it over Ethernet, you now need to ensure you factor in the routing delays that will inevitably happen over an open unplanned network. To deal with those delays, we typically do two things. One, we use automatically planned networks, where the routers are aware of the media flows going over each link, and the topology is automatically rearranged in order to minimize or eliminate router buffering (aka software defined networks, typically using NMOS IS standards to handle discovering and accounting for the media essences).

Sesse__|19 days ago

> they need to be synchronized to the level that audio matches up correctly, which is much more precise than video, as audio uses sample frequencies in the 48 kHz-96 kHz range, whereas video of course is typically just 60 Hz

Typically video equipment expects the individual pixels to line up, save for some buffering (~1–10µs), not just the individual frame. So your synchronization requirement for video is in the gigahertz range (or about megahertz, if you take the buffering into account), not 60 Hz. (Of course, what matters is normally the absolute offset, not the frequency, but they tend to be somewhat inversely related.)
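As a rough sketch of where those orders of magnitude come from, take 1080p60 with its standard total raster of 2200 x 1125 (active picture plus blanking) and assume 10-bit 4:2:2 sampling, i.e. 20 bits per pixel. The buffer sizes are the 1-10 µs range mentioned above.

```python
def pixel_clock_hz(total_width: int, total_height: int, fps: float) -> float:
    """Pixel clock for a raster whose dimensions include blanking."""
    return total_width * total_height * fps


def buffer_equivalent_hz(buffer_us: float) -> float:
    """Frequency whose period equals the given buffer depth in microseconds."""
    return 1e6 / buffer_us


if __name__ == "__main__":
    clock = pixel_clock_hz(2200, 1125, 60)   # 1080p60 total raster
    serial_bps = clock * 20                  # assumed 10-bit 4:2:2
    print(f"pixel clock:  {clock / 1e6:.1f} MHz")        # 148.5 MHz
    print(f"pixel period: {1e9 / clock:.2f} ns")         # 6.73 ns
    print(f"serial rate:  {serial_bps / 1e9:.2f} Gbps")  # 2.97 Gbps
    for us in (1, 10):
        print(f"{us:>2} us buffer ~ {buffer_equivalent_hz(us) / 1e3:.0f} kHz")
```

The serial rate is where the gigahertz figure comes from, and a 1-10 µs buffer relaxes the requirement to somewhere between 100 kHz and 1 MHz.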