I've recently been tasked with finding a live video solution for an industrial device. In my case, I want to display video from a camera on a local LCD and simultaneously allow it to be live streamed over the web. By web, I mean that the most likely location of the client is on the same LAN, but this is not guaranteed. I figured this has to be a completely solved problem by now.
Anyway, so I've tried many of the recent protocols. I was really hoping that HLS would work, because it's so simple. For example, I can use the gstreamer "hlssink" to generate the files and basically deliver video with a one-line shell script and any webserver. But the 7-second best-case latency is unacceptable. I really want 1 second or better.
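For the curious, the delivery side really is about one line; roughly this (a sketch, with the device path and encoder settings made up for illustration):

    # V4L2 camera -> H.264 -> MPEG-TS segments + playlist, served by any webserver
    gst-launch-1.0 v4l2src device=/dev/video0 ! videoconvert \
        ! x264enc tune=zerolatency key-int-max=30 bitrate=2000 \
        ! mpegtsmux \
        ! hlssink location=/var/www/hls/segment%05d.ts \
                  playlist-location=/var/www/hls/playlist.m3u8 \
                  target-duration=2 max-files=6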
I looked at MPEG-DASH: it seems equivalent to HLS. Why would I use it when all of the MPEG-DASH examples fall back on HLS?
I looked at WebRTC, but I'm too nervous to build a product around the few sample client/server code bases I can find on GitHub. They are not fully baked, and then I'm really depending on a non-standard solution.
I looked at Flash: but of course it's not desirable to use it these days.
So the solution that works for me happens to be the oldest: Motion JPEG, where I have to give up on good video compression (MPEG). I get below 1 second of latency with no coding (using ffmpeg + ffserver). Luckily Internet Explorer is dead enough that I don't have to worry about its lack of support for it. It works everywhere else, including Microsoft Edge. MJPEG is not great in that the latency can be higher if the client can't keep up; I think WebRTC is likely better there.
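For reference, the no-coding MJPEG setup looks roughly like this (a sketch from memory; the ffserver.conf directives and paths are illustrative, and note ffserver was dropped in FFmpeg 4.0):

    # serve multipart JPEG over HTTP; browsers render it natively in an <img> tag
    cat > /etc/ffserver.conf <<'EOF'
    HTTPPort 8090
    <Feed cam.ffm>
        File /tmp/cam.ffm
    </Feed>
    <Stream cam.mjpg>
        Feed cam.ffm
        Format mpjpeg
        VideoFrameRate 15
        VideoSize 640x480
        NoAudio
    </Stream>
    EOF
    ffserver &
    # feed the camera into ffserver; clients then fetch http://host:8090/cam.mjpg
    ffmpeg -f v4l2 -i /dev/video0 http://localhost:8090/cam.ffm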
Conclusion: here we are in 2019 and the best low latency video delivery protocol is from the mid-90s. It's nuts. I'm open to suggestions in case I've missed anything.
A fairly long time ago (3-4 years) I was tasked with doing something fairly similar (though running on Android as the end client). HLS was one of the better options but came at the same costs you describe here. However, it was fairly easy to reduce the block size to favor responsiveness over resilience. Essentially you trade buffer size and bitrate-switching quality for more precise scrolling through the video and faster start times.
I had to hack it quite severely to get fast loading with fair resilience for my use case, as the devices are restricted in performance and can have fairly low bandwidth. Since you're looking at a relatively fast connection, simply reducing the chunk size should get you to the target.
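Concretely, the tuning is just the segment length and playlist depth. A rough example with ffmpeg's HLS muxer (the camera URL and numbers are illustrative):

    # 1-second segments, short playlist, keyframe interval matching the segment length
    ffmpeg -rtsp_transport tcp -i rtsp://camera.local/stream \
        -c:v libx264 -tune zerolatency -g 15 -r 15 \
        -f hls -hls_time 1 -hls_list_size 3 \
        -hls_flags delete_segments /var/www/hls/playlist.m3u8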
As a follow up - I've spent a couple years working on a video product based on WebRTC. This either works for a PoC where you just hack things together or on a large scale where you have time and resources to fight odd bugs and work through a spectrum of logistical hoops in setting it up. So unless you plan to have a large-ish deployment with people taking care of it I would stick to HLS or other simpler protocols.
> I looked at Flash: but of course it's not desirable to use it these days.
RTMP protocol has a lot of implementations and is still widely used for the backend part of transmitting video at a low latency (i.e. from the recorder to the server).
RTSP with or without interleaved stream is another option.
DASH/HLS is a solution for worldwide CDN delivery and browser based rendering. Poorly suited for low latency.
If you need low latency and browser based rendering you need something custom.
You can also consider tunneling over WebSockets. It's a lot easier than WebRTC, especially since you don't need the handshaking nonsense, which often requires self-hosting STUN and TURN servers if you don't want to rely on third parties. IIRC the performance of WebSockets is good enough for companies like Zoom: https://webrtchacks.com/zoom-avoids-using-webrtc/
Some VNC services like noVNC and Xpra also use WebSockets.
You should probably try Mixer. They rolled their own low latency protocol. It uses a WebSocket as a bidirectional channel to let the server push whatever it wants to the client directly, achieving sub-second delay. (The model here looks more like WebRTC than HLS, though.)
I have no idea what the underlying tech is, but Steam Link can do extremely low latency on the same network and very low latency over the internet. It can also stream non-game applications, though I imagine automating Steam is a nightmare.
My friends and I have our own little streaming website and manage to get 1-2 seconds of delay. It's nothing fancy: NGINX with the RTMP plugin, which receives the streams and just passes them through; once we added encoding we had a noticeable delay. This is Flash tech that can be served as HTML5 now, but I didn't see it in your list, so perhaps you haven't looked at it.
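The pass-through part is only a few lines of config. Roughly (a sketch; the module is github.com/arut/nginx-rtmp-module and the paths/names are placeholders):

    # nginx.conf fragment: accept RTMP publishes and relay them untouched
    cat >> /etc/nginx/nginx.conf <<'EOF'
    rtmp {
        server {
            listen 1935;
            application live {
                live on;    # relay incoming streams as-is; re-encoding is what added our delay
            }
        }
    }
    EOF
    # publish: ffmpeg -re -i input.mp4 -c copy -f flv rtmp://server/live/streamkey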
Similar situation here, ended up with the same solution, after an initial attempt with HLS. jsmpeg (https://github.com/phoboslab/jsmpeg) made it pretty easy.
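The usual jsmpeg recipe, roughly (the relay script ships with the jsmpeg repo; the password and ports here are placeholders):

    # relay: accepts an MPEG-TS push on 8081, fans it out to websocket clients on 8082
    node websocket-relay.js supersecret 8081 8082 &
    # jsmpeg decodes MPEG1 in JS, so encode to mpeg1video and push to the relay
    ffmpeg -f v4l2 -i /dev/video0 -f mpegts \
        -codec:v mpeg1video -b:v 1000k -r 30 -an \
        http://localhost:8081/supersecret
    # browser side: new JSMpeg.Player('ws://yourhost:8082') rendering to a <canvas>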
Try streaming TS packets over WebSockets and decoding with FFmpeg compiled to WASM in the browser. I wrote https://github.com/colek42/streamingDemo a couple of years back, and despite the hacky code it worked really well. You could probably do much better today.
We recently completed a project with similar requirements. We ended up using RTSP from the camera and packing it up in WebSockets using ffmpeg. We had sub-second latency. The camera gave us H.264, so we could just repack that.
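The essence of the repack is a remux with no re-encode, roughly (the camera URL and port are placeholders for whatever relay fans the TS out to WebSocket clients):

    # pull H.264 over RTSP, copy the video bitstream, re-wrap it as MPEG-TS
    ffmpeg -rtsp_transport tcp -i rtsp://camera.local/stream1 \
        -c:v copy -an -f mpegts http://localhost:8081/stream
    # no transcode step means essentially zero added encoding latency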
We're giving a talk about the project at the MonteVIDEO Tech meetup, though it will be in Spanish. Contact me if you want to discuss it further.
https://github.com/ant-media/Ant-Media-Server
The major criticism the author has is the requirement for HTTP2 push for ALHLS, which many CDNs don't support. While I agree it is a valid criticism, I am glad Apple is forcing the CDNs to support push. Without the 800lb gorilla pushing everyone to upgrade, we would still be using pens on touchscreens.
I am not a fan when Apple obsoletes features that people love and use. But I always support when Apple forces everyone to upgrade because friction from existing providers is what keeps things slow and old. Once Apple requires X, everyone just sighs and updates their code, and 12mo later, we are better off for it.
That being said, I agree with the author's disappointment that Apple mostly ignored LHLS instead of building upon it. Chunked encoding does sound better.
There are good reasons CDNs don't support http/2 push. It’s hard to load balance and hard to operate, since it requires a persistent TCP connection with the client for the entire duration of the stream, which can be hours. It has consequences that echo almost everywhere in the architecture.
What exactly is the benefit of HTTP2 for HLS CDN use, particularly?
The obvious benefit of not using it is that you don't need your CDN to do TLS, likely to be utterly superfluous if video chunks are validated through the secure side-channel of the playlist already.
The main redeeming feature of traditional HLS is that it can use ordinary HTTP CDN infrastructure. If you're going to require video-streaming-specific functionality in CDNs anyway there is absolutely no justification for designing your protocol in this horrendously convoluted, inefficient, poorly-performing way.
It's ironic that "live streaming" has gotten worse since it was invented in the 1930s. Up until TV went digital, the delay on analog TV was just the speed-of-light transmission time plus a little bit for broadcasting equipment. It was so small it was imperceptible. If you had a portable TV at the live event, you just heard a slight echo.
Now the best we can do is over 1 second, and closer to 3 seconds for something like satellite TV, where everything is in control of the broadcaster from end to end.
I suppose this is the tradeoff we make for using more generalized equipment that has much broader worldwide access than analog TV.
Unless your content operates in a very small niche, "real time" is far less important than continuity.
In rough order of preference for the consumer:
1) It starts fast
2) It never stops playing
3) It looks colourful
4) Good quality sound
5) Good quality picture
10) Latency
One of the main reasons why "live" broadcast over digital TV has a stock latency of >1 second is FEC (forward error correction). This allows a continuous stream of high quality over a noisy transport mechanism. (Yes, there are also the local operating rules for indecent behaviour, and switch and effects delays, which account for 10 seconds and >250ms respectively.)
For IPTV it's buffering. Having a stuttering stream will cause your consumers to switch off/go elsewhere. One of the reasons why RealPlayer held on for so long was that it was the only system that could dynamically switch bitrates seamlessly and reliably.
There is a reason why Netflix et al. start off with a low quality stream and then switch out to HD 30 seconds in: it's that people want to watch it now, with no interruption. They have millions of data points to back that up.
Google seems to think they can implement video gaming over IP. And they probably can, my ping to them is only 9ms, less than a frame.
There is just a broad lack of interest in reducing latency past a certain point unless there is a business reason for it. People don't notice 1 second of latency.
It's not surprising if you think about how our ability to store video has changed over the years. The delay on analog TV is so low because the picture data had to go straight from the camera to the screen with basically no buffering since it was infeasible to store that much data. (PAL televisions buffered the previous scanline in an analog delay line for colour decoding purposes, but that was pretty much cutting edge at the time.) Now that we can buffer multiple frames cheaply, that makes it feasible to compress video and transmit it without the kind of dedicated, high-bandwidth, low-latency links required in the analog days. Which in turn makes it possible to choose from more than a handful of channels.
Some delay from many producers is almost certainly intentional. Live content providers want to be able to have a second to cut a stream if something unexpected (profanity, nudity, injury...) occurs on set.
Analog TV is also massively less spectrum efficient. You can fit 4+ digital channels in the same spectrum as one analog TV channel.
And don't forget how low and inconsistent the quality of analog TV was compared to what we can broadcast digitally.
The real story here is that latency isn't actually important to live TV, so it's a no-brainer trade-off to make. If you look at other transmission technologies where latency is more important, like cellular data transmission, latency has only decreased over the years.
Mixer can do about .2 seconds.
Thanks, that's exactly how I felt: that there's a really good and useful article in here, but clouded by assumptions and an attempt to create controversy.
The technical writeup of this post is spot-on, though. I prefer less drama with my bias, but I'm very glad I read this.
> A Partial Segment must be completely available for download at the full speed of the link to the client at the time it is added to the playlist.
So with this, you cannot have a manifest file that points to future chunks (e.g. for up to the next 24 hours of a live stream) and delay processing of the HTTP request until the chunk becomes available, like HTTP long polling used for chunks.
> On the surface, LHLS maintains the traditional HLS paradigm, polling for playlist updates, and then grabbing segments, however, because of the ability to stream a segment back as it's being encoded, you actually don’t have to reload the playlist that often, while in ALHLS, you’ll still be polling the playlist many times a second looking for new parts to be available, even if they’re then pushed to you off the back of the manifest request.
Which could be avoided if Apple didn't enforce the availability of the download "at the full speed" once it appears in the manifest (long polling of chunks).
LHLS doesn't have this issue, as the manifest file itself is streamed with chunked responses, hence it makes sense (streaming the manifest file).
> For the time being at least, you’ll have to get your application (and thus your low latency implementation) tested by Apple to get into the app store, signaled by using a special identifier in your application’s manifest.
And this makes me think about the implementability of the first and second points of ALHLS. Maybe the current "implementation" is compatible, but not with the spec itself.
> Maybe the current "implementation" is compatible, but not with the spec itself.
It's perhaps worth noting that this is a "preliminary specification" and an extension of HLS. HLS itself is an IETF standard (well - an "Internet Draft"): https://tools.ietf.org/html/draft-pantos-http-live-streaming...
> measuring the performance of a blocking playlist fetch along with a segment load doesn’t give you an accurate measurement, and you can’t use your playlist download performance as a proxy.
I don’t see why this would be the case. If you measure from the time the last bit of the playlist is returned to the time the last bit of the video segment is pushed to the client, you’ll be able to estimate bandwidth accurately.
> from the time the last bit of the playlist is returned to the time the last bit of the video segment
Based on my loose understanding of HTTP/2 server push and ALHLS, the sequence of events will be:
1. Client requests playlist for future media segment/"Part"
2. Server blocks (does not send response) until the segment is available
3. Server sends the playlist ("manifest") as the response body along with a push promise for the segment itself
The push then begins with the segment.
The push stream can presumably occur concurrently with the response body stream. So I don't think you can wait until every bit of the playlist comes in. Likewise, you can't use the playlist bytes themselves to gauge bandwidth, because the server imposes latency by blocking.
As usual, Apple pushes NIH instead of supporting DASH, which is the common standard. And they also tried to sabotage adoption of the latter by refusing to support MSE on the client side, which is needed for handling DASH.
> As usual, Apple pushes NIH, instead of supporting DASH which is the common standard.
I mean... HLS predates DASH. It would've been hard for them to support a common standard which didn't even exist at the time. Initial release of HLS was in 2009[0], work started on DASH in 2010[1].
I'd also disagree with the characterization of DASH as "the common standard" - it's certainly a legitimate standard, but I feel like support for HLS is more ubiquitous than support for DASH (please correct me if I'm wrong).
[0] https://en.wikipedia.org/wiki/HTTP_Live_Streaming
[1] https://en.wikipedia.org/wiki/Dynamic_Adaptive_Streaming_ove...