This looks to be nicely architected. Looking forward to digging in.
In answer to the questions about TURN, this approach won't be lower latency than TURN but will scale better. TURN servers are just (nearly) passive relays, so the sending client needs to set up as many outbound streams as there are receiving clients in the session.
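A back-of-the-envelope comparison of the sender's upload cost under each topology (illustrative Python; the bitrate and viewer count are assumptions, not measurements):

```python
def sender_upload_kbps(stream_kbps: int, receivers: int, topology: str) -> int:
    """Upload bandwidth the *sending* client needs.

    With TURN, the relay is (nearly) passive, so the sender transmits one
    full copy of the stream per receiver. With an SFU, the sender uploads
    a single stream and the server does the fan-out.
    """
    if topology == "turn":
        return stream_kbps * receivers
    if topology == "sfu":
        return stream_kbps
    raise ValueError(topology)

# A 1.5 Mbps video stream to 20 viewers:
print(sender_upload_kbps(1500, 20, "turn"))  # 30000 -- 30 Mbps of client upload
print(sender_upload_kbps(1500, 20, "sfu"))   # 1500  -- one stream; the server fans out
```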
The advantage to the TURN approach is that you can do end-to-end encryption.
The server this post is about "forwards" streams. It's in a class of infrastructure traditionally called a "Selective Forwarding Unit." So the sending client can send one stream, and the SFU copies the stream and sends it to the receiving clients. The server needs more CPU and outgoing bandwidth as the receiver count grows, but the sending client doesn't need much of either.
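A minimal fan-out sketch of that idea (plain Python, with in-process queues standing in for network transports; `TinySFU` is a made-up name, not any real server's API):

```python
import queue

class TinySFU:
    """One ingest stream, N subscribers; each packet is copied, not re-encoded."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self) -> queue.Queue:
        q = queue.Queue()
        self.subscribers.append(q)
        return q

    def on_packet(self, packet: bytes) -> None:
        # Selective forwarding: the sender uploads once; the server pays
        # the CPU and outgoing bandwidth for every receiver.
        for q in self.subscribers:
            q.put(packet)

sfu = TinySFU()
a, b = sfu.subscribe(), sfu.subscribe()
sfu.on_packet(b"rtp-frame-1")
print(a.get(), b.get())  # both receivers get a copy of the same packet
```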
Once you have streams piping through a server, you can do lots of other things, of course. This server can transcode to several formats, including HLS and RTMP, and can copy streams internally across a cluster to scale more than a single machine can.
Media servers are really fun to work on: lots of small(-ish) hard problems that touch low-level network protocols, memory and CPU optimization, and architecture (because eventually you want a clean plugin interface, etc.).
If you're interested in this stuff, check out Mediasoup, an open source WebRTC SFU with a very nice design in which all the low-level stuff is C++ and all the high-level interfaces are exposed as Node.js objects.[0]
The sad part about selective forwarding units (SFUs) is that you lose end-to-end encryption. You only have encryption between you and the SFU, and between the SFU and each participant. For hosted solutions this means the server could record all your conferences.
That's why I am waiting for the PERC standard. It will allow a client to send the audio/video to the server only once while the server retransmits it to all peers, with everything still end-to-end encrypted.
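Conceptually, PERC layers two encryptions: an inner end-to-end layer only the endpoints can remove, and an outer hop-by-hop layer the SFU can remove in order to route and repacketize. A toy sketch of that layering (the XOR "cipher" here is purely illustrative, not real cryptography; actual PERC deployments use double-encrypted SRTP):

```python
import hashlib

def _keystream(key: bytes, n: int) -> bytes:
    # Toy keystream (NOT real crypto): hash-chained SHA-256 blocks.
    out, block = b"", key
    while len(out) < n:
        block = hashlib.sha256(block).digest()
        out += block
    return out[:n]

def xor_cipher(key: bytes, data: bytes) -> bytes:
    # XOR is its own inverse, so the same call encrypts and decrypts.
    ks = _keystream(key, len(data))
    return bytes(a ^ b for a, b in zip(data, ks))

e2e_key = b"endpoints-only"   # shared by sender and receivers, never the SFU
hop_key = b"client-to-sfu"    # shared with the SFU for this hop

media = b"frame-payload"
inner = xor_cipher(e2e_key, media)     # endpoints encrypt the media once
on_wire = xor_cipher(hop_key, inner)   # hop layer protects the transport

at_sfu = xor_cipher(hop_key, on_wire)  # SFU strips the hop layer to route...
assert at_sfu == inner                 # ...but sees only e2e ciphertext
assert xor_cipher(e2e_key, at_sfu) == media  # only a receiver recovers media
```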
From my understanding, the current challenge with ultra-low latency is scale. Will it scale to 100, 1,000, or 100,000 concurrent users?
At the moment, everyone in the industry is working to figure out how to provide a viewer experience that is similar to, or better than, the current expectations of live streaming.
Live streaming is hard. Ultra-low latency is just as hard.
However, I'm still not sure why we need ultra-low latency. Will it improve the live experience? Maybe.
Aside from live-action sports or other significant events, do we as content consumers really need it?
I think even with TURN there's lots of chatter back and forth. Connected clients can NACK for frames and ask for a bandwidth reduction. I think the expected optimal implementation is that the group of WebRTC participants normalizes toward an optimal bitrate.
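One plausible, deliberately simplified version of that normalization is to cap the shared send bitrate at the slowest receiver's bandwidth estimate; the function name, floor, and ceiling below are made up for illustration, not taken from any particular WebRTC stack:

```python
def target_bitrate_kbps(estimates_kbps, floor=150, ceiling=2500):
    """Pick a send bitrate the whole group can sustain.

    WebRTC-style feedback (e.g. REMB / transport-cc) gives the sender a
    bandwidth estimate per receiver; without simulcast, the send rate is
    effectively capped by the slowest one.
    """
    if not estimates_kbps:
        return floor  # no feedback yet: start conservatively
    return max(floor, min(ceiling, min(estimates_kbps)))

print(target_bitrate_kbps([1800, 950, 2400]))  # 950 -- slowest receiver wins
```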
For 1->n streaming you just want a very high quality broadcaster->server link, and then presumably you want the downstream clients not to have the option to send packets back to the broadcaster. Transcoding is hard, though, so consumers might be stuck with just the source video quality. I assume this library just passes the frames on from the source (Twitch did just that through almost all of its pre-acquisition growth).
Some of my details might be a little off, but I did spend a few months trying to build something very similar to this with GStreamer. WebRTC was hard to grok.
Actually, it is not lower latency than a centralized TURN server. The difference from TURN is that Ant Media Server behaves like a peer to each connected client, rather than just relaying the video/audio as TURN does. Additionally, it can create adaptive bitrates on the server side, record streams, and control other things server-side.
As far as I'm concerned, 500 ms is normal latency (setting aside network delay -- clearly a 200 ms one-way trip from Europe to NZ will add to that).
I'd say ultra-low latency would be sub-frame, so <40 ms. Low latency spans one frame to half a second, normal up to say 1.5 s, and high latency anything beyond that.
But that's for point-to-point or multicast circuits, and ISPs don't like multicast as they can't charge for it.
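Those cut-offs map directly to a tiny classifier (assuming 25 fps, so one frame is 40 ms; the labels are this comment's, not an industry standard):

```python
def latency_class(ms: float, frame_ms: float = 40.0) -> str:
    # Thresholds from the comment above: sub-frame = ultra low,
    # one frame .. 500 ms = low, up to 1.5 s = normal, beyond = high.
    if ms < frame_ms:
        return "ultra low"
    if ms <= 500:
        return "low"
    if ms <= 1500:
        return "normal"
    return "high"

print(latency_class(25))    # ultra low
print(latency_class(300))   # low
print(latency_class(1200))  # normal
print(latency_class(7000))  # high
```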
There are also possible latency savings from having the video encoder be aware of network conditions on a per-frame basis: http://ex.camera -- but I don't think it's found its way to the mainstream.
And here I was, blissfully unaware that I'm experiencing reality 30 seconds delayed. I had no idea, other than that there's obviously some number of frames of delay needed for motion compression, plus some transit latency.
I've used Wowza (a video streaming server) for years, and this is a direct competitor. The pricing is a little higher for Wowza, but Wowza is a mature product with tons of options for web streaming. Wowza's weakness has been its support for WebRTC. I am using it, but it's not easy to stream from RTSP/RTMP to WebRTC, and it doesn't scale out for WebRTC.
It will help apps which require interaction from users, like Google Stadia's game streaming. For other use cases, I think one doesn't care even if the feed is delayed by a few seconds.
Another interesting use case someone mentioned here before is the world cup - you don't want a penalty shoot-out ruined by your neighbours cheering a few seconds before you see the goal.
A few seconds would also suck for live streaming where the streamer interacts in chat.
It looks like they support RTMP and a few other streaming methods. Maybe it's just WebRTC for ingress and then it converts to other formats. Given the advertised 500ms of latency I imagine it's not WebRTC->WebRTC.
[0] https://mediasoup.org/