I thoroughly appreciated this article as I've been building a short-form video content streaming service and the performance hasn't been what I expected.
Granted, I knew that my service needs to be able to scale at different bottlenecks, but a lot of "build your own video service!" tutorials start with:
- Build a backend, return a video file
- Build a frontend, embed the video
And that leaves a lot to be desired in terms of performance. I think the actual steps should be:
- Build a backend that consists of:
  - A video ingestion service
  - A video upload / processing service that saves the video into chunks
  - A streaming service that returns video chunks
- Build a frontend that consists of:
  - A video streaming library (built or off-the-shelf) that can play video chunks as a stream
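The chunking step above can be sketched with ffmpeg; this is a hedged sketch rather than production code, and the paths, codecs, and 4-second segment length are all illustrative assumptions:

```python
def hls_command(src: str, out_dir: str, segment_seconds: int = 4) -> list[str]:
    """Build an ffmpeg invocation that transcodes an upload into
    HLS chunks plus a playlist. All settings are illustrative."""
    return [
        "ffmpeg", "-i", src,
        "-c:v", "libx264", "-c:a", "aac",      # common web codecs
        "-hls_time", str(segment_seconds),     # target chunk duration
        "-hls_playlist_type", "vod",
        "-hls_segment_filename", f"{out_dir}/chunk_%03d.ts",
        f"{out_dir}/index.m3u8",
    ]

cmd = hls_command("upload.mp4", "out")
# A worker would run this with subprocess.run(cmd, check=True).
```

The streaming service then just serves `index.m3u8` and the `chunk_*.ts` files as static objects, and the player fetches chunks one by one.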
Edit: From the author's links, I found this website which is very informative: https://howvideo.works/
I helped work on howvideo.works, fun to see it helping people! The world of video is, I'd argue, one of those technical spaces that is extremely iceberg-y. You can get decently far using S3 + the HTML5 video tag, which I think creates a perception among some that video is just images but a little bigger, but that couldn't be further from the truth. You can really pick just about any step along the video pipeline from production to playback and go as deep for as many years as you'd like.
This is both a semi-shameless plug and probably a few levels deeper than what you're looking for, but I organize a conference for video developers called Demuxed. The YouTube channel[1] has 8 years worth of conference videos about streaming video (and the 9th year is happening in a couple of weeks). The bullet points you mentioned are definitely covered across a few talks, but it's certainly not in any kind of "how to" format.
[1]: https://youtube.com/demuxed
I'm the writer of the article; thanks for your lovely comment. I skipped many essential parts of the architecture in the article to keep it concise. The following articles will be about the technical implementation of what I discussed in this one.
I've been using commercial streaming services for in-app video (Cloudflare, Bunny, Vimeo), and found performance & bandwidth use terrible. The HLS protocol on iOS doesn't work well for 5-10 second clips, since it adds 1-4 seconds of startup latency. Now using compressed MP4 with progressive loading. Way better.
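For what it's worth, progressive MP4 playback usually hinges on the moov atom (the file's index) sitting at the front of the file so playback can start before the download finishes; if ffmpeg is already in the pipeline, that's a cheap re-mux. A sketch, with placeholder filenames:

```python
def faststart_command(src: str, dst: str) -> list[str]:
    """Re-mux an MP4 so the moov atom comes first, letting playback
    start before the download completes. No re-encode involved."""
    return [
        "ffmpeg", "-i", src,
        "-c", "copy",               # copy streams, no transcode
        "-movflags", "+faststart",  # move the moov atom to the front
        dst,
    ]

cmd = faststart_command("clip.mp4", "clip-progressive.mp4")
```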
> Video Upload / Processing Service that saves the video into chunks
At this point you also need to choose what streaming protocol you want to use. You have mostly two choices: HLS if you want to get things done quickly, or MPEG-DASH if you want more control (but you'd need a separate HLS pipeline for iOS anyway…)
> Build or use a video streaming library that can play video chunks as a stream
As someone who's worked on a web streaming player, I'd strongly recommend not to build one but to use an existing one (or, in short: use HLS.js)
This is nice if you only have to deliver in one format, but as soon as you want to show up on TVs you are stuck delivering in a lot of formats, and life gets complicated quickly.
Throw subtitles in multiple languages, and different audio tracks, into the mix, and all of a sudden streaming video becomes a nightmare.
Finally, if you are dealing with copyrighted materials, you have to be aware as to what country your user is physically residing in while accessing the videos, as you likely don't have a license to stream all your videos in every country all at once.
Throw this all into a blender and what is needed is a very fancy asset catalog management system, and that part right there ends up being annoyingly complicated.
Oh, this is just the tip of the iceberg. Many parts of on-demand video streaming are largely commoditized at this point. Add in support for linear (live) streaming and ad insertion and things start to get really interesting. :)
The system I worked on differed in that it was not user-generated video content; it was coming from the cameras in our fitness studio. Here is the article if anyone is interested in reading about it: https://dev.to/dvliman/building-a-live-streaming-app-in-cloj...
OP mentions that "I would love to be a little mouse and peek at YouTube’s complete architecture to see how far we are from them." You can occasionally find posts, often linked here, from another player in streaming video which you might have heard of, discussing technical architecture. For example, this might be a little lower level than you're interested in, as it relates to kernel optimizations to jack up bit throughput rates, but I dig this sort of thing:
https://www.youtube.com/watch?v=36qZYL5RlgY
For general video streaming, Mux.com has greatly decreased my development time. Getting playback working is straightforward. And for advanced use cases, like real time editing and preview in a web browser, it works as expected and doesn’t get in the way.
Fuck K8. You literally don't need it. Maybe he needs it because he's building on google cloud.
AWS is easier, but you can do it with anything. The basic steps are:
1. Upload the file somewhere
2. Transcode it
3. Put the parts somewhere
4. Serve the parts
You should really transcode everything into HLS. It's 2023, and everything that matters supports it. If you want 4k you can use HLS or the other thing (which I keep forgetting the acronym for).
If you want to get fancy you can do rendition audio, which not everything supports. Rendition audio means sharing one audio stream amongst N number of video streams.
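In HLS terms, rendition audio is an `EXT-X-MEDIA` audio group that every video variant references, so one audio stream serves all resolutions. A hedged sketch of such a master playlist (URIs and bandwidths are made up):

```
#EXTM3U
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="aud",NAME="English",DEFAULT=YES,URI="audio/en.m3u8"
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360,AUDIO="aud"
video/360p.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720,AUDIO="aud"
video/720p.m3u8
```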
You can use FFMPEG to transcode, but I'd suggest using AWS MediaConvert. It's cheap, fast, and probably does everything you want. Using FFmpeg directly works, but why bother. You will get an option wrong and screw everything up. You don't want your video to not work on some random device that 50k people are using in some country you didn't think about.
He's using RabbitMQ but you should use SQS, because SQS can trigger lambdas...which means no polling required. But use whatever queue you want.
You can kick the process off by attaching a Lambda to S3, which will start the process when the file is uploaded.
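That kickoff can be sketched as a minimal handler; the event shape is the standard S3 notification, and the transcode submission is stubbed out rather than being a real MediaConvert call:

```python
def handler(event, context=None):
    """Fired by an S3 ObjectCreated notification: pull out the upload
    location and hand it to the transcode step (stubbed here)."""
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = record["object"]["key"]
    # Real code would submit a MediaConvert job (or enqueue one) here.
    return {"bucket": bucket, "key": key}

# Minimal fake S3 event for illustration:
event = {"Records": [{"s3": {"bucket": {"name": "uploads"},
                             "object": {"key": "videos/cat.mp4"}}}]}
result = handler(event)
```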
You can kick your "availability activation" off by attaching a Lambda to the S3 output bucket.
Background: I help run a streaming service and built the backend pipeline.
This omits the entire "metadata management and analytics" side as well. That's left as an exercise for the user.
I would like to know what the costs come out to per minute of video encoded and how many outputs they're getting in order to compare this to something like Media Convert (AWS), Google Transcode or something dedicated like Mux.
For reference Google Transcoder is about $0.13 per minute of encoding (at four resolutions). Mux is at $0.032 / min, AWS Media Convert $0.0188.
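Taking those quoted per-minute rates at face value (and ignoring rendition-count differences), the spread is easy to see on, say, a 10,000-minute catalog:

```python
# Rough encode-cost comparison at the quoted per-minute rates.
rates = {"Google Transcoder": 0.13, "Mux": 0.032, "AWS MediaConvert": 0.0188}
minutes = 10_000
costs = {name: round(rate * minutes, 2) for name, rate in rates.items()}
# i.e. roughly $1300 vs $320 vs $188 for the same catalog
```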
I should note I know Mux's pricing well and use them a lot, happily. It gets a bit confusing with Google and MediaConvert because I'm not sure how these costs map to the resulting bitrate renditions that get created, and I haven't got the time for a deeper dive to get a straight apples-to-apples comparison (ignoring scale discounts).
[1] https://cloud.google.com/transcoder/pricing
[2] https://www.mux.com/pricing/video
[3] https://aws.amazon.com/mediaconvert/pricing/
Cloud services like S3 and Azure Storage were invented specifically for hosting images and video. That’s their origin story, their foundation, their very reason for being.
Similarly, cloud functions / lambda were invented for background processing of blobs. The first demos were always of resizing images!
Building out this infrastructure yourself is a little insane. Unless you’re Netflix, don’t bother. Just dump your videos into blobs.
It’s like driving to your cousin’s place, but step one is building your own highway because you couldn’t be bothered to check the map to see if one already existed.
PS: Netflix serves video from BSD running directly on bare metal because at that scale efficiency matters! If efficiency doesn’t matter that much, use blobs. Kubernetes is going to be even worse.
Depending on proprietary software such as cloud offerings for something as essential to your requirements as encoding is not sustainable, and will create technical debt, as your software/company will come to rely on the continued profitability of the cloud service.
But even if you don't use AWS, use something else. Any video encoding service will be cheaper and more reliable than k8s + your own hacked-together Docker instance. It's literally not worth it.
Do you really want to spend your time messing around with ffmpeg and setting the correct GOP values? Or trying to create your own b-frame track?
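The GOP fiddling is mostly about aligning keyframes with segment boundaries, since each HLS segment has to start on a keyframe. The arithmetic, as a hedged sketch:

```python
import math

def gop_size(fps: float, segment_seconds: float) -> int:
    """Frames between keyframes so every segment boundary lands
    exactly on a keyframe (one keyframe per segment)."""
    return math.ceil(fps * segment_seconds)

# 30 fps with 4-second segments -> a keyframe every 120 frames,
# which maps to something like ffmpeg's `-g 120 -keyint_min 120`.
```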
At some point you'll actually need what all these settings are for. Until that time you should use other services.
One thing people pointed out is minimizing egress fees.
You do that by using CloudFlare, fastly, or another CDN.
You can get bandwidth costs down to < $0.001/GB by committing to $1500/mo in bandwidth. The CDNs will pull from S3 once, then cache it forever (assuming you do it right).
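Back-of-the-envelope, the savings come from the cache hit ratio: the origin only pays egress on misses. A sketch assuming roughly $0.09/GB S3 egress and the ~$0.001/GB committed CDN rate quoted above (both prices are assumptions):

```python
def monthly_egress_cost(gb_delivered, hit_ratio, s3_per_gb=0.09, cdn_per_gb=0.001):
    """Origin (S3) egress is paid only on cache misses; the CDN rate
    applies to every byte delivered. All prices are assumptions."""
    origin = gb_delivered * (1 - hit_ratio) * s3_per_gb
    edge = gb_delivered * cdn_per_gb
    return round(origin + edge, 2)

# 100 TB/month at a 99% hit ratio vs. serving straight from S3:
with_cdn = monthly_egress_cost(100_000, 0.99)
s3_only = round(100_000 * 0.09, 2)
```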
serverless on aws is underrated if your workload can fit on it. i have an app that hasn't received a single ounce of maintenance in like 3 years that still "just works", collects revenue from stripe, does all the business logic on lambdas, generates downloadable print-friendly pdf's on lambdas, etc. the supporting tech is dynamo + triggers for lambda, s3 and related triggers for lambda. but it would be hard for a non-expert aws user to fathom this sort of architecture, so i don't fault others for falling down the nih rabbit hole.
While the article provides guidance on utilizing standard software and services to construct a basic video upload platform, it lacks deeper insights into advanced scaling techniques.
We’ve built a similar pipeline architecture for our product. One key thing I’ll mention is that we’re using Shaka-streamer, which is a Python wrapper around Shaka-packager and ffmpeg. We queue our transcode jobs into a Redis queue, and use k8s to scale the transcode workers based on queue volume. Lastly, as a few folks have mentioned, we have an experimental on-prem transcoding cluster with consumer-grade HW that is pretty cheap.
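Their queue-driven scaling can be sketched as a simple control function (the per-worker throughput and replica bounds here are invented for illustration, not their actual config):

```python
import math

def desired_replicas(queue_depth: int, jobs_per_worker: int = 5,
                     min_replicas: int = 1, max_replicas: int = 50) -> int:
    """Pick a transcode-worker replica count from queue depth, the way
    a KEDA-style autoscaler would. All numbers are illustrative."""
    want = math.ceil(queue_depth / jobs_per_worker)
    return max(min_replicas, min(max_replicas, want))
```

An idle queue keeps the floor of one worker, while a large backlog pins the deployment at the replica ceiling.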
If you’re interested in working on transcoding I’d highly recommend taking a look at Shaka-packager/streamer.
I used to work on a system like this and even built the logic to use preemptible pools effectively, just like OP. If I had to design it from scratch today I would use Temporal for job scheduling - their durable compute concept is a perfect fit for this, and we had a lot of trouble maturing the equivalent scheduling system while trying to keep up with rapidly growing scale.
Also looks pretty complex.
The stabilization step presumably does a video encode… that's extremely expensive in terms of time, compute and money. I wonder why it's necessary.
Hello, I'm the writer of the article. Our solution gets videos from random people who present products we sent them. We get dodgy videos filmed on bad devices, and the process of contacting the user and getting them to re-upload a better-quality video is time-consuming for our team. We'd rather spend a little bit more on computing to try and save time overall. I hope this answers your question.
We've been working on an alternative, more affordable infrastructure at Livepeer Studio (https://livepeer.studio/) that saves up to 80% on transcoding & delivery costs.
It uses underutilized infrastructure around the world and incentivizes independent network operators to join the network (kind of like a 2-sided marketplace for video-specific compute).
Please sign up for a free account and check it out! We'd love to get your feedback.
Hetzner or other bare metal providers would probably be a better idea.
I have to ask, why bother with Kubernetes and all the associated config and pain? Why not just start a new spot instance? I can’t see any reason for Kubernetes in this architecture even though it’s the title of the post.
Also personally I wouldn’t use rabbitmq … it’s pretty heavyweight… there’s lots of lightweight queues out there. Overall this architecture looks like it could be simplified.
Also, the post doesn’t mention if the video encoding uses GPU hardware acceleration. Makes a big difference, especially if using spot instances… ffmpeg on CPU is extremely computationally expensive.
Presumably all input videos need reencoding to convert them to HLS.
Hello, I'm the writer of the article. We are using Kubernetes for our whole architecture, consisting of around 40 microservices and cron jobs. I just wanted in this article to give an example of asynchronous architecture using Kubernetes and RabbitMQ.
We are using RabbitMQ because it's my company's target solution. There might be better or lighter solutions that would fit us, but having just one queue for everything is easier to maintain.
Great comment about GPU hardware acceleration for encoding, I'm going to look this up.
I believe loads of auxiliary microservices have been omitted for brevity. Of course, those also don’t require Kubernetes, but maybe they have some standardised deployment system which keeps things manageable. Don’t forget about Observability and whatnot.
I am sure it's difficult for someone to build and scale video infrastructure. A few companies are doing it for you; plug in the APIs, and you're done.
Gumlet (https://www.gumlet.com): Per-title encoding (Netflix's approach) to optimize and transcode your videos to boost engagement rates. Moreover, securing your videos is easy with digital rights management solutions paired with Widevine and Fairplay. Made for developers, by developers.
Mux: Developer-friendly video infrastructure for your on-demand & live video needs.
I love Gumlet because of their pricing and support.