From a technology point of view, this is really cool.
From the view of someone who occasionally watches videos on YouTube, I am trying to figure out a nice way to say... I hate it. Or, more specifically, I hate that it generates the voice, and so basically enables video content spam.
What we don't need more of is cheap, easy-to-generate videos that are basically spam and/or clickbait, trying to get views. The problem with auto-generated voices in videos like this is as a viewer I can't distinguish between work that someone put deliberate production time into, and something churned out by a content farm. The demo video even tricked me at first; I didn't realize it was a generated voice until a couple of sentences in, at which point I had a visceral negative reaction, the same as when I accidentally click on a content-farm-generated video.
It seems a major feature is automatically syncing the narration to the slides. Perhaps a way to enhance this while avoiding spam generation would be to use the generated voice only for internal timing, and generate a karaoke-like display for a (human) narrator to read? As a paid service, you could even provide professional voice-overs as an add-on.
> The problem with auto-generated voices in videos like this is as a viewer I can't distinguish between work that someone put deliberate production time into, and something churned out by a content farm.
If machine voiced vs human voiced is the only discernible difference in the end, this seems like a non-argument.
As someone who is building a tool in roughly the same space (machine-voiced video generation), I can say that the use cases go far beyond "content farm". It also enables a lot of useful content, e.g. internal training videos; or, paired with browser automation, you can have narrated, always up-to-date video manuals for your product. In the education space, it enables a more iterative way to produce material, where previously you couldn't afford to tweak parts of a video because you would have to narrate it again.
And I also don't think that it will amplify the existence of such videos significantly. There are already YouTube channels that do just that, and people don't seem to mind. E.g. there is a channel that uploads "car news" content, which basically just has a narration on top of a series of pictures of a car, and the view counts and ratings on those videos are pretty good. In the end it's just a few bullet points of facts stretched into an overly long video using the same old worn-out phrases (just like regular "car news"), and I don't see why a human would need to waste their time voicing that.
The primary issue I have with auto-generated video is its tendency to systematically lose accuracy over time, because it is generated from out-of-date information, unmaintained APIs, or simple typos.
Manual editing and manual narration tend to act as a forcing function to review the information and approve its accuracy before publishing.
Auto-generated videos can often be published without a final review or fact check. As we see more auto-generated video for things like product demos and company training, it will open a new problem domain: catching “bugs” in those auto-generated videos.
You can easily replace the generated voice with a professionally recorded voice later, and have it re-sync everything. Generated voice is great for experimenting and iterating.
> The problem with auto-generated voices in videos like this is as a viewer I can't distinguish between work that someone put deliberate production time into, and something churned out by a content farm.
There's a big difference between good content that is automated into a video, and spam. The key use case for this was helping me focus more on the content, rather than on fiddling with synchronisation and resizing assets. I'm not a native English speaker, and although I speak at quite a few conferences per year, listening to my broken English accent (which sounds like a Bond villain) in a video is quite distracting, even for me. Even with my best efforts to record my own voice professionally, generated voice sounds a lot better than what I can do.
We need better reporting and labelling of farm-generated or auto-generated videos - possibly ML models that detect this. However it will cost YouTube revenue because they're profiting off of content farms. I don't see any other way to fix the problem.
Bear in mind that English is everyone's second-favorite language, which means that probably half its speakers don't always feel comfortable recording or public speaking. This helps them over the hump.
I haven't gotten the chance to try it out yet, but an alternative in this space is Komposition (https://owickstrom.github.io/komposition/), which bills itself as "a video editor built for screencasters". I gather that mostly means that if you take certain liberties when recording your screen and voice (putting pauses in the right places), Komposition will take care of automatically splitting your input media based on where it detects a transition.
Slightly different aim compared to Video Puppet (the source being plain text is not the goal, which means you will likely have to edit and re-record a script multiple times), but still interesting, especially if you'd rather avoid an auto-generated voice.
You can easily replace the auto-generated voice with your own, or with a professional recording, in Video Puppet scripts. Just add
(audio: file.wav) to your scene.
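Putting that together, a minimal two-scene script might look like the sketch below. Only the (audio: file.wav) direction is quoted from the comment above; the scene headers and image links are assumptions based on the Video Puppet format docs, so check those docs before relying on the exact syntax:

```md
# First scene

![title slide](title.png)

This narration is read with the generated voice.

# Second scene

(audio: my-own-recording.wav)

![demo screenshot](demo.png)
```

The idea is that you can iterate with the generated voice, then swap in a recorded file per scene once the script is stable.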
This is amazing! I'm going to have a lot of fun with this. I would love to be able to save these as videos, but I guess I could just use my Mac's screen-recording functionality.
edit: I'm going down a rabbit hole looking through your site. Digging the twisted early internet aesthetic.
Getting a real kick out of using Video Puppet. The idea of creating a video from assets and a script is not a new one; I first saw it in the context of real estate at a Kaltura conference back in 2012: https://connect.mediaspace.kaltura.com/media/Automated+Video...
The existing tools for doing this sort of thing seem to either require quite a bit of programming/video skills (e.g. Media Lovin' Toolkit, ffmpeg, sox, jimp, ImageMagick, etc.) or they are templated/opinionated tools like https://www.magisto.com/
What I love about Video Puppet is that it provides a simple, easy-to-use set of tools and an API that, through GitHub Actions, lets you put version control and early/often feedback loops at the heart of your projects.
I'm using it to document the development story and back story of an indie video game I'm working on. Previously I was doing this as a Google Doc which I shared with my collaborators.
With Video Puppet, it requires little more overhead - I was writing this stuff already - but when I see and hear the results played back I can immediately see whether the story makes sense or not. I can see if I am jumping into talking about something I haven't set up properly or if I am trying to say too much.
One thing that would help me is getting feedback on failures in the markdown script more quickly, before even pushing to GitHub. For code, including things like Terraform, I'd use a linter, or CircleCI has a validator tool you can run locally.
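A pre-push check along these lines could be a small script that verifies every asset the markdown references actually exists. This is a hypothetical sketch, not an official linter; the two regular expressions are assumptions about the script syntax, not Video Puppet's actual grammar:

```python
import re
import sys
from pathlib import Path

# Hypothetical pre-push check: verify that every asset a Video Puppet
# markdown script references actually exists next to the script. The two
# patterns below (image links and audio stage directions) are assumptions
# about the script syntax, not an official grammar.
ASSET_PATTERNS = [
    re.compile(r"!\[[^\]]*\]\(([^)]+)\)"),  # ![alt](image.png)
    re.compile(r"\(audio:\s*([^)]+)\)"),    # (audio: narration.wav)
]

def missing_assets(script_text, base_dir):
    """Return referenced asset paths that do not exist under base_dir."""
    missing = []
    for pattern in ASSET_PATTERNS:
        for ref in pattern.findall(script_text):
            ref = ref.strip()
            if not (Path(base_dir) / ref).is_file():
                missing.append(ref)
    return missing

if __name__ == "__main__" and len(sys.argv) > 1:
    script = Path(sys.argv[1])
    problems = missing_assets(script.read_text(), script.parent)
    for ref in problems:
        print(f"missing asset: {ref}")
    sys.exit(1 if problems else 0)
```

Run it as a pre-push git hook and it fails the push before a broken script ever reaches the build.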
The other place I'm going to start using it is for describing defects in a product I am coaching a team on. Previously I would do a screen capture and then upload that to frame.io. Now I can do the screen capture, describe the problem, and stick the whole lot into version control with a bunch of GitHub Actions to point the team to the resulting video.
I will be following this product closely and actively using it. Great work, Gojko!
I'm building the reverse, video to markdown. Paircast combines screen recording, voice transcriptions, and code changes into a markdown guide.
http://paircast.io
Wow, that is FANTASTIC. I've not tried it yet, but it looks like a very approachable execution of a brilliant idea. I'm a DevRel who's fascinated by DX and I WANT THIS.
It's a shame it doesn't also capture the code's output and, ideally, the state of the interpreter. For example: at 4:45 in the demo video, he tries to run his code and it fails with an error. It's important for both coding tutorials and DX analysis to capture the text of the output/error.
What would be even better would be capturing the error _and_ the detailed stack trace, ideally with the state of each stack frame. My employer produces SDKs for different languages, so it'd be invaluable for debugging.
I can imagine a couple of different ways of doing this which might not be horrifically complicated to add to the Paircast recorder, though I suspect you're already going down this road. If you'd like to chat more, yell!
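For the Python case, capturing the error together with the local variables of each stack frame is straightforward with the standard traceback machinery. This is a sketch of the kind of capture a recorder could log alongside the video; it is not anything Paircast actually exposes:

```python
import traceback

def capture_failure(func, *args, **kwargs):
    """Run func and, on failure, capture the error text plus the local
    variables of every frame in the traceback. A sketch of the kind of
    snapshot a screen recorder could log next to the video timeline."""
    try:
        return {"ok": True, "result": func(*args, **kwargs)}
    except Exception as exc:
        frames = []
        tb = exc.__traceback__
        while tb is not None:
            frame = tb.tb_frame
            frames.append({
                "function": frame.f_code.co_name,
                "line": tb.tb_lineno,
                # repr() keeps the snapshot serialisable (e.g. to JSON)
                "locals": {k: repr(v) for k, v in frame.f_locals.items()},
            })
            tb = tb.tb_next
        return {
            "ok": False,
            "error": traceback.format_exception_only(type(exc), exc)[0].strip(),
            "frames": frames,
        }
```

A recorder could dump this dictionary as JSON keyed by the video timestamp, giving a tutorial viewer the exact state when the demo failed.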
Just completed full support for scripting videos as Markdown files using Video Puppet. Check out the post for some basic info. For more examples, see https://github.com/videopuppet/examples
I love this. I've been messing around with Premiere Pro and Audacity for the past couple of days, trying to get more into making video. Video Puppet looks way easier to debug and collaborate on, since scrolling back and forth in your video looking for stuff gets very tedious very quickly.
Is there any way I can add my own voice and then still write the words that I want my voice to say?

You could create a custom brand voice with Amazon, and we can then integrate it into Video Puppet: https://aws.amazon.com/about-aws/whats-new/2020/02/amazon-po...
Video Puppet is excellent. I am using it to create videos for the Five Minutes Serverless YouTube channel, and so far the results are outstanding. I can create a video from a markdown file really fast.
Kinetophone (https://savannah.nongnu.org/projects/kinetophone) is an application/shared library for Linux, released as free software. It has a GUI program for live narration and one, "Vox", for creating video from PDF or still images using speech synthesis (Festival). Releases are mirrored here: http://download-mirror.savannah.gnu.org/releases/kinetophone...
The Kinetophone shared library could be used as a plug-in for presentation software. Kinetophone's file format is XML. I haven't updated it for years, and it does require occasional patches to support the latest FFmpeg. It was originally a commercial application for OS X called Ishmael, back in about '07, which I ported to Linux after my company went out of business.
I think this would be great for all the professors/teachers who suddenly have to teach courses online. If the lecture can be made beforehand, then the teacher can just focus on addressing questions or problems on Zoom/Skype (or whatever platform is used for teaching online).
I'm trying to imagine all the useful things you could do with code-generated videos.
I'm imagining a daily routine of AirPlaying the video to your TV with an annotated dashboard of quantified-self metrics, weather forecast, plotted local Covid-19 cases, health advisories, etc.
I am a tech writer and I write tasks and procedures using DITA XML. I was thinking about transforming my .dita files to .mlt to use in Shotcut/melt, but I think I'm going to use this instead.
Video Puppet can also process YAML and JSON files, so if you are running an automated conversion from XML, it might be easier to output JSON instead of Markdown; in any case, should you need help, drop me an email at [email protected]
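As a rough illustration of that XML-to-JSON route, here is a sketch that flattens a simplified DITA task into a list of narration scenes. The output field names are invented placeholders, not Video Puppet's actual JSON schema, and real DITA markup is richer than this, so check the format docs before using the field names:

```python
import json
import xml.etree.ElementTree as ET

# A deliberately simplified DITA task; real tasks have more structure.
DITA_TASK = """
<task id="install">
  <title>Install the widget</title>
  <taskbody>
    <steps>
      <step><cmd>Download the installer.</cmd></step>
      <step><cmd>Run it and follow the prompts.</cmd></step>
    </steps>
  </taskbody>
</task>
"""

def dita_to_scenes(dita_xml):
    """Flatten a simplified DITA task into one narration scene per step.
    The {"scenes": [{"narration": ...}]} shape is an assumed stand-in
    for whatever JSON Video Puppet actually expects."""
    root = ET.fromstring(dita_xml)
    scenes = [{"narration": root.findtext("title")}]
    for step in root.iter("step"):
        scenes.append({"narration": step.findtext("cmd")})
    return {"scenes": scenes}

print(json.dumps(dita_to_scenes(DITA_TASK), indent=2))
```

The nice part of going via JSON is that the conversion stays a pure transform, so it can run in the same CI pipeline that builds the docs.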
Yes yes yes! I was literally thinking of implementing this myself, but didn't have time. It's a shame it doesn't appear to be open source though - I might still end up creating one. In the meantime, could you write a bit about what different pieces of technology/services you're using to build all this?
I could see this being really useful for creating product onboarding video tutorials - wondering if there's an ability to preview and edit/adjust before exporting the final video?
Building a full video is fairly quick compared to traditional editing tools, so I haven't built any faster preview yet. I usually just build the whole thing and look at it, then tweak the script and build it again.
You can easily upload just the script file into an existing project and re-build the video as many times as you like, then download the version you are happy with at the end.
This looks cool. Something I have looked for on and off, without great success, would be a fully scriptable NLE.
Something like this that would support simple fades, transitions, and maybe animation. The kind of stuff you can do fairly easily in a video editor, but with lots of fiddly clicks and zooming in and out of timelines.
I'd like to have a script that lets me specify when different source media start, when to apply effects, etc. All written as a basic text file. Anything obvious out there I've missed?

In Video Puppet, you can set transitions globally in the document header or on individual scenes, for example a 0.2-second cross-fade between scenes. Video segments (specifying when different source media start) are also supported. Check out the video and transition sections here for more info: https://videopuppet.com/docs/format/
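For the fully text-driven route, one low-tech option is to treat the script as an edit decision list and generate an ffmpeg command from it. The one-clip-per-line format below is invented for illustration; the ffmpeg options used (-ss and -t as per-input options plus the concat filter) are standard, but verify the generated command against your ffmpeg version before trusting it:

```python
import shlex

def edl_to_ffmpeg(edl_text, output="out.mp4"):
    """Parse a toy edit decision list, one clip per line as
    '<file> <start_seconds> <duration_seconds>', and build an ffmpeg
    command that trims each clip and concatenates them in order."""
    clips = []
    for line in edl_text.strip().splitlines():
        path, start, duration = line.split()
        clips.append((path, float(start), float(duration)))
    cmd = ["ffmpeg"]
    for path, start, duration in clips:
        # -ss/-t before -i trim each input before decoding the rest
        cmd += ["-ss", str(start), "-t", str(duration), "-i", path]
    pads = "".join(f"[{i}:v][{i}:a]" for i in range(len(clips)))
    cmd += ["-filter_complex",
            f"{pads}concat=n={len(clips)}:v=1:a=1[v][a]",
            "-map", "[v]", "-map", "[a]", output]
    return cmd

example = """
intro.mp4 0 4.5
demo.mp4 12 30
"""
print(shlex.join(edl_to_ffmpeg(example)))
```

Fades and effects would mean growing the filter graph, which is exactly where a purpose-built tool earns its keep, but for plain cuts this keeps the whole edit in a diffable text file.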
Wow, I may be biased because this fills a particular niche use case for me, but this is truly incredible.
I can't stand hearing the sound of my own voice, but I produce a lot of tutorial content in Markdown for learning-material guides.
This would allow me to re-use all of the existing material I have, which already includes detailed step-by-step screenshots and text instructions, to make voice-over videos with slides and publish to Youtube. Amazing!
Seems you could do something along these lines to avoid the video generation part.
Two suggestions:
- Make the sample script response header "Content-Type: text/plain" so that it renders in the browser instead of downloading a file.
- Make the sample video demonstrate the three features it says it has, like image captions.
Narration alone is a useful form of presentation, but you don't need this tech to do that.