has anyone benchmarked qoa to see roughly how many instructions per sample it needs? all i see here is that it's more than adpcm and less than mp3, but those differ by orders of magnitude
like, can you reasonably qoa-compress real-time 16ksps audio on a 16 megahertz atmega328?
hmm, https://phoboslab.org/log/2023/04/qoa-specification has some benchmark results, let's see... seems like he encoded 9807 seconds of 44.1ksps stereo in 25.8 seconds and decoded it in 3.00 seconds on an i7-6700k running singlethreaded. what does that imply for other machines?
it seems to be integer code (because reproducibility of the predictor between encoding and decoding is important), and a significant part of it is 16-bit. https://ark.intel.com/content/www/xl/es/ark/products/88195/i... says it's a 4.2 gigahertz skylake. agner says skylake can do 4–6 ipc (well, μops/cycle) https://www.agner.org/optimize/blog/read.php?i=628, coincidentally testing on an i7-6700k himself, but let's assume 3 ipc, because it's usually hard to reach even that level of ilp in useful code
so that's about 380 μops per sample if i'm doing my math right; that might be on the order of 400 32-bit integer instructions per sample on an in-order processor. if (handwaving wildly now!) that's 600 8-bit instructions, the atmega328 should be able to encode somewhere in the range of 16–32 kilosamples per second
so, quite plausibly
for decoding the same math gives 43 μops per sample rather than 380
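that arithmetic as a runnable sketch (the 9807 s / 25.8 s / 3.00 s figures are from the spec post; the 3 μops/cycle sustained rate is my assumption):

```python
# back-of-envelope from the qoa spec post's i7-6700k benchmark;
# 3 uops/cycle sustained is an assumed figure, not a measurement
samples = 9807 * 44100 * 2        # seconds x sample rate x stereo
cycles_per_sec = 4.2e9            # skylake max turbo clock
uops_per_cycle = 3                # assumed sustainable ilp

encode_uops = 25.8 * cycles_per_sec * uops_per_cycle / samples
decode_uops = 3.00 * cycles_per_sec * uops_per_cycle / samples
print(round(encode_uops), round(decode_uops))  # → 376 44
```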
i'm very interested to hear anyone else's benchmarks or calculations
Comparing against 4-bit ADPCM, which already gives quite good quality as long as your sample rates are relatively modern, this only improves that to 3.2 bits per sample. It is fast, but ADPCM is also fast.
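For scale, a quick sketch of the size arithmetic (the 3.2 figure follows from QOA's 64-bit slice covering 20 samples; ratios ignore headers and quality differences):

```python
# compression ratio relative to 16-bit PCM, size only
pcm_bits, adpcm_bits = 16, 4
slice_bits, slice_samples = 64, 20   # one QOA slice: 64 bits for 20 samples
qoa_bits = slice_bits / slice_samples          # 3.2 bits per sample
print(pcm_bits / adpcm_bits)                   # → 4.0  (4-bit ADPCM)
print(pcm_bits * slice_samples / slice_bits)   # → 5.0  (QOA)
```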
Would be nice to see joint stereo support. If you take ADPCM or this Quite OK format and try to encode any stereo music with it, you need two full channels. However, there is an extremely advantageous optimization to be made here: most music is largely center-panned, so both channels are almost the same. With joint stereo you store one channel (either by picking one or mixing down to an average) and then store the difference for the other channel, which occupies far fewer bits, assuming you are able to quantize away the increased entropy.
For example, instead of using two 4-bit ADPCM channels for stereo, which is only a 75% savings over uncompressed 16-bit PCM, you could probably average around 5 bits per stereo sample pair.
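A minimal sketch of the lossless mid/side idea being described (this is the FLAC-style integer transform, an illustration rather than anything QOA or ADPCM actually implements):

```python
def ms_encode(l, r):
    # mid is the floor-average (the mostly-shared signal),
    # side is the usually-small left/right difference
    return (l + r) >> 1, l - r

def ms_decode(mid, side):
    # the rounding in the shift matches the encoder, so this is lossless
    l = mid + ((side + 1) >> 1)
    return l, l - side

# round-trips exactly for any 16-bit sample pair
for l, r in [(1000, 998), (-32768, 32767), (5, -5)]:
    assert ms_decode(*ms_encode(l, r)) == (l, r)
```

When the channels are nearly identical, `side` hovers near zero and quantizes cheaply, which is where the savings come from.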
I like the philosophy of QOA (and other similar projects, including QOI and TinyVG), but unlike the others, this one doesn't seem ready to use yet; see https://github.com/phoboslab/qoa/issues/25
> I have just pushed a workaround to master. [...]
> This still introduces audible artifacts when the weights reset. It prevents the LMS from exploding, but is far from perfect :/
This, combined with the fact that the issue is still open, means that a breaking change is still to be expected.
It's interesting that this works in the time domain (instead of the frequency domain), and I wonder what the resulting quality limitations are, if any. The sound samples on the demo page, at least the dozen I clicked on, didn't seem all that challenging: few instruments, mostly synthesized, low dynamic range. My ears aren't good enough to evaluate audio codecs anyway, however.
LFE audio channel is different from subwoofer output.
Subwoofers come with multichannel audio systems whose directional speakers usually can't cover the lower range of audio frequencies. They are responsible for the bass content of all channels, and get it from a software or hardware crossover filter that is independent of any specific input format. The placement of the low-frequency speaker doesn't matter much, because of how humans perceive low frequencies.

The LFE track is an additional effects channel for movie theaters and similar amusement rides, whose audio systems play low frequencies from the other channels just fine. A dedicated LFE emitter then adds rattling and other wub-wub effects without overloading the main speakers with all that extra energy. Movies that lack car chases and explosions routinely have completely silent LFE tracks.
LFE is usually played through a bass shaker, which is like a subwoofer that moves a weight instead of a cone, so you get vibrations in your seat. It somewhat simulates movement for your body. I use two for my sim racing rig: one under my seat for car dynamics and immersion, and one under my pedals to tell me when ABS is active and when my tires are spinning.
I looked around, but didn't see any mention of potential patent issues. I assume that this has been considered? The Ogg Vorbis people spent a lot of time on that back when they were developing their format.
HTML+CSS, converted to PDF via the Save As PDF feature in Firefox. (Or the same could be done with other browsers, but this one apparently comes from FF.)
In terms of quality at any given bitrate it comes nowhere near ubiquitous formats like AAC or MP3 produced with good encoders. But it's good to have (possibly) patent-free options available.
The author wrote a very simple MPEG[1] decoder, so there's an obvious benchmark for making that even simpler.
I personally wouldn't mind a Quite OK Page Description Language. Something that gets you most of PDF/PS/HPGL without all the effort. Could use the Quite OK Image Format for bitmap images. Not sure whether you'd need a Quite OK Vector Format and/or a Quite OK Font Format as prerequisites…
Quite OK browser. It doesn't have WebGL, WebGPU, or other fancy and easy-to-exploit stuff, but it renders 95% of websites, and the source code is simple enough to be maintained by very few people.
Quite OK JS Plotting Library (QOJSPL, nice, sounds like my cat walking on the keyboard). With an intuitive, documented API that doesn't require you to dig through tons of examples on sites that take ages to load. Because no, a massive stash of non-orthogonal examples does not replace documentation.
AKA last Tuesday morning's frustration: I wanted to make interactive plots on a web page to explain some math.
g0xA52A2A|2 years ago
3 months ago - https://news.ycombinator.com/item?id=35738817
6 months ago - https://news.ycombinator.com/item?id=34625573
kragen|2 years ago
these had crucial information for me
anotherhue|2 years ago
This was/is available in MP3 since forever, so it seems a reasonable request.
https://wiki.hydrogenaud.io/index.php?title=Intensity_stereo
Pet_Ant|2 years ago
It should be spelled out explicitly, but I figured out the rest
L - Left, R - Right, C - Center, FL - Front Left, FR - Front Right, SL - Side Left, SR - Side Right, BL - Back Left, BR - Back Right
---
Edit: LFE - Low Frequency Effects... so subwoofer?
https://www.dolby.com/uploadedFiles/Assets/US/Doc/Profession...
Turing_Machine|2 years ago
Other than that, looks great!
speedgoose|2 years ago
https://en.m.wikipedia.org/wiki/Software_patents_under_the_E...
crumpled|2 years ago
That's not what they did, apparently.
The document properties call out https://cairographics.org
[1]: https://phoboslab.org/log/2019/06/pl-mpeg-single-file-librar...