
Using ASCII waveforms to test real-time audio code

121 points | jwosty | 4 years ago | goq2q.net

70 comments


phab|4 years ago

This approach is neat for observability, but it's worth noting that it essentially quantises all of your samples down to the vertical resolution of your graph. If you somehow introduced a bug that caused an error smaller than the step size, these tests wouldn't catch it.

(e.g. if you somehow managed to introduce a constant DC-offset of +0.05, with the shown step size of 0.2, these tests would probably never pick it up, modulo rounding.)
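A minimal sketch of that failure mode (plain Python; the article's renderer is F#, and this round-to-nearest-bin quantizer is an assumption about how any such renderer behaves):

```python
def quantize(samples, step=0.2):
    # Snap each sample to the nearest multiple of `step`, which is roughly
    # what rendering at one character row per 0.2 amplitude units does.
    return [round(s / step) * step for s in samples]

silence = [0.0] * 8
dc_bug = [0.05] * 8  # constant +0.05 DC offset, well under the 0.2 step

# Both quantize to all-zeros, so a test diffing the rendered output passes:
print(quantize(silence) == quantize(dc_bug))  # True
```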

That said, these tests are great for asserting that specific functionality does broadly what it says on the tin, and making it easy to understand why not if they fail. We'll likely start using this technique at Fourier Audio (shameless plug) as a more observable functionality smoke test to augment finer-grained analytic tests that assert properties of the output waveform samples directly.

jwosty|4 years ago

True, it quantizes (aka bins) the samples, so it isn't right for tests that need to be 100% sample-perfect, at least vertically speaking. It's a compromise between tradeoffs: easy readability just from looking at the code itself (you could do images, but then there's a separate file to keep track of, or you're looking at binary data as a float[]) vs. strict correctness. How you weigh those tradeoffs depends on what you're doing, and in my case, most of the potential bugs are going to relate to horizontal time resolution, not vertical sample-depth resolution.

If the precise values of these floats are important in your domain (which they very well may be), a combination of approaches would probably be good! Would love to hear how well this approach works for you guys. Keep me updated :)

PaulDavisThe1st|4 years ago

A more accurate and only slightly more complex process for this is to generate numerical text representations of the desired test waveforms and then feed them through sox to get actual wave files. The numerical text representations are likely even easier to generate programmatically than the ascii->audio transformation.
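For reference, sox's text ".dat" format is one such numerical representation: comment-style header lines, then time/amplitude columns. A hedged sketch of a generator (the helper name, the 8 kHz rate, and the column precision are my own choices, not from the comment):

```python
import math

def write_sox_dat(path, samples, rate=8000):
    # sox's .dat text format: ';'-prefixed header lines, then one line per
    # sample with a time column (seconds) and an amplitude column.
    with open(path, "w") as f:
        f.write(f"; Sample Rate {rate}\n; Channels 1\n")
        for i, s in enumerate(samples):
            f.write(f"{i / rate:.8f}\t{s:.6f}\n")

# A 440 Hz test tone at half amplitude, one second long:
tone = [0.5 * math.sin(2 * math.pi * 440 * i / 8000) for i in range(8000)]
write_sox_dat("tone.dat", tone)
# then e.g.:  sox tone.dat tone.wav
```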

rkangel|4 years ago

I was thinking that maybe that lack of precision was a good thing. Makes your tests less fragile.

I agree though that you probably want to augment this with some form of assertion about noise level to check the high frequency smaller components.

contravariant|4 years ago

I suppose you could use a patterned dither / sigma-delta to get a slightly bigger chance of finding differences.
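A sketch of one way to read that suggestion: a first-order error-feedback (sigma-delta-style) quantizer. Because the rounding residual is carried forward, even a sub-step constant offset eventually pushes some samples into the next bin, where plain rounding would hide it forever. (The exact scheme here is my own illustration, not contravariant's.)

```python
def quantize_sd(samples, step=0.2):
    out, err = [], 0.0
    for s in samples:
        q = round((s + err) / step) * step  # quantize sample plus carried error
        err += s - q                        # feed the residual forward
        out.append(q)
    return out

silence = [0.0] * 8
dc_bug = [0.05] * 8  # the +0.05 DC offset from upthread

print(quantize_sd(silence))  # stays all-zero
print(quantize_sd(dc_bug))   # the accumulated offset flips some samples
```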

user-the-name|4 years ago

Imagine if we had terminals that could handle graphical data. We wouldn't have to do weird kludges like this, we could just plot the waveforms in the output of our tools.

But it's 2021, and not only is this not possible, there is not even a path forward to a world where this would be possible. It's just not an option. Nobody is working on this, nobody is trying to make this happen. We're just sitting here with our text terminals, and we can't even for a second imagine that there could be anything else.

It's sad, is what it is.

charlesdaniels|4 years ago

I would point out that sixels[0] exist. There is a nice library, libsixel[1] for working with it, which includes bindings into many languages. If the author of sixel-tmux[2][3] is to be believed[4], the relative lack of adoption is a result of unwillingness on the part of maintainers of some popular open source terminal libraries to implement sixel support.

I can't comment on that directly, but I will say it's pretty damn cool to see GnuPlot generating output right into one's terminal. lsix[5] is pretty handy as well.

But yeah, I agree, I'm not a fan of all the work that has gone into "terminal graphics" based on unicode. It's a dead end, as was clear to DEC even back in '87 (and that's setting aside that the VT220[6] had its own drawing capabilities, though they were more limited). Maybe sixel isn't the best possible way of handling this, but it does have the benefit of 34 years of backwards-compatibility, and with the right software, you can already use it _now_.

0 - https://en.wikipedia.org/wiki/Sixel

1 - https://saitoha.github.io/libsixel/

2 - https://github.com/csdvrx/sixel-tmux

3 - https://news.ycombinator.com/item?id=28756701

4 - https://github.com/csdvrx/sixel-tmux/blob/main/RANTS.md

5 - https://github.com/hackerb9/lsix

6 - https://en.wikipedia.org/wiki/VT220

gwbas1c|4 years ago

> It's sad, is what it is.

With graphics being everywhere in 2021, I wouldn't call this situation "sad," I'd think a lot more critically about why.

To start with, fixed-width text is significantly easier to work with than graphics.

Nothing's stopping anyone from writing a CI tool that outputs to HTML with embedded images. The bigger question is why it's uncommon.

MayeulC|4 years ago

In truth, it's because text is quite easy to handle. It's easy to make a program that handles text, too.

And so we have a lot of text editors, diff tools, efficient compression, tools like sort and uniq: the whole unix ecosystem.

So if you transform sound to text, you can then use text tools to compare the output to catch differences. A simple serialization of numerical sample values would have caught the bug, but I agree that having a way of visualizing the output is nice.
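That workflow can be sketched with nothing but the standard library (the serialization format here, one sample per line at four decimals, is my own choice):

```python
import difflib

def serialize(samples):
    # One sample per line: now any generic text tool can diff audio buffers.
    return [f"{s:.4f}" for s in samples]

expected = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]
actual = [0.0, 0.5, 1.0, 0.5, 0.05, -0.5, -1.0, -0.5]  # one bad sample

diff = list(difflib.unified_diff(serialize(expected), serialize(actual),
                                 lineterm=""))
print("\n".join(diff))  # the -/+ lines point straight at the bad sample
```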

Command line input, programming, etc. is also still mostly done with text, because it's easy to transform. Of course, you can imagine working at a higher level with objects (like powershell does IIRC), mimetypes, etc.

rbanffy|4 years ago

The venerable xterm, and a lot of later physical terminals (the ones with CRTs), can emulate Tektronix graphics. (Tektronix, which today makes instruments, also made computer terminals with fancy storage CRTs that were kind of e-paper-like, but with green - and sometimes yellow - screens.) iTerm2 and some others, as pointed out, can do Sixel graphics, a format originally designed for DEC dot-matrix printers that some DEC terminals also implement.

voldacar|4 years ago

In TempleOS you can mix text, images, hyperlinks, and 3d models in the terminal. This is true for the whole system: you could literally have a spinning 3d model of a tank as a comment in a source file. That's right, it took a literal schizophrenic to make an OS with a feature that should have been standard decades ago.

Nobody tries to make actually interesting new operating systems anymore. OS research today is just "let's implement unix with $security_feature", nobody is actually trying to make computers more powerful or fun to use, or design a system based off of a first-principles understanding of what a computer should be.

God I wish I was born in the lisp machine timeline

HPsquared|4 years ago

Notebook interfaces are basically that, e.g. Jupyter or Mathematica.

kevin_thibedeau|4 years ago

This isn't a graphical problem. All that's required is storing arrays of validation data and a diff tool to check for mismatches. Visualizing the results is useful for failure analysis but not a core requirement. That can readily be done with free tools like matplotlib. We live in that world today.

zokier|4 years ago

> Imagine if we had terminals that could handle graphical data.

We have. They are called "browsers". You might be even using one right now!

sudara|4 years ago

Nice! I became obsessed with rendering sparkline representations of chunks of audio for the same reason: to inspect failures when writing tests / refactoring. I wrote a JUCE module (C++) and integration with lldb to make it quick to inspect chunks of audio in the IDE: https://github.com/sudara/melatonin_audio_sparklines

robotsteve2|4 years ago

Once you've got the waveforms as arrays, what do you need the ASCII rendering for?

Instead of diffing ASCII-rendered waveforms, save the arrays and diff the arrays (and then use any kind of numerical metric on the residual). Scientist programmers have all sorts of techniques for testing and debugging software that processes sampled signals.
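A sketch of that style of test in plain Python (the metric names and thresholds are my own illustration):

```python
import math

def residual_metrics(expected, actual):
    # Subtract the arrays and summarize the residual numerically.
    r = [a - e for e, a in zip(expected, actual)]
    return {
        "max_abs": max(abs(x) for x in r),                 # worst-case error
        "rms": math.sqrt(sum(x * x for x in r) / len(r)),  # overall error energy
    }

expected = [math.sin(2 * math.pi * i / 16) for i in range(16)]
actual = [s + 0.05 for s in expected]  # the DC-offset bug from upthread

m = residual_metrics(expected, actual)
assert m["max_abs"] > 0.04  # catches what a 0.2-step ASCII diff would miss
print(m)
```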

jwosty|4 years ago

It's usually gonna be easier to tell what went wrong in an ASCII string array than a raw float[]. It's for the human reading/fixing the test.
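For illustration, a toy version of such a renderer (hedged: the article's real one is F#, and this one-asterisk-per-bin layout is a guess at its general shape, not its actual output format):

```python
import math

def render_ascii(samples, step=0.2, lo=-1.0, hi=1.0):
    # One text row per amplitude bin, top row = `hi`; a sample puts an
    # asterisk in whichever row's bin it falls into.
    rows = []
    levels = int(round((hi - lo) / step)) + 1
    for r in range(levels):
        level = hi - r * step
        rows.append("".join(
            "*" if abs(s - level) <= step / 2 else " " for s in samples))
    return rows

wave = [math.sin(2 * math.pi * i / 8) for i in range(8)]
for row in render_ascii(wave):
    print(row)
```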

rbanffy|4 years ago

If we go beyond ASCII: Unicode has included 2x2 mosaics from the start (they were present in DEC terminals), and added 2x3 mosaics (from Teletext and the TRS-80) in version 13. Some more enlightened terminals (such as VTE) implement those symbols without needing font support.

Or you can use Braille to get 2x4 mosaics, but they usually look terrible.
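The Braille trick works because the U+2800 block encodes each of a cell's 8 dots as one bit of the code point. A sketch (the sparkline scaling and function name are my own):

```python
import math

# Bit offsets for (column, row) within one Braille cell, rows top to bottom.
BITS = [[0x01, 0x02, 0x04, 0x40],   # left column: dots 1, 2, 3, 7
        [0x08, 0x10, 0x20, 0x80]]   # right column: dots 4, 5, 6, 8

def braille_sparkline(samples, lo=-1.0, hi=1.0):
    out = []
    for i in range(0, len(samples) - 1, 2):   # two samples per character
        cell = 0
        for col in range(2):
            s = samples[i + col]
            # Map the sample to one of 4 vertical pixel rows (0 = top).
            row = 3 - min(3, max(0, int((s - lo) / (hi - lo) * 4)))
            cell |= BITS[col][row]
        out.append(chr(0x2800 + cell))
    return "".join(out)

wave = [math.sin(2 * math.pi * i / 16) for i in range(16)]
print(braille_sparkline(wave))  # one sine cycle in 8 characters
```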

jwosty|4 years ago

I just might have to try this next.

necubi|4 years ago

This is such a great idea! I've really struggled with how to test real-time audio code in the live looper I've been working on [0]. Most of my tests use either very small, hand-constructed arrays, or arrays generated by some function.

This is both tedious and makes it very hard to debug test failures (especially with cases like crossfades, pan laws, and looping). I love the idea of having a visual representation that lets me see what's going wrong in the test output, and I'm definitely going to try to implement some similar tests.

I'm also curious what the state-of-the-art is for these sorts of tests. Does anyone have insight into what e.g. Ableton's test suite looks like?

[0] http://github.com/mwylde/loopers

jwosty|4 years ago

> I'm also curious what the state-of-the-art is for these sorts of tests. Does anyone have insight into what e.g. Ableton's test suite looks like?

I don't know, but if I were to make an educated guess, maybe rendering stuff to actual audio files is a common approach? That way when something goes wrong, they can inspect it in a standard waveform editor?

bitwize|4 years ago

That's so cool, and reminds me of how I used Gnuplot as a makeshift oscilloscope to test and evaluate some (not real time) software synthesis I was doing.

spicybright|4 years ago

Why would you use ascii for something like a waveform, something that's inherently a graph?

Sure, maybe you don't need that much resolution for what the use case is. But it's the equivalent of looking at a graph and squinting your eyes to blur it.

jwosty|4 years ago

In short, because text is much easier to deal with than bitmaps, and there is much more tooling that "just works" for text than actual graphics, like Expecto's textual diffing in assertions. @MayeulC said it well: https://news.ycombinator.com/item?id=28856884

focom|4 years ago

Would love to use it as a library! Is it open source?

jwosty|4 years ago

Not yet, but it certainly could be. Would it be useful to publish the helper classes that render the waves out to ASCII? That's really the guts of the thing. After that, you just use whatever testing framework you want to do the actual diffing (in my case Expecto for F#).

rbanffy|4 years ago

Am I the only one almost offended by Braille not being ASCII?

edit: Yes. I miscalculated the dot density.

/me slaps forehead

thewakalix|4 years ago

Aren't those asterisks?

munchler|4 years ago

This is great. People are doing very cool things with F# these days.

jwosty|4 years ago

Thanks, I like to think so! I didn't see other people doing much audio programming in F#, so I figured someone would be interested in seeing what it can look like.