There’s one thing that bothers me about this. Sure, PCM sampling is a lossless representation of the low frequency portions of a continuous signal. But it is not a latency-free representation. To recover a continuous signal covering the low frequencies (up to 20kHz) from PCM pulses at a sampling frequency f_s (f_s >= 40kHz), you turn each pulse into the appropriate kernel (sinc works and is ideal in a sense, but you probably want to low-pass filter the result as well), and that gives you the decoded signal. But it’s not causal! To recover the signal at time t, you need some pulses from times beyond t. If you’re using the sinc kernel, you need quite a lot of lookahead, because sinc decays very slowly and you don’t want to cut it off until it’s decayed enough.
So if you want to take a continuous (analog) signal, digitize it, then convert back to analog, you are fundamentally adding latency. And if you want to do DSP operations on a digital signal, you also generally add some latency. And the higher the sampling rate, the lower the latency you can achieve, because you can use more compact approximations of sinc that are still good enough below 20kHz.
None of this matters, at least in principle, for audio streaming over the Internet or for a stored library — there is a ton of latency, and up to a few ms extra is irrelevant as long as it's managed correctly when synchronizing different devices. But for live sound, or for a potentially long chain of DSP effects, I can easily imagine this making a difference, especially at 44.1ksps.
I don’t work in audio or DSP, and I haven’t extensively experimented. And I haven’t run the numbers. But I suspect that a couple passes of DSP effects or digitization at 44.1ksps may become audible to ordinary humans in terms of added latency if there are multiple different speakers with different effects or if A/V sync is carelessly involved.
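Rough numbers for the latency claim are easy to sketch. A linear-phase (symmetric) FIR low-pass delays its output by (N − 1)/2 samples, where N is the tap count; the tap counts below are illustrative guesses, not figures from any real DAC:

```python
def fir_latency_ms(num_taps: int, sample_rate_hz: float) -> float:
    """Group delay of a symmetric (linear-phase) FIR filter, in milliseconds."""
    return 1000.0 * (num_taps - 1) / 2 / sample_rate_hz

# A sharp cutoff just below fs/2 = 22.05 kHz needs a long filter:
print(fir_latency_ms(255, 44_100))   # roughly 2.9 ms
# At a higher rate the transition band is wider, so fewer taps suffice,
# and each sample period is shorter as well:
print(fir_latency_ms(63, 96_000))    # roughly 0.3 ms
```

This is the "more compact approximations of sinc" point in miniature: both effects compound, so latency drops faster than linearly with sample rate.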
Sampling does not lose information below the Nyquist limit, but quantization does introduce errors that can't be fixed. And resampling at a different rate might introduce extra errors, like when you recompress a JPEG.
The xiphmont link is pretty good. Reminded me of the nearly-useless (and growing more so every day) fact that incandescent bulbs not only make some noise, but the noise increases when the bulb is near end of life. I know this from working in an anechoic chamber lit by bare bulbs hanging by cords in the chamber. We would do calibration checks at the start of the day, and sometimes a recording of a silent chamber would be louder than normal and then we'd go in and shut the door and try to figure out which bulb was the loud one.
> All audio signal is _perfectly_ represented in a digital form.
That is not true... A 22kHz signal only has 2 data points per cycle of a sinusoidal waveform. Those 2 points could land anywhere, i.e. you could read 0 both times the waveform is sampled... See the Nyquist theorem.
From memory, changing the sample rate can cause other issues with aliasing due to the algorithms used...
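To make the phase point concrete: a tone at exactly fs/2 can land on its zero crossings and vanish, or on its peaks and survive at full scale, which is why the sampling theorem requires content strictly below fs/2. A quick numpy check:

```python
import numpy as np

fs = 44_100            # sample rate
f = fs / 2             # tone exactly at the Nyquist frequency, 22.05 kHz
n = np.arange(16)

# Sampling sin(2*pi*f*t) at t = n/fs hits every zero crossing:
x = np.sin(2 * np.pi * f * n / fs)   # sin(pi * n) == 0 for every n
print(np.max(np.abs(x)))             # ~0, up to floating-point error

# Shift the phase by a quarter cycle and the same tone survives at full scale:
y = np.cos(2 * np.pi * f * n / fs)   # cos(pi * n) alternates +1, -1
print(y[:4])
```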
This is a nice video. But I'm wondering: do we even need to get back the original signal from the samples? The zero-order hold output actually contains the same audible frequencies, doesn't it? If we only want to listen to it, the stepped wave would be enough.
> All audio signal is _perfectly_ represented in a digital form
What? No. All bandwidth-limited signals are. Which means periodic. Causal signals like audio can be approximated, with tradeoffs, such as pre-ringing (look at sinc(x), used to reconstruct the sampled signal — how much energy is in the lobe preceding x=0).
Is the approximation achieved by filtering the 44.1kHz DAC good enough? Yes, yes it is. But the math is way more involved (i.e. beyond me) than simply "Nyquist".
This popular myth, that the limited range of frequencies we can hear and a limited spectrum in the Fourier-transform sense are the same thing, is quite irritating.
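The pre-ringing point can be put in numbers: the sinc kernel is symmetric, so half of its energy sits before its center, i.e. before the sample's own instant in time. A quick numeric check on a coarse grid (pure numpy):

```python
import numpy as np

# How much of the sinc reconstruction kernel's energy precedes its center?
# np.sinc(t) = sin(pi t) / (pi t); integrate sinc^2 numerically.
dt = 1e-3
t = np.arange(-200.0, 200.0, dt)
k2 = np.sinc(t) ** 2
total = k2.sum() * dt
pre = k2[t < 0].sum() * dt
print(pre / total)   # ~0.5: half the kernel's energy is pre-ringing
```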
As a real world example, on Windows, unless you take exclusive access of the audio output device, everything is already resampled to 48khz in the mixer. Well, technically it gets resampled to the default configured device sample rate, but I haven't seen anything other than 48khz in at least a decade if ever. Practically this is a non-issue, though I could understand wanting bit-perfect reproduction of a 44.1 khz source.
> We do [cubic curve fitting] all the time in image processing, and it works very well. It would probably work well for audio as well, although it's not used -- not in the same form, anyway -- in these applications.
Is there a reason the solution that "works very well" for images isn't/can't be applied to audio?
The short answer is that our eyes and ears use very different processing mechanisms. Our eyes sense using rods and cones, whose distribution reflects the spatial layout of the image. Our ears instead work by performing an analogue Fourier transform and sensing the frequencies. If you take an image and add lots of very-high-frequency noise, the result will be almost indistinguishable, but if you do the same to audio it will sound like a complete mess.
> it's probably worth avoiding the resampling of 44.1 to 48 kHz
Ehhm, yeah, duh? You don't resample unless there is a clear need, and even then you don't upsample and only downsample, and you tell anyone that tries to convince you otherwise to go away and find the original (analog) source, so you can do a proper transfer.
That seems a rather shallow - and probably incorrect - reading of the source. This is an efficiency and trust trade-off, as noted:
> given sufficient computing resources, we can resample 44.1 kHz to 48 kHz perfectly. No loss, no inaccuracies.
and then further
> Your smartphone probably can resample 44.1 kHz to 48 kHz in such a way that the errors are undetectable even in theory, because they are smaller than the noise floor. Proper audio equipment can certainly do so.
That is, you don't need the original source to do a proper transfer. The author is simply noting
> Although this conversion can be done in such a way as to produce no audible errors, it's hard to be sure it actually is.
That is, re-sampling is not a bad idea in this case, because it isn't going to have any sort of error if done properly; it's just that the author notes you cannot trust any random given re-sampler to do so.
Therefore if you do need to resample, you can do so without the analog source, as long as you have a re-sampler you can trust, or do it yourself.
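For the concrete ratio: 48000/44100 reduces to 160/147, so a rational (polyphase) resampler upsamples by 160 and downsamples by 147. A minimal sketch of doing it yourself, assuming scipy is available:

```python
import numpy as np
from math import gcd
from scipy.signal import resample_poly

fs_in, fs_out = 44_100, 48_000
g = gcd(fs_out, fs_in)
up, down = fs_out // g, fs_in // g   # 160, 147

t = np.arange(fs_in) / fs_in                 # one second of audio
x = np.sin(2 * np.pi * 1000 * t)             # 1 kHz test tone
y = resample_poly(x, up, down)               # polyphase resample to 48 kHz

print(len(x), len(y))                        # 44100 samples in, 48000 out
```

The quality then comes down to the anti-imaging filter `resample_poly` applies, which is exactly the part the author says you have to trust.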
I'm working on a game. My game stores audio files as 44.1kHz .ogg files. If my game is the only thing playing audio, then great, the system sound mixer can configure the DAC to work in 44.1kHz mode.
But if other software is trying to play 48kHz sound files at the same time? Either my game has to resample from 44.1kHz to 48kHz before sending it to the system, or the system sound mixer needs to resample it to 48kHz, or the system sound mixer needs to resample the other software from 48kHz to 44.1kHz.
If 44.1kHz is otherwise sufficient but you have a downstream workflow that is incompatible, there are arguments for doing this. It can be done with no loss in quality.
From an information theory perspective, this is like putting a smaller pipe right through the middle of a bigger one. The channel capacity is the only variable that is changing and we are increasing it.
A very common clear need is incorporating 44.1kHz audio sources into video. 48kHz is 48kHz because 48,000 divided by 24, 25, or 30 fps gives a whole number of samples per frame, while 44,100 / 24 = 1,837.5 does not.
Also, for decades upsampling on ingest and downsampling on egress has been standard practice for DSP because it reduces audible artifacts from truncation and other rounding techniques.
Finally, most recorded sound does not have an original analog source, because of the access digital recording has created: YouTube, for example.
I wonder if this problem could be "solved" by having some kind of "dual mode" DACs that can accept two streams of audio at different sample rates, likely 44.1khz and 48khz, which are converted to analog in parallel and then mixed back together at the analog output.
Then, at the operating system level, rather than mixing everything into a single audio stream at a single sample rate, you group each stream that is at (or a multiple of) either 44.1kHz or 48kHz, and finally send both streams to this "dual DAC", thus eliminating the need to resample any 44.1kHz or 48kHz stream, and vastly simplifying the resampling of any sample rate that is a multiple of either.
> I wonder if this problem could be "solved" by having some kind of "dual mode" DACs that can accept two streams of audio at different sample rates, likely 44.1khz and 48khz, which are converted to analog in parallel and then mixed back together at the analog output.
You'd just resample both to 192kHz and run that into a 192kHz DAC. The "headroom" means you don't need to use the very CPU-intensive "perfect" resampler.
I'm kinda shocked that there's no discussion of sinc interpolation and adapting its theoretical need for infinite signals to some finite kernel length.
For a sampled signal, if you know the sampling satisfied Nyquist (i.e., there was no frequency content above fs/2) then the original signal can be reproduced exactly at any point in time using sinc interpolation. Unfortunately that theoretically requires an infinite length sample, but the kernel can be bounded based on accuracy requirements or other limiting factors (such as the noise which was mentioned). Other interpolation techniques should be viewed as approximations to sinc.
Sinc interpolation is available on most oscilloscopes and is useful when the sample rate is sufficient but not greatly higher than the signal of interest.
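A toy version of that bounded-kernel idea, where the half-width of 64 samples and the Hann taper are arbitrary illustrative choices rather than anything a scope vendor actually uses:

```python
import numpy as np

def sinc_interp(samples, fs, times, half_width=64):
    """Evaluate a sampled signal at arbitrary times via truncated,
    Hann-tapered sinc interpolation. Assumes the signal was band-limited
    below fs/2 before sampling. Unoptimized sketch."""
    times = np.atleast_1d(times)
    out = np.zeros(len(times))
    for i, ti in enumerate(times):
        center = int(round(ti * fs))
        n = np.arange(max(0, center - half_width),
                      min(len(samples), center + half_width + 1))
        arg = ti * fs - n                                  # distance in samples
        taper = 0.5 * (1 + np.cos(np.pi * arg / (half_width + 1)))
        out[i] = np.sum(samples[n] * np.sinc(arg) * taper)
    return out

fs = 44_100
n = np.arange(2048)
x = np.sin(2 * np.pi * 1000 * n / fs)         # a 1 kHz tone, sampled
t = np.array([0.0123, 0.01234, 0.015432])     # arbitrary, off-grid times
print(sinc_interp(x, fs, t))                  # close to sin(2*pi*1000*t)
```

Widening `half_width` trades compute for accuracy, which is the "bounded based on accuracy requirements" knob.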
> In reality, the amount of precision that can actually be "heard" by the human ear probably lies between 18 and 21 bits; we don't actually know, because it's impossible to test.
This sounds contradictory - what would be the precision that can be heard in a test then?
Lots of Live/Audigy era Creative sound cards would resample everything to 48kHz, with probably one of the worst quality resamplers available, to the chagrin of all bitperfect fanatics... still probably one of their best selling sound cards.
I had a Soundblaster Live! Gold card back in the day, and I would route my record player or stereo through it so I could use a visualizer on my computer. You could hear the digital noise that was introduced on the hi-hats. And the source for the sound was a late-'70s-era Realistic system where everything was analogue. I never knew it was because of the sound card. I'd always just chalked it up to either Windows XP or VLC doing something.
I'm not sure I understand the "just generate it" perspective. If you want to generate a much higher sampling rate signal that has a common multiple of your input and output sampling rate, "just generating it" is going to involve some kind of interpolation, no? Because you're trying to make data that isn't there.
If you want to change the number of slices of pizza, you can't just make 160x more pizza out of thin air.
Personally I'd just do a cubic resample if absolutely required (ideally you don't resample ofc); it's fast and straightforward.
Edit: serves me right for posting, I gotta get off this site.
Maybe the following helps: if you have an analog signal with no frequencies above 22.05kHz, it is in principle possible to sample it at 44.1kHz and then perfectly reconstruct the original signal from those samples. You could also represent the same analog signal using 48kHz samples. The key to resampling is not finding a nice-looking interpolation, but rather one that corresponds to the original analog signal.
You generally want to dither when (before) you quantize, unless you have so much headroom that it doesn't matter. E.g., if you're converting 44.1/16 to 48/16 (which involves quantizing each sample from an intermediate higher-precision result), you probably want to dither, but if you're converting 44.1/24 to 48/24, you probably won't need to care since you don't really care about whether your effective result is 24 or 24.5–25 bits.
That's more of a bit depth rather than bit rate thing. I was surprised to find that going from 16 to 8 bits by simply truncating gave really obvious artifacts on certain sounds (a sampled 808 kick for example had a distinct BZZZEEEOOOOWWWW sound, quite prominent), and even really simple triangular noise dithering made it go away. It did mean the playback was more noisy but it was less obvious.
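A minimal sketch of that experiment, using rounding rather than truncation and a low tone standing in for the kick (the 2-LSB TPDF dither is the "really simple triangular noise" mentioned):

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 44_100
t = np.arange(fs) / fs
x = 0.3 * np.sin(2 * np.pi * 60 * t)   # quiet low tone, like an 808 kick's body

def to_8bit(signal, dither=False):
    """Quantize a float signal in [-1, 1] to 8 bits. With dither=True,
    add 2-LSB peak-to-peak triangular (TPDF) noise before rounding."""
    scaled = signal * 127.0
    if dither:
        scaled = scaled + (rng.random(len(signal)) - rng.random(len(signal)))
    return np.round(scaled).astype(np.int8)

err_plain = to_8bit(x) / 127.0 - x
err_dith = to_8bit(x, dither=True) / 127.0 - x

# Undithered error is slightly smaller but correlated with the signal
# (audible distortion spurs); dithered error is larger but noise-like.
print(np.std(err_plain), np.std(err_dith))
spur = lambda e: np.abs(np.fft.rfft(e)).max()
print(spur(err_plain), spur(err_dith))   # tallest spectral line of each error
```

That is exactly the trade described: the dithered version measures noisier overall, but the error no longer tracks the signal.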
As an aside, G.711 codecs use a kind of log scale with only four bits of mantissa, where small signal values get much smaller quantization steps.
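The shape of that log scale can be sketched with the idealized µ-law curve (µ = 255); the real G.711 codec uses a segmented piecewise-linear approximation of this, so the code below is a simplification:

```python
import numpy as np

MU = 255.0  # mu-law parameter used by G.711 (North America / Japan)

def mu_compress(x):
    """Map x in [-1, 1] through the mu-law curve before uniform quantization."""
    return np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)

def mu_expand(y):
    """Inverse of mu_compress."""
    return np.sign(y) * ((1 + MU) ** np.abs(y) - 1) / MU

def g711ish(x, bits=8):
    """Idealized companded quantizer: compress, round to `bits`, expand."""
    levels = 2 ** (bits - 1) - 1
    q = np.round(mu_compress(x) * levels) / levels
    return mu_expand(q)

for amp in (1.0, 0.1, 0.01):
    x = amp * np.sin(np.linspace(0, 2 * np.pi, 1000, endpoint=False))
    err = np.max(np.abs(g711ish(x) - x))
    print(amp, err)   # absolute error shrinks along with the signal
```

The printout shows the point of companding: quiet signals get proportionally fine steps, so the error scales with the signal instead of staying fixed.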
If you're taking something from 44.1 to 48, only 91.875% of the data is real, so 8.125% of the resulting upsampled data is invented. Some of it will correlate with the original, real sound. If you use upsampling functions tuned to features of the audio - style, whether it's music, voice, bird recordings, NYC traffic, known auditorium, etc, you can probably bring the accuracy up by several percent. If the original data already has the optimizations, it'll be closer to 92%.
If it's really good AI upsampling, you might get qualitatively "better" sounding audio than the original but still technically deviates from the original baseline by ~8%. Conversely, there'll be technically "correct" upsampling results with higher overall alignment with the original that can sound awful.
There's still a lot to audio processing that's more art than science.
everfrustrated|1 month ago
I am ashamed to admit this took me a long time to properly understand. For further reading I'd recommend:
https://people.xiph.org/~xiphmont/demo/neil-young.html
https://www.youtube.com/watch?v=cIQ9IXSUzuM
comprev|1 month ago
I buy loads of DJ music on Bandcamp and "downsample" (I think the term is) to 16-bit if they only offer 24-bit, for smaller size and wider compatibility.
lucyjojo|1 month ago
The article explains why.
tl;dr: the formula for regenerating the signal at time t uses infinitely many samples from the past and future.
mort96|1 month ago
Unless I'm missing something?
AshamedCaptain|1 month ago
I.e. no one cares.
somat|1 month ago
Makes me think of GPS where the signal is below the noise floor. Which still blows my mind, real RF black magic.
mistrial9|1 month ago
Source: I wrote dithering code for digital images.