edbaskerville|4 months ago
The article also describes a theory that human speech evolved to occupy an unoccupied region of frequency vs. envelope-duration space. It makes no explicit connection between that fact and the type of transform the ear does, but one would suspect that the specific characteristics of the human cochlea might be tuned to human speech while still being able to process environmental and animal sounds sufficiently well.
A more complicated hypothesis off the top of my head: the location of human speech in frequency/envelope is a tradeoff between (1) occupying an unfilled niche in sound space; (2) optimal information density taking brain processing speed into account; and (3) evolutionary constraints on physiology of sound production and hearing.
crazygringo|4 months ago
Nobody who knows anything about signal processing has ever suggested that the ear performs a Fourier transform across infinite time.
But the ear does perform something very much akin to the FFT (fast Fourier transform), turning discrete samples into intensities at frequencies -- which is, of course, what any reasonable person means when they say the ear does a Fourier transform.
This article suggests it's accomplished by something between wavelet and Gabor. Which, yes, is not exactly a Fourier transform -- but it's producing something that is about 95-99% the same in the end.
And again, nobody would ever suggest the ear was performing the exact math that the FFT does, down to the last decimal point. But these filters still work essentially the same way as the FFT in terms of how they respond to a given frequency, it's really just how they're windowed.
So if anyone just wants a simple explanation, I would say yes the ear does a Fourier transform. A discrete one with windowing.
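For anyone who wants that simple explanation made concrete, here is a minimal numpy sketch of a single windowed-DFT frame (the sample rate, tone, and window length are arbitrary choices for illustration, not anything from the article):

```python
import numpy as np

fs = 8000                                            # sample rate, arbitrary
x = np.sin(2 * np.pi * 440 * np.arange(1024) / fs)   # a 440 Hz tone

# One "windowed DFT" frame: taper a short slice, then transform it.
N = 256
frame = x[:N] * np.hanning(N)
spectrum = np.abs(np.fft.rfft(frame))

peak_hz = np.argmax(spectrum) * fs / N
print(peak_hz)   # near 440 (bin spacing is fs/N = 31.25 Hz)
```

Sliding that frame along the signal and stacking the spectra is exactly the windowed, discrete transform described above.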
anyfoo|4 months ago
First, I think when you say FFT, you mean DFT. A Fourier transform is both non-discrete and infinite in time. A DTFT (discrete-time Fourier transform) is discrete, i.e. uses samples, but is infinite. A DFT (discrete Fourier transform) is both finite (the analyzed data has a start and an end) and discrete. An FFT is effectively an implementation of a DFT, and nothing indicates to me that hearing is in any way specifically related to how the FFT computes a DFT.
But more importantly, I'm not sure DFT fits at all? This is an analog, real-world physical process, so where is it discrete, i.e. how does the ear capture samples?
I think, purely based upon its "mode", what's happening is more akin to a Fourier series, which is the missing fourth category completing the set (FT, DTFT, DFT): continuous (non-discrete), but finite, or rather periodic, in time.
But secondly, unlike Gabor transforms, wavelet transforms are specifically not just windowed Fourier anythings (whether FT/FS/DFT/DTFT). Those would commonly be called "short-time Fourier transforms" (STFT, again existing in discrete and non-discrete variants), and the article straight up mentions in its footnotes that they don't fit either.
Wavelet transforms use an entirely different shape (e.g. a Haar wavelet) that is shifted and stretched for analysis, instead of windowed sinusoids over a windowed signal.
And I think those distinctions are what the article actually wanted to touch upon.
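A hedged numpy sketch of that distinction: a discretized Haar wavelet is a shifted, stretched shape with no sinusoid in it, and a wavelet coefficient is just an inner product with one such copy (the signal, scale, and shift here are arbitrary choices):

```python
import numpy as np

def haar(n_samples, scale, shift):
    # Discretized Haar wavelet: +1 on the first half of its support, -1 on
    # the second, shifted and stretched -- no sinusoid anywhere in sight.
    w = np.zeros(n_samples)
    w[shift : shift + scale // 2] = 1.0
    w[shift + scale // 2 : shift + scale] = -1.0
    return w / np.sqrt(scale)       # unit energy at every scale

# A square wave with period 16, a shape the Haar family matches well.
x = np.tile(np.concatenate([np.ones(8), -np.ones(8)]), 4)

# One wavelet coefficient: an inner product with one shifted/stretched copy.
coeff = x @ haar(64, scale=16, shift=0)
print(coeff)   # 4.0: a large response where the wavelet lines up with the signal
```

Sweeping `scale` and `shift` over a dyadic grid would give the full discrete wavelet analysis.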
kragen|4 months ago
This description applies equally well to the discrete wavelet, discrete Gabor, and maybe even Hadamard transforms, which are definitely not, as you assert, "95–99% the same in the end" (how would you even measure such similarity?). So it is not something any reasonable person has ever meant by "the Fourier transform" or even "the discrete Fourier transform".
Also, you seem to be confused about what "discrete" means in the context of the Fourier transform. The ear functions in continuous time and does not take discrete samples.
a-dub|4 months ago
this is the time-frequency uncertainty principle. intuitively it can be understood by thinking about wavelength. the more stretched out the waveform is in time, the more of it you need to see in order to have a good representation of its frequency, but the more of it you see, the less precise you can be about where exactly it is.
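a quick numpy illustration of that tradeoff (sample rate, tone, and window sizes are arbitrary choices): the fewer samples of the waveform you see, the wider and less precise its spectral peak becomes.

```python
import numpy as np

fs = 1000
x = np.sin(2 * np.pi * 100 * np.arange(fs) / fs)   # one second of a 100 Hz tone

def peak_width_hz(n):
    # width of the spectral peak when only n samples are observed
    spec = np.abs(np.fft.rfft(x[:n] * np.hanning(n), 8192))
    return np.count_nonzero(spec > spec.max() / 2) * fs / 8192

short, long_ = peak_width_hz(64), peak_width_hz(512)
print(short, long_)   # the short window's peak is several times wider
```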
> but it does do a time-localized frequency-domain transform akin to wavelets
maybe easier to conceive of first as an arbitrarily defined filter bank based on physiological results rather than trying to jump directly to some neatly defined set of orthogonal basis functions. additionally, orthogonal basis functions cannot, by definition, capture things like masking effects.
> A more complicated hypothesis off the top of my head: the location of human speech in frequency/envelope is a tradeoff between (1) occupying an unfilled niche in sound space; (2) optimal information density taking brain processing speed into account; and (3) evolutionary constraints on physiology of sound production and hearing.
(4) size of the animal.
notably: some smaller creatures have ultrasonic vocalization and sensory capability. sometimes this is hypothesized to complement visual perception for avoiding predators, but it could also just have a lot to do with the fact that, well, they have tiny articulators and tiny vocalizations!
Terr_|4 months ago
Now I'm imagining some alien shrew with vocal cords (or a syrinx, or whatever) running the entire length of its body, just so that it can emit lower-frequency noises for some reason.
bonoboTP|4 months ago
It's called the short-time Fourier transform (STFT).
https://en.wikipedia.org/wiki/Short-time_Fourier_transform
IshKebab|4 months ago
Nobody who knows literally anything about signal processing thought the ear was doing a Fourier transform. Is it doing something like an STFT? Obviously yes, and this article doesn't go against that.
cherryteastain|4 months ago
[1] https://en.wikipedia.org/wiki/Spectrogram
km3r|4 months ago
I wonder if these could be used to better master movies and television audio such that the dialogue is easier to hear.
lgas|4 months ago
What would it mean for a sound to not be localized in time?
hansvm|4 months ago
Zooming in to cartoonish levels might drive the point home a bit. Suppose you have sound waves. What is the frequency exactly 1/3 of the way between the first two wave peaks? It's a nonsensical question. The frequency relates to the time delta between peaks, and looking locally at a sufficiently small region of time gives no information about that phenomenon.
Let's zoom out a bit. What's the frequency over a longer period of time, capturing a few peaks? Well... if you know there is only one frequency then you can do some math to figure it out, but as soon as you might be describing a mix of frequencies you suddenly, again, potentially don't have enough information.
That lack of information manifests in a few ways. The exact math (Shannon's theorems?) suggests some things, but the language involved mismatches human perception badly enough that people get burned trying to apply it too directly. E.g., a bass beat with a bit of clock skew is very different from a bass beat as far as a careless decomposition is concerned, but the difference is likely not observable by a human listener.
Not being localized in time means* you look at longer horizons, considering more and more of those interactions. Instead of the beat of a 4/4 song meaning that the frequency changes at discrete intervals, it means that there's a larger, over-arching pattern capturing "the frequency distribution" of the entire song.
*Truly time-nonlocalized sound is of course impossible, so I'm giving some reasonable interpretation.
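A small numpy sketch of that "mix of frequencies" ambiguity (the frequencies and window sizes are arbitrary choices): a pure tone and a close two-tone mix are nearly indistinguishable over a tiny window, but diverge over a long one.

```python
import numpy as np

fs = 8000
t = np.arange(fs) / fs                 # one second of samples
pure = np.sin(2 * np.pi * 100 * t)
# A 99 Hz + 101 Hz mix equals a 100 Hz tone under a slow 1 Hz beat envelope.
mix = 0.5 * np.sin(2 * np.pi * 99 * t) + 0.5 * np.sin(2 * np.pi * 101 * t)

short_err = np.max(np.abs(pure[:80] - mix[:80]))   # first 10 ms: nearly identical
long_err = np.max(np.abs(pure - mix))              # full second: the beat shows
print(short_err, long_err)
```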
kragen|4 months ago
Of course, none of these are completely nonlocalized in time. Sooner or later there will be a blackout and the transformer will go silent. But it's a lot less localized than the chirp of a bird.
xeonmc|4 months ago
Imagine the dissonant sound of hitting a trashcan.
Now imagine the sound of pressing down all 88 keys on a piano simultaneously.
Do they sound similar in your head?
The localization occurs where the phases of all frequency components align coherently and construct into a pulse, while further along in time their phases fall out of alignment and cancel each other out.
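A hedged numpy sketch of that phase-alignment picture (64 components standing in for the 88 keys, all parameters arbitrary): summing cosines whose phases all line up at one instant produces a pulse there, and near-cancellation everywhere else.

```python
import numpy as np

n = 1024
t = np.arange(n)
# Sum 64 cosines whose phases all line up at t = 0 (like all keys at once).
pulse = sum(np.cos(2 * np.pi * k * t / n) for k in range(1, 65))

peak = pulse[0]                                  # everything adds coherently here
away = np.max(np.abs(pulse[n // 8 : -n // 8]))   # elsewhere the phases cancel
print(peak, away)   # the aligned instant towers over the rest
```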
dsp_person|4 months ago
We can make a short-time Fourier transform or a wavelet transform in the same way, either by:
- a filterbank approach, integrating signals in time
- taking the Fourier transform of time slices, integrating in frequency
The same machinery, just with different filters.
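A small numpy sketch of that equivalence (window, bin, and signal are arbitrary choices): one STFT bin computed by windowing time slices matches the same bin computed by sliding one filter of the bank, i.e. the window modulated to that bin's frequency, along the signal.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(512)
N = 64
w = np.hanning(N)
k = 5                                   # one frequency bin, chosen arbitrarily

# View 1: window a time slice, take its DFT, keep bin k.
def stft_bin(m):
    return np.fft.fft(x[m:m + N] * w)[k]

# View 2: one filter of the bank -- the window modulated to bin k's
# frequency -- slid along the signal (convolving with the flipped filter
# is a sliding correlation).
h = w * np.exp(-2j * np.pi * k * np.arange(N) / N)
filtered = np.convolve(x, h[::-1])

m = 100
print(np.allclose(stft_bin(m), filtered[m + N - 1]))   # True
```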
psunavy03|4 months ago
Well, from an evolutionary perspective this would be unsurprising, considering any other form of language would have been ill-suited for the purpose and died out. This is really just a flavor of the anthropic principle.
FarmerPotato|4 months ago
Why do we need a summary in a post that adds nothing new to the conversation?
AreYouElite|4 months ago
I'm no expert in these matters, just speculating...