peepwaah | 2 years ago

Can you recommend any good references for learning to read spectrograms? I work on DL-based noise cancellation, and a major part of my work involves analyzing spectrograms. I find it very difficult to do my work without being able to critically analyze these images. Any help from anybody?

picture|2 years ago

What do you mean by "understanding the Spectrogram"? The graph itself is straightforward: the x axis is time, the y axis is frequency. The brightness of each pixel represents the intensity of a certain frequency component at a certain point in time.

If you're referring to generating spectrograms with Fourier transforms, you will need some math background to properly do the calculation by hand. It largely boils down to "find the amount of each frequency over time".
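To make "find the amount of each frequency over time" concrete, here is a minimal sketch of a short-time Fourier transform in NumPy. The function name `stft_magnitude`, the frame length, the hop size, the sample rate, and the two-tone test signal are all assumptions chosen for illustration, not anything from the thread.

```python
import numpy as np

def stft_magnitude(signal, frame_len=256, hop=128):
    """Return a (freq_bins, frames) array: rows are frequency, columns are time."""
    window = np.hanning(frame_len)  # taper each frame to reduce spectral leakage
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # One FFT per frame; rfft keeps the frame_len // 2 + 1 non-negative frequencies
    return np.abs(np.fft.rfft(frames, axis=1)).T

sample_rate = 8000  # Hz, assumed for this demo
t = np.arange(sample_rate) / sample_rate  # one second of samples
# Test signal: 500 Hz for the first half second, 2000 Hz for the second half
tone = np.where(t < 0.5, np.sin(2 * np.pi * 500 * t), np.sin(2 * np.pi * 2000 * t))

spec = stft_magnitude(tone)
freqs = np.fft.rfftfreq(256, d=1 / sample_rate)  # frequency in Hz of each row

first_peak = freqs[spec[:, 5].argmax()]   # strongest frequency in an early frame
last_peak = freqs[spec[:, -5].argmax()]   # strongest frequency in a late frame
print(spec.shape, first_peak, last_peak)
```

Each column of `spec` is one vertical slice of the picture, so plotting it with time on the x axis and `freqs` on the y axis gives the spectrogram described above.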

Last question: if this is the premise of your work, shouldn't you know about it already?

HarHarVeryFunny|2 years ago

For human speech:

o The tall vertical lines reflect "plosives" - sudden releases of sound energy, often at the beginning of words, from having the mouth/airway closed and then opened, as in the first letter of "put" or "tea"

o The high frequencies come from "fricatives" like the first letter of "see" or "free" where air is being passed through the teeth or almost closed lips

o The lower frequencies are where most of the recognizable speech content is, corresponding to the way the resonant frequencies of the mouth and throat are being changed (articulation) by moving the tongue, lips and teeth. Specifically the speech content is in changes to the "formants" which are the changing resonant frequencies showing up as bright mostly horizontal bands in the lower frequencies

Noise may show up in various ways depending on what the noise source is. A background hum with a fixed frequency spectrum is going to show up as one or more horizontal frequency bands across the entire spectrogram. High frequency noise is going to show up as much more energy in the higher frequencies, which carry little energy in clean speech (fricatives only).
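One way to check for the "horizontal band" signature numerically is to take the median of each frequency row over time: content that comes and goes is suppressed, while a steady hum stays high. The sketch below is only an illustration under made-up assumptions: the "speech" is gated white noise, the hum is a synthetic 60 Hz sine, and the frame sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
sample_rate = 8000  # Hz, assumed
t = np.arange(2 * sample_rate) / sample_rate  # two seconds

# Stand-in for speech: noise bursts that switch on and off four times in total
bursts = rng.normal(size=t.size) * (np.sin(2 * np.pi * 2 * t) > 0)
hum = 0.5 * np.sin(2 * np.pi * 60 * t)  # synthetic steady mains-style hum
audio = bursts + hum

# Magnitude spectrogram, shape (freq_bins, frames)
frame_len, hop = 1024, 512
window = np.hanning(frame_len)
n_frames = 1 + (audio.size - frame_len) // hop
frames = np.stack([audio[i * hop : i * hop + frame_len] * window
                   for i in range(n_frames)])
spec = np.abs(np.fft.rfft(frames, axis=1)).T
freqs = np.fft.rfftfreq(frame_len, d=1 / sample_rate)

# Median over time per frequency row: high only where energy is always present
steady = np.median(spec, axis=1)
hum_freq = freqs[steady.argmax()]
print(f"steadiest band is near {hum_freq:.1f} Hz")
```

The reported frequency is quantized to the bin spacing (about 7.8 Hz with these settings), so it lands near 60 Hz rather than exactly on it. Real recordings would need a longer window or a threshold on `steady` to separate several hum harmonics.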

djsamseng|2 years ago

Thanks for sharing this! I didn’t know about these terms before. Ever considered writing a blog post/tutorial on your knowledge of human speech in spectrograms? This is much more digestible than most of what’s out there.

djsamseng|2 years ago

This is a pretty good introductory primer. https://medium.com/analytics-vidhya/understanding-the-mel-sp...

1. STFT (get frequencies from the audio signal)

2. Log scale / decibel scale (since we perceive loudness roughly on a log scale)

3. Optionally convert to the Mel scale (filters that approximate how humans perceive pitch)
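The three steps above can be sketched in plain NumPy. This is a rough illustration, not the linked article's code: the helper names, the 40-band filterbank, the frame sizes, and the 440 Hz placeholder signal are assumptions, and the Hz-to-mel formula is one common variant among several. The mel filters are applied to the power spectrogram before taking decibels, which is a common ordering.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)  # one common mel formula

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters evenly spaced in mel, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.fft.rfftfreq(n_fft, d=1 / sample_rate)
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sample_rate / 2), n_mels + 2))
    bank = np.zeros((n_mels, fft_freqs.size))
    for m in range(n_mels):
        lo, mid, hi = edges[m], edges[m + 1], edges[m + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        bank[m] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

sample_rate, n_fft, hop = 16000, 512, 256  # assumed settings
t = np.arange(sample_rate) / sample_rate
audio = np.sin(2 * np.pi * 440 * t)  # placeholder signal; substitute real audio

# 1. STFT: windowed frames -> power per frequency bin, shape (257, frames)
window = np.hanning(n_fft)
n_frames = 1 + (audio.size - n_fft) // hop
frames = np.stack([audio[i * hop : i * hop + n_fft] * window
                   for i in range(n_frames)])
power = np.abs(np.fft.rfft(frames, axis=1)).T ** 2

# 2. Decibel scale (the floor avoids log of zero)
spec_db = 10.0 * np.log10(np.maximum(power, 1e-10))

# 3. Optional mel scale: pool FFT bins into 40 perceptually spaced bands
mel_power = mel_filterbank(40, n_fft, sample_rate) @ power
mel_db = 10.0 * np.log10(np.maximum(mel_power, 1e-10))

loudest_band = int(mel_db.mean(axis=1).argmax())
print(spec_db.shape, mel_db.shape, loudest_band)
```

Libraries such as librosa wrap these steps in single calls; writing them out mainly helps show what each axis and pixel value in the final image means.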

Happy to answer any questions

peepwaah|2 years ago

Thanks for taking the time to share the link. I'm fairly comfortable with most of the theoretical aspects of STFT/FFT/Mel scale etc., but when I look at a spectrogram I still feel I'm missing something. When I look at the spectrogram, I want to know how clear the speech in the audio is: is there background noise, is there reverb, is there a loss anywhere? I have a feeling these things can be learned from analyzing spectrograms, but I'm not sure how to do it. Hence the question.