Why estimate the PDF with a histogram and then convert it to a CDF, when one can estimate the CDF directly? Estimating it directly also avoids having to choose a bin width, which can have a substantial impact on the result.
Agreed -- it is very odd to use a parameter (bin width) in a nonparametric estimate. Just use the raw data. In numerical analysis, broadly speaking, integrals are stable while derivatives are wild; an empirical CDF is a nice, stable integral of the messy PDF.
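For anyone who wants to try the direct approach, here is a minimal sketch of an empirical CDF built from the raw samples with NumPy. The function name `ecdf` and the synthetic normal data are my own assumptions for illustration, not something from the post.

```python
import numpy as np

def ecdf(samples):
    """Return a function F where F(t) is the fraction of samples <= t."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size

    def F(t):
        # searchsorted with side="right" counts how many sorted samples are <= t.
        return np.searchsorted(x, t, side="right") / n

    return F

# Synthetic data, purely for illustration.
rng = np.random.default_rng(0)
data = rng.normal(size=1000)
F = ecdf(data)
cdf_on_grid = F(np.linspace(-3.0, 3.0, 7))  # evaluate at a few points
```

Note there is no bin width or any other tuning parameter here: the only inputs are the samples themselves.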
If the data is continuous, use kernel density estimation (KDE) instead of histograms to visualize the probability density, since KDE will give a smoother fit. A similar idea is to fit a mixture of normals -- there are numerous R packages for this, and sklearn.mixture.GaussianMixture in scikit-learn.
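To make the KDE suggestion concrete, here is a rough NumPy-only sketch of a 1-D Gaussian kernel density estimate. The function name `gaussian_kde_1d` is hypothetical, and the default bandwidth uses the normal-reference rule of thumb (1.06 * sigma * n^(-1/5)), which is my assumption and only one of several reasonable choices. KDE still has a smoothing parameter; it just trades bin width for bandwidth.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth=None):
    """Evaluate a Gaussian KDE of `samples` at the points in `grid`."""
    x = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = x.size
    if bandwidth is None:
        # Normal-reference rule of thumb (an assumption, not a universal best choice).
        bandwidth = 1.06 * x.std(ddof=1) * n ** (-1 / 5)
    # One standardized distance per (grid point, sample) pair.
    z = (grid[:, None] - x[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
    # Average the kernels and rescale by the bandwidth.
    return kernels.sum(axis=1) / (n * bandwidth)

# Synthetic data, purely for illustration.
rng = np.random.default_rng(0)
data = rng.normal(size=500)
grid = np.linspace(-6.0, 6.0, 1201)
density = gaussian_kde_1d(data, grid)
```

The estimate needs every sample at evaluation time, which is why memory use grows with the data set.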
Yep! The next post will be on kernel density estimation -- I wanted to start with histograms, as they are still a useful tool in 1-D and 2-D density estimation, and you don't have to store the data either (unlike KDE).
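The "no need to store the data" point can be sketched as a streaming histogram: fix the bin edges up front, then fold each chunk of data into a running count array and discard the chunk. The names `EDGES`, `update_counts`, and `to_density`, and the choice of range, are hypothetical and just for illustration.

```python
import numpy as np

# Hypothetical fixed bin edges, chosen before seeing the data.
# np.histogram ignores values that fall outside this range.
EDGES = np.linspace(-5.0, 5.0, 51)

def update_counts(counts, chunk, edges=EDGES):
    """Add one chunk's bin counts to the running totals."""
    chunk_counts, _ = np.histogram(chunk, bins=edges)
    return counts + chunk_counts

def to_density(counts, edges=EDGES):
    """Normalize counts so the histogram integrates to 1 over the binned range."""
    total = counts.sum()
    if total == 0:
        return np.zeros(counts.shape, dtype=float)
    return counts / (total * np.diff(edges))

# Simulated stream: each chunk is dropped after it has been counted.
rng = np.random.default_rng(0)
counts = np.zeros(EDGES.size - 1, dtype=np.int64)
for _ in range(20):
    chunk = rng.normal(size=500)
    counts = update_counts(counts, chunk)
density = to_density(counts)
```

Memory stays at one integer per bin regardless of how many samples go through, at the cost of committing to the bin edges in advance.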
bagrow|1 year ago
andrewla|1 year ago
sobriquet9|1 year ago
andrewla|1 year ago
Bostonian|1 year ago
vvanirudh|1 year ago