
Aggregating away the signal in your data

113 points| yurivish | 4 years ago |stackoverflow.blog | reply

11 comments

[+] lmkg|4 years ago|reply
One person's junk is another's treasure; one person's "normalized data" is another person's "you removed the one data point I cared most about!"

One thing this article is reinforcing to me is the value of domain knowledge to an analyst. I am deeply skeptical of "one size fits all" analysis tools, services, and consultancies for exactly this reason. Making insight actionable requires knowing what actions can be taken, and how.

[+] evrydayhustling|4 years ago|reply
I got hung up early on by the use of "aggregation"... these visualizations still aggregate data, by the necessity of mapping to a fixed number of pixels! However, the principle is strong: the author is proposing visualizations that make full use of the pattern matching over 2.5 dimensions that our eyes afford us, and by using that range they are able to make fewer assumptions about which summaries of data are sufficient.

Domain knowledge is still essential, both to pick meaningful projections of the data and to drill into patterns once observed. But since domain knowledge is always limited, it's nice to have techniques that allow you to notice patterns you didn't know well enough to summarize.

[+] klysm|4 years ago|reply
I agree, but I think the visualizations presented here can be useful in many domains and aren’t generally used. Furthermore, I think showing uncertainty in visualizations is hugely important and this is a step in the right direction there.
[+] jarenmf|4 years ago|reply
Excellent article. Aggregation can also obscure problems with sensors (for example, odd quantization or duplicated points). Whenever you have a high-frequency time series, it is useful to look at short segments of just a few data points at the highest resolution available.
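
To make the sensor point concrete, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the simulated readings, the 0.5-unit quantization step, the "every fifth sample repeats the previous one" fault, and the helper names `simulate_sensor`, `distinct_values` and `repeat_fraction`. It is not from the article. The idea is only that the mean of the faulty series looks fine, while two cheap checks on the raw points expose both problems.

```python
import random
import statistics


def simulate_sensor(n, seed=0):
    """Return (clean, faulty) readings; both series are hypothetical."""
    rng = random.Random(seed)
    clean = [20.0 + rng.gauss(0, 1) for _ in range(n)]
    faulty = []
    for i, x in enumerate(clean):
        if i % 5 == 4:
            # Assumed fault: every fifth sample repeats the previous one.
            faulty.append(faulty[-1])
        else:
            # Assumed fault: readings are quantized to 0.5-unit steps.
            faulty.append(round(x * 2) / 2)
    return clean, faulty


def distinct_values(xs):
    """Number of different values seen; very low counts hint at quantization."""
    return len(set(xs))


def repeat_fraction(xs):
    """Fraction of consecutive pairs that are exactly equal."""
    return sum(a == b for a, b in zip(xs, xs[1:])) / (len(xs) - 1)


if __name__ == "__main__":
    clean, faulty = simulate_sensor(1000)
    for name, xs in (("clean", clean), ("faulty", faulty)):
        print(
            name,
            "mean=%.3f" % statistics.mean(xs),
            "distinct=%d" % distinct_values(xs),
            "repeats=%.3f" % repeat_fraction(xs),
            "first points:", xs[:6],
        )
```

The two means should come out nearly identical, which is exactly why the aggregate alone would not flag anything; the distinct-value count, the repeat fraction, and the first handful of raw points are where the faults show up.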
[+] H8crilA|4 years ago|reply
I cannot recommend enough the "John Lamping - The One Weird Trick for Analyzing Big Data ... Eyeball it Early and Often!" video:

https://www.youtube.com/watch?v=jYH8CQS6Ab0

One of the best tips, straight from a practitioner: a former Google search ranking engineer who worked in several other domains later in his career. Stop tuning knobs and watching metrics, look at the data!

[+] shrx|4 years ago|reply
The full title is "Stop aggregating away the signal in your data".
[+] zzleeper|4 years ago|reply
I'm only halfway through the article, but I must say that it's amazing so far, and the dataviz is much better and more careful than what I regularly see.
[+] bsmithers|4 years ago|reply
Excellent article. Faceted visualisation is an incredibly powerful technique.

Something the author hints at but doesn't quite make explicit: manual inspection of individual examples from your dataset can help you understand what questions to ask, what category to facet on, or where the bug in your aggregation is.

[+] wodenokoto|4 years ago|reply
I think this is a piece written to promote observablehq and their visualization tools.

I think the graphs look great and I want to make similar stuff - does anyone have experience with Observable? Does it beat ggplot, Tableau and others?

[+] aunty_helen|4 years ago|reply
When I saw the yellow of the graph it made me think of Tufte's famous book. A few paragraphs in, I had to check that it wasn't actually authored by him.

The depth and knowledge the author displays is fantastic.