top | item 34073291

(no title)

p4l4g4 | 3 years ago

Did a lot of data wrangling this year. The usual grep, sed, awk, jq and even find have sped up my days significantly. Sed is among my favorites for whipping up quick, ad hoc transformations.

This year I added Miller [0] to my list: a tool to process tabular data, similar to sed, awk, etc. It handles CSV, TSV, JSON Lines, etc. in a consistent way. I like the delimited key-value pairs format, which lets me write simple one-liners in bash to collect some data (e.g. "ip=x.x.x.x,endpoint=/api/x") and use Miller to crunch the results. Not sure it saved me 100h, but it was one of the biggest time savers this year!

[0] https://miller.readthedocs.io/en/latest/
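
For anyone who hasn't seen the delimited key-value format, here is a minimal sketch of the idea in plain Python. It is not Miller and not how Miller is implemented; it only shows what "collect key=value lines, then crunch them" looks like. The helper names (`parse_dkvp`, `count_by`) and the sample IPs and endpoints are made up for illustration.

```python
from collections import Counter


def parse_dkvp(line, field_sep=",", pair_sep="="):
    """Split one 'key=value,key=value' record into a dict."""
    record = {}
    for pair in line.strip().split(field_sep):
        key, _, value = pair.partition(pair_sep)
        record[key] = value
    return record


def count_by(lines, key):
    """Count records per distinct value of `key`, skipping blank lines."""
    return Counter(parse_dkvp(line)[key] for line in lines if line.strip())


# Hypothetical sample records in the shape quoted in the comment above.
sample = [
    "ip=10.0.0.1,endpoint=/api/x",
    "ip=10.0.0.2,endpoint=/api/x",
    "ip=10.0.0.1,endpoint=/api/y",
]

# Emit the result in the same key=value shape, most frequent first.
for endpoint, n in count_by(sample, "endpoint").most_common():
    print(f"endpoint={endpoint},count={n}")
```

As far as I know, the Miller equivalent would be something along the lines of its `count` verb grouped by `endpoint`; check the docs at [0] for the exact invocation.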


qazxcvbnm|3 years ago

I second Miller - besides its extensive support for various formats, it is fast. If you ever have to deal with gigabyte-sized files, Miller will give noticeable speedups versus jq et al.

mattewong|3 years ago

FYI: https://github.com/BurntSushi/xsv is much faster than mlr (about an order of magnitude), and zsv (https://github.com/liquidaty/zsv) is even faster. But neither supports formulas. Disclaimer: I am one of the zsv authors.