top | item 37431817


o1y32 | 2 years ago

Honestly, doing line.split('\t') seems like a newbie mistake. I don't know how many columns each line has, but it could be a bottleneck. Wouldn't manually scanning for tab and non-tab characters help, in the same way that you don't load the entire log into memory but read it line by line? When you are dealing with a huge amount of data, you have to be extra careful, regardless of which language you use. That plus other optimizations could make Node.js run very efficiently.
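To make the manual scanning idea concrete, here is a rough sketch. `fieldAt` is a name I made up, and I'm assuming you only need a few columns from each line; whether it actually beats split() on their data would have to be measured.

```javascript
// Hypothetical helper: return the n-th (0-based) tab-separated field of a line
// without building an array of every column, as line.split('\t') does.
function fieldAt(line, n) {
  let start = 0;
  // Skip past the first n tabs to find where the wanted field begins.
  for (let i = 0; i < n; i++) {
    const tab = line.indexOf('\t', start);
    if (tab === -1) return undefined; // line has fewer than n + 1 fields
    start = tab + 1;
  }
  // The field ends at the next tab, or at the end of the line.
  const end = line.indexOf('\t', start);
  return end === -1 ? line.slice(start) : line.slice(start, end);
}

// Usage: pull only the third column out of a made-up log line.
const sample = '2023-09-08\tGET\t/index.html\t200';
console.log(fieldAt(sample, 2));
```

This allocates one string per field you ask for instead of one per column plus the array, which is where the garbage savings would come from on wide lines.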

Otherwise, I still need to be convinced that the "line" array is what's causing the trouble and not something else; the article doesn't provide any detail about how they found this problem. It is almost funny that the author tried to put this on Kubernetes without even attempting to optimize the JS code. I wish I had the luxury of telling my boss I should use more resources instead of fixing my code.


didntcheck|2 years ago

Yep. I've written code like this before for small inputs or one-off scripts, but I always thought it was obvious that doing it this way would be slow and produce lots of garbage. It's very nice, readable code, but for the hundreds of GBs they mention, it's clearly not ideal.