ellimilial's comments

ellimilial | 3 years ago | on: The fastest tool for querying large JSON files is written in Python (benchmark)

If it fits on a single machine: jq, flat files, JSON Lines / Avro if relatively flat. Switch to a tabular format if nesting is not required.
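
To illustrate the JSON Lines option, here is a minimal sketch of a streaming filter that holds only one record in memory at a time. The function name `filter_jsonl`, the sample records, and the field names (`lang`, `stars`) are made up for illustration and are not taken from the benchmark.

```python
import io
import json


def filter_jsonl(lines, predicate):
    """Yield parsed records from an iterable of JSON Lines text
    for which predicate(record) is true. Blank lines are skipped."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if predicate(record):
            yield record


# Hypothetical sample data; in practice this would be an open file handle.
sample = io.StringIO(
    '{"id": 1, "lang": "nim", "stars": 10}\n'
    '{"id": 2, "lang": "python", "stars": 25}\n'
    '\n'
    '{"id": 3, "lang": "python", "stars": 5}\n'
)

popular_python = list(
    filter_jsonl(sample, lambda r: r["lang"] == "python" and r["stars"] > 8)
)
```

Because each line is an independent JSON document, the same loop works on a file of any size, which is roughly why the format suits the single-machine case.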

Postgres JSONB works, but it requires maintaining a heavy server process. So does Lucene/Elasticsearch.

I have been yearning for an embeddable store (along the lines of SQLite, with support that both works and keeps the data compressed, like JSONB). I know there were some attempts; I tried some of those, mostly monstrosities.

ellimilial | 4 years ago | on: Why I Use Nim instead of Python for Data Processing

Some context might be useful.

From what I gather, the author is a researcher in a bioinformatics-related field. This may indicate that they tend to work either alone or in a relatively small group. The domain is small-scope data processing/manipulation and research/exploratory code, likely short-lived or even one-off.

Progress in this context will possibly be governed by sheer processing speed (e.g. it’s unlikely anyone will delve deep into the code, and there will be a lot of iterations to ‘just get it done’ instead of testing, etc.).

If this is more or less correct, the point that Nim might be more useful than Python for the author sounds very sensible to me. It’s a nice spot between command line tools and more functionality-loaded languages.

ellimilial | 4 years ago | on: Turing Oversold?

The team being 'all-British' for obvious security reasons. Which I imagine might have felt like adding insult to injury to the 'little people', who, despite cracking the code, were not permitted to continue working on it. Making them, you know, 'little people'.

ellimilial | 4 years ago | on: Flat Data

Hi Jason, thank you very much for the background and the explanation. It is fascinating to see the progress in this direction.

I started raising my eyebrow (in the best possible sense) upon seeing parts of the tooling very similar to ours but simpler and, more importantly, without moving parts. We operate in the biomedical data space and deal with flat/static data a lot; for example, we power https://biokeanos.com with data-in-repo, so Flat Data was immediately interesting.

It is really inspiring to see GitHub Actions making a foray in this direction, definitely something to keep an eye on.

ellimilial | 4 years ago | on: Flat Data

Thank you for the response and for clearing up the 'billion rows' / surly bonds confusion I had from reading the project's 'Why Flat Data?' section. I think I understand the target use case slightly better now.

One of the strong arguments for object-like storage (S3 etc) in the context of plain / flat data is scalability and availability for large scale processing frameworks. Databases are only occasionally relevant.

ellimilial | 4 years ago | on: Flat Data

Very interesting how GitHub comes up with more and more interesting 'actions' to turn repos into 'platforms' and moves us closer to a serverless future.

@idan how does it scale with the size (including storage)? Is 'a billion rows' a goal or an actual tested use case?
