top | item 43937269

zzbn00 | 9 months ago

Humans generate decisions / text information at rates of ~bytes per second at most. There are barely enough humans around to generate 21 GB/s of information, even if all they did was make financial decisions!

So 21 GB/s would be solely algos talking to algos... Given all the investment in the algos, surely they don't need to be exchanging CSV around?
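The arithmetic behind this claim can be sketched quickly. The per-human rate below is an assumed figure, not from the comment:

```python
# Back-of-envelope: how many humans would it take to sustain the feed?
# The 10 bytes/s per person is an assumed, generous figure for the
# rate at which a human can emit decisions as text.
feed_rate = 21e9          # bytes per second (21 GB/s, from the comment)
per_human = 10            # assumed bytes per second per person
people_needed = feed_rate / per_human
print(f"{people_needed:.2e} people")  # → 2.10e+09 people
```

Over two billion people making financial decisions simultaneously, around the clock, so the bulk of the traffic must indeed be machine-generated.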

wat10000|9 months ago

Standards (whether official or de facto) often aren't the best in isolation, but they're the best in reality because they're widely used.

Imagine you want to replace CSV for this purpose. From a purely technical view, this makes total sense. So you investigate, come up with a better standard, make sure it has all the capabilities everyone needs from the existing stuff, write a reference implementation, and go off to get it adopted.

First place you talk to asks you two questions: "Which of my partner institutions accept this?" "What are the practical benefits of switching to this?"

Your answer to the first is going to be "none of them" and the answer to the second is going to be vague hand-wavey stuff around maintainability and making programmers happier, with maybe a little bit of "this properly handles it when your clients' names have accent marks."

Next place asks the same questions, and since the first place wasn't interested, you have the same answers....

Replacing existing standards that are Good Enough is really, really hard.

hermitcrab|9 months ago

CSV is a questionable choice for a dataset that size. It's not very efficient in terms of size (real numbers take more bytes to store as text than as binary), it's not the fastest to parse (due to escaping), and a single delimiter or escape out of place corrupts everything afterwards. That's before you get to all the issues around encoding, different delimiters, etc.
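The "corrupts everything afterwards" failure mode is easy to demonstrate. A minimal sketch with Python's standard `csv` module (the data is made up): one unterminated quote makes the parser silently absorb every following row into a single field.

```python
import csv
import io

# Hypothetical feed snippet: row 2 contains a stray quote that is
# never closed, so the parser keeps reading everything after it
# (including the newline and row 3) as one quoted field.
data = (
    "id,name,amount\n"
    '1,"Acme, Inc.",100\n'
    '2,"Broken,200\n'      # opening quote never closed
    "3,Fine,300\n"
)

rows = list(csv.reader(io.StringIO(data)))
for row in rows:
    print(row)

# 4 input lines, but only 3 parsed rows: row 3 has been swallowed
# into row 2's unterminated field, with no error raised.
```

No exception, no warning; the data after the bad quote is just quietly wrong, which is exactly why a stream at this scale needs a format with framing or checksums.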

zzbn00|9 months ago

It's great when people need to be in the loop, looking at the data, maybe loading it in Excel, etc. (I use it myself...). But there aren't enough humans around for 21 GB/s.

jstimpfle|9 months ago

> (real numbers take more bytes to store as text than as binary)

Depends on the distribution of numbers in the dataset. It's quite common for most values to be small, and for those, text is a more efficient representation than binary, especially compared to 64-bit or larger fixed-width encodings.
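A quick sketch of the comparison, using Python's `struct` for a fixed-width int64 encoding (the sample values are arbitrary):

```python
import struct

# For small integers, decimal text (plus one delimiter byte) is
# shorter than a fixed-width 64-bit binary encoding, which always
# costs 8 bytes regardless of magnitude.
for v in [0, 7, 42, 123]:
    text_len = len(str(v).encode("ascii")) + 1   # +1 for a comma/newline
    bin_len = len(struct.pack("<q", v))          # little-endian int64
    print(f"{v}: text={text_len} bytes, binary={bin_len} bytes")
```

For `42`, that's 3 bytes of text versus 8 bytes of binary. Variable-length binary encodings (varints, as in Protocol Buffers) close this gap, but a naive fixed-width layout does not.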

cyral|9 months ago

The only real example I can think of is the US options market feed. It is up to something like 50 GiB/s now, and is open 6.5 hours per day. Even a small subset of the feed that someone may be working on for data analysis could be huge. I agree CSV shouldn't even be used here but I am sure it is.

nly|9 months ago

OPRA is a half dozen terabytes of data per day compressed.

CSV wouldn't even be considered.

adrianN|9 months ago

You might have accumulated some decades of data in that format and now want to ingest it into a database.

zzbn00|9 months ago

Yes, but if you have decades of data, what does it matter if you have to wait a minute, or ten, to convert it?

internetter|9 months ago

> Humans generate decisions / text information at rates of ~bytes per second at most

Yes, but the consequences of these decisions are worth much more. You attach an ID to the user, and an ID to the transaction. You store the location and time where it was made. Etc.

zzbn00|9 months ago

I think these would add only a small amount of information (and in a DB would be modelled with joins). It only adds a lot of data if done very inefficiently.

h4ck_th3_pl4n3t|9 months ago

You seem not to realize that most humans are not coders.

And non-coders use proprietary software, which usually has an export to CSV or XLS to be compatible with Microsoft Office.