_petronius | 9 years ago
We do a lot of CSV imports (inventory/data management for art galleries), for a few reasons, most of which boil down to "it's an easy format to share with clients":
- Clients that don't already have their data in an inventory management system usually have Excel spreadsheets, so we can just export that to CSV, parse it, and stick it in the database.
- If they really have nothing (some commercial galleries, especially ones that have been around for 30+ years, still do everything on index cards), you can give someone a spreadsheet template for the data entry and script the import step.
- Even if they do have a system, it's easy for the clients (and for us) to look through the data while preparing it for import and spot errors/inconsistencies that need to be corrected along the way.
- It's a small industry (data management services for commercial art galleries is maybe 2-3 companies in Europe, 1-2 in the US, and that's about it), all writing proprietary software with no semblance of agreed formats/standards. When a client quits one company and moves to another, it's easy to dump their data to CSV and leave the company doing the import to sort it out on the other end. Trying to piece together random-looking values across 15+ CSV files to work out what the last legacy system meant by some string is a pain, but it's something the client will pay for in most cases.
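The export-to-CSV-then-into-the-database pipeline described above can be sketched roughly like this. This is a minimal illustration, not anyone's real import code: the column names, sample rows, and in-memory SQLite table are all made up, and the incomplete-row check stands in for the kind of error-spotting mentioned in the bullets.

```python
import csv
import io
import sqlite3

# Hypothetical client spreadsheet exported to CSV; one row is incomplete.
client_csv = io.StringIO(
    "title,artist,year\n"
    "Untitled,Jane Doe,1998\n"
    ",Unknown,\n"
    "Study in Blue,J. Smith,2004\n"
)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE artworks (title TEXT, artist TEXT, year TEXT)")

skipped = []
for row in csv.DictReader(client_csv):
    # Flag incomplete rows for manual correction rather than inserting them.
    if not row["title"] or not row["artist"]:
        skipped.append(row)
        continue
    conn.execute(
        "INSERT INTO artworks VALUES (?, ?, ?)",
        (row["title"], row["artist"], row["year"]),
    )
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM artworks").fetchone()[0]
```

In practice the skipped rows would go back to the client as a correction list, which is exactly the review loop the bullets describe.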
hamandcheese | 9 years ago
E-commerce advertising. I worked at a company that managed online advertising (mostly AdWords at the time) for e-commerce companies. Most would provide us with a CSV of all their products - sometimes hundreds of thousands of entries. We imported enough CSV (maybe gigabytes a day) that it was worthwhile to implement custom CSV parsers to deal with some of our larger customers' noncompliant CSV.
JSON might have been nice, but I can imagine it coming with its own troubles as well. CSV worked well since our tools allowed analysts to interpolate columns into the ad copy or keywords when doing bulk ad creation. The nestable nature of JSON would have complicated the interface we gave our nontechnical analysts.
I also imagine many of our clients wouldn't even know what JSON is.
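To make "noncompliant CSV" concrete, here is one sketch of the kind of workaround such feeds can force. The merchant feed below is hypothetical: it contains raw, unescaped double quotes inside fields, which a default parser would strip; telling Python's `csv` module to ignore quoting entirely preserves the field verbatim. This is an assumption about one failure mode, not a description of the company's actual parsers.

```python
import csv
import io

# Hypothetical noncompliant feed: the product name starts with an
# unescaped quote, which the default dialect would swallow.
feed = io.StringIO(
    'sku,name,price\n'
    '123,"Best" Widget,19.99\n'
)

# QUOTE_NONE treats quote characters as ordinary data, so the field
# survives byte-for-byte instead of being reinterpreted as a quoted field.
reader = csv.reader(feed, quoting=csv.QUOTE_NONE)
header = next(reader)
rows = [dict(zip(header, row)) for row in reader]
```

Real custom parsers for feeds like this tend to accumulate one such special case per misbehaving customer.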
byoung2 | 9 years ago
I am currently working on a project for a client in real estate management. They are aggregating data from state licensing agencies, insurance rating agencies, etc., and a lot of those only provide CSV data. I imagine that on the other side of the fence there is some guy or gal managing all this info in Excel, and CSV is the easiest path for the data to flow out.
Previously I worked for a company that provided a dashboard for small businesses to manage their listings (Yelp, Facebook, Google, Tripadvisor, etc). For multilocation clients, for initial setup we needed a list of all locations, addresses, phone, etc., and not a single client said "here's our api, grab the data in JSON format". Instead, we always got a CSV. We eventually gave them a CSV template file for them to copy/paste into.
cocktailpeanuts | 9 years ago
I am also curious, but to share my own insight: I've looked into a lot of CSV providers, and they all seem to cater to raw data with less semantics than the data we would normally see delivered as JSON or XML.
For example a lot of CSV data is used for plotting/visualization, because all they have is numbers. If they had more metadata they probably would have ported them to JSON.
Another observation: a lot of CSV dumps online are really just that - data dumps. They're huge files meant to be used after downloading, not consumed incrementally through an API the way JSON/XML usually are. You don't see many JSON APIs that return huge payloads, but it's common to see huge CSV files.
iamwil | 9 years ago
I myself have had to integrate with third-party logistics companies, and the way we got them to ship our inventory was to send them a CSV of our orders twice a day.
lazyjones | 9 years ago
At a former company (a comparison shopping engine/website), we used CSV for transferring e-commerce offers from merchants to our database. CSV is vastly superior (at least to XML, maybe to JSON) due to its size and quick line-by-line parsing. The downside is lots of quirky and broken formats; there's no real standardisation in e-commerce.
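With no standardisation across merchants, one common defensive move is to detect each feed's dialect from a sample before parsing it. A hedged sketch using Python's `csv.Sniffer` (the semicolon-delimited merchant feed is made up for illustration):

```python
import csv
import io

# Hypothetical merchant feed: semicolon-delimited, quoted fields.
sample_feed = (
    'sku;title;price\n'
    '"A1";"Red shoes";49.90\n'
    '"B2";"Blue hat";19.90\n'
)

# Sniff the delimiter from the sample, constrained to the usual suspects.
dialect = csv.Sniffer().sniff(sample_feed, delimiters=",;\t|")
rows = list(csv.reader(io.StringIO(sample_feed), dialect))
```

Sniffing is a heuristic, so production pipelines usually still need per-merchant overrides for the truly broken feeds.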
jtcond13 | 9 years ago
Yes - insurance. For one thing, many people can code well enough to move data from CSVs to a database (or vice versa) but not well enough to read/write an API. I guess the main reason, though, is that many back-office applications don't need to be 'real-time', and for those it's always going to be easier to send files and have someone import them into a DB.
davelnewton | 9 years ago
Tables of data are best represented with... well, tables of data. For data that's not nested it's a perfectly acceptable format. Importing well-formed CSV (and there's the rub) is trivial and a well-known process.
herbst | 9 years ago
No, people still use the obviously superior format for structured tabular data. Why would anyone use something that is highly suboptimal for things like these?
westurner | 9 years ago
- "CSV on the Web: A Primer" http://w3c.github.io/csvw/primer/
- Source: https://github.com/w3c/csvw
- Columns have URIs (ideally from a shared RDFS/OWL vocabulary)
- Columns have XSD datatype URIs
- CSVW can be represented as RDF, JSON, or JSON-LD
With plain CSV, which extra metadata file describes how many rows at the top are columnar metadata (i.e. column labels, property URI, XSD datatype URI, units URI, precision, accuracy, significant figures)? ... https://wrdrd.com/docs/consulting/linkedreproducibility#csv-...
... CSVW: https://wrdrd.com/docs/consulting/knowledge-engineering#csvw
@context: http://www.w3.org/ns/csvw.jsonld
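For readers unfamiliar with CSVW: the metadata lives in a small JSON-LD sidecar file next to the CSV, describing each column. A minimal sketch (the file name, column names, and the Dublin Core property URI are illustrative choices, not prescribed by the spec):

```json
{
  "@context": "http://www.w3.org/ns/csvw",
  "url": "artworks.csv",
  "tableSchema": {
    "columns": [
      {
        "titles": "title",
        "propertyUrl": "http://purl.org/dc/terms/title",
        "datatype": "string"
      },
      {
        "titles": "year",
        "datatype": "gYear"
      }
    ]
  }
}
```

A CSVW-aware processor can use this to validate the CSV's cells against the declared datatypes and to convert the table to RDF or JSON.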
iamwil | 9 years ago
I'm guessing small businesses want to integrate with the web, but don't really have the engineers to do an API integration?
htwillie | 9 years ago
.csv is like the lowest common denominator of data formats.
jpindar | 9 years ago
EDA (electronic design automation) tools often export BOMs (bills of materials) as CSV, which we import into Excel. ATE (automated test equipment) programs often export test data as CSV, also imported into Excel for manipulation and graphing.