_petronius | 9 years ago
We do a lot of CSV imports (inventory/data management for art galleries), for a few reasons, most of which boil down to "it's an easy format to share with clients":
- Clients that don't already have their data in an inventory management system usually have Excel spreadsheets, so we can just export that to CSV, parse it, and stick it in the database.
- If they really have nothing (some commercial galleries, especially ones that have been around for 30+ years, still do everything on index cards), you can give someone a spreadsheet template for the data entry and script the import step.
- Even if they do have a system, it's easy for the clients (and for us) to look through the data while preparing it for import and spot errors/inconsistencies that need to be corrected along the way.
- It's a small industry (data management services for commercial art galleries is maybe 2-3 companies in Europe, 1-2 in the US, and that's about it), all writing proprietary software with no semblance of agreed formats/standards. When a client quits one company and moves to another, it's easy to dump their data to CSV and leave the company doing the import to sort it out on the other end. Trying to piece together random-looking values across 15+ CSV files to work out what the last legacy system meant by some string is a pain, but it's something the client will pay for in most cases.
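The export-to-CSV-then-into-the-database pipeline described above can be sketched roughly like this. This is a minimal illustration, not anyone's real import code: the column names, sample rows, and in-memory SQLite table are all made up, and the incomplete-row check stands in for the kind of error-spotting mentioned in the bullets.

```python
import csv
import io
import sqlite3

# Hypothetical client spreadsheet exported to CSV; one row is incomplete.
client_csv = io.StringIO(
    "title,artist,year\n"
    "Untitled,Jane Doe,1998\n"
    ",Unknown,\n"
    "Study in Blue,J. Smith,2004\n"
)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE artworks (title TEXT, artist TEXT, year TEXT)")

skipped = []
for row in csv.DictReader(client_csv):
    # Flag incomplete rows for manual correction rather than inserting them.
    if not row["title"] or not row["artist"]:
        skipped.append(row)
        continue
    conn.execute(
        "INSERT INTO artworks VALUES (?, ?, ?)",
        (row["title"], row["artist"], row["year"]),
    )
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM artworks").fetchone()[0]
```

In practice the skipped rows would go back to the client as a correction list, which is exactly the review loop the bullets describe.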
hamandcheese | 9 years ago
E-commerce advertising. I worked at a company that managed online advertising (mostly AdWords at the time) for e-commerce companies. Most would provide us with a CSV of all their products - sometimes hundreds of thousands of entries. We imported enough CSV (maybe gigabytes a day) that it was worthwhile to implement custom CSV parsers to deal with some of our larger customers' noncompliant CSV.
JSON might have been nice, but I can imagine it coming with its own troubles as well. CSV worked well since our tools allowed analysts to interpolate columns into the ad copy or keywords when doing bulk ad creation. The nestable nature of JSON would have complicated the interface we gave our nontechnical analysts.
I also imagine many of our clients wouldn't even know what JSON is.
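To make "noncompliant CSV" concrete, here is one sketch of the kind of workaround such feeds can force. The merchant feed below is hypothetical: it contains raw, unescaped double quotes inside fields, which a default parser would strip; telling Python's `csv` module to ignore quoting entirely preserves the field verbatim. This is an assumption about one failure mode, not a description of the company's actual parsers.

```python
import csv
import io

# Hypothetical noncompliant feed: the product name starts with an
# unescaped quote, which the default dialect would swallow.
feed = io.StringIO(
    'sku,name,price\n'
    '123,"Best" Widget,19.99\n'
)

# QUOTE_NONE treats quote characters as ordinary data, so the field
# survives byte-for-byte instead of being reinterpreted as a quoted field.
reader = csv.reader(feed, quoting=csv.QUOTE_NONE)
header = next(reader)
rows = [dict(zip(header, row)) for row in reader]
```

Real custom parsers for feeds like this tend to accumulate one such special case per misbehaving customer.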
byoung2 | 9 years ago
I am currently working on a project for a client in real estate management. They are aggregating data from state licensing agencies, insurance rating agencies, etc., and a lot of those only provide CSV data. I imagine that on the other side of the fence there is some guy or gal managing all this info in Excel, and CSV is the easiest path for the data to flow out.
Previously I worked for a company that provided a dashboard for small businesses to manage their listings (Yelp, Facebook, Google, Tripadvisor, etc). For multilocation clients, for initial setup we needed a list of all locations, addresses, phone, etc., and not a single client said "here's our api, grab the data in JSON format". Instead, we always got a CSV. We eventually gave them a CSV template file for them to copy/paste into.
cocktailpeanuts | 9 years ago
I am also curious, but to share my own insight: I've looked into a lot of CSV providers, and they all seem to cater to raw data with less semantics than the data we would normally see delivered as JSON or XML.
For example a lot of CSV data is used for plotting/visualization, because all they have is numbers. If they had more metadata they probably would have ported them to JSON.
Another observation: a lot of CSV dumps online are really just that - data dumps. They're huge files meant to be used after downloading, not consumed incrementally through an API the way JSON/XML usually are. You don't see many JSON APIs that return huge payloads, but it's common to see huge CSV files.
iamwil | 9 years ago
I myself have had to integrate with third-party logistics companies, and the way we got them to ship our inventory was to send them a CSV of our orders twice a day.
lazyjones | 9 years ago
At a former company (a comparison shopping engine/website), we used CSV for transferring e-commerce offers from merchants to our database. CSV is vastly superior (at least to XML, maybe to JSON) due to its size and quick line-by-line parsing. The downside is lots of quirky and broken formats; there's no real standardisation in e-commerce.
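With no standardisation across merchants, one common defensive move is to detect each feed's dialect from a sample before parsing it. A hedged sketch using Python's `csv.Sniffer` (the semicolon-delimited merchant feed is made up for illustration):

```python
import csv
import io

# Hypothetical merchant feed: semicolon-delimited, quoted fields.
sample_feed = (
    'sku;title;price\n'
    '"A1";"Red shoes";49.90\n'
    '"B2";"Blue hat";19.90\n'
)

# Sniff the delimiter from the sample, constrained to the usual suspects.
dialect = csv.Sniffer().sniff(sample_feed, delimiters=",;\t|")
rows = list(csv.reader(io.StringIO(sample_feed), dialect))
```

Sniffing is a heuristic, so production pipelines usually still need per-merchant overrides for the truly broken feeds.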
jtcond13 | 9 years ago
Yes - insurance. For one thing, many people can code well enough to move data from CSVs to a database (or vice versa) but not well enough to read/write an API. I guess the main reason, though, is that many back-office applications don't need to be 'real-time', and for those it's always going to be easier to send files and have someone import them into a DB.
davelnewton | 9 years ago
Tables of data are best represented with... well, tables of data. For data that's not nested it's a perfectly acceptable format. Importing well-formed CSV (and there's the rub) is trivial and a well-known process.
herbst | 9 years ago
No, people still use the obviously superior format for structured tabular data. Why would anyone use something that is highly suboptimal for things like these?
westurner | 9 years ago
- "CSV on the Web: A Primer" http://w3c.github.io/csvw/primer/
- Source: https://github.com/w3c/csvw
- Columns have URIs (ideally from a shared RDFS/OWL vocabulary)
- Columns have XSD datatype URIs
- CSVW can be represented as RDF, JSON, or JSON-LD
With plain CSV, which extra metadata file describes how many rows at the top are columnar metadata (i.e. column labels, property URI, XSD datatype URI, units URI, precision, accuracy, significant figures)? ... https://wrdrd.com/docs/consulting/linkedreproducibility#csv-...
... CSVW: https://wrdrd.com/docs/consulting/knowledge-engineering#csvw
@context: http://www.w3.org/ns/csvw.jsonld
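For readers unfamiliar with CSVW: the metadata lives in a small JSON-LD sidecar file next to the CSV, describing each column. A minimal sketch (the file name, column names, and the Dublin Core property URI are illustrative choices, not prescribed by the spec):

```json
{
  "@context": "http://www.w3.org/ns/csvw",
  "url": "artworks.csv",
  "tableSchema": {
    "columns": [
      {
        "titles": "title",
        "propertyUrl": "http://purl.org/dc/terms/title",
        "datatype": "string"
      },
      {
        "titles": "year",
        "datatype": "gYear"
      }
    ]
  }
}
```

A CSVW-aware processor can use this to validate the CSV's cells against the declared datatypes and to convert the table to RDF or JSON.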
iamwil | 9 years ago
I'm guessing small businesses want to integrate with the web, but don't really have the engineers to do an API integration?
htwillie | 9 years ago
.csv is like the lowest common denominator of data formats.
jpindar | 9 years ago
EDA (electronic design automation) tools often export BOMs (bills of materials) as CSV, which we import into Excel. ATE (automated test equipment) programs often export test data as CSV, also imported into Excel for manipulation and graphing.