(no title)
npolet|8 years ago
We ended up building our own little system that merged our own flight data with data from suppliers. The schema some of the suppliers provided was nonsensical, and it ended up being a really finely tuned system that would break regularly if one of the data suppliers decided to change it without notice.
I spent my days dreaming of a standardized data pool, and while some of the data was standardized, not enough of it was.
Thank goodness I'm now working in an industry where we can totally own our data and are not reliant on anyone else for it.
lotyrin|8 years ago
i_cant_speel|8 years ago
Half of the time I spend debugging is on figuring out what is going wrong with different MLSs.
paulie_a|8 years ago
pishpash|8 years ago
tyingq|8 years ago
I don't know that other industries are much better though. Ask a healthcare IT person about HL7.
[1]https://en.m.wikipedia.org/wiki/OpenTravel_Alliance
snuxoll|8 years ago
HL7 claims to be a standard, but it’s a hell of a lot closer to JSON in terms of standardization (that is, a serialization format) than say ANSI X12 (an ugly but generally standard data interchange system).
SmirkingRevenge|8 years ago
Been there, done that. Never again!
cwilkes|8 years ago
Bluestrike2|8 years ago
Whatever. People can come up with crazy solutions and you're stuck working with them. At least you've got the CSV files by that point, right? Well, sort of. Turns out they needed a good bit of work. And by a bit of work, I mean that some of them were quite possibly the worst CSV files ever generated. And it was...weird. Normally, if there's a problem with a CSV file it's at least consistent across the entire file. Quotation marks not escaped? Ok, no big deal. They're all like that. Usually, you can normalize the data and move forward.
If only. Some lines had escaped quotes. Some didn't. Some lines were actually multiple records because apparently the magic linebreak decided to go on strike. In one case, I kid you not, the file switched from comma-separated to tab-separated. Huh? How'd that even happen? Some values were perfectly valid, just handled in a way that's guaranteed to annoy the hell out of you. But fine. You do what you can, reject the bad (and log to a file for manual review in the hopes that you'll figure...something...out about it), and move on with your sort-of-normalized data. But that's just referring to the data itself, and not the entries.
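As a rough illustration of the "reject the bad and log it for review" approach described above, here is a minimal Python sketch. The four-column layout and the function names are assumptions made for the example, not the actual feed format. It tries a comma delimiter first, falls back to tabs, and sets aside anything that does not parse cleanly into the expected number of fields.

```python
import csv
import io

# Hypothetical layout for illustration: vin, year, model, color.
EXPECTED_FIELDS = 4


def parse_line(line, expected=EXPECTED_FIELDS):
    """Try comma first, then tab; return a list of fields, or None if neither fits."""
    for delimiter in (",", "\t"):
        try:
            # strict=True makes the csv module raise on malformed quoting
            # instead of silently guessing.
            rows = list(csv.reader(io.StringIO(line), delimiter=delimiter, strict=True))
        except csv.Error:
            continue
        if len(rows) == 1 and len(rows[0]) == expected:
            return [field.strip() for field in rows[0]]
    return None


def split_feed(lines):
    """Separate parseable records from lines that need manual review."""
    accepted, rejected = [], []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line:
            continue
        fields = parse_line(line)
        if fields is None:
            # Keep the line number so a human can find it in the source file.
            rejected.append((lineno, line))
        else:
            accepted.append(fields)
    return accepted, rejected
```

Checking the field count is a crude heuristic: it catches merged records and delimiter switches, but a tab-separated line whose values happen to contain the right number of commas would still slip through.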
When you order a new car, the build sheet for every manufacturer is simple enough. Every option has a code. Every option has a name. New car dealers get their inventory data back from the factory and it's plugged into their inventory management systems. Through whatever accidental acts of magic and chicanery, that data eventually makes its way to the data sources you're busy importing. Unfortunately, at some point in the process all of those beautiful factory codes and names--standardized, constant, etc.--disappear. That beautiful "Cobalt Blue" is somehow transmogrified into a very unhelpful "Blue." And don't even get me started on factory options. At times, you're lucky if you somehow accidentally get the basics like, oh, "has two front seats and possibly four-ish wheels." It's even worse with used cars, because some unlucky salesperson/clerk/receptionist/car washer kid had to sit down and manually enter the car data.
Instead of thinking of that giant CSV file as a wonderful list of thousands of cars just waiting to be discovered and purchased, you start to see it as more of a starting point. It's an incomplete list of cars for sale, with some of the information about each car. You need to use other sources to fill in some of the blanks, fix some of the most obvious errors, etc. Luckily, dealers upload their data to as many services as they can. Unluckily, it's often...different across those same sources, and it's up to you to figure out what data to keep the same or change. Depending on the manufacturer, you can decode the VIN and pull up all sorts of useful information about the car's build options. Maybe. You can then use the manufacturer's pricing guide for that model year to fill in the blanks. Assuming, of course, that you've got a copy of the pricing guide in question. Which isn't guaranteed, since they're generally not publicly available (though they do leak...often).
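For the VIN part, the sketch below shows the generic pieces that can be done without any manufacturer data: splitting a 17-character VIN into its standard sections and verifying the check digit in position 9. The function names and the returned dictionary keys are made up for the example. The check digit is required for vehicles sold in North America; other markets do not always use position 9 this way, so a failed check is better treated as a flag for review than as proof of a typo. Mapping the descriptor section to actual build options is manufacturer-specific and is not attempted here.

```python
# Letter-to-number transliteration used by the VIN check digit.
# I, O and Q are not valid VIN characters and are deliberately absent.
TRANSLIT = {
    **{str(d): d for d in range(10)},
    **dict(zip("ABCDEFGH", range(1, 9))),
    **dict(zip("JKLMN", range(1, 6))),
    "P": 7,
    "R": 9,
    **dict(zip("STUVWXYZ", range(2, 10))),
}
# Position weights; position 9 (the check digit itself) has weight 0.
WEIGHTS = (8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2)


def normalize_vin(raw):
    """Uppercase and trim; return None if the result cannot be a VIN."""
    vin = raw.strip().upper()
    if len(vin) != 17 or any(ch not in TRANSLIT for ch in vin):
        return None
    return vin


def check_digit(vin):
    """Compute the expected character for position 9 of a normalized VIN."""
    total = sum(TRANSLIT[ch] * weight for ch, weight in zip(vin, WEIGHTS))
    remainder = total % 11
    return "X" if remainder == 10 else str(remainder)


def split_vin(raw):
    """Split a VIN into its standard sections, or return None if malformed."""
    vin = normalize_vin(raw)
    if vin is None:
        return None
    return {
        "wmi": vin[:3],            # world manufacturer identifier
        "descriptor": vin[3:8],    # manufacturer-specific attributes
        "check": vin[8],
        "check_ok": vin[8] == check_digit(vin),
        "model_year_code": vin[9],
        "plant_code": vin[10],
        "serial": vin[11:],
    }
```

Even this much is useful when merging sources, because a listing whose VIN fails normalization or the check digit can be routed to the same manual-review pile as the unparseable CSV lines.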
I'm a huge Porsche fan, and I know more Porsche fans. We're all nuts. Details matter. Do you want to know how many [shades of blue](http://paintref.com/cgi-bin/colorcodedisplay.cgi?manuf=Porsc...) Porsche has used over the years? Many of the paint colors are available across different models and different years, so the number is a lot scarier than it actually is, but there are a total of 641 entries in the linked paint database. Someone searching for a used Porsche wants to make certain that they're looking for a specific color or option. They don't want to just use "blue" for their search. They're looking for the gorgeous [Oslo blue](https://gearpatrol.com/2016/08/18/definitive-ranking-blue-po...) imprinted in their minds during a magical childhood moment, damnit. Which was only used in 1961, except for custom paint-to-sample orders in later years. There was one 993 Turbo S in Turquoise Blue. Jerry Seinfeld bought it. A PTS color will affect the relative value of the car (new or used), so it's one of those fiddly bits that matters a bit.
You'll get another multi-gigabyte CSV file delivered to you by FTP on the morrow. And joy, it's not an incremental update with just the new cars. It's all the cars they have data for. If a car has been sold, it'll be omitted from the file. It's up to you to figure out which. Hopefully, the car won't be "un-sold" in a day or two after the buyer backs out. That can get weird, especially when you're dealing with multiple providers. Finally, your simple diff is further complicated by the inevitable likelihood that random data across random listings has been changed as well. Perhaps they caught a typo in the VIN number, or the annoying fact that the paint color originally listed never actually existed.
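The daily full-snapshot comparison described above can be sketched roughly as follows. This assumes each day's file has already been parsed into a dictionary keyed by VIN, which is an assumption for the example rather than something the feeds provide; the function name and field names are hypothetical as well.

```python
def diff_snapshots(previous, current):
    """Compare two full snapshots, each a dict mapping VIN -> dict of listing fields.

    Returns (added, removed, changed), where changed maps a VIN to
    {field: (old_value, new_value)} for every field that differs.
    """
    prev_vins, curr_vins = set(previous), set(current)
    added = sorted(curr_vins - prev_vins)      # new listings, or "un-sold" cars
    removed = sorted(prev_vins - curr_vins)    # presumably sold or withdrawn
    changed = {}
    for vin in prev_vins & curr_vins:
        old, new = previous[vin], current[vin]
        fields = {
            key: (old.get(key), new.get(key))
            for key in old.keys() | new.keys()
            if old.get(key) != new.get(key)
        }
        if fields:
            changed[vin] = fields
    return added, removed, changed
```

Two caveats follow directly from the problems listed above. A corrected VIN typo shows up here as one removal plus one addition, not as a change. And since a "sold" car can reappear a day or two later, it is probably safer to mark removed listings as pending for a grace period than to delete them on the first miss.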
Needless to say, I sound a bit crazy at this point. Your only option is to accept that, much as it might annoy the hell out of you, there are going to be serious compromises involved with the data you have access to. Apologies to those with severe OCD and perfectionists. Handily, some manufacturers have sites for their dealers (i.e., http://porsche-dealer.com). Usually, these sources are pristine with everything in order. You just need to scrape it where it's allowed, which is its own set of fun.
And don't even get me started on the photos included with listings. It's rare that you'll see decent photos taken by a dealer. Mostly, you'll find those on eBay Motors with certain sellers. Everyone else does their best to make them as terrible as possible. The data providers then take that as a challenge, and crush the ever-living hell out of the image with another round of JPEG compression and give the resulting monstrosity to you at a nice, small resolution. Want something bigger? Forget it.
Maybe I'm blowing all of this out of proportion. Perhaps the industry has changed in the past few years since my experience. Personally, I doubt it. In any case, dealing with this was annoying as hell. This rambling post was oddly cathartic.
Anyhow, if you ask me, I'm pretty sure all of the car search sites just scrape each other, like an ouroboros, the serpent eating its own tail.
closeparen|8 years ago
Somehow this is indeed a standard for data transfers between non-tech companies. (The ones that are sophisticated enough not to use fax or email attachments).
rokhayakebe|8 years ago
vpribish|8 years ago
SmirkingRevenge|8 years ago
That's been my experience anyways.
sidlls|8 years ago
benjarrell|8 years ago
elmalto|8 years ago
sbfeibish|8 years ago
madeofpalk|8 years ago
Heh. Having done work for an airline, I wouldn't even say this.
dorchadas|8 years ago
new_user224|8 years ago
hchasestevens|8 years ago