top | item 6803776

zissou | 12 years ago

Economist and long time web scraper here.

In your original business model you wanted to understand the price of everything. In what ways did the problem of a lack of information on the demand side come up? That is, it is easy to scrape the price in many markets (supply side), but what kinds of conversations came up within your team about the lack of information on how many units were actually sold at a posted price?

By the way, glad to see you guys were able to make a business out of crawling. I've landed a handful of freelance gigs since leaving grad school based on scraping data for clients, but never tried to expand it to anything beyond consulting projects.

chad_c | 12 years ago

Not an economist, but I have been mulling over a project surrounding scraping and pricing.

Without having access to the actual monetary transaction data, how does one know what was sold and for how much? Without this (or a mechanism by which the lister closes or updates the listing), how do you know anything was actually sold?

larrys | 12 years ago

"how does one know what was sold and for how much?"

Take domain name sales: only a small fraction of transaction prices are public. I've been doing it for 16 or 17 years and have never made any of my data public, nor have the people I've consulted for.

Another example might be commercial rents. You can track asking rents, but you can't really get a handle on the actual rent paid, since many deal factors (renovations, free rent, triple net, etc.) change the numbers significantly.
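To make the rent point concrete, here's a minimal sketch (all numbers and the function are hypothetical, not from any real dataset) of how concessions pull the effective rent a scraper can't see below the asking rent it can:

```python
# Hypothetical illustration: net effective rent vs. asking rent.
# A scraper sees only the asking rent; concessions negotiated in
# the lease (free months, tenant-improvement allowance) are private.
def effective_monthly_rent(asking, term_months, free_months=0, ti_allowance=0.0):
    """Average rent actually paid per month over the lease term,
    after netting out free rent and an amortized TI allowance."""
    total_paid = asking * (term_months - free_months) - ti_allowance
    return total_paid / term_months

# A listing asking $5,000/mo on a 5-year lease, with 6 free months
# and a $30,000 improvement allowance:
print(effective_monthly_rent(5000, 60, free_months=6, ti_allowance=30000))
# -> 4000.0, i.e. 20% below the scraped asking rent
```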

binarysolo | 12 years ago

Also an economist and a data scraper/consultant here -- depending on the data, sometimes all you need to figure out is a correlation: frequency of updates, listings being live for X time, clusters of listings around Y days, etc.

In terms of a few real-life examples, on the one hand you have eBay, which provides you with sold data (via an API, through Terapeak). On the other hand you have Craigslist, which is kinda opaque and hates scraping, but you can monitor listings and their half-life. (Listings that disappear quickly presumably get sold quickly; listings that stick around for weeks, relisted over and over, presumably have lower liquidity and/or are priced too high.)
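The half-life idea above can be sketched in a few lines. This is a toy version, assuming you've already diffed successive scrapes into a map of listing IDs to the dates each was seen (the IDs and dates below are made up):

```python
# Sketch: estimate listing time-on-market from repeated scrapes.
# Input: listing_id -> sorted list of dates the listing was seen live.
from datetime import date
from statistics import median

def listing_lifetimes(sightings):
    """Days each listing stayed live, from first to last sighting (inclusive)."""
    return {lid: (dates[-1] - dates[0]).days + 1
            for lid, dates in sightings.items()}

sightings = {
    "a1": [date(2013, 1, 1), date(2013, 1, 2)],                   # gone fast
    "b2": [date(2013, 1, 1), date(2013, 1, 15)],                  # lingered
    "c3": [date(2013, 1, 3), date(2013, 1, 4), date(2013, 1, 5)],
}
lifetimes = listing_lifetimes(sightings)
print(median(lifetimes.values()))  # median days on market -> 3
```

A short median suggests a liquid market; a long tail of relisted items suggests overpricing, exactly as described above.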

highbees | 12 years ago

Although it would only be available for a fraction of prices, the delta in "quantity available" between scrapes could provide some data.

Most websites don't relay this info to the end user but it could be used on those that do.
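For sites that do expose it, the delta calculation is trivial. A minimal sketch, assuming two timestamped snapshots of a hypothetical listing's quantity field:

```python
# Sketch: infer units sold from the change in a displayed
# "quantity available" field between two scrapes. Snapshot data
# is hypothetical.
def units_sold(prev_qty, curr_qty):
    # A drop suggests sales; an increase means a restock (or a
    # cancelled order), which we can't attribute to sales, so clamp at 0.
    return max(prev_qty - curr_qty, 0)

# Two daily snapshots of the same listing:
snapshots = [("2013-11-01", 10), ("2013-11-02", 7)]
(_, q0), (_, q1) = snapshots
print(units_sold(q0, q1))  # -> 3 units presumably sold
```

The clamp is the caveat: restocks and cancellations are invisible, so this gives a noisy lower-ish bound rather than true sales volume.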