karli | 12 years ago

Hi,

Yes, as the blog post says, the only thing missing is the full text of each page for indexing and searching. We don't dare release it because of copyright issues ("Hey, you're distributing the full text of my page!").

With this data you could, for example, build a new Alexa and find out what the most visited page was last week :)

haldujai|12 years ago

'With this data you could for example build a new alexa and find out what was the most visited page last week :)'

While what you're doing is interesting, and this data could shed some light on a lot of questions, you're putting the cart way before the horse here.

125k searches equates to, generously, 12.5k users? From the Chrome Web Store and Play Store, it seems there are about 500 users from those.

A correct statement would be: 'With this data you could, for example, find out what the most visited web page was among our subset of a subset of 12.5k users.'

That said, if you get a significant market share, this could be very interesting. I'm guessing you don't always plan on providing dumps for free and will monetize them at some point?

geraldbaeck|12 years ago

Of course you are right, with such data you could build something like alexa. We are aware that this data is currently a tiny subset of the web and does not represent much, but it has potential if it grows.

Since we do not consider this data to be ours, we do not plan to charge anyone for the dumps.

Gerald, CTO Blippex

enigmo|12 years ago

How would we figure out which page was visited the most last week? Are these crawl logs or access logs?

haldujai|12 years ago

From blippex.org:

'Blippex is a search engine by the people, for the people. Individuals that have our browser extension installed tell us how long they stayed on a webpage.'

I haven't had a chance to download the dump yet, but I'm assuming that the time points are the time users spent on each page.
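If that assumption holds, answering the "most visited last week" question is just a count over a time window. Here is a rough sketch. The record layout of (url, visited_at, dwell_seconds) is hypothetical, since I have not looked at the actual dump format, and `most_visited` is a name made up for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta


def most_visited(records, now, days=7):
    """Return (url, visit_count) for the most visited URL in the last `days` days.

    `records` is an iterable of (url, visited_at, dwell_seconds) tuples.
    This layout is an assumption, not the real Blippex dump format.
    """
    cutoff = now - timedelta(days=days)
    visits = Counter()
    for url, visited_at, _dwell_seconds in records:
        # Count only visits that fall inside the window.
        if cutoff <= visited_at <= now:
            visits[url] += 1
    return visits.most_common(1)[0] if visits else None


# Made-up sample data, for illustration only.
now = datetime(2013, 7, 23)
records = [
    ("http://example.com/a", datetime(2013, 7, 20), 30),
    ("http://example.com/b", datetime(2013, 7, 21), 12),
    ("http://example.com/a", datetime(2013, 7, 22), 45),
    ("http://example.com/b", datetime(2013, 7, 1), 60),  # older than a week
]
print(most_visited(records, now))
```

Whatever comes out only describes visits by people running the extension, so it is a ranking for that sample and not for the web as a whole.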

mgamache|12 years ago

It would be crazy cool to get a real-time feed of browsed URLs (rather than this dump format). Kind of like the mythical Twitter firehose.