top | item 6090519

rgiar | 12 years ago

so this is just when a given site was crawled?

  "_id": "b919f02c8f053c41e8ee86311ca9b0f6,
  "url": "https://www.example.com/",
  "host": "www.example.com",
  "root": "example.com",
  "time_spent": [
    {
      "sec": 45,
      "seen_at": ISODate("2013-06-23T00: 41: 44.0Z")
    },
    {
      "sec": 5,
      "seen_at": ISODate("2013-07-01T14: 41: 44.0Z")
    }

karli | 12 years ago

Hi,

yes, as mentioned in the blog post, the only thing missing is the full text of each page, for indexing and searching in it. We don't dare release that because of copyright issues ("hey, you're distributing the full text of my page!").

With this data you could, for example, build a new Alexa and find out what the most visited page was last week :)
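Given documents shaped like the sample above, counting visits reduces to summing `time_spent` entries per URL inside a time window. A minimal Python sketch (the records, URLs, and timestamps below are invented for illustration; only the field names come from the posted schema):

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Hypothetical records shaped like the posted schema; the second URL
# and all "sec"/"seen_at" values here are made up for the example.
records = [
    {"url": "https://www.example.com/",
     "time_spent": [
         {"sec": 45, "seen_at": datetime(2013, 6, 23, 0, 41, 44, tzinfo=timezone.utc)},
         {"sec": 5,  "seen_at": datetime(2013, 7, 1, 14, 41, 44, tzinfo=timezone.utc)},
     ]},
    {"url": "https://news.example.org/",
     "time_spent": [
         {"sec": 30, "seen_at": datetime(2013, 7, 2, 9, 0, 0, tzinfo=timezone.utc)},
         {"sec": 12, "seen_at": datetime(2013, 6, 30, 18, 30, 0, tzinfo=timezone.utc)},
     ]},
]

def most_visited(records, now):
    """Rank URLs by number of visits in the 7 days before `now`."""
    week_ago = now - timedelta(days=7)
    counts = Counter()
    for doc in records:
        # Each time_spent entry is one visit; count those in the window.
        counts[doc["url"]] += sum(
            1 for visit in doc["time_spent"] if visit["seen_at"] >= week_ago
        )
    return counts.most_common()

ranking = most_visited(records, now=datetime(2013, 7, 3, tzinfo=timezone.utc))
# With the invented data above, news.example.org has two visits inside
# the window and example.com one, so it ranks first.
```

The same idea could weight by the `sec` field instead of raw visit counts to rank by time spent rather than visit frequency.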

haldujai | 12 years ago

> With this data you could, for example, build a new Alexa and find out what the most visited page was last week :)

While what you're doing is interesting, and this data could shed some light on a lot of questions, you're putting the cart way before the horse here.

125k searches equates to, generously, 12.5k users? From the Chrome Web Store and Play Store it seems there are about 500 users from those.

A correct statement would be 'with this data you could, for example, find out what the most visited web page was among our subset of a subset of 12.5k users'.

That said, if you get a significant market share this could be very interesting. I'm guessing you don't plan on providing dumps for free forever and will monetize them at some point?

enigmo | 12 years ago

How would we figure out which page was visited the most last week? Are these crawl logs or access logs?