So today I learnt that Amazon.com is actually hosted on Google Cloud Platform and uses Google Domains as well as 1&1 Hosting, alongside Adobe TagManager and Google Analytics. Who could have known? /s
Interesting, but the results seem kind of misleading. For example, https://sitestacks.com/linestarve.com says I'm using "NameCheap Web Hosting" -- they're my registrar, but I run my own HTTP server for that domain. Meanwhile https://sitestacks.com/analytics.bitmash.io reports that it's built with Piwik (true, but I would expect that to appear on a page using Piwik, which it doesn't), but for some reason leaves out all mention of webserver, DNS hosting, and so on.
Same with my site, says iwantmyname hosting, but in reality that's just my registrar. Actually I'm using a very popular cloud VM service, so I'm not sure why that wasn't picked up.
Also couldn't discern my actual web stack beyond nginx and html5 and some javascript libraries like google analytics.
It's probably not very detectable, though...flask.py
Thanks for the feedback. Sounds like NameCheap should be picked up as a registrar; we'll take a look!
On the second point, we should clarify that the data is meant to be domain-wide (except for the real-time mapper for domains not already in our universe - there we only look at the first page so we can return data quickly).
Had a quick go and was pretty disappointed. Most of this is just having a look at the third party js loaded. Other things are wild guesses based on DNS, which has nothing to do with the website; those results are either out of date or belong to unassociated parts of the business. The tech stack detection was almost entirely wrong :(
In order for me to actually use your service it would need to be rolled into a browser extension. This needs to be at your fingertips when you need it; right now I need to copy-paste the url into a new tab. I use Wappalyzer [1] a couple of times a day and love the browser integration.
These tools never work well; they are usually a weak combination of scraping the html, JS tags and checking DNS entries. Siftery tried something similar with a bunch of VC money raised and is equally bad.
The only decent one is http://builtwith.com which also happens to be a great 1-man business.
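For illustration, the "scraping the html, JS tags" approach mentioned above could be sketched roughly like this. This is not how SiteStacks, Wappalyzer, or BuiltWith actually work; the fingerprint table and the `detect_from_html` name are made up for the example, and a real tool would need far more signals.

```python
from html.parser import HTMLParser

# Hypothetical fingerprint table: substring of a script URL -> technology name.
FINGERPRINTS = {
    "google-analytics.com": "Google Analytics",
    "googletagmanager.com": "Google Tag Manager",
    "piwik.js": "Piwik",
    "jquery": "jQuery",
}


class ScriptSrcCollector(HTMLParser):
    """Collects the src attribute of every <script> tag in a page."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def detect_from_html(html):
    """Return a sorted list of technologies whose fingerprint appears in a script URL."""
    collector = ScriptSrcCollector()
    collector.feed(html)
    found = set()
    for src in collector.sources:
        for needle, name in FINGERPRINTS.items():
            if needle in src.lower():
                found.add(name)
    return sorted(found)
```

A scan like this only sees what a single page loads in the browser, which would explain why server-side pieces (a Flask backend, the actual hosting provider) go undetected.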
> This is a limited profile.
> To see additional products added by employees and vendors, check out Siftery's profile.
Both sites use the same domain provider and the same SSL cert provider. I'd say with medium confidence they are probably 2 products of the same company.
Yeah, and they're not the only provider you can point to. Some competition pushes everyone to do better?
Here's my pitch for why you might want to use SiteStacks with a browser extension:
Lightweight: The extension is only 25 KB (mostly images). The one-click technology lookup runs entirely on our servers. No content insertions or background processes to slow down your browsing. It's like your favorite search engine.
Secure: SiteStacks doesn't download any of your browsing data - only the active tab URL is passed along.
Great product coverage: SiteStacks is supported by Siftery and its library of over 40,000 products. SiteStacks includes data for some products that isn't publicly available anywhere else.
Best-in-class accuracy: The data on SiteStacks benefits from validation from the awesome Siftery community. This built-in constant feedback loop helps us identify data collection methods that are yielding bad data and ultimately promotes best-in-class data accuracy.
SiteStacks can find the technology used at any domain, including a set of roughly 700,000 that we’re regularly checking.
What makes the dataset unique is the combination of programmatic data (code breadcrumbs, network requests, DNS, some NLP, etc.) augmented by data validated by users directly.
The user validated data is only available on Siftery (e.g. for sitestacks.com/uber.com you have to follow the link through to siftery.com/company/uber to see the full set), but all the programmatic methods are improved by user-validated data (e.g. if a method yields too many false positives, we bump it out).
We think this approach helps create the most accurate dataset of its kind. We’ve done some internal benchmarking and feel really good about it.
We’re looking for feedback on how this can be better, and open to partnering with others who want to make use of this data for good.
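As a rough sketch of the feedback loop described above (dropping a detection method once user validation shows it yields too many false positives), something like the following could work. The thresholds, the method names, and the `prune_methods` function are assumptions for illustration only, not Siftery's actual pipeline.

```python
from collections import defaultdict


def prune_methods(validations, max_false_positive_rate=0.3, min_samples=5):
    """Return the detection methods worth keeping, sorted by name.

    validations: iterable of (method_name, confirmed) pairs, where confirmed
    is True if a user validated the detection and False if they rejected it.
    Both thresholds are assumed values chosen for the example.
    """
    totals = defaultdict(int)
    false_positives = defaultdict(int)
    for method, confirmed in validations:
        totals[method] += 1
        if not confirmed:
            false_positives[method] += 1

    kept = []
    for method, total in totals.items():
        if total < min_samples:
            # Too little evidence to judge the method either way; keep it.
            kept.append(method)
        elif false_positives[method] / total <= max_false_positive_rate:
            kept.append(method)
    return sorted(kept)
```

With this kind of rule, a DNS-based method that users reject most of the time would be bumped out, while methods with few validations so far stay in until there is enough data to judge them.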
just tried this for our app, and it wrongly reported mandrill and flash (we’re not using either of them). we used mandrill a few years ago, so this might be some stale historical data, but the app never used flash.
> We’re looking for feedback on how this can be better, and open to partnering with others who want to make use of this data for good.
Responding to the feedback and data discrepancies mentioned here would be a good start. The HN community here is testing this for you for free and providing you with valuable feedback, and asking you questions that you need to answer, if you want to make your product useful.
I don't see you (OP) responding to anyone. The 2 posts from you are both promoting the site.
We are looking into the data discrepancies; it might take a bit of time to get through the individual errors. DNS + Registrar data seem to have brought on false positives. We need to beef up on Front-end tech too. The real time mapper needs work.
Humbling comments really, we have our work cut out.
Ran it on https://canpicker.com/. It's kind of cool and accurate but I was expecting it to pick up stuff like react and maybe finer grain details like individual libraries.
[+] [-] dna_polymerase|8 years ago|reply
Something is really off there, what do you do to get these results? https://sitestacks.com/amazon.com
[+] [-] bootcat|8 years ago|reply
[+] [-] wolfgang42|8 years ago|reply
[+] [-] mod|8 years ago|reply
[+] [-] ggiaco|8 years ago|reply
[+] [-] scaryclam|8 years ago|reply
[+] [-] afloatboat|8 years ago|reply
[1] https://wappalyzer.com/
[+] [-] ggiaco|8 years ago|reply
[+] [-] manigandham|8 years ago|reply
[+] [-] thephyber|8 years ago|reply
[+] [-] bootcat|8 years ago|reply
[+] [-] ggiaco|8 years ago|reply
[+] [-] subie|8 years ago|reply
[+] [-] edoceo|8 years ago|reply
https://news.ycombinator.com/item?id=15249136
[+] [-] ishansgupta|8 years ago|reply
[+] [-] tjic|8 years ago|reply
I punched in a URL of a website I built, it didn't have data, went out to get some, then reported back that it couldn't.
Meanwhile https://builtwith.com/reservations.camprrm.com worked.
[+] [-] adzicg|8 years ago|reply
[+] [-] pvg|8 years ago|reply
https://news.ycombinator.com/item?id=15249136
[+] [-] ishansgupta|8 years ago|reply
[1] https://chrome.google.com/webstore/detail/sitestacks-instant...
[2] https://addons.mozilla.org/en-US/firefox/addon/sitestacks/re...
[+] [-] enitihas|8 years ago|reply
[+] [-] justboxing|8 years ago|reply
[+] [-] ayanb|8 years ago|reply
[+] [-] Sreyanth|8 years ago|reply
Doesn't seem like it is accurate though. Seems like this is more of crowdsourced data than automatically figuring out things.
[+] [-] deadghost|8 years ago|reply
[+] [-] Doctor_Fegg|8 years ago|reply
[+] [-] ggiaco|8 years ago|reply
[+] [-] whipoodle|8 years ago|reply
[+] [-] lozzo|8 years ago|reply
I was curious to see whether it would work out that I am running it on Google App Engine, but it did not figure that out.
[+] [-] mceoin|8 years ago|reply
[+] [-] ggiaco|8 years ago|reply
[+] [-] israrkhan|8 years ago|reply
[+] [-] schmidty|8 years ago|reply
[+] [-] ggiaco|8 years ago|reply
Here's some data we have on CMSs currently
https://siftery.com/categories/content-management-system-cms...
https://sitestacks.com/products/wordpress
https://siftery.com/wordpress