Our local police force has set up a site for 'anonymous' reports from rape victims, and it had GA tracking on every page (plus Google CDN content, another issue). I wrote them to explain why this wasn't the best idea and how Piwik was a better choice.
Crimestoppers (a UK charity) are doing this too, and I wrote them to explain the potential for privacy issues. 'You've outsourced crime victims' privacy to an ad company' was my basic message.
I told both the local police and Crimestoppers how easy Piwik was, and how I thought it was a better idea, but I don't think I got my point across. There is a gap in understanding: the site owner doesn't see the raw data (but Google does), and so they think it's okay.
Anyhow, interesting privacy issues and it may be that I am overlooking something or being overly cautious.
My clients are already using Google Webmaster Tools and Adwords and want everything integrated, plus they want the reliability of Google.
But another FOSS analytics platform is Snowplow (http://snowplowanalytics.com), and while it wouldn't replace GA for them, it might replace another commercial analytics package. Few high volume ecommerce sites use just GA these days.
Snowplow co-founder here. Thanks for mentioning us liquidcool :-)
Snowplow is a little different from Piwik - Piwik is a LAMP-stack open-source app which replicates a GA-style analytics experience.
Snowplow is more of a scalable event analytics platform - it is built on AWS (CloudFront, Elastic MapReduce, Redshift), does _not_ have a UI but has a very clean & simple event model[1] and scales horizontally to billions of events.
To date, Snowplow is mostly used by web companies that want to warehouse their granular event data to build custom analyses, segment users, personalize sites etc.
Piwik is a pretty good alternative and, unlike GA, it gives instant analytics and is extensible.
However, it tends to get clunky after some time, depending on the number of sites, the traffic and the server where you host it. This is most visible when you try to display a larger date range or just dig into the history.
There is one more important thing to consider: it collects IP addresses of all your visitors out of the box, which might be in conflict with your local laws. Be sure to check that out before adding it to the site.
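One common way to soften the IP-address concern is to mask the address before it is stored. A minimal Python sketch of the idea, assuming you keep only the first 24 bits of an IPv4 address (or the first 48 bits of an IPv6 address); the function name and prefix widths are illustrative choices, not Piwik's own implementation:

```python
import ipaddress

def anonymize_ip(raw_ip: str, v4_prefix: int = 24, v6_prefix: int = 48) -> str:
    """Zero the host bits of an address so it no longer points at one visitor."""
    addr = ipaddress.ip_address(raw_ip)
    prefix = v4_prefix if addr.version == 4 else v6_prefix
    # strict=False lets us build the network from a host address.
    network = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
    return str(network.network_address)
```

Whether a masked address is enough to satisfy your local laws is a legal question, so treat this as an illustration of the technique only.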
> There is one more important thing to consider: it collects IP addresses of all your visitors out of the box, which might be in conflict with your local laws. Be sure to check that out before adding it to the site.
Our attempts to install and use Piwik have always been disappointing in the past. It seems to fall apart (slow display etc.) quickly for medium to high traffic sites (50+ million page views/month). Are there any people here who are using it successfully at this order of magnitude and are willing to share some configuration hints?
We gave up on it for the same reason. It is absurd for a simple web analytics app to require massively more powerful hardware than our actual app does. And since it is a mysql mess, any updates that touch the schema put your stats offline for hours. I really can't fathom how using mysql is still considered acceptable in 2013.
Anyone fancy telling the lot at gov.uk about this? It would be nice if they weren't using foreign owned (and likely foreign-located) servers to record and analyse what UK citizens do on UK government websites.
Hi, I found out about this post here on HN thanks to Piwik.
I am the author of the post Tony is linking at the top of his post, and I also use Piwik, where I saw my article had 300+ visits instead of the 20+ it gets daily. :)
Piwik is great, and I use it to track visits to a few sites I own. All of them get some 6,000 page views per day, so no real traffic.
Because my site is powered by Jekyll, I also use Vanilla Forums as the commenting system, so no Disqus or Intense Debate either :)
Have a nice day, and thanks Tony for the link and credit.
Piwik is great until you get a decent amount of traffic. I had it on a client site as an experiment. At 100k page views per hour Piwik nuked the server :(
I'm a lover of Piwik; I've switched from GA to Piwik across all the domains I operate for myself and for some clients. One install can handle multiple domains/sites/accounts/groups.
Many of the clients appreciate the increased "privacy". And for applications (internal/public/private) it just makes more sense to me than GA.
My favourite part is that it's doing server-side log analysis for my traffic - not using those JS based widgets.
It's got event tracking (sweet) if you choose to use a JS based tickler to do that kind of thing.
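For anyone wondering what server-side log analysis boils down to, here is a rough Python sketch that counts page views from access-log lines. It assumes the Common Log Format and only counts successful GET requests; the regular expression and sample lines are made up for illustration and are not how Piwik's own log importer works:

```python
import re
from collections import Counter

# Common Log Format: host ident user [time] "METHOD path PROTO" status size
LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)

def count_page_views(lines):
    """Count successful GET requests per path from access-log lines."""
    views = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match["method"] == "GET" and match["status"] == "200":
            views[match["path"]] += 1
    return views

sample = [
    '203.0.113.5 - - [10/Oct/2013:13:55:36 +0000] "GET /pricing HTTP/1.1" 200 2326',
    '203.0.113.9 - - [10/Oct/2013:13:56:01 +0000] "GET /pricing HTTP/1.1" 200 2326',
    '203.0.113.9 - - [10/Oct/2013:13:56:05 +0000] "POST /login HTTP/1.1" 302 0',
]
print(count_page_views(sample))
```

No JavaScript runs in the visitor's browser here; everything comes from what the web server already records.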
One thing I feel you miss out if you don't use GA is 'Google-juice'. Maybe it's just like blowing into a Super Nintendo cartridge, but I think using GA increases your SEO with Google.
The reason why this probably isn't true is that it would be a regulator's/anti-trust-buster's dream come true. Imagine it: Google ranks you lower if you don't use other Google products. Don't use GA? Ranked lower. Don't pay for ads? Maybe your organic results drop a bit. . .
Google has to be very careful about its organic search results. Giving extra weight to sites that use things like Google Analytics would be the "confirmation" that regulators would need. As such, it seems like Google wouldn't risk its core business over something like this.
I totally understand the logic: if I use Google Analytics, Google knows that my site is getting traffic and it makes a certain sense to take that into consideration. However, I think the opposite side of that (if sites don't give Google whatever Google wants, they're going to be ranked lower) is a can of worms that Google doesn't want to be seen dipping into.
As mdasen mentioned, Google Analytics probably doesn't have much of an effect on SEO due to regulatory concerns.
However, it probably does handicap you if you use other Google services, such as Adwords. Google Adwords heavily leverages Analytics data to optimize your marketing campaigns, and doesn't offer the ability to interface Adwords with a third party like Piwik. So not using Analytics will leave Adwords flying blind to user behavior on your site, and it won't be able to tell which leads were useful and which weren't.
Piwik is great and so much better than Analytics. It works especially well for us when we want to track sources of sales, as sales are handled by 3rd party reseller and Analytics goals are of no use.
In Piwik you have the visitors log at a glance. We just match IP/time to the log and BAM - we know where the buyer came from, which pages they visited and how long they stayed there.
On top of this, our site traffic is hidden from the eyes of Google.
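The IP/time matching described above can be sketched in a few lines of Python. This assumes the reseller gives you the buyer's IP and a purchase timestamp, and that you have exported visits as dictionaries; the field names, the 30-minute window and the sample data are all hypothetical:

```python
from datetime import datetime, timedelta

def find_visit(sale, visits, window=timedelta(minutes=30)):
    """Return the visit from the same IP closest in time to the sale, or None."""
    candidates = [
        v for v in visits
        if v["ip"] == sale["ip"] and abs(v["time"] - sale["time"]) <= window
    ]
    return min(candidates, key=lambda v: abs(v["time"] - sale["time"]), default=None)

visits = [
    {"ip": "198.51.100.7", "time": datetime(2013, 8, 1, 14, 2), "referrer": "news.ycombinator.com"},
    {"ip": "198.51.100.7", "time": datetime(2013, 7, 20, 9, 0), "referrer": "google.com"},
    {"ip": "192.0.2.44", "time": datetime(2013, 8, 1, 14, 5), "referrer": "twitter.com"},
]
sale = {"ip": "198.51.100.7", "time": datetime(2013, 8, 1, 14, 10)}
match = find_visit(sale, visits)
```

Shared IPs (offices, mobile carriers) can produce false matches, so a tighter window or an extra field is worth considering.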
Piwik can be set to follow the Do Not Track header.
It rigidly follows DNT, not storing the request at all.
It looks like Google and other analytics players are going to refuse to follow DNT, after a hilariously weak proposal by the DAA was rejected by the W3C committee.
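Honouring Do Not Track on the server side is a small check. A minimal Python sketch, assuming the request headers arrive as a plain dictionary; the function name is invented for illustration and this is not Piwik's code:

```python
def should_track(headers: dict) -> bool:
    """Return False when the request carries 'DNT: 1'."""
    # HTTP header names are case-insensitive, so normalise before the lookup.
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    return normalized.get("dnt") != "1"
```

A tracker that "rigidly follows DNT" would call something like this first and drop the request without storing anything when it returns False.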
If you want to give a try to Piwik on your own laptop, AWS or Azure we (BitNami) have free one-click installers, VMs and cloud images http://bitnami.com/stack/piwik
Edit: Never mind - I just read to the end, Piwik can auto-update to the latest 1.12. I missed that when skimming the guide and trying it myself at the same time.
The guide and the mentioned Github repo use Piwik 1.5.1. There are several security issues with this version (perhaps many more): http://www.cvedetails.com/vulnerability-list/vendor_id-9612/...
Latest version is 1.12 - I advise against this "simple" solution to use Piwik. Perhaps there is a github repo with the latest Piwik?
I've used Piwik myself for years and swear by it for all of my personal projects or sites where I need full data privacy. It provides all the basic data I need for my clients and then some. Also, once it was set up I've found maintenance to be pretty simple. I use it for regular web sites, WordPress sites and MediaWiki installations.
Granted, I miss the days of being able to use tools like Analog but it's so badly out of date and not maintained anymore so I only use it when I need to process raw traffic numbers from a server.
I've used Piwik for years and it is incredibly simple to use and set up but this post makes it much more complex than it needs to be. In all honesty, it's just as simple as setting up Wordpress. Drop the Piwik folder on a server somewhere, run the installation (connecting to your database and if I recall correctly you don't need to use the root user, just a user with sufficient privileges), and you're done.
I want to love Piwik, and I do like it a lot, but I do have some problems. Piwik gets slow after a while. This may partly have to do with the server it's running on, but over time the software will slow down, especially if you try to pull out somewhat longer date ranges.
It isn't as pretty as GA. I know this is petty and that it's themeable, but the UI was important to me. Keeping it up to date and maintaining it was also something that required vigilance. It isn't hard to update but you have to make sure to check for updates. Sounds simple but you'd be surprised how lazy one can be. Also, integration with Webmaster Tools isn't available, which is kind of a bummer.
On the plus side there's very little that GA offers that Piwik doesn't. There's even a great mobile app which GA doesn't yet have to my knowledge. You can monitor multiple sites on different servers using a simple JavaScript snippet just like GA, and it breaks down the data in just about every way you'd want.
In the end, despite really wanting to use Piwik long term, I wasn't able to do it. I don't see a problem with using Google Analytics for tracking purposes. Google has the power to abuse the data they collect but I trust them not to. I'm not running a site where visitor privacy is a big priority. If I were running such a site I'd reconsider this position. But from an ethical standpoint, if it's somehow not okay for Google to collect tracking data on your visitors (and promise not to exploit it), why is it okay for any of us to use Piwik and collect that data ourselves? Google has far more data that can do far more damage, but they also have far more resources to put into security than most of us. I can take a pledge not to exploit my users' data, but Google does too. I know I can trust myself, but my users don't. My users might even prefer that, if I were to use analytics software, I use software that comes from Google, a name they know and trust, rather than from me, a guy who they know a little bit but who doesn't have a reputation that can even remotely compete with Google. To me, that's the more interesting aspect of Piwik: the question of why running your own analytics software is more ethical than using Google.
Edit: When I said I wasn't running a site that made visitor privacy a priority I was excluding the site I run that actually does make user privacy a huge priority. I'm aware I look like a hypocrite now and I think I might actually spend some time thinking about whether or not to switch over to a self-hosted analytics solution for that site. I'm still not sure that a self-hosted service is preferable in my case but I'm open to the idea.
By default, Piwik will aggregate data when you a) make an API request or b) load the dashboard (which in fact calls the API). Cron archiving makes this process faster by processing all the data beforehand so that the API can simply request it from the DB.
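To make the difference concrete, here is a toy Python sketch of pre-aggregation. It is only a model of the idea (a scheduled job collapses raw hits into daily rows, and the dashboard sums those rows); the data structures and function names are invented and have nothing to do with Piwik's actual schema:

```python
from collections import Counter
from datetime import date

raw_hits = [  # (day, page) tuples standing in for a raw tracking log table
    (date(2013, 8, 1), "/"), (date(2013, 8, 1), "/pricing"),
    (date(2013, 8, 2), "/"), (date(2013, 8, 3), "/"),
]

def archive_daily(hits):
    """The 'cron' step: collapse raw hits into one count per day."""
    return Counter(day for day, _page in hits)

def views_in_range(archive, start, end):
    """The 'dashboard' step: sum pre-computed daily rows instead of raw hits."""
    return sum(count for day, count in archive.items() if start <= day <= end)

archive = archive_daily(raw_hits)
total = views_in_range(archive, date(2013, 8, 1), date(2013, 8, 2))
```

With millions of raw hits, the dashboard step touches one row per day instead of every hit, which is why long date ranges stop hurting once archiving runs ahead of time.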
I think the important distinction is that trust will be either first-party or third-party, and most of your users will never think about it.
If I'm using your site or service I've already decided to trust you somewhat. Self-hosted analytics is just an additional baby step.
If you put a third-party resource on your site (and most of us do) then that third-party is going to have their own entirely separate terms & conditions, which you as webmaster have little or no control over.
It's down to whether you (and/or your users) are okay with farming out visitor privacy to a third-party. Once that question is satisfied it's just a question of performance.
> It isn't as pretty as GA. I know this is petty and that its themeable but the UI was important to me.
I explored this too--it turns out that they make money off of the design of custom Piwik UIs for sale. A site license for their white-labeling plugin costs just under 2,000 euro.
My product allows users to create and launch their own website. I'd quite like to be able to quickly provide basic statistics for users (alongside Google analytics if they want it).
We run a multi-tenant application, so we have thousands of sites running from the same codebase. However, each site owner would need its own statistics. At the moment, we just let users provide their own Google Analytics, but I think it would be nice to report to Piwik and give them their own preconfigured stats area?
I've used it in the past to do something a bit similar. I had a sort of "master account" that was installed on all domains and then a second individual one per domain. Piwik worked really nicely and gave the end user the ability to see a sort of "combined stats" as well as per-domain stats. We couldn't get GA to work in this way (I think that has since changed).
This was about two years ago now and I've followed the development on and off and it's certainly come on leaps and bounds. It's definitely worth some investigation for your use-case.
I've done this on the product I'm developing. You can set it up programmatically and it works exceptionally well. That being said, there are some performance considerations, especially if the sites you're hosting are high traffic.
I'd recommend reading the documentation and doing a bit of searching for a configuration that will fit your needs before you begin. It will save you a lot of time.
Good luck and let me know if you have any specific questions!
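As a rough idea of what setting it up programmatically can look like, here is a Python sketch that only builds the request URL for registering a tenant's site. It assumes Piwik's HTTP API exposes a `SitesManager.addSite` method taking `siteName` and `urls`, so check the API reference for your version; the base URL, token and tenant names are placeholders:

```python
from urllib.parse import urlencode

def add_site_url(piwik_base, token_auth, site_name, site_url):
    """Build the API URL that would register one tenant site (assumed parameters)."""
    params = {
        "module": "API",
        "method": "SitesManager.addSite",  # assumption: verify against your Piwik version
        "siteName": site_name,
        "urls": site_url,
        "format": "JSON",
        "token_auth": token_auth,
    }
    return f"{piwik_base.rstrip('/')}/index.php?{urlencode(params)}"

url = add_site_url(
    "https://stats.example.com/", "TOKEN", "Tenant 42", "https://tenant42.example.com"
)
```

The site ID that comes back from such a call would then be stored against the tenant and used in that tenant's tracking snippet.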
I would not recommend Piwik over GA. The #1 reason is that Piwik does not track the (not provided) Google keyword searches. Google now provides the (not provided) keywords with the site page attached. For example: (np - /pricing), so you at least have an understanding of what they searched for. This is a big factor if you're serious about SEO.
I have implemented Piwik widgets in my latest project for tracking visitors to personal pages. When you visit this page
http://reminderof.me/ruggero I see the insight on my dashboard.
IMHO Piwik is a valid open-source alternative to Google Analytics and will erode its application marketplace.
[+] [-] tombrossman|12 years ago|reply
I posted a related question over on Stack Exchange if anyone's interested in providing some feedback there. http://webmasters.stackexchange.com/q/47069
[+] [-] toble|12 years ago|reply
Maybe you sounded like a salesperson? Piwik was unknown to me until I read it in this comment.
[+] [-] liquidcool|12 years ago|reply
[+] [-] alexatkeplar|12 years ago|reply
If anybody has any questions, just shout!
[1] https://github.com/snowplow/snowplow/blob/master/4-storage/r...
[+] [-] tonylampada|12 years ago|reply
[+] [-] tonylampada|12 years ago|reply
[+] [-] aram|12 years ago|reply
[+] [-] dkuntz2|12 years ago|reply
[+] [-] narrowingorbits|12 years ago|reply
[+] [-] af3|12 years ago|reply
Is there a way to disable this?
[+] [-] lazyjones|12 years ago|reply
[+] [-] asdasf|12 years ago|reply
[+] [-] Nursie|12 years ago|reply
[+] [-] afandian|12 years ago|reply
[+] [-] g-garron|12 years ago|reply
[+] [-] nodefortytwo|12 years ago|reply
[+] [-] moepstar|12 years ago|reply
http://piwik.org/docs/optimize/
[+] [-] edoceo|12 years ago|reply
Here's a quick and dirty doc I made about it: http://praxis.edoceo.com/howto/piwik
[+] [-] sergiotapia|12 years ago|reply
Am I mistaken?
[+] [-] mdasen|12 years ago|reply
[+] [-] cosmie|12 years ago|reply
[+] [-] marchra|12 years ago|reply
[deleted]
[+] [-] thejosh|12 years ago|reply
[+] [-] handzhiev|12 years ago|reply
[+] [-] generj|12 years ago|reply
[+] [-] ridruejo|12 years ago|reply
[+] [-] thomaslutz|12 years ago|reply
[+] [-] bcRIPster|12 years ago|reply
[+] [-] bpatrianakos|12 years ago|reply
[+] [-] halfdan|12 years ago|reply
Whenever Piwik gets slow you will have to set up cron archiving: http://piwik.org/docs/setup-auto-archiving/
[+] [-] m_ram|12 years ago|reply
[1] https://play.google.com/store/apps/details?id=com.google.and...
[+] [-] tombrossman|12 years ago|reply
[+] [-] unknownian|12 years ago|reply
I use Piwik, but GAnalytics has a billion mobile apps.
[+] [-] themodelplumber|12 years ago|reply
[+] [-] richardv|12 years ago|reply
[+] [-] jaymzcampbell|12 years ago|reply
[+] [-] rohall|12 years ago|reply
[+] [-] moepstar|12 years ago|reply
The plugin I use for that is: http://wordpress.org/plugins/wp-piwik/
Maybe you can get an idea how to do it with your codebase from that...
[+] [-] aram|12 years ago|reply
[+] [-] mitchwainer|12 years ago|reply
[+] [-] sgarbi|12 years ago|reply
[+] [-] noinput|12 years ago|reply
Hope some find it helpful for a quick way to test out and run it for free to see if they like it (I do).