For those who didn't read the post: he's formatting syslog messages as redis wire commands, telling syslog to forward those messages to redis over the network (normally you're forwarding to another syslog client, but because of the custom message format here, it acts against a redis server), so syslog ends up as a write-only redis client.
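To make the trick concrete, here is a minimal sketch of what "a syslog message formatted as a redis wire command" looks like in bytes. It assumes the standard Redis request protocol (an array of bulk strings); the `encode_command` helper and the `pixel:hits` key are made up for illustration, and the post's actual template may differ.

```python
def encode_command(*args: str) -> bytes:
    """Encode one Redis command as an array of bulk strings (RESP)."""
    # Array header: "*<number of arguments>\r\n"
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode("utf-8")
        # Each argument: "$<length in bytes>\r\n<payload>\r\n"
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


# Hypothetical counter key, purely illustrative.
message = encode_command("INCR", "pixel:hits")
print(message)  # b'*2\r\n$4\r\nINCR\r\n$10\r\npixel:hits\r\n'
```

Any program that can write those bytes to the Redis port, syslog included, is effectively a Redis client that never reads the replies.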
He's already patching nginx to support logging via syslog, so why didn't he just patch nginx to log directly to redis? One less place for things to go wrong.
Brilliantly lazy, I would say: since he used out-of-the-box software to do the piping, he didn't have to write any code. My first inclination would be to format nginx's logs as a JSON array like so:
(just copied and pasted from my own config, many fields are not relevant to this use case in particular).
Then log to a named pipe into a program which would then RPUSH the entries into a LIST or maybe PUBLISH them to a channel, from where several different analytics programs would be running for several different metrics that depend on the same data, so that they could run in parallel.
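A rough sketch of that pipe reader, under a few assumptions: each log line is one JSON array, the list key `nginx:access` is invented for the example, and the actual Redis call is passed in as `push` so the loop can be tried without a server. Lines that fail to parse are skipped, since request strings containing quotes may not come out of nginx as valid JSON.

```python
import json
from typing import Callable, Iterable, Tuple


def forward_entries(
    lines: Iterable[str],
    push: Callable[[str, str], object],
    key: str = "nginx:access",  # hypothetical list name
) -> Tuple[int, int]:
    """Send each valid JSON-array log line to push(key, line).

    Returns a (forwarded, skipped) pair of counts.
    """
    forwarded = skipped = 0
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        try:
            entry = json.loads(line)
        except ValueError:
            skipped += 1  # not valid JSON, e.g. an oddly escaped request
            continue
        if not isinstance(entry, list):
            skipped += 1  # valid JSON, but not the expected array shape
            continue
        push(key, line)
        forwarded += 1
    return forwarded, skipped
```

With the redis-py client, `push` could be `redis.Redis().rpush` and `lines` an open file object on the named pipe; swapping in `publish` would give the PUBLISH variant, at the cost of losing entries when no subscriber is listening.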
Nice. A similar (in spirit) hack I've used is to host the pixel on a CDN (like Akamai or CloudFront) and run map reduce over the logs there. You don't get real-time, like this, but look ma, no server! CloudFront is especially useful since it can log directly to S3 and you can run Elastic MapReduce directly on those files.
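For a feel of what the job over those logs computes, here is a single-process stand-in (not actual map reduce) that counts hits per path. It assumes CloudFront-style access logs: tab-separated rows preceded by a `#Fields:` header naming the columns. The `count_hits` name and the sample rows in the test are illustrative, and real logs carry many more columns.

```python
from collections import Counter
from typing import Iterable, List


def count_hits(lines: Iterable[str], field: str = "cs-uri-stem") -> Counter:
    """Count requests per value of `field` in CloudFront-style access logs."""
    columns: List[str] = []
    counts: Counter = Counter()
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("#Fields:"):
            # Column names come from the header, so positions are not hard-coded.
            columns = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#") or field not in columns:
            continue  # other comment lines, blanks, or rows before a header
        values = line.split("\t")
        idx = columns.index(field)
        if idx < len(values):
            # "Map" each row to its key, "reduce" by summing per key.
            counts[values[idx]] += 1
    return counts
```

The same per-key counting is what a mapper and reducer pair would do across many log files in parallel.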
I've started doing this as well, and it works well. CloudFront lists their log delivery as "best effort", which gave me pause at first. However, after a month of testing, I've seen the stats to be on par with Google Analytics.
I'd love to see an open source package emerge to implement the collection and the data processing. CloudFront + Redshift would rock.
As others have mentioned, nginx + one of the redis (or lua-redis) modules does this very well without the complexity of syslog in the middle. We load many millions of values a day via httpredis2. It's been rock solid.
We log the same requests to a file in a custom log format that gets batched to s3 and then Cassandra and EMR/Hive. Makes a great platform for realtime + historical analytics.
Seconding this. I've been doing something similar using OpenResty[0] and Redis. Handles millions of page views a day on a pretty low end server without breaking a sweat. Documentation on OpenResty is kinda tricky to wade through, but man is it lightweight and fast.
I've been playing around with Docker and containers a little recently, and I decided to test my skills by creating a container, installing software, and packaging it up for reuse. I might even try to do it all in a Dockerfile after doing it by hand through "docker run -i -t ubuntu /bin/bash".
I followed your guide but I am not able to see the Redis records. I was able to find pixel.log in /var/log/nginx and inside was what I would expect to see in Redis:
At this point I went to check the syslog-ng program, as I realized I never verified it was running/working. I am getting an "Error parsing source, source plugin system not found in /usr/local/etc/syslog-ng.conf at line 2, column 3:" error, and after some googling I found some people suggesting[0] adding '@include "scl.conf"' to the syslog-ng config file. I tried adding this and it caused another syntax error. I googled for a while longer to no avail.
I know HN isn't really the forum for tech support but I couldn't find a better place to post it other than over email (My email is in my profile if you prefer that). If you have any pointers please let me know. Thank you for any help you may be able to provide.
If you're just popping off a queue, using something like POSIX message queues or SysV IPC might be nice to avoid depending on Redis on every single app server.
seiji | 12 years ago
Insane. And brilliant.
spc476 | 12 years ago
bonekeeper | 12 years ago
log_format json '["$time_local","$msec","$remote_addr","$remote_user","$http_host","$request","$status","$upstream_addr","$upstream_cache_status","$bytes_sent","$request_time","$upstream_response_time","$http_referer","$http_user_agent","$http_x_forwarded_for","$http_cookie","$upstream_http_x_session_key","$upstream_http_x_session_user"]';
jeffjose | 12 years ago
gfodor | 12 years ago
wiremine | 12 years ago
jbyers | 12 years ago
http://wiki.nginx.org/HttpRedis2Module
ClifReeder | 12 years ago
[0] http://openresty.org/
zvikara | 12 years ago
joshstrange | 12 years ago
[0] http://comments.gmane.org/gmane.comp.syslog-ng/15325
spc476 | 12 years ago
hackerboos | 12 years ago
1. JavaScript disabled
2. Tracking emails opened
http://skillcrush.com/2012/07/19/tracking-pixel/
Sirupsen | 12 years ago
benwilber0 | 12 years ago
thezilch | 12 years ago
benwilber0 | 12 years ago
pyotrgalois | 12 years ago
misiti3780 | 12 years ago