scales: Greplin's new open source Python server metrics library

83 points | rwalker | 14 years ago | github.com

14 comments

[+] rarrrrrr|14 years ago|reply
We open sourced something similar called StatGrabber a while back, including Perl and Python client libraries. We tend to avoid threads, so it instead uses non-blocking UDP (guaranteed delivery on localhost) to a collector daemon, which aggregates the info and then delivers it to Ganglia for graphing. https://spideroak.com/code if anyone is curious.

The client modules simply emit non-blocking UDP packets and get on with their business, so they don't slow down response time. You can graph 4 types of stats: counters (ex: transactions), averages (ex: size of transactions), accumulators (ex: bandwidth used), and elapsed time (ex: time per transaction).
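To make the fire-and-forget idea concrete, here is a minimal sketch of such a client in Python. It is not StatGrabber's actual API: the class name `UdpStatEmitter`, the port `8125`, and the `kind:name:value` wire format are all assumptions for illustration.

```python
import socket


class UdpStatEmitter:
    """Hypothetical fire-and-forget stats client (not StatGrabber's real API)."""

    def __init__(self, host="127.0.0.1", port=8125):
        # Port 8125 is an arbitrary placeholder for the collector daemon.
        self.addr = (host, port)
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        self.sock.setblocking(False)  # a send must never stall the caller

    def emit(self, kind, name, value):
        # The "kind:name:value" wire format is assumed for illustration.
        payload = f"{kind}:{name}:{value}".encode("utf-8")
        try:
            self.sock.sendto(payload, self.addr)
            return True
        except OSError:
            # Send buffer full or network error: drop the sample and move on.
            return False

    def close(self):
        self.sock.close()
```

A caller would do something like `UdpStatEmitter().emit("counter", "transactions", 1)` inside the request path; the boolean return only says whether the kernel accepted the datagram, not whether a collector received it.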

It's pretty nice to see Ganglia graphing system metrics alongside all the stats we have the backend emit. For example, one of the stats graphed is revenue events. There's a clear relationship between network health issues and revenue.

[+] moreati|14 years ago|reply
> UDP (guaranteed delivery on localhost)

That's not something I've heard before. Is it generally true of localhost UDP? OS-specific? Particular to your usage?

[+] gnubardt|14 years ago|reply
We wrote something similar at Brightcove to collect system statistics and publish them to Graphite. It runs as an independent service and focuses on OS-level (not application-level) metrics.

https://github.com/BrightcoveOS/Diamond

[+] SkyMarshal|14 years ago|reply
<3 SpiderOak. Any reason you guys don't host or mirror your code on Github? Makes it much easier for us curious tinkerers to keep your stuff organized and 'top of mind' among the many other things we fork, grok, and hack at.
[+] beagle3|14 years ago|reply
You could use non-blocking TCP, not worry about where the listening process is, and have some idea if there's a problem.

Now, you're just ignoring the possibility of a problem, which might be fine and dandy, but you're not actually getting anything in return (well, about 5 lines of C code to set up the TCP connection in non-blocking mode, and tear it down if there's an error).
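As a rough Python equivalent of that idea (the comment talks about C), the sketch below connects lazily in non-blocking mode, drops samples when the socket isn't ready, and tears the connection down on any real error so the next send reconnects. The class name `TcpStatSender`, the port `8126`, and the newline-delimited format are assumptions, not anything from StatGrabber or scales.

```python
import errno
import socket


class TcpStatSender:
    """Hypothetical non-blocking TCP stats sender with teardown on error."""

    def __init__(self, host="127.0.0.1", port=8126):
        self.addr = (host, port)  # placeholder collector address
        self.sock = None

    def _connect(self):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.setblocking(False)
        err = sock.connect_ex(self.addr)  # returns immediately
        if err not in (0, errno.EINPROGRESS, errno.EWOULDBLOCK):
            sock.close()
            return False
        self.sock = sock
        return True

    def send(self, line):
        if self.sock is None and not self._connect():
            return False  # collector unreachable right now
        try:
            # Short lines normally fit in one send; a production client
            # would also have to handle partial writes.
            self.sock.send(line.encode("utf-8") + b"\n")
            return True
        except BlockingIOError:
            return False  # still connecting or buffer full: drop the sample
        except OSError:
            self.close()  # real error: tear down, reconnect on next send
            return False

    def close(self):
        if self.sock is not None:
            self.sock.close()
            self.sock = None
```

Unlike the UDP version, a `False` from `send` followed by a reset of `self.sock` is a signal that the collector is actually gone, which is the "some idea if there's a problem" part.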

[+] mattlong|14 years ago|reply
Correct me if I'm mistaken, but it seems like this wouldn't really work with a web server that has several worker processes like uwsgi in preforking mode. Each worker process would be sandboxed to its own STATS object and attempting to serve the HTTP/Graphite server in its own background thread; all of which would be trying to access the same port.
[+] pjscott|14 years ago|reply
Correct. This is intended more for daemons with one or two processes per server, or for handy debugging of a single instance on a development machine. In order to monitor something like preforking uwsgi, you would need a way to aggregate stats from all the processes, which this library doesn't do.

(You can change the port the HTTP server listens on, though. If you have ten workers, you can have them listen on ten different ports.)
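A sketch of what that workaround might look like, under stated assumptions: each worker derives its port from a worker index, and a separate script sums counter snapshots that it has already fetched from each port. Nothing here is part of the scales library; `BASE_PORT`, `port_for_worker`, and `merge_counters` are hypothetical names, and the fetching step is left out.

```python
from collections import Counter

BASE_PORT = 8765  # hypothetical base port for per-worker stats servers


def port_for_worker(worker_index, base_port=BASE_PORT):
    """Give each worker its own port so the stats servers don't collide."""
    return base_port + worker_index


def merge_counters(snapshots):
    """Sum counter-style stats (name -> number) collected from every worker."""
    total = Counter()
    for snapshot in snapshots:
        total.update(snapshot)  # Counter.update adds values key by key
    return dict(total)
```

For example, `merge_counters([{"requests": 3, "errors": 1}, {"requests": 5}])` gives `{"requests": 8, "errors": 1}`. Summing only makes sense for counters; averages and timings would need their sample counts carried along to be combined correctly.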