emfree's comments

emfree | 8 years ago | on: Profiling Go Applications with Flamegraphs

A nice writeup, thanks. There are a few variations on this workflow that I've found useful in practice; perhaps they'll be helpful to some folks:

- Linux perf can profile unmodified Go programs. This is handy when your application doesn't expose the /debug/pprof endpoint. (http://brendangregg.com/FlameGraphs/cpuflamegraphs.html#perf has detailed instructions)

- Recent versions of https://github.com/google/pprof include a flamegraph viewer in the web UI. This is handy when you want a line-level flamegraph instead of a function-level flamegraph.
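For reference, the perf path looks roughly like this (a sketch only: the process name, port, and output paths are placeholders, and it assumes Brendan Gregg's FlameGraph scripts are on your PATH):

```shell
# sample all stacks of a running Go process at 99 Hz for 30 seconds
perf record -F 99 -p "$(pgrep myserver)" -g -- sleep 30
# fold the samples and render an interactive SVG flamegraph
perf script | stackcollapse-perf.pl | flamegraph.pl > flame.svg

# or, when /debug/pprof is exposed, use pprof's built-in web UI
# (the flame graph is one of the views in the View menu)
go tool pprof -http=:8080 http://localhost:6060/debug/pprof/profile
```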

emfree | 9 years ago | on: Join-Idle-Queue: Load Balancing Algorithm for Scalable Web Services (2011)

> But in web services you often care more about the tail-end latency, the p90, p99 etc.

For sure. I think Theorem 2 in the paper implicitly addresses the latency distribution in this scheme. They're saying that in the limit of a large system, the queue length distribution at a single backend server depends only on the service time distribution (how long it takes to actually process each job) and the service discipline. So if for example job sizes are exponentially distributed and handled in FIFO order, then the wait time distribution is also exponential.

It would certainly be nice to see a more explicit discussion of the tail latency, especially in the simulations the authors did.
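If anyone wants to poke at the exponential claim numerically, here's a tiny single-FIFO-server simulation (a sketch; the rates and job count are arbitrary). For an M/M/1 queue the sojourn time (wait + service) is exactly Exp(mu - lambda), so the simulated mean and p99 should land near the values that distribution predicts:

```python
import random

def mm1_sojourn_times(lam, mu, n_jobs, seed=0):
    """Single FIFO server, Poisson arrivals (rate lam), Exp(mu) service.

    Returns each job's sojourn time (wait + service).
    """
    rng = random.Random(seed)
    arrival = 0.0
    server_free = 0.0
    sojourns = []
    for _ in range(n_jobs):
        arrival += rng.expovariate(lam)            # next Poisson arrival
        start = max(arrival, server_free)          # FIFO: wait for the server
        server_free = start + rng.expovariate(mu)  # exponential service
        sojourns.append(server_free - arrival)
    return sojourns

times = mm1_sojourn_times(lam=0.5, mu=1.0, n_jobs=200_000)
mean = sum(times) / len(times)
p99 = sorted(times)[int(0.99 * len(times))]
# theory: sojourn ~ Exp(mu - lam), so the mean should be near
# 1 / (mu - lam) = 2.0 and the p99 near 2 * ln(100) ~= 9.2
```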

emfree | 9 years ago | on: How does gdb work?

Great question. I wondered the same thing a while ago, and tried to build one using SystemTap (https://github.com/emfree/pystap). Couple reasons why this isn't too easy:

* "Python" in general might mean you're on Linux/Windows/whatever, and it might mean CPython, PyPy, or some other runtime. But any out-of-process instrumentation is gonna have to be pretty platform/runtime specific.

* Even if we restrict ourselves to, say, CPython on Linux, the interpreter's internals aren't super friendly to this sort of inspection from the outside. You have to rely on and also work around implementation details.

Example: to get a Python call stack, you want to look at `PyThreadState_Current` (basically the same idea as `ruby_current_thread` in that excellent linked post of Julia's, I think). But this happens to be null whenever the GIL is released, e.g. when doing network I/O, and then you're kind of out of luck. So you'll already have trouble usefully profiling a single-threaded I/O-intensive program.

* Oh and you pretty much need debug symbols in your CPython binary (I think? Tell me if this isn't true!). Most production CPython builds don't have them. So you have to get the right binary, and rebuild any application dependencies with C extensions. Not hard but annoying.

There is potential though! With some work, we definitely could have a better story for out-of-process Python profiling a la Linux perf.
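For contrast, the in-process version of this idea is almost trivial, which is exactly what an out-of-process tool has to replicate by digging through interpreter internals. A toy sampling profiler over `sys._current_frames` (CPython-specific, and everything below is just a sketch):

```python
import collections
import sys
import threading
import time
import traceback

samples = collections.Counter()

def sampler(interval=0.01, duration=0.5):
    """Periodically grab every thread's Python stack from *inside* the process.

    sys._current_frames is a CPython-specific API; an out-of-process
    profiler has to reconstruct the same data from PyThreadState et al.,
    which is the hard part described above.
    """
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        for frame in sys._current_frames().values():
            stack = ";".join(f.name for f in traceback.extract_stack(frame))
            samples[stack] += 1
        time.sleep(interval)

def busy():
    # something CPU-bound for the sampler to catch
    return sum(i * i for i in range(200_000))

t = threading.Thread(target=sampler)
t.start()
while t.is_alive():
    busy()
t.join()
# the hot function now shows up in the collected stacks
```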

emfree | 10 years ago | on: Supersingular elliptic curve isogeny Diffie-Hellman 101

Yep! In this case, I think you end up constructing, slightly more specifically, the isogeny whose kernel is exactly the cyclic subgroup generated by the point R (i.e., phi(S) is 0 iff S is a multiple of R). There are explicit formulas ("Vélu's formulas") that let you compute an isogeny from its kernel. Looks like the paper goes into some depth about how to do that computation efficiently, and how to ensure that you choose a cryptographically suitable point R.
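To make the kernel-to-isogeny direction concrete, here's the simplest case of Vélu's formulas, a 2-isogeny over a toy prime field (all the numbers below are made up for illustration; real isogeny-crypto parameters look nothing like this):

```python
# toy parameters: a small prime field and a curve rigged to have a
# rational 2-torsion point (x0, 0); nothing here is cryptographic
p = 101
a, x0 = 2, 5
b = (-(x0**3 + a * x0)) % p   # force (x0, 0) onto E: y^2 = x^3 + a*x + b

# Vélu's formulas for the isogeny whose kernel is {O, (x0, 0)}
t = (3 * x0 * x0 + a) % p
w = (x0 * t) % p
A2 = (a - 5 * t) % p          # image curve: y^2 = x^3 + A2*x + B2
B2 = (b - 7 * w) % p

def phi(x, y):
    """Map a point of E (outside the kernel) through the 2-isogeny."""
    inv = pow(x - x0, -1, p)
    return ((x + t * inv) % p, (y * (1 - t * inv * inv)) % p)

# every affine point we can find on E should land on the image curve
mapped = []
for x in range(p):
    if x == x0:
        continue
    rhs = (x**3 + a * x + b) % p
    y = next((y for y in range(p) if y * y % p == rhs), None)
    if y is not None:
        X, Y = phi(x, y)
        assert (Y * Y - (X**3 + A2 * X + B2)) % p == 0
        mapped.append((X, Y))
```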

emfree | 10 years ago | on: Heroku Kafka

Thanks for the insightful comment!

> The alternative if you are at a company with the resources to do so (mine is), is to build something that fits your use case better than Kafka

I'd love to hear more about this :) What did you end up doing differently from Kafka? How's it working out for you?

emfree | 10 years ago | on: Random Walks: the mathematics in 1 dimension

Here's a reference I found for one way to do it: http://www.math.nus.edu.sg/~matsr/ProbII/Lec6.pdf (Theorem 2.1). You define the Green's function G(x, y) = \sum_n Pr_x(S_n=y), where x and y are 3-vectors and Pr_x(S_n=y) is the probability that an n-step random walk starting at x ends up at y. For an infinite random walk starting at 0, G(0, 0) is then the expected number of visits to 0, counting the visit at time 0. The return probability (what the mathworld link calls u(3)) is 1 - 1/G(0, 0). You can use Fourier inversion to compute G(0, 0) -- the link gives the gnarly details. It's pretty cool.

emfree | 10 years ago | on: Profiling Python in Production

Author of the post here. That's a good question. I don't know if this approach is objectively better, but it has a few nice features.

* We generally favor free/open source solutions where practical.

* It is quite a bit cheaper in dollar terms.

* The actual code to make this work is very lightweight. By doing it yourself, you have total control, and can extend or tweak to get exactly the data you want. Being able to easily add bespoke instrumentation is really powerful. To give an example from one of our use cases (IMAP sync), let's say you wanted to cohort your data by mail provider. I.e., you suspect that the workload profile when syncing against server A is significantly different than syncing against server B, and you want to know for sure. It's pretty easy to take your codebase and your instrumentation, and add that by inspecting some thread-local context at runtime. Might be hard to do with an off-the-shelf commercial tool.
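A hypothetical sketch of what I mean by inspecting thread-local context (none of these names are from our actual codebase; `record_sample` stands in for whatever your sampler does):

```python
import collections
import threading

# thread-local context: each sync worker tags itself with a provider
context = threading.local()
samples = collections.Counter()

def record_sample(stack):
    """Cohort a (pretend) stack sample by the provider in thread-local context."""
    provider = getattr(context, "provider", "unknown")
    samples[(provider, stack)] += 1

def sync_worker(provider):
    context.provider = provider        # everything this thread records is tagged
    for _ in range(3):
        record_sample("sync_folder")   # stand-in for a real sampled stack

threads = [threading.Thread(target=sync_worker, args=(prov,))
           for prov in ("providerA", "providerB")]
for t in threads:
    t.start()
for t in threads:
    t.join()

# samples is now keyed by (provider, stack), so you can build one
# flamegraph or latency profile per provider
```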

emfree | 11 years ago | on: Inbox — The next-generation email platform

Hi, Inbox engineer here. Beyond the contextIO feature set, we support creating drafts, sending mail, and client sync, so you can use the API to really build full-fledged mail clients. The Inbox sync engine indexes all the data, so the API's performance isn't limited by that of the mail provider.