benfrederickson | 4 years ago | on: Why Python needs to be paused during profiling – but Ruby doesn't always
benfrederickson's comments
benfrederickson | 5 years ago | on: Show HN: Austin-Tui – Spy inside a running Python program at no performance cost
Py-spy defaults to blocking because the results can be pretty wrong otherwise: https://github.com/benfred/py-spy/issues/56 . You can see this problem by profiling a program like https://github.com/benfred/py-spy/blob/master/tests/scripts/... with and without the nonblocking flag in py-spy - the nonblocking version produces garbage output.
Somewhat interestingly, this problem doesn't seem to occur with Ruby - and rbspy can get away without pausing the target program with only minor errors seen when profiling a similar function. I suspect this is because of differences between how the Ruby and Python interpreters store call stack information, but haven't had a chance to dig into the specifics.
benfrederickson | 6 years ago | on: Making Python Programs Blazingly Fast
I wrote a tool, py-spy (https://github.com/benfred/py-spy), that is worth checking out if you’re interested in profiling Python programs. Not only does it solve those problems with cProfile - py-spy also lets you generate a flamegraph, profile running programs in production, profile multiprocess Python applications, profile native Python extensions, etc.
benfrederickson | 6 years ago | on: Profiling Native Python Extensions
benfrederickson | 7 years ago | on: Using /proc to get a process' current stack trace
benfrederickson | 7 years ago | on: Show HN: Py-spy – A new sampling profiler for Python programs
benfrederickson | 7 years ago | on: Show HN: Py-spy – A new sampling profiler for Python programs
benfrederickson | 7 years ago | on: Show HN: Py-spy – A new sampling profiler for Python programs
benfrederickson | 7 years ago | on: Show HN: Py-spy – A new sampling profiler for Python programs
benfrederickson | 7 years ago | on: How to crawl a quarter billion webpages in 40 hours (2012)
benfrederickson | 8 years ago | on: Drawing Venn Diagrams
Venn/Euler diagrams don't work all that well past 3 sets: when using circles, not all of the intersection areas can be shown, so unless some of the sets are disjoint the diagram will be misleading (like in the music example). However, I think they work well for 3-set diagrams. I have an interactive example on last.fm data here https://www.benfrederickson.com/distance-metrics/ in the context of explaining some simple distance metrics.
benfrederickson | 8 years ago | on: Drawing Venn Diagrams
A while back I wrote a small package in JavaScript for computing area-proportional Venn and Euler diagrams: https://github.com/benfred/venn.js . The 2-circle case here is relatively easy, but the problem gets tricky when you have 3+ sets. I wrote up my approach here https://www.benfrederickson.com/venn-diagrams-with-d3.js/ and https://www.benfrederickson.com/better-venn-diagrams/
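To illustrate why the 2-circle case is the easy one: the overlap area of two circles has a closed form and shrinks monotonically as the centers move apart, so the distance that gives a desired overlap can be found by bisection. This is a minimal Python sketch of that idea, not necessarily how venn.js does it; the function names `circle_overlap` and `distance_for_overlap` are made up for illustration.

```python
import math


def circle_overlap(r1, r2, d):
    """Area of intersection of two circles with radii r1, r2 and centers d apart."""
    if d >= r1 + r2:
        return 0.0  # circles are disjoint
    if d <= abs(r1 - r2):
        return math.pi * min(r1, r2) ** 2  # smaller circle is fully contained

    def clamp(x):
        # Guard acos against tiny floating point overshoot outside [-1, 1].
        return max(-1.0, min(1.0, x))

    a1 = r1 * r1 * math.acos(clamp((d * d + r1 * r1 - r2 * r2) / (2 * d * r1)))
    a2 = r2 * r2 * math.acos(clamp((d * d + r2 * r2 - r1 * r1) / (2 * d * r2)))
    prod = (-d + r1 + r2) * (d + r1 - r2) * (d - r1 + r2) * (d + r1 + r2)
    return a1 + a2 - 0.5 * math.sqrt(max(0.0, prod))


def distance_for_overlap(r1, r2, target, tol=1e-10):
    """Bisect on the center distance until the overlap area matches target."""
    lo, hi = abs(r1 - r2), r1 + r2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if circle_overlap(r1, r2, mid) > target:
            lo = mid  # too much overlap: move the circles further apart
        else:
            hi = mid
    return (lo + hi) / 2
```

With 3 or more sets there is no such one-dimensional search, since every pairwise distance has to be satisfied at once, which is where the optimization approach in the write-ups comes in.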
benfrederickson | 8 years ago | on: Darts, Dice, and Coins: Sampling from a Discrete Distribution (2011)
Also worth reading up on are sum-heaps. Alias tables are O(1) to sample from but O(n) to build/modify. Sum-heaps let you modify in O(log(n)) at the cost of sampling in O(log(n)) as well. A good writeup is here: https://timvieira.github.io/blog/post/2016/11/21/heaps-for-i...
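For anyone who wants to see the trade-off concretely, here is a minimal Python sketch of a sum-heap: a complete binary tree stored in a flat list where the leaves hold the weights and each internal node holds the sum of its children. The class and method names are my own for illustration and are not taken from the linked writeup.

```python
import random


class SumHeap:
    """Weighted sampling with O(log n) sample and O(log n) weight updates."""

    def __init__(self, weights):
        self.n = len(weights)
        # Pad the number of leaves up to a power of two; padding leaves weigh 0.
        self.size = 1
        while self.size < self.n:
            self.size *= 2
        self.tree = [0.0] * (2 * self.size)
        for i, w in enumerate(weights):
            self.tree[self.size + i] = float(w)
        # Fill internal nodes bottom-up; tree[1] ends up as the total weight.
        for i in range(self.size - 1, 0, -1):
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]

    def update(self, index, weight):
        """Change one weight and repair the sums on the path to the root."""
        i = self.size + index
        self.tree[i] = float(weight)
        i //= 2
        while i >= 1:
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]
            i //= 2

    def sample(self, u=None):
        """Return an index with probability proportional to its weight.

        u is a number in [0, 1); it is drawn at random if not given.
        """
        if u is None:
            u = random.random()
        target = u * self.tree[1]
        i = 1
        while i < self.size:
            left = 2 * i
            if target < self.tree[left]:
                i = left
            else:
                target -= self.tree[left]
                i = left + 1
        return i - self.size
```

Passing `u` explicitly makes the sampler deterministic, which is handy for testing; calling `sample()` with no argument draws a fresh random number each time.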
benfrederickson | 8 years ago | on: Why GitHub Won't Help with Hiring
benfrederickson | 8 years ago | on: Ranking Programming Languages by GitHub Users
For your first question - yes, this means few people use more than one language in a month. There is also a power law distribution in user activity each month, so most users only have a handful of events each month (which happen to be mostly in a single language). I'm trying to measure how broad the support is, so this was mostly done on purpose. I was finding that counting total events was getting biased by things that must have been automated activity (I was seeing single accounts with 10K commits a day, for instance).
Percent of MAU in the charts is the total percentage of unique users who were active that month. I haven't tried it out with yearly active users =(
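As a rough illustration of why counting unique users is more robust than counting events, here is a small Python sketch. It assumes one month of events given as (user, language) pairs and defines a language's share as the fraction of that month's active users with at least one event in it; the function name and data layout are hypothetical and not the actual pipeline behind the charts.

```python
from collections import defaultdict


def language_share_by_users(events):
    """events: iterable of (user, language) pairs for a single month."""
    users_by_language = defaultdict(set)
    all_users = set()
    for user, language in events:
        users_by_language[language].add(user)
        all_users.add(user)
    # Each user counts once per language, no matter how many events they have.
    return {
        language: len(users) / len(all_users)
        for language, users in users_by_language.items()
    }


# One automated account with 10K events next to two ordinary users.
events = [("bot", "Go")] * 10000 + [("a", "Python"), ("b", "Python"), ("a", "Go")]
shares = language_share_by_users(events)
```

By raw event count Go would account for nearly all of the activity here, while by unique users Go and Python come out equal.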
benfrederickson | 8 years ago | on: Ranking Programming Languages by GitHub Users
benfrederickson | 8 years ago | on: Ranking Programming Languages by GitHub Users
benfrederickson | 8 years ago | on: Analyzing One Million Robots.txt Files
I ended up analyzing very different things from this article though, so this article was still pretty interesting to me.
benfrederickson | 9 years ago | on: Interactive Numerical Optimization Tutorial
benfrederickson | 9 years ago | on: Interactive Numerical Optimization Tutorial
Basically though, I'm using the non-linear CG method - so it doesn't require a positive definite matrix. The loss function is a little funky in how it handles the disjoint set/subset relationships in the Euler diagrams (it defines the loss/gradient to be 0 if these constraints are satisfied), but this approach still works pretty well.
That venn diagram post has a couple interactive demos of how this works, and also a randomized test showing overall performance.
I actually believe it's the best known algorithm for laying out area-proportional Venn diagrams. I benchmarked against the code from the venneuler paper here: http://benfred.github.io/venn.js/tests/venneuler_comparison/