kmavm's comments

kmavm | 6 months ago | on: Safepoints and Fil-C

Hi Fil! Congrats on all the amazing progress on Fil-C.

We needed to port all the user-level fork(2) calls to vfork(2) when working on uClinux, a port of Linux to MMU-less microcontrollers[1]. It used to be that paging MMUs were kinda expensive (those TLBs! so much associativity!!!), and the CPU on your printer/ethernet card/etc. might not have that much grit. Nowadays not so much.

Still. A hard-and-fast use for vfork(2), as requested perhaps.

[1] http://www.ibiblio.org/lou/old/ViewStation/

kmavm | 7 months ago | on: Releasing weights for FLUX.1 Krea

Amazing. I can practically smell that owl, it looks so darned owl-like.

From the article it doesn’t seem as though photorealism per se was a goal in training; was that just emergent from human preferences, or did it take some specific dataset construction mojo?

kmavm | 2 years ago | on: Real-time image editing using latent consistency models

(Disclaimer: I'm an investor in Krea AI.)

When Diego first showed me this animation, I wasn't completely sure what I was looking at, because I assumed the left and right sides were like composited together or something. But it's a unified screen recording; the right, generated side is keeping pace with the riffing the artist does in the little paint program on the left.

There is no substitute for low latency in creative tools; if you have to sit there holding your breath every time you try something, you aren't just linearly slowed down. There are points that are just too hard to reach in slow, deliberate, 30+ second steps that a classical diffusion generation requires.

When I first heard about consistency, my assumption was that it was just an accelerator. I expected we'd get faster, cheaper versions of the same kinds of interactions with visual models we're used to seeing. The fine hackers at Krea did not take long to prove me wrong!

kmavm | 2 years ago | on: Paradigms of A.I. Programming: Case Studies in Common Lisp (1991)

It's "out of favor" because it completely failed as a research program. Let's not equivocate about this; it's nice to understand heuristic search, and there was a time when things like compilation were poorly understood enough to seem like AI. But as a path towards machines that succeed at cognitive tasks, these approaches are like climbing taller and taller trees in the hopes of getting to the moon.

kmavm | 3 years ago | on: Sapling: A new source control system with Git-compatible client

Slight correction: HipHop for PHP was cleanroom, including rewriting large families of native extensions to work with its C++ runtime, although it eventually developed workalikes for the PHP dev headers to ease development. Source: I worked on HHVM, its JIT successor that initially shared its source tree and runtime.

kmavm | 3 years ago | on: SQLite: QEMU All over Again?

The idea that "virtualization" began with Xen in 2004 is rather difficult to read as an early VMware employee. Before QEMU independently discovered it, VMware was JIT'ing unrestricted x86 to a safe x86 subset from 1999 on[1]. Hardware support for trap-and-emulate virtualization came to market in the mid-'aughts, after VMware had proven the market demand for it.

[1] https://www.vmware.com/pdf/asplos235_adams.pdf

kmavm | 3 years ago | on: Windows 9x Video Minidriver HD+

When I was at VMware in the 'aughts, VESA often saved us as an unaccelerated option for guests that didn't yet have a driver for our virtual display. Was there really no VESA driver for the 9x family? Or does QEMU's BIOS not do it or something?

kmavm | 4 years ago | on: PlanetScale – Database for Developers

I was Chief Architect at Slack from 2016 to 2020, and was privileged to work with the engineers who were doing the work of migrating to Vitess in that timeframe.

The assumption that tenants are perfectly isolated is actually the original sin of early Slack infrastructure that we adopted Vitess to migrate away from. From some earlier features in the Enterprise product (which joins lots of "little Slacks" into a corporate-wide entity) to more post-modern features like Slack Connect (https://slack.com/help/articles/1500001422062-Start-a-direct...) or Network Shared Channels (https://slack.com/blog/news/shared-channels-growth-innovatio...), the idea that each tenant is fully isolated was increasingly false.

Vitess is a meta-layer on top of MySQL shards that asks, per table, which key to shard on. It then uses that information to maintain some distributed indexes of its own, and to plan the occasional scatter/gather query appropriately. In practice, migrating code from our application-sharded, per-tenant old way into the differently-sharded Vitess storage system was not a simple matter of pointing at a new database; we had to change data access patterns to avoid large fan-out reads and writes. The team did a great write-up about it here: https://slack.engineering/scaling-datastores-at-slack-with-v...
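For concreteness, that per-table sharding-key declaration lives in Vitess's VSchema. A minimal sketch for a hypothetical `messages` table sharded by `team_id` (the table and column names are made up; the `vindexes`/`column_vindexes` structure is Vitess's own):

```json
{
  "sharded": true,
  "vindexes": {
    "hash": { "type": "hash" }
  },
  "tables": {
    "messages": {
      "column_vindexes": [
        { "column": "team_id", "name": "hash" }
      ]
    }
  }
}
```

Queries that carry a `team_id` route to one shard; queries that don't become the scatter/gather plans mentioned above, which is exactly why the fan-out access patterns had to change.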

kmavm | 5 years ago | on: Slack’s Outage on January 4th 2021

This is accurate: Slack is exclusively using Hack/HHVM for its application servers.

HHVM has an embedded web server (the folly project's Proxygen), and can directly terminate HTTP/HTTPS itself. Facebook uses it in this way. If you want to bring your own webserver, though, FastCGI is the most practical way to do so with HHVM.
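A hedged sketch of the bring-your-own-webserver setup, with nginx in front of HHVM's FastCGI listener (the port, socket address, and file extensions here are assumptions for illustration, not anything Slack-specific):

```nginx
# nginx front-end handing PHP/Hack requests to HHVM over FastCGI.
location ~ \.(php|hh)$ {
    fastcgi_pass   127.0.0.1:9000;
    fastcgi_index  index.php;
    fastcgi_param  SCRIPT_FILENAME $document_root$fastcgi_script_name;
    include        fastcgi_params;
}
```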

kmavm | 6 years ago | on: Timeline of Slack’s Tech Stack Evolution

Hi! We didn't do a great job writing about this at the time, but Slack's migration into HHVM took place in 2016. We've been gradually increasing the coverage of Hacklang (Facebook's gradually typed descendant of PHP) since then, and are now 100% Hacklang.

kmavm | 6 years ago | on: Timeline of Slack’s Tech Stack Evolution

We decided early on to colocate most aspects of the back-end, in part because we anticipated shared channels[1], but also because provisioning even virtual hardware for each team would be prohibitively expensive: we have over 600,000 organizations in Slack today[2], too many to make hard-partitioning most resources economical.

[1] https://www.zdnet.com/article/slack-brings-shared-channels-t... [2] https://sec.report/Document/0001628280-19-004786/

kmavm | 8 years ago | on: The Future of HHVM

I'm chief architect at Slack, and we migrated to Hack from PHP 5 throughout 2016.

The toolchain for HHVM is all installed as a single big deliverable, which gives you the language engine and its supporting runtime libraries, an in-address-space web server (Facebook's Proxygen), a debugger in the form of hhvm -a, and the Hacklang toolchain, accessed via hh_client and the appropriate editor/IDE integrations.

I share your intuition that there is actually a glittering core of "stuff-that-makes-you-successful" hiding in the incidental complexity of PHP, and we wrote this blog post trying to put some substance behind that intuition: https://slack.engineering/taking-php-seriously-cf7a60065329

kmavm | 9 years ago | on: Taking PHP Seriously

You're asking about Hack, and the only response I see so far is about HHVM. So!

I've seen Hack used in a multi-tens-of-MLOC codebase to gradually insert types. It made huge differences in the kinds of changes that were possible; you can, e.g., rename a class or method, or change the order of its arguments, with confidence comparable to that in a C++ codebase. Most developers hack-ified everything they could get their hands on, and did all new work in Hack, without any external encouragement.

kmavm | 10 years ago | on: NewLisp

As far as I can tell, this is just region-based memory management (http://www.sciencedirect.com/science/article/pii/S0890540196...). The usual way this fails is that everything ends up bubbling up to the top-most context, which is basically uncollectable. So you leak to death unless you limit yourself in awareness of this possibility.

I was hoping that there might actually be a linear type system hiding out in there, but alas, such is not the case.

kmavm | 10 years ago | on: Inferring Algorithmic Patterns with Stack

FAIRie here. GA is essentially guess-and-check. SGD doesn't require guessing or even checking: if your error function is differentiable, and you didn't screw up the chain rule, you know exactly which direction to head in to make it go down. For large models with lots of parameters, finding by random choice a setting for each parameter that makes the loss of the model as a whole go down has complexity n * f(n) for some monotonically increasing f, while SGD hands you your next model, with decent guarantees it will be better, in O(n) time.

The theoretical hand-wringing about SGD for neural nets is that their loss surface isn't convex. It turns out this doesn't matter. The loss surface is a high-dimensional egg-carton, and you need to get winning-the-lottery-while-struck-by-lightning unlucky to find a significantly shallow local minimum. There are lots of saddle points, and you need to do something to drive out of those, but stochasticity seems sufficient in practice.

kmavm | 10 years ago | on: Deep learning

So, convolution is by itself an attempt to exploit translation-invariance in the visual world, and typical deep convnets end up picking up a certain amount of scaling tolerance (though I would not call it invariance) by having features that are sensitive to larger and larger patches of the input as you go up the hierarchy of features. This is not real scale-invariance, and many people run a Laplacian pyramid of some sort at test time to get real scale-invariance when eking out the best possible numbers.

Rotation-invariance is probably not really a thing you want. The visual world is not, in fact, rotation-invariant, and the "up" direction on Earth-bound, naturally-occurring images has different statistics than the "down" direction, and you'd like to exploit these. Animal visual systems are not rotation-invariant either; an entertainingly powerful demo of this is "the Thatcher Effect" (https://en.wikipedia.org/wiki/Thatcher_effect).

Reflection across a vertical axis, on the other hand, often is exploitable, at least in image recognition contexts (as opposed to, say, handwriting recognition). If you look at the features image recognition convnets are learning, they are often symmetric around some axis or other, or sometimes come in "pairs" of left-hand/right-hand twins. As far as I know nobody has tried to exploit this architecturally in any way other than just data augmentation, but it's a big world out there and people have been trying this stuff for a long time.

kmavm | 11 years ago | on: How we made editing Wikipedia twice as fast

The win here wasn't about scale, though. A tiny fraction of Wikipedia users are logged in.

The win here was about individual page load time. And page load time is just as important, if not more so, for something new trying to vigorously grow as it is for the big sites.

(Disclaimer: HHVM alum.)
