lukemerrick's comments

lukemerrick | 1 year ago | on: Sharing new research, models, and datasets from Meta FAIR

Meta is a very large organization, and I'm willing to believe that a good chunk of the people at Meta FAIR (the lab releasing all of this stuff) truly do care about advancing AI safety and are doing great work along those lines. I'm not disagreeing with your point about the company as a whole being led by its financial incentives, but let's also allow ourselves permission to celebrate this work by this group of people.

lukemerrick | 1 year ago | on: Cosmopolitan v3.5

Looks like there is both an ARM and x86 version according to the docs. Probably need two different binaries, but you still get cross-OS for each architecture.

lukemerrick | 1 year ago | on: New attention mechanisms that outperform standard multi-head attention

Just skimmed so far, but I didn't see any reference to the Simplified Transformer block of https://arxiv.org/abs/2311.01906 (and it seems they also left out grouped-query attention, as pointed out by another comment).

While lazy me wants them to explain how their approach compares to those, their exposition looks pretty clear (quite nice for a preprint!), so I guess I'll just have to actually read the paper for real and see for myself.

Given how well I've seen Simplified Transformer blocks work in my own playground experiments, I would not at all be surprised if other related tweaks work out well even on larger scale models. I wish some of the other commenters here had a bit more curiosity and/or empathy for these two authors who did a fine job coming up with and initially testing out some worthwhile ideas.
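For anyone who hasn't run into it, grouped-query attention (the baseline the other comment mentions) is easy to sketch: groups of query heads share a single key/value head. Here's a rough numpy sketch (function names and shapes are my own, not from the paper):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gqa(q, k, v):
    """Grouped-query attention: q is (Hq, T, d); k and v are (Hkv, T, d),
    where Hq is a multiple of Hkv. Each group of Hq // Hkv query heads
    attends using one shared key/value head."""
    Hq, T, d = q.shape
    Hkv = k.shape[0]
    reps = Hq // Hkv
    k = np.repeat(k, reps, axis=0)  # broadcast each K/V head to its query group
    v = np.repeat(v, reps, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)  # (Hq, T, T)
    return softmax(scores) @ v                      # (Hq, T, d)
```

With as many K/V heads as query heads this reduces to standard multi-head attention; with a single K/V head it's multi-query attention.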

lukemerrick | 2 years ago | on: JupyterLab 4.0

As you guessed, the history tracking is one of the killer features. Imagine it being super easy to edit the history of a REPL session (delete, reorder, merge, and edit contents of each command) and rerun... That's a notebook! Notebooks also allow for markdown input and rich HTML output (which is killer for plotting) making it possible to polish your REPL history into a document you'd actually want to share with a colleague to explain something like a data analysis workflow.

I actually started in notebooks and then learned to love the REPL as a simplified "scratchpad notebook." I'd say in many ways notebooks are an improvement that cater heavily to REPL-lovers, but that for some quick tasks, the extra complexity isn't always worth it.

lukemerrick | 2 years ago | on: Building a Lox Interpreter in Julia

Wow, the whole concept of a peephole optimizer is a bit mind blowing to me. I'm appreciating all the reasons to power through to writing a bytecode VM as the next step.

I'm not sure how far down the compiler I actually will enjoy going vs. exploring ideas around type systems, linters, etc. up near the AST level, but if I do venture down this advice will certainly come in handy!
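For anyone else new to the idea, a peephole optimizer just slides a small window over the bytecode and rewrites known wasteful patterns in place. A toy Python sketch (the opcode names and patterns are made up for illustration):

```python
# Hypothetical stack-bytecode peephole pass: scan a small window and
# replace known wasteful instruction sequences with cheaper equivalents.
PATTERNS = [
    ((("PUSH", 0), ("ADD",)), ()),  # x + 0  -> x
    ((("PUSH", 1), ("MUL",)), ()),  # x * 1  -> x
    ((("NEG",), ("NEG",)), ()),     # --x    -> x
]

def peephole(code):
    out = list(code)
    changed = True
    while changed:  # repeat until no pattern fires, so rewrites can cascade
        changed = False
        for pat, repl in PATTERNS:
            n = len(pat)
            i = 0
            while i + n <= len(out):
                if tuple(out[i:i + n]) == pat:
                    out[i:i + n] = list(repl)
                    changed = True
                else:
                    i += 1
    return out
```

Note the outer loop: removing one pattern can expose another (e.g. deleting a `PUSH 0 / ADD` pair can bring two `NEG`s together).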

lukemerrick | 2 years ago | on: Building a Lox Interpreter in Julia

Thank you for the suggestions! I was actually just searching about MLIR today after reading some Julia language community discussions on the new Mojo language that uses MLIR.

lukemerrick | 2 years ago | on: Building a Lox Interpreter in Julia

I actually played with Unityper.jl and SumTypes.jl, but my conclusion was that if I was going to depart from dispatch on Julia types in my code, I might as well just stick to an untyped tree, since either way I'd have to have a single `evaluate` function for interpreting any kind of node.

Reconsidering now, it seems that there might be benefits beyond type dispatch to having a typed syntax tree, so maybe I'll give that a shot as a next step!
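In Python terms (my usual sketching language), the typed-tree-plus-dispatch trade-off looks roughly like this, with `functools.singledispatch` standing in for Julia's multiple dispatch -- the node types here are invented for illustration:

```python
from dataclasses import dataclass
from functools import singledispatch

# Typed tree: one node class per syntax form; dispatch picks the handler,
# instead of a single evaluate() switching on a tag field.
@dataclass
class Literal:
    value: float

@dataclass
class Binary:
    op: str
    left: object
    right: object

@singledispatch
def evaluate(node):
    raise TypeError(f"unknown node: {node!r}")

@evaluate.register
def _(node: Literal):
    return node.value

@evaluate.register
def _(node: Binary):
    l, r = evaluate(node.left), evaluate(node.right)
    return l + r if node.op == "+" else l * r
```

The appeal beyond dispatch itself is that adding a node type forces you to add (or consciously skip) a handler, rather than growing one giant if/else.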

lukemerrick | 2 years ago | on: Building a Lox Interpreter in Julia

In general, if you keep the source positions of every nontrivial token along with the raw source code, then yes, you can print those pinpoint error messages regardless of whether your trees are lossless. Also, if you want to include the filename in your messages (perhaps because, unlike Lox, your language supports imports), then you'll need more than just lossless trees to store the necessary information.
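To make that concrete, here's a minimal sketch of the kind of bookkeeping I mean -- store line/column on each token and render the caret from the raw source (the names are mine, not from any particular implementation):

```python
from dataclasses import dataclass

@dataclass
class Token:
    kind: str
    lexeme: str
    line: int  # 1-based line in the source
    col: int   # 1-based column of the first character

def point_error(source: str, tok: Token, message: str) -> str:
    """Render a pinpoint error using only token positions plus the raw source."""
    src_line = source.splitlines()[tok.line - 1]
    caret = " " * (tok.col - 1) + "^" * max(len(tok.lexeme), 1)
    return (f"error: {message}\n"
            f" --> line {tok.line}:{tok.col}\n"
            f"  | {src_line}\n"
            f"  | {caret}")
```

Add a `filename` field to the same structure and you get the multi-file case, too.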

I'm not sure exactly how Rust and rust-analyzer keep track of the information behind their excellent error messages and diagnostics, but I wouldn't be surprised if pinpoint messages were not the primary motivation for rust-analyzer's lossless parsing.

lukemerrick | 2 years ago | on: Building a Lox Interpreter in Julia

Disclaimer: I wrote this blog post. If this were an "Ask HN" post, though, the question would be "What next after reading Crafting Interpreters?" I have only done the tree-walk interpreter half of the book, but I'm already excited to move beyond Lox, and I'm curious to hear what others have done in this situation.

lukemerrick | 3 years ago | on: Show HN: Yaksha Programming Language

I'm late to the party, but I want to say thank you for sharing this. It's inspiring to look at how much you've built and (hopefully) enjoyed the process of building! I'm loving everything -- your site, your language design, your docs, your builtin libraries, your dev tools. Beyond impressive. People like you are the ones who make HN one of my favorite places on the internet.

For context on where I'm coming from, about two weeks ago I picked up Crafting Interpreters [1] for fun. I'm finding your clear-yet-concise Compiler internals docs [2] particularly compelling reading, and jumping back and forth between those "how this all works" docs and a live example of the language you actually built driving a WASM-compiled tree-blowing-in-the-wind animation is just... just wow. So freaking cool!

I also enjoyed reading the comment thread that inspired you to start on Yaksha and seeing how this project had a wholesome start as inspiration-by-programming-hero. I hope you recognize that a few years later you've ascended from inspiree to inspirer. I also hope you're still having tons of fun building out Yaksha!

[1] https://www.craftinginterpreters.com/

[2] https://yakshalang.github.io/documentation.html#compiler-int...

lukemerrick | 3 years ago | on: Optimizing utility-scale battery storage dispatch

Disclaimer: I'm the author.

I was torn whether to share my own post, but I figured the HN crowd might include a few others who will also really geek out about this topic and appreciate it. It's mathematical optimization and forecasting used to guide giant batteries hooked up to the electrical grid, after all.

There is some accompanying code I got to share publicly, too, if you want to run this yourself [1]. While I'm at it, I'll also mention some papers for anyone who wants a true deep dive [2, 3].

[1] https://gist.github.com/lukemerrick/4e1f9921a19ec97f7b949909...

[2] Linear Programming for battery optimization -- https://www.osti.gov/servlets/purl/1244909

[3] Mixed-Integer Linear Programming for battery optimization [PDF] -- https://www.sandia.gov/ess-ssl/wp-content/uploads/2018/08/20...
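To give a flavor of the LP formulation in references like [2], here's a toy sketch of price-arbitrage dispatch for a single battery using scipy.optimize.linprog. The prices and battery parameters are made up for illustration, and real formulations add efficiency losses, degradation costs, ancillary services, and more:

```python
import numpy as np
from scipy.optimize import linprog

prices = np.array([20.0, 15.0, 10.0, 30.0, 45.0, 40.0])  # illustrative $/MWh, hourly
T = len(prices)
p_max, cap, soc0 = 1.0, 4.0, 2.0  # MW power limit, MWh capacity, initial MWh stored

# Decision vector x = [charge_1..T, discharge_1..T]; minimize net cost
# = sum(price * charge) - sum(price * discharge).
c = np.concatenate([prices, -prices])

# State of charge at hour t: soc_t = soc0 + sum_{s<=t}(charge_s - discharge_s).
L = np.tril(np.ones((T, T)))          # cumulative-sum matrix
A_ub = np.block([[L, -L],             # soc_t <= cap
                 [-L, L]])            # soc_t >= 0
b_ub = np.concatenate([np.full(T, cap - soc0), np.full(T, soc0)])

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, p_max)] * (2 * T))
charge, discharge = res.x[:T], res.x[T:]
revenue = -res.fun  # buy low (hours at $10/$15), sell high (hours at $30-$45)
```

The LP buys during the cheap hours and sells during the expensive ones, subject to the power and capacity limits; the MILP version in [3] adds binary variables, e.g. to forbid simultaneous charge and discharge.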

lukemerrick | 3 years ago | on: Show HN: Investorsexchange.jl – parse trade-level stock market data in Julia

HN is special, and you all here in this comment tree are the best of the best -- I love that everyone is having fun with this in such a civil and historically knowledgeable way.

For anyone who wants the naming backstory, InvestorsExchange.jl was originally IEXTools.jl, but Julia's package registration automatic name checks didn't like it ("Name does not meet all of the following: starts with an uppercase letter, ASCII alphanumerics only, not all letters are uppercase. Name is not at least five characters long") [1]. So to Wikipedia I went to find the non-acronym name of the IEX exchange, which is "Investors Exchange" [2]. Thank you all for helping me understand why IEX goes by IEX in all of their branding.

[1] https://github.com/JuliaRegistries/General/pull/27989

[2] https://en.wikipedia.org/wiki/IEX

lukemerrick | 3 years ago | on: Show HN: Investorsexchange.jl – parse trade-level stock market data in Julia

Ah, this is a nuanced point I totally left off the README. Each raw file is ~5GB, but the raw files are a dump of network traffic from the firehose feed that tracks not just trades, but also updates to orders that do not result in trades. If you skip all of the algorithmic bots' constant updates to their bid/offer spreads and look at just the trades that clear, you can store years of raw trades in under 100GB.

lukemerrick | 6 years ago | on: Why are we using black box models in AI when we don’t need to? (2019)

This is an incredibly salient point. Others have also pointed out that there are numerous applications in which black box models appear to offer significantly greater accuracy than interpretable models, bolstering the notion that this article is a bit overstated.

However, in this article and elsewhere, Professor Rudin has cited compelling evidence of cases in which black box models have been demonstrated to be no more accurate than interpretable alternatives. I feel this fairly justifies the question in the title of the article. For example, based on the available evidence, it seems reasonable that some onus should lie on the creators and buyers of COMPAS (a proprietary black box recidivism model) to demonstrate that COMPAS actually is more accurate than an interpretable baseline. It may not be the case, as the article seems to suggest, that every modeling problem has an interpretable alternative with comparable accuracy, but in cases where one exists, there doesn't seem to be any justification for using a black box model.

On the matter of "human-style" interpretability, we are brought to the difference between "interpretability" and "explainability." Humans have a complex capacity for constructing explanations for the thoughts and actions of ourselves and others (among other things). As OP points out, a lot of famous psychological experiments by Kahneman and others have shown how much of our reasoning appears to be post-hoc, often biased, and often inaccurate (in other words, human explanations are not actually true transparent interpretations of our thoughts and actions). However, we humans do have a powerful capacity to evaluate and challenge the explanations presented by others, and we are able to reject bad explanations. For those interested, a great book on this topic is "The Enigma of Reason" by Mercier and Sperber (https://www.hup.harvard.edu/catalog.php?isbn=9780674237827), but the gist here is that we must understand that while explanations are not the same as transparent interpretability, they are still useful.

I would conjecture that at some level of complexity (which some predictive tasks, like pixel-to-label image recognition, seem to exhibit), true end-to-end interpretability is not possible -- the best we can do is construct an explanation. However, two important points should be kept in mind when considering this conjecture: 1. (Professor Rudin's point in the article) In cases that are not too complex for interpretable models to achieve accuracy comparable to black-box models, we can and should use them, as they offer super-human transparency at no cost in accuracy. 2. Constructing no explanations (or bad explanations) is not the same as reaching the semi-transparency that humans offer. If we want to use human interpretability as a benchmark, black box models with no explanations are not up to par.
