lstmemery's comments

lstmemery | 3 years ago | on: PyScript: Run Python in your HTML

I think it's primarily meant as a learning tool. Here's the TL;DR from their GitHub.[1]

PyScript is a Pythonic alternative to Scratch, JSFiddle or other "easy to use" programming frameworks, making the web a friendly, hackable, place where anyone can author interesting and interactive applications.

https://github.com/pyscript/pyscript

lstmemery | 4 years ago | on: LastPass appears to be holding users' passwords hostage

I'd like to recommend Aegis Authenticator, which is FOSS. It also encrypts tokens at rest, has password protection and the ability to export tokens.

LastPass Authenticator does none of that, so I spent an hour yesterday manually resetting all my 2FA.

lstmemery | 4 years ago | on: R, OpenMP, MKL, Disaster

I had a similar problem in a prediction pipeline a few years back. If I remember correctly, someone updated an R package (used to read an obscure file format) to the next minor version. The update pulled in a new C++ library. When compiled from source, that C++ library somehow interacted with a second R package (which used a specialized type of linear model), and all the results coming out of our pipeline were subtly wrong, but only with large files.

It turned out that the second R package determined the required precision of floats in sparse arrays based on which compiled linear algebra libraries were available. It took us a week to debug, and ultimately it was easier to just rewrite the whole thing in Python.

Renv has made things easier, but I don't think packrat/renv lets you lock C/C++ libraries as well as R packages.

lstmemery | 4 years ago | on: OpenAI Codex

I don't think the Bitter Lesson applies to ASTs.

From the Bitter Lesson:

"Early methods conceived of vision as searching for edges, or generalized cylinders, or in terms of SIFT features. But today all this is discarded. Modern deep-learning neural networks use only the notions of convolution and certain kinds of invariances, and perform much better."

Those models are taking advantage of inductive biases. Every model has them, including the massive language models. Inductive biases are not the same thing as engineered features (such as SIFT) or heuristics.

Using the AST is just another way of looking at the code already in your dataset. For the model to understand what it is writing, it needs to map text sequences to ASTs anyway. It can attempt to learn this mapping, but the 12B model still produces syntactically illegal Python, so it clearly hasn't.
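To make that concrete, Python's standard ast module shows the mapping I mean: the same source text either parses into a tree of structured nodes, or it fails to map to any AST at all (the snippets here are my own toy examples):

```python
import ast

# A valid snippet maps to a tree of typed nodes (Module, Assign, Call, ...).
source = "total = sum(x * 2 for x in range(10))"
tree = ast.parse(source)
print(ast.dump(tree, indent=2))

# Syntactically illegal code has no AST at all; the parser rejects it.
try:
    ast.parse("def f(:\n    pass")
except SyntaxError as e:
    print("illegal code:", e.msg)
```

A model that had really internalized the grammar would never emit strings that land in the second branch.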

lstmemery | 4 years ago | on: OpenAI Codex

You need to scale the amount of data to take advantage of the increase in parameters. I'm not sure where we would find another 100 GitHubs worth of data.

lstmemery | 4 years ago | on: OpenAI Codex

I have to disagree with you here. In the Codex paper[1], they report two datasets on which Codex answers correctly only about 3% of the time. These are interview and coding-competition questions. From the paper:

"Indeed, a strong student who completes an introductory computer science course is expected to be able to solve a larger fraction of problems than Codex-12B."

This suggests to me that Codex really doesn't understand anything about the language beyond syntax. I have no doubt that future systems will improve on this benchmark, but they will likely take advantage of the AST and could use unit tests in an RL-like reward function.
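A toy sketch of what I mean by a unit-test reward, all names and tests invented for illustration: score a generated solution by the fraction of tests it passes, with code that doesn't even parse or run scoring zero.

```python
def score_candidate(candidate_src: str, tests: list[tuple[tuple, object]]) -> float:
    """Fraction of tests passed by the generated code; 0.0 if it fails to run.

    The candidate is expected to define a function named `solve`
    (a convention invented for this sketch).
    """
    namespace = {}
    try:
        exec(candidate_src, namespace)  # syntactically illegal code scores 0
        solve = namespace["solve"]
    except Exception:
        return 0.0
    passed = 0
    for args, expected in tests:
        try:
            if solve(*args) == expected:
                passed += 1
        except Exception:
            pass  # runtime errors count as failures
    return passed / len(tests)

tests = [((2, 3), 5), ((0, 0), 0), ((-1, 1), 0)]
print(score_candidate("def solve(a, b):\n    return a + b", tests))  # 1.0
print(score_candidate("def solve(a, b:\n    return a + b", tests))   # 0.0
```

A score like this is dense enough to serve as the reward signal in an RL-style fine-tuning loop, and it directly penalizes the illegal-syntax failure mode.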

[1] https://arxiv.org/abs/2107.03374

lstmemery | 4 years ago | on: MIT and Harvard agree to transfer edX to ed-tech firm 2U

This really feels like the end of an era. I got my start going through the MITx computer science and probability courses. I wouldn't be a data scientist today if I didn't have those resources available.

I now understand how to self-learn difficult subjects with textbooks and online lectures but I really appreciated MITx's commitment to making rigorous courses freely available.

lstmemery | 5 years ago | on: The Unstoppable Momentum of Outdated Science

I'm not sure I understand this graph. The cone seems to represent "no climate policy". Why should we be surprised that we are currently trending out of the "no climate policy" cone, given that we are implementing climate policy?

Taking a second look, that plot shows only CO2 from energy, and the y-axis has a discontinuity. If we need to get to net-zero emissions and the rate of CO2 production is still increasing, that suggests we are still a long way off from the goal.

I just checked and the IPCC 5th assessment was made in 2014. The 6th assessment is scheduled for 2022. Does anyone know if the baselines get updated with each IPCC assessment?

lstmemery | 5 years ago | on: Won’t Subscribe

Many libraries have access to newspaper archives and it's hard to compete with free!