jaybaxter's comments

jaybaxter | 1 year ago | on: Ask HN: Who is hiring? (June 2024)

X | Machine Learning Engineer - Community Notes | SF, SJ, Seattle, LA, NYC | All Levels

The role: build scalable production machine learning systems and infrastructure, and improve our open source algorithm: https://github.com/twitter/communitynotes

Community Notes has been discussed here on HN a number of times e.g. here: https://news.ycombinator.com/item?id=37292041

Apply here: https://twitter.wd5.myworkdayjobs.com/X/job/San-Francisco-CA...

There are many other positions at the company too, see: https://careers.x.com/en

jaybaxter | 2 years ago | on: California bill would ban all plastic shopping bags at grocery stores

I wouldn't say they were being childishly subversive -- it is very unclear that paper bags are better than plastic bags for the environment.

"Manufacturing a paper bag takes about four times as much energy as it takes to produce a plastic bag"

"Studies have shown that, for a paper bag to neutralize its environmental impact compared to plastic, it would have to be used anywhere from three to 43 times."

https://education.nationalgeographic.org/resource/sustainabl...

jaybaxter | 2 years ago | on: What do I think about Community Notes?

The code is open source: https://github.com/twitter/communitynotes

The notes and rating data are released to the public every day: https://twitter.com/i/communitynotes/download-data

Feel free to run the matrix factorization code on the data, and then try to interpret the resulting latent dimension it finds for yourself! And also, you can read the code to verify that it really is running a matrix factorization rather than hardcoding a particular left/right split.
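To make "interpret the resulting latent dimension" concrete, here is a minimal sketch of that kind of matrix factorization on synthetic data. It is not the code from the repository. It assumes a model of the form rating ≈ global intercept + rater intercept + note intercept + rater factor × note factor with a single latent dimension, fit by plain SGD. The function name, hyperparameters, and toy data are all made up for illustration.

```python
import numpy as np

def fit_mf(ratings, n_users, n_notes, lr=0.05, reg=0.03, epochs=300, seed=0):
    """Fit r ~ mu + user_b[u] + note_b[n] + user_f[u] * note_f[n] by SGD.

    ratings is a list of (user, note, value) triples. Illustrative only.
    """
    rng = np.random.default_rng(seed)
    mu = 0.0
    user_b = np.zeros(n_users)
    note_b = np.zeros(n_notes)
    # Small random start so the factors can break symmetry.
    user_f = rng.normal(0.0, 0.1, n_users)
    note_f = rng.normal(0.0, 0.1, n_notes)
    for _ in range(epochs):
        for u, n, r in ratings:
            err = r - (mu + user_b[u] + note_b[n] + user_f[u] * note_f[n])
            mu += lr * err
            user_b[u] += lr * (err - reg * user_b[u])
            note_b[n] += lr * (err - reg * note_b[n])
            uf, nf = user_f[u], note_f[n]
            user_f[u] += lr * (err * nf - reg * uf)
            note_f[n] += lr * (err * uf - reg * nf)
    return mu, user_b, note_b, user_f, note_f

# Toy data (hypothetical): raters 0-9 are group A, raters 10-19 are group B.
# Notes 0-1 are rated helpful only by A, notes 2-3 only by B, notes 4-5 by both.
n_users, n_notes = 20, 6
liked_by = {0: "A", 1: "A", 2: "B", 3: "B", 4: "both", 5: "both"}
ratings = []
for u in range(n_users):
    group = "A" if u < 10 else "B"
    for n in range(n_notes):
        helpful = liked_by[n] in (group, "both")
        ratings.append((u, n, 1.0 if helpful else 0.0))

mu, user_b, note_b, user_f, note_f = fit_mf(ratings, n_users, n_notes)
print("rater factors:", np.round(user_f, 2))
print("note intercepts:", np.round(note_b, 2))
```

On this toy data the learned rater factor should split the two groups by sign, and the notes rated helpful by both groups should get the largest note intercepts. Nothing in the code names the groups: the split comes out of the ratings alone, and which group ends up positive is arbitrary.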

jaybaxter | 3 years ago | on: White House deletes tweet after Twitter adds 'context' note

Hi! Birdwatch ML Engineer here. These context notes are from Birdwatch. They are written and rated by users, and the notes are only added as context to Tweets if they are rated highly enough by multiple raters who have tended to disagree in the past. The core algorithm is open source as well as all of the data, and there is lots of public documentation about it too:

https://twitter.github.io/birdwatch/

https://twitter.com/birdwatch/status/1585794012052611076?s=2...

https://arxiv.org/pdf/2210.15723.pdf

jaybaxter | 3 years ago | on: Coffee drinking linked to lower mortality risk, new study finds

>but you’ve got to think about the idea that the folks who conducted this study know as well how hard science is, and for them to publish this anyway means they (and their reviewers) did a pretty good job dealing with this issue (and many others).

What makes you say that about this paper specifically? Certainly in general there are career incentives for authors to publish papers which are not very conclusive, and we know that relatively useless observational studies do get published.

jaybaxter | 11 years ago | on: Online Algorithms in High-Frequency Trading

Could you please link to the technical reviews you mentioned? I would be very interested to read them because as an outsider, Flash Boys seemed overly simplistic, but I haven't been able to figure out what the real story is yet.

jaybaxter | 12 years ago | on: Major Changes in SAT Announced by College Board

Test questions like those on the SAT are designed to have wrong answers that look right at first glance, so people who think they have a better-than-average chance when guessing often don't, because they were baited by one of the wrong answers without thinking it all the way through.

jaybaxter | 12 years ago | on: Major Changes in SAT Announced by College Board

When there is a penalty for wrong guesses, students who have a pretty good idea that they know the answer, but aren't certain, must waste time determining whether it is in their favor to answer the question.

Evaluating whether it is worth it to take a guess is a test-taking skill, and the SAT is trying to shift away from test-taking skills.
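The arithmetic behind that decision can be sketched as follows. This assumes the scoring the SAT used before the change: 1 point for a correct answer and a 1/4-point deduction for a wrong one on five-choice questions. The function names are made up for illustration.

```python
from fractions import Fraction

def expected_guess_score(p_correct, reward=Fraction(1), penalty=Fraction(1, 4)):
    """Expected points from answering, given the chance of being right."""
    return p_correct * reward - (1 - p_correct) * penalty

def break_even_probability(reward=Fraction(1), penalty=Fraction(1, 4)):
    """Chance of being right at which answering and skipping are worth the same."""
    return penalty / (reward + penalty)

print(break_even_probability())               # 1/5
print(expected_guess_score(Fraction(1, 5)))   # 0: blind guess among five choices
print(expected_guess_score(Fraction(1, 4)))   # 1/16: one choice eliminated
print(expected_guess_score(Fraction(3, 20)))  # -1/16: baited below chance
```

Under these assumptions, answering pays off only when the chance of being right is above 1/5, which is the judgment a student has to make on every uncertain question.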

jaybaxter | 12 years ago | on: BayesDB - a Bayesian database table

Once you have the VM up, you can try running demo scripts, located at ~/bayesdb/examples//.py after you've checked out and pulled master from ~/bayesdb and ~/crosscat.

Yes, the install process is a pain in the current release (sorry!), but the next release (almost ready) will be much more friendly and granular to install.

jaybaxter | 12 years ago | on: BayesDB - a Bayesian database table

Hi, sorry for the trouble, and thanks for the bug report! If you git checkout master and git pull both the crosscat and BayesDB repos (at ~/crosscat and ~/bayesdb on the VM), this issue will be fixed. We are working on pushing a new VM where the proper commits will be checked out already.

jaybaxter | 12 years ago | on: BayesDB - a Bayesian database table

The answer is both yes and no. In principle, longitudinal/panel data can be run through BayesDB, though BayesDB would have to infer the temporal structure. Also, we've toyed a little with forecasting by making a table where each row is a sliding window.
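A minimal sketch of the sliding-window idea, in plain Python rather than anything from BayesDB: each row holds `window` consecutive values, and a forecast becomes a row whose last cell is missing. The function and column names are hypothetical.

```python
def sliding_window_table(series, window):
    """Turn a 1-D series into rows of `window` consecutive values.

    Returns (columns, rows, query), where query is the most recent
    window - 1 values followed by None for the value to be filled in.
    """
    if window < 2 or window > len(series):
        raise ValueError("window must be between 2 and len(series)")
    columns = [f"t-{window - 1 - k}" if k < window - 1 else "t" for k in range(window)]
    rows = [list(series[i:i + window]) for i in range(len(series) - window + 1)]
    query = list(series[-(window - 1):]) + [None]
    return columns, rows, query

columns, rows, query = sliding_window_table([1, 2, 3, 4, 5], window=3)
print(columns)  # ['t-2', 't-1', 't']
print(rows)     # [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
print(query)    # [4, 5, None]
```

With the table in this shape, filling in the missing cell of the query row is the same kind of question as inferring any other missing value in a row.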

That said, BayesDB is really about the classic multivariate statistics setting: each row is a sample from some population. We think that a streaming Bayesian database that models sequences of timestamped UPDATEs to a DB (with FORECAST in addition to INFER) is an interesting, distinct project that we've done a little work on.

Contact us if you're interested in this kind of data and we'd be happy to talk more.

jaybaxter | 12 years ago | on: BayesDB - a Bayesian database table

You are correct that each variable is considered conditionally independent of the other variables given the cluster assignment. CrossCat learns additive models and correlations between continuous variables by using many clusters (the clusters don't necessarily have meaningful real-world interpretations).
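A small numerical sketch of that point, using a hand-built mixture rather than CrossCat itself: within each cluster the two variables are drawn independently, yet with several clusters laid out along a line the pooled data is strongly correlated. The cluster centers and noise scale are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Seven clusters whose centers lie along the diagonal x = y.
centers = np.linspace(-3, 3, 7)
z = rng.integers(0, len(centers), size=5000)  # cluster assignment per row

# Given the cluster, x and y are sampled independently of each other.
x = centers[z] + rng.normal(0, 0.3, size=z.size)
y = centers[z] + rng.normal(0, 0.3, size=z.size)

overall = np.corrcoef(x, y)[0, 1]
within = [np.corrcoef(x[z == k], y[z == k])[0, 1] for k in range(len(centers))]

print("overall correlation:", round(overall, 3))
print("within-cluster correlations:", np.round(within, 3))
```

The overall correlation should come out close to 1 while each within-cluster correlation stays near 0, and none of the seven clusters needs to correspond to anything meaningful on its own.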

jaybaxter | 12 years ago | on: BayesDB - a Bayesian database table

Ah, good question. Currently, it's implemented to only use CrossCat for predictions.

However, the great thing about the Bayesian Query Language (BQL, BayesDB's extension of SQL) is that it can be implemented by any joint density estimator. So, you could implement BayesDB with Bayes net structure learning, kernel density estimation, or almost anything else instead of CrossCat, if you wanted.
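As a rough illustration of why any joint density estimator would do, the sketch below answers an INFER-style question using nothing but a joint log-density. This is not BayesDB's API: the `JointDensityEstimator` protocol, the `EmpiricalJoint` class, and the `infer` function are hypothetical names, and the estimator is just a smoothed frequency table over discrete rows.

```python
import math
from collections import Counter
from typing import Protocol

class JointDensityEstimator(Protocol):
    """Hypothetical interface: anything that can score a complete row."""
    def logpdf(self, row: tuple) -> float: ...

class EmpiricalJoint:
    """Smoothed frequency table over fully observed discrete rows."""
    def __init__(self, rows, alpha=1.0):
        self.counts = Counter(tuple(r) for r in rows)
        self.total = len(rows)
        self.alpha = alpha
        # Number of possible cells: product of distinct values per column.
        self.cells = math.prod(len({r[j] for r in rows}) for j in range(len(rows[0])))

    def logpdf(self, row):
        num = self.counts[tuple(row)] + self.alpha
        return math.log(num / (self.total + self.alpha * self.cells))

def infer(model: JointDensityEstimator, row, column, candidates):
    """Fill row[column] with the candidate that maximizes the joint density.

    For a fixed set of observed cells this is the conditional mode.
    """
    def filled(value):
        return tuple(value if j == column else x for j, x in enumerate(row))
    return max(candidates, key=lambda v: model.logpdf(filled(v)))

rows = ([("rain", "umbrella")] * 8 + [("sun", "none")] * 9
        + [("rain", "none")] * 2 + [("sun", "umbrella")] * 1)
model = EmpiricalJoint(rows)
print(infer(model, ("rain", None), 1, ["umbrella", "none"]))  # umbrella
print(infer(model, ("sun", None), 1, ["umbrella", "none"]))   # none
```

Because `infer` only calls `logpdf`, swapping in a different estimator that exposes the same method would leave the query logic unchanged.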
