dspoka's comments

dspoka | 2 years ago | on: The False Promise of Imitating Proprietary LLMs

Sensational title that misrepresents the message of the paper.

> However, when conducting more targeted automatic evaluations, we found that the imitation models close little to none of the large gap between LLaMA and ChatGPT. In particular, we demonstrate that imitation models improve on evaluation tasks that are heavily supported in the imitation training data. On the other hand, the models do not improve (or even decline in accuracy) on evaluation datasets for which there is little support. For example, training on 100k ChatGPT outputs from broad-coverage user inputs provides no benefits to Natural Questions accuracy (e.g., Figure 1, center), but training exclusively on ChatGPT responses for Natural-Questions-like queries drastically improves task accuracy.

Even if this might not be the way to replicate ChatGPT's performance across all tasks, it seems to work quite well on whichever tasks are covered by the imitation training data. That is still a big win.

Later on, the paper shows this also works for factual correctness (leaving aside the argument over whether this is the right approach to factuality).


dspoka | 3 years ago | on: Carbon offsetting is just another form of greenwashing

My main qualm with the parent comment was this in particular: "Number one, is that forests work as long term carbon storage and sinks." Even with the surrounding context, it sounded like this strategy would just "work".

The mangrove example you give is a great one that does in fact work. Pragmatically and historically, however, most attempts have not, due to mismanagement and other unforeseen complications.

Seeing the further comments, I see the point the parent was making is more about the first principles of forests as carbon sinks, not about their implementation.

dspoka | 4 years ago | on: I do not agree with Github's use of copyrighted code as training for Copilot

I think the reaction from software devs to how Copilot uses their code for ML is interesting, in that all the ML companies have been doing this with every other form of produced content: texts, posts, messages, photo captions, etc. And most likely even less care went into adhering to laws or ethics there. Yes, code has licenses and thus more distinct legal ramifications, but on the other side are people who don't really understand that every time they interact with software or produce some content, everything is gathered and harnessed to power these companies.

dspoka | 5 years ago | on: Thousands are monitoring police scanners during the George Floyd protests

Great to see you working on this!

I was wondering if you could estimate the cost of always-on recording of all these radio conversations, the cost of running the speech2text ML, and the cost of labeling this data.
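A rough back-of-envelope version of that estimate could look like the sketch below. Every rate in it (feed count, storage, transcription, and labeling prices) is a hypothetical placeholder, not a real quote:

```python
# Back-of-envelope monthly cost for always-on scanner recording.
# All rates below are hypothetical placeholders, not real prices.

HOURS_PER_DAY = 24
DAYS = 30
N_FEEDS = 10                       # hypothetical number of scanner feeds

STORAGE_PER_HOUR_GB = 0.03         # low-bitrate audio, rounded up
STORAGE_COST_PER_GB_MONTH = 0.023  # hypothetical object-storage rate

STT_COST_PER_MINUTE = 0.02         # hypothetical cloud speech-to-text rate
LABEL_COST_PER_MINUTE = 0.10       # hypothetical human labeling rate

audio_hours = HOURS_PER_DAY * DAYS * N_FEEDS
storage_cost = audio_hours * STORAGE_PER_HOUR_GB * STORAGE_COST_PER_GB_MONTH
stt_cost = audio_hours * 60 * STT_COST_PER_MINUTE
label_cost = audio_hours * 60 * LABEL_COST_PER_MINUTE

print(f"{audio_hours} audio-hours/month")
print(f"storage     ~ ${storage_cost:,.2f}")
print(f"speech2text ~ ${stt_cost:,.2f}")
print(f"labeling    ~ ${label_cost:,.2f}")
```

Even with made-up rates, the shape of the result is informative: human labeling dominates, transcription is next, and storage is nearly free by comparison.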

I think having these rough estimates would make it easier for people to donate.

dspoka | 7 years ago | on: PyTorch 1.0 is out

Is there some sort of PyTorch 1.0 migration guide, or does anyone know if there are any breaking changes from 0.4.1 to 1.0?

dspoka | 8 years ago | on: Algorithmic decision making and the cost of fairness (2017)

So this idea seems intuitive at first, but it turns out to be one of the worst ways to address unfairness.

There are several reasons for this, from both technical and legal perspectives.

It is incredibly easy to find statistically significant correlations given just a few (more than 7) different views of the data, and in practice these ML models are working with hundreds or thousands of features, not a handful.
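A minimal sketch of that multiple-comparisons point: test enough pure-noise columns against an outcome and some will look "significant" at p < 0.05 by chance alone. Everything here is synthetic:

```python
import random

random.seed(0)

N_ROWS, N_COLS = 100, 200
outcome = [random.random() for _ in range(N_ROWS)]

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# |r| > ~0.197 corresponds to two-tailed p < 0.05 for n = 100 under the null.
THRESH = 0.197
spurious = sum(
    1
    for _ in range(N_COLS)
    if abs(pearson([random.random() for _ in range(N_ROWS)], outcome)) > THRESH
)
print(f"{spurious} of {N_COLS} pure-noise columns look 'significant'")
```

With 200 noise columns you expect roughly 5% of them (about 10) to clear the threshold despite there being no real relationship at all.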

If the model learned this supposed racial bias once, deleting the column is not going to stop it from learning it again through correlated features, and I believe some research has shown that removing the attribute can actually make the unfairness more severe.

From a legal standpoint, a company that may or may not be infringing on rights could just say, "We can't be, because we don't have these fields in our data," which makes it harder to monitor and audit wrongdoing.

Most of the methods I am familiar with try to ease the effects of the learned biases as a post-processing step on the model.
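One common post-processing flavor is per-group threshold adjustment: keep the trained model's scores as-is, but pick a separate decision threshold for each group so the positive rates line up. A minimal sketch on synthetic scores (the distributions and target rate are made up):

```python
import random

random.seed(2)

# Synthetic model scores: group B's distribution is shifted down,
# standing in for a learned bias we want to correct after training.
scores_a = [random.gauss(0.6, 0.15) for _ in range(1000)]
scores_b = [random.gauss(0.5, 0.15) for _ in range(1000)]

def positive_rate(scores, thresh):
    return sum(s >= thresh for s in scores) / len(scores)

TARGET = 0.5  # desired positive-prediction rate in both groups

def threshold_for_rate(scores, target):
    # Lowest observed score whose positive rate does not exceed the target.
    for t in sorted(scores):
        if positive_rate(scores, t) <= target:
            return t
    return max(scores)

t_a = threshold_for_rate(scores_a, TARGET)
t_b = threshold_for_rate(scores_b, TARGET)
print(f"group A: threshold {t_a:.3f}, rate {positive_rate(scores_a, t_a):.3f}")
print(f"group B: threshold {t_b:.3f}, rate {positive_rate(scores_b, t_b):.3f}")
```

The model itself is untouched; only the decision rule changes, which is exactly what makes post-processing attractive and also what makes it a band-aid.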

dspoka | 8 years ago | on: The Google Brain Team – Looking Back on 2017

The most promising area here, to me, seems like AutoML. The promise of the new machine learning was that we would get to move away from tedious feature engineering and everything would work and be simple. It may have become simpler, but training/debugging new DL models is still painful, which shifts the focus to extensive hyperparameter search. AutoML may become the next step in abstraction, where we design single models/algorithms that are able to build viable networks for many tasks/purposes.
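The "extensive hyperparameter search" part is often just a loop like the one below: a toy random-search sketch where a made-up scoring function stands in for an expensive training run:

```python
import random

random.seed(3)

def train_and_eval(lr, hidden, dropout):
    # Stand-in for an expensive training run; returns a fake validation score.
    return -abs(lr - 1e-3) * 100 - abs(hidden - 256) / 512 - abs(dropout - 0.2)

best_score, best_cfg = float("-inf"), None
for _ in range(50):
    cfg = {
        "lr": 10 ** random.uniform(-5, -1),           # log-uniform learning rate
        "hidden": random.choice([64, 128, 256, 512]),
        "dropout": random.uniform(0.0, 0.5),
    }
    score = train_and_eval(**cfg)
    if score > best_score:
        best_score, best_cfg = score, cfg

print("best config:", best_cfg)
```

AutoML systems essentially replace this blind loop with a learned or structured search over architectures and hyperparameters.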

dspoka | 8 years ago | on: Fairness in Machine Learning UC Berkeley Class

This seems like it's going to be a great class and topic, taught by Moritz Hardt, a researcher at Google who has worked on several aspects of fairness, such as the analysis of demographic parity.
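For reference, demographic parity is one of the simplest fairness criteria: it just compares positive-prediction rates across groups. A minimal check (the example data is made up):

```python
def demographic_parity_difference(preds, groups):
    """Absolute difference in positive-prediction rates between groups 0 and 1.

    preds: list of 0/1 predictions; groups: parallel list of 0/1 group ids.
    """
    def rate(g):
        members = [p for p, grp in zip(preds, groups) if grp == g]
        return sum(members) / max(1, len(members))
    return abs(rate(0) - rate(1))

# Toy example: group 0 approved 3/4 of the time, group 1 approved 1/4.
preds = [1, 1, 1, 0, 0, 0, 0, 1]
groups = [0, 0, 0, 0, 1, 1, 1, 1]
print(demographic_parity_difference(preds, groups))  # → 0.5
```

A value of 0 means both groups receive positive predictions at the same rate; Hardt's work includes analysis of when this criterion is and isn't the right one.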

I hope this becomes one of the most important and necessary topics for ML researchers as well as ML practitioners to consider when building models. There has been some serious discussion among researchers about adding some sort of license to help mitigate bias and add accountability for researchers knowingly building biased AI.
