dspoka's comments
dspoka | 1 year ago | on: Show HN: Wordllama – Things you can do with the token embeddings of an LLM
dspoka | 2 years ago | on: The False Promise of Imitating Proprietary LLMs
However, when conducting more targeted automatic evaluations, we found that the imitation models close little to none of the large gap between LLaMA and ChatGPT. In particular, we demonstrate that imitation models improve on evaluation tasks that are heavily supported in the imitation training data. On the other hand, the models do not improve (or even decline in accuracy) on evaluation datasets for which there is little support. For example, training on 100k ChatGPT outputs from broad-coverage user inputs provides no benefits to Natural Questions accuracy (e.g., Figure 1, center), but training exclusively on ChatGPT responses for Natural-Questions-like queries drastically improves task accuracy.
Even if this isn't the way to replicate ChatGPT's performance across all tasks, it seems to work quite well on whichever tasks are covered by the imitation training data. That is still a big win.
Later on they show this also works for factual correctness (leaving aside the argument over whether this is the right approach to factuality).
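To make the distinction concrete, here is a minimal sketch of what task-targeted data selection could look like; the heuristic filter is entirely made up for illustration, not the paper's actual method:

    # Hypothetical sketch: keep only ChatGPT responses to
    # Natural-Questions-like queries for the fine-tuning set.
    # The filter below is a crude invented heuristic.

    def looks_like_natural_question(prompt: str) -> bool:
        """Rough check for open-domain factoid questions."""
        words = prompt.strip().lower().split()
        return bool(words) and words[-1].endswith("?") and words[0] in {
            "who", "what", "when", "where", "which", "how", "why"}

    def build_targeted_set(pairs):
        """pairs: iterable of (user_prompt, chatgpt_response) tuples."""
        return [(q, a) for q, a in pairs if looks_like_natural_question(q)]

The filtered pairs would then serve as the supervised fine-tuning set, instead of the broad-coverage 100k outputs.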
dspoka | 2 years ago | on: Show HN: GPT-4-powered web searches for developers
dspoka | 3 years ago | on: Carbon offsetting is just another form of greenwashing
The example you give with mangroves is a great one that does in fact work. Pragmatically and historically, however, most attempts have not, due to mismanagement and other unforeseen complications.
Reading the further comments, I see the point the parent was making is more about the first principles of forests as carbon sinks, not about their implementation.
dspoka | 3 years ago | on: Carbon offsetting is just another form of greenwashing
[1] https://twitter.com/ForrestFleisch1/status/13062214459331297...
dspoka | 4 years ago | on: Show HN: Full text search on 630M US court cases
dspoka | 4 years ago | on: I do not agree with Github's use of copyrighted code as training for Copilot
dspoka | 5 years ago | on: Blackstone to Acquire Ancestry.com for $4.7B
dspoka | 5 years ago | on: The Eviction Tracking System: Monitor the number of U.S. eviction cases
dspoka | 5 years ago | on: Thousands are monitoring police scanners during the George Floyd protests
I was wondering if you could estimate what it would cost to have always-on recording of all these radio conversations, plus the cost of running the speech-to-text ML and of labeling the data.
I think having these rough estimates would make it easier for people to donate.
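For what it's worth, here is the back-of-envelope shape of the estimate I mean; every number below is a placeholder I made up, not a real quote:

    # Back-of-envelope only; all rates are assumptions, not quotes.
    channels = 100                      # assumed number of scanner feeds
    audio_hours = channels * 24 * 30    # one month of always-on audio
    stt_per_hour = 1.44                 # assumed $/audio-hour for speech-to-text
    storage_per_hour = 0.03 * 0.023     # ~30 MB/hour at ~$0.023/GB-month (assumed)
    label_per_hour = 10 * 0.05          # assume labeling 5% of audio at $10/hour

    total = audio_hours * (stt_per_hour + storage_per_hour + label_per_hour)
    print(f"~${total:,.0f}/month for {channels} always-on channels")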
dspoka | 7 years ago | on: How I Organize My GitHub Repositories
dspoka | 7 years ago | on: PyTorch 1.0 is out
dspoka | 8 years ago | on: CometML wants to do for machine learning what GitHub did for code
dspoka | 8 years ago | on: Algorithmic decision making and the cost of fairness (2017)
Here is a good resource on that: "European Union regulations on algorithmic decision-making and a 'right to explanation'" (https://arxiv.org/pdf/1606.08813.pdf)
dspoka | 8 years ago | on: Algorithmic decision making and the cost of fairness (2017)
There are several reasons for this, from both a technical and a legal perspective.
It is incredibly easy to find statistically significant correlations given just a few (more than 7) different features of the data, and in general these ML models are working with hundreds or thousands.
If the model learned a given bias, say a racial bias, once, then deleting that column is not going to stop it from learning it again: it can reconstruct the attribute from correlated proxies (e.g., a zip code standing in for race; see the sketch at the end of this comment). I believe some research has even shown that removing the column can make the unfairness more severe.
From a legal standpoint, a company that may or may not be infringing on rights could simply say, "We can't be, because we don't have those fields in our data," which makes it harder to monitor and audit wrongdoing.
Most of the methods that I am familiar with try to ease the effects of learned biases as a post-processing step on the model.
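Here is a minimal synthetic sketch of that proxy effect, using scikit-learn; the data and the zip-code/race correlation are invented for illustration:

    # Synthetic illustration of proxy leakage: the protected column is
    # dropped, yet it stays predictable from a correlated feature.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    n = 5000
    race = rng.integers(0, 2, n)               # protected attribute
    zip_code = race + rng.normal(0, 0.3, n)    # correlated proxy (assumed)
    income = rng.normal(0, 1, n)               # unrelated feature

    X = np.column_stack([zip_code, income])    # protected column not included
    X_tr, X_te, y_tr, y_te = train_test_split(X, race, random_state=0)

    clf = LogisticRegression().fit(X_tr, y_tr)
    print("protected attribute recovered with accuracy:", clf.score(X_te, y_te))

High accuracy here means a downstream model can still "see" race through the proxy, even though the column itself was deleted.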
dspoka | 8 years ago | on: PyTorch, a year in
For example, for Python 2.7 / pip / OS X:
pip install http://download.pytorch.org/whl/torch-0.3.0.post4-cp27-none-...
pip install torchvision
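And a quick sanity check after installing (the printed version should match the wheel, 0.3.0.post4):

    python -c "import torch; print(torch.__version__)"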
dspoka | 8 years ago | on: The Google Brain Team – Looking Back on 2017
dspoka | 8 years ago | on: Row over AI that 'identifies gay faces'
dspoka | 8 years ago | on: Fairness in Machine Learning UC Berkeley Class
I hope this becomes one of the most important and necessary topics for ML researchers, as well as ML practitioners, to consider when building models. There has been some serious discussion among researchers about adding some sort of license to help mitigate, and add accountability for, knowingly building biased AI.
dspoka | 8 years ago | on: ATLAS sees first direct evidence of light-by-light scattering at high energy