soraki_soladead's comments

soraki_soladead | 5 days ago | on: NanoGPT Slowrun: Language Modeling with Limited Data, Infinite Compute

also, BabyLM is more of a conference track / workshop than an open-repo competition which creates a different vibe

soraki_soladead | 1 year ago | on: Towards Nyquist Learners

Alias-Free GANs? https://nvlabs.github.io/stylegan3/

soraki_soladead | 1 year ago | on: Diffusion for World Modeling

We have lossless memory for models today. That's the training data. You could consider this the offline version of a replay buffer which is also typically lossless.

The online, continuous and lossy version of this problem is more like how our memory works and still largely unsolved.

soraki_soladead | 1 year ago

You might enjoy this paper[0] which shows that recurrent position encodings recover grid cell representations and maps to path integration found in a popular model of the hippocampus. This isn't terribly surprising since RNNs have shown this before[1, 2] but its an interesting connection.

[0] https://arxiv.org/abs/2112.04035

[1] https://arxiv.org/abs/1803.07770

[2] https://www.nature.com/articles/s41586-018-0102-6

soraki_soladead | 1 year ago

Fwiw, that's SwiGLU in #3 above. Swi = Swish = silu. GLU is gated linear unit; the gate construction you describe.

soraki_soladead | 2 years ago | on: Building a deep learning rig

It's not always about cost. Sometimes the ergonomics of a local machine are nicer.

soraki_soladead | 2 years ago | on: Cyclist hit by driverless Waymo car in San Francisco, police say

Agree that's not a great look for the supervisor.

Cyclists have a bad rep in SF because many (not all) ride quite dangerously. It's a common sight to see cyclists running four-way stop signs and lights without even yielding. I live adjacent to a four-way stop and there's an incident where a cyclist fails to yield nearly hourly.

Meanwhile, Waymo has millions of incident-free miles and of all the self-driving car companies generally takes safety seriously, even if they will act to protect their interests here.

Until more evidence comes out I'll be taking Waymo's side here. I want safer vehicles and Waymo is currently the best bet.

soraki_soladead | 2 years ago | on: Cyclist hit by driverless Waymo car in San Francisco, police say

Cyclists do this all the time in SF. Afaik an "Idaho stop" is not legal here, despite it being common and often unsafe for obvious reasons.

soraki_soladead | 2 years ago

FLOPs by perplexity by samples is an interesting way to compare this family of models.

soraki_soladead | 2 years ago

https://github.com/cozodb/pycozo/blob/main/pycozo/test_build...

Here's the python version of what I think you're looking for. Shouldn't be too difficult to port to rust.

soraki_soladead | 3 years ago | on: Build full “product skills” and you'll probably be fine

Sure. A few below but far from exhaustive:

- https://arxiv.org/abs/1909.07528 - https://arxiv.org/abs/2212.10403 - https://arxiv.org/abs/2201.11903 - https://arxiv.org/abs/2210.13382

There are also literally hundreds of articles and tweet threads about it. Moreover, as I said, you can test many of my claims above directly using readily available LLMs.

GP has a much harder defense. They have to prove that despite all of these capabilities that LLMs are not intelligent. That the mechanisms by which humans possess intelligence is fundamentally distinct from a computer’s ability to exhibit the same behaviors so much that it invalidates any claim that LLMs exhibit intelligence.

Intelligence: “the ability to acquire and apply knowledge and skills”. It is difficult to argue that modern LLMs cannot do this. At best we can quibble about the meaning of individual words like “acquire”, “apply”, “knowledge”, and “skills”. That’s a significant goal post shift from even a year ago.

soraki_soladead | 3 years ago | on: Build full “product skills” and you'll probably be fine

> They are not intelligent.

Citation needed. Numerous actual citations have demonstrated hallmarks of intelligence for years. Tool use. Comprehension and generalization of grammars. World modeling with spatial reasoning through language. Many of these are readily testable in GPT. Many people have… and I dare say that LLMs reading comprehension, problem solving and reasoning skills do surpass that of many actual humans.

> They model intelligent behavior

It is not at all clear that modeling intelligent behavior is any different from intelligence. This is an open question. If you have an insight there I would love to read it.

> They don't know or care what language is: they learn whatever patterns are present in text, language or not.

This is identical to how children learn language prior to schooling. They listen and form connections based on the cooccurrence of words. They’re brains are working overtime to predict what sounds follow next. Before anyone says “not from text!” please don’t forget people who can’t see or hear. Before anyone says, “not only from language!” multimodal LLMs are here now too!

I’m not saying they’re perfect or even possess the same type of intelligence. Obviously the mechanisms are different. However far too many people in this debate are either unaware of their capabilities or hold on too strongly to human exceptionalism.

> There is this religious cult surrounding LLMs that bases all of its expectations of what an LLM can become on a personification of the LLM.

Anthropomorphizing LLMs is indeed an issue but is separate from a debate on their intelligence. I would argue there’s a very different religious cult very vocally proclaiming “that’s not really intelligence!” as these models sprint past goal posts.

soraki_soladead | 3 years ago | on: Understanding Large Language Models – A Transformative Reading List

Only some model architectures continue to get better as you pump in more data. Transformers and their variants have this property more so than prior architectures.

soraki_soladead | 3 years ago | on: TensorFlow Datasets

To each their own. I like that TF separates them since they are separate tasks and combining them is only one use case. At the end of the day we should just use what works best. The ML landscape is far from settled.

soraki_soladead | 3 years ago | on: TensorFlow Datasets

UX preferences vary. Imo, hf is too verbose and their pages try to cram in too much information with poor information hierarchy. For example:

https://huggingface.co/datasets/glue

https://www.tensorflow.org/datasets/catalog/glue

soraki_soladead | 3 years ago | on: TensorFlow Datasets

Quantity of datasets doesn’t seem like the right metric. The library just needs the datasets you care about and both libraries have the popular ones. What’s more important is integration and if you’re training custom TF models then tfds will generally integrate more smoothly than huggingface.

soraki_soladead | 3 years ago | on: S.F. police announce dozens of arrests in crackdown on retail theft

According to this the SOTA program requires the income to make rental payments beyond the one year: https://www.nyc.gov/site/hra/help/sota.page

“SOTA is only provided to households whom DSS has determined will likely have the future ability to pay the rent once they no longer have the SOTA grant to cover their rent.”

That sounds like a very high bar for people in the situation of needing rent coverage and especially if they have mental illness and/or drug addiction. Note that busing people to another city appears to be a separate program.

soraki_soladead | 3 years ago | on: S.F. police announce dozens of arrests in crackdown on retail theft

As mentioned, many of these are very common in other large cities (EDIT: fine, in the US). I'm going to call out this one because its frequently mentioned and seems harder to tackle than the other problems:

> Mentally ill people in high-traffic areas that openly use drugs and defecate on the sidewalk.

What is the solution to this? Round them up and put them in jail? Bus them to another city? Forcibly enroll them at a mental health facility? Improving housing costs somehow? Free housing for the homeless? Maybe walk-in drug clinics?

Some of these solutions sound inhumane. Others appear to be politically impossible at the scale needed. So what's the solution and why are the people who live there against it?

soraki_soladead | 3 years ago

Second hand may not have been the best phrasing on my part, I admit. What I mean is that the model only has textual knowledge in its dataset to infer what “basketball” means. It’s never seen/heard a game, even if through someone else’s eyes/ears. It has never held and felt a basketball. Even visual language models today only get a single photo right now. It's an open question how much that matters and if the model can convey that experience entirely through language.

There are entire bodies of literature addressing things the current generation of available LLMs are missing: online and continual learning, retrieval from short-term memory, the experience from watching all YouTube videos, etc.

I agree that human exceptionalism and vitalism are common in these discussions but we can still discuss model deficiencies from a research and application point of view without assuming a religious argument.

soraki_soladead | 3 years ago

I didn’t read it as being a religious take. They appear to be referring more to embodiment (edit: alternatively, online/continual learning) which these models do not posses. When we start persisting recurrent states beyond the current session we might be able to consider that limited embodiment. Even still the models will have no direct experience interacting with the subjects of their conservations. Its all second hand from the training data.