MAXPOOL's comments

MAXPOOL | 1 year ago | on: Darwin Machines

If you take a bird's-eye view, fundamental breakthroughs don't happen that often. The "Attention Is All You Need" paper came out in 2017, and it has now been 7 years without a breakthrough at the same level as transformers. Breakthrough ideas can take decades before they are ready, with many false starts and dead ends along the way.

Money and popularity are orthogonal to pathfinding that leads to breakthroughs.

MAXPOOL | 1 year ago | on: The Engineer’s Guide to Deep Learning: Understanding the Transformer Model

There are many others that are better.

1/ The Annotated Transformer (Attention Is All You Need): http://nlp.seas.harvard.edu/annotated-transformer/

2/ Transformers from Scratch: https://e2eml.school/transformers.html

3/ Andrej Karpathy has a really good series of intros:
Neural Networks: Zero to Hero: https://karpathy.ai/zero-to-hero.html
Let's build GPT: from scratch, in code, spelled out: https://www.youtube.com/watch?v=kCc8FmEb1nY
GPT with Andrej Karpathy: Part 1: https://medium.com/@kdwa2404/gpt-with-andrej-karpathy-part-1...

4/ 3Blue1Brown:
But what is a GPT? Visual intro to transformers | Chapter 5, Deep Learning: https://www.youtube.com/watch?v=wjZofJX0v4M
Attention in transformers, visually explained | Chapter 6, Deep Learning: https://www.youtube.com/watch?v=eMlx5fFNoYc
Full 3Blue1Brown Neural Networks playlist: https://www.youtube.com/playlist?list=PLZHQObOWTQDNU6R1_6700...

MAXPOOL | 2 years ago | on: Vernor Vinge has died

That's based on an old assumption about neuron function.

Firstly, Kurzweil underestimates the number of connections by an order of magnitude.

Secondly, dendritic computation changes things. Individual dendrites, and the dendritic tree as a whole, can each perform multiple independent computations: logical operations, low-pass filtering, coincidence detection, ... One neuronal activation is potentially thousands of operations per neuron.

A single human neuron can be the equivalent of an ANN with thousands of units.

MAXPOOL | 2 years ago | on: Why do tree-based models still outperform deep learning on tabular data? (2022)

> deep learning architectures have been crafted to create inductive biases matching invariances and spatial dependencies of the data. Finding corresponding invariances is hard in tabular data, made of heterogeneous features, small sample sizes, extreme values

Transformers without positional encodings are invariant to the input order (self-attention treats the tokens as a set). CNNs have translation invariance and can have a little rotational invariance.

It's harder to find similar invariances for tabular data. Maybe applying methods from GNNs would help?
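A quick numpy sketch (my own, using identity Q/K/V projections for simplicity) of the first claim: self-attention without positional encodings is permutation-equivariant, so permuting the input tokens just permutes the output rows:

```python
import numpy as np

def self_attention(x):
    """Single-head self-attention with identity projections, no positional encoding."""
    scores = x @ x.T / np.sqrt(x.shape[1])          # (n, n) similarity scores
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)   # row-wise softmax
    return weights @ x                              # (n, d) attended output

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 4))        # 5 tokens, 4 dims
perm = rng.permutation(5)

out = self_attention(x)
out_perm = self_attention(x[perm])
assert np.allclose(out_perm, out[perm])   # same outputs, just reordered
```

Adding a positional encoding to `x` before the attention call is exactly what breaks this symmetry.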

MAXPOOL | 2 years ago | on: Effect of exercise for depression: systematic review, meta-analysis

Effect of exercise for depression: systematic review and network meta-analysis of randomised controlled trials.

> Conclusions: Exercise is an effective treatment for depression, with walking or jogging, yoga, and strength training more effective than other exercises, particularly when intense. Yoga and strength training were well tolerated compared with other treatments. Exercise appeared equally effective for people with and without comorbidities and with different baseline levels of depression. To mitigate expectancy effects, future studies could aim to blind participants and staff. These forms of exercise could be considered alongside psychotherapy and antidepressants as core treatments for depression.

MAXPOOL | 2 years ago | on: Making Real-World Reinforcement Learning Practical [video]

Jan 3, 2024 lecture by Sergey Levine about progress on real-world deep RL. It covers these papers:

A Walk in the Park: Learning to Walk in 20 Minutes With Model-Free Reinforcement Learning

Grow Your Limits: Continuous Improvement with Real-World RL for Robotic Locomotion

Reset-Free Reinforcement Learning via Multi-Task Learning: Learning Dexterous Manipulation Behaviors without Human Intervention

REBOOT: Reuse Data for Bootstrapping Efficient Real-World Dexterous Manipulation

FastRLAP: A System for Learning High-Speed Driving via Deep RL and Autonomous Practicing: https://sites.google.com/view/fastrlap

Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions: https://qtransformer.github.io/

Deep RL at Scale: Sorting Waste in Office Buildings with a Fleet of Mobile Manipulators: https://rl-at-scale.github.io/

MAXPOOL | 2 years ago | on: Ask HN: What's the most compelling AI prompt result you've seen?

Two best:

CHESS IS A FUN SPORT, WHEN PLAYED WITH SHOT GUNS

COWS FLY LIKE CLOUDS BUT THEY ARE NEVER COMPLETELY SUCCESSFUL.

These are from MegaHAL, which entered the 1998 Loebner Prize Contest. MegaHAL was able to produce mind-blowing, insightful-seeming sayings, but most were just bs.

It seems that creativity is easy for computers: just push randomness through some generative algorithm. Curating and selecting the best output makes all the difference. The ability to select, critique, and understand what is generated, and what it means, is much harder.
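A toy sketch of that generate-then-curate loop (entirely made up, nothing like MegaHAL's actual Markov model): generation is cheap randomness, and the scoring step, here a crude stand-in for the hard part, does all the real work:

```python
import random

random.seed(42)

SUBJECTS = ["CHESS", "COWS", "CLOUDS", "COMPUTERS"]
VERBS = ["FLY", "DREAM", "EXPLODE", "SING"]
MODIFIERS = ["LIKE SHOTGUNS", "WITHOUT MERCY", "IN THE RAIN", "COMPLETELY"]

def generate():
    # "Push randomness through a generative algorithm": random template filling.
    return f"{random.choice(SUBJECTS)} {random.choice(VERBS)} {random.choice(MODIFIERS)}"

def score(sentence):
    # Stand-in critic: in practice this selection step (a human, or a learned
    # model) is the hard part; here it just counts "interesting" words.
    interesting = {"SHOTGUNS", "EXPLODE", "DREAM"}
    return sum(word in interesting for word in sentence.split())

candidates = [generate() for _ in range(100)]
best = max(candidates, key=score)   # curation makes all the difference
print(best)
```

Swap the word-counting critic for a human reader and you get roughly how the memorable MegaHAL quotes were found among the bs.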

MAXPOOL | 2 years ago | on: Show HN: Convert any screenshot into clean HTML code using GPT Vision (OSS tool)

Being a universal function approximator means that a multi-layer NN can approximate any bounded continuous function to an arbitrary degree of accuracy. But it says nothing about learnability, and the network required may be unrealistically large.

The learning algorithm actually used, backpropagation with stochastic gradient descent, is not a universal learner: it is not guaranteed to find the global minimum.
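A one-dimensional toy (my own construction, not a neural network) of that last point: gradient descent converges to whichever basin it starts in, so it can get stuck in a worse local minimum:

```python
# f has two basins: a global minimum near w = -1 and a shallower local
# minimum near w = +1. Plain gradient descent is a local optimizer: it
# settles into whichever basin the initialization lands in.
def f(w):
    return (w**2 - 1) ** 2 + 0.3 * w

def grad(w):
    return 4 * w * (w**2 - 1) + 0.3

def descend(w, lr=0.01, steps=2000):
    for _ in range(steps):
        w -= lr * grad(w)
    return w

w_right = descend(0.9)    # starts in the right basin
w_left = descend(-0.9)    # starts in the left basin

assert w_right > 0.5 and w_left < -0.5
assert f(w_right) > f(w_left)   # stuck in the worse (local) minimum
```

High-dimensional NN losses are friendlier than this picture suggests, but the theorem-level guarantee still isn't there.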

MAXPOOL | 2 years ago | on: Copy is all you need

What about LLM reasoning ability?

Faith and Fate: Limits of Transformers on Compositionality https://arxiv.org/abs/2305.18654

Transformers solve compositional reasoning tasks by reducing multi-step compositional reasoning to linearized subgraph matching, without genuine problem-solving skills. They can solve problems when they already have the matching reasoning graphs in memory.

MAXPOOL | 2 years ago | on: Modern language models refute Chomsky’s approach to language

I agree. The question has not been settled.

A 20-year-old human has:

* heard ~220 million words and spoken ~50 million words,

* read ~10 million words,

* experienced ~420 million seconds of wakeful interaction with the environment (which can be used to bound the number of conscious decisions, or the number of distinct 'epochs' we experience).

From a machine learning perspective, a human life is a surprisingly small set of inputs and actions, just a blip of existence.
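A back-of-the-envelope check of those numbers (the per-day rates are my rough assumptions, only good to an order of magnitude):

```python
# ~16 wakeful hours per day over 20 years:
years = 20
wakeful_seconds = years * 365 * 16 * 3600
print(f"{wakeful_seconds:.2e} wakeful seconds")   # 4.20e+08, matching ~420 million

# ~150 heard words/minute of speech for ~2.7 hours a day:
heard_words = years * 365 * 150 * 60 * 2.7
print(f"{heard_words:.2e} heard words")           # ~1.8e+08, same order as ~220 million
```

For comparison, current LLM training sets are on the order of 10^13 tokens, about five orders of magnitude more text than a human ever hears.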

MAXPOOL | 3 years ago | on: Chess Investigation Finds U.S. Grandmaster ‘Likely Cheated’ More Than 100 Times

The next 'move' for cheaters is to use chess computers in a way that passes 'Chess Turing Test' and makes cheating indistinguishable from normal human play under analysis.

When there is money in the game, there is incentive to cheat.

> The report says dozens of grandmasters have been caught cheating on the website, including four of the top-100 players in the world who confessed.

There are probably smart cheaters already playing who are able to evade detection.

MAXPOOL | 3 years ago | on: Paradigms of Artificial Intelligence Programming (1992)

One of the top 5 programming books.

Old AI is today's bleeding-edge computer engineering. There is an enormous number of free lunches for computer engineers and software startups in old-school artificial intelligence.

* Modern SAT-solver performance is impressive; they can solve huge problems.

* Writing a complex systems configurator with Prolog or Datalog can feel like magic.

* Expert systems: there has never been more use for them than today. Whenever you see expensive systems built on a complex mess of "business logic" and expensive consultants, you should know there is a better way.

(I use SAT solvers to partially initialize neural network parameters.)
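A toy sketch of the configurator idea (the options and constraints are made up; brute-force enumeration stands in for handing the clauses to a real SAT solver such as MiniSat or Z3):

```python
from itertools import product

# Each constraint is a boolean predicate over an option assignment.
OPTIONS = ["gpu", "battery_saver", "4k_display", "discrete_fan"]

CONSTRAINTS = [
    lambda c: c["gpu"] or not c["4k_display"],        # 4k display needs the GPU
    lambda c: not (c["gpu"] and c["battery_saver"]),  # GPU excludes battery saver
    lambda c: c["discrete_fan"] or not c["gpu"],      # GPU needs the fan
]

def valid_configs():
    # Enumerate all 2^n assignments and keep the ones satisfying every constraint.
    for values in product([False, True], repeat=len(OPTIONS)):
        config = dict(zip(OPTIONS, values))
        if all(check(config) for check in CONSTRAINTS):
            yield config

configs = list(valid_configs())
print(len(configs), "valid configurations")   # 6 of the 16 assignments survive
```

The "magic" is that the constraints are declarative: adding a new business rule is one more clause, not a rewrite of nested if-statements, and a real solver scales this to millions of variables.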

MAXPOOL | 3 years ago | on: AI model finds potential drug molecules a thousand times faster

What do you think after reading the abstract? https://arxiv.org/abs/2202.05146

>... Existing methods are computationally expensive as they rely on heavy candidate sampling coupled with scoring, ranking, and fine-tuning steps. We challenge this paradigm with EquiBind, an SE(3)-equivariant geometric deep learning model performing direct-shot prediction of both i) the receptor binding location (blind docking) and ii) the ligand's bound pose and orientation. ...

MAXPOOL | 3 years ago | on: Economic incentives help explain a longstanding puzzle (Flynn effect)

>Conclusions

>The stock of human capital plays an important role in economic growth and prosperity (e.g. Bishop 1989, Toivanen and Väänänen 2013, Aghion et al. 2017). Cognitive scientists studying population trends in measured intelligence have tended to emphasise the role of factors affecting the supply of skills. Our analysis suggests that the mix of skills in a society also evolves in response to the society’s demands and suggests the value of economic reasoning in the study of population trends in measured intelligence.
