
Neural networks in the 1990s

131 points | jrott | 2 years ago | twitter.com

84 comments


bvan|2 years ago

Yikes, I’m old. There was a lot of NN work and a lot of books available on NNs back in the mid and late 90s. ‘Soft computing’ was the all-encompassing term for NNs, genetic algorithms, AI, expert systems, fuzzy logic, ALife and all sorts of nascent computational areas back then. I still have a bunch of issues of the monthly AI Expert magazine one could buy at a decent magazine stand. Small data-sets were definitely a limiting factor, as was limited computing power. I remember certain applied fields did embrace NNs early on, like some civil engineers and hydrologists, who were finding some use for them. At the U of Toronto, I considered doing a PhD with a biologist who was using them to investigate vision (and got help from Hinton). Physiology was one area where you could generate “long” time-series in a relatively short period of time. Those were still the days when Intel 286/386/486 and lowly Pentium machines were common currency. Computer scientists at the time didn’t yet have clear break-through commercial applications which would have attracted crazy funding. A lot of theory, little real action.

p_l|2 years ago

Let's not forget that the early 1990s especially were still in shock from the AI Winter, and there was essentially no funding.

m-i-l|2 years ago

>"Small data-sets were definitely a limiting factor as well as limited computer power."

Not just small data-sets and limited computer power, but also very few libraries to help you out - although you could download something like xerion from ftp.cs.toronto.edu and join their email list, it was generally a case of retyping examples or implementing algorithms from printed textbooks. And it was all in C, presumably for performance reasons, while most of the symbolic AI folks came from Lisp or Prolog backgrounds.

rm999|2 years ago

While my experience is not from the 90s, I think I can speak to some of why this is. For some context, I first got into neural networks in the early 2000s during my undergrad research, and my first job (mid 2000s) was at an early pioneer that developed their V1 neural network models in the 90s (there is a good chance models I evolved from those V1 models influenced decisions that impacted you, however small).

* First off, there was no major issue with computation. Adding more units or more layers isn't that much more expensive. Vanishing gradients and poor regularization were a challenge and meant that increasing network size rarely improved performance empirically. This was a well-known challenge up until the mid/late 2000s.

* There was a major 'AI winter' going on in the 90s after neural networks failed to live up to their hype in the 80s. Computer vision and NLP researchers - fields that have most famously recently been benefiting from huge neural networks - largely abandoned neural networks in the 90s. My undergrad PI at a computer vision lab told me in no uncertain terms he had no interest in neural networks, but was happy to support my interest in them. My grad school advisors had similar takes.

* A lot of the problems that did benefit from neural networks in the 90s/early 2000s just needed a non-linear model, but did not need huge neural networks to do well. You can very roughly consider the first layer of a 2-layer neural network to be a series of classifiers, each tackling a different aspect of the problem (e.g. the first neuron of a spam model may activate if you have never received an email from the sender, the second if the sender is tagged as spam a lot, etc). These kinds of problems didn't need deep, large networks, and 10-50 neuron 2-layer networks were often more than enough to fully capture the complexity of the problem. Nowadays many practitioners would throw a GBM at problems like that and can get away with O(100) shallow trees, which isn't very different from what the small neural networks were doing back then.
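That picture of first-layer neurons as small classifiers can be made concrete. Here is a minimal NumPy sketch (not from the thread; the weights are hand-set for illustration): two first-layer threshold neurons act as OR and NAND detectors, and the output neuron ANDs them together, yielding XOR, a classic nonlinear problem no single-layer network can solve.

```python
import numpy as np

def step(z):
    # hard-threshold activation: fires (1.0) when input exceeds 0
    return (z > 0).astype(float)

# First layer: two hand-set "classifiers"
# neuron 1 fires on (x1 OR x2); neuron 2 fires on NOT (x1 AND x2)
W1 = np.array([[1.0, 1.0], [-1.0, -1.0]])
b1 = np.array([-0.5, 1.5])

# Second layer: AND of the two detectors -> XOR
W2 = np.array([1.0, 1.0])
b2 = -1.5

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
h = step(X @ W1.T + b1)   # first-layer "classifier" outputs
y = step(h @ W2 + b2)     # combined decision
print(y)  # [0. 1. 1. 0.] -- XOR of the two inputs
```

Trained networks of the era learned weights playing the same roles; the point is only how few units a nonlinear but simple problem needs.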

Combined, what this means, roughly, is that the researchers who really could have used larger neural networks abandoned them, and almost everyone else was fine with the small networks that were readily available. The recent surge in AI is being fueled by smarter approaches and more computation, but arguably much more importantly by the ton more data the internet made available. That last point is the real story IMO.

low_tech_love|2 years ago

The funny thing is that the authors of the paper he linked actually answer his question in the first paragraph, when they say that the input dataset needs to be significantly larger than the number of weights to achieve good generalisation, but there is usually not enough data available.

MilStdJunkie|2 years ago

Data, data, data, data. The 1990s didn't have Wikipedia, YouTube, megapixel cameras everywhere, every single adult human hooked up to a sensor package 24 hours a day, and who knows what else. I know as a 1990s guy I would never have imagined the amount of data we would eventually all throw up into the ether even ten years later, to say nothing of today. Without that corpus...

reverius42|2 years ago

And none of those examples except Wikipedia were used to train the various LLMs. I wonder how much better multi-modal models are going to get if they start incorporating the 24/7 sensor data from billions of people.

moomoo11|2 years ago

Encyclopedia Britannica existed. I came to the USA in the late 90s and my school had the CD set.

signa11|2 years ago

GPUs, don't forget the GPUs! Compute was too slow for the task at hand.

robg|2 years ago

Highly recommend the exercises in Rumelhart and McClelland - Parallel Distributed Processing: Explorations in the Microstructure of Cognition from 1986-1987 (two volumes)

https://direct.mit.edu/books/book/4424/Parallel-Distributed-...

watersb|2 years ago

I was studying computer science and AI in 1987-1990; I didn't know it was the deepest, darkest pit of AI research despair.

I found the two Rumelhart & McClelland books, just a single copy on the shelf at Cody's Books, soon after publication. I worked through the examples, and was immediately convinced that this low-level approach was a way forward.

For some reason, none of the stressed out Comp Sci professors wanted to listen to a weirdo undergraduate, a lousy student.

I'm glad I was there at a reboot of AI, but my timing was lousy.

dunefox|2 years ago

Does it hold up for today?

radq|2 years ago

We were missing two architecture patterns that were needed to get deeper nets to converge: residual nets [1] which solved gradient propagation, and batch normalization [2] which solved initialization.

[1] Residual nets (2015): https://arxiv.org/abs/1512.03385

[2] Batch normalization (2015): https://arxiv.org/abs/1502.03167
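For readers who want the two ideas side by side, here is a rough NumPy sketch (illustrative only; real implementations add learned scale/shift parameters to batch norm and careful initialization). The key property is that the residual block adds its input back onto the transformed output, so the identity path carries gradients through deep stacks even when the transformation's gradient is small.

```python
import numpy as np

def batchnorm(x, eps=1e-5):
    # normalize each feature over the batch
    # (learned gamma/beta omitted for brevity)
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    return (x - mu) / np.sqrt(var + eps)

def residual_block(x, W1, W2):
    # y = x + F(x): the identity shortcut lets gradients
    # flow around F, not only through it
    h = np.maximum(0.0, batchnorm(x @ W1))  # ReLU nonlinearity
    return x + batchnorm(h @ W2)

rng = np.random.default_rng(0)
x = rng.standard_normal((32, 64))          # batch of 32, width 64
W1 = rng.standard_normal((64, 64)) * 0.1
W2 = rng.standard_normal((64, 64)) * 0.1
y = residual_block(x, W1, W2)
```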

sigmoid10|2 years ago

Also quasi-linear activation functions (to prevent vanishing gradients), tons of regularisation (e.g. convolutions), and more adaptive gradient descent (faster convergence). As late as the early 2010s I still met people who tried to make neural networks work using only a few dozen units. Academia is pretty slow. What people also forget is that libraries like PyTorch or TensorFlow simply didn't exist. I wrote my own neural network stacks, complete with backpropagation, from scratch in C++ back then.
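The activation-function point can be put in numbers: a sigmoid's derivative never exceeds 0.25, so chaining sigmoid layers multiplies gradients toward zero, while ReLU passes a gradient of exactly 1 through its active units. A small sketch (depths and inputs chosen for illustration):

```python
import numpy as np

def sigmoid_grad(z):
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1 - s)  # peaks at 0.25 when z = 0

def relu_grad(z):
    return np.where(np.asarray(z) > 0, 1.0, 0.0)  # exactly 1 where active

# Rough upper bound on the gradient surviving a 10-layer chain:
depth = 10
print(sigmoid_grad(0.0) ** depth)  # 0.25**10 ~ 9.5e-7, vanishing
print(relu_grad(1.0) ** depth)     # 1.0, gradient preserved
```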

hzay|2 years ago

Yes, but the tweet is talking about single layer networks!

arketyp|2 years ago

AlexNet predated that though.

Solvency|2 years ago

Do you think Carmack, deep down, wonders why he let himself miss the boat on the LLM revolution? He spent golden years toiling away at Facebook, only to finally announce he was quitting to focus on AGI... only for the world to be taken by storm by transformers, GPT, Midjourney, etc.

If anyone could have been at the forefront of this wave, it could've been him.

And now the landscape has utterly changed and no one is even convinced they need "AGI". Just a continually refined LLM hooked up to tools and other endpoints.

jjtheblunt|2 years ago

> If anyone could have been at the forefront of this wave, it could've been him.

Why does DOOM and clever programming on a NeXT imply what you assert?

lyu07282|2 years ago

I sometimes wonder what could've happened if he had stuck to the 3D graphics space. He once was a great innovator: Wolfenstein, Doom, then Quake. He did some innovation in Rage / id Tech 5 with infinite texture streaming, but it was full of technical issues. Ultimately, around Doom 3 / Rage, it felt like id Software wasn't anything special anymore; they were bought out and then he left id.

Now the last major innovation in the space came from Epic Games / Unreal Engine.

Swizec|2 years ago

The biggest problem with AGI is definitional. How will we know when we see it?

Once that little detail gets solved, who’s to say that “refined LLM hooked up to tools and other specialized LLMs” won’t be it? Sure could be.

But it also could not be! AGI has been right around the corner my whole life and even longer. 50 years at least. Every new AI discovery is on the verge of AGI until a few years later it hits a wall. Research is hard like that.

jojobas|2 years ago

With everything Carmack achieved, two things dumbfounded me: his sycophantic relationship with Jobs (who apparently almost succeeded in getting him to postpone his wedding so that he could appear at some Apple event), and that he would go near Facebook at all.

Talk about having "fuck you" money but just not willing to say "fuck you".

moomoo11|2 years ago

maybe he gets to go to Mars and set up a research facility there.

waivej|2 years ago

I got exposed to programming neural networks in the early 90s. It solved certain problems incredibly fast like the traveling salesman problem. I was tinkering with 3D graphics and fractals and map pathfinding. Though it didn’t occur to me how much more power was there.

“Data” was so much smaller then. I had a minuscule hard drive if any, no internet, 8 bit graphics but nothing photo realistic, glimpses of windows and os2, and barely a mouse. In retrospect, it was like embedded programming.

WiSaGaN|2 years ago

I believe the issue was not a lack of computational power, but rather that people at the time didn't think large models with many parameters would effect meaningful change. This was even true three years ago, albeit on a different scale. As Ilya Sutskever expressed, people were not convinced there was still room to increase the scale. For the status quo to shift, two things could happen: a substantial reduction in computing costs, making large-scale experiments less a matter of conviction and more a matter of course; or the emergence of individuals with the resources and conviction to undertake larger experiments.

Palomides|2 years ago

is that really true? a modern high end GPU has more computing power than the top 20 supercomputers of the year 2000 added together

a1369209993|2 years ago

> but rather that people at the time didn't think large models with many parameters would effect meaningful change. This was even true three years ago, albeit on a different scale.

I've also noticed this, and want to ask: who are these people? Do they not have (~80-billion-neuron) brains? (And that's neurons, with by most estimates thousands of synapses each; so you're actually talking on the order of tens to hundreds of trillions of neural network parameters before you reach parity with biological examples.)

kristopolous|2 years ago

Did you post something nearly identical to this before? I feel like I read it before.

version_five|2 years ago

I think it's more that modern automatic differentiation abstractions weren't well known to researchers. From what I remember, even in the early 2000s when I went to school, backpropagation was basically hand coded.

mlajtos|2 years ago

Yes, everything was hand coded (no autodiff) & Hinton loves Matlab.
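To show what autodiff frameworks now automate, here is a minimal scalar reverse-mode sketch in Python (purely illustrative; not any particular library's API). Each operation records how to push its output gradient back to its inputs, and `backward` replays those closures in reverse topological order — the part researchers once derived and coded by hand for every architecture.

```python
class Value:
    """Minimal scalar reverse-mode autodiff node."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._grad_fn = None  # closure that propagates self.grad to parents

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def grad_fn():
            self.grad += out.grad
            other.grad += out.grad
        out._grad_fn = grad_fn
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def grad_fn():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._grad_fn = grad_fn
        return out

    def backward(self):
        # topological sort, then apply the chain rule backwards
        order, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            if v._grad_fn:
                v._grad_fn()

# d(x*y + x)/dx = y + 1 = 4 ;  d(x*y + x)/dy = x = 2
x, y = Value(2.0), Value(3.0)
z = x * y + x
z.backward()
print(x.grad, y.grad)  # 4.0 2.0
```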

brrrrrm|2 years ago

I doubt it was obvious scaling up would magically work. I suspect the experiments were limited for analytic simplicity rather than computational.

pavon|2 years ago

The only ML that I ever did was a single undergrad NN class around ~2001. That was a long time ago, but I vaguely remember being taught at that time that adding more nodes rarely helped: you were just going to overfit to your dataset and get worse results on items outside the dataset, or worse, end up with a completely degenerate NN. Best practice was to use the minimum number of nodes that would do the job.

mhh__|2 years ago

The modern slow-but-scales way of coding them also wasn't prevalent

Solvency|2 years ago

Why couldn't mathematical proofs/models have predicted or revealed this to be the case back then?

yobbo|2 years ago

To experiment with SGD and back-propagation with 4096x4096 32-bit matrices, you would need a machine with hundreds of megabytes of ram in the 90s. In terms of software, you would need to be comfortable with C/C++ or maybe Fortran to be able to experiment quickly enough to land on effective hyper parameters.
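The memory claim checks out as back-of-envelope arithmetic (sizes hypothetical, matching the comment's example):

```python
# One 4096x4096 float32 weight matrix:
n = 4096
bytes_per_float32 = 4
weights_mb = n * n * bytes_per_float32 / 2**20
print(weights_mb)  # 64.0 MB for a single weight matrix
# SGD also needs a gradient of the same shape (another 64 MB)
# plus activations, so a few such layers already demand
# hundreds of megabytes of RAM -- workstation-class in the 90s.
```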

Probably too many low-probability events chained together.

But I think they discovered most of the interesting things that small networks can do? For example, TD-Gammon from 1992: https://en.wikipedia.org/wiki/TD-Gammon .

hax0ron3|2 years ago

The 1990s gamer in me gets a kick out of seeing John Carmack and Tim Sweeney talk to each other.

ttul|2 years ago

In 1999, our “computer vision” guy - a masters student - struggled mightily to recognize very simple things in a video stream from a UAV. Today, we would take this for granted. But back then, the computation was for all intents and purposes entirely non-existent. At best he was hoping to apply an edge detection kernel maybe once every two seconds and see if he could identify some lines and arcs and then hand code some logic to recognize things.
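The kind of edge-detection kernel described is a small convolution, e.g. a Sobel filter. A minimal NumPy sketch (toy image and naive convolution, for illustration only):

```python
import numpy as np

# Sobel kernel responding to vertical edges (horizontal gradients)
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

def convolve2d(img, k):
    # naive valid-mode 2D convolution (no padding, no flipping)
    kh, kw = k.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

# Toy image: dark left half, bright right half -> one vertical edge
img = np.zeros((8, 8))
img[:, 4:] = 1.0
edges = np.abs(convolve2d(img, sobel_x))
```

Even this nested-loop version is trivial for a modern machine; the striking part of the anecdote is that one such pass every couple of seconds was the budget then.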

Ono-Sendai|2 years ago

What? There were Pentium 2 and 3 machines back then that could certainly do more than an edge detection kernel every 2 seconds. Or do you mean on an embedded CPU?

rmnclmnt|2 years ago

Yeah, good times! The other day I was browsing for the 999th time Steve Smith's book "The Scientist and Engineer's Guide to Digital Signal Processing"[1] and stumbled upon the chapter on NNs[2]. I remember reading it when I was a student and struggling to make sense of it and why it worked, but reading it 15 years later I find it is explained so clearly compared to other resources! (Maybe experience is playing in my favor too.)

You get a BASIC code snippet for training and inference and, most of all, there is an explicit use-case for digital filter approximation! At the time, NNs were treated as a tool among other ones, not an "answer-to-everything" type of thing.

I know Deep Learning opened new possibilities, but a lot of the time CNNs/RNNs/Transformers are definitely not needed: working on the data instead and using "linear" models can go really far (my 2 cents).

[1]: https://www.dspguide.com [2]: http://www.dspguide.com/ch26.htm

29athrowaway|2 years ago

In the early 90s, not only was there less computing power, but there was also not much internet connectivity, bandwidth was low, and there were no digital cameras, so not that many images online, and the images you had were low-res and low color depth. Internet giants didn't yet exist and didn't yet collect massive amounts of data.

Ono-Sendai|2 years ago

I personally made a quake 2 bot using neural networks in 1999, I think it had several hundred neurons and several thousand 'synapses' (parameters). At the time that felt like a lot of parameters. Computation wasn't much of a limit though, I could run several NNs faster than realtime.

2sk21|2 years ago

I have one of the early PhDs in neural networks (graduated in 1992). However, my work was analytical: I was able to prove a couple of theorems about backpropagation. I just needed a simple implementation to show that my ideas worked, so I wrote my code from scratch in C.

bilsbie|2 years ago

I remember people telling me you would just get overfitting if you made the network too big.

I wonder how LLMs avoid that?

amichal|2 years ago

I followed a Scientific American article in 1992 as a high schooler and got digit recognition and basic arithmetic working on a 386. What the popsci press said at the time was that we were limited by memory bandwidth (cache size), training data, and to some extent pointer-chasing (and other inefficiencies) in graph algos.

gattilorenz|2 years ago

On the topic of AI history, I would like to set up a demo of old AI and/or general CS research on late 90s/early 00s Sun Ultra machines.

Does anyone have suggestions (and links to code!) for what would be a cool demo? I’m thinking of a haar classifier to show some object recognition/face detection, but would appreciate more options!

mistrial9|2 years ago

Definitely saw NN code in the 1990s; I recall a hardback book with a mostly red cover, not sure of the title. Prominent and rigorous code implementations were associated with MIT at that time (the Random Forest guy was at Berkeley in the stats department).

Edit: yes, almost certainly Neural Networks for Pattern Recognition (1995), thx!

huitzitziltzin|2 years ago

The book “Neural Networks for Pattern Recognition” by Bishop dates to 1995 and has a red cover, at least in its current softcover iteration.

The random forest guy you mean is/was Leo Breiman. His student Adele Cutler deserves some of the credit there too.

mjan22640|2 years ago

In 2012, results from research on visual processing in the brain were published that (among other things, like the retina compressing its input) indicated the visual cortex performs convolution-like operations. That got mimicked and was a breakthrough in image-recognition NNs, which sparked life into the whole field.

r13a|2 years ago

Would you mind giving a reference to the paper? A quick googling didn't bring anything up.

LarsDu88|2 years ago

Lol, Carmack's like: I could've gotten a 4096-unit NN running on my early-90s NeXTcube dev rig, you neural network researcher peasants!

rwmj|2 years ago

I knew someone in the early 90s who was making a neural network on a chip for his PhD. The chip fitted 1 neuron. Yes he might have used float16 to cram more in but those techniques were not known at the time.

There really wasn't the compute power around at the time, and as others have pointed out there wasn't the training data, or the cameras.

throwawayadvsec|2 years ago

[deleted]

api|2 years ago

Carmack got the brain worms? Say it isn’t so. Citation needed.

FrustratedMonky|2 years ago

Reading through the twitter thread, and these comments. It reminds me of all of the back and forth when HN discusses Psychology.

One side, holding a pipe, 'well actually, back in 1954, I put together an analog variant of a neuron perceptron built out of old speaker cables and car parts, strung it across the living room and it could say 10 words and fetch my slippers'. 'Really', 'Yes, Indubitably'.

The other side, It's all, 'REEEEEEEEEE'

peterfirefly|2 years ago

https://en.wikipedia.org/wiki/Elmer_and_Elsie_(robots)

"Elmer and Elsie, or the "tortoises" as they were known, were constructed between 1948 and 1949 using war surplus materials and old alarm clocks."

"The robots were designed to show the interaction between both light-sensitive and touch-sensitive control mechanisms which were basically two nerve cells with visual and tactile inputs."

FrustratedMonky|2 years ago

That was a very bad comment on my part; it failed to make its point and wasn't very humorous.

I meant to draw a relationship between Psychology and Machine Learning.

Psychology, the study of the mind, with questionable scientific methods and a replication problem.

And

Machine Learning (which takes the mind as a model), with questionable scientific methods, a replication problem, and the addition of corporate hype machines.

Often in the last few months we stand in awe of what AI achieves, but it produces questionable results and has a lot of problems. Machine learning is worshiped.

And yet, often in the last few months, posts on Psychology are railed on and the field called one full of con-men and BS artists.

Why the duality? Both are young fields, stretching, rapidly making progress, hitting dead ends, and changing course. The scientific method isn't a straight path. But Psychology doesn't seem to be given much leeway to make errors and course-correct.

I just find it hitting a peak right now because the study of the human mind (wet net) and the machine mind (electric net) seem to be hitting a lot of the same issues. There are so many parallels in how they are spoken of, so many common problems, and in how they are framed within each field.

Wonder how long until we just openly talk about a field of Psychology of Machines, where we use the same tools to try and understand what the Neural Nets are thinking.