
Machine translation of cortical activity to text with encoder–decoder framework

195 points | bookofjoe | 6 years ago | nature.com

91 comments

[+] bognition|6 years ago|reply
Really cool to see progress made here but this won't be available for public use any time soon (likely decades).

One of the biggest challenges with decoding brain signals is getting a large number of sensors that detect voltages from a very localized region of the brain. This study was done with ECoG (electrocorticography), which involves implanting small electrodes directly on the surface of the brain. Nearly all consumer devices use EEG (electroencephalography), which involves placing sensors on the surface of the skin.

Commercially available ECoG is highly unlikely, as it requires extremely invasive brain surgery. For ethical reasons, the implants in the study were likely placed to help diagnose existing life-threatening medical issues.

Decoding speech from EEG won't work as well as ECoG for a number of reasons. First, the physical distance between the sensors and the brain means the signals you pick up aren't localized. Second, the skin and skull are great low-pass filters and filter out really interesting signals at higher frequencies (100 Hz–2 kHz). Additionally, these signals have really low signal power because they're correlated with neuronal spiking.

ECoG does a really good job of picking up these signals because the sensor is literally on the surface of the brain. It's really hard to pick up these signals reliably with EEG.
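The low-pass effect described above can be illustrated with a toy first-order filter model; the 50 Hz corner frequency below is an arbitrary illustrative choice, not a measured property of the skull:

```python
import math

def lowpass_attenuation_db(f_hz, fc_hz):
    """Attenuation in dB of a first-order low-pass filter at frequency f_hz."""
    gain = 1.0 / math.sqrt(1.0 + (f_hz / fc_hz) ** 2)
    return -20.0 * math.log10(gain)

fc = 50.0  # hypothetical corner frequency for the skin/skull barrier
for f in (10.0, 100.0, 1000.0, 2000.0):
    print(f"{f:6.0f} Hz: {lowpass_attenuation_db(f, fc):5.1f} dB down")
```

Even in this crude model, the 100 Hz–2 kHz band of interest sits 7–32 dB below the pass band, on top of the already weak spiking-correlated signal power.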

[+] hughw|6 years ago|reply
Seismologist here. The problems you describe sound reminiscent of those in seismic imaging. Ideally you'd bury geophones below the weathered layer (typically 3 m) at the surface to get ideal coupling. In practice that's not economical at scale, so you plant them on the surface and account for the non-linear wave transmission through the weathered layer with clever math and by collecting more samples.

There's been a forty-year evolution in these techniques. The cheap, noisy technique might prevail if scientists keep refining the craft by tiny improvements.

[+] 2008guy|6 years ago|reply
No, actually high density electrode implants are right around the corner. Watch the neuralink press event.
[+] turingbike|6 years ago|reply
There was a great article on the front page yesterday, When to Assume Neural Networks Can Solve a Problem, https://news.ycombinator.com/item?id=22717367 . Case 2 is very relevant here, basically: if you have solved the problem with access to lots of data, usually you can adapt to a lower-data regime.

I am way out of my element talking about brain surgery and sensors. However, one thing that I do well is say "you shouldn't bet against neural networks", which is a great way to be right on a few-year time horizon.

[+] lioeters|6 years ago|reply
I've been fascinated by this topic for a couple of decades, since I participated in a research study involving EEG and word recognition. Progress seems to be slow but steady, with applications getting more accurate and practical.

Your point about invasive implants being impractical for commercial use made me wonder... I searched the first phrase that popped into my head, "Non-invasive Brain-Computer Interface". Looks like there's promising research on significantly improving the sensitivity/resolution of EEG signals.

- First Ever Non-invasive Brain-Computer Interface Developed - https://www.technologynetworks.com/informatics/news/first-ev... (2019)

- Noninvasive neuroimaging enhances continuous neural tracking for robotic device control - https://robotics.sciencemag.org/content/4/31/eaaw6844

Still, your prediction of "likely decades" sounds realistic. I'm hoping for an affordable, non-invasive brain-computer interface to be as widely used as the keyboard, mouse, or microphone.

[+] PetitPrince|6 years ago|reply
> It's really hard to pick up these signals reliably with EEG.

If I may add: EEG is also a pain to set up (you have to apply gel, position the cap correctly, etc.), and it's very easy to pollute your signal by merely moving a bit or even blinking.

[+] king07828|6 years ago|reply
Has there been any work to use a neural network to generate/simulate ECoG signal output from an EEG signal input? (My Google-fu only gives definitions and distinctions for ECoG and EEG). Almost sounds similar in concept to deep learning super sampling (DLSS), i.e., taking a low resolution image/signal (EEG) and using a neural network to generate/simulate a high resolution image/signal (ECoG).
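To make the DLSS analogy concrete, here's a minimal sketch of the idea on purely synthetic data, with a plain least-squares linear map standing in for the neural network (all shapes, names, and noise levels are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in data: "ecog" is a high-channel signal driven by a
# low-dimensional latent source; "eeg" is a blurred, noisy low-channel
# projection of the same underlying activity.
n_samples, n_latent, n_ecog, n_eeg = 500, 6, 64, 8
latent = rng.standard_normal((n_samples, n_latent))
ecog = latent @ rng.standard_normal((n_latent, n_ecog))
eeg = (ecog @ (rng.standard_normal((n_ecog, n_eeg)) / n_ecog)
       + 0.05 * rng.standard_normal((n_samples, n_eeg)))

# "Super-sampling" model: a linear map from EEG back to ECoG, fit by
# least squares; a DLSS-style system would use a deep network instead.
W, *_ = np.linalg.lstsq(eeg, ecog, rcond=None)
recon = eeg @ W

err = float(np.mean((recon - ecog) ** 2))
baseline = float(np.mean(ecog ** 2))  # error of an all-zeros prediction
print(f"reconstruction MSE {err:.3f} vs baseline {baseline:.3f}")
```

The catch, of course, is that this only works because the toy EEG actually contains the information; the parent comments argue that real scalp EEG has physically filtered out much of what ECoG sees, and no amount of upsampling can recover what was never measured.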
[+] antupis|6 years ago|reply
Stupid question, but could you do something very simple, like an on/off switch, with EEG, using an approach similar to what they have done here?
[+] HPsquared|6 years ago|reply
What's your opinion on Neuralink?
[+] duckface|6 years ago|reply
I doubt this very much.

Sparse signal reconstruction is a huge and very feasible thing to do using (IIRC) various forms of the FFT.

I think this has already been done, and consumer devices will probably use sparsity to reconstruct cortical signals with sufficient detail for this.
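A minimal sketch of FFT-based sparse reconstruction: if a signal is (assumed) sparse in the Fourier domain, keeping only its largest spectral coefficients denoises it. The tone frequencies, noise level, and sparsity level below are all made up:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1024
t = np.arange(n) / n

# A signal that is sparse in the Fourier domain: three pure tones.
clean = (np.sin(2 * np.pi * 12 * t)
         + 0.5 * np.sin(2 * np.pi * 40 * t)
         + 0.25 * np.sin(2 * np.pi * 97 * t))
noisy = clean + 0.8 * rng.standard_normal(n)

# Exploit sparsity: keep only the k largest real-FFT coefficients.
spectrum = np.fft.rfft(noisy)
k = 3  # three tones -> three nonzero real-FFT bins (assumed known here)
keep = np.argsort(np.abs(spectrum))[-k:]
sparse = np.zeros_like(spectrum)
sparse[keep] = spectrum[keep]
denoised = np.fft.irfft(sparse, n)

def mse(a, b):
    return float(np.mean((a - b) ** 2))

print(f"noisy MSE {mse(noisy, clean):.3f} -> denoised {mse(denoised, clean):.3f}")
```

Real compressed-sensing methods (iterative thresholding, basis pursuit, etc.) handle unknown sparsity and subsampled measurements, but the principle is the same; whether cortical signals as seen through the skull are actually sparse enough is the open question.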

[+] lars|6 years ago|reply
This is cool. For those who are not super familiar with language processing, though, I think it's good to point out the limitations of what's been done here. They mention that professional speech transcription has a word error rate (WER) of around 5%, and that their method gets a WER of 3%. Sure, but the big distinction is that speech transcription must operate over an unbounded set of sentences, including sentences that have never been said before. This method only has to distinguish between 30-50 sentences, and the same sentences must exist at least twice in the training set and once in the test set. Decoding word-by-word is really a roundabout way of doing a 50-way classification here.
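For reference, the WER figures being compared here are word-level edit distances, computable with the standard dynamic program:

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + insertions + deletions) / reference length,
    via a word-level Levenshtein edit-distance DP."""
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

One substitution over six reference words gives a WER of about 0.167; note the metric itself says nothing about how many distinct sentences the system had to choose between.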

It's an invasive technique, so they need electrodes on a human cortex. This means data collection is costly, so they're operating in a very low-data regime compared to most other seq2seq applications. It seems theoretically possible that this could reach Google-Translate-level accuracy if the sentence dataset were terabyte-sized rather than kilobyte-sized. A dataset that size seems very unlikely to be collected any time soon, so we'll need massive leaps in data efficiency in machine learning for something like this to reach that level. They explore transfer learning for this, which is nice to see. Subject-independent modelling is almost certainly a requirement for significant leaps in accuracy with methods like this.

[+] kasmura|6 years ago|reply
Is the following quote at odds with what you are saying about 50-way classification?

"On the other hand, the network is not merely classifying sentences, since performance is improved by augmenting the training set even with sentences not contained in the testing set (Fig. 3a,b). This result is critical: it implies that the network has learned to identify words, not just sentences, from ECoG data, and therefore that generalization to decoding of novel sentences is possible."

[+] hrgiger|6 years ago|reply
I tried something similar 5 years ago using the meditation device from choosemuse.com. It was the cheapest option and provided a hackable interface that gave you access to all the data. I then wrote a small mobile app that connects to the headset.

The application picked and showed a single random word from "hello world my name is hrgiger", then showed a green light. When I saw the green light, I thought about the word and blinked; the headset could detect blinks as well, so the app created training data using the window (blink time - xxx millis). I created a few thousand training examples across 6 classes this way, trained them with my half-assed NN implementation, and used the resulting weights to predict the same way on mobile. I never achieved higher than 40% accuracy, despite trying all the mixed waves, raw data, and different time-series windows. Still, it was a fun project to mess with, and I still try to tune this NN implementation. If they achieve a practical solution, I would use subtitles for full-length training; a simple Netflix browser plugin might do the trick, but I'm not sure there will be a single AI algorithm that understands everyone's different data.
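The windowing scheme described above (cutting a labeled slice of signal ending shortly before each detected blink) can be sketched like this; the window and offset sizes are invented placeholders:

```python
import numpy as np

def make_training_windows(raw, blink_samples, labels, window=256, offset=64):
    """Cut a fixed-length window of raw signal ending `offset` samples
    before each detected blink, paired with the word shown at that time."""
    X, y = [], []
    for blink, label in zip(blink_samples, labels):
        end = blink - offset
        start = end - window
        if start >= 0:  # skip blinks too close to the recording start
            X.append(raw[start:end])
            y.append(label)
    return np.array(X), np.array(y)

rng = np.random.default_rng(0)
raw = rng.standard_normal(5000)        # fake single-channel recording
blinks = [1000, 2000, 3000, 4000]      # fake blink sample indices
words = ["hello", "world", "my", "name"]
X, y = make_training_windows(raw, blinks, words)
print(X.shape, y.tolist())
```

Each row of `X` is then one training example for the classifier, labeled by the word that was on screen.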

[+] linschn|6 years ago|reply
40% over 6 classes is way above a random baseline. This is actually pretty cool. Congratulations!
[+] andai|6 years ago|reply
The Muse headband looks like it covers a pretty small area of the head, right? Other products in a similar price range cover more or less the whole scalp.
[+] leggomylibro|6 years ago|reply
It looks cool, but they trained their models on people reading printed sentences out loud.

Would that actually translate to decoding the process of turning abstract thoughts into words?

The researchers also note that their models are vulnerable to over-fitting because of the paucity of training data, and they only used a 250-word vocabulary. Neuralink also has a strong commercial incentive to inflate the results, so I'm not too sure about this.

It's great to see progress in these areas, but it seems that technologies like eye-tracking and P300 spellers are probably going to be more reliable and less invasive for quite some time.

[+] hyyggnj|6 years ago|reply
The speaking aloud is very suspicious. Why do subjects need to speak aloud? Are they actually decoding neural signals or just picking up artifacts introduced by the physical act of speaking (i.e. electrodes vibrating due to sound, etc)?
[+] weinzierl|6 years ago|reply
Fascinating work but far from what some might hope from reading only the title.

The translation is restricted to a vocabulary of 30 to 50 unique sentences.

[+] zo1|6 years ago|reply
They do mention that the network is partially learning the words themselves:

> "On the other hand, the network is not merely classifying sentences, since performance is improved by augmenting the training set even with sentences not contained in the testing set (Fig. 3a,b). This result is critical: it implies that the network has learned to identify words, not just sentences, from ECoG data, and therefore that generalization to decoding of novel sentences is possible."

[+] warnhardcode|6 years ago|reply
More than enough to control window focus on my computer and such. I'd be happy to have a system that responded to a few hundred thoughts: "Left desktop", "Right desktop", "Last focused window", "Lock screen", "What time is it?"
[+] zo1|6 years ago|reply
Can we remove the tracking query-string from this link, please? It works fine without it:

https://www.nature.com/articles/s41593-020-0608-8.epdf

Edit: Sorry, it seems to only show the first page if you remove the token.

[+] IHLayman|6 years ago|reply
Not only that: if you don't run the trackjs script, the PDF won't load at all. Sorry, but that's a hard pass from me. Don't track my reading.
[+] imglorp|6 years ago|reply
Good idea.

I wonder if the site could run a URL cleaner on all link submissions?

@dang?
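A submission-time URL cleaner could be sketched with the standard library; the deny-list below is hypothetical, and as noted above, some parameters (like the access token here) can be load-bearing:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical deny-list; a real cleaner would need a maintained one.
TRACKING_KEYS = {"utm_source", "utm_medium", "utm_campaign",
                 "fbclid", "gclid", "author_access_token"}

def clean_url(url):
    """Drop known tracking parameters from a URL's query string."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k.lower() not in TRACKING_KEYS]
    return urlunsplit(parts._replace(query=urlencode(kept)))

print(clean_url("https://www.nature.com/articles/s41593-020-0608-8.epdf"
                "?utm_source=feed&author_access_token=abc"))
```

The hard part isn't the code but the policy: distinguishing pure tracking parameters from ones the site actually needs to serve the content.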

[+] briga|6 years ago|reply
It seems like this field is at about the same stage of progress as image recognition was in the '90s, when researchers were trying to get a handle on MNIST-type tasks.

I wonder how much the language embeddings learned by the transformer are reflected in the actual physical structure of the brain? Could it be that the transformer is making the same sort of representations as those in the brain, or is it learning entirely new representations? My guess is that it's doing something quite different from what the brain is doing, although I wouldn't rule out some sort of convergence. Either way, this is a fascinating branch of research both for AI and the cognitive sciences.

[+] h3ctic|6 years ago|reply
Looks like a good approach and the error rate of 3% is really good, I guess. Did they mention how they got the input data? I couldn't find it.
[+] zo1|6 years ago|reply
They use 250 ECG electrodes as input. I think that means it's above the skin, so not invasive.
[+] carapace|6 years ago|reply
I'm pretty sure you can get that with a HD camera or two and some hypnosis plus off-the-shelf ML.

One of the very first things I learned when I was studying hypnosis was to induce a simple binary signal from the unconscious. (Technically it's trinary: {y,n,mu} https://en.wikipedia.org/wiki/Mu_(negative) )

(In my case my right arm would twitch for "yes", left for "no", no twitch for "mu" (I don't want to go on a long tangent about all the various shades of meaning there, suffice it to say it's a form of "does not compute."))

Anyway, it would be trivial to set up one or more binary signals, and detect them via switches or, these days, HD cameras and ML. You could train your computer to "read" your mind from very small muscular contractions/relaxations of your face. (The primary output channel, even before voice, of the brain, eh?)

Or you could just set up a nine-bit parallel port (1 byte + clock) and hypnotize yourself to emit ASCII or UTF-8 directly. That would be much, much simpler, because it's so much easier and faster to write mind software than computer software (once you know how). And you could plug yourself into any USB port and come up as a HID (mouse & keyboard).

I'll say it again: when you connect a brain to a computer the more sophisticated information processing unit is the point of greatest leverage. Trying to get the computer to do the work is like attaching horses to the front of your truck to tow it. Put the horses in the back and let the engine tow them.

[+] astrea|6 years ago|reply
Could you elaborate a little bit on the 'induce a simple binary signal from the unconscious'? That sounds fascinating.
[+] tighter_wires|6 years ago|reply
Man, what is going on with Participant C's active neurons.