Time is only a symptom of what's missing: causation.
ML operates with associative models of billions of parameters: trying to learn thermodynamics by parameterizing for every molecule in a billion images of them.
Animals operate with causal models of a very small number of parameters: these models richly describe how an intervention on one variable causes another to change. These models cannot be inferred from association (hence the last 500 years of science).
They require direct causal intervention in the environment to see how it changes (i.e., real learning), and a rich background of historical learning to interpret new observations. You need to have lived a human life to guess what a pedestrian is going to do.
If you overcome the relevant computational infinities to learn "strategy", you will still only do so in the narrow horizon of a highly regulated game where causation has been eliminated by construction (i.e., the space of all possible moves over the total horizon of the game can be known in an instant).
The state of all possible (past, current, future) configurations of a physical system cannot be computed -- it's an infinity computational statistics will never bridge.
The solution to self-driving cars will be to try to gamify the roads: robotize people so that machines can understand them. This is already happening on the internet: our behaviour is being made more machine-like so it can be predicted. I'm sceptical that real-world behaviour can be so constrained.
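The association-vs-intervention gap above can be made concrete with a toy simulation (entirely hypothetical variables; a hidden Z is a common cause of X and Y, and X has no causal effect on Y). The observational correlation is near 1, yet intervening on X does nothing to Y:

```python
import random

random.seed(0)

# Hypothetical model: hidden Z causes both X and Y; X does NOT cause Y.
def observe(n=100_000):
    xs, ys = [], []
    for _ in range(n):
        z = random.gauss(0, 1)
        xs.append(z + random.gauss(0, 0.1))   # X <- Z
        ys.append(z + random.gauss(0, 0.1))   # Y <- Z (not X)
    return xs, ys

def intervene(x_value, n=100_000):
    # do(X = x_value): the Z -> X edge is cut and X set by fiat;
    # Y's mechanism is untouched, so x_value never enters it.
    return [random.gauss(0, 1) + random.gauss(0, 0.1) for _ in range(n)]

def corr(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / n
    vx = sum((a - mx) ** 2 for a in xs) / n
    vy = sum((b - my) ** 2 for b in ys) / n
    return cov / (vx * vy) ** 0.5

xs, ys = observe()
assoc = corr(xs, ys)                       # near 1: strong association
mean_y_do = sum(intervene(5.0)) / 100_000  # near 0: no causal effect on Y
```

No amount of fitting on the observational samples alone can distinguish this from a world where X drives Y; only the intervention does.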
Exactly, that's why I think we need to put logic and probability theory back into cutting edge ML. [1,2] are only early approaches that show potential directions to achieve this.
Deep learning is very useful, but only one piece of the whole AGI puzzle.
Furthermore, many AI problems will benefit from the generality of being formulated as a probabilistic program synthesis problem [3]. In this framework, many program-semantics (~formal methods) concepts like abstract interpretation [4,5] might become very useful: they allow huge program spaces to be explored very quickly.
Lastly, Pearl's do calculus [6] is a good starting point closely related to [1].
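As a sketch of what the do-calculus buys you (a made-up binary model for illustration, not an example from [6]): the backdoor adjustment recovers the interventional quantity P(Y=1 | do(X=1)) from purely observational samples, while the naive conditional is biased by the confounder.

```python
import random

random.seed(1)

# Hypothetical binary causal model: Z -> X, Z -> Y, and X -> Y.
# Z confounds the observed X-Y association.
def sample(do_x=None):
    z = 1 if random.random() < 0.5 else 0
    if do_x is None:
        x = 1 if random.random() < (0.8 if z else 0.2) else 0
    else:
        x = do_x                          # do(X): sever the Z -> X edge
    y = 1 if random.random() < 0.1 + 0.3 * x + 0.5 * z else 0
    return z, x, y

N = 100_000
data = [sample() for _ in range(N)]

# Naive conditional P(Y=1 | X=1): biased upward by the confounder Z.
x1 = [d for d in data if d[1] == 1]
naive = sum(d[2] for d in x1) / len(x1)

# Backdoor adjustment: P(Y=1 | do(X=1)) = sum_z P(Y=1 | X=1, z) P(z)
adjusted = 0.0
for zv in (0, 1):
    x1z = [d for d in x1 if d[0] == zv]
    p_z = sum(1 for d in data if d[0] == zv) / N
    adjusted += (sum(d[2] for d in x1z) / len(x1z)) * p_z

# Ground truth, obtained by actually intervening in the model.
truth = sum(sample(do_x=1)[2] for _ in range(N)) / N
```

The adjusted estimate lands on the interventional truth (about 0.65 here), while the naive conditional overshoots, because conditioning on X=1 also selects for high Z.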
The causal argument suffers from a problem of nomenclature.
On one side we have the colloquial understanding of cause and effect where a cause is a true impetus of effect. On the other side we have "causal" learning in biology where you're not actually learning causes, just strong correlations. We can learn just about any temporal association even if there is no direct cause-effect relationship. Random reward structures are a way to illustrate this: present a reinforcing stimulus to an animal at random times and a random subset of behavior will increase in frequency. The animal develops a false "causal" belief that a series of its actions is influencing the presentation of a reward.
That's why I like focusing on "sequence prediction": even colloquially, we know predictions can be wrong. Those predictions can be influenced by low-dimensional world models, but you don't accidentally elevate such a model into a claim of a pure/symbolic/accurate model, as can happen with incautious use of words like "causal."
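The random-reward experiment described above is easy to simulate (a deliberately crude toy model, not a claim about real animal learning): rewards arrive independently of behaviour, but whichever action happened to precede a reward gets reinforced, and the agent ends up strongly "believing" that some action causes the reward.

```python
import random

random.seed(2)

# Toy "superstitious conditioning": rewards are random, yet whichever
# action happened to precede a reward is credited and reinforced.
N_ACTIONS = 5
weights = [1.0] * N_ACTIONS            # action propensities

def pick_action():
    r = random.random() * sum(weights)
    for i, w in enumerate(weights):
        r -= w
        if r <= 0:
            return i
    return N_ACTIONS - 1

for _ in range(10_000):
    a = pick_action()
    if random.random() < 0.1:          # reward, unrelated to the action...
        weights[a] *= 1.2              # ...but credited to it anyway

probs = [w / sum(weights) for w in weights]   # final "causal beliefs"
```

The rich-get-richer feedback concentrates nearly all propensity on one arbitrary action, a false "causal" belief learned from pure noise.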
> Animals operate with causal models of a very small number of parameters: these models richly describe how an intervention on one variable causes another to change. These models cannot be inferred from association (hence the last 500 years of science).
This is clever crackpottery type brainstorming from a smart person.
The author has an extremely grand set of connections he has developed. It ties together Buddha, enlightenment, vipassana meditation, artificial intelligence, cybernetics, fractals and neuroscience. Nothing wrong with that, of course.
A creative thinker should have these kinds of crazy ideas and connections every day, or at least once a week. I carry a notebook that is full of them.
Most ideas die as 'premature babies'. They may be interesting to think about and write down, but they are not fully developed and never fit together as well as you initially thought. Filtering and picking some of them to work with is important. Giving the rest up is the difference between crackpot and non-crackpot.
Forcing grand connections prematurely is what makes this the crackpottery type. Sharing the creative brainstorm in an essay that does not try to force the connections would be easier to read.
> This is clever crackpottery type brainstorming from a smart person.
I think you're being too kind here. It's just crackpottery as far as I can tell. I agree with you though that it is the type of dumb idea that should die in a private notebook.
Yeah, stir up some mysterious ideas and maybe someone sees a meaning in them. I'm quite surprised by this kind of vague magical thinking coming from LessWrong (I thought they were rationalists or something).
The line between crackpottery and genius is a fine one. If blogs had existed in 1900 and a certain patent clerk had written a post on his ideas about clock synchronization somehow being related to electromagnetism, many would have dismissed him as a crackpot as well.
The questions this article relates to are among the most profound and difficult that human reason has ever attempted to confront. I think one should be careful in labeling such ambitious speculation as crackpottery just because it doesn't yet amount to a fully coherent and formally testable theory.
To be clear, you're not disagreeing that the author identifies a problem; you disagree (as I do) that "organized fractally" is a meaningful phrase that can guide the development of new AI, correct?
This article seems out-of-date by 5 years or more even though it was published today, and I am unsure as to why.
It calls out long short-term memory (LSTM) but doesn't mention recent (last 5 years) improvements like Gated Recurrent Units (GRUs) or Transformers (GPT-2, huggingface/transformers), which have shown significant improvements over the traditional LSTM model. These can handle time series data much better than older models could.
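For readers unfamiliar with the gating these architectures use, here is a minimal scalar GRU cell in plain Python (hand-set weights for illustration, not a trained model; real GRUs use learned weight matrices over vectors). With the update gate saturated near zero, the state passes through almost unchanged across 100 noisy inputs, the mechanism that lets gated units carry information over long ranges:

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def gru_cell(x, h, params):
    """One GRU step for a scalar input/state, with hand-set toy weights.

    z is the update gate (how much new candidate to blend in);
    r is the reset gate (how much old state feeds the candidate).
    """
    (wz, uz, bz), (wr, ur, br), (wh, uh, bh) = params
    z = sigmoid(wz * x + uz * h + bz)
    r = sigmoid(wr * x + ur * h + br)
    h_cand = math.tanh(wh * x + uh * (r * h) + bh)
    return (1 - z) * h + z * h_cand

params = ((0.0, 0.0, -10.0),   # z = sigmoid(-10) ~ 0: keep old state
          (0.0, 0.0, 0.0),     # r = 0.5 (unimportant here)
          (1.0, 1.0, 0.0))
h = 0.7                        # initial state to be remembered
for x in [0.3, -1.2, 0.9, 0.0] * 25:   # 100 noisy inputs
    h = gru_cell(x, h, params)
```

A plain RNN with the same inputs would overwrite its state on every step; the gate is what makes "remember this for 100 steps" representable.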
I think you need to give details and references to support a claim that these innovations make a fundamental difference.
I don't doubt that the things you mention involve improvements, but are these improvements just doing better on the same benchmarks in the same fashion, or are they a fundamental change? I read many claims that recent changes in deep learning represent the former.
Any links to heavily time-series-based machine learning algorithms? I'm in finance, and while I know how to set up and run a random forest or gradient-boosting regressor using standard libraries, I've never had a good handle on time series methods.
The reason seems even simpler than the article suggests. Deep learning requires lots of training data, and that data naturally needs to more or less be "the same": follow "the same" logic.
A long enough time series is going to involve a change in the logic of the real world, a change that the network won't be trained for.
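That failure mode is easy to demonstrate with a toy series whose "logic" flips halfway through (synthetic data, with a one-parameter linear model standing in for the network):

```python
import random

random.seed(3)

# A series whose underlying relationship changes at the midpoint.
def make_series(n=2000):
    xs, ys = [], []
    for t in range(n):
        x = random.gauss(0, 1)
        slope = 2.0 if t < n // 2 else -2.0   # regime change halfway
        xs.append(x)
        ys.append(slope * x + random.gauss(0, 0.1))
    return xs, ys

def fit_slope(xs, ys):
    # Least squares through the origin.
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def mse(slope, xs, ys):
    return sum((slope * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs, ys = make_series()
half = len(xs) // 2
slope = fit_slope(xs[:half], ys[:half])     # trained on the old regime
err_old = mse(slope, xs[:half], ys[:half])  # near the noise floor
err_new = mse(slope, xs[half:], ys[half:])  # explodes after the change
```

No amount of data from the first regime helps with the second; the model's error after the change is orders of magnitude above its training error.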
Deep learning doesn't necessarily require a huge amount of data. What DL does is allow you to fit more complex relationships between input and output than would be the case with, say, a bog-standard linear model. If a relationship is complex and also clearly defined in your data, then you don't necessarily need much. In general it's true that this may not be the case, but that doesn't make "deep learning requires lots of training data" true; only that "the data used for deep learning models is typically noisy on top of representing a complex relationship".
It's a semantic difference, but a very important one if we're to avoid going down the road of just mindlessly throwing compute at every problem. And if we do that, we'll just wind up with millions of Rube Goldberg machines instead of actually solving problems.
The change in logic of the real world thing is absolutely spot on, though. Over enough time it becomes basically impossible to disentangle effects.
I think this is where the "imagination" that computers currently cannot replicate becomes a huge problem, one that will set AI and ML back decades more.
Machine learning does just fine to extremely well on long-term time series data; there are entire branches of machine learning dedicated to this. The fact that this imbecile has never heard of these tools is why nobody should be reading his essay.
Uber's engineers didn't do this for their human finder because:
1) Image-recognition stuff isn't explicitly built to do this (though it easily could be jury-rigged to do so).
2) Uber's engineers apparently never heard of the concepts of "moving averages" and "thresholds", which would have worked just fine.
"More precisely, today's machine learning (ML) systems cannot infer a fractal structure from time series data."
-look at this idiot using words he doesn't understand. Muh fractals.
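The "moving averages and thresholds" idea from (2) can be sketched in a few lines (hypothetical per-frame detection scores; this is obviously not Uber's actual pipeline): smooth the noisy per-frame score, then fire only when the smoothed value clears a threshold.

```python
from collections import deque

def detect(scores, window=5, threshold=0.6):
    # Moving average over the last `window` frames, thresholded.
    buf = deque(maxlen=window)
    fired = []
    for s in scores:
        buf.append(s)
        fired.append(sum(buf) / len(buf) >= threshold)
    return fired

noise_spike = [0.1, 0.1, 0.9, 0.1, 0.1, 0.1, 0.1]   # one-frame glitch
sustained   = [0.1, 0.7, 0.8, 0.9, 0.8, 0.9, 0.8]   # persistent detection
```

Here `detect(noise_spike)` stays quiet throughout, while `detect(sustained)` fires once the average clears the threshold: a one-frame glitch is smoothed away, a persistent object is not.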
The article I would like to read is what the challenges are to including a few prior states in navigation. I'm amazed that, when I drive over or under a bridge, my online mapping software changes instructions as if my car were able to levitate 20 feet onto the roadway above or below, even when that roadway is a highway without exit or entrance within a mile.
I remember that neural differential equations are better suited to representing time series data; I've seen them used a lot for pharmacological processes. Any additional ideas or insights related to this?
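For readers who haven't seen them: a neural ODE models the hidden state as evolving continuously, dh/dt = f_theta(h, t), integrated between (possibly irregular) observation times. A minimal sketch, with a hand-set decay standing in for the learned f_theta (roughly one-compartment pharmacokinetic elimination, not a trained model):

```python
def f(h, t, k=0.5):
    # Stand-in for a learned f_theta: simple linear elimination.
    return -k * h

def odeint_euler(f, h0, t0, t1, steps=1000):
    # Fixed-step Euler integration (real implementations use adaptive solvers).
    h, t = h0, t0
    dt = (t1 - t0) / steps
    for _ in range(steps):
        h += dt * f(h, t)
        t += dt
    return h

# Read the state out at irregular observation times, as with sparse
# clinical measurements -- awkward for discrete-step RNNs, natural here.
times = [0.0, 0.3, 1.1, 2.0]
traj = [1.0]                   # initial state, e.g. a normalized dose
for a, b in zip(times, times[1:]):
    traj.append(odeint_euler(f, traj[-1], a, b))
```

Because the dynamics are continuous, nothing special is needed for unevenly spaced samples: you just integrate from one timestamp to the next, which is exactly why these models show up in pharmacokinetics.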
[1] http://probmods.org/
[2] http://pyro.ai/
[3] https://web.mit.edu/cocosci/Papers/Science-2015-Lake-1332-8....
[4] http://www.concrete-semantics.org/concrete-semantics.pdf
[5] http://adam.chlipala.net/frap/frap_book.pdf
[6] http://bayes.cs.ucla.edu/BOOK-2K/causality2-epilogue.pdf
naasking | 6 years ago:
* Tegmark and Wu's AI Physicist: https://news.ycombinator.com/item?id=18381827
* AIXI: https://en.wikipedia.org/wiki/AIXI
scottlegrand2 | 6 years ago:
This would reduce a nearly impossible job in machine intelligence to a really difficult simulation problem.
But I don't see that happening anytime soon.