top | item 41111906

mikaeluman | 1 year ago

I have some experience. Variants of regularization are a must. There are just too few samples and too much noise per sample.

In a related problem, covariance matrix estimation, variants of shrinkage are popular. The most straightforward is linear shrinkage (Ledoit–Wolf).

Excepting neural nets, I think most people doing regression simply use linear regression with the above sorts of adjustments, chosen based on the domain.

Particularly in finance, you fool yourself too easily with more complex models.
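For readers unfamiliar with it, linear shrinkage pulls the noisy sample covariance matrix toward a simple structured target (a scaled identity), with the mixing weight estimated from the data. A minimal sketch using scikit-learn's `LedoitWolf` estimator on synthetic data (the dimensions here are arbitrary, chosen only so that observations are scarce relative to assets):

```python
import numpy as np
from sklearn.covariance import LedoitWolf

rng = np.random.default_rng(0)

# Deliberately few observations relative to the number of "assets" --
# the regime where the raw sample covariance is badly conditioned.
n_samples, n_assets = 60, 40
X = rng.standard_normal((n_samples, n_assets))

lw = LedoitWolf().fit(X)

# shrinkage_ in [0, 1]: how far the estimate is pulled from the sample
# covariance toward the scaled-identity target.
print(f"estimated shrinkage intensity: {lw.shrinkage_:.2f}")
print(f"shrunk covariance shape: {lw.covariance_.shape}")
```

Unlike the raw sample covariance, the shrunk estimate remains invertible even when observations are fewer than assets, which is part of its appeal in portfolio work.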

fasttriggerfish | 1 year ago

Yes, these are good points, and probably the most important ones as far as the maths is concerned, though I would say regularisation methods are standard material in any ML/stats course. Ledoit–Wolf shrinkage is indeed more exotic and very useful.

Ntrails | 1 year ago

> There are just too few samples and too much noise per sample.

Call it 2000 liquid products on the US exchanges, with many years of data. Even if you downsample from per-tick to one-minute bars, it doesn't feel like you're struggling for a large in-sample period?

kqr | 1 year ago

It sounds like you are assuming the joint distribution of returns in the future is equal to that of the past, and assuming away potential time dependence.

These may be valid assumptions, but even if they are, "sample size" is only meaningful relative to the variance of each sample unit, and for financial data that variance can be quite large. In some cases it is even infinite!

Regarding relativity of sample size, see e.g. this upcoming article: https://two-wrongs.com/sample-unit-engineering
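The infinite-variance point can be made concrete with a toy simulation (the sample size and seed below are arbitrary): a Student-t distribution with 2 degrees of freedom has a finite mean but infinite variance, so its sample standard deviation never settles down the way a Gaussian's does, and a nominally large sample buys much less certainty than it appears to.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000

# Student-t with df=2: finite mean, infinite variance -- a crude
# stand-in for heavy-tailed financial returns.
heavy = rng.standard_t(df=2, size=n)
normal = rng.standard_normal(n)

# The Gaussian sample std sits near 1; the heavy-tailed one is dominated
# by a handful of extreme draws and keeps drifting as n grows.
print(f"normal sample std: {normal.std():.2f}")
print(f"t(2)   sample std: {heavy.std():.2f}")
```

Rerunning with different seeds makes the contrast obvious: the Gaussian column barely moves, while the t(2) column jumps around by multiples.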

bormaj | 1 year ago

They may have been referring to, for example, reported financial results or news events, which are infrequent but can have an outsized impact on market prices.

energy123 | 1 year ago

If the distribution changes enough, multiple years of data may as well be no data.