confuseshrink's comments

confuseshrink | 3 years ago | on: Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning

StarCraft, in the form of AlphaStar, worked in the sense that it could beat humans, at least in the short term. The problem with the whole technique is that they had to tether it to the human examples they had gathered, in the form of a divergence loss.

I haven't checked out the linked paper yet, but if they managed to do something from first principles, that would still be an interesting development.

confuseshrink | 5 years ago | on: The neural network of the Stockfish chess engine

Interesting point. Nvidia have been improving integer performance for quantized inference on their GPUs a lot. It might be a lot of work, but could this NNUE approach be scaled up to the point where it would be worthwhile to run on a GPU?

For single-input "batches" (which seems to be what's used now?) it might never be worthwhile, but if multiple positions could be searched in parallel and the NN evaluations batched, this might start to look tempting.

Not sure what the effect of running PVS with multiple parallel search threads is. Presumably the penalty of searching with stale information means you hit the performance ceiling a lot sooner than with MCTS-like searches, as the pruning is much more sensitive to having up-to-date information about the principal variation.

Disclaimer: My understanding of PVS is very limited.

confuseshrink | 5 years ago | on: We read the paper that forced Timnit Gebru out of Google

> We're getting a bit off-topic here, but the #1 target by far in reducing greenhouse emissions is power generation.

I readily admit I don't know any of the numbers associated with carbon production and my comment was solely based on the one GPU vs car figure presented in the aforementioned paper.

confuseshrink | 5 years ago | on: We read the paper that forced Timnit Gebru out of Google

This is a very valid argument but it's hard to know what scaling a transformer will really do without trying (looking at you GPT-3). This is probably an issue for ML in general at this point.

I think a more nuanced conversation around these topics will look at exactly what you bring up: how do we properly trade the potential knowledge benefit against the costs?

It pains me that entirely valid avenues of research like this get buried in nonsense and drama, their message seemingly lost in the midst of it.

confuseshrink | 5 years ago | on: We read the paper that forced Timnit Gebru out of Google

Yes, it's something I often see ignored, as "common knowledge" dictates that in ML inference is way cheaper than training. But if you're running a model in production at Google, with loads of searches hitting it every second, at what point do the inference costs start to outweigh the training costs?

I simply have no idea where the break-even point is. This could inform other questions, like: could it be worth scaling up to get a more accurate model (paying up front in training) to avoid further searches (inference)?

confuseshrink | 5 years ago | on: We read the paper that forced Timnit Gebru out of Google

The Strubell paper, which is the origin of this "5 cars" number, isn't even in the right ballpark for this stuff. What they did was take desktop GPU power consumption running the model in fp32 and extrapolate to a 240-GPU (P100) setup running for a year straight at 100% power draw.

Yes, if you do run 240 P100s at literally 100% load, 24/7, for a year, you get the power consumption of 5 cars. That run never happened, though; it all ran on TPUs at lower precision, with lower power consumption and a much shorter time to convergence.

If anything, this tells you that electronics are ridiculously green even when operating at 100% load. I've never profiled worldwide carbon production, but something tells me that if you wanted to optimise for carbon you'd be better served taking cars off the road and planes out of the sky.

confuseshrink | 5 years ago | on: Yann LeCun on GPT-3

Yann is a consistently sober voice in this world of AI hype. I find it quite refreshing.

Personally I see little evidence that this "just scale a transformer until sentience" hype-train is going to take us anywhere interesting or particularly useful.

And for the people who claim it is super useful already: can you actually trust its outputs without any manual inspection in a production setting? If not, it's probably not as useful as you think it is.

confuseshrink | 5 years ago | on: Are we in an AI Overhang?

> So what changed? We aren't sure, but the speculation is that in the process of training, GPT-3 found that the best strategy to correctly predicting the continuation of arithmetic expressions was to figure out the rules of basic arithmetic and encode them in some portion of its neural network, then apply them whenever the prompt suggested to do so.

I saw a lot of basic arithmetic in the thousands range where it failed. If we have to keep scaling the model quadratically for it to learn arithmetic whose difficulty only grows with the number of digits, then we're doing it wrong.

I'm surprised you think it learned some basic rules of arithmetic. A lot of simple rules extrapolate very well, into all number ranges. To me it seems like it's just making things up as it goes along. I'll grant you this, though: it can make for a convincing illusion at times.

confuseshrink | 5 years ago | on: Powerful AI Can Now Be Trained on a Single Computer

I would start with David Silver's (DeepMind) YouTube series to get an idea of what's possible or not.

Running an already trained reinforcement learning agent is relatively cheap (unless your model is massive).

I suspect the reason people aren't using it yet is that it's (a) really difficult to get right in training (even basic convergence is not guaranteed without careful tuning) and (b) really difficult to guarantee reasonable behavior outside of the scenarios you're able to reach in QA.

edit: Link to lecture series https://www.youtube.com/watch?v=2pWv7GOvuf0&list=PLqYmG7hTra...

confuseshrink | 5 years ago | on: Linus Torvalds on AVX512

Vectorization: I'm not an expert in this area, so I can only tell you what I've personally found difficult when dealing with vectorization. Usually it all comes down to alignment and vector lanes. To use the vector instructions you basically have to partition your memory into separate (but interleaved) regions that can be mapped to distinct vector lanes efficiently. Everything is fine as long as no two elements from separate lanes have to be mixed in some way; as soon as your computation requires that, you incur a heavy cost.

Dealing with these issues might require you to know the corners of the instruction set really well, or sometimes the solution lies outside the instruction set entirely and comes down to how your data structure is laid out in memory, leading you to AoS-vs-SoA analysis etc.

Compilers and vectorization: Based on reading a lot of assembly output, I think what compilers usually struggle with are assumptions that the human programmer knows hold for a given piece of code, but that the compiler has no right to make. Some of this is basic alignment; gcc and clang have intrinsics for communicating it. Sometimes it's the memory model of the programming language disallowing a load or a store at specific points.

GPGPU programmability: GPUs being easy to program is something I take with a grain of salt. Yes, it's easy to get up and running with CUDA. Making an _efficient_ CUDA program, however, is easily as challenging as writing an efficient AVX program, if not more so.

confuseshrink | 5 years ago | on: Anders Tegnell defends Sweden's virus approach

I find it very surprising that anyone would rely on unvalidated mathematical models for this; that goes for the Imperial College people as well as Sweden. Are they even able to fit the parameters in retrospect?

Anyone with a background in mathematical modelling should be extremely cautious of applying models in situations where there are serious ramifications to getting the wrong answer.

Hubris.
