confuseshrink | 3 years ago | on: Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning
confuseshrink's comments
confuseshrink | 5 years ago | on: The neural network of the Stockfish chess engine
For single-input "batches" (which seems to be what's used now?) it might never be worthwhile, but if multiple positions could be searched in parallel and their NN evaluations batched, this might start to look tempting.
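To make the batching idea concrete, here is a minimal sketch. The evaluator below is a stand-in (a single matrix multiply), not the actual Stockfish network, and the feature size is an assumption; the point is only that one batched call amortises per-call overhead across many leaf positions.

```python
import numpy as np

# Hypothetical evaluator: one matrix multiply standing in for a real
# evaluation network. The 768-wide input is an assumed feature size.
rng = np.random.default_rng(0)
W = rng.standard_normal((768, 1))

def evaluate_batch(positions: np.ndarray) -> np.ndarray:
    """Evaluate many positions in one pass: shape (batch, 768) -> (batch,)."""
    return (positions @ W).ravel()

# One batched call over 64 positions pays the call overhead once,
# where 64 single-position calls would each pay it separately.
batch = rng.standard_normal((64, 768))
scores = evaluate_batch(batch)
assert scores.shape == (64,)
```

The same shape generalises to a GPU backend, where the gap between one call of batch 64 and 64 calls of batch 1 is far larger.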
I'm not sure what the effect of running PVS with multiple parallel search threads is. Presumably searching with less up-to-date information means you hit the performance ceiling much sooner than MCTS-like searches do, since the pruning is far more sensitive to having current information about the principal variation.
Disclaimer: My understanding of PVS is very limited.
confuseshrink | 5 years ago | on: MuZero: Mastering Go, chess, shogi and Atari without rules
confuseshrink | 5 years ago | on: We read the paper that forced Timnit Gebru out of Google
I readily admit I don't know any of the numbers associated with carbon production; my comment was based solely on the one GPU-vs-car figure presented in the aforementioned paper.
confuseshrink | 5 years ago | on: We read the paper that forced Timnit Gebru out of Google
I think a more nuanced conversation around these topics will look at exactly what you bring up: how do we properly trade the potential knowledge benefit against the costs?
It pains me that entirely valid avenues of research like this get buried in nonsense and drama, their message seemingly lost in the midst of it.
confuseshrink | 5 years ago | on: We read the paper that forced Timnit Gebru out of Google
I simply have no idea where the hinge point is. This could inform other questions, like: could it be worth scaling up to get a more accurate model (paying up-front in training) in order to avoid further searches (inference)?
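The trade-off above can be written as a simple break-even calculation. All numbers below are illustrative assumptions, not figures from any paper:

```python
def total_cost(train_cost: float, per_query_cost: float, n_queries: int) -> float:
    """Lifetime cost of a model: one-off training plus per-inference cost."""
    return train_cost + per_query_cost * n_queries

def break_even_queries(extra_train_cost: float, per_query_saving: float) -> float:
    """Queries needed before a pricier-to-train but cheaper-to-run model wins."""
    return extra_train_cost / per_query_saving

# Illustrative: a model costing 100 units more to train but saving
# 0.25 units per query pays for itself after 400 queries.
assert break_even_queries(100.0, 0.25) == 400.0
```

The hinge point is exactly that break-even query count: below it, the cheaper training run wins; above it, the more accurate model does.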
confuseshrink | 5 years ago | on: We read the paper that forced Timnit Gebru out of Google
Yes, if you did run 240 P100s at literally 100% load, 24/7, for a year, you would get the carbon footprint of 5 cars. That run never happened, though; it all ran on TPUs at lower precision, with lower power consumption and a much shorter time to convergence.
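A rough back-of-envelope check of that order of magnitude. Every constant here is my own assumption (board power, grid carbon intensity, car lifetime emissions including fuel), not a figure taken from the paper:

```python
# Back-of-envelope only; all constants are assumptions.
P100_TDP_KW = 0.25            # assumed 250 W board power per GPU
N_GPUS = 240
HOURS_PER_YEAR = 24 * 365
GRID_KG_CO2_PER_KWH = 0.4     # assumed average grid carbon intensity
CAR_LIFETIME_TONNES_CO2 = 57.0  # assumed car lifetime emissions, incl. fuel

energy_kwh = P100_TDP_KW * N_GPUS * HOURS_PER_YEAR       # 525,600 kWh
tonnes_co2 = energy_kwh * GRID_KG_CO2_PER_KWH / 1000.0   # ~210 t CO2
car_equivalents = tonnes_co2 / CAR_LIFETIME_TONNES_CO2   # a handful of cars
```

Under these assumed constants the worst-case year of compute lands at a handful of car-lifetimes of CO2, which is consistent with the order of magnitude quoted.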
If anything, this tells you that electronics are ridiculously green even when operating at 100%. I've never profiled worldwide carbon production, but something tells me that if you wanted to optimise for carbon you'd be better served trying to take cars off the road and planes out of the sky.
confuseshrink | 5 years ago | on: Yann LeCun on GPT-3
Personally I see little evidence that this "just scale a transformer until sentience" hype-train is going to take us anywhere interesting or particularly useful.
And for the people who claim it is already super useful: can you actually trust its outputs in a production setting without any manual inspection? If not, it's probably not as useful as you think it is.
confuseshrink | 5 years ago | on: Are we in an AI Overhang?
I saw it fail at a lot of basic arithmetic in the thousands range. If we have to keep scaling it quadratically for it to learn log-n-scale arithmetic, then we're doing it wrong.
I'm surprised you think it learned some basic rules of arithmetic. Many simple rules extrapolate very well, into all number ranges. To me it seems like it's just making things up as it goes along. I'll grant you this much: it can make for a convincing illusion at times.
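To illustrate what "simple rules extrapolate" means here: the grade-school addition algorithm is a tiny fixed rule, and once you have it, it works for numbers of any length, which is precisely what the model's failures in the thousands range suggest it has not learned. A minimal sketch:

```python
def add_digits(a: str, b: str) -> str:
    """Grade-school addition on decimal strings: the same digit-plus-carry
    rule works for 3-digit and 300-digit numbers alike."""
    a, b = a.zfill(len(b)), b.zfill(len(a))
    carry, out = 0, []
    for x, y in zip(reversed(a), reversed(b)):
        carry, digit = divmod(int(x) + int(y) + carry, 10)
        out.append(str(digit))
    if carry:
        out.append(str(carry))
    return "".join(reversed(out))

# The rule learned on small numbers extrapolates to arbitrary length.
assert add_digits("123", "989") == "1112"
assert add_digits("9" * 300, "1") == "1" + "0" * 300
```

A system that had actually internalised this rule would not degrade as the operands grow; one that is pattern-matching on seen examples would.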
confuseshrink | 5 years ago | on: Powerful AI Can Now Be Trained on a Single Computer
Running an already trained reinforcement learning agent is relatively cheap (unless your model is massive).
I suspect the reason people aren't using it yet is that it's a) really difficult to get right in training, since even basic convergence is not guaranteed without careful tuning, and b) really difficult to guarantee reasonable behavior outside the scenarios you're able to reach in QA.
edit: Link to lecture series https://www.youtube.com/watch?v=2pWv7GOvuf0&list=PLqYmG7hTra...
confuseshrink | 5 years ago | on: Building a $5k ML Workstation with Titan RTX and Ryzen ThreadRipper [video]
confuseshrink | 5 years ago | on: Linus Torvalds on AVX512
Dealing with these issues might require you to know the corners of the instruction set really well, or sometimes the solution lies outside the instruction set entirely, in how your data structure is laid out in memory, leading you to AoS-vs-SoA analysis, etc.
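A toy illustration of the AoS-vs-SoA point, sketched in Python/NumPy rather than C intrinsics: the array-of-structs layout interleaves fields and forces a strided scalar loop, while the struct-of-arrays layout gives each field a contiguous array that a vector unit (or NumPy's vectorised kernels) can stream through.

```python
import numpy as np

# AoS: each "particle" is an object; x and y fields interleaved in memory.
aos = [{"x": float(i), "y": 2.0 * i} for i in range(1000)]
sum_x_aos = sum(p["x"] for p in aos)  # scalar loop over strided data

# SoA: one contiguous array per field, the layout vector hardware wants.
soa_x = np.arange(1000, dtype=np.float64)
soa_y = 2.0 * soa_x
sum_x_soa = soa_x.sum()  # single vectorised reduction

assert sum_x_aos == sum_x_soa == 499500.0
```

The same transformation in C (splitting `struct {float x, y;}` arrays into separate `x[]` and `y[]` arrays) is often what finally lets the compiler emit the AVX code it refused to generate for the interleaved layout.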
Compilers and vectorization: based on reading a lot of assembly output, I think what compilers usually struggle with are assumptions that the human programmer knows hold for a given piece of code but that the compiler has no right to make. Some of this is basic alignment; gcc and clang have intrinsics for that. Sometimes it's the programming language's memory model disallowing a load or a store at specific points.
GPGPU programmability: GPUs being easy to program is something I take with a grain of salt. Yes, it's easy to get up and running with CUDA, but writing an _efficient_ CUDA program is easily as challenging as writing an efficient AVX program, if not more so.
confuseshrink | 5 years ago | on: Understanding Convolutional Neural Networks
confuseshrink | 5 years ago | on: On the Folly of Rewarding A, While Hoping for B (1975) [pdf]
confuseshrink | 5 years ago | on: Anders Tegnell defends Sweden's virus approach
Anyone with a background in mathematical modelling should be extremely cautious about applying models in situations where there are serious ramifications to getting the wrong answer.
Hubris.
confuseshrink | 6 years ago | on: SARS-CoV-2 titers in wastewater are higher than expected from confirmed cases
confuseshrink | 6 years ago | on: Mochizuki's proof of the ABC conjecture accepted for publication
I haven't checked out the linked paper yet, but if they managed to do something from first principles, that would still be an interesting development.