pyentropy's comments
pyentropy | 1 year ago | on: Why haven't biologists cured cancer?
pyentropy | 1 year ago | on: Is Aschenbrenner's 165 page paper on AI the naivety of a 25 year old?
But is the "-ed" in "worked" a problem?
pyentropy | 2 years ago | on: In the long run, we're all Dad
He's an atheist psychiatrist. However, he enjoys how natural selection, social dynamics, and reputation can also be modeled by the moral rules of most religions. For example, going to therapy isn't that different from practicing confession in a church.
pyentropy | 2 years ago | on: My primality testing code is faster than Sir Roger Penrose's
pyentropy | 2 years ago | on: The puzzling poll that made many Twitter users angry
pyentropy | 2 years ago | on: Metaculus
However, it gets more interesting when you try to beat the crowd, because you have to take risk and disagree with the masses: you will end up with either a negative reputation or a very large one. You can learn more about scoring functions and how the accuracy of everyone's forecasts is measured here: https://www.metaculus.com/help/scoring/
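Metaculus's actual scoring rules are more elaborate (see the link above), but the core idea of rewarding bold, correct disagreement can be sketched with the simplest proper scoring rule, the Brier score:

```python
def brier_score(forecast: float, outcome: int) -> float:
    """Squared error between a probability forecast and the 0/1 outcome.
    Lower is better; a confident wrong forecast is punished heavily."""
    return (forecast - outcome) ** 2

# The crowd hedges at 60%; you boldly forecast 95%. The event happens (outcome = 1):
print(brier_score(0.60, 1))  # 0.16
print(brier_score(0.95, 1))  # 0.0025 -> big reputation gain
# Had it NOT happened, the bold forecast would have scored 0.9025 -> big loss.
```

This is why beating the crowd forces you into the high-variance corner: the same boldness that earns a large score gain is exactly what produces the large loss when you're wrong.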
Personally, I have opened one question; it involves predicting the net sales of the Apple Vision Pro through 2025: https://www.metaculus.com/questions/17407/apple-vision-pro-n...
pyentropy | 2 years ago | on: Google claims to have proved its supremacy with new quantum computer
Quantum circuits are made of quantum logic gates like Hadamard, CNOT, Z, CZ, etc. Instead of bits as inputs and outputs, quantum logic gates operate on qubits. Unlike boolean logic, where bits are 0 and 1, a qubit is a 2D vector [α β] where α and β are complex numbers, corresponding to a superposition of the zero and one basis states: α|0> + β|1>. You can visualise a qubit as a point on a sphere, the so-called Bloch sphere [1].
There are multiple ways to implement a qubit, but you need to start with some quantum phenomenon. An example is the polarisation of a photon: horizontal polarisation could be |0>, vertical could be |1>, and the qubit is represented as a complex vector over these two. If you've studied linear algebra, you know that manipulating a vector often involves linear transformations, and any linear transformation can be represented as a matrix - so applying gates is just matrix multiplication. Unary gates are 2x2 matrices and binary gates are 4x4 matrices; for photons they would be implemented with mirrors and optical waveplates. Measuring the polarisation at the end gives the output. The output is not deterministic, but it always follows the same distribution, so you could design a circuit that yields |001> X% of the time, |010> Y%, |111> Z%, etc., such that X + Y + Z + ... = 100%.
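The vector-and-matrix picture can be sketched in a few lines of NumPy - a classical simulation of the math, of course, not a physical qubit:

```python
import numpy as np

# A qubit is a 2-vector of complex amplitudes; gates are unitary matrices.
ket0 = np.array([1, 0], dtype=complex)        # |0>
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)  # Hadamard gate (2x2, unary)

state = H @ ket0                              # (|0> + |1>) / sqrt(2)
probs = np.abs(state) ** 2                    # Born rule: |alpha|^2 and |beta|^2
print(probs)                                  # [0.5 0.5]

# "Running the circuit" many times = sampling measurements from that distribution.
samples = np.random.choice([0, 1], size=10_000, p=probs)
```

A binary gate like CNOT works the same way, just as a 4x4 matrix applied to the tensor product of two such 2-vectors.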
I'm not too familiar with the details of random circuit sampling, but the idea is that you start with a big circuit that wasn't intentionally designed and therefore has no known properties we can exploit - instead it's a random mess of transformations applied to the qubits. A classical computer cannot run big quantum circuits - N gates on the 49 Google qubits require something like 2^49 * N^3 classical gates - so it won't be able to calculate the output distribution. However, what we can do is run the quantum circuit many times (do measurements on the quantum computer) and collect many samples. Given enough samples, a classical computer can verify whether they are consistent with the intended transformation (and therefore quantum computation happened) or whether they're just pure noise/garbage, using cross-entropy benchmarking [2].
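The linear cross-entropy benchmark mentioned above reduces to averaging the ideal probabilities of the bitstrings you actually measured. A toy sketch, using a random state vector as a stand-in for the ideal output distribution of a random circuit (computing that distribution at Sycamore scale is exactly what a classical computer can't do):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10                 # toy size; Sycamore-scale is classically infeasible
dim = 2 ** n

# Ideal output distribution of a "random circuit": a random complex state vector.
amps = rng.normal(size=dim) + 1j * rng.normal(size=dim)
p = np.abs(amps) ** 2
p /= p.sum()

def linear_xeb(samples: np.ndarray) -> float:
    """F_XEB = 2^n * mean(p(x_i)) - 1: close to 1 if the samples follow the
    ideal distribution, close to 0 if they are uniform noise."""
    return dim * p[samples].mean() - 1

good = rng.choice(dim, size=50_000, p=p)   # a faithful "quantum computer"
noise = rng.integers(dim, size=50_000)     # pure garbage
print(linear_xeb(good), linear_xeb(noise)) # ~1.0 vs ~0.0
```

Bitstrings that the ideal circuit makes likely are sampled often and pull the average up; a uniform sampler hits mostly unlikely bitstrings, and the average collapses to 1/2^n.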
Note that the purpose of the "random" in the random circuit is to introduce hardness and prevent cheating (assume that the classical computer is the "opponent" of the quantum computer); the circuits don't calculate anything useful / of human value.
What's interesting is that once people with supercomputers saw the benchmark formula and analysed the constant factors, they found a loophole which let them run a classical algorithm that generates measurements/samples satisfying the benchmark using 40K classical CPUs for a week, or even a single A100 in 140 days. Some of their success was due to sheer available compute and some to algorithmic cleverness (see: tensor networks). In my opinion, they are only disproving the Sycamore supremacy on a technicality.
[1] - https://en.wikipedia.org/wiki/Bloch_sphere
[2] - https://en.wikipedia.org/wiki/Cross-entropy_benchmarking
pyentropy | 2 years ago | on: DMT, Derealization, and Depersonalization
No matter how many times it happened, it was always equally scary: feeling like a passive observer of a movie starring some piece of flesh and bone as the main character, completely separate from that body and unable to control its decisions. The episodes usually lasted under 30 minutes.
I don't know about its occurrence with psychedelics, but in my case it always occurred after periods of extreme emotion (seeing a classmate die and becoming aware of my own mortality, being rejected by some 'friends' at school, and a few others). The way I see it (and some neuroscientists claim), the brain shuts off the perception of "self" in order to stop intense emotional pain.
pyentropy | 2 years ago | on: Anime.js – A lightweight JavaScript animation library
pyentropy | 2 years ago | on: So this guy is now S3. All of S3
The official method is to set a TXT record, but apparently their AT Protocol also lets you confirm a domain by serving `GET your.domainname.com/xrpc/com.atproto.identity.resolveHandle`, and `xrpc` was available as an S3 bucket name :)
pyentropy | 2 years ago | on: Transcendental Algebra (2017)
pyentropy | 2 years ago | on: We're building a browser when it's supposed to be impossible
And it's much more than that: a custom JS interpreter, SVG and CSS renderers, and so on...
pyentropy | 2 years ago | on: OpenAI Tokenizer
BPE is a tradeoff between single characters (computationally expensive: sequences get very long) and a word dictionary (can't handle novel words, other languages, or complex structures like code syntax). Note that the token set must be fixed in advance, because the neural network's output layer has one neuron mapped to each token (and the predicted token is the most activated neuron).
Human brains roughly do the same thing - that's why we have syllables as a tradeoff between letters and words.
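A minimal sketch of the BPE training idea - greedily merge the most frequent adjacent symbol pair - on a hypothetical toy corpus (not OpenAI's actual merge table or byte-level details):

```python
from collections import Counter

def bpe_train(words: list[str], num_merges: int) -> list[tuple[str, str]]:
    """Learn BPE merges: repeatedly fuse the most frequent adjacent symbol pair."""
    corpus = [list(w) for w in words]   # start from single characters
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols in corpus:
            pairs.update(zip(symbols, symbols[1:]))
        if not pairs:
            break
        (a, b), _count = pairs.most_common(1)[0]
        merges.append((a, b))
        for symbols in corpus:          # apply the merge everywhere, in place
            i = 0
            while i < len(symbols) - 1:
                if symbols[i] == a and symbols[i + 1] == b:
                    symbols[i:i + 2] = [a + b]
                else:
                    i += 1
    return merges

print(bpe_train(["lower", "lowest", "newest", "widest"], 4))
```

Frequent fragments like suffixes get fused into single tokens early, which is why a real BPE vocabulary ends up somewhere between letters and whole words - much like syllables.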
pyentropy | 3 years ago | on: Grover's algorithm offers no quantum advantage
We don't know how noise will scale in real hardware, so the job of theoretical scientists is to design the basic units of quantum computation regardless of how it may or may not work in practice. It's like dismissing XOR and NAND in the 1920s because transistors might never be able to implement them.
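In that abstract, noiseless model, Grover's algorithm itself is a few lines of linear algebra. A statevector sketch (no noise modeled, which is exactly the mismatch the paper argues about):

```python
import numpy as np

n = 3                               # 3 qubits -> search space of N = 8 items
N = 2 ** n
target = 5                          # the marked item the oracle recognizes

state = np.full(N, 1 / np.sqrt(N))  # uniform superposition over all items

oracle = np.eye(N)
oracle[target, target] = -1         # oracle flips the sign of the target

s = np.full(N, 1 / np.sqrt(N))
diffusion = 2 * np.outer(s, s) - np.eye(N)   # inversion about the mean

for _ in range(int(np.pi / 4 * np.sqrt(N))): # ~O(sqrt(N)) iterations
    state = diffusion @ (oracle @ state)

print(np.abs(state[target]) ** 2)   # probability of measuring the target (~0.95 for N=8)
```

The quadratic speedup lives entirely in that O(sqrt(N)) loop count; the paper's point is about what noisy hardware does to it, not about the algebra above.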
pyentropy | 3 years ago | on: How did Dennis Ritchie produce his PhD thesis? A typographical mystery (2022) [pdf]
The authors are the famous Kernighan and Brailsford (!). They have spent their lives on this kind of work, and they knew Ritchie personally very well. They are talking about spacing much finer than half a character.
pyentropy | 1 year ago | on: Nvidia’s $589B DeepSeek rout
The only reason a team would allocate time to memory optimizations and hand-written NVPTX code, rather than focusing on post-training, is that they struggled severely with memory during training.
I mean, take a look at the numbers:
https://www.fibermall.com/blog/nvidia-ai-chip.htm#A100_vs_A8...
This was a massive trick pulled by Jensen: take the H100 design, whose sales are regulated by the government, make it look 40x weaker, and call it the H800 - while conveniently leaving 8-bit computation as fast as on the H100. Then bring it to China and let companies stockpile it, with no export controls and without disclosing production or sales numbers.
Eventually, after 7 months, the US government starts noticing the H800 sales and introduces new export controls, but it's too late. By this point, DeepSeek has started researching fp8 training. They slowly build bigger and bigger models, working on bandwidth and memory consumption, until they make r1 - their reasoning model.
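The memory pressure is easy to ballpark. A back-of-the-envelope sketch, assuming a model around DeepSeek-V3's reported 671B total parameters and counting the weights alone (ignoring activations, gradients, optimizer state, and KV caches):

```python
params = 671e9  # approx. DeepSeek-V3 total parameter count (assumption for illustration)

for name, bytes_per_param in [("fp32", 4), ("fp16/bf16", 2), ("fp8", 1)]:
    gib = params * bytes_per_param / 2**30
    h800s = gib / 80  # each H800 carries 80 GB of HBM
    print(f"{name}: {gib:,.0f} GiB of weights (~{h800s:.0f} H800s just to hold them)")
```

Halving the bytes per parameter halves the number of GPUs needed just to hold the weights, before any of the bandwidth savings during training even enter the picture - which is why fp8 was worth so much engineering effort to them.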