Roxxik's comments
Roxxik | 2 days ago | on: Flash-MoE: Running a 397B Parameter Model on a Laptop
Roxxik | 8 days ago | on: How I write software with LLMs
I often use AI successfully, but in a few cases it went badly. Those were the cases where I didn't even know the end goal and repeatedly switched the fundamental assumptions that the LLM was trying to build on.
One case was a simulation where I wanted to see some specific property in the convergence behavior, but I had no idea how it would get there in the dynamics of the simulation or how it should behave when perturbed.
So the LLM tried many fundamentally different approaches, and whenever something specifically did not work, it immediately switched approaches.
Next time I get to work on this (toy) problem, I will let it implement some of them, fully parametrize them, and have a go with them myself. There is a concrete goal, and I can play around myself to see if my specific convergence criterion is even possible.
Roxxik | 5 months ago | on: The Weird Concept of Branchless Programming
https://gist.github.com/Stefan-JLU/3925c6a73836ce841860b55c8...
Roxxik | 1 year ago | on: Catalytic computing taps the full power of a full hard drive
So the data might be incompressible and thus compressing it and restoring it afterwards would not work.
Edit: From the paper:
> One natural approach is to compress the data on the hard disk as much as possible, use the freed-up space for your computation and finally uncompress the data, restoring it to its original setting. But suppose that the data is not compressible. In other words, your scheme has to always work no matter the contents of the hard drive. Can you still make good use of this additional space?
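The incompressibility point is easy to demonstrate: random bytes stand in for incompressible data, and deflate cannot shrink them, so the "compress, compute, uncompress" trick frees no space. A minimal sketch (the 4096-byte size is an arbitrary choice):

```python
import os
import zlib

# Random bytes model incompressible data: deflate cannot shrink them,
# so "compress to free up working space" yields no usable room.
data = os.urandom(4096)
packed = zlib.compress(data, level=9)

assert zlib.decompress(packed) == data   # restoring always works...
assert len(packed) >= len(data)          # ...but no space was freed
```

This is why the paper's question is interesting: the scheme has to work for every possible disk content, including the incompressible worst case.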
Roxxik | 1 year ago | on: Catalytic computing taps the full power of a full hard drive
It is called catalytic because the machine couldn't do the computation in the amount of clean space it has, but can do it by temporarily mutating auxiliary space and then restoring it.
What I haven't yet figured out is how to do reversible instructions on auxiliary space. You can mutate a value depending on your input, but how do you then use that value? You can't assume anything about the contents of the auxiliary space, and just overwriting it with a constant (e.g. 0) is not reversible.
Maybe there is some XOR-like trick where you can store two values in the same space and restore both, as long as you know one of them.
Edit: After delving into the paper linked in another comment, which is rather mathy (or computer-sciency in the original meaning of the phrase), I'd like to see a simple example of a program that cannot run in its amount of free space and actually needs to utilize the auxiliary space.
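There is indeed an XOR trick of that kind: XOR-ing your input into an auxiliary cell is self-inverse, so it is reversible no matter what the cell held before. A minimal sketch (names and values are my own illustration):

```python
# Reversible mutation of auxiliary space via XOR: applying the same
# XOR twice restores the cell, regardless of its unknown prior contents.
def xor_into(aux, i, value):
    """XOR `value` into auxiliary cell i; self-inverse."""
    aux[i] ^= value

aux = [0b1011]           # arbitrary, unknown prior contents
original = aux[0]

xor_into(aux, 0, 42)     # aux[0] now holds original ^ 42
# ... a computation may read aux[0] here, as long as it doesn't
#     depend on recovering `original` and 42 separately ...
xor_into(aux, 0, 42)     # same XOR again

assert aux[0] == original  # auxiliary space restored exactly
```

The catch is exactly the one raised above: the intermediate cell holds `original ^ 42`, which is only useful to a computation that can tolerate the unknown `original` mixed in.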
Roxxik | 1 year ago | on: The Power of Small Brain Networks
What even is an artificial neuron in an Artificial Neural Network executed on "normal" (non-neuromorphic) hardware? It is a set of weights and an activation function.
And you evaluate all neurons of a layer at the same time by multiplying their weights in a matrix by the incoming activations in a vector. Then you apply the activation function to get the outgoing activations.
Viewing this from a hardware perspective, there are no individual neurons, just matrix multiplications followed by activation functions.
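A minimal sketch of that view (the shapes, random weights, and ReLU are my own illustrative choices, not from any particular network):

```python
import numpy as np

# One ANN layer on conventional hardware: no individual "neurons",
# just a weight matrix applied to the incoming activation vector,
# followed by an elementwise activation function.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))   # 4 "neurons", each a row of 3 weights
b = np.zeros(4)                   # biases
x = rng.standard_normal(3)        # incoming activations

relu = lambda z: np.maximum(z, 0.0)
y = relu(W @ x + b)               # all 4 "neurons" evaluated at once

assert y.shape == (4,)
```

Each row of `W` is what we call a neuron, but the hardware only ever sees the matrix-vector product.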
I'm going out of my area of expertise here (I just started studying bioinformatics), but biological neurons can't simply hold an activation, because they communicate by depolarizing their membrane. So they have to be spiking by their very nature of being cells.
This depolarization costs a lot of energy, so they are incentivized to do more with fewer activations.
Computer hardware doesn't have a membrane and thus can hold activations, it doesn't need spiking and these activations cost very little on their own.
So I'm not sure what we stand to gain from more complicated artificial neurons.
On the other hand, artificial neural networks do need a lot of memory bandwidth to load in these weights. So an approach that better integrates storage and execution might help, whether that is memristor tech or something else.
Roxxik | 2 years ago | on: Ask HN: What's your seniority, location and notice period?
I started as a junior IT consultant in Hamburg, Germany with a 3-month notice period, but could only quit every half year. So for quitting at the end of June I had to give notice in March, and the next opportunity after that was giving notice in September to quit at the end of December.
But since I'm going back to university now, we made an "Aufhebungsvertrag" (mutual termination agreement) so that I could quit at the end of September. You have no legal right to one, but I have a good relationship with my employer, so they accepted.
EDITED for clarity
Roxxik | 7 years ago | on: So you want to be a wizard
Disclaimer: I'm from Germany
Roxxik | 7 years ago | on: LIDL cancels SAP introduction after spending 500M Euro
Lidl is wasting 500 million euros
Several years ago, Deutsche Post had to book a heavy loss because the introduction of a new data system did not work out. The same thing has now happened to Lidl. After seven years and costs of more than half a billion euros, the planned system is still not running smoothly. Now the discounter has pulled the ripcord.
Lidl has been on an expansion course for years. The discounter from Neckarsulm now has branches in almost every country in Europe and is also growing in the USA. A new merchandise management system was needed to keep track of the increasingly complex business processes and to control branches, purchasing and logistics. Hence the decision in 2011.
System is not good for high-turnover countries
Software from the Walldorf-based software company SAP was to be adapted to Lidl's needs. So far, however, the new system has only been introduced in a few small subsidiaries in Austria, Northern Ireland and the USA. It turned out that the SAP version, developed by over one hundred IT specialists, is not suitable for high-turnover countries. Now Lidl has stopped the project. In a letter to employees obtained by the newspaper "Heilbronner Stimme", it is said that the actual "goals" are "not reachable with justifiable effort". According to expert estimates, the project has so far consumed more than half a billion euros, on expensive IT consultants and SAP licenses, for example. Now Lidl wants to further develop its own merchandise management system instead.
Roxxik | 8 years ago | on: FreeBSD 11.1 released
Roxxik | 8 years ago | on: Ethereum from scratch – Part 1: Ping
https://github.com/Roxxik/ping_ether/blob/master/src/main.rs
Roxxik | 8 years ago | on: Ethereum from scratch – Part 1: Ping
"bytes address; // BE encoded 4-byte or 16-byte address (size determines ipv4 vs ipv6)"[0]
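That quoted rule can be sketched directly: the byte length of the field alone distinguishes IPv4 from IPv6. A minimal sketch using Python's `ipaddress` module (the function name `decode_address` is my own, not from the Ethereum spec):

```python
import ipaddress

# The quoted rule: a 4-byte field is an IPv4 address,
# a 16-byte field is an IPv6 address; nothing else is valid.
def decode_address(raw: bytes):
    if len(raw) == 4:
        return ipaddress.IPv4Address(raw)   # accepts 4 packed bytes
    if len(raw) == 16:
        return ipaddress.IPv6Address(raw)   # accepts 16 packed bytes
    raise ValueError(f"unexpected address length: {len(raw)}")

assert str(decode_address(bytes([127, 0, 0, 1]))) == "127.0.0.1"
assert str(decode_address(bytes(15) + b"\x01")) == "::1"
```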
Roxxik | 8 years ago | on: Ethereum from scratch – Part 1: Ping
Looking into Ethereum-related stuff, I often get the feeling that one already needs to know half of Ethereum quite well to learn something new, which makes it hard to even get started.
Reading your guide (I'm not finished yet) gives me a better picture of the things I actually need to do to start discovering some nodes, so I'm really thankful for that and hope for more beginner-targeted Ethereum content.
Roxxik | 9 years ago | on: DTrace for Linux 2016
Roxxik | 9 years ago | on: Packet Capturing MySQL with Rust
And adding the functionality for a pcap-like file format doesn't seem that difficult.
The filters are a major pain point. I don't know how libpcap handles this internally, but at least it says it won't copy packets from kernel space to userspace that don't match, thus avoiding a lot of overhead. Maybe it's possible to introduce some Rusty kind of filtering in libpnet, too.
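The idea of filter-before-copy can be sketched in a language-neutral way: a predicate runs on each packet first, and only matching packets are copied onward (everything here is my own toy illustration, not libpcap's or libpnet's actual API):

```python
# Toy sketch of predicate-based filtering: non-matching packets are
# dropped before any copy, which is the overhead libpcap avoids by
# filtering in kernel space.
def capture(packets, matches):
    for pkt in packets:
        if matches(pkt):       # filter runs first...
            yield bytes(pkt)   # ...copy only happens for matches

pkts = [b"\x45\x00udp", b"\x45\x06tcp", b"\x45\x11udp2"]
tcp_only = lambda p: p[1] == 0x06   # toy check on a "protocol" byte
assert list(capture(pkts, tcp_only)) == [b"\x45\x06tcp"]
```

In real libpcap the predicate is a compiled BPF program evaluated in the kernel; a Rusty equivalent would presumably be a closure or trait object evaluated before the userspace copy.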
Going to log into Github now and see if I can do something.
EDIT: fixed spelling
Roxxik | 9 years ago | on: Packet Capturing MySQL with Rust
Wouldn't it be better to use some kind of privilege separation? I think there is a reason Wireshark does this... And even though Rust is a safe language, that won't save you from programming errors; it just makes them more difficult.
Outside of that the SSD is idling.
Table 3 shows, for K=4 experts, an IO of 943 MB/token at 3.15 tokens/s, giving an average IO of about 2970 MB/s, far below what the SSD could do.
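The arithmetic checks out (numbers taken from the comment above):

```python
# Average IO bandwidth implied by Table 3's numbers for K = 4 experts.
mb_per_token = 943       # MB of expert weights read per generated token
tokens_per_s = 3.15      # decode speed
avg_io = mb_per_token * tokens_per_s

assert round(avg_io) == 2970   # MB/s, well below NVMe sequential read speed
```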
I'm not sure, but not all expert weights are used immediately. Maybe they could issue async reads for the down tensors, parallelizing compute with IO.
I'm not sure if this works on a Mac; I only tested my larger-than-RAM setup on Linux with io_uring O_DIRECT reads, and I saw that about 20% of total reads finish while my fused up/gate matmul is already running.
Edit: Typos