Roxxik's comments
Roxxik | 2 days ago | on: Flash-MoE: Running a 397B Parameter Model on a Laptop
Roxxik | 8 days ago | on: How I write software with LLMs
I often use AI successfully, but in a few cases it went badly. Those were the cases where I didn't even know the end goal and repeatedly switched the fundamental assumptions that the LLM was trying to build on.
One case was a simulation where I wanted to see some specific property in the convergence behavior, but I had no idea how it would get there in the dynamics of the simulation or how it should behave when perturbed.
So the LLM tried many fundamentally different approaches, and whenever something specifically did not work, it immediately switched approaches.
Next time I get to work on this (toy) problem, I will let it implement some of them, fully parametrize them, and have a go with them myself. There is a concrete goal, and I can play around myself to see if my specific convergence criterion is even possible.
Roxxik | 5 months ago | on: The Weird Concept of Branchless Programming
https://gist.github.com/Stefan-JLU/3925c6a73836ce841860b55c8...
Roxxik | 1 year ago | on: Catalytic computing taps the full power of a full hard drive
So the data might be incompressible and thus compressing it and restoring it afterwards would not work.
Edit: From the paper:
> One natural approach is to compress the data on the hard disk as much as possible, use the freed-up space for your computation and finally uncompress the data, restoring it to its original setting. But suppose that the data is not compressible. In other words, your scheme has to always work no matter the contents of the hard drive. Can you still make good use of this additional space?
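The incompressibility point is easy to demonstrate: random bytes stand in for incompressible data, and deflate cannot shrink them, so the "compress, compute, uncompress" trick frees no space. A minimal sketch (the 4096-byte size is an arbitrary choice):

```python
import os
import zlib

# Random bytes model incompressible data: deflate cannot shrink them,
# so "compress to free up working space" yields no usable room.
data = os.urandom(4096)
packed = zlib.compress(data, level=9)

assert zlib.decompress(packed) == data   # restoring always works...
assert len(packed) >= len(data)          # ...but no space was freed
```

This is why the paper's question is interesting: the scheme has to work for every possible disk content, including the incompressible worst case.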
Roxxik | 1 year ago | on: Catalytic computing taps the full power of a full hard drive
It is called catalytic because the machine couldn't do the computation in the amount of clean space it has, but can do it by temporarily mutating auxiliary space and then restoring it.
What I haven't yet figured out is how to do reversible instructions on auxiliary space. You can mutate a value depending on your input, but how do you then use that value? You can't assume anything about the contents of the auxiliary space, and just overwriting it with a constant (e.g. 0) is not reversible.
Maybe there is some XOR-like trick where you can store two values in the same space and restore both, as long as you know one of them.
Edit: After delving into the paper linked in another comment, which is rather mathy (or computer-sciency in the original meaning of the phrase), I'd like to see a simple example of a program that cannot run in its amount of free space and actually needs to utilize the auxiliary space.
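There is indeed an XOR trick of that kind: XOR-ing your input into an auxiliary cell is self-inverse, so it is reversible no matter what the cell held before. A minimal sketch (names and values are my own illustration):

```python
# Reversible mutation of auxiliary space via XOR: applying the same
# XOR twice restores the cell, regardless of its unknown prior contents.
def xor_into(aux, i, value):
    """XOR `value` into auxiliary cell i; self-inverse."""
    aux[i] ^= value

aux = [0b1011]           # arbitrary, unknown prior contents
original = aux[0]

xor_into(aux, 0, 42)     # aux[0] now holds original ^ 42
# ... a computation may read aux[0] here, as long as it doesn't
#     depend on recovering `original` and 42 separately ...
xor_into(aux, 0, 42)     # same XOR again

assert aux[0] == original  # auxiliary space restored exactly
```

The catch is exactly the one raised above: the intermediate cell holds `original ^ 42`, which is only useful to a computation that can tolerate the unknown `original` mixed in.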
Roxxik | 1 year ago | on: The Power of Small Brain Networks
What even is an artificial neuron in an Artificial Neural Network executed on "normal" (non-neuromorphic) hardware? It is a set of weights and an activation function.
And you evaluate all neurons of a layer at the same time by multiplying their weights in a matrix by the incoming activations in a vector. Then you apply the activation function to get the outgoing activations.
Viewing this from a hardware perspective, there are no individual neurons, just matrix multiplications followed by activation functions.
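A minimal sketch of that view (the shapes, random weights, and ReLU are my own illustrative choices, not from any particular network):

```python
import numpy as np

# One ANN layer on conventional hardware: no individual "neurons",
# just a weight matrix applied to the incoming activation vector,
# followed by an elementwise activation function.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))   # 4 "neurons", each a row of 3 weights
b = np.zeros(4)                   # biases
x = rng.standard_normal(3)        # incoming activations

relu = lambda z: np.maximum(z, 0.0)
y = relu(W @ x + b)               # all 4 "neurons" evaluated at once

assert y.shape == (4,)
```

Each row of `W` is what we call a neuron, but the hardware only ever sees the matrix-vector product.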
I'm going out of my area of expertise here (I just started studying bioinformatics), but biological neurons can't simply hold an activation, because they communicate by depolarizing their membrane. So they have to be spiking by their very nature of being cells.
This depolarization costs a lot of energy, so they are incentivized to do more with fewer activations.
Computer hardware doesn't have a membrane and thus can hold activations, it doesn't need spiking and these activations cost very little on their own.
So I'm not sure what we stand to gain from more complicated artificial neurons.
On the other hand, artificial neural networks do need a lot of memory bandwidth to load in these weights. So an approach that better integrates storage and execution might help, whether that is memristor tech or something else.
Roxxik | 2 years ago | on: Ask HN: What's your seniority, location and notice period?
I started as a junior IT consultant in Hamburg, Germany with a 3-month notice period, but could only quit every half year. So for quitting at the end of June I had to give notice in March, and the next opportunity after that was giving notice in September to quit at the end of December.
But since I'm going back to university now, we made an "Aufhebungsvertrag" (mutual termination agreement) so that I could quit at the end of September. You have no legal right to one, but I have a good relationship with my employer, so they accepted.
EDITED for clarity
Roxxik | 7 years ago | on: So you want to be a wizard
Disclaimer: I'm from Germany
Roxxik | 7 years ago | on: LIDL cancels SAP introduction after spending 500M Euro
Lidl is wasting 500 million euros
Several years ago, Deutsche Post had to book a heavy loss because the introduction of a new data system did not work out. The same thing has now happened to Lidl. After seven years and costs of more than half a billion euros, the planned system is still not running smoothly. Now the discounter has pulled the ripcord.
Lidl has been on an expansion course for years. The discounter from Neckarsulm now has branches in almost every country in Europe and is also growing in the USA. A new merchandise management system was needed to keep track of the increasingly complex business processes and to control branches, purchasing and logistics. Hence the decision in 2011.
System is not good for high-turnover countries
Software from the Walldorf-based software company SAP was to be adapted to Lidl's needs. So far, however, the new system has only been introduced in a few small subsidiaries in Austria, Northern Ireland and the USA. It turned out that the SAP version, developed by over one hundred IT specialists, is not suitable for high-turnover countries. Now Lidl has stopped the project. In a letter to employees obtained by the newspaper "Heilbronner Stimme", it is said that the actual "goals" are "not reachable with justifiable effort". According to expert estimates, the project has so far consumed more than half a billion euros, on expensive IT consultants and SAP licenses, for example. Now Lidl wants to further develop its own merchandise management system instead.
Roxxik | 8 years ago | on: FreeBSD 11.1 released
Roxxik | 8 years ago | on: Ethereum from scratch – Part 1: Ping
https://github.com/Roxxik/ping_ether/blob/master/src/main.rs
Roxxik | 8 years ago | on: Ethereum from scratch – Part 1: Ping
"bytes address; // BE encoded 4-byte or 16-byte address (size determines ipv4 vs ipv6)"[0]
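That quoted rule can be sketched directly: the byte length of the field alone distinguishes IPv4 from IPv6. A minimal sketch using Python's `ipaddress` module (the function name `decode_address` is my own, not from the Ethereum spec):

```python
import ipaddress

# The quoted rule: a 4-byte field is an IPv4 address,
# a 16-byte field is an IPv6 address; nothing else is valid.
def decode_address(raw: bytes):
    if len(raw) == 4:
        return ipaddress.IPv4Address(raw)   # accepts 4 packed bytes
    if len(raw) == 16:
        return ipaddress.IPv6Address(raw)   # accepts 16 packed bytes
    raise ValueError(f"unexpected address length: {len(raw)}")

assert str(decode_address(bytes([127, 0, 0, 1]))) == "127.0.0.1"
assert str(decode_address(bytes(15) + b"\x01")) == "::1"
```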
Roxxik | 8 years ago | on: Ethereum from scratch – Part 1: Ping
Looking into Ethereum-related stuff, I often get the feeling that one already needs to know half of Ethereum quite well to learn something new, which makes it hard to even get started.
Reading your guide (I'm not finished yet) gives me a better picture of the things I actually need to do to start discovering some nodes, so I'm really thankful for that and hope for more beginner-targeted Ethereum content.
Roxxik | 9 years ago | on: DTrace for Linux 2016
Roxxik | 9 years ago | on: Packet Capturing MySQL with Rust
And adding the functionality for a pcap-like file format doesn't seem that difficult.
The filters are a major pain point. I don't know how libpcap handles this internally, but at least it says it won't copy packets from kernel space to userspace that don't match, thus avoiding a lot of overhead. Maybe it's possible to introduce some Rusty kind of filtering in libpnet, too.
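The idea of filter-before-copy can be sketched in a language-neutral way: a predicate runs on each packet first, and only matching packets are copied onward (everything here is my own toy illustration, not libpcap's or libpnet's actual API):

```python
# Toy sketch of predicate-based filtering: non-matching packets are
# dropped before any copy, which is the overhead libpcap avoids by
# filtering in kernel space.
def capture(packets, matches):
    for pkt in packets:
        if matches(pkt):       # filter runs first...
            yield bytes(pkt)   # ...copy only happens for matches

pkts = [b"\x45\x00udp", b"\x45\x06tcp", b"\x45\x11udp2"]
tcp_only = lambda p: p[1] == 0x06   # toy check on a "protocol" byte
assert list(capture(pkts, tcp_only)) == [b"\x45\x06tcp"]
```

In real libpcap the predicate is a compiled BPF program evaluated in the kernel; a Rusty equivalent would presumably be a closure or trait object evaluated before the userspace copy.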
Going to log into Github now and see if I can do something.
EDIT: fixed spelling
Roxxik | 9 years ago | on: Packet Capturing MySQL with Rust
Wouldn't it be better to use some kind of privilege separation? I think there is a reason Wireshark does this... And even though Rust is a safe language, that won't save you from programming errors; it just makes them more difficult.
Outside of that the SSD is idling.
Table 3 shows, for K=4 experts, an IO of 943 MB/token at 3.15 tokens/s, giving an average IO of about 2970 MB/s, far below what the SSD could do.
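The arithmetic checks out (numbers taken from the comment above):

```python
# Average IO bandwidth implied by Table 3's numbers for K = 4 experts.
mb_per_token = 943       # MB of expert weights read per generated token
tokens_per_s = 3.15      # decode speed
avg_io = mb_per_token * tokens_per_s

assert round(avg_io) == 2970   # MB/s, well below NVMe sequential read speed
```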
I'm not sure, but not all expert weights are used immediately. Maybe they could issue async reads for the down tensors, parallelizing compute with IO.
I'm not sure if this works on a Mac; I only tested my larger-than-RAM setup on Linux with io_uring O_DIRECT reads, and I saw that about 20% of total reads finish while my fused up/gate matmul is already running.
Edit: Typos