gpuhacker's comments

gpuhacker | 1 year ago | on: Pretty.c

Reminds me of a C++ codebase I once had to inspect that was entirely written as if it were written in Java. With camelcase naming for everything, getters and setters for every class variable, interfaces everywhere.

gpuhacker | 1 year ago | on: Scuda – Virtual GPU over IP

As this mentions some prior art but not rCUDA (https://en.m.wikipedia.org/wiki/RCUDA) I'm a bit confused about what makes scuda different.

gpuhacker | 1 year ago | on: The Deadlock Empire: An Interactive Guide to Locks

It also assumes all reads and writes are volatile. In the real world, threads can witness out-of-order execution in different threads.

gpuhacker | 1 year ago | on: CPU Energy Meter: A tool for measuring energy consumption of Intel CPUs

In my opinion, Astron's PMT (Power Measurement Toolkit) is a much more useful tool than this, because it abstracts over Intel, AMD, and Nvidia (including Jetson): https://git.astron.nl/RD/pmt

There is also a paper about PMT: https://arxiv.org/pdf/2210.03724

gpuhacker | 1 year ago | on: CPU Energy Meter: A tool for measuring energy consumption of Intel CPUs

For GPUs: https://arxiv.org/pdf/2211.07260

gpuhacker | 1 year ago | on: Tesla slashes its summer internship program

All of the above

gpuhacker | 1 year ago | on: Peter principle

From the wiki page:

"In 2018, professors Alan Benson, Danielle Li, and Kelly Shue analyzed sales workers' performance and promotion practices at 214 American businesses to test the veracity of the Peter principle. They found that these companies tended to promote employees to a management position based on their performance in their previous position, rather than based on managerial potential. Consistent with the Peter principle, the researchers found that high performing sales employees were likelier to be promoted, and that they were likelier to perform poorly as managers, leading to considerable costs to the businesses.[15][16][2]"

gpuhacker | 2 years ago | on: TextSnatcher: Copy text from images, for the Linux Desktop

Surprises me to see I'm the first comment here to say: I just use GPT4 for this. Works perfectly, even for getting the Latex source of a formula you only have a screenshot of.

Probably quite the overkill in terms of energy efficiency for just image to text, but I only need this like once every two weeks or so.

gpuhacker | 2 years ago | on: California debuts 'turbo roundabout' to fix intersection near Bay Area

In the city where I live, also in the Netherlands, we had multiple of these but all of them got converted back to normal intersections with traffic lights and it has been a huge improvement.

gpuhacker | 2 years ago | on: High school student allegedly uses device to turn off nearby iPhones

Brings back high school memories when we used to shutdown peoples mIRC client with a special message that you could send as a personal message to them.

I don't know it anymore but I remember it started with a whole bunch of aaaaaaa, but also included other characters. The beauty is the attack left no trace at all, so they never knew what hit them.

gpuhacker | 2 years ago | on: Ask HN: Why aren't all heaters computers?

There was a company in the Netherlands, can't seem to find the name right now, that rented out GPU clusters as central heaters, while using the GPUs to mine crypto. I believe they went backrupt during the whole crypto crash and energy crisis.

gpuhacker | 2 years ago | on: Zombie fly fungus lures healthy male flies to mate with female corpses (2022)

I think they refer to Lich. From wikipedia: In fantasy fiction, a lich (/ˈlɪtʃ/;[1] from the Old English līċ, meaning "corpse") is a type of undead creature. https://en.m.wikipedia.org/wiki/Lich

gpuhacker | 2 years ago | on: AST-grep(sg) is a CLI tool for code structural search, lint, and rewriting

Does anyone happen to know of a similar tool that can compare two codes for semantic similarity?

gpuhacker | 2 years ago | on: Optimization Techniques for GPU Programming [pdf]

It's incredibly useful if you have many threads that produce a variable number of outputs. Imagine you're implementing some filtering operation on the GPU, many threads will take on a fixed workload and then produce some number of outcomes. Unless we take some precautions, we have a huge synchronization problem when all threads try to append their results to the output. Note that GPUs didn't have atomics for the first couple of generations that supported CUDA, so you couldn't just getAndIncrement an index and append to an array. We could store those outputs in a dense structure, allocating a fixed number of output slots per thread, but that would leave many blanks in between the results. Now once we know the number of outputs per thread we can use a prefix sum to let every thread know where they can write their results in the array.

The outcome of a prefix sum exactly corresponds with the "row starts" part of the CSR sparse matrix notation. So they are also essential when creating sparse matrices.

gpuhacker | 2 years ago | on: Optimization Techniques for GPU Programming [pdf]

I haven't tried WebGPU yet, is there an overall performance hit compared to direct CUDA programming?

AFAIK Thrust is intended to simplify GPU programming. It could well be that for specific use cases, in particular when it is possible to fuse multiple operations into single kernels, you could outperform Thrust.

gpuhacker | 2 years ago | on: Optimization Techniques for GPU Programming [pdf]

Ah thanks! That's good to know.

gpuhacker | 2 years ago | on: Optimization Techniques for GPU Programming [pdf]

I bought the first edition when it came out, and definitely it was a gold mine of information on the subject. I wonder though, is the fourth edition worth buying another copy? Nvidia has been advancing CUDA, in particular moving more towards C++ in the kernel language. But none of that was present when this book came out in 2007. Now more and more stuff is happening at thread block level with the cooperative group C++ API and warp level for tensor cores. It would be great if the authors revisited all the early chapters to modernize that content, but that's a lot of work so I don't usually count on authors making such an effort for later editions.

gpuhacker | 2 years ago | on: Optimization Techniques for GPU Programming [pdf]

If you want to go really in-depth I can recommend GTC on demand. It's Nvidia streaming platform with videos from past GTC conferences. Tony Scuderio had a couple of videos on there called GPU memory bootcamp that are among the best advanced GPU programming learning material out there.

gpuhacker | 2 years ago | on: When did people stop being drunk all the time?

As a home brewer myself, I'd say it's not harder to make just more expensive.

gpuhacker | 2 years ago | on: When did people stop being drunk all the time?

It does. There's an overuse of graphs and especially the lack of units and y-axis labels on some of the graphs was annoying, but overall it's still a quite interesting and entertaining read in my opinion.