rd11235's comments

rd11235 | 9 months ago | on: Gaussian integration is cool

Yeah - my guess is this was just a very roundabout solution for setting axis limits.

(For some reason, plt.bar was used instead of plt.plot, so the y axis would start at 0 by default, making all results look the same. But when the log scale is applied, the lower y limit becomes the data’s minimum. So, because the dynamic range is so low, the end result is visually identical to having just set y limits using the original linear scale).
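The original code isn't shown, so here is a minimal sketch of that effect (using the three values quoted below as stand-in data):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt

vals = [2.0000, 1.9671, 1.9998]  # stand-in data with low dynamic range

fig, ax = plt.subplots()
ax.bar(range(len(vals)), vals)
fig.canvas.draw()
lin_lo = ax.get_ylim()[0]  # bars have a "sticky" bottom edge, so this is 0

ax.set_yscale("log")
fig.canvas.draw()
log_lo = ax.get_ylim()[0]  # recomputed from the smallest positive data value

print(lin_lo, log_lo)
```

With the linear scale the bars force the axis to start at 0 and the three results look identical; the log scale snaps the lower limit up to the data, which is the same visual effect as manually setting tight y limits.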

Anyhow, for anyone interested, the values for those 3 points are 2.0000 (exact), 1.9671 (trapezoid), and 1.9998 (Gaussian). The relative errors are 1.6% vs. 0.01%.
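The integrand itself isn't stated here, so as a stand-in, here's the same 3-point comparison on the classic ∫₀^π sin(x) dx = 2 (the exact numbers differ from the ones above, but the gap between the two rules is just as dramatic):

```python
import numpy as np

f, a, b = np.sin, 0.0, np.pi  # stand-in integrand; exact integral is 2

# 3-point trapezoid rule
x = np.linspace(a, b, 3)
fx = f(x)
trap = np.sum((fx[:-1] + fx[1:]) / 2 * np.diff(x))  # ~1.5708

# 3-point Gauss-Legendre, nodes/weights mapped from [-1, 1] to [a, b]
nodes, weights = np.polynomial.legendre.leggauss(3)
gauss = (b - a) / 2 * np.sum(weights * f((b - a) / 2 * nodes + (b + a) / 2))  # ~2.0014

print(trap, gauss)
```

Same number of function evaluations, yet the Gaussian result is orders of magnitude closer, because the nodes and weights are chosen to integrate polynomials up to degree 2n-1 exactly.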

rd11235 | 11 months ago | on: Samsung Q990D unresponsive after 1020 firmware update

Good motivation for a PSA:

This happens more and more often, and there is a fairly easy + popular workaround (which also comes with 99% ad blocking as a bonus). Just either set up pi-hole locally OR use a hosted DNS service that does essentially the same thing.

Main idea: Ads, updates, etc. typically (not always) need to resolve hosts before connecting to servers. Simply resolve these hosts to 0.0.0.0 instead of a real IP.
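Concretely, the blocklist ends up looking like hosts-file entries (the hostnames here are made up for illustration; you find your device's real ones by watching the query log, as described below):

```
# Blocklist entries: resolve update/telemetry hosts to 0.0.0.0
# (hostnames are hypothetical examples, not real vendor servers)
0.0.0.0  fw-update.vendor.example.com
0.0.0.0  ota.vendor.example.com
0.0.0.0  telemetry.vendor.example.com
```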

Arguments for pi-hole or other local solution: Free. Private.

Arguments for hosted solution: No set-up headache, no local raspberry pi or other machine to maintain. Overall a bit simpler.

Guide for blocking updates after the service is set up (I just went through this a month or two ago to block updates to my LG TV):

Step 1: Search around for servers that correspond to updates for your device.

Step 2: Test these lists; realize that they are often incomplete.

Step 3: Shut your device off. Open your pi-hole-like service and watch the queries live. While doing so, turn your device on (and, if you have the option, check for updates).

Step 4: Put all of the queried hosts you see into your block list.

Step 5: Later, you may encounter broken functionality. When this happens, look at your logs, and see which server(s) were blocked at that moment. Remove only those from the blocklist. (And cross your fingers that the manufacturer doesn't use the same hosts for typical functionality and updates.)

rd11235 | 1 year ago | on: A year of uv: pros, cons, and should you migrate

You didn’t mention an important point: speed.

Even supposing conda had projects, it is still somewhat incredible to see uv resolve and install in 2 seconds what takes conda 10 minutes. It immediately made me want to replace conda with uv wherever possible.

(I have actively used conda for years, and don’t see myself stopping entirely because of its non-Python package support, but I do see myself switching primarily to uv.)

rd11235 | 1 year ago | on: Minimum effective dose

> The meticulous PT format and exercise selection allows them to achieve more muscle gain in 20 minutes per week than median trainees achieve in 2-3 hours.

This is a strong statement that is presented without evidence, and which conflicts with scientific consensus.

Studies show again and again that the most important factors are consistency and training volume.

If the program is more effective, which is questionable in itself, then the reasons are likely 1. reduction of burnout and/or 2. much more intense training. Not PT format or exercise selection.

rd11235 | 1 year ago | on: Generate audiobooks from E-books with Kokoro-82M

I agree but the opposite can be true too. Sometimes the narrator seems to target some general audience that doesn’t fit me at all, in a way that makes me cringe when I listen, until I stop listening altogether. In these cases I’d rather listen to a relatively flat narration from a tool like this.

rd11235 | 1 year ago | on: Fructose in diet enhances tumor growth: research

An obvious question that isn’t answered (in this article, at least; not sure about the paper itself) is whether feeding fructose results in MORE tumor growth than feeding glucose (or other sources of calories).

Without knowing this, it doesn’t make any sense to assume there is anything inherently bad about fructose, beyond the mechanistic arguments mentioned in the article (which are weak if not backed up by empirical evidence).

rd11235 | 1 year ago | on: Francois Chollet is leaving Google

> it was just the wrong abstraction: too easy to start with, too difficult to create custom things

Couldn’t agree with this more. I was working on custom RNN variants at the time, and for that, Keras was handcuffs. Even raw TensorFlow was better for that purpose (which in turn still felt a bit like handcuffs after PyTorch was released).

rd11235 | 2 years ago | on: Understanding Automatic Differentiation in 30 lines of Python

But the chain rule for ordered derivatives is exactly the backprop rule. It's just the mathematical representation of 'the simple implementation' I mentioned.

I think what you're saying is that you find the process intuitive. I don't have much of a way to argue with that. But I think it's important to note that we're dealing with two things: 1. a process that we follow (backprop), 2. a true answer that is obtainable using only the chain rule. And yes it turns out that (1) and (2) both give the same answer. But (2) requires much more work, and I question anyone who claims that (1) is 'obvious' from (2): getting (1) from (2) requires work.

I'm guessing you'll agree that using only the chain rule takes much more work, but in case you don't: consider a fully connected graph with at least 5 variables, say a = 5; b = 2 a; c = 2 a b; d = 2 a b c; e = 2 a b c d. If you use backprop, you can compute de/da rapidly. If you use only the chain rule, it will take a long time to compute de/da, because the number of terms you have to deal with increases exponentially fast with the number of variables.
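To make that concrete, here is the five-variable example above with the backprop (adjoint) pass written out by hand; substituting forward gives e = 256 a^8, so de/da = 2048 a^7 serves as an independent check:

```python
# Forward pass for the fully connected DAG:
# a = 5; b = 2a; c = 2ab; d = 2abc; e = 2abcd
a = 5.0
b = 2 * a
c = 2 * a * b
d = 2 * a * b * c
e = 2 * a * b * c * d

# Reverse pass: walk the variables backwards, accumulating adjoints
# (ordered derivatives). Each adjoint is the DIRECT partial of e plus
# the contributions routed through every later variable.
e_bar = 1.0
d_bar = e_bar * (2 * a * b * c)                                  # ∂e/∂d
c_bar = e_bar * (2 * a * b * d) + d_bar * (2 * a * b)
b_bar = e_bar * (2 * a * c * d) + c_bar * (2 * a) + d_bar * (2 * a * c)
a_bar = (e_bar * (2 * b * c * d) + b_bar * 2
         + c_bar * (2 * b) + d_bar * (2 * b * c))                # de/da

print(a_bar, 2048 * a**7)  # the two agree
```

Each adjoint is reused by every earlier variable, so the reverse pass costs roughly one backward sweep; expanding the chain rule symbolically instead enumerates every path through the DAG, and the number of paths blows up with the number of variables.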

rd11235 | 2 years ago | on: Understanding Automatic Differentiation in 30 lines of Python

I don't have a great answer. Most modern descriptions are shallow and/or unclear. My favorite discussions were actually in Werbos's original papers.

A nice overview is "Backpropagation through time: what it does and how to do it" (1990). The rule itself is stated very clearly there, but without proof. The proof can be found in "Maximizing long-term gas industry profits in two minutes in Lotus using neural network methods" (1989), which I believe was copied over from his earlier thesis (I could never find a copy of the thesis itself).

rd11235 | 2 years ago | on: Understanding Automatic Differentiation in 30 lines of Python

> chain rule is defined for partial derivatives

I agree. That's what I'm referring to as 'the ordinary chain rule'.

> so it's still technically just chain rule

No. Go try to derive backprop for general DAGs using only the chain rule. If you complete the proof, then you will agree that the proof was more elaborate than you ever expected.

rd11235 | 2 years ago | on: Understanding Automatic Differentiation in 30 lines of Python

Anyone who believes that this completes their understanding of automatic differentiation is tricking themselves.

When your graph is a TREE, then everything is very simple, as in this post.

When your graph is instead a more general directed acyclic graph (e.g., x = 5; y = 2x; z = xy), then the IMPLEMENTATION is still very simple, but understanding WHY that implementation works is not as simple (repeat: if you think it’s ‘just the ordinary chain rule’, you are tricking yourself).
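For that tiny DAG, the subtlety is that x’s derivative needs two contributions, one direct and one routed through y; a plain-Python sketch:

```python
# The small DAG: x = 5; y = 2x; z = x * y
x = 5.0
y = 2 * x
z = x * y

# Reverse pass: x's adjoint sums its direct path into z
# AND the path through y.
z_bar = 1.0
y_bar = z_bar * x               # ∂z/∂y
x_bar = z_bar * y + y_bar * 2   # direct term + contribution via y

print(x_bar, 4 * x)  # closed form: z = 2x^2, so dz/dx = 4x
```

Getting that "sum over all later variables" accumulation right, and proving it correct for an arbitrary DAG, is exactly what the chain rule for ordered derivatives formalizes.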

One of the earliest descriptions of this was by Paul Werbos. He called the required rule “the chain rule for ordered derivatives”, which he proved by induction from the ordinary chain rule. But it is nevertheless not immediately evident from the ordinary chain rule.

I welcome anyone who believes otherwise to prove me wrong. If you do I will be very happy.

rd11235 | 2 years ago | on: How to stop the “login with Google” pop up window?

I’m surprised no one has mentioned AdGuard.

In its settings, blocking “Annoyances” is off by default. Toggling it on blocks the Google login banner, along with the “Chrome is the best!!11” banner and others.

rd11235 | 2 years ago | on: PyTorch for WebGPU

It seems that many agree with this. At the risk of getting downvoted I want to share an opposing opinion:

This way of thinking is not just unhelpful but even harmful. If one would often benefit from these checks while coding, then they should not be relying on a type checker. They should be thinking more, and writing comments is a great way to do that.

This is especially true because many operations on ndarrays / tensors can yield perfectly valid shapes with completely unintended consequences. When comments are written reasonably well they help avoid these difficult-to-debug, correct-output-shape-but-unintended-result mistakes. Not to mention the additional clear benefit of helping one quickly re-understand the tensor manipulations when coming back to the code weeks or months later.
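A small illustration of that correct-shape-but-unintended-result trap (array names and shapes are hypothetical; numpy used for concreteness):

```python
import numpy as np

x = np.arange(2 * 3 * 4).reshape(2, 3, 4)  # (batch, time, features)

# Goal: reorder to (time, batch, features).
wrong = x.reshape(3, 2, 4)      # right shape, but the data is scrambled
right = x.transpose(1, 0, 2)    # (time, batch, features), data intact

print(wrong.shape == right.shape)    # True: shapes match...
print(np.array_equal(wrong, right))  # False: ...but the values don't
```

A shape checker happily accepts both versions; a one-line comment stating the intended axis order is what catches the bug.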

And more generally, if one can get into the habit of writing these comments before the code, it can help push them away from the write-quickly-now-debug-later mentality. I have seen this bite folks many times, both while teaching undergrad and grad courses and while working at large tech companies.

rd11235 | 3 years ago | on: Less gym time, same results: Why ‘lowering’ weights is all you need to do

From the actual paper:

> Methods Non-resistance-trained young adults were assigned to one of the four groups: CON-ECC (n= 14), CON (n=14) and ECC (n= 14) training groups, and a control group (n=11) that had measurements only.

Not sure why this is getting so much attention. Evidence-wise it’s only a hair stronger than the anecdotal stories and advice in the top comments.

rd11235 | 4 years ago | on: Willingness to look stupid

Agree. To put it in the context of the article: if the author viewed getting desired outcomes as a video game, then this article is just a rant, and the time would have been better spent analyzing those situations for how to improve. The recurring failure mode is ineffective communication.