suyjuris's comments

suyjuris | 8 months ago | on: Anthropic cut up millions of used books, and downloaded 7M pirated ones – judge

Yes, of course! In this case, the judge identified three separate instances of copying: (1) downloading books without authorisation to add to their internal library, (2) scanning legitimately purchased books to add to their internal library, and (3) taking data from their internal library for the purposes of training LLMs. The purchasing part is only relevant for (2) — there the judge ruled that this is fair use. This makes a lot of sense to me, since no additional copies were created (they destroyed the physical books after scanning), so this is just a single use, as you say. The judge also ruled that (3) is fair use, but for a different reason. (They declined to decide whether (1) is fair use at this point, deferring to a later trial.)

suyjuris | 8 months ago | on: Anthropic cut up millions of used books, and downloaded 7M pirated ones – judge

Yes, that would be an interesting trial. But it is only about six books, and all claims regarding Claude have already been dismissed. So only the internal copies remain, and there the theory for them being infringing is somewhat convoluted: you have to argue that they were not just for purposes of training (which was ruled fair use), and that damages should be awarded even though these other purposes never materialised (since by now, they have legal copies of those books). I can see it, but I would not count on there being a trial.

suyjuris | 8 months ago | on: Anthropic cut up millions of used books, and downloaded 7M pirated ones – judge

The judge appears to disagree with you on this. They found that training and selling an LLM are fair use, because doing so is exceedingly transformative, and that copyright does not entitle the copyright holders to any of the resulting profits. (The authors also did get paid: Anthropic acquired millions of books legally, including books by all of the authors in this complaint. This would not, of course, retroactively absolve Anthropic of legal fault for past infringements.)

suyjuris | 8 months ago | on: Anthropic cut up millions of used books, and downloaded 7M pirated ones – judge

Just downloading them is of course cheaper, but it is worth pointing out that, as the article states, they did also buy legitimate copies of millions of books. (This includes all the books involved in the lawsuit.) Based on the judgement itself, Anthropic appears to train only on the books legitimately acquired. Used books are quite cheap, after all, and can be bought in bulk.

suyjuris | 8 months ago | on: Ask HN: Who wants to be hired? (July 2025)

  Location: Munich, Germany
  Remote: Flexible
  Willing to relocate: Yes
  Technologies: Python, C++
  Résumé/CV: https://nicze.de/philipp/cv.pdf
  Email: [email protected]
I am currently finishing a PhD in theoretical computer science and looking for a position in industry. I do a lot of programming, usually in Python and C++, with a mathematical or algorithmic focus. For example, as part of my research I developed a BDD library that uses complexity theory to guarantee correctness with small overhead. More recently, in my spare time, I created an AI for MTG (drafting, to be precise), using a transformer architecture with PyTorch. I am very enthusiastic about using or coming up with theoretical ideas that can improve outcomes in practice. I rely on strong fundamentals, from mathematics through algorithms to knowledge of the actual hardware, which enable me to easily pick up new areas and tackle novel problems. Most of my research is the result of collaborations; it is important to me to both support my colleagues and learn from them.

Please feel free to refer to my CV for more detailed information, or my website (https://nicze.de/philipp/). Currently, I am very interested in AI, but I am also happy to discuss any other interesting opportunities!

suyjuris | 1 year ago | on: Waymos crash less than human drivers

If you are caught driving above the legal limit of 0.05% blood alcohol, you are fined roughly $570, are prohibited from driving for 1 month, and receive 2 “points”. Points accumulate, and once you reach 8 you lose your driver's license. In this case you would keep the points for five years. Many different driving offences give you points.

For comparison, to get a similar penalty by speeding you would have to exceed the speed limit by 51 km/h (32 mph).

There are many additional related offences you could commit, with different consequences. Repeat offences, for example, are punished more severely: the driving ban is 3 months instead of 1, and the fine is doubled for the second offence and tripled for the third. Even at a blood alcohol level of 0.03% you risk legal consequences, e.g. if you make an error while driving. If you endanger someone else (or someone else's property) at that level, you are committing a crime, will lose your license, and can go to prison. If you are in your probationary period (the first two years after acquiring your license), any nonzero level is an offence.

Losing your license is generally temporary. You are blocked from re-acquiring it for some time, depending on the offence (at least 6 months, but it can be multiple years). You have to complete an MPU (Medizinisch-Psychologische Untersuchung, a medical-psychological assessment), which certifies your ability to drive safely. For alcohol-related offences, this would include demonstrating that you have significantly reduced your consumption. This can be quite harsh; you may, for example, be required to show complete abstinence for a period of one year. Of course, you are also looking at costs close to $1000 for the MPU alone. It is possible to get permanently blocked from driving, but it is quite difficult, I believe.

suyjuris | 1 year ago | on: 18 Months with a Framework 13

I have a Framework 13 (AMD 7640U) running Arch Linux, and overall it is nice. It is convenient to have an HDMI port again, and I did not encounter any hardware issues. (Sleep drains the battery somewhat quickly over longer periods, but for those I hibernate it anyway.)

However, the screen failed after a few months and support initially refused to replace it, citing “customer induced damage”. As far as I can tell, this is both untrue and illegal (under German law, for 12 months after purchase all defects are presumed to stem from the product's original condition, for which the seller is liable; the seller can rebut that presumption, but would need some grounds to do so). They relented eventually, but it certainly soured my opinion of both the product and the company.

suyjuris | 2 years ago | on: Great ideas in theoretical computer science

Let us say that an index i is bad if every finite segment of s starting at i is red (i.e. for every j ≥ i we have χ(s_i ... s_j) = red). Two cases:

Case 1: there are infinitely many bad indices. Here we go to the first bad index then the second, and so on. The colour of w₀ does not matter, and since subsequent words start at a bad index, they will all be red.

Case 2: there are finitely many bad indices. Then there is some k that is larger than all bad indices. We start by going to k (again, the colour of w₀ does not matter). Since k is not bad, there must be some blue word starting at k. We take that word and move to the index right after it, which is larger than k and hence again not bad. We repeat this process to find our sequence.
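The following is a minimal Python sketch of the two constructions on a finite window. It assumes a hypothetical colouring `chi` that maps a segment (a list of integers) to "red" or "blue". On a finite window, "bad" can only be checked up to the end of the window, so the sketch illustrates how each case cuts the sequence into words. It does not prove anything about infinite sequences. All function names are illustrative.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Colouring = Callable[[Sequence[int]], str]  # hypothetical: returns "red" or "blue"

def bad_indices(s: Sequence[int], chi: Colouring) -> List[int]:
    # Within the finite window s, index i counts as bad if every segment s[i..j] is red.
    return [i for i in range(len(s))
            if all(chi(s[i:j + 1]) == "red" for j in range(i, len(s)))]

def case1_split(s: Sequence[int], chi: Colouring) -> Optional[Tuple[Sequence[int], List[Sequence[int]]]]:
    # Case 1: cut at every bad index; each word after w0 starts at a bad index, so it is red.
    bad = bad_indices(s, chi)
    if not bad:
        return None
    cuts = bad + [len(s)]
    words = [s[cuts[t]:cuts[t + 1]] for t in range(len(bad))]
    return s[:bad[0]], words

def case2_split(s: Sequence[int], chi: Colouring) -> Tuple[Sequence[int], List[Sequence[int]]]:
    # Case 2: start at k, past every bad index, then repeatedly take a blue word.
    bad = bad_indices(s, chi)
    k = bad[-1] + 1 if bad else 0
    words, i = [], k
    while i < len(s):
        # i is larger than every bad index, so i is not bad: some segment starting at i is blue.
        j = next(j for j in range(i, len(s)) if chi(s[i:j + 1]) == "blue")
        words.append(s[i:j + 1])
        i = j + 1
    return s[:k], words

def parity_colour(w: Sequence[int]) -> str:
    # Hypothetical colouring for the demo: red iff the segment sum is even.
    return "red" if sum(w) % 2 == 0 else "blue"

print(case1_split([2, 1, 2, 2], parity_colour))     # ([2, 1], [[2], [2]])
print(case2_split([2, 2, 1, 3, 1], parity_colour))  # ([], [[2, 2, 1], [3], [1]])
```

Taking the shortest blue word in Case 2 is an arbitrary choice. Any blue word starting at the current index would do, because the next index is again past every bad index.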

suyjuris | 2 years ago | on: German court prohibits LinkedIn from ignoring "Do Not Track" signals

The full decision can be found here [1]. The consumer protection agency also asked the court to order LinkedIn to respect DNT, but the court did not grant this relief, reasoning that it was overly broad in two ways. First, it did not specify precisely enough what is meant by DNT; in particular, the suit did not limit itself to the DNT header, but referred to any kind of configured signal sent by the browser. Second, it described the behaviour that LinkedIn is supposed to cease when encountering such a signal in an overly broad manner.

If upheld, the judgement certainly seems to open the door for future litigation, and one might even hope for potential targets to adjust their behaviour in anticipation of it, but I would not hold my breath there.

[1] https://www.vzbv.de/sites/default/files/2023-10/23-10-10_Stn...

suyjuris | 2 years ago | on: EU Grabs ARM for First ExaFLOP Supercomputer

The name of the city is “Garching bei München” which translates to “Garching near Munich”. This disambiguates it from „Garching an der Alz“. (Although Jülich is just called Jülich.)

suyjuris | 2 years ago | on: CCC Invites to the 37th Chaos Communication Congress in Hamburg

I use these as well (3M Aura). They are much more comfortable than the more common types of FFP2 masks, and I can wear them for prolonged periods without issue, but I would not describe them as causing “no discomfort whatsoever”. They also fit very well; with other FFP2 masks I had to fiddle around quite a bit to get a good seal around my nose.

suyjuris | 2 years ago | on: The odd appeal of absurdly long YouTube videos

Someone I know wanted a movable white rectangle on the screen to cover things up during a presentation. They came up with a creative solution: opening a “10h white background” video and using Firefox's picture-in-picture feature. Unfortunately, the recommendation algorithm picked up on this and started recommending a bunch of similar videos...

suyjuris | 2 years ago | on: Problems harder than NP-Complete

You are thinking of the union, where you can indeed just put the two NFAs “side by side”. This does not work for the intersection, though; there you need the product construction.
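Here is a minimal Python sketch of both constructions. It assumes NFAs without ε-transitions that may have several start states. The `NFA` class, the helper names, and the example automata are all hypothetical. The union simply runs both automata side by side. The intersection runs them in lockstep on pairs of states.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, Hashable, Set, Tuple

State = Hashable

@dataclass
class NFA:
    starts: Set[State]
    finals: Set[State]
    # (state, symbol) -> set of successor states; a missing key means no transition.
    delta: Dict[Tuple[State, str], Set[State]]

    def accepts(self, word: str) -> bool:
        current = set(self.starts)
        for symbol in word:
            current = {r for q in current for r in self.delta.get((q, symbol), set())}
        return bool(current & self.finals)

def union(a: NFA, b: NFA) -> NFA:
    # "Side by side": tag states with 0/1 so the automata stay disjoint,
    # and start in both at once (a set of start states instead of epsilon moves).
    def tag(i, states):
        return {(i, q) for q in states}
    delta = {((i, q), s): tag(i, targets)
             for i, m in ((0, a), (1, b))
             for (q, s), targets in m.delta.items()}
    return NFA(tag(0, a.starts) | tag(1, b.starts),
               tag(0, a.finals) | tag(1, b.finals), delta)

def intersection(a: NFA, b: NFA) -> NFA:
    # Product construction: states are pairs (p, q), and both automata read each
    # symbol in lockstep: delta((p, q), s) = delta_a(p, s) x delta_b(q, s).
    delta = {((p, q), s): set(product(ps, qs))
             for (p, s), ps in a.delta.items()
             for (q, t), qs in b.delta.items() if s == t}
    return NFA(set(product(a.starts, b.starts)),
               set(product(a.finals, b.finals)), delta)

# Hypothetical example automata over {a, b}.
even_as = NFA({0}, {0}, {(0, "a"): {1}, (1, "a"): {0}, (0, "b"): {0}, (1, "b"): {1}})
ends_in_b = NFA({"s"}, {"f"}, {("s", "a"): {"s"}, ("s", "b"): {"s", "f"}})

both = intersection(even_as, ends_in_b)
either = union(even_as, ends_in_b)
print(both.accepts("aab"), both.accepts("ab"))    # True False
print(either.accepts("ab"), either.accepts("a"))  # True False
```

The side-by-side trick fails for the intersection because a word must be accepted by both automata on the same run. Only the pair states track both runs at once.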

suyjuris | 3 years ago | on: Joint statement by the Department of the Treasury, Federal Reserve, and FDIC

The government needs to get its money from somewhere. If it spends the taxpayers' money, that money cannot be spent on other things. So instead of doing things that are useful to society, like maintaining roads, the money is just sitting there until the bond matures. If it creates money out of thin air, the effect is the same, except that now every market participant pays (in the form of increased inflation). So in either case, the losses are socialised.

Of course, you can still argue that stabilising the system is worth it.

suyjuris | 3 years ago | on: “Clean” code, horrible performance

It is also important to consider that better performance increases your productivity as a developer. For example, you can use simpler algorithms, skip caching, and have faster iteration times. (If your code takes 1 min to hit a bug, there are many debugging strategies you cannot use that would be available if it took 1 s. The same is true when you compare 1 s and 10 ms.)

In the end, it is all tradeoffs. If you have a rough mental model of how code is going to perform, you can make better decisions. Of course, part of this is determining whether it matters for the specific piece of code under consideration. Often it does not.

suyjuris | 3 years ago | on: Why SAT Is Hard

I would guess that this can work, but it seems impossible to prove. I do not have any good candidates for an NP problem that is provably harder than SAT for a deterministic Turing machine, though I would not be surprised if some tricky diagonalisation argument works.

suyjuris | 3 years ago | on: Why SAT Is Hard

> It is of course possible to construct a problem that will require strictly more steps than the most efficient algorithm for SAT; for example, we could give a language SAT-PRIME = {<φ,n>: φ is satisfiable and n is prime}, which will take longer to decide than SAT.

A minor point, admittedly, but the second part of this statement is not true. Running time is measured w.r.t. the length of the input, and <φ,n> is longer than just φ. Deciding whether n is prime is easy, so per bit of input, SAT-PRIME can actually be decided faster than SAT.
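To make this concrete, here is a back-of-the-envelope bound under assumed costs. Suppose, hypothetically, that:

- SAT on a formula of length m takes exactly $2^m$ steps in the worst case.
- Primality of an ℓ-bit number can be tested in $p(\ell)$ steps for some non-decreasing polynomial $p$ (polynomial-time primality tests such as AKS exist).
- The encoding of <φ,n> spends at least one bit on n.
- Parsing overhead can be ignored.

For inputs of total length N:

```latex
\begin{aligned}
T_{\text{SAT-PRIME}}(N)
  &\le \max_{|\varphi| \le N-1} \Bigl( 2^{|\varphi|} + p\bigl(N - |\varphi|\bigr) \Bigr) \\
  &\le 2^{N-1} + p(N) \\
  &< 2^{N} = T_{\text{SAT}}(N) \qquad \text{whenever } p(N) < 2^{N-1}.
\end{aligned}
```

Under these assumptions, the worst case at a given input length is roughly halved rather than increased: the extra bits spent on n are cheap to process.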

suyjuris | 3 years ago | on: I analyzed shuffling in a million games of MtG Arena (2020)

An easy argument for why it cannot work: consider an array with 3 elements, [a,b,c]. “Shuffling” it could look as follows.

    Step 1. Swap a, c -> [c,b,a]
    Step 2. Swap b, c -> [b,c,a]
    Step 3. Swap a, a -> [b,c,a]
What is the probability that we do exactly these steps? At each step, we have 3 choices, so (1/3) * (1/3) * (1/3) = 1/27. What is the probability that we end up with [b,c,a]? You might think 1/27 as well, but that is not quite true; it is possible that we choose different steps but end up with the same result. For example, we can do [a,b,c]->[b,a,c]->[b,c,a]->[b,c,a]. But the probability will always be a multiple of 1/27: it is just 1/27 times the number of possible paths that lead to [b,c,a].
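Here is a brute-force Python sketch that enumerates every path for n = 3. It assumes the naive shuffle swaps position i with a uniformly random position among all n at step i, which matches the steps above (with a = 0, b = 1, c = 2).

For contrast, it also enumerates the Fisher-Yates shuffle, which only picks from positions i..n-1. That algorithm is not mentioned in the original comment. The function names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def path_distribution(n, choice_ranges):
    # Run every equally likely sequence of swap choices and tally the resulting orders.
    counts = Counter()
    for choices in product(*choice_ranges):
        arr = list(range(n))
        for i, j in enumerate(choices):
            arr[i], arr[j] = arr[j], arr[i]  # step i: swap position i with position j
        counts[tuple(arr)] += 1
    total = sum(counts.values())  # number of paths
    return {perm: Fraction(c, total) for perm, c in counts.items()}

def naive_distribution(n):
    # Step i may swap with any of the n positions: n**n paths.
    return path_distribution(n, [range(n)] * n)

def fisher_yates_distribution(n):
    # Step i only swaps with positions i..n-1: n * (n-1) * ... * 1 = n! paths.
    return path_distribution(n, [range(i, n) for i in range(n)])

naive = naive_distribution(3)
print(naive[(1, 2, 0)])                            # [b,c,a] -> 5/27
print(sorted(set(naive.values())))                 # [Fraction(4, 27), Fraction(5, 27)]
print(set(fisher_yates_distribution(3).values()))  # {Fraction(1, 6)}
```

Every naive probability is a whole number of 27ths, so none of them can equal 1/6. The Fisher-Yates variant has exactly 6 paths, one per permutation.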

Now, what should the probability be? There are exactly 6 ways to shuffle [a,b,c] (this is the number of permutations, 3! = 3 * 2 * 1 = 6). So we want to get [b,c,a] with a probability of 1/6. But 1/6 is not a multiple of 1/27. (To see this, solve 1/6 = x/27 for x: x = 27/6 = 4.5, which is not an integer.)

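The same argument works for any length n > 2. There are n^n equally likely paths, so every outcome has a probability that is a multiple of 1/n^n, while a uniform shuffle needs probability 1/n!. That would require n! to divide n^n. But n! is divisible by n-1, whereas n^n is not, since n and n-1 share no common factor.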
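Here is a quick Python check of the divisibility argument for small n. It assumes the same naive shuffle model as above, and `naive_shuffle_can_be_uniform` is a hypothetical helper name. The key step is that n leaves remainder 1 when divided by n-1, so n^n does too, whereas n! contains the factor n-1.

```python
from math import factorial

def naive_shuffle_can_be_uniform(n: int) -> bool:
    # Each outcome has probability k / n**n; uniformity needs k / n**n == 1 / n!,
    # i.e. n**n == k * n!, which requires n! to divide n**n.
    return n ** n % factorial(n) == 0

for n in range(3, 12):
    assert n ** n % (n - 1) == 1        # n is 1 mod (n-1), hence so is n**n
    assert factorial(n) % (n - 1) == 0  # n! contains the factor n-1
    assert not naive_shuffle_can_be_uniform(n)
```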
