eiz's comments | WingNews

eiz | 2 years ago | on: RedPajama: Reproduction of LLaMA with friendly license

https://arxiv.org/pdf/2302.13971.pdf table 15: 1770394 A100-80GB hours to train the entire model suite. At the going rate for cloud 8xA100-80GBs (~$12/hr if you could actually get capacity) that's ~$2.6M, under extremely optimistic assumptions. YMMV on bulk pricing ;) "the more you buy the more you save"
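
A quick sketch of that arithmetic, for anyone who wants to plug in their own rate. The GPU-hour total and the ~$12/hr figure are the ones quoted above; the function name is just illustrative, and the calculation assumes perfect utilization with no failed runs, storage, or networking costs.

```python
# Back-of-the-envelope training cost from the figures quoted in the comment.
GPU_HOURS = 1_770_394        # A100-80GB hours for the whole model suite (table 15)
GPUS_PER_NODE = 8            # one 8xA100-80GB cloud instance
NODE_PRICE_PER_HOUR = 12.0   # assumed on-demand price in USD per node-hour


def training_cost_usd(gpu_hours, gpus_per_node, node_price_per_hour):
    """Convert GPU-hours to node-hours, then multiply by the hourly node price."""
    node_hours = gpu_hours / gpus_per_node
    return node_hours * node_price_per_hour


cost = training_cost_usd(GPU_HOURS, GPUS_PER_NODE, NODE_PRICE_PER_HOUR)
print(f"${cost / 1e6:.2f}M")  # $2.66M
```

That lands at roughly $2.66M, in line with the ballpark above.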

eiz | 2 years ago | on: What are transformer models and how do they work?

> Where is the connection between computational details and the model's high-level behavior? Do we even know?

This is an active area of study ("mechanistic interpretability") and it's very early days. For instance here's a paper I read recently that tries to explain how a very simple transformer learns how to do modular arithmetic: https://arxiv.org/abs/2301.05217

Curious what interesting results people are aware of in this area.

eiz | 2 years ago | on: What are transformer models and how do they work?

> 4. Describing positional embeddings as multiplicative. They are generally (and very counterintuitively to me, but nevertheless) additive with token embeddings.

Worth noting that rotary position embeddings, used in many recent architectures (LLaMA, GPT-NeoX, ...), are very similar to the original sin/cos position embedding in the transformer paper but use complex multiplication instead of addition.
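
A minimal pure-Python sketch of the idea, not any particular library's implementation. It assumes adjacent elements are paired into complex numbers and uses the usual base of 10000 for the frequencies; real implementations differ in how they pair dimensions, and they apply the rotation to the attention queries and keys. `rotary_embed` and `dot` are illustrative names.

```python
import cmath


def rotary_embed(vec, position, base=10000.0):
    """Rotate each adjacent pair of `vec` by a position-dependent angle."""
    dim = len(vec)
    assert dim % 2 == 0, "needs an even number of dimensions"
    out = []
    for i in range(dim // 2):
        # Same geometric frequency schedule as the sin/cos embedding.
        theta = position * base ** (-2.0 * i / dim)
        # Treat the pair (x, y) as the complex number x + iy ...
        z = complex(vec[2 * i], vec[2 * i + 1])
        # ... and rotate it by multiplying with e^(i * theta).
        z *= cmath.exp(1j * theta)
        out.extend([z.real, z.imag])
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


q = [1.0, 2.0, 3.0, 4.0]
k = [0.5, -1.0, 2.0, 0.25]

# Both pairs of positions are 2 apart, so the scores should match.
score_a = dot(rotary_embed(q, 3), rotary_embed(k, 1))
score_b = dot(rotary_embed(q, 7), rotary_embed(k, 5))
print(abs(score_a - score_b) < 1e-9)
```

The point of the multiplication is visible in the last lines: the dot product of a rotated query and a rotated key depends only on the difference between their positions.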

eiz | 3 years ago | on: Anthropic’s $5B, 4-year plan to take on OpenAI

Fair enough. In the GH100 architecture doc https://resources.nvidia.com/en-us-tensor-core/gtc22-whitepa... (page 18) they do mention retaining 2 graphics-capable TPCs but it's clearly not the focus.

eiz | 3 years ago | on: Anthropic’s $5B, 4-year plan to take on OpenAI

> They do not have any graphics rendering circuitry.

What? Not having a display output is not the same as not having graphics rendering circuitry. Here's vulkaninfo from an A100 box: https://gist.github.com/eiz/c1c3e1bd99341e11e8a4acdee7ae4cb4

eiz | 3 years ago | on: The LLama Effect: Leak Sparked a Series of Open Source Alternatives to ChatGPT

Here's a couple more quotes from Yann LeCun, their Chief AI Scientist:

https://twitter.com/ylecun/status/1643945742850031616

https://twitter.com/ylecun/status/1644503237699969026

pretty clear the non-release and the extremely half-assed response to the leak are just CYA

eiz | 3 years ago | on: Building a Virtual Machine Inside ChatGPT

I tried this:

> Solve the following word problem step by step:

> "John is in a train moving west at ten miles per hour. Jill is in a train moving north at twenty miles per hour. How fast are John and Jill moving apart from each other?"

To solve this problem, we first need to understand the concept of speed. Speed is a measure of distance traveled over time. In this problem, we are told that John and Jill are traveling at different speeds in different directions.

Since the two trains are moving in different directions, we cannot simply add or subtract their speeds to find the combined speed at which John and Jill are moving apart from each other. Instead, we need to use the Pythagorean theorem to find the combined speed.

The Pythagorean theorem states that in a right triangle, the square of the length of the hypotenuse (the longest side of the triangle) is equal to the sum of the squares of the lengths of the other two sides. In this problem, we can consider the trains to be moving along the two sides of a right triangle, with the distance between them as the hypotenuse.

We can use the Pythagorean theorem to find the distance between John and Jill as follows:

First, we square the speed of each train and add the results:

10 mph * 10 mph + 20 mph * 20 mph = 100 + 400 = 500

Next, we take the square root of the result to find the distance between John and Jill:

sqrt(500) = 22.36 mph

Therefore, the combined speed at which John and Jill are moving apart from each other is approximately 22.36 mph.

---

Results seem pretty inconsistent though. Other attempts have given complete nonsense.
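
For reference, the arithmetic in the answer above can be checked with a short sketch. It assumes both trains leave from the same point and move in straight, perpendicular lines, which the word problem does not actually state; `separation_speed` is an illustrative name.

```python
import math


def separation_speed(speed_a_mph, speed_b_mph):
    """Rate at which two objects separate when moving at right angles
    from a common starting point: the magnitude of the relative velocity."""
    return math.hypot(speed_a_mph, speed_b_mph)


# John: 10 mph west, Jill: 20 mph north.
print(round(separation_speed(10, 20), 2))  # 22.36
```

Under those assumptions the 22.36 mph figure is right. The response's wording is sloppy though: it calls the result a "distance" when it is a speed.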

eiz | 3 years ago | on: Pet Airways

I've done this a few times too.

> There’s no way to pre-register the pet.

At least on Delta, you can. You need to call the airline and arrange it.

> Oh, and to go through security, you have to take a nervous and scared cat out of their carry case in the middle of the security line, walk through the metal detector, and then they have to test both of your hands for residue while still carrying the cat.

I always refuse the normal line and get a private screening. So far I haven't had any issues doing that.

Still, it's a huge pain in the ass.

eiz | 4 years ago | on: The K Programming Language

> This means no closures, which K implementers consider a feature (I don't).

having not touched K in about 15 years, when did this change? in k3:

    K 3.2 2004-09-23 Copyright (C) 1993-2004 Kx Systems
    LIN32 16CPU 15985MB ubuntu 0 EVAL  
    
      f:{a:x+1;{a+x}}
      g:f 1
      g
    {a+x}
      g 2
    4
      a
    value error
    a
    ^
    parse error

eiz | 5 years ago | on: Git is too hard

> If you think Chat apps are winning, let me know when you can buy an item online without an account linked to an Email. I'd love to see an example.

https://www.amazon.com/gp/help/customer/display.html?nodeId=...

eiz | 5 years ago | on: Ask HN: Who is hiring? (October 2020)

We can do full (permanent) remote.

eiz | 7 years ago | on: Audiophiles in Japan Are Installing Their Own Power Poles

> It's an HDMI cable! The video signal has CRC in it and is packetized, it's either going to make it or it isn't.

I don't disagree with your main point, but this actually isn't quite true. The HDMI signal is split into 3 distinct interleaved periods: video data, data island and control. Video data is not packetized and the only possible error detection it has is from TMDS signaling, but no such error handling is required by the TMDS spec. You can absolutely get imperfect transmission of an HDMI video signal due to cable or other electrical problems. Auxiliary packets in the data island, including audio data, do have an error correction scheme (BCH + TERC4).

Feel free to check out the spec: https://glenwing.github.io/docs/HDMI-1.4b.pdf

eiz | 8 years ago | on: Apple’s Guidelines Now Allow Executable Code in Educational Apps and Dev Tools

Last time I checked the "free" provisioning profiles it generates are valid for something like 7 days. It's not really convenient for anything but experimenting with iOS development.

eiz | 9 years ago | on: Google Fiber Was Doomed from the Start

The calculator you linked to assumes a maximum TCP window size of 64KB for everything but "replication". TCP window scaling has been on by default in every major OS for 10 years or more, allowing much greater throughput. It's true that latency sets a limit on TCP throughput but it's not nearly as bad as your calculator would indicate.
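
A rough sketch of the bound in question: a single TCP connection can have at most one window of data in flight per round trip, so throughput is capped at window / RTT. The 80 ms RTT is a made-up example value. The scaled window uses the largest shift the window scale option allows (14 bits); real connections are limited by socket buffer sizes well before that.

```python
def max_tcp_throughput_bps(window_bytes, rtt_seconds):
    """Upper bound on single-connection TCP throughput: one window per RTT."""
    return window_bytes * 8 / rtt_seconds


RTT = 0.080                    # hypothetical 80 ms round trip
unscaled_window = 65_535       # largest window without window scaling
scaled_window = 65_535 << 14   # largest window with the maximum scale shift

for label, window in (("unscaled", unscaled_window), ("scaled", scaled_window)):
    mbps = max_tcp_throughput_bps(window, RTT) / 1e6
    print(f"{label}: {mbps:,.2f} Mbit/s")
```

With a 64KB window the cap at 80 ms is about 6.5 Mbit/s, which is the kind of number the calculator produces. With scaling, the window stops being the bottleneck on any realistic link.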

eiz | 10 years ago | on: David Bowie Has Died

Relevant: https://www.youtube.com/watch?v=Q0-51IkWpFE

eiz | 10 years ago | on: libimobiledevice – A cross-platform library to communicate with iOS devices

house_arrest used to allow direct access to the documents and container of any app. It's used by the Xcode "installed applications" list, to allow you to download and upload container contents. In 8.3 they changed it to only allow VendDocuments access to apps which actually have document sharing enabled in their Info.plist (and VendContainer access for ad-hoc provisioned apps, iirc).

Backup is done via a completely separate service.

iOS releases do tend to arbitrarily change the security policy of these services, but it's not clear that "Rootless" is anything more than business as usual. Guess we'll find out Monday.