top | item 42421521

C-programmer | 1 year ago

> I genuinely believe Groq is a fraud. There is no way their private inference cloud has positive gross margins.

> Llama 3.1 405B can currently replace junior engineers

I'd like more exposition on these claims.

Workaccount2|1 year ago

Not Llama but with Sonnet and O1 I wrote a bespoke android app for my company in about 8 hours of work. Once I polish it a bit (make a prettier UI), I'm pretty sure I could sell it to other companies doing our kind of work.

I am not a programmer, and I know C and Python at about a 1 day crash course level (not much at all).

However, Sonnet held my hand all the way from downloading Android Studio to a functional app written in Kotlin that is now being used by employees on the floor.

People can keep telling themselves that LLMs are useless, or maybe just helpful for quickly spewing boilerplate code, but I would take very seriously the warning that this tech is only going to improve and is already helping people forgo SWEs. Sears thought the internet was a cute party trick and that print catalogs were obviously here to stay.

kortilla|1 year ago

This is meaningless without talking about the capabilities of the app. I've seen examples of this before where non-programmers come up with something using an LLM that could just be a webpage with camera access and some JavaScript.

refulgentis|1 year ago

Today, I wrote a full YouTube subtitle downloader in Dart. 52 minutes from starting to Google anything about it to a full implementation and tests, custom-formatting any of the 5 obscure formats it could be in to my exact whims. Full coverage of all validation errors via mock network responses.
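
The commenter doesn't share the code, but to give a flavor of the parsing such a tool involves, here is a minimal Python sketch (SRT assumed as one of the subtitle formats; the function name is hypothetical, not from the comment) for converting a subtitle timestamp to seconds:

```python
import re

def parse_srt_timestamp(ts: str) -> float:
    """Convert an SRT timestamp like '00:01:02,500' to seconds."""
    m = re.fullmatch(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})", ts)
    if m is None:
        # Surface validation errors explicitly, as the comment describes testing for
        raise ValueError(f"malformed SRT timestamp: {ts!r}")
    hours, minutes, seconds, millis = map(int, m.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000.0
```

Each format a downloader supports would need its own small parser like this, which is where "full coverage of validation errors" comes in.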

I then wrote a web AudioWorklet for playing PCM in 3 minutes, which conformed to the same interface as my Mac/iOS/Android versions, e.g. setting the sample rate, feedback callback, etc. I have no idea what an AudioWorklet is.
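
For readers who, like the commenter, have no idea what an AudioWorklet is: it is the Web Audio API's mechanism for running custom audio processing, and it consumes float samples in [-1, 1), so feeding it raw 16-bit PCM means converting integer samples first. A hedged sketch of that conversion (in Python for illustration, not the commenter's Dart/JS code; the function name is hypothetical):

```python
import struct

def pcm16_to_float(data: bytes) -> list[float]:
    """Convert little-endian 16-bit PCM samples to floats in [-1.0, 1.0)."""
    count = len(data) // 2  # two bytes per sample
    samples = struct.unpack(f"<{count}h", data)  # 'h' = signed 16-bit
    return [s / 32768.0 for s in samples]
```

The same scaling logic appears in one form or another on every platform that exposes a float-based audio callback, which is why one interface can front Mac/iOS/Android/web backends.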

Two days ago, I stubbed out my implementation of OpenAI's WebSocket-based realtime API: 1400 LOC over 2 days, mostly by hand while grokking and testing the API. In 32 minutes, I had a brand-spanking-new batch of code: clean, event-based architecture, 86% test coverage. 1.8 KLOC with tests.

In all of these cases, the most I needed to do was drop in code files, say "nope, wrong" a couple of times to Sonnet, and say "why are you violating my service contract and only providing an example solution" to o1.

Not Llama 3.1 405B specifically (I haven't gone to the trouble of running it), but things turned some sort of significant corner over the last 3 months, between o1 and Sonnet 3.5. Mistakes are rare. It's believable that 405B is on that scale; IIRC it went punch for punch with the original 3.5 Sonnet.

But I find it hard to believe a Google L3, or a third of L4s (read: new hires, or people who survived 3 years), are that productive and sending code out for review at a fifth of that volume, much less on demand.

So insane-sounding? Yes.

Out there? Probably, I work for myself now. I don't have to have a complex negotiation with my boss on what I can use and how. And I only saw this starting ~2 weeks ago, with full o1 release.

Wrong? Shill? Dilettante?

No.

I'm still digesting it myself. But it's real.

nightski|1 year ago

Most software is not one-off little utilities/scripts, small greenfield projects, etc. That's where LLMs excel: when you don't need much context and they can regurgitate known solutions.

It's less to do with junior/senior/etc. and more to do with the types of problems you are tackling.

lz400|1 year ago

I don't understand what you guys are doing. For me sonnet is great when I'm starting with a framework or project but as soon as I start doing complicated things it's just wrong all the time. Subtly wrong, which is much worse because it looks correct, but wrong.

tonetegeatinst|1 year ago

yt-dlp has a subtitle option. To quote the documentation:

  --write-sub          Write subtitle file
  --write-auto-sub     Write automatically generated subtitle file (YouTube only)
  --all-subs           Download all the available subtitles of the video
  --list-subs          List all available subtitles for the video
  --sub-format FORMAT  Subtitle format, accepts formats preference, for example: "srt" or "ass/srt/best"
  --sub-lang LANGS     Languages of the subtitles to download (optional) separated by commas, use --list-subs for available language tags
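
As a usage sketch of those flags (the video URL is a placeholder; exact flag spellings vary slightly between youtube-dl and the yt-dlp fork, so check --help on your version):

```shell
# List the subtitle tracks available for a video
yt-dlp --list-subs "https://www.youtube.com/watch?v=VIDEO_ID"

# Download only the English subtitles as SRT, skipping the video itself
yt-dlp --write-sub --sub-lang en --sub-format srt --skip-download \
  "https://www.youtube.com/watch?v=VIDEO_ID"
```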

xbmcuser|1 year ago

I agree with you that they are improving. Not being a programmer, I can't tell if the code itself has improved, but as a user who uses ChatGPT or Google Gemini to build scripts or TradingView indicators, I am seeing some big improvements, and many times wording the request in better detail and restricting it from going off on tangents results in working code.

az226|1 year ago

Show the code

idiocache|1 year ago

Can you briefly describe your work flow? Are you exchanging information with Sonnet in your IDE?

throwawaymaths|1 year ago

> groq

i went to a groq event and one of their engineers told me they were running 7 racks!! of compute per (70b?) model. that was last year so my memory could be fuzzy.

iirc, groq used to be making resnet-500? chips? the only way such an impressive setup makes any kind of sense (my guess) is that they bought a bunch of resnet chips way back when and are now trying to square-peg-into-round-hole that sunk cost as part of a fake-it-till-you-make-it phase. they certainly have enough funding to scrap it all and do better... the question is whether they will (and why they haven't been able to yet)

wmf|1 year ago

Yes, Groq requires hundreds or thousands of chips to load an LLM because they didn't predict that LLMs would get as big as they are. The second generation chip can't come soon enough for them.

arisAlexis|1 year ago

also, he believes cerebras is shit, yet cerebras runs llama the most efficiently and at top speed. biased ^10

throwawaymaths|1 year ago

So I interacted with people at Cerebras at a trade show, and it seems like you have to have extremely advanced cooling to keep that thing working. IIRC the user agreement says "you can't turn it off or else the warranty is void". With the way their chip is designed, I would be strongly worried that the giant chip has warping issues, for example when certain cores are dark and the thermal generation is uneven (or if it gets shut down by accident in the middle of inferencing an LLM). There may even be chip-to-chip variation depending on which cores got DQ'd based on their on-the-spot testing.

Already through the grapevine I'm hearing that H100s and B100s have to be replaced more often... than you'd want? I suspect people are mum about it because otherwise they might lose sweetheart discounts from Nvidia. I can't imagine that Cerebras, even with the extreme engineering of their cooling system, has truly solved cooling in a way that isn't a pain in the ass (otherwise they wouldn't have the clause?), and if I were building a datacenter I would be very worried about having to do annoying and capital-intensive replacements.