Devin is now generally available

155 points| neural_thing | 1 year ago |cognition.ai | reply

132 comments

[+] winkle|1 year ago|reply
First place I usually go is the terms of service and what they are granting themselves rights to. Not excited about how broad this is "3.2 License: By using the Services, you hereby grant to Cognition, its affiliates, successors, and assigns a non-exclusive, worldwide, royalty-free, fully paid, sublicensable, transferable license to reproduce, distribute, modify, and otherwise use, display, and perform all acts with respect to the Customer Data as may be necessary for Cognition to provide the Services to you."
[+] CaptainFever|1 year ago|reply
"as may be necessary for Cognition to provide the Services to you" kind of makes sense IMO. Does that mean they'll only use the license (note: they only get a license, not ownership) to provide services to you? Is it a restriction?
[+] bigs|1 year ago|reply
I always wonder how enforceable these blanket rights would be in court. Didn’t Meta claim to own end users’ photos in the T&Cs back around 2009 and it got challenged and shot down (ianal)?
[+] Topfi|1 year ago|reply
No public testing, no benchmarks, no clear information on context window size or restrictions for extensive use, no comparison with the newest Claude Sonnet 3.5 or O1, nothing.

What we do get is a price of $ 500,- per month from a company that has been caught lying about this very product [0] and has never allowed independent testing.

Cognition, I am sorry to tell you, but there is no reason to trust you. In fact, there are multiple good reasons not to, even if you offered Devin at a fraction of the price.

If this were e.g. Anthropic launching a new beyond Opus size model that was still performant and came with "chain-of-thought" capabilities, a far more extensive context window that still fully passes needle in haystack and is absolutely solid in sourcing from provided files, keeps on track even when provided with large documents, has few or no restrictions on usage and comes with extensive, verifiable benchmarks that showcase this offering being a significant upgrade over other models, maybe such a price could be justified.

You know why, Cognition? Because they haven’t actively lied. What they did instead was let people use their models and actually test the advantages. Even Claude Instant way back when had certain use cases that gave it its own niche and showed they could execute before expanding with 2 and the larger context, then 3 with more applications. You never did any of that, you never gave anyone reason to believe what you claim, you didn’t even release benchmarks. See the difference?

Seems more like a simple cash grab, attempting to ride the O1 wave. OpenAI has a hard time justifying their Pro pricing; you doubling that makes this an out-of-season April Fools' joke. Waiting for the inevitable reporting that this is just another API wrapper for Claude or ChatGPT with our old faithful RAG.

[0] https://www.youtube.com/watch?v=tNmgmwEtoWE&pp=ygUJZGV2aW4gY...

[+] preommr|1 year ago|reply
From the second video: "We can focus on the things that excite us rather than just the maintenancing [maintenance] work".

But these are the kinds of problems that help shape the product. The software architecture should be a compression of a deep and intuitive understanding of the problem space. How can you develop that knowledge if you're just delegating it to a black box that can't operate at a near-human level?

I've used AI-based tools to great success, but on an ad-hoc basis, for specific and small functions or modules. To do the integration part requires an understanding of what abstraction is appropriate where. I don't think these tools are good at that.

[+] cowsup|1 year ago|reply
Good software can be art. And like all art, we have hit the stage in which code can also be cranked out en masse, thoughtlessly, for a quick buck. It was only inevitable.
[+] a-arbabian|1 year ago|reply
Mike from Vesta (first demo video) claims Devin saved "at least a hundred hours" debugging API integrations. That seems crazy to me - API integrations rarely take that long, and any engineer would spot issues like wrong API keys almost immediately. The tool might be more valuable for non-engineers creating initial drafts, but by the time you've written all the detailed specs for Devin, a mid-level engineer could have made significant progress on the task.
[+] jlund-molfese|1 year ago|reply
I wish API integrations never took that long! But it's dependent on who you're integrating with and what your product looks like. I'm the engineering manager of the payroll integrations team at a company that does workplace savings plans.

Sometimes even when you're making calls to dozens of different endpoints they're easy, but other times, you end up guessing at how to access undocumented functionality within a GraphQL API that has introspection turned off, or working around entity modeling that's completely different from your system and requires a lot of translation. Or you work with an API whose indexes variably start from 1, 0, -1, and -2 in different endpoints. These generally aren't hard technical challenges to solve, and something like Devin that could take care of most surface-level problems you see while integrating with some XML API from 2007 would be welcome.
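
To make the mixed-index problem concrete, here is a minimal sketch of the kind of translation layer it forces on you. The endpoint names and their index bases are hypothetical, made up for illustration and not taken from any real payroll API; the idea is just to convert everything to 0-based indexes at the boundary and convert back when calling out.

```python
# Hypothetical endpoints and the index each one uses for its first item.
# These names and values are illustrative assumptions, not a real API.
INDEX_BASE = {
    "employees": 1,
    "deductions": 0,
    "pay_periods": -1,
    "contributions": -2,
}


def to_zero_based(endpoint: str, remote_index: int) -> int:
    """Convert an index as reported by the remote endpoint to a 0-based index."""
    return remote_index - INDEX_BASE[endpoint]


def to_remote(endpoint: str, local_index: int) -> int:
    """Convert a 0-based local index back to the endpoint's own convention."""
    return local_index + INDEX_BASE[endpoint]
```

Keeping the offsets in one table means the rest of the integration only ever sees 0-based indexes, and a newly discovered quirk is a one-line change.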

There are companies like https://www.tryfinch.com and https://www.merge.dev that try to solve these issues, but their abstractions also reduce flexibility and aren't a perfect fit for all HRIS integration use cases right now.

[+] mike_yu|1 year ago|reply
clearly nobody else has spent all the time i have integrating really old mortgage software :(
[+] AlwaysRock|1 year ago|reply
Debugging is a pretty vague word. I know a LOT of api endpoints with shit documentation. Could Devin generate documentation for a vast number of api endpoints that could have theoretically taken a hundred hours to write?
[+] paradite|1 year ago|reply
The trend with AI tools: make a bold claim at launch, then have lots of caveats, caveats, caveats, caveats when actually releasing to the public.
[+] Yusefmosiah|1 year ago|reply
Looking for comprehensive benchmarks with Devin vs Cursor + Claude 3.6 vs ChatGPT o1 Pro.

In my own experience using Cursor with Claude 3.5 Sonnet (new) and o1-preview, Claude is sufficient for most things, but there are times when Claude gets stumped. Invariably that means I asked it to do too much. But sometimes, maybe 10-20% of the time, o1-preview is able to do what Claude couldn’t.

I haven’t signed up for o1 Pro because going from Cursor to copy/pasting from ChatGPT is a big DevX downgrade. But from what I’ve heard o1 Pro can solve harder coding problems that would stump Claude or o1-preview.

My solution is just to split the problem into smaller chunks that make it tractable for Claude. I assume this is what Devin’s doing. Or is Devin using custom models or an early version of the o1 (full or pro) API?

[+] cbhl|1 year ago|reply
This predates the o1 release, but the folks behind Devin did do some early evaluation of o1 vs 4o vs Devin back in September:

https://x.com/cognition_labs/status/1834292718174077014

I'd expect a very different experience with Devin vs the IDE-forks -- it provides status updates in Slack, runs CI, and when it's done it puts up a pull request in GitHub.

[+] gexla|1 year ago|reply
Should have come with a prominent warning at the app site that you're heading towards a $500 sub. I'm sure it's mentioned in places I didn't see it. Ideally, you would agree to the sub before you even create an account. This could save LOADS of signups from people who aren't your intended users.
[+] anticensor|1 year ago|reply
They have a $50 tier too, but that one is not currently open to new members.
[+] mfdupuis|1 year ago|reply
I'm curious to see how this plays out when it comes to deploying and maintaining production-grade apps. I know relatively little about infrastructure and DevOps, but that's the stuff that always seems to get complicated when going from MVP to production. This question feels particularly important if we're expecting PMs and designers to be primary users.

That said, I'm super excited about this space and love seeing smart folks putting energy into this. Even if it's still a bit aspirational, I think the idea of cutting down time spent debugging and refactoring and putting more power in the hands of less technical folks is awesome.

[+] waldenyan20|1 year ago|reply
hey guys - Walden here, one of the founders. Excited to have you try out Devin. Reach out here if you have any questions!
[+] Buttons840|1 year ago|reply
Hi Walden,

my name is Devin and I don't like sharing a name with a product. Will you please consider changing the name?

There is always the chance that someone named Devin will do something that gives your product a bad name. Perhaps some new scandal will involve someone named Devin or something.

I'd also like you to imagine that a hot new erotic AI was named "Walden", and people said things like "I was talking with Walden last night" as a euphemism. How would that make you feel?

[+] mrieck|1 year ago|reply
I'd try it out if you allowed paying $50 for some credits instead of requiring subscription.

Even if that version is limited to only editing public Github repos. $500 to see how well it works is too much.

[+] badFEengineer|1 year ago|reply
The price seems reasonable, but my main hesitation is on data storage + third-party providers. There doesn't seem to be much available information on:

* will you store my code + train on workflows that Devin does for me?
* are you piping data to other third-party providers (e.g. Anthropic, OpenAI)?

[+] JTyQZSnP3cQGa8B|1 year ago|reply
Why don't any LLMs show examples of C++ applications? I have yet to see a tool like that which I would be happy to use at work.
[+] cloudking|1 year ago|reply
When crafting projects from scratch, does your system actually fix its own errors?

That seems to be the challenge with Cursor Agent in its current form: it generates a bunch of code that has bugs and requires a lot of iteration.

[+] swyx|1 year ago|reply
as someone who has been trying you guys out for the past 8 months... you need a speed lever. default devin is way too slow for me :/ i asked scott for a "demo mode" first time we met
[+] anticensor|1 year ago|reply
You should really add an option to spawn a VM with an immutable rootfs. Current VMs all have a writable rootfs, which costs a lot to run; immutable VMs could be much cheaper to operate (possibly even enabling free tiers).

Also, the "suggest knowledge" modal is broken (it silently ignores changes made if you edit the suggested knowledge).

Another issue: the sleep & snapshot system is still prone to race conditions in certain cases.

[+] k2xl|1 year ago|reply
What model does it use under the hood?

How much context window does it load when it is solving tasks?

How does it determine which files to load into context?

[+] thekevan|1 year ago|reply
Can you only use it with a $500 / month subscription?

The word "try" is VERY different than the actual case, which is "pay for use".

If the answer to the first line is yes, how do I request my email be deleted? I started to sign up but I am not a use case for $500 a month at the moment.

[+] adamgordonbell|1 year ago|reply
I'm excited to try it. I use aider quite a bit and tried opendevin at some point.

What is the pricing story?

Can I use it as a side-project dev, or are the target customers only / mainly enterprise?

[+] tsak|1 year ago|reply
Is it just me that finds it ironic that you're looking for software developers?
[+] anticensor|1 year ago|reply
Hey, can you fix the issue where the editor times out and Devin gets stuck?
[+] thekhatribharat|1 year ago|reply
How does one estimate the number of ACUs required to finish a task?
[+] yuppiemephisto|1 year ago|reply
Does it work with more obscure languages like Lean 4?
[+] throw83288|1 year ago|reply
Not really product related: given the current trajectory of LLMs/agents, what is your career advice to someone in school for Computer Science right now?
[+] adamgordonbell|1 year ago|reply
It seems like a lot of the magic is providing LLMs with tools that let them work like a human would. This approach makes more sense to me than the model of expecting an LLM to just emit a giant block of code for a change, given a pile of RAG context.

( removed pricing q, as I missed it is $500 / month for whole teams. I get why that is the pricing, but doesn't work for me to try it in side projects sadly )

[+] binarynate|1 year ago|reply
Am I the only one who laments this trend of using a common first name as a product name? When I see this, my first reaction is that the company lacks any empathy for people who have the name they're co-opting.

https://www.washingtonpost.com/technology/interactive/2021/p...

https://archive.is/w8r58

[+] slickdork|1 year ago|reply
As someone named Devin who works in tech, I greatly hope this project fails. :)
[+] arockwell|1 year ago|reply
100% agree. It is shitty and rude. Not to mention it does not even make sense.
[+] alexjplant|1 year ago|reply
The short version of my name is one letter away from "Alexa". You can imagine how many comments and jokes about Amazon's AI assistant I've been party to for the past decade. Although it may be hard for you to believe I actually don't really care, much as you probably don't care about the hot dogs bearing your name that you see when you walk down the cold aisle in the grocery store. Should they instead call the anthropomorphized AI assistant something like "W'rkncacnter" to preclude the possibility of name collisions (chaotic entities imprisoned in alien stars notwithstanding)?
[+] zamadatix|1 year ago|reply
I think it's different when the product is a tool you call by name to use vs just the name of the tool. E.g. the article is about "Alexa" and I'm not sure most people even realize there are ways to use it without saying "Hey Alexa" every time. Without that type of callback association it's not a very serious concern.
[+] mewpmewp2|1 year ago|reply
I don't care about it potentially being a real name, because I doubt it would be a household item, but somehow the name itself for this particular product seems offputting.

If it had to be the name of a product, it gives me some sort of cheap male grooming or AXE body spray vibes.

[+] debacle|1 year ago|reply
I couldn't find a list of languages that this tool supports anywhere. What makes this tool better than e.g. Cursor?
[+] didip|1 year ago|reply
Aren't you guys afraid that Copilot will simply crush you? They have all the training data, after all.
[+] anticensor|1 year ago|reply
Can you also add Discord, Telegram, GitLab, and Forgejo integrations for those who use them for their software development discussions?
[+] Oras|1 year ago|reply
> Small frontend bugs and edge cases - tag Devin in Slack threads

And other points where it should shine. How does it compare to using Cursor? Is it the Slack integration?

[+] allusernamesare|1 year ago|reply
How does Devin compare to lovable.dev ? I've been thoroughly impressed by their ability to build and host functioning apps from very basic prompts.
[+] daft_pink|1 year ago|reply
Is there any evidence this works better than Claude 3.5?
[+] WesleyJohnson|1 year ago|reply
Any plans or capabilities for something local? Not a locally hosted Devin, mind you, but a way to interact with on-prem source control repos?
[+] nextworddev|1 year ago|reply
Devin really wasted a lot of time going GA because they lost a lot of their initial buzz
[+] DidYaWipe|1 year ago|reply
Might be an interesting headline if it said what "Devin" is.