DeepSeek is the real "open<something>" that the world needed. Via these three projects, DeepSeek has addressed not only efficient AI but also distributed computing:
how many companies will actually adopt 3FS now that it's open source?
not a hater, just know that there's a lot of hurdles to adoption even if something is open source - for example not being an industry standard. i don't know a ton about this space - what is the main alternative?
Was it already happening when platforms started supporting stuff like Iceberg? But it's kinda nice to see. Things like Snowflake definitely have their place in the ecosystem, but too often at the margins, especially with huge workloads, Snowflake creates more issues than it solves.
These types of models need to be trained across thousands of GPUs, which requires distributed engineering on a much higher level than "normal" distributed systems.
This is true for DeepSeek as well as for others. There are a few companies giving insights or open-sourcing their approaches, such as Databricks/Mosaic and, well, DeepSeek.
The latter also did some particularly clever stuff, but if you look into the details, so did Mosaic.
OpenAI and Anthropic likely have distributed tools of even larger sophistication. They are just not open source.
Spark is getting a bit long in the tooth.. interesting to see DuckDB integrated with Ray for data-access partitioning across (currently) 3FS. probably a matter of time before they (or someone) support S3. It should be noted that DuckDB (standalone) actually does a pretty good job scanning S3 Parquet on its own.
OutOfHere|1 year ago
1. smallpond: https://github.com/deepseek-ai/smallpond
2. 3fs: https://github.com/deepseek-ai/3FS
3. deepep: https://github.com/deepseek-ai/DeepEP
swyx|1 year ago
dkdcwashere|1 year ago
jakozaur|1 year ago
https://news.ycombinator.com/item?id=43200793
https://news.ycombinator.com/item?id=43232410
ogarten|1 year ago
Not saying this is bad, but it's just interesting to see after being in the industry for 8 years.
antupis|1 year ago
nemo44x|1 year ago
this_user|1 year ago
2. The distributed technology is powerful but complex, and most users don't need most of what it offers. Let's build a simple solution.
3. GOTO 1
calebm|1 year ago
biophysboy|1 year ago
benrutter|1 year ago
- Created comparable LLM performance for a fraction of the cost of OpenAI using more off-the-shelf hardware.
- Seem to be open sourcing lots of distributed stuff.
My question is, are those two things related? Did distributed computing enable the AI model somehow? If so, how? Or is it not that simple?
zwaps|1 year ago
maknee|1 year ago
A lot of blogs praise these new systems, but don't really provide any numbers :/
cmollis|1 year ago