Show HN: ML Blocks – Deploy multimodal AI workflows without code
112 points | neilxm | 2 years ago | mlblocks.com
ML Blocks is a node-based workflow builder to create multi-modal AI workflows without writing any code.
You connect blocks that call various visual models like GPT-4V, Segment Anything, DINO, etc., along with basic image processing blocks like resize, invert color, blur, crop, and several others.
The idea is to make it easier to deploy multi-step image processing workflows without needing to spin up endless custom OpenCV cloud functions to glue together AI models. Usually, even if you're using cloud inference servers like Replicate, you still need to write your own image processing code to pre- and post-process images in your pipeline. When you're trying to move fast, that's just unnecessary overhead.
With ML Blocks, you can build a workflow and deploy the whole thing as a single API. AFAIK, ML Blocks is the only end-to-end workflow builder built specifically for image processing.
If you're curious, our models run on Replicate, HuggingFace & Modal Labs cloud GPUs and we use React Flow for the node UX.
starwaver|2 years ago
However, soon creating a "shader that works" was no longer an issue; how to create X effect using shaders was my next blocker. Luckily there were a ton of YouTube tutorials on these, which were very helpful, but this continues to be a pain point even now.
Since we are now in the age of AI, would it be possible to prompt something like "create me a workflow to take image A, a concept art of a character, and convert it into a walking animation sprite sheet with 16 frames for each animation walking up, down, left, right, and all diagonal directions" and have it generate not only the result, but also a workflow that creates the result, so it can be edited and tweaked?
neilxm|2 years ago
One way to leverage that is building the graphs via a prompt, but another way might be to not think of the workflow as a pre-constructed graph at all. Rather perhaps we build dynamic graphs whenever you ask for a certain action - like a conversational image editing interface.
So you say something like "make the woman's hair purple." We apply segmentation to the hair, and then add a purple color overlay exactly to that area.
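That segment-then-recolor step can be sketched with plain NumPy, assuming the segmentation model has already produced a boolean mask for the hair region; the function name and parameters here are illustrative:

```python
import numpy as np

def overlay_color(image: np.ndarray, mask: np.ndarray,
                  color=(128, 0, 128), alpha=0.5) -> np.ndarray:
    """Blend `color` into `image` only where `mask` is True.

    image: (H, W, 3) uint8; mask: (H, W) bool, e.g. from SAM.
    """
    out = image.astype(np.float32)
    tint = np.array(color, dtype=np.float32)
    # Alpha-blend the tint into the masked pixels only; the rest stays untouched.
    out[mask] = (1 - alpha) * out[mask] + alpha * tint
    return out.astype(np.uint8)

# Toy example: a flat gray image where the top two rows are "hair".
img = np.full((4, 4, 3), 200, dtype=np.uint8)
hair = np.zeros((4, 4), dtype=bool)
hair[:2] = True
result = overlay_color(img, hair)
```

The point is that once segmentation hands you a mask, the "edit" itself is a cheap, deterministic post-processing block.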
soulofmischief|2 years ago
It was some amazing tech with a ton of applications, but sadly leadership had other plans and pivoted to a highly derivative, slapped-together AI sex bot.
jncfhnb|2 years ago
The walking animation is going to be a lost cause without specific inputs. We can do ControlNet stuff to make a character match a pose, and you can supply a series of poses that represent the walking animation.
On some level it seems silly to try and get anything to generate the workflow to do that. What you really want is a workflow to generate an image off of a pose, and then pass in the poses you want. Side tangent: I don't know why the AI generation community has decided "workflow" is what they're going to call "functions".
After that your problem is that the results will be kind of meh. And that’s the brunt of where it’s at right now. You can make assets that satisfy descriptive conditions. But you can’t demand they be good. And you can’t demand they be consistent across different drawings. Can you hire an artist to fix your generated directionally correct assets? Yeah, maybe. Sounds depressing and error prone though.
echelon|2 years ago
https://github.com/comfyanonymous/ComfyUI
yanma|2 years ago
I would say that although the form factors look similar, we are operating at a different abstraction level. ComfyUI focuses on components within the HuggingFace diffusers ecosystem and allows artists to recompose different workflows to come up with amazing visual effects.
We're trying to offer a way for people to recompose apps/APIs with foundation models!
itake|2 years ago
This supports other models.
sgrove|2 years ago
https://linzumi.com/
Definitely think this sort of idea could become the "serverless" equivalent for ml-using apps. I'm curious what you think re: versioning, consumption from various client languages, observability/monitoring/queueing, etc.? Feels like it could grow into a meaningful platform.
neilxm|2 years ago
Re: versioning / client languages etc. - right now we don't have block versioning but it's definitely going to be required. As of now the blocks are each their own endpoint, by design. We're thinking about allowing people to share their own blocks and perhaps even outsource compute to endpoint providers, while we focus on the orchestration layer.
Better observability and monitoring is definitely on the docket as well, especially because some of these tasks take a really long time - sometimes even going past the expiry window of the REST API. We'll be switching over to queued jobs and webhooks.
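The queued-jobs-plus-webhooks pattern can be sketched in-process; the class and field names below are invented for illustration (a real service would run tasks on worker machines and POST results to a webhook URL rather than calling a function):

```python
import uuid
from typing import Callable

class JobQueue:
    """Toy queued-job runner: submit() returns a job id immediately, and a
    webhook callback fires when the (synchronously simulated) work finishes."""

    def __init__(self):
        self.jobs: dict[str, dict] = {}

    def submit(self, task: Callable[[], object],
               webhook: Callable[[str, object], None]) -> str:
        job_id = uuid.uuid4().hex
        self.jobs[job_id] = {"status": "queued"}
        # In production this would be dispatched to a worker; we run it inline
        # so the sketch stays self-contained.
        result = task()
        self.jobs[job_id] = {"status": "done", "result": result}
        webhook(job_id, result)
        return job_id

received = []
q = JobQueue()
jid = q.submit(lambda: "upscaled.png", lambda j, r: received.append((j, r)))
```

The client-visible contract is the useful part: you get a job id back right away and never hold an HTTP connection open for a multi-minute GPU task.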
pj_mukh|2 years ago
AI Blocks:
- Multimodal LLM (GPT-4V)
- Remove objects in images
- AI Upscale 4x
- Prompted Segmentation (SAM w/ text prompting)
Editing Blocks:
- Change format
- Rotate
- Invert Color
- Blur
- Resize
- Mask to Alpha
If we've missed something please let us know, we just went through a big exercise in making sure we can quickly add new blocks.
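Chaining blocks like these boils down to folding an image through a list of named steps. A minimal sketch with stub block implementations (the real blocks run models on GPUs; these stubs just tag a string so the control flow is visible):

```python
def run_pipeline(image, steps, registry):
    """Apply each named block in order; `registry` maps block names to functions."""
    for name, params in steps:
        image = registry[name](image, **params)
    return image

# Stub blocks standing in for the real resize/blur/etc. operations.
registry = {
    "resize": lambda img, w, h: f"{img}|resize({w}x{h})",
    "blur": lambda img, radius: f"{img}|blur({radius})",
    "invert_color": lambda img: f"{img}|invert",
}

out = run_pipeline(
    "input.png",
    [("resize", {"w": 256, "h": 256}), ("blur", {"radius": 2})],
    registry,
)
```

A node-graph UI is essentially an editor for the `steps` list (or a DAG generalization of it), which is why the whole thing can be exported as a single endpoint.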
neilxm|2 years ago
We are adding more blocks constantly. We're also considering allowing the community to push their own blocks using an open API schema.
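What such an "open API schema" would contain isn't stated, so here's a purely speculative sketch of a community block manifest plus a stdlib-only validation check; every field name is invented:

```python
# Hypothetical required fields for a community-published block manifest.
REQUIRED_FIELDS = {"name": str, "inputs": dict, "outputs": dict, "endpoint": str}

def validate_block_manifest(manifest: dict) -> list[str]:
    """Return a list of problems; an empty list means the manifest looks usable."""
    errors = []
    for field, typ in REQUIRED_FIELDS.items():
        if field not in manifest:
            errors.append(f"missing field: {field}")
        elif not isinstance(manifest[field], typ):
            errors.append(f"{field} should be {typ.__name__}")
    return errors

manifest = {
    "name": "edge-detect",
    "inputs": {"image": "image/png"},
    "outputs": {"image": "image/png"},
    "endpoint": "https://example.com/blocks/edge-detect",  # hypothetical
}
```

Declaring input/output media types would let the orchestrator type-check connections between community blocks before running anything.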
moralestapia|2 years ago
A small suggestion: I don't think "ML" is a memorable term for non-technical people. I would probably try a different name.