
Apple releases MGIE, an AI-based image editing model

118 points | gnicholas | 2 years ago | appleinsider.com

108 comments

[+] nickthegreek | 2 years ago
I was not expecting Apple to release any open source generative AI product. So the big players in the download-it-yourself gang are Meta and Apple? I don't think anyone would have taken that bet when OpenAI blew up.
[+] jedberg | 2 years ago
Apple makes a ton of sense. They sell hardware (and services for that hardware). It would make sense that they want to commoditize foundation models while at the same time creating yet another reason to buy their most powerful hardware.

Meta is the real wildcard here. They're betting on "commoditize your complement" and hoping that by making GenAI training nearly free (by doing it for you), none of their competitors can use GenAI models to gain an advantage over them.

[+] frankfrank13 | 2 years ago
They do have quite a few open source projects. I don't think they would open source something like Llama, but something on this scale makes sense.
[+] barrkel | 2 years ago
The code repo this article is based on - https://github.com/apple/ml-mgie - is built around InstructPix2Pix (which is based on Stable Diffusion) and LLaVA (a multi-modal LLM that supports vision input).

It's research code, not a product. I don't think Apple would ever productionize anything with data lineage from Stable Diffusion.
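The two-stage design described above (a LLaVA-style multimodal LLM expands a terse user request into an expressive instruction, which then conditions an InstructPix2Pix-style diffusion editor) can be sketched in plain Python. All names below are illustrative stand-ins, not the actual ml-mgie API:

```python
# Hypothetical sketch of the MGIE-style pipeline; function names and the
# instruction template are invented for illustration, not Apple's code.

def expand_instruction(request: str) -> str:
    # Stand-in for the multimodal LLM: rewrite the terse request into an
    # explicit, visually grounded edit instruction for the diffusion model.
    return f"Edit the photo so that {request.rstrip('.')}."

def edit_image(image: bytes, request: str) -> bytes:
    # Stand-in for the diffusion editor: condition on the expanded
    # instruction and denoise toward the edited result.
    instruction = expand_instruction(request)
    # ... an InstructPix2Pix-style denoising loop would run here,
    # guided by `instruction` and the input pixels ...
    return image  # placeholder: a real editor returns edited pixels

edited = edit_image(b"raw-jpeg-bytes", "the sky looks more dramatic")
```

The point of the split is that the diffusion model never sees the user's vague wording directly; the MLLM acts as a translator from intent to an instruction the editor can follow.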

[+] devinprater | 2 years ago
This is going to be really cool: being able to have pictures described with AI and then edit them using text, like "underline this part of the screenshot," or something like that. Especially with LLaVA 1.6 and Apple's other model that understands spatial parts of images, this should be pretty possible for me, as a blind person, to do. So yeah, I think I'll wait for this September's iPhone.
[+] gcr | 2 years ago
The significant part of this, for me, is not the image model. It's that Apple rarely publishes in the vision field. Back at NeurIPS (2016, I want to say?) Apple held a workshop where they said "we're going to engage more with the academic community, we're going to publish more," but since then only a small number of CV projects have seen the academic light of day. I remember one on detection of flashing lights, a whitepaper/tech report on Hey Siri wakeword detection, some interesting synthetic image generation for iris recognition, and now this.

If I had to guess, I think Apple Research may have some institutional obstacles to getting research work out there. Perhaps this could indicate those walls are continuing to dissolve bit by bit?

[+] imjonse | 2 years ago
They have at least two recent papers (MobileOne and FastViT) with code and weights for low-latency (as measured on iPhone) vision backbones. Also AIM (Autoregressive Image Models) released last month.
[+] jahabrewer | 2 years ago
Important clarification from reading TFA for a second: image _editing_. I can't believe Apple would sully their image with actual generation.
[+] ericmcer | 2 years ago
It feels like Apple has been very quiet about their approach in some of the future spaces (self-driving, VR/AR, AI) but then they suddenly burst out with something like Vision Pro.

It seems like they are biding their time and avoiding the hype, knowing that they don't need to get there first. If they can produce a superior product, their brand will let them dominate those markets whenever they decide to release.

[+] rchaud | 2 years ago
> It seems like they are biding their time and avoiding the hype, knowing that they don't need to get there first

This idea that Apple gets to things late in order to "perfect them" is an artifact of the iPod/iPhone era. Nothing Apple has put out since then has met that bar.

Apple does things low-key these days because it has to. Remember the Apple Maps fiasco? A few memes were all it took to send everyone scurrying back to GMaps for a decade.

[+] gnicholas | 2 years ago
> If they can produce a superior product their brand will let them dominate those markets whenever they decide to release.

I anticipate that Apple's Pro devices (iPhone, MacBooks) will have beefier hardware designed specifically for this use case. My guess is they will sell a much higher proportion of Pro iPhones as a result. They'll probably also sell a lot more new devices overall, contra the trend toward slower upgrade cycles.

At the same time, I think third-party companies will find a way to do pretty decent on-device (and very good cloud-based) AI, so it will be a tradeoff in terms of speed and privacy. If you want the best speed and no data being shared with a cloud-based AI provider, the Pro devices will be aimed at you.

[+] JohnFen | 2 years ago
Apple has always been aware that "first mover advantage" is largely bullshit. "The pioneers get all the arrows." The sweet spot is to be second or third to market.
[+] rujkking | 2 years ago
Someone just noticed Apple's decades-old SOP: watch the aggregate fumble about and fail over and over, then take the few ideas that survive rapid, aimless iteration and add the last mile of polish.

A decade of crappy Palm, WinCE, and easily forgotten Java-based mobile devices came and went before the iPhone.

They know which software people will refuse to accept; it's the hardware experience that matters, and they'll wait until the hardware is there.

Modern hardware is responsible for AI, more reliable networks, and the rest of modern compute. It has little to do with the bloated software stacks we git pull into the data center.

Edit: Also consider that Intel and Apple don't just make a consumer device; they design an advanced manufacturing pipeline. That's the difference between print("hello world") and for index in array: print(contents of index).

That's the basis of their value, why they are propped up by government, and why software-focused startups will always just be pump-and-dump schemes.

[+] seydor | 2 years ago
Yet every review calls the Vision Pro a "work in progress". I don't buy this.

I also wonder why they broke with their naming scheme ("Pro" here isn't the Pro version of any smaller model).

[+] RyanHamilton | 2 years ago
What makes you think the Vision Pro is dominating? Unlike the iPhone etc., it has no distinguishing feature I know of.
[+] _ncyj | 2 years ago
I’m curious why so many AI papers consist mostly of people with Chinese-like names. I’m surprised there aren’t more witch hunts from the far right & skepticism that they can embed CCP propaganda into the models.
[+] Am4TIfIsER0ppos | 2 years ago
Why would we, the right-wing conspiracy nuts, need Chinese researchers to do that when there are so many communists in America and Europe who do it anyway?
[+] dosinga | 2 years ago
Interesting read, but then suddenly there's this passive-aggressive paragraph:

> Apple stock has taken a beating as of late, in part because analysts have loudly proclaimed that the company is behind Meta, Google, and Microsoft in generative AI implementation. It's not clear why this wasn't a problem when it wasn't first to a mobile phone, a tablet, a smartwatch, or a VR headset, but is with generative AI.

somebody is hurting

[+] system16 | 2 years ago
I also found that an odd statement. If you need a “why” analysts may not feel confident about Apple’s ability to deliver when it comes to AI, look no further than Siri and Apple Maps.
[+] pksebben | 2 years ago
I dunno, the tone may be what it is but I kinda see the message re: analysts and their forecasts. Beat the managers, buy the index, yaknow.
[+] gitfan86 | 2 years ago
No one is talking about how Joi is the killer app for the Vision Pro.