top | item 16210834

Facebook open-sources Detectron

801 points | rmason | 8 years ago | research.fb.com

178 comments

[+] evmar|8 years ago|reply
Also noteworthy: the Apache 2 license, which includes a patent grant (unlike the previous Facebook licenses that have caused concern in the past).
[+] haeffin|8 years ago|reply
caffe2 (which this is built on) was switched from bsd+patents to Apache2 a while ago too.
[+] megous|8 years ago|reply
So is this the end of Google captchas asking for where the car/sign/whatever is? Will there be a final battle of AIs, where they will kill each other, and the unfettered access to websites over VPN/tor wins and laughs the last laugh?
[+] kurtisc|8 years ago|reply
For the car captchas, I've found actually clicking all the boxes with part of a car will always be a wrong answer (distinct from when it just makes you answer twice). Instead, you have to click on the squares that you know it thinks are cars.

This creates a twisted Turing test situation where, to prove you are a human, you have to pretend to be a machine's idea of what a human is.

[+] jraph|8 years ago|reply
Has someone ever tried to submit images from Google captchas to Google Images?

An answer like "This is definitely a sign" from Google Images would be funny.

[+] schrep|8 years ago|reply
This is some of the most advanced work out there - but CV is not "solved": most vision systems can only label about 1k categories of objects. So captchas can still easily be constructed that would fool these systems. That's part of why it is exciting to get this out there - others can help us improve it.
[+] kbenson|8 years ago|reply
Google doesn't even show a captcha if they have enough tracking info to verify you're a human, which is pretty simple for them if you don't clear all your cookies for a few days. I'm pretty sure Google thinks that if they have to show you a captcha they've failed, but along with that they don't feel the need to make the captchas particularly easy when they do have to show one.
[+] yohann305|8 years ago|reply
My overall feeling, as someone who wants to start getting into visual recognition, is that there are a bunch of great libraries/ecosystems to choose from, all with pros and cons, but I honestly don't want to make the wrong decision and end up stuck later on. Does anyone here have any advice on what I should use to have a camera (RPi) recognize most common objects and then add a layer where we can teach it specific objects, i.e. putting a name on a person or a pet? Thank you!
[+] newscracker|8 years ago|reply
This looks amazing from a computing point of view (and is an achievement of sorts), yet the confidence percentages are lower than what an average human could achieve (as in the case of CAPTCHA tests).

Meanwhile, I wonder about the human costs if systems like these are adopted for purposes they may be ill suited for, especially cases where their confidence scores are ignored (or mistakenly assumed to be 100% even when they're lower). Does anyone have reading material on this?

[+] franciscop|8 years ago|reply
Does anyone know an alternative that works on RaspberryPi? This states: "Detectron operators currently do not have CPU implementation; a GPU system is required."

Even low FPS (3-5) would be acceptable.

[+] fmntf|8 years ago|reply
With such deep networks, I think it would be hard to get 3-5 FPS even on an Intel iX, let alone the Raspberry Pi CPU!
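A rough back-of-envelope sketch of why (every constant below is an assumption, not a measurement - a Mask R-CNN-style model is on the order of tens of GFLOPs per image, while a Pi-class CPU sustains well under 1 GFLOPS):

```python
# Back-of-envelope estimate; every constant here is a rough assumption.
model_gflops_per_image = 50.0  # order of magnitude for a Mask R-CNN-style network
pi_cpu_gflops = 0.5            # assumed sustained throughput of a Raspberry Pi 3 CPU

seconds_per_image = model_gflops_per_image / pi_cpu_gflops
fps = 1.0 / seconds_per_image
print(f"~{seconds_per_image:.0f} s/image, ~{fps:.2f} FPS")
```

Even if these numbers are off by an order of magnitude in the Pi's favor, the gap to 3-5 FPS is clear.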
[+] kurtisc|8 years ago|reply
FWIW, a Pi does have a GPU with a full OpenGL ES implementation. However, Detectron requires NVIDIA CUDA.
[+] haeffin|8 years ago|reply
It's a bit disappointing: when caffe2 was released, it was stated that mobile is a big focus, but things like this don't support mobile (even though some of this was demoed by FB on a phone).
[+] eggie5|8 years ago|reply
I was running a NAS-based object detection model on my brand-new MacBook and it was taking about 30 s/image on an unoptimised TensorFlow build. I then tried a MobileNet model, which took about 3-4 s/image.
[+] dapreja|8 years ago|reply
Use your old Android phone.
[+] drdrey|8 years ago|reply
> Beyond research, a number of Facebook teams use this platform to train custom models for a variety of applications including augmented reality and community integrity.

Any idea what they mean by "community integrity"?

[+] readams|8 years ago|reply
detecting porn, presumably.
[+] eb0la|8 years ago|reply
Maybe you want to internally censor part of an image for privacy reasons.

For instance, in my country you cannot use or publish images of children without their parents' consent.

The fine for doing that is far higher than any benefit, even before counting the bad press.

[+] JepZ|8 years ago|reply
Does anybody know a way to run CUDA programs with the open-source driver (nouveau)?
[+] yorwba|8 years ago|reply
CUDA needs driver support to talk to the GPU, and since it is proprietary nVidia technology, the open-source driver can't support it. So either you run the nVidia driver or you have to use OpenCL.
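Concretely, CUDA user-space code has to load the proprietary `libcuda.so`, which ships with NVIDIA's driver and which nouveau does not provide. A minimal probe (a sketch, assuming a Linux system; the library name and `cuInit` entry point are from the CUDA driver API):

```python
import ctypes

def cuda_driver_available() -> bool:
    """Return True if the proprietary CUDA driver library is present and initializes."""
    try:
        # libcuda.so.1 is installed by NVIDIA's proprietary driver, not by nouveau.
        libcuda = ctypes.CDLL("libcuda.so.1")
    except OSError:
        return False  # nouveau-only (or GPU-less) system: no CUDA possible
    # cuInit(0) returns CUDA_SUCCESS (0) when the driver can reach a GPU.
    return libcuda.cuInit(0) == 0

print("CUDA driver usable:", cuda_driver_available())
```

On a nouveau-only machine this prints `False`, which is exactly the parent's point: without the proprietary driver there is nothing for CUDA to talk to.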
[+] skate22|8 years ago|reply
I had to specifically disable nouveau to get CUDA to install correctly on Ubuntu 16.04. You need an NVIDIA card and drivers that are new enough to run the later versions of CUDA.
[+] aaroninsf|8 years ago|reply
Can someone with GPUs and love in their hearts, bundle this with trained models in a Docker container?

(serious request... I got a cluster, and something like a million pictures; but no GPUs or time for another side project...)

[+] burningion|8 years ago|reply
I’ve started working on this, it seems the current Dockerfile for caffe2 doesn’t work out of the box because of a forced push.

Follow me on Twitter, and I’ll post it there when it’s finished. Same username as here.

* edit: I've put a pull request in that builds the Dockerfile for the GPU for now: https://github.com/facebookresearch/Detectron/pull/15

[+] yters|8 years ago|reply
Is there a class of math problem humans can solve but computers cannot? Then we could just use these problems as a guaranteed test instead of the current CAPTCHA arms race.
[+] m_ke|8 years ago|reply
There is no arms race. 99.9% of the time Google knows if you're a robot based on your browser state. They make you label the images because it's a free way to get training data.
[+] ibdf|8 years ago|reply
I was just looking into trying out YOLO. Does anyone know how the two compare?
[+] xiphias|8 years ago|reply
I'm happy that tech companies are open sourcing basic research all the time, and thinking a lot about what would have happened if large pharmacy companies did the same thing. I'm just hopeful that with new biotech companies the science behind curing people will get faster as well.
[+] visarga|8 years ago|reply
> thinking a lot about what would have happened if large pharmacy companies did the same thing

There is a company creating a 3d-printed chemical reactor. By downloading a schematic and buying some raw substances, you can create your own lab. It can be used to synthesise drugs in remote areas, such as on Mars, or to make generics for cheap. The exciting part is that the reactor schematic can be downloaded and shared easily. It can also make illegal drugs just as easily as 3d-printers can print guns.

http://www.sciencemag.org/news/2018/01/you-could-soon-be-man...

[+] ejstronge|8 years ago|reply
Unlike the case in tech, pharma basic research is far less important in advancing our knowledge when compared to academia. A good example comes from the last few blockbuster cancer therapies: CAR-T cells and checkpoint blockade both arose in academic labs.

Also, for drugs that do make it to market, efficacy and side effect information is published as a condition of drug approval, at least for new drugs.

Whether basic science research papers should be behind a paywall is a wholly separate issue, but the life science community largely shares its finished products. Indeed, there’s even a push to share early stage data, too.

[+] fjsolwmv|8 years ago|reply
Code is cheap. Training data is expensive.
[+] wazoox|8 years ago|reply
This is relying on proprietary CUDA technology. This doesn't qualify as Free Software to me.
[+] stmw|8 years ago|reply
This is great! I do wish this were written in something other than Python. What is the carbon footprint of all this computer vision, compute-intensive code still being run billions of times a day in Python? Someone should calculate...
[+] minimaxir|8 years ago|reply
The actual computationally-hard part of the code is run in the GPU using CUDA.
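A CPU-side illustration of the same principle (a sketch using NumPy as a stand-in for a CUDA kernel; the timings are machine-dependent): when the arithmetic happens in a compiled kernel, the interpreted Python layer only dispatches the call, so its overhead is negligible.

```python
import time
import math
import numpy as np

a = np.random.rand(1000, 1000)
b = np.random.rand(1000, 1000)

# Interpreted Python does the arithmetic itself: one bytecode loop per element.
t0 = time.perf_counter()
slow = sum(a[i, j] * b[i, j] for i in range(1000) for j in range(1000))
t_loop = time.perf_counter() - t0

# Python merely dispatches one call to a compiled kernel
# (NumPy here, CUDA in Detectron's case).
t0 = time.perf_counter()
fast = float((a * b).sum())
t_kernel = time.perf_counter() - t0

# Same result, orders of magnitude apart in time.
assert math.isclose(slow, fast, rel_tol=1e-9)
print(f"loop: {t_loop:.3f}s  kernel: {t_kernel:.4f}s")
```

The same argument applies to Detectron: the Python in the repo is glue around GPU kernels, so rewriting it in C++ would barely move the needle on compute cost.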
[+] inlined|8 years ago|reply
What was the carbon footprint of the Mechanical Turk workers the Python can replace?
[+] stmw|8 years ago|reply
It is funny to see this comment at "-4" already... What's so offensive? After all, Facebook has RocksDB in C++, Presto in Java, and a PHP-to-C++ compiler, so they clearly have both the belief and the skill to move performance-sensitive code away from interpreted languages.