top | item 16210834

Facebook open-sources Detectron

801 points | rmason | 8 years ago | research.fb.com

178 comments

[+] evmar|8 years ago|reply
Also noteworthy: the Apache 2 license, which includes a patent grant (unlike the previous Facebook licenses that have caused concern in the past).
[+] haeffin|8 years ago|reply
caffe2 (which this is built on) was switched from bsd+patents to Apache2 a while ago too.
[+] megous|8 years ago|reply
So is this the end of Google captchas asking for where the car/sign/whatever is? Will there be a final battle of AIs, where they will kill each other, and the unfettered access to websites over VPN/tor wins and laughs the last laugh?
[+] kurtisc|8 years ago|reply
For the car captchas, I've found actually clicking all the boxes with part of a car will always be a wrong answer (distinct from when it just makes you answer twice). Instead, you have to click on the squares that you know it thinks are cars.

This creates a twisted Turing test situation where, to prove you are a human, you have to pretend to be a machine's idea of what a human is.

[+] jraph|8 years ago|reply
Has someone ever tried to submit images from Google captchas to Google Images?

An answer like "This is definitely a sign" from Google Images would be funny.

[+] schrep|8 years ago|reply
This is some of the most advanced work out there - but CV is not "solved": most vision systems can only label about 1k categories of objects. So captchas can still easily be constructed that would fool these systems. That's part of why it is exciting to get this out there - others can help us improve it.
[+] kbenson|8 years ago|reply
Google doesn't even show a captcha if they have enough tracking info to verify you're a human, which is pretty simple for them if you don't clear all your cookies for a few days. I'm pretty sure Google thinks that if they have to show you a captcha they've failed, but along with that they don't feel the need to make the captchas particularly easy when they do have to show one.
[+] yohann305|8 years ago|reply
My overall feeling, as someone who wants to start getting into visual recognition, is that there are a bunch of great libraries/ecosystems to choose from, all with pros and cons, but I honestly don't want to make the wrong decision and end up stuck later on. Does anyone here have any advice on what I should use to have a camera (RPi) recognize most common objects and then add a layer where we can teach it specific objects, i.e. putting a name on a person or a pet? Thank you!
[+] newscracker|8 years ago|reply
This looks amazing from a computing point of view (and is an achievement of sorts), yet the confidence percentages are lower than what an average human could achieve (as in the case of CAPTCHA tests).

Meanwhile, I wonder about the human costs if systems like these are adopted for purposes they may be ill suited for, especially cases where their confidence scores are ignored (or mistakenly assumed to be 100% even when they're lower). Does anyone have reading material on this?

[+] franciscop|8 years ago|reply
Does anyone know an alternative that works on RaspberryPi? This states: "Detectron operators currently do not have CPU implementation; a GPU system is required."

Even low FPS (3-5) would be acceptable.

[+] fmntf|8 years ago|reply
With such deep networks, I think it would be hard to get 3-5 FPS even on an Intel iX, let alone the Raspberry Pi CPU!
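A rough back-of-envelope sketch of why (every constant below is an assumption, not a measurement - a Mask R-CNN-style model is on the order of tens of GFLOPs per image, while a Pi-class CPU sustains well under 1 GFLOPS):

```python
# Back-of-envelope estimate; every constant here is a rough assumption.
model_gflops_per_image = 50.0  # order of magnitude for a Mask R-CNN-style network
pi_cpu_gflops = 0.5            # assumed sustained throughput of a Raspberry Pi 3 CPU

seconds_per_image = model_gflops_per_image / pi_cpu_gflops
fps = 1.0 / seconds_per_image
print(f"~{seconds_per_image:.0f} s/image, ~{fps:.2f} FPS")
```

Even if these numbers are off by an order of magnitude in the Pi's favor, the gap to 3-5 FPS is clear.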
[+] kurtisc|8 years ago|reply
FWIW, a Pi does have a GPU with a full OpenGL ES implementation. However, Detectron requires NVIDIA CUDA.
[+] haeffin|8 years ago|reply
It's a bit disappointing: when caffe2 was released, it was stated that mobile is a big focus, but things like this don't support mobile (even though some of this was demoed by FB on a phone).
[+] eggie5|8 years ago|reply
I was running a NAS-based object detection model on my brand-new MacBook and it was taking about 30 s/image on an unoptimised TensorFlow build. I then tried a MobileNet model, which took about 3-4 s/image.
[+] dapreja|8 years ago|reply
Use your old Android phone.
[+] drdrey|8 years ago|reply
> Beyond research, a number of Facebook teams use this platform to train custom models for a variety of applications including augmented reality and community integrity.

Any idea what they mean by "community integrity"?

[+] readams|8 years ago|reply
detecting porn, presumably.
[+] eb0la|8 years ago|reply
Maybe you want to internally censor part of an image for privacy reasons.

For instance, in my country you cannot use or publish images of children without their parents' consent.

The fine for doing that is far higher than any benefit, even before counting the bad press.

[+] JepZ|8 years ago|reply
Does anybody know a way to run CUDA programs with the open-source driver (nouveau)?
[+] yorwba|8 years ago|reply
CUDA needs driver support to talk to the GPU, and since it is proprietary nVidia technology, the open-source driver can't support it. So either you run the nVidia driver or you have to use OpenCL.
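Concretely, CUDA user-space code has to load the proprietary `libcuda.so`, which ships with NVIDIA's driver and which nouveau does not provide. A minimal probe (a sketch, assuming a Linux system; the library name and `cuInit` entry point are from the CUDA driver API):

```python
import ctypes

def cuda_driver_available() -> bool:
    """Return True if the proprietary CUDA driver library is present and initializes."""
    try:
        # libcuda.so.1 is installed by NVIDIA's proprietary driver, not by nouveau.
        libcuda = ctypes.CDLL("libcuda.so.1")
    except OSError:
        return False  # nouveau-only (or GPU-less) system: no CUDA possible
    # cuInit(0) returns CUDA_SUCCESS (0) when the driver can reach a GPU.
    return libcuda.cuInit(0) == 0

print("CUDA driver usable:", cuda_driver_available())
```

On a nouveau-only machine this prints `False`, which is exactly the parent's point: without the proprietary driver there is nothing for CUDA to talk to.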
[+] skate22|8 years ago|reply
I had to specifically disable nouveau to get CUDA to install correctly on Ubuntu 16.04. You need an NVIDIA card and drivers that are new enough to run the later versions of CUDA.
[+] aaroninsf|8 years ago|reply
Can someone with GPUs and love in their hearts, bundle this with trained models in a Docker container?

(serious request... I got a cluster, and something like a million pictures; but no GPUs or time for another side project...)

[+] burningion|8 years ago|reply
I’ve started working on this, it seems the current Dockerfile for caffe2 doesn’t work out of the box because of a forced push.

Follow me on Twitter, and I’ll post it there when it’s finished. Same username as here.

* edit: I've put a pull request in that builds the Dockerfile for the GPU for now: https://github.com/facebookresearch/Detectron/pull/15

[+] yters|8 years ago|reply
Is there a class of math problem humans can solve but computers cannot? Then we could just use these problems as a guaranteed test instead of the current CAPTCHA arms race.
[+] m_ke|8 years ago|reply
There is no arms race. 99.9% of the time Google knows if you're a robot based on your browser state. They make you label the images because it's a free way to get training data.
[+] ibdf|8 years ago|reply
I was just looking into trying out YOLO. Does anyone know how the two compare?
[+] xiphias|8 years ago|reply
I'm happy that tech companies are open sourcing basic research all the time, and thinking a lot about what would have happened if large pharmacy companies did the same thing. I'm just hopeful that with new biotech companies the science behind curing people will get faster as well.
[+] visarga|8 years ago|reply
> thinking a lot about what would have happened if large pharmacy companies did the same thing

There is a company creating a 3d-printed chemical reactor. By downloading a schematic and buying some raw substances, you can create your own lab. It can be used to synthesise drugs in remote areas, such as on Mars, or to make generics for cheap. The exciting part is that the reactor schematic can be downloaded and shared easily. It can also make illegal drugs just as easily as 3d-printers can print guns.

http://www.sciencemag.org/news/2018/01/you-could-soon-be-man...

[+] ejstronge|8 years ago|reply
Unlike the case in tech, pharma basic research is far less important in advancing our knowledge when compared to academia. A good example comes from the last few blockbuster cancer therapies: CAR-T cells and checkpoint blockade both arose in academic labs.

Also, for drugs that do make it to market, efficacy and side effect information is published as a condition of drug approval, at least for new drugs.

Whether basic science research papers should be behind a paywall is a wholly separate issue, but the life science community largely shares its finished products. Indeed, there’s even a push to share early stage data, too.

[+] fjsolwmv|8 years ago|reply
Code is cheap. Training data is expensive.
[+] wazoox|8 years ago|reply
This is relying on proprietary CUDA technology. This doesn't qualify as Free Software to me.
[+] stmw|8 years ago|reply
This is great! I do wish this were written in something other than Python. What is the carbon footprint of all this computer vision, compute-intensive code still being run billions of times a day in Python? Someone should calculate...
[+] minimaxir|8 years ago|reply
The actual computationally-hard part of the code is run in the GPU using CUDA.
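A CPU-side illustration of the same principle (a sketch using NumPy as a stand-in for a CUDA kernel; the timings are machine-dependent): when the arithmetic happens in a compiled kernel, the interpreted Python layer only dispatches the call, so its overhead is negligible.

```python
import time
import math
import numpy as np

a = np.random.rand(1000, 1000)
b = np.random.rand(1000, 1000)

# Interpreted Python does the arithmetic itself: one bytecode loop per element.
t0 = time.perf_counter()
slow = sum(a[i, j] * b[i, j] for i in range(1000) for j in range(1000))
t_loop = time.perf_counter() - t0

# Python merely dispatches one call to a compiled kernel
# (NumPy here, CUDA in Detectron's case).
t0 = time.perf_counter()
fast = float((a * b).sum())
t_kernel = time.perf_counter() - t0

# Same result, orders of magnitude apart in time.
assert math.isclose(slow, fast, rel_tol=1e-9)
print(f"loop: {t_loop:.3f}s  kernel: {t_kernel:.4f}s")
```

The same argument applies to Detectron: the Python in the repo is glue around GPU kernels, so rewriting it in C++ would barely move the needle on compute cost.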
[+] inlined|8 years ago|reply
What was the carbon footprint of the Mechanical Turk workers the Python can replace?
[+] stmw|8 years ago|reply
It is funny to see this comment at "-4" already... What's so offensive? After all, Facebook has RocksDB in C++, Presto in Java, and a PHP-to-C++ compiler, so they clearly have both the belief and the skill to move performance-sensitive code away from interpreted languages.