How The New York Times Uses Software to Recognize Members of Congress

[+] davidkuhta|7 years ago|reply

The example made me lol: "Mitch McConnell (red, almost certainly - confidence = 100.0)"

On that note, they could utilize the box color to match the party affiliation.

[+] danschumann|7 years ago|reply

No no no! Snapchat filters with a donkey or an elephant!! XD

[+] kingbirdy|7 years ago|reply

It seems like the box color would be used to differentiate if multiple congresspeople were in the shot

[+] gordon_freeman|7 years ago|reply

what about right-center or left-center? ;)

[+] nkassis|7 years ago|reply

I'm waiting for journalists to walk around with a Google Glass-type device to do this on the fly. Bonus: it could record what they see and hear for later use.

[+] baldeagle|7 years ago|reply

I always think the future of journalism will be something like in Garth Ennis's 'Transmetropolitan', where there are cameras (drones) everywhere, watching everything, and an escalating tension (and maybe a technical arms race) between those hiding / burying the signal and those trying to bring it to light. Consider this a recommendation for anyone looking for an inspiring (though somewhat adult) tale of near-ish future sci-fi.

[+] SurrealSoul|7 years ago|reply

I can't wait to live in that future, glasses that let you know if you bumped into anyone famous

[+] IncRnd|7 years ago|reply

They aren't that famous if you don't recognize them after bumping into them.

[+] rhacker|7 years ago|reply

I was hoping to read an article about NYTimes setting up video cameras outside of popular restaurants in DC and using ML to perform facial recognition on everyone to try to find members of Congress and well-known lobbyists. Oh well... it would be like TMZ-4-DC

[+] Maxious|7 years ago|reply

Why set up video cameras when people bring their own...

> Rachel Shorey found members of Congress at an event hosted by a SuperPAC by trawling through images found on social media and finding matches.

[+] ericsoderstrom|7 years ago|reply

The author says that training their own model would have been too hard due to lack of training data, but evidently Rekognition had sufficient training data to make it work? Why can't NYT use the same training set Rekognition uses? Does Amazon somehow have a secret non-public collection of celebrity photos?

[+] kevin_thibedeau|7 years ago|reply

It shouldn't take an intern too long to collect a representative set of Congress people and other high officials for training. Maintaining it would not be an undue burden. That would eliminate the false positive matches for all the unwanted celebs. Clearly Amazon's models aren't that great to begin with so there's little reason to stick with them.

Wrap it up into a simple native app and you can bypass the MMS BS. Even better, a sufficiently capable dev could integrate an open-source recognition library [1] to have it entirely implemented on the device.

[1] https://github.com/rudybrian/tuFace

[+] m_ke|7 years ago|reply

Rekognition crawled and annotated millions of images of different celebrities to train their face recognition model. Once you have an accurate model for a lot of classes it's much easier to add new ones with just a few samples.
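
To illustrate the point about adding new classes from just a few samples, here is a minimal sketch. It assumes a pretrained face model already turns each photo into an embedding vector; the vectors, the names `member_a` / `member_b`, the `Gallery` class, and the 0.8 threshold are all made up for illustration. This is not a description of how Rekognition works internally.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


class Gallery:
    """Hypothetical few-shot gallery: one averaged embedding per person."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold  # assumed cutoff, would need tuning
        self.people = {}

    def enroll(self, name, embeddings):
        # Adding a new person needs only a handful of embeddings;
        # the underlying model is not retrained.
        self.people[name] = centroid(embeddings)

    def identify(self, embedding):
        # Nearest enrolled centroid by cosine similarity.
        best_name, best_score = None, -1.0
        for name, ref in self.people.items():
            score = cosine(embedding, ref)
            if score > best_score:
                best_name, best_score = name, score
        if best_score < self.threshold:
            return None, best_score  # nobody enrolled is close enough
        return best_name, best_score


gallery = Gallery()
gallery.enroll("member_a", [[0.9, 0.1, 0.0], [1.0, 0.0, 0.1]])
gallery.enroll("member_b", [[0.0, 0.9, 0.2], [0.1, 1.0, 0.0]])
print(gallery.identify([0.95, 0.05, 0.05]))
```

The hard part (learning a good embedding) is what needs millions of images; enrolling one more face on top of it is just storing another vector.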

[+] AdmiralAsshat|7 years ago|reply

I can't wait to see how long it takes Congress to pass a law making it illegal to use facial recognition software on members of Congress.

(And no one else)

[+] 2RTZZSro|7 years ago|reply

Thankfully, only high capacity assault facial recognition software is likely to be banned as a result.

[+] Isamu|7 years ago|reply

So you should be able to send a selfie to this API and it will tell you which member of Congress you look most like.

[+] reaperducer|7 years ago|reply

Except in Illinois, where sending the data off device is illegal.

(See previous HN discussion)

[+] jonknee|7 years ago|reply

It would be fun to see which members are the most requested by NYT reporters.

[+] jeremyjbowers|7 years ago|reply

Oh, that is interesting. Also, hi Jon!

[+] otakucode|7 years ago|reply

I have wanted for a while to build a site which trained a machine learning system on the various data made available surrounding Congresspeople and information on members who were eventually found to be guilty of adultery or other similar crimes - then produce a score for every member of Congress rating how likely it is that they are cheating on their spouse, or taking bribes, or similar. Give them a sneak preview into the types of systems they are aiding and abetting in the creation of. I am uncertain whether it could be considered defamation to have a brainless machine learning system decide there's an 85% chance some random member of Congress is an adulterer. I don't actually believe that any such system could ever reach any reasonable level of actual effectiveness due to the fundamental complexities of human behavior and circumstance, but that's not stopping the law enforcement side of things from moving forward, so I don't see why it ought to stop the side trying to point out fundamental flaws in the strategy.

[+] cachemiss|7 years ago|reply

I've considered something like that, but instead of trying to figure out crimes, it would produce a score for bills.

A corruption score for bills, almost like a Facebook for bills: "This bill is friends with Exxon". It would figure out who spent the most getting the bill passed, and who they bought off to get it.

Just a simple thing for people to point to when they say things are corrupt. Granted in today's environment, that score would be 100% most of the time, but it would be interesting to have some idea just who bought the bill.

[+] toomuchtodo|7 years ago|reply

I’d take it a step further and ingest all public record data including using FOIA requests to find any behavior that could have a representative charged with a crime (fraud, bribery, etc).

As sibling comment said, don’t generate an adultery score. That’s not productive or decent. Find actual evidence of wrongdoing, not draconian scoring systems.

[+] smacktoward|7 years ago|reply

> adultery or other similar crimes

Adultery is not a crime.

(You can argue that it's an indicator of a person's character, or lack thereof, sure. But that's something different.)

[+] smt88|7 years ago|reply

It's disturbing to me that you're so focused on adultery, which isn't a crime in most places and is a personal matter for the couple involved. More than 70% of people cheat on a significant other at some point, so you'd be casting a wide net.

Why not instead look at real crimes like pay-for-play, fraud, sexual assault, etc.?

[+] deaps|7 years ago|reply

> I don't actually believe that any such system could ever reach any reasonable level of actual effectiveness due to the fundamental complexities of human behavior and circumstance...

Absolutely it could - that would all be factored into the percentage. Human behavior and chance encounters are the exact reason you could never say 0% or 100%, however.

[+] danso|7 years ago|reply

Which existing public data sets would you be using to train against?

[+] sbarker|7 years ago|reply

How about scoring them on how they really vote? "You say you're a Democrat but our party detector test says that is a lie!" 83/17

[+] dominotw|7 years ago|reply

> produce a score for every member of Congress rating how likely it is that they are cheating on their spouse

Sounds like a really mean spirited thing to do. They are people too.

[+] mlthoughts2018|7 years ago|reply

This is an embarrassingly bad approach to face recognition for a small set of frequently photographed people.

Several comments from the article give me concern:

- They seem to think Rekognition is a panacea for their problem, but there are many known issues with Rekognition celebrity detection. Not to mention that the cost-per-request is often highly unfavorable compared with building a higher-accuracy, situation-specific solution with extensions to pre-trained models.

- They say some interns took a “novel approach” by creating a hard coded look-up table for disambiguating similar politician-celebrity pairs. This creates awful tech debt and failure cases. I’m not knocking it too hard because it’s pragmatic, which is a good sign about those interns, but this should be seen as a necessary wart to be improved, not a point of pride.

- As others have pointed out, even considering turnover in Congress, it seems like people who report on Congress for their full time job should recognize them. It truly seems like a silly, wasteful use of resources to solve this with computer vision.
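
For illustration, the lookup-table approach from the second point might look roughly like the sketch below. It only post-processes a response dict; the keys `CelebrityFaces`, `Name`, and `MatchConfidence` follow the shape of Rekognition's RecognizeCelebrities response, while `KNOWN_MEMBERS`, `LOOKALIKE_OVERRIDES`, `members_in`, the placeholder names, and the 90.0 cutoff are hypothetical and not taken from the article.

```python
# Hypothetical allow-list of people we actually care about.
KNOWN_MEMBERS = {"Mitch McConnell", "Example Senator"}

# Hypothetical hard-coded table: celebrity the service tends to
# return -> politician it is usually confused with.
LOOKALIKE_OVERRIDES = {"Example Actor": "Example Senator"}


def members_in(response, min_confidence=90.0):
    """Return (name, confidence) pairs for known members only."""
    hits = []
    for face in response.get("CelebrityFaces", []):
        # Remap known lookalikes, otherwise keep the returned name.
        name = LOOKALIKE_OVERRIDES.get(face["Name"], face["Name"])
        if name in KNOWN_MEMBERS and face["MatchConfidence"] >= min_confidence:
            hits.append((name, face["MatchConfidence"]))
    return hits


# Made-up response in the same shape, for demonstration only.
sample = {
    "CelebrityFaces": [
        {"Name": "Mitch McConnell", "MatchConfidence": 100.0},
        {"Name": "Example Actor", "MatchConfidence": 95.0},
        {"Name": "Some Celebrity", "MatchConfidence": 99.0},
        {"Name": "Example Senator", "MatchConfidence": 60.0},
    ]
}
print(members_in(sample))
```

Every entry in the override table is a special case someone has to maintain by hand, and any real "Example Actor" in a photo gets silently relabeled, which is the tech debt and failure mode described above.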

This is all consistent with what I’ve heard from colleagues at NYT data science. As well as people I’ve known in data science bootcamps around New York, like Insight, who heard recruiting pitches.

Their department seems self-aggrandizing, using highly overwrought personalization models and seeming to have 538-envy for how they want their data science work to come off despite 538 exiting, among other important figures like Mike Bostock.

It just comes off as a place that wants to do status signalling to seem like a machine learning or data science thought-leader, but they don’t pay competitively or do what’s needed to retain good people and would rather do patchwork stuff like this with interns than to take the work a little more seriously.

I don’t get the impression it’s a place serious ML practitioners would want to go.

[+] smsm42|7 years ago|reply

Isn't this the same technology that would allow surveillance on every private citizen?

> Most recently, Rachel Shorey found members of Congress at an event hosted by a SuperPAC by trawling through images found on social media and finding matches.

I bet nothing in the technology says "member of Congress" or depends on the target being member of Congress. So anybody can mine social media and collect surveillance data on people. And that is probably already happening.

[+] asdsa5325|7 years ago|reply

TL;DR: They use an API from Amazon that's already trained for Congressmen.

[+] djhworld|7 years ago|reply

If anything this article doesn't reflect well on Rekognition

[+] DINKDINK|7 years ago|reply

>Nope, it’s too hard! Computer vision and face recognition are legitimately difficult computer science problems.

Someone is woefully ignorant of how good facial-recognition surveillance is.

[+] SmooL|7 years ago|reply

There's a difference between "difficult" and "can't be done". Yes, facial recognition has come a long way, but it's still non-trivial to set up a custom facial recognition service for your particular needs.

[+] evan_|7 years ago|reply

The obvious next step to this would be to build a mobile app with a built-in model to recognize everyone deemed important using live video from the camera.

[+] dqpb|7 years ago|reply

Cool. Maybe next they can tackle subscriptions without ads.

[+] rootsudo|7 years ago|reply

This reminds me of Casino Royale. Wow.

[+] forapurpose|7 years ago|reply

Hmmm ... your job is to cover the actions of 540 people elected to DC, many of whom you already recognize, and you can't remember what they look like? I'm not a journalist, but that seems like an essential thing to memorize, along with some minor metadata (locale, party, a bit of bio). Spend a weekend and do it.

Every profession has things you can look up and things you just have to memorize. 540 people isn't much - can sports journalists recognize 540 athletes? Otherwise you'll be in situations where you don't have an opportunity to look them up (e.g., can't get a photo, no time, etc.), and you'll have many false negatives: If you don't know what they look like, you won't realize it's a member of Congress at the party with the coke.

[+] danso|7 years ago|reply

As the article states up top, there's decent churn in Congress, making this more than a one-time or annual thing. Also, it's not just members of Congress who are important to cover in a beat, but their senior staff members and aides.

Spending a significant amount of time developing a process for face memorization and undertaking it would be an example of needless/premature optimization, especially for people who may be covering Congress tangentially. Most of a Congress reporter's job does not depend on having random encounters with members of Congress.

[+] jonas21|7 years ago|reply

> can sports journalists recognize 540 athletes?

Well... I don't know if that's a fair comparison. Members of Congress don't generally walk around with their names embroidered on their shirts (but, hey, that might be a good idea!)

[+] nlawalker|7 years ago|reply

This sounds like 2018's version of "you won't always have a calculator."

[+] ThrustVectoring|7 years ago|reply

Not all brains have an equivalent ability to recognize faces. Like, "face blindness" is a real thing.

127 comments