naftaliharris | 6 years ago | on: Tauthon: Fork of Python 2.7 with new syntax, builtins, libraries from Python 3
naftaliharris's comments
naftaliharris | 6 years ago | on: Ask HN: Who is hiring? (July 2019)
SentiLink prevents synthetic fraud, an emerging fraud vector in which fraudsters open accounts using name/DOB/SSN combinations that don't correspond to real people. Our partners include top ten US banks, fintechs, and alternative lenders. We're backed by investors including Andreessen Horowitz, Max Levchin (Affirm CEO/PayPal co-founder), and former presidents/CEOs of Visa, TransUnion, HSBC, and Citi.
We recently closed a $14M Series A [1] and are hiring software engineers to help us build our identity platform. Our tech stack uses Go (for the API part) and Python (for the ML part) on k8s and the work involves a lot of complex and sensitive data.
Please apply at https://jobs.lever.co/sentilink.
[1] https://businessinsider.com/synthetic-fraud-detection-startu...
naftaliharris | 7 years ago | on: Ask HN: Who is hiring? (May 2019)
SentiLink prevents synthetic fraud, an emerging fraud vector in which fraudsters open accounts using name/DOB/SSN combinations that don't correspond to real people. Our partners include top ten US banks, fintechs, and alternative lenders. We're backed by investors including Andreessen Horowitz, Max Levchin (Affirm CEO/PayPal co-founder), and former presidents/CEOs of Visa, TransUnion, HSBC, and Citi.
We recently closed our Series A [1] and are hiring software engineers to help us build our identity platform. Our tech stack uses Go (for the API part) and Python (for the ML part) on k8s and the work involves a lot of complex and sensitive data.
Please apply at https://angel.co/sentilink/jobs or shoot a resume/github/linkedin to me, (my first name at a domain I'm sure you can guess).
[1] https://businessinsider.com/synthetic-fraud-detection-startu...
naftaliharris | 8 years ago | on: The New ID Theft: Millions of Credit Applicants Who Don't Exist
1. Unlike with ID theft, there's no consumer victim. With ID theft, eventually the victim will find out about it (by getting a call from a collections agent or seeing the trade on their credit report). They'll then contact the lender or the bureau and contest the validity of the loan. The end result is that the lender gets a stream of loans that are labeled as identity theft losses. Since there's no consumer victim with synthetic fraud, though, lenders don't get this stream of labeled data and have a hard time knowing which of their losses are synthetic fraud (and which are just ordinary credit losses).
2. Synthetic fraud cuts right through typical ID theft prevention systems. ID theft prevention is about checking whether the applicant is the same as the identity they're using to apply for credit. So you check if the email the applicant uses matches the identity (e.g., you don't want [email protected] used as the email for Jane Smith), you check the phone number, you check if the applicant can complete KBA (knowledge-based authentication, e.g. questions about previous addresses), you check the billing address, and so forth. But synthetic identities have their own aged phone numbers, emails, addresses, and credit histories, so all of these verifications go through without any flags raised. Essentially, the ID theft prevention system checks whether the applicant is the same as the identity they're using, but with synthetic fraud the applicant created the identity.
Source: my startup focuses heavily on preventing synthetic fraud for lenders (PM me for details).
naftaliharris | 8 years ago | on: Ask HN: Who is hiring? (March 2018)
SentiLink is reinventing identity, beginning with financial services in the United States. The current system is broken: SSNs are used as both a username and a password, but after repeated data breaches are also effectively semi-public. Identity-verification data isn't shared, so the same fraudsters target every company and consumers have to continually reverify themselves with different institutions. Billions of dollars are lost every year to criminals who are very rarely caught or punished. SentiLink is building the arbiter of identity to bring identity into the 21st century.
Our investors include former co-founders and C-level execs at PayPal, Palantir, Affirm, Visa, and Citibank, including Max Levchin (SciFi) and Hans Morris (Nyca Partners).
Apply here: https://angel.co/sentilink/jobs or email me (first name at sentilink.com).
naftaliharris | 8 years ago | on: As Computer Coding Classes Swell, So Does Cheating
That said, I've got to imagine that claims that "as many as 20 percent of the students in one 2015 computer science course were flagged for possible cheating" are a misrepresentation or a misunderstanding on the part of the journalist. I mean, sure, if you set the threshold for the plagiarism detector at a low level, you can flag 20%, 50%, or however many students you want for "possible cheating", but that doesn't mean the cheating is real.
naftaliharris | 8 years ago | on: Show HN: Velo.com – a marketplace for used bicycles
naftaliharris | 9 years ago | on: Paradoxes of probability and other statistical strangeness
naftaliharris | 9 years ago | on: Paradoxes of probability and other statistical strangeness
Even crazier, the James-Stein Estimator [1], which does this, actually uses data about the football player and soccer player to make predictions about the baseball player (and vice versa). This is deeply unintuitive to most people, since the players aren't related to each other at all. The phenomenon only holds with at least three players; it doesn't work for two.
(More generally, Stein's Paradox is the fact that if you have p >= 3 independent Gaussians with a known variance, you can do better in estimating their p-dimensional mean than just using their sample means).
I've spent a bunch of time trying to understand why this actually works [2]; to be honest, I still don't deeply understand it. But nonetheless the consensus is that the same shrinkage phenomenon is what causes the improved performance of a variety of high-dimensional estimators (e.g., lasso or ridge regression), making the paradox very influential.
[1] https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator
[2] https://www.naftaliharris.com/blog/steinviz/
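To make the shrinkage concrete, here is a minimal sketch of the James-Stein estimator in plain Python. It assumes a known common variance and shrinks toward the origin; the function name and the omitted positive-part correction are my own simplifications:

```python
def james_stein(x, sigma2=1.0):
    """Shrink a vector of observed means toward the origin.

    x: a list of p >= 3 observations, one per independent Gaussian mean;
    sigma2: the known common variance.
    """
    p = len(x)
    assert p >= 3, "Stein's paradox only holds for p >= 3"
    norm_sq = sum(v * v for v in x)
    # Each component is pulled toward 0 by the same data-dependent factor.
    shrinkage = 1.0 - (p - 2) * sigma2 / norm_sq
    return [shrinkage * v for v in x]

# The baseball, football, and soccer observations jointly determine
# every estimate, even though the players are unrelated:
print(james_stein([2.0, 2.0, 2.0]))  # each 2.0 shrinks by a factor of 11/12
```

Note how the shrinkage factor depends on all three observations at once, which is exactly the unintuitive coupling described above.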
naftaliharris | 9 years ago | on: Software Engineer Starts Unlikely Business: A Weekly Newspaper
So I expect that a big factor in whether quality local newspapers can survive is the strength of the local housing market (measured through, e.g., median house price and yearly volume). As a practical matter, this means that local news is financially feasible only in relatively affluent places (although the housing market isn't the only reason why that's the case). It also means that more people searching for property online may present a challenge for local news.
[1] http://www.paloaltoonline.com/morguepdf/2017/2017_03_24.paw....
naftaliharris | 9 years ago | on: H&R Block and Intuit Are Lobbying Against Making Tax Filing Free and Easy
naftaliharris | 9 years ago | on: Ask HN: Have you created a programming language and why?
It lets people with Python 2 code start to use new features from Python 3. (It's a backwards-compatible fork of Python 2.7 with features like async/await, function annotations, and keyword-only arguments backported from Python 3).
naftaliharris | 9 years ago | on: Python 2.8?
> Backporting features will be hard
It's actually pretty straightforward: I just find the relevant changes in the Python 3 history, and apply them to Python 2. Usually a handful of things have changed between 2 and 3, so I typically can't just pipe the diff to "git apply -3", but frankly backporting features is more tedious than difficult. If you're interested, for example, here's my recent implementation of the "nonlocal" keyword; you can see the commit messages reference the Python 3 commits: https://github.com/naftaliharris/placeholder/pull/60
> Why try to create an inferior python 3?
The ultimate goal is to build an interpreter that can run both Python 2 and 3 code. Unfortunately, there is some code that runs and has different behavior under Python 2 and Python 3 (e.g., 'print("a", "b")' ), so anyone who wants to write an interpreter that can run both kinds of code will need to decide what to do there. I decided to defer to Python 2 behavior in those cases, since most of my code is in Python 2 and I don't want to change it. :-)
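The 'print("a", "b")' ambiguity mentioned above can be shown concretely. This sketch runs under Python 3, with the Python 2 behavior reproduced via `repr` for comparison:

```python
# The same source line parses under both interpreters but means different
# things: in Python 2, print ("a", "b") is the print *statement* applied
# to a tuple; in Python 3, it is a call to the print *function*.
py3_output = "{} {}".format("a", "b")  # what Python 3's print("a", "b") emits
py2_output = repr(("a", "b"))          # what Python 2's print ("a", "b") emits
print(py3_output)
print(py2_output)
```

An interpreter accepting both dialects has to pick one of these two behaviors for that line, which is the design decision described above.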
naftaliharris | 9 years ago | on: Library-managed 'arXiv' spreads scientific advances rapidly and worldwide
naftaliharris | 9 years ago | on: Why I'm Making Python 2.8
When Python 3 was released, it offered Python users a trade: In exchange for a productivity loss (porting your Python 2 code), you'd get a productivity gain (new features in Python 3 and removed cruft). Some projects and companies thought this was a good trade and have upgraded over the years; many did not, and still haven't. The interpreter I've been working on tries to improve on the terms of that deal for people who have not switched to Python 3.
> What a terrible, terrible situation. Now you'll have "python" code that will neither run on 2.7 nor run compliantly on 3.x.
That's the point, yes. Obviously any interpreter that's backwards compatible with 2.7 but includes new features from 3.x is going to let people write code that doesn't run under 2.7 or 3.x. But what does it matter if your code doesn't run under interpreters that you aren't using and don't intend to use?
> Just call it anything else
I'll change the name.
naftaliharris | 9 years ago | on: Why I'm Making Python 2.8
It is possible actually, that's kind of the point! The interpreter I've been working on passes the 2.7 unit tests (i.e., those in Lib/test/), as well as unit tests for the new features that have been backported from Python 3.
Even if you don't believe me, it's interesting to note that, e.g., while Python 3.0 was being developed, function annotations and keyword-only arguments coexisted with tuple unpacking. I built the code and ran it myself, in fact: https://twitter.com/naftaliharris/status/784421498291310592. Tuple unpacking was actually removed later, introducing the backwards incompatibility after the new functionality had been added. Timeline:
Oct 2006, keyword-only arguments.
Dec 2006, function annotations.
Mar 2007, removing tuple unpacking.
There was also a promising backport of keyword-only arguments to CPython 2.6 (!) that was never merged (http://bugs.python.org/issue1745) due to lack of follow-through.
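For readers who haven't seen them, here is what the two backported features from that timeline look like in use. This is a Python 3 sketch with names of my own choosing; the feature that was removed, tuple unpacking in parameters, appears only as a comment because it is a syntax error in Python 3:

```python
# Keyword-only arguments: everything after the bare `*` must be passed by
# keyword. Function annotations: the metadata after `:` and `->`.
def scale(values: "list of numbers", *, factor: float = 2.0) -> list:
    return [v * factor for v in values]

print(scale([1, 2], factor=3.0))  # [3.0, 6.0]
# scale([1, 2], 3.0) would raise TypeError: factor is keyword-only.

# Tuple unpacking in parameters, removed in Mar 2007 for Python 3.0:
#     def add((a, b)): return a + b   # valid Python 2, SyntaxError in Python 3
```

As the timeline shows, all three constructs briefly coexisted in the 3.0 development tree before tuple unpacking was removed.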
naftaliharris | 9 years ago | on: Why I'm Making Python 2.8
A lot of people here have strong opinions about the name "Python 2.8". I don't mind changing it and intend to do so (https://github.com/naftaliharris/python2.8/issues/47). I picked it initially because, when talking with friends about this project, it conveyed pretty darn immediately what the project is and does. I'd be very keen to hear people's suggestions for alternate names!
For those of you with 2.7 codebases or projects, I'd be extremely interested in hearing about whether you were able to get this interpreter to run your code. Personally, the biggest challenges I've had so far are with dependencies that check for `sys.version_info[:2] == (2, 7)` as opposed to something like `sys.version_info[0] < 3`. But I'd be very interested in other people's experiences, particularly with larger codebases.
[1] A minor and somewhat pedantic point: The interpreter I've been working on includes PEP 515 (underscores in numeric literals), which is new in 3.6. I didn't think it was right for me to "take credit" for this new feature before it was even out in Python 3.6. Obviously, the real credit for this feature existing (in 3.6 or in any interpreter) goes to the CPython core devs, and especially Georg Brandl.
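The difference between the two version checks mentioned above can be seen with a small sketch (the helper name is mine):

```python
def accepts(version_info):
    """Return (exact_check, major_check) for a given sys.version_info tuple."""
    exact = version_info[:2] == (2, 7)  # brittle: rejects a 2.8-style release
    major = version_info[0] < 3         # robust: accepts every 2.x interpreter
    return exact, major

print(accepts((2, 7, 15)))  # (True, True)
print(accepts((2, 8, 0)))   # (False, True): the exact pin wrongly rejects it
```

Dependencies that pin the minor version exactly will refuse to run on a 2.7-compatible interpreter that reports a different minor version, even though nothing they rely on has changed.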
naftaliharris | 9 years ago | on: Ask HN: What are the best resources to learn Python for Data Analysis
naftaliharris | 10 years ago | on: Visualizing K-Means equilibria
naftaliharris | 10 years ago | on: Ask HN: What alternative to find and xargs do you use?
It was a fun project; I learned a lot about how the CPython implementation works and have a lot of respect for the people who built it. It was surprisingly easy to implement Tauthon based on the work the core dev team did on Python 3: https://www.naftaliharris.com/blog/nonlocal/
For what it's worth, I do believe that Python 3 is a better language than Python 2. We use Python 3.7 at my work (SentiLink) and we've had a good experience with it. (If you're starting a new project or can migrate, I'd recommend it.) But I do think that the ~10-year saga of upgrading from Python 2 to Python 3 wasn't necessary when the main benefit was really the Unicode refactoring.
I no longer maintain Tauthon personally but there are others who are excited about the project who occasionally add new features or bugfixes.