
The Django Project Debates User Tracking

82 points | miiiiiike | 9 years ago | lwn.net

50 comments

[+] czep|9 years ago|reply
Why does it not suffice to examine 'pip install Django' metrics from PyPI? That would be a reliable indicator of the relative popularity of the package against other packages in a level playing field.

While it would overcount the number of true installations of projects using Django, judging by the number of times I spin up a VM for testing, I would still argue that would be a better metric than a custom GA integration for which you'd have no relevant point of comparison. Even if they were to make this opt-out, what would they compare it to?

A: "Based on our custom GA developer tracking, we count 400,000 new Django projects this month."

B: "Django is the 4th most frequently installed third party Python package, based on the Python package index."

Personally I'd trust statement B more than A. No one can independently verify statement A.

[+] ubernostrum|9 years ago|reply
As others note, the issue with PyPI and other download metrics is there are tools which frequently download the full set of requirements for a project in order to run their tasks, and those downloads shouldn't be counted.
[+] folz|9 years ago|reply
I've seen many CI environments take forever because they download Django from PyPI and forget to cache it. Downloads != usage.
[+] yladiz|9 years ago|reply
Before reading the article I was very much against it. After reading it, it seems a bit more reasonable, but I would strongly prefer two things if this were ever implemented: 1) don't use Google. There must be services that will provide analytics pro-bono/really cheaply for open source/non-profits that aren't tied to a company with terrible privacy track records like Google. 2) make it abundantly clear that this will happen and explicitly give the opt-out instructions the first time (if it is indeed opt-out). As in, "We are enabling user tracking for better usage statistics. If you would like to opt out, please type <...>." I know that I rarely read changelogs, and that anything not presented to me at installation time would probably sneak in, whether or not that is the Django team's fault. But I'd worry that notifying users directly isn't easily possible through pip installations/setup.py.

However, would it be very useful statistically compared to the PyPI installation numbers? Sure, Python is different from NPM, because NPM almost always installs packages locally whereas Python installs globally by default. But the numbers must still be high, as Django is likely one of the most-installed packages on PyPI and in Python-land in general. And as czep points out, because they would only be tracking themselves, it would be hard to compare the numbers to anything. It would be useful from a total-count perspective, but of no use in comparing to other packages, because the kind of data would be different: Django would have usage statistics whereas PyPI has installation/download statistics.

I'm also surprised this is even necessary, since the main purpose is supposedly to be able to talk to potential investors for the DSF with concrete numbers. Is Django being familiar to basically every Python developer not enough? Before I'm more open to the proposal, I'd want to know whether investors have explicitly said they want usage data, rather than hear the nebulous idea that it may make it easier to raise money.

[+] forgotpwtomain|9 years ago|reply
> 1) don't use Google. There must be services that will provide analytics pro-bono/really cheaply for open source/non-profits that aren't tied to a company with terrible privacy track records like Google.

As an occasional Django user -- 100% on this. It's nothing difficult to store and persist some key-value pairs from a POST request, certainly doesn't require Google Analytics.
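For illustration, a minimal sketch of "store and persist some key-value pairs from a POST request", using only the Python standard library. The function names, the table layout, and the `command`/`version` fields are all made up for this example; nothing here comes from the Django proposal, and the HTTP handler that would pass the request body in is left out.

```python
import sqlite3
from urllib.parse import parse_qsl


def open_store(path=":memory:"):
    """Open (and create if needed) a one-table SQLite store for reported pairs."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events (key TEXT NOT NULL, value TEXT NOT NULL)"
    )
    return conn


def record_post(conn, body):
    """Parse a form-encoded POST body and persist each key-value pair."""
    pairs = parse_qsl(body, keep_blank_values=True)
    conn.executemany("INSERT INTO events (key, value) VALUES (?, ?)", pairs)
    conn.commit()
    return len(pairs)


def counts(conn, key):
    """Return {value: number of reports} for a single key."""
    rows = conn.execute(
        "SELECT value, COUNT(*) FROM events WHERE key = ? GROUP BY value", (key,)
    )
    return dict(rows.fetchall())
```

Something like `record_post(conn, "command=startproject&version=1.10")` followed by `counts(conn, "command")` would cover the basic "how many times was X reported" question without any third-party analytics service.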

[+] was_boring|9 years ago|reply
I use Django professionally, and if tracking usage helps guide development or attract sponsors to achieve higher quality -- I'm all for it.

There is a problem to be solved (how to make OSS sustainable), and I'm both interested in solving that problem and trying different approaches to solve it.

(edited for less use of the phrase "I'm all for it")

[+] ubernostrum|9 years ago|reply
Personally I'm not opposed to a popcon-style thing that just lets us estimate "X million people use Django". But it's increasingly looking like it's impossible to put together such a thing in a way that's both A) useful and B) not going to cause privacy issues.
[+] rtpg|9 years ago|reply
I want Django to reach success too, but are usage metrics that useful for a project of Django's size? Would Rails ever need to do this?

The information doesn't seem valuable given the context of this project.

It being GA is a bit bothersome, though it does extract a lot of useful info.

[+] msane|9 years ago|reply
Someone proposed tracking django developers using the django command line? What a ludicrous and creepy idea.

edit: why downvote? that's what it says:

> the developer commands: startproject, startapp, runserver

[+] yeukhon|9 years ago|reply
So even if we do have an accurate usage count, say 10 million, so what? What's the Foundation's plan to get funding?

I think they should run an annual campaign like Mozilla and Wikipedia. How the money is spent should be 100% transparent. I am not really sure why we need a Foundation. I get the hosting cost, and rewarding people to work on very difficult features and enhancements, but what else? Conference costs and scholarships? What else?

[+] rokosbasilisk|9 years ago|reply
I do not support user tracking. I'd fork it at that point.
[+] cyberpanther|9 years ago|reply
I use Django and don't mind being tracked if it helps development. However, the proposed tracking sounds like hit tracking, which doesn't give you any meaningful numbers, only trends. So I think tracking pip installs would give you the same trends.
[+] Walkman|9 years ago|reply
The best part:

"It is encouraging to see that a community can discuss such issues without heating up too much and shows great maturity for the Django project."

[+] daenney|9 years ago|reply
I agree though. The Django (and Python) community in my experience has been good at actually debating issues on their merits, and trying to keep their own feelings/opinions with no facts to back them up out of it. Of course this doesn't always work and there's always going to be some comments that don't follow those principles, especially with more controversial topics.
[+] icebraining|9 years ago|reply
jezdez' proposal seems to be rather reasonable: just force the user to explicitly select yes or no. That gets over the objection that people will be too lazy to opt in, since the effort is the same. And it removes another source of bias, which is the disabling of the tracking by redistributors like Debian, since the user does provide explicit permission.
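A rough sketch of what a forced yes/no choice could look like on the command line. The prompt wording and the `ask_consent` name are invented here; this is not taken from jezdez' actual proposal. The point is only that there is no default answer, so opting in and opting out cost the same effort.

```python
def ask_consent(read=input, write=print):
    """Keep asking until the user explicitly answers yes or no (no default)."""
    while True:
        write("Send anonymous usage statistics? Type 'yes' or 'no':")
        answer = read().strip().lower()
        if answer in ("yes", "y"):
            return True
        if answer in ("no", "n"):
            return False
        # An empty line or anything else is not treated as consent or refusal.
        write("Please answer 'yes' or 'no'.")
```

The `read` and `write` parameters are only there so the prompt can be exercised without a terminal; a non-interactive environment such as CI would still need some other way to answer, which this sketch does not address.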
[+] toyg|9 years ago|reply
If forced on screen with an honest message, people will just opt out in droves and make the numbers as useless as the PyPI-download ones.

This seems such a huge waste of time and effort. If they can't get funding by showing massive PyPI numbers, they won't get funding by showing massive startapp numbers.

[+] twsted|9 years ago|reply
The threat to add user tracking could be the best incentive for me to donate more to the project.
[+] Lazare|9 years ago|reply
I think this is a strong idea, and I don't see any issues with the proposed implementation using Google Analytics.

Certainly it seems more practical than any of the proposed alternatives suggested here. (Eg, micropayments. Come on, that's not even plausible...)

[+] smoyer|9 years ago|reply
I allow both Eclipse and Firefox DE to collect usage and bug information during my use of those systems ... I feel there are a few keys to making this decision for both platforms:

- I can opt out if I want to

- I can see what's sent if I want to

- The information is anonymized and aggregated

I would assume that Django developers would feel the same way as I do if there were these guarantees - that it's also in my interest for the software to improve.
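As a sketch of the first two points, assuming a hypothetical environment variable for opting out and a made-up two-field payload (neither is a real Django setting or part of the proposal):

```python
import json
import os

# Hypothetical variable name, used only for this example.
OPT_OUT_VAR = "EXAMPLE_NO_USAGE_STATS"


def build_report(command, version, environ=os.environ):
    """Return the payload that would be sent, or None if the user opted out."""
    if environ.get(OPT_OUT_VAR):
        return None
    # Only coarse fields: no paths, hostnames, or user names.
    return {"command": command, "version": version}


def preview(report):
    """Render the payload as JSON so the user can inspect what would be sent."""
    return json.dumps(report, sort_keys=True)
```

Aggregation would have to happen on the receiving side and is not shown here.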

[+] Rondom|9 years ago|reply
I think they did a very good job in discussing it openly instead of going the Homebrew way.
[+] toyg|9 years ago|reply
What if, instead of tracking, they added micropayments? Have a very simple way to donate $1 every time you run startapp or something like that, and boom, profit.
[+] cauterized|9 years ago|reply
Would you experiment with an unfamiliar framework for the first time if it cost $10?

How many $10 frameworks would you be willing to pay for if you didn't know you were going to use them?

Would you pay $10 to install django to spin up a new env to build a pluggable library for it that you intend to open source?

What about $10 to populate your environment each time you run a build on circleci?

[+] pryelluw|9 years ago|reply
Have they tested charging for Django? I'd pay a reasonable fee to use it. I mean, it's the least I could do (aside from donating sporadically).
[+] rantanplan|9 years ago|reply
But no one else would.

The very next minute a fork of a free version would ensue.

In an era when almost all similar frameworks are free, charging for it seems like a really bad idea for its future.

[+] JupiterMoon|9 years ago|reply
Oh well Django had a good run for me but I don't use spyware. I guess something similar can be built up using Flask.
[+] ris|9 years ago|reply
I'm really not sure posting to HN is what LWN subscriber links are for.
[+] DanBC|9 years ago|reply
https://news.ycombinator.com/item?id=5688151#5688887

> FWIW I (as the editor of LWN and the author of the article) do not mind the posting of this link. It has brought in 16,000 people (at last count), many of whom are probably unfamiliar with LWN. Some subscriptions have been sold in the process.

> Certainly I don't want large amounts of our content to be distributed this way, but an occasional posting that puts an LWN article at #1 on HN is going to do us far more good than harm.

> (That said, I do appreciate your concern!)

https://news.ycombinator.com/item?id=3793183#3793448

[+] cx1000|9 years ago|reply
Why was this downvoted? It's a reasonable concern when "LWN subscriber-only content" is in bold at the top.
[+] rcarmo|9 years ago|reply
I use Django quite a bit, and would immediately disable any such tracking mechanism, even going to the extent of maintaining my own fork if necessary.

Having this on tools (like brew) is sort of OK because you can disable it and not risk having it deployed to production. Having it on a library is senseless, risky in many regards and likely to get it banned from, say, public contracts.

It is also a likely hook for exploitation, but I'll need to see an implementation first. Which I sure hope won't happen.

[+] myf01d|9 years ago|reply
The problem is that Django itself as a framework, and Python as a slow infrastructure for it, are showing their age. I love Django, but it grows too restrictive as projects get more complicated (the ORM and template rendering, for example), not to mention the slow performance compared to newer languages like Go and Elixir, which is actually Python's responsibility, not Django's.

Django is a monolithic framework that wants to do everything, while there are good and even superior alternatives (SQLAlchemy, Jinja2, WTForms), which makes things harder for its developers.

[+] kirkdouglas|9 years ago|reply
You can easily replace the template engine and ORM in Django when your project becomes large. I've personally done this several times.
[+] wheelerwj|9 years ago|reply
i don't know what that has to do with embedding tracking in code.

if you're trying to make the point that funding is made more difficult and therefore exacerbates the analytics problem, sure. but isn't that out of scope?