You should look at liblinear and Vowpal Wabbit. The former gives you a super-fast regularized logistic regression (and many other things). The latter gives you online classification, and you can fake a probabilistic model.
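For concreteness, Vowpal Wabbit consumes a plain-text format and learns online, one example at a time. A rough sketch of what training data for this problem might look like (the feature names are invented for illustration, not KA's actual features):

```
# train.vw — one example per line: label (+1/-1), then "| name:value" features
1  | percent_correct:0.85 ema_short:0.9 ema_long:0.8
-1 | percent_correct:0.40 ema_short:0.3 ema_long:0.5
```

Training with `vw --loss_function logistic -d train.vw -f model.vw` optimizes the log-loss; the raw predictions are margins, which you can squash through a sigmoid to fake probabilities.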
On a related note, you're wasting clicks using A/B testing. I emailed you guys about using a better online method (a bandit algorithm) but never heard back from anyone. If that's of interest, drop me a line (noel at untyped dot com).
Update: It's occurred to me that you're using GAE, and so probably can't run C libraries like the above two projects. There is a Java library here: http://code.google.com/p/boxer-bayesian-regression/ If you're going to do per-user and per-exercise models you'll have many fewer data points to train your models on. You should consider sharing data between models, or use a model that will give some measure of uncertainty in its predictions. The Bayesian LR code I referenced above will give some measure of uncertainty. There is a stack of (really interesting) work on other methods that will also do this.
I have nothing to do with KA, but I am curious what you mean about A/B testing and bandit algorithms. Would you mind elaborating or sharing a link or two?
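Not the parent, but the gist: a fixed A/B test splits traffic 50/50 for the whole experiment, while a bandit algorithm shifts traffic toward the apparent winner as evidence comes in, so fewer users ever see the losing variant. A minimal epsilon-greedy sketch (the variant names and conversion rates are invented):

```python
import random

def epsilon_greedy(arms, pulls=10000, epsilon=0.1, seed=0):
    """Minimal epsilon-greedy bandit. `arms` maps variant name -> true
    conversion rate (unknown to the algorithm). With probability epsilon
    we explore a random arm; otherwise we play the arm with the best
    observed rate, so traffic drifts toward the winner over time."""
    rng = random.Random(seed)
    names = list(arms)
    counts = {a: 0 for a in names}
    wins = {a: 0 for a in names}

    def observed(a):
        # untried arms get an optimistic score so every arm is tried
        return wins[a] / counts[a] if counts[a] else float("inf")

    for _ in range(pulls):
        if rng.random() < epsilon:
            arm = rng.choice(names)          # explore
        else:
            arm = max(names, key=observed)   # exploit
        counts[arm] += 1
        if rng.random() < arms[arm]:
            wins[arm] += 1
    return counts

counts = epsilon_greedy({"old_banner": 0.05, "new_banner": 0.20})
```

After a few thousand pulls nearly all traffic goes to the better arm, which is the "wasted clicks" argument: a 50/50 split keeps sending half the users to the loser until someone manually ends the test.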
It's interesting how simple their feature set is. I imagine the two EMAs and the percent_correct are probably the most important inputs. A few other interesting ones might be the percent correct for, say, the past 10 or 20 questions (instead of explicitly cutting off the last 20 questions as they mention). They may also want to pick a different response variable, like the % of the next 10 questions the user gets right, instead of the probability of getting the next question right.
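For reference, both kinds of feature mentioned above are cheap to compute from the answer history. A sketch, with the smoothing constant guessed rather than taken from KA's code:

```python
def ema(outcomes, alpha=0.333):
    """Exponentially weighted moving average of 0/1 correctness; the
    most recent answers count the most. KA's post doesn't give the
    alphas for its two EMA features, so 0.333 is just a placeholder."""
    value = 0.0
    for correct in outcomes:
        value = alpha * correct + (1 - alpha) * value
    return value

recent = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]   # 1 = answered correctly
features = {
    "ema": ema(recent),
    "pct_correct_last_10": sum(recent[-10:]) / len(recent[-10:]),
}
```

The EMA never "forgets" old answers entirely, it just down-weights them, while a windowed percent-correct drops them abruptly; that is the main practical difference between the two features.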
Good stuff, but the thought occurs to me that what you really want to know is when the user has stopped learning - i.e., when proficiency stops increasing as a result of doing more problems - and how much each individual problem increases proficiency. But that would undoubtedly be more complicated.
Another interesting mechanism is what ChessTempo.com uses. Players and puzzles both have an Elo-like dynamic ranking. When a player beats a puzzle, the player's rank increases and the puzzle's rank decreases. This makes it easy to, over time, present players with puzzles that are of appropriate difficulty.
Another ingenious approach is taken by chesstempo.com, a chess training site. Just as in chess itself, where the ratings of players are determined by pairwise comparisons (games between players), they pair players up against problems. If the player solves the problem, the rating of the problem goes down and the rating of the player goes up. Players are given problems close to their ratings, which keeps everyone happy. I believe they use Glicko to track uncertainty in the rating.
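A minimal sketch of that update rule (this is plain Elo with a fixed K-factor; Glicko additionally tracks a rating deviation per player and puzzle, which is omitted here):

```python
def elo_expected(r_player, r_puzzle):
    """Expected score of the player against the puzzle under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_puzzle - r_player) / 400.0))

def elo_update(r_player, r_puzzle, solved, k=32):
    """Player and puzzle ratings move in opposite directions after an
    attempt. `solved` is 1 if the player solved the puzzle, 0 otherwise;
    k controls the step size."""
    expected = elo_expected(r_player, r_puzzle)
    delta = k * (solved - expected)
    return r_player + delta, r_puzzle - delta

p, q = elo_update(1500, 1500, solved=1)
```

Beating a much easier puzzle moves the ratings very little (expected score is already near 1), while beating a harder one moves them a lot, which is what drives the ratings toward equilibrium.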
Chapter 22 of David Barber's "Bayesian Reasoning and Machine Learning" (he makes it available online) does a nice (perhaps brief) job of explaining the progression through the Rasch model, the Bradley-Terry-Luce model and Elo.
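The connection back to the article's method is tight: the Rasch model is just a logistic regression whose only parameters are one ability per learner and one difficulty per item, which is why it sits at the start of the progression Barber describes. A one-function sketch:

```python
import math

def rasch_p_correct(ability, difficulty):
    """Rasch model: P(correct) = sigmoid(ability - difficulty).
    Equivalent to logistic regression with indicator features for
    'which learner' and 'which item'."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

p = rasch_p_correct(2.0, 0.0)   # a strong learner on an average item
```

Bradley-Terry replaces the item with a second player, and Elo is an online, approximate way of fitting the same kind of model.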
As an aside, the way they chesstempo generate the exercises is also cute. The tactical chess problems are positions taken from high level (human) games fed into a chess engine which identifies blunderous moves where there is a single distinctly best way to respond. The challenge is to find that best move. Because they are taken from real games, they have the appearance and feel of real positions, which is important; many people believe pattern recognition is an important part of chess mastery. Apparently they've built up nearly 40000 such tactical exercises.
I think we should note that logistic regression has been around forever, and should probably be considered property of "statistics", not "machine learning".
Iteratively solving for the model parameters using gradient descent is not standard practice in a statistics class; paying attention to the numerical methods behind logistic regression is a very CS kind of thing to do.
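A bare-bones version of that numerical route, assuming batch gradient descent on the log-loss (no regularization, fixed step size; a statistics package would typically use IRLS/Newton iterations instead):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit logistic regression by batch gradient descent.
    xs: rows of features, ys: 0/1 labels. Returns [bias, w1, ...]."""
    w = [0.0] * (len(xs[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(xs, ys):
            p = sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))
            err = p - y                      # gradient of log-loss w.r.t. z
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wi - lr * g / len(xs) for wi, g in zip(w, grad)]
    return w

# toy data: label is 1 exactly when the single feature is positive
w = fit_logistic([[-2.0], [-1.0], [1.0], [2.0]], [0, 0, 1, 1])
```

The statistics and the ML framing fit the same model; the CS contribution is mostly in how (and at what scale) the optimization is done.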
This is literally the first thing taught in both of the machine learning classes I've taken from Prof. Ng at Stanford, so maybe it's the application of logistic regression, more than the estimation technique itself, that makes it machine learning?
>This was a fairly large change that we, understandably, only wanted to deploy to a small subset of users. This was facilitated by Bengineer Kamen's GAE/Bingo split-testing framework for App Engine.
My question is how time dependency works in this case. I am trying to wrap my head around how a prediction engine would work when you're assessing students on the basis of not just past/current performance but also how much time they're taking to answer each question.
I think you can model for randomness (kids getting lucky while answering a question), but if you can somehow add time dependency to the model, then your predictions should be more accurate (of course, this is pure speculation).
Does anyone have a good model I can look at? Any help would be appreciated.
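One plausible (purely speculative) way to do it is to keep the logistic regression as-is and add time-derived features, for example the log-ratio of time taken to the exercise's typical solve time. All names and numbers below are made up for illustration:

```python
import math

def speed_feature(seconds_taken, typical_seconds):
    """Log-ratio of time taken to the exercise's typical solve time:
    negative = faster than typical, positive = slower.
    `typical_seconds` (say, a per-exercise median) is a hypothetical
    normalizer, not anything KA actually uses."""
    return math.log(seconds_taken / typical_seconds)

# the same correct answer produces different model inputs
fast_correct = [1.0, speed_feature(10, 30)]   # [was_correct, speed]
slow_correct = [1.0, speed_feature(90, 30)]
```

The model can then learn whether a fast correct answer predicts future success better than a slow one, without changing the training machinery at all.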
To address the problem that people don't do more problems after becoming proficient, consider forcing a randomized subset of users to solve one extra problem to acquire proficiency. Don't tell the users when this happens, though; just show the bar as not quite full.
It makes me wonder if the data set is available as well.
I think this method of A/B testing has some faults. I blogged about it in "A/B testing: Is Khan doing it wrong?" http://liveatthewitchtrials.blogspot.com/2011/09/ab-testing-...
and Allen Downey ran some simulations in "Repeated tests: how bad can it be?" http://allendowney.blogspot.com/2011/10/repeated-tests-how-b...
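The effect Downey simulates is easy to reproduce: run an A/A test (no true difference between variants), check significance after every batch, and stop at the first "significant" result. The false positive rate comes out well above the nominal 5%. A sketch with arbitrary numbers:

```python
import math
import random

def peeking_false_positive_rate(runs=500, peeks=20, batch=100, seed=42):
    """A/A test (both variants convert at exactly 0.5) where we run a
    two-proportion z-test at the 5% level after every batch and stop at
    the first 'significant' result. Repeated peeking inflates the false
    positive rate far past 5%."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(runs):
        a = b = n = 0
        for _ in range(peeks):
            for _ in range(batch):
                a += rng.random() < 0.5
                b += rng.random() < 0.5
            n += batch
            pooled = (a + b) / (2 * n)
            se = math.sqrt(2 * pooled * (1 - pooled) / n)
            if se > 0 and abs(a / n - b / n) / se > 1.96:
                false_positives += 1   # declared a difference that isn't there
                break
    return false_positives / runs

rate = peeking_false_positive_rate()
```

With 20 peeks the observed rate is typically several times the nominal 5%, which is the core of the "repeated tests" critique.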