abelbeepboop's comments

abelbeepboop | 10 years ago | on: Automatic bug-repair system fixes 10 times as many errors as its predecessors

It's either an SVM or multinomial logistic regression, but the bulk of the work is not the choice or design of the model; it's structuring the data in a meaningful way.

The main issues in many interesting ML projects are 'how do you feed the data to the model' and 'how do you determine its success/failure'.

For instance: you could try to train a Pacman AI by feeding the raw game state into some model and asking for an up/down/left/right output. But a lower bound on the number of game states is 2^(number of dots possible on level), since each dot is either eaten or not. That makes it impractical, if not impossible, to store such a model in memory, much less train it.
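To put a number on that bound, here's a quick back-of-the-envelope check. The dot count of 240 is my assumption for illustration, not something from the paper or the article:

```python
# Each dot is either eaten or still on the board, so dots alone give 2^n states.
n_dots = 240  # assumed dot count for one level; swap in your own

lower_bound = 2 ** n_dots          # Python ints are arbitrary precision
digits = len(str(lower_bound))     # number of decimal digits in the bound

print(f"2^{n_dots} has {digits} decimal digits")
```

And that ignores Pacman's position, the ghosts, and everything else, which only multiply the count further.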

The strategy would be to encode the game state into a manageable number of features. This is a lossy process and the hope is that your set of features is small enough to be trained on, yet meaningful enough that the model can learn from them.
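A minimal sketch of what that encoding could look like. Everything here (the `GameState` class, the three features picked) is hypothetical and just meant to show the idea of squashing a huge state into a short vector:

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

Pos = Tuple[int, int]

@dataclass
class GameState:  # hypothetical container for the raw game state
    pacman: Pos
    ghosts: List[Pos]
    dots: Set[Pos]

def manhattan(a: Pos, b: Pos) -> int:
    """Grid distance, ignoring walls (a deliberate simplification)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def encode(state: GameState) -> List[int]:
    """Lossy encoding: three numbers instead of the full board."""
    nearest_dot = min((manhattan(state.pacman, d) for d in state.dots), default=0)
    nearest_ghost = min((manhattan(state.pacman, g) for g in state.ghosts), default=0)
    return [nearest_dot, nearest_ghost, len(state.dots)]

state = GameState(pacman=(0, 0), ghosts=[(3, 4)], dots={(1, 0), (5, 5)})
features = encode(state)
```

Two boards with the same nearest-dot distance, nearest-ghost distance, and dot count look identical to the model. That is the lossy part, and whether these particular features are meaningful enough is exactly the design question.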

In the paper, they parse the data (code) into a syntax tree and compare it with the patched version. This identifies the 'point' in the tree where the patch modification takes place. The 'point', the 'type of modification', and a 'collection of neighboring points' are the features that are fed into the model.
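Here's a rough sketch of that idea using Python's built-in `ast` module. To be clear, this is not the paper's implementation: the `find_change` and `patch_features` helpers are names I made up, and the feature choices (node type, old->new type, sibling types) are only my guess at a toy version of 'point', 'type of modification', and 'neighboring points':

```python
import ast

def find_change(old, new, parent=None):
    """Descend both trees in parallel to the deepest node where they differ."""
    if type(old) is not type(new):
        return old, new, parent
    old_kids = list(ast.iter_child_nodes(old))
    new_kids = list(ast.iter_child_nodes(new))
    if len(old_kids) != len(new_kids):
        return old, new, parent
    differing = [(o, n) for o, n in zip(old_kids, new_kids)
                 if ast.dump(o) != ast.dump(n)]
    if len(differing) == 1:
        # Exactly one child subtree changed, so keep narrowing down.
        return find_change(differing[0][0], differing[0][1], parent=old)
    # Zero or several children differ: treat this node as the change point.
    return old, new, parent

def patch_features(old_src, new_src):
    """Toy feature dict for a single patch (hypothetical feature set)."""
    old_node, new_node, parent = find_change(ast.parse(old_src), ast.parse(new_src))
    siblings = [] if parent is None else [
        type(n).__name__ for n in ast.iter_child_nodes(parent) if n is not old_node
    ]
    return {
        "point": type(old_node).__name__,
        "modification": f"{type(old_node).__name__}->{type(new_node).__name__}",
        "neighbors": siblings,
    }

buggy = "if x < 10:\n    y = 1\n"
patched = "if x <= 10:\n    y = 1\n"
features = patch_features(buggy, patched)
```

For this off-by-one fix the change point is the comparison operator, and the neighbors are the other children of the enclosing comparison. A real system would turn these into numeric or one-hot features before handing them to the classifier.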

tl;dr: yep, probably a linear SVM or multinomial logistic regression.