Visualizing 40,000 student code submissions

[+] Shizka|12 years ago|reply

Quite cool when you think about it. Each cluster probably represents a different method for solving the problem. Awesome that it's possible to classify the solutions like this. I think this might be usable for better feedback on Coursera. Cool!

[+] informatimago|12 years ago|reply

Yes, and used in reverse, starting from a red cluster, you can derive a working program. Now let's just find a way to find those clusters from problem statements ;-)

[+] rube|12 years ago|reply

Interesting that the outer edges basically have fewer failed answers. I guess that means that there is a positive correlation between thinking outside the box and success? ;)

[+] cdman|12 years ago|reply

Interesting choice of colors - red signifying that all the unit tests pass :-) (that is usually considered "green")

[+] yaddayadda|12 years ago|reply

The authors say that the colors correspond to similar implementations that result in similar behavior, with red specifically indicating that all tests pass. (I totally agree with you that green would have been a much more logical choice.) That only leaves green and blue. I'm curious what the distinction is between those implementations (e.g., blue passed some of the tests, green didn't pass any tests).

[+] emilesilvis|12 years ago|reply

I would agree!

[+] chrismorgan|12 years ago|reply

Abstract art? Yes. Of the best variety!

A couple of years ago, I made some abstract art of the inheritance structure of a large project written in a language with (extensively used) multiple inheritance, there being around 1800 classes. No one was game to produce a 15m-wide, 30cm-high wallpaper (the traditional type) of it, so I just removed the class names, leaving the classes as just dots, and made it my computer's wallpaper. It's got quite a few comments. Still, it was nowhere near as pretty as this.

[+] akjetma|12 years ago|reply

Do you still have the image? I've created a few myself and they're really fun to look at. It's interesting to see the symmetry and orderliness of a project in its early stages as compared to the Frankenstein's monster it eventually becomes. I'll post mine if I can find or re-run them.

[+] iMark|12 years ago|reply

Looks like a load of Pollocks :)

[+] khawkins|12 years ago|reply

I don't exactly see the value in this visualization. Clustering measures and feature analysis would provide far more insight into what's going on here. In fact, it's not even clear how large the dominant clusters are or what all of those speckles mean.

[+] tlarkworthy|12 years ago|reply

?

Clustering is putting similar things near similar things. Tree edit distance is quite a natural measure of distance for tree-like things such as programs.
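To make "tree edit distance" concrete, here is a minimal sketch. It assumes Python sources parsed with the standard `ast` module and uses a simplified top-down (Selkow-style) distance, not necessarily the measure used for the visualization. The helper names `to_tree`, `size`, and `tree_distance` are made up for illustration.

```python
import ast

def to_tree(node):
    """Convert an ast node into (label, children), keeping only node type names."""
    return (type(node).__name__, [to_tree(c) for c in ast.iter_child_nodes(node)])

def size(tree):
    """Number of nodes in a (label, children) tree."""
    return 1 + sum(size(c) for c in tree[1])

def tree_distance(a, b):
    """Top-down edit distance between ordered labeled trees.

    Assumed costs: relabeling a node costs 1; inserting or deleting
    a whole subtree costs its size.
    """
    relabel = 0 if a[0] == b[0] else 1
    xs, ys = a[1], b[1]
    # d[i][j] = cost of editing the first i children of a into the first j children of b
    d = [[0] * (len(ys) + 1) for _ in range(len(xs) + 1)]
    for i in range(1, len(xs) + 1):
        d[i][0] = d[i - 1][0] + size(xs[i - 1])
    for j in range(1, len(ys) + 1):
        d[0][j] = d[0][j - 1] + size(ys[j - 1])
    for i in range(1, len(xs) + 1):
        for j in range(1, len(ys) + 1):
            d[i][j] = min(
                d[i - 1][j] + size(xs[i - 1]),                         # delete subtree
                d[i][j - 1] + size(ys[j - 1]),                         # insert subtree
                d[i - 1][j - 1] + tree_distance(xs[i - 1], ys[j - 1]),  # edit in place
            )
    return relabel + d[len(xs)][len(ys)]

# Hypothetical submissions: p1 and p2 differ only in names, p3 differs in structure.
p1 = to_tree(ast.parse("def f(x):\n    return x + 1\n"))
p2 = to_tree(ast.parse("def g(y):\n    return y + 1\n"))
p3 = to_tree(ast.parse("def f(x):\n    return x * 2 + 1\n"))

print(tree_distance(p1, p2))  # 0: same shape once names are ignored
print(tree_distance(p1, p3))  # positive: the expression tree differs
```

Because only node types are compared, renaming variables costs nothing, which is one reason structurally similar submissions can land close together.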

You can't avoid some warping when putting a high-dimensional manifold on a low-dimensional one. You can see that a lot of their data does cluster properly, but there are some long-range red arcs (in the embedding space) which are side effects of warping (they are near in data space).

You can see a cluster of green which is clearly of interest ... why did so many students get the wrong answer in the same way?

I see lots of value in that picture.

[+] has2k1|12 years ago|reply

But what is it good for?

Well we have a lot of ideas! One thing that we did, for example, was to apply clustering to discover the "typical" approaches to this problem. This allowed us to discover common failure modes in the class, but also gave us a way to find multiple correct approaches to the same problem. Stay tuned for more results from the codewebs team!
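As a rough illustration of what "apply clustering" could look like, the sketch below groups submissions whose pairwise distance is under a threshold (single linkage via union-find). This is an assumed stand-in, not the codewebs team's actual method; the function `cluster`, the threshold, and the toy distance matrix `D` are all hypothetical.

```python
def cluster(n, dist, threshold):
    """Group items 0..n-1 so that any pair within `threshold` shares a cluster."""
    parent = list(range(n))

    def find(i):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if dist(i, j) <= threshold:
                parent[find(i)] = find(j)  # merge the two clusters

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Toy pairwise distances between four made-up submissions.
D = [
    [0, 1, 9, 8],
    [1, 0, 9, 9],
    [9, 9, 0, 2],
    [8, 9, 2, 0],
]

print(cluster(4, lambda i, j: D[i][j], threshold=3))  # [[0, 1], [2, 3]]
```

Each resulting group would then be a candidate "typical approach" (or a common failure mode) to inspect by hand.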

[+] mrcactu5|12 years ago|reply

I keep meaning to take the machine learning course.

This is a great way of using metadata to search for patterns in student assignments. This could detect different "approaches" or "strategies".

[+] RVijay007|12 years ago|reply

Probably also allows them to more easily detect cheating on coding assignments.

[+] mcherm|12 years ago|reply

No, comparison of text rather than comparison of ASTs is better for that purpose. There are many good reasons for ASTs to be equivalent and few good reasons for text to match.
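A small sketch of the difference, assuming Python sources and two made-up students who solved the same exercise independently. The helpers `text_similarity` and `ast_shape` are hypothetical names; `difflib.SequenceMatcher` and `ast.walk` are standard library.

```python
import ast
import difflib

def text_similarity(a, b):
    """Ratio in [0, 1] of matching characters between two source strings."""
    return difflib.SequenceMatcher(None, a, b).ratio()

def ast_shape(source):
    """Sequence of AST node type names, ignoring identifiers and formatting."""
    return [type(n).__name__ for n in ast.walk(ast.parse(source))]

# Two independently written (hypothetical) solutions to "sum a list".
alice = (
    "def total(values):\n"
    "    result = 0\n"
    "    for v in values:\n"
    "        result += v\n"
    "    return result\n"
)
bob = (
    "def add_all(xs):\n"
    "    acc = 0\n"
    "    for item in xs:\n"
    "        acc += item\n"
    "    return acc\n"
)

print(ast_shape(alice) == ast_shape(bob))  # True: same structure, honestly arrived at
print(text_similarity(alice, bob))         # below 1.0: the text itself differs
```

Matching structure here says nothing about copying, since it is simply the obvious way to write the loop. Near-identical text, including names and spacing, would be much harder to explain innocently.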

[+] mkelley82|12 years ago|reply

Very cool visualization. I'd like to see how this sort of technique could be applied to other real-world problems.

[+] yeukhon|12 years ago|reply

And probably a way to find who is cheating and who isn't :)

[+] losethos|12 years ago|reply

[deleted]

19 comments