top | item 45492231

(no title)

mcramer | 4 months ago

> Who is using K-means for classification? If you have labels, then a supervised algorithm seems like a more appropriate choice. The generated data is labeled but we can imagine those labels don't exist when running k-means. There are many applications for unsupervised clustering. I don't, however, think that there are many applications for running much of anything on an Apple ][+.

> K-means clustering is a recursive algorithm My bad. It's iterative. I'll fix that. Thanks.

> If we know that the distributions are Gaussian, which is very frequently the case in machine learning Gaussian distributions are very frequent and important in machine learning because of the Central Limit Theorem but, beyond that, you are correct. While many natural phenomena are approximately normal, the reason for the Gaussian's frequent use is often mathematical mathematical convenience. I'll correct my post.

> we can employ a more powerful algorithm: Expectation Maximization (EM) Excellent point. I will fix that, too. "While k-means is simple, it does not take advantage of our knowledge of the Gaussian nature of the data. If we know that the distributions are at least approximately Gaussian, which is frequently the case, we can employ a more powerful application of the Expectation Maximization (EM) framework (k-means is a specific implementation of centroid-based clustering that uses an iterative approach similar to EM with 'hard' clustering) that takes advantage of this." Thank you for pointing out all of this!

discuss

No comments yet.