Comments on: Data Mining With Weka

By: Dan

Dan — Sat, 05 Dec 2009 21:49:35 +0000

The clustering algorithm Weka uses is determined by the command you run; in the case above, it is k-means clustering.
http://en.wikipedia.org/wiki/K-means_clustering#Standard_algorithm
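For anyone who wants to see what the standard algorithm on that page boils down to, here is a minimal sketch in plain Python. It is not Weka's implementation: the function name `kmeans`, its parameters, and the choice to seed the centroids from randomly picked data points are all assumptions made for illustration.

```python
import random

def kmeans(points, k, iterations=100, seed=0):
    """Standard (Lloyd's) k-means on a list of equal-length numeric tuples.

    Illustrative sketch only; names and defaults are hypothetical.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen data points
    assignment = None
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_assignment = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_assignment == assignment:
            break  # nothing moved, so the algorithm has converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assignment

# Example: six one-dimensional values that fall into two obvious groups.
ranks = [(1.0,), (1.2,), (0.9,), (8.0,), (8.2,), (7.9,)]
centroids, assignment = kmeans(ranks, k=2)
```

Note that `k` has to be supplied up front, which is the guesswork mentioned in the comment.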

Expectation Maximisation (EM) clustering is "better", and as a plus you don't have to guess or provide the number of clusters: it can work this out using cross-validation (internally it runs multiple times with an increasing number of clusters and keeps the number that gave the highest cross-validated log-likelihood).
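To make the cross-validation idea concrete, here is a rough sketch in plain Python for one-dimensional data. It is an illustration under stated assumptions, not Weka's code: the helpers `normal_pdf`, `fit_em`, `log_likelihood` and `choose_k` are hypothetical names, the mixture is a simple 1-D Gaussian mixture, the means are initialised from quantiles of the data, and the search stops as soon as the held-out log-likelihood stops improving.

```python
import math
import random

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_em(data, k, iterations=50):
    """Fit a 1-D Gaussian mixture with k components; returns (weights, means, variances)."""
    n = len(data)
    ordered = sorted(data)
    # Assumption: spread the initial means over evenly spaced quantiles of the data.
    means = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    overall = sum(data) / n
    spread = sum((x - overall) ** 2 for x in data) / n or 1.0
    variances = [spread] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: how responsible is each component for each point?
        resp = []
        for x in data:
            dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate each component from its weighted points.
        for j in range(k):
            mass = sum(r[j] for r in resp)
            if mass < 1e-12:
                continue  # component owns no points; leave it unchanged
            weights[j] = mass / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / mass
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / mass
            variances[j] = max(var, 1e-6)  # floor to avoid a collapsed component
    return weights, means, variances

def log_likelihood(data, model):
    weights, means, variances = model
    total = 0.0
    for x in data:
        density = sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances))
        total += math.log(max(density, 1e-300))
    return total

def choose_k(data, max_k=6, folds=5, seed=0):
    """Increase k while the average held-out log-likelihood keeps improving."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    best_k, best_ll = 0, float("-inf")
    for k in range(1, max_k + 1):
        fold_lls = []
        for i in range(folds):
            held_out = shuffled[i::folds]
            train = [x for j, x in enumerate(shuffled) if j % folds != i]
            fold_lls.append(log_likelihood(held_out, fit_em(train, k)))
        mean_ll = sum(fold_lls) / folds
        if mean_ll <= best_ll:
            break  # no improvement, so keep the previous k
        best_k, best_ll = k, mean_ll
    return best_k
```

Calling `choose_k(values)` on a list of floats returns the number of components the held-out data supports, so nothing has to be guessed in advance. The trade-off is run time, since the mixture is refitted once per fold for every candidate number of clusters.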

By: Weka Mining Update: It Works! at BarneyBlog

Weka Mining Update: It Works! at BarneyBlog — Thu, 24 Jul 2008 21:08:24 +0000

[...] in April, I posted about how I was using Weka to do some asset prioritization. The gist of it was that users would rank assets on a 1-5 scale, and then the system would [...]

By: barneyb

barneyb — Tue, 24 Jun 2008 18:48:20 +0000

Jim, it's hard to say. There has been a slight skewing downward overall, but from some manual checks that I've done against asset similarity (e.g. manually figuring out what cluster an unranked asset probably belongs to), it's reasonable. To put that another way, prioritized assets average a slightly higher rank than randomized assets, which is just what you'd expect. I haven't done any actual math to quantify the skew, but only because what I'm seeing seems reasonable.

By: Jim

Jim — Tue, 24 Jun 2008 18:16:36 +0000

Hi Barney,

How did your test of seeding unranked items go? Did it skew the results at all?

Can you post some of your findings?

Thanks,

–j