91-DOU : Day #3

Course : Cluster Analysis & Unsupervised ML in Python

Video Mins completed : 27 mins

Last Video # completed : 29

Notes

KNN Cost functions –

Current metric : Distance from Cluster mean or center.

  • Does not scale well with default values.If features have varied scales, unless the data is scaled the algo will not work as the distance metric will vary wildly.
  • Does fit with large datasets
  • Sensitive to K

Another metric : Purity

Requires labels. Such methods called “external validation” methods. Examples

  • Rand Measure
  • F-measure
  • Jaccard Index
  • Normalized Mutual Info

Metric on unlabeled data : Davies Bouldin Index (DBI)

Lower DBI == better

How to choose ‘K’?

Value of K post which there is not significant change in cost will be the ideal value of K.

Course : Python for Machine Learning and Data Science Masterclass

Video Mins completed : 25 mins

Last Video # completed : 197