Course : Cluster Analysis & Unsupervised ML in Python
Video Mins completed : 27 mins
Last Video # completed : 29
Notes
KNN Cost functions –
Current metric : Distance from Cluster mean or center.
- Does not scale well with default values.If features have varied scales, unless the data is scaled the algo will not work as the distance metric will vary wildly.
- Does fit with large datasets
- Sensitive to K
Another metric : Purity
Requires labels. Such methods called “external validation” methods. Examples
- Rand Measure
- F-measure
- Jaccard Index
- Normalized Mutual Info
Metric on unlabeled data : Davies Bouldin Index (DBI)
Lower DBI == better
How to choose ‘K’?
Value of K post which there is not significant change in cost will be the ideal value of K.
Course : Python for Machine Learning and Data Science Masterclass
Video Mins completed : 25 mins
Last Video # completed : 197