
Algorithm Essay


''' @summary: Remove keywords that are similar to one another, to bring more diversity to the top-ranked keywords. We group the keywords with the K-means algorithm, which is implemented here without third-party libraries other than numpy (for computation) and gensim (for loading the data set). Each word in the keyword list is converted to a vector using word2vec; we use the GoogleNews pre-trained data set to get the vector for each word. Then we divide the n keywords into k clusters. Initially, we randomly choose k centroids. The Euclidean distance from each vector to each centroid is calculated. Each cluster consists of a centroid and the vectors that are closest to it. Once the clusters are formed, a new centroid is found for each cluster. Based on the …

centroid_list = random.sample(X,
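The original listing is cut off here, so the following is only a minimal sketch of the steps the summary describes (random initial centroids, Euclidean-distance assignment, centroid update), not the author's code. It assumes small toy vectors in place of the GoogleNews word2vec embeddings, and the names `kmeans` and `pick_representatives` are hypothetical. Keeping the one keyword nearest each centroid is also an assumption, since the summary does not say how similar keywords are dropped.

```python
import numpy as np


def kmeans(vectors, k, max_iters=100, seed=0):
    """Cluster the rows of `vectors` into k groups with plain K-means."""
    rng = np.random.default_rng(seed)
    X = np.asarray(vectors, dtype=float)
    # Randomly choose k distinct rows as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(max_iters):
        # Euclidean distance from every vector to every centroid, shape (n, k).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        # Each vector joins the cluster of its closest centroid.
        labels = dists.argmin(axis=1)
        # New centroid is the mean of its members; an empty cluster keeps its old one.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


def pick_representatives(keywords, vectors, k, seed=0):
    """Assumed diversity step: keep the keyword nearest each centroid."""
    X = np.asarray(vectors, dtype=float)
    labels, centroids = kmeans(X, k, seed=seed)
    kept = []
    for j in range(k):
        members = np.flatnonzero(labels == j)
        if members.size == 0:
            continue  # skip clusters that ended up empty
        d = np.linalg.norm(X[members] - centroids[j], axis=1)
        kept.append(keywords[members[d.argmin()]])
    return kept


if __name__ == "__main__":
    # Toy 2-D vectors standing in for word2vec embeddings (illustrative only).
    words = ["car", "auto", "vehicle", "apple", "pear", "fruit"]
    vecs = [[0.0, 0.0], [0.2, 1.0], [1.0, 0.1],
            [50.0, 50.0], [50.3, 51.0], [51.0, 50.2]]
    print(pick_representatives(words, vecs, k=2))
```

With real embeddings, each row of `vectors` would be the word2vec vector of the corresponding keyword, and `k` would be the number of distinct keywords to keep.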
