big data 8

PRACTICAL NO – 8

Aim: Implementing the k-means clustering algorithm using Map-Reduce

 

Algorithm for Mapper

Input: A set of objects X = {x1, x2, …, xn} and a set of initial centroids C = {c1, c2, …, ck}

Output: An output list containing pairs (ci, xj), where 1 ≤ i ≤ k and 1 ≤ j ≤ n

Procedure

M1 ← {x1, x2, …, xm}
current_centroids ← C
$\mathrm{Distance}(p, q) = \sqrt{\sum_{i=1}^{d} (p_i - q_i)^2}$ (where $p_i$ (or $q_i$) is the coordinate of p (or q) in dimension i)
for all xi ∈ M1 such that 1 ≤ i ≤ m do
    bestCentroid ← null
    minDist ← ∞
    for all c ∈ current_centroids do
        dist ← Distance(xi, c)
        if (bestCentroid = null || dist < minDist) then
            minDist ← dist
            bestCentroid ← c
        end if
    end for
    emit (bestCentroid, xi)
    i += 1
end for
return Outputlist
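The mapper procedure above can be sketched in plain Python. This is only an illustrative sketch, not a Hadoop job: it assumes each object and centroid is a tuple of floats, and the names `distance`, `kmeans_mapper`, `split` and the sample points are hypothetical, chosen just for this example.

```python
import math


def distance(p, q):
    """Euclidean distance between two points of the same dimension."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


def kmeans_mapper(objects, centroids):
    """Yield one (best_centroid, obj) pair per object in this mapper's split."""
    for x in objects:
        best_centroid = None
        min_dist = math.inf
        for c in centroids:
            dist = distance(x, c)
            if best_centroid is None or dist < min_dist:
                min_dist = dist
                best_centroid = c
        # The nearest centroid becomes the key, the object the value.
        yield best_centroid, x


# Hypothetical sample data: one input split and two initial centroids.
split = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5)]
centroids = [(1.0, 1.0), (9.0, 9.0)]
pairs = list(kmeans_mapper(split, centroids))
```

Here `pairs` holds the (centroid, object) pairs that would be passed on to the reducer, with each object keyed by its nearest centroid.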

 

Algorithm for Reducer

Input: (Key, Value), where Key = bestCentroid and Value = objects assigned to that centroid by the mapper

Output: (Key, Value), where Key = oldCentroid and Value = newBestCentroid, which is the new centroid value calculated for that bestCentroid

Procedure

Outputlist ← outputlist from mappers
clusterMap ← { }    (map from each centroid to its assigned objects; the symbol is missing in the source, so this name is a placeholder)
newCentroidList ← null
for all β ∈ Outputlist do
    centroid ← β.key
    object ← β.value
    clusterMap[centroid] ← object
end for
for all centroid ∈ clusterMap do
    newCentroid, sumofObjects, numofObjects ← null
    for all object ∈ clusterMap[centroid] do
        sumofObjects += object
        numofObjects += 1
    end for
    newCentroid ← (sumofObjects / numofObjects)
    emit (centroid, newCentroid)
end for
end
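The reducer procedure above can likewise be sketched in plain Python. This is a hedged illustration rather than a Hadoop reducer: it assumes the mapper output arrives as a flat list of (centroid, object) pairs whose elements are tuples of floats, and the names `kmeans_reducer`, `mapper_output` and `updates` are hypothetical. The new centroid is taken as the coordinate-wise mean of the assigned objects, that is, the sum of the objects divided by their count.

```python
from collections import defaultdict


def kmeans_reducer(output_list):
    """Yield (old_centroid, new_centroid) for every centroid key."""
    # Group the objects by the centroid they were assigned to.
    clusters = defaultdict(list)
    for centroid, obj in output_list:
        clusters[centroid].append(obj)

    for centroid, objs in clusters.items():
        num_objects = len(objs)
        # Coordinate-wise sum over all objects of this cluster.
        sum_objects = [sum(coords) for coords in zip(*objs)]
        new_centroid = tuple(s / num_objects for s in sum_objects)
        yield centroid, new_centroid


# Hypothetical mapper output: (centroid, object) pairs.
mapper_output = [
    ((1.0, 1.0), (1.0, 1.0)),
    ((1.0, 1.0), (2.0, 3.0)),
    ((9.0, 9.0), (8.0, 8.0)),
    ((9.0, 9.0), (10.0, 9.0)),
]
updates = dict(kmeans_reducer(mapper_output))
```

In an actual MapReduce framework the grouping by key is normally done by the shuffle phase before the reducer runs; the explicit grouping here simply mirrors the first loop of the pseudocode.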

The outcome of the k-means Map-Reduce algorithm is the set of cluster points together with the documents bound to them, as <key, value> pairs, where the key is the cluster id and the value holds a vector together with its weight. The weight indicates the probability that the vector is a point in that cluster. For example: Key: 92: Value: 1.0: [32:0.127, 79:0.114, 97:0.114, 157:0.148 ...].

The final output of the program will be the cluster name, filename: number of text documents that belong to that cluster.
