PRACTICAL NO – 8
Aim:
Implementing a Clustering Algorithm (K-Means) Using MapReduce
Algorithm for Mapper
Input:
A set of objects X = {x1, x2, …, xn}, a set of initial centroids C = {c1, c2, …, ck}
Output:
An output list which contains pairs (ci, xj), where 1 ≤ i ≤ k and 1 ≤ j ≤ n
Procedure
M1 ← {x1, x2, …, xm}
current_centroids ← C
distance(p, q) = √(Σ_{i=1..d} (p_i − q_i)²), where p_i (or q_i) is the coordinate of p (or q) in dimension i
for all xi ϵ M1 such that 1 ≤ i ≤ m do
    bestCentroid ← null
    minDist ← ∞
    for all c ϵ current_centroids do
        dist ← distance(xi, c)
        if (bestCentroid = null || dist < minDist) then
            minDist ← dist
            bestCentroid ← c
        end if
    end for
    emit(bestCentroid, xi)
    i += 1
end for
return outputlist
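The mapper above can be sketched as a Hadoop Streaming script. The following Python sketch is illustrative, not the definitive implementation: it assumes each input line is one object given as comma-separated coordinates, that the current centroids are shipped to every mapper as a side file named centroids.txt (one comma-separated centroid per line), and that (key, value) pairs are emitted as tab-separated lines, which is the Hadoop Streaming convention. The file name and formats are assumptions, not part of the original pseudocode.

#!/usr/bin/env python3
# mapper.py -- hypothetical Hadoop Streaming mapper for one k-means iteration.
# Assumption: objects arrive on stdin as comma-separated coordinates, and the
# current centroids are read from a side file "centroids.txt".
import math
import sys


def load_centroids(path="centroids.txt"):
    # One centroid per line, comma-separated coordinates (assumed format).
    centroids = []
    with open(path) as f:
        for line in f:
            if line.strip():
                centroids.append([float(v) for v in line.strip().split(",")])
    return centroids


def distance(p, q):
    # Euclidean distance, as defined in the mapper pseudocode above.
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


def main():
    current_centroids = load_centroids()
    for line in sys.stdin:
        line = line.strip()
        if not line:
            continue
        x = [float(v) for v in line.split(",")]
        # Find the nearest centroid (the bestCentroid / minDist loop).
        best_idx, min_dist = None, float("inf")
        for i, c in enumerate(current_centroids):
            d = distance(x, c)
            if best_idx is None or d < min_dist:
                best_idx, min_dist = i, d
        # emit(bestCentroid, xi): key and value separated by a tab.
        print(f"{best_idx}\t{line}")


if __name__ == "__main__":
    main()

Emitting the centroid's index rather than its full coordinates keeps the intermediate keys small; the reducer only needs to group the objects by centroid.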
Algorithm for Reducer
Input: (key, value), where key = bestCentroid and value = the objects assigned to that centroid by the mapper
Output: (key, value), where key = oldCentroid and value = newBestCentroid, which is the new centroid value calculated for that bestCentroid
Procedure
outputlist ← outputlists from the mappers
ν ← { }
newCentroidList ← null
for all β ϵ outputlist do
    centroid ← β.key
    object ← β.value
    ν[centroid] ← object
end for
for all centroid ϵ ν do
    newCentroid, sumofObjects, numofObjects ← null
    for all object ϵ ν[centroid] do
        sumofObjects += object
        numofObjects += 1
    end for
    newCentroid ← (sumofObjects ÷ numofObjects)
    emit(centroid, newCentroid)
end for
end
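A matching reducer sketch under the same assumptions (tab-separated key/value lines, with all records for a key arriving together, which Hadoop Streaming guarantees via its sort phase). It accumulates the component-wise sum and the count of the objects for each centroid key and emits the new centroid as their mean, i.e. sumofObjects ÷ numofObjects.

#!/usr/bin/env python3
# reducer.py -- hypothetical Hadoop Streaming reducer for one k-means iteration.
# Records arrive sorted by key, so consecutive lines with the same key belong
# to the same centroid.
import sys


def emit(key, sums, count):
    # newCentroid = sumofObjects / numofObjects, component-wise.
    new_centroid = [s / count for s in sums]
    print(f"{key}\t{','.join(str(v) for v in new_centroid)}")


def main():
    current_key, sums, count = None, None, 0
    for line in sys.stdin:
        key, _, value = line.rstrip("\n").partition("\t")
        obj = [float(v) for v in value.split(",")]
        if key != current_key:
            # A new centroid key starts: flush the previous one first.
            if current_key is not None:
                emit(current_key, sums, count)
            current_key, sums, count = key, [0.0] * len(obj), 0
        # sumofObjects += object; numofObjects += 1
        sums = [s + o for s, o in zip(sums, obj)]
        count += 1
    if current_key is not None:
        emit(current_key, sums, count)


if __name__ == "__main__":
    main()

In a full run, the new centroids written by the reducer would replace centroids.txt for the next iteration, and the map-reduce pass is repeated until the centroids stop moving or a fixed number of iterations is reached.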
The outcome of the k-means MapReduce algorithm is the set of cluster points along with the documents bound to each cluster, emitted as <key, value> pairs, where the key is the cluster id and the value is of the form vector: weight. The weight indicates the probability that the vector is a point in that cluster. For example: Key: 92: Value: 1.0: [32:0.127, 79:0.114, 97:0.114, 157:0.148 ...].
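To make this output format concrete, the small sketch below parses one such record into its cluster id, weight, and sparse vector. The splitting logic is based only on the example line above; the exact layout depends on the tool used to dump the clusters, so the parser is an assumption, not a specification.

#!/usr/bin/env python3
# parse_output.py -- hypothetical parser for one dumped cluster record of the
# form "Key: <cluster id>: Value: <weight>: [<index>:<value>, ...]".


def parse_cluster_record(line):
    # Split off the "Key: <id>:" prefix.
    rest = line.split("Key:", 1)[1]
    cluster_id, rest = rest.split(":", 1)
    # Split off the "Value: <weight>:" part.
    rest = rest.split("Value:", 1)[1]
    weight, vector_part = rest.split(":", 1)
    # The vector is a bracketed list of index:value entries.
    sparse_vector = {}
    for entry in vector_part.strip().strip("[]").split(","):
        entry = entry.strip()
        if not entry or entry.startswith("."):
            continue  # skip a trailing "..." in truncated examples
        idx, val = entry.split(":")
        sparse_vector[int(idx)] = float(val)
    return cluster_id.strip(), float(weight), sparse_vector


if __name__ == "__main__":
    record = "Key: 92: Value: 1.0: [32:0.127,79:0.114, 97:0.114, 157:0.148]"
    print(parse_cluster_record(record))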