Posts

big data 8

PRACTICAL NO – 8

Aim: Implementing Clustering Algorithm Using Map-Reduce

Algorithm for Mapper
Input: A set of objects X = {x1, x2, …, xn} and a set of initial centroids C = {c1, c2, …, ck}
Output: An output list containing pairs (Ci, xj), where 1 ≤ i ≤ k and 1 ≤ j ≤ n
Procedure:
    M1 ← {x1, x2, …, xm}            (the split of X handled by this mapper)
    current_centroids ← C
    distance(p, q) = sqrt( Σ_{i=1..d} (p_i − q_i)^2 ), where p_i (or q_i) is the coordinate of p (or q) in dimension i
    for all xi ∈ M1 such that 1 ≤ i ≤ m do
        bestCentroid ← null
        minDist ← ∞
        for all c ∈ current_centroids do
            dist ← distance(xi, c)
            if (bestCentroid = null or dist < minDist) then
                minDist ← dist
                bestCentroid ← c
            end if
        end for
        emit (bestCentroid, xi)
        i += 1
    end for
    return OutputList

Algorithm for Reducer
Input: (Key, Value), where Key = bestCentroid and Value = the objects assigned to that centroid by the mapper
Output:
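The listing above breaks off after the reducer's Input line. As a minimal, illustrative sketch of one k-means iteration on Hadoop: the mapper assigns each point to its nearest centroid exactly as in the pseudocode, and the reducer emits the mean of the points routed to each centroid, which is the usual way the missing reducer step is completed. The class names (KMeansSketch, KMeansMapper, KMeansReducer), the comma-separated text encoding of points, and the configuration key "kmeans.centroids" are assumptions made for this sketch, not part of the practical.

    // Illustrative sketch only: one k-means iteration as a Hadoop job.
    // Points and centroids are d-dimensional vectors serialized as comma-separated text.
    // Centroids are assumed to arrive through the job Configuration under "kmeans.centroids",
    // separated by ';' (e.g. "1.0,2.0;5.0,6.0").
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    public class KMeansSketch {

        static double[] parseVector(String s) {
            String[] parts = s.trim().split(",");
            double[] v = new double[parts.length];
            for (int i = 0; i < parts.length; i++) v[i] = Double.parseDouble(parts[i]);
            return v;
        }

        // Euclidean distance, as defined in the mapper algorithm above.
        static double distance(double[] p, double[] q) {
            double sum = 0.0;
            for (int i = 0; i < p.length; i++) sum += (p[i] - q[i]) * (p[i] - q[i]);
            return Math.sqrt(sum);
        }

        public static class KMeansMapper extends Mapper<LongWritable, Text, Text, Text> {
            private final List<double[]> centroids = new ArrayList<>();

            @Override
            protected void setup(Context context) {
                Configuration conf = context.getConfiguration();
                for (String c : conf.get("kmeans.centroids").split(";")) {
                    centroids.add(parseVector(c));
                }
            }

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                double[] point = parseVector(value.toString());
                int best = -1;
                double minDist = Double.MAX_VALUE;
                for (int i = 0; i < centroids.size(); i++) {
                    double d = distance(point, centroids.get(i));
                    if (best == -1 || d < minDist) {   // same test as the pseudocode
                        minDist = d;
                        best = i;
                    }
                }
                // emit (bestCentroid, xi): centroid index as key, the point as value
                context.write(new Text(Integer.toString(best)), value);
            }
        }

        public static class KMeansReducer extends Reducer<Text, Text, Text, Text> {
            @Override
            protected void reduce(Text key, Iterable<Text> values, Context context)
                    throws IOException, InterruptedException {
                double[] sum = null;
                long count = 0;
                for (Text v : values) {
                    double[] p = parseVector(v.toString());
                    if (sum == null) sum = new double[p.length];
                    for (int i = 0; i < p.length; i++) sum[i] += p[i];
                    count++;
                }
                // New centroid = mean of all points assigned to this key
                StringBuilder sb = new StringBuilder();
                for (int i = 0; i < sum.length; i++) {
                    if (i > 0) sb.append(",");
                    sb.append(sum[i] / count);
                }
                context.write(key, new Text(sb.toString()));
            }
        }
    }

The driver would run this job repeatedly, feeding each iteration's reducer output back in as the next iteration's centroid list, until the centroids stop moving.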

big data 7

PRACTICAL NO – 7

Aim: Implementing Frequent Item Set Algorithm Using Map-Reduce.

import java.io.BufferedReader;
import java.io.*;
import java.io.IOException;
import java.net.*;
import java.util.ArrayList;
import java.util.*;

import model.HashTreeNode;
import model.ItemSet;
import model.Transaction;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.ap
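Only the import block of the driver survives above. As a hedged, self-contained sketch of the idea behind the practical, the first Apriori pass (counting candidate 1-itemsets and keeping those that meet a minimum support) can be written as a word-count-style job. The class names, the space-separated transaction format, and the configuration key "apriori.minsupport" are assumptions for this sketch, not taken from the original listing.

    // Illustrative sketch only: first Apriori pass (frequent 1-itemsets) on Hadoop.
    // Each input line is one transaction of space-separated item ids; the support
    // threshold is read from the assumed configuration key "apriori.minsupport".
    import java.io.IOException;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    public class FrequentItemSketch {

        public static class ItemMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text item = new Text();

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                // Every item occurrence in the transaction is emitted with count 1.
                for (String token : value.toString().trim().split("\\s+")) {
                    if (!token.isEmpty()) {
                        item.set(token);
                        context.write(item, ONE);
                    }
                }
            }
        }

        public static class SupportReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int minSupport = context.getConfiguration().getInt("apriori.minsupport", 1);
                int count = 0;
                for (IntWritable v : values) count += v.get();
                // Keep only items whose support count meets the threshold.
                if (count >= minSupport) {
                    context.write(key, new IntWritable(count));
                }
            }
        }
    }

Later Apriori passes would generate candidate k-itemsets from the frequent (k−1)-itemsets (this is where the HashTreeNode, ItemSet, and Transaction model classes imported above come in) and count them against each transaction in the same map/reduce fashion.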

big data 6

PRACTICAL NO – 6

Aim: Implementing Bloom Filter using Map-Reduce.

import java.io.DataOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;

import org.apache.hadoop.util.bloom.BloomFilter;
import org.apache.hadoop.util.bloom.Key;
import org.apache.hadoop.util.hash.Hash;

public class DepartmentBloomFilterTrainer {

    // Optimal bit-vector size: m = -n * ln(p) / (ln 2)^2,
    // where n = expected number of elements and p = desired false-positive rate.
    public static int getBloomFilterOptimalSize(int numElements, float falsePosRate) {
        return (int) (-numElements * (float) Math.log(falsePosRate) / Math.pow(Math.log(2), 2));
    }

    // Optimal number of hash functions: k = (m / n) * ln 2.
    public static int getOptimalK(float numElements, float vectorSize) {
        return (int) Math.round(vectorSize * Math.log(2) / numElements);
    }

    public static void main(String[]
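The listing stops at the start of main. A minimal sketch of what such a trainer's main method could do is given below, using the same org.apache.hadoop.util.bloom API imported above: size the filter with the two formulas, add one Key per input line, and serialize the trained filter to disk. The file names ("departments.txt", "bloom_filter.bin"), the expected-element count, and the false-positive rate are assumptions for the sketch, not values from the practical.

    // Illustrative sketch of a possible trainer main method (assumed inputs/outputs).
    import java.io.BufferedReader;
    import java.io.DataOutputStream;
    import java.io.FileOutputStream;
    import java.io.FileReader;
    import java.io.IOException;

    import org.apache.hadoop.util.bloom.BloomFilter;
    import org.apache.hadoop.util.bloom.Key;
    import org.apache.hadoop.util.hash.Hash;

    public class BloomFilterTrainerSketch {

        public static void main(String[] args) throws IOException {
            int expectedElements = 1000;          // assumed size of the training set
            float falsePositiveRate = 0.01f;      // assumed acceptable false-positive rate

            // m and k computed exactly as in the helpers shown in the practical above.
            int vectorSize = (int) (-expectedElements * Math.log(falsePositiveRate) / Math.pow(Math.log(2), 2));
            int nbHash = (int) Math.round(vectorSize * Math.log(2) / expectedElements);

            BloomFilter filter = new BloomFilter(vectorSize, nbHash, Hash.MURMUR_HASH);

            // Train the filter: one Key per input line (e.g., one department name per line).
            try (BufferedReader reader = new BufferedReader(new FileReader("departments.txt"))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    filter.add(new Key(line.trim().getBytes("UTF-8")));
                }
            }

            // Persist the trained filter so a later MapReduce job can load it (for example
            // via the distributed cache) and probe it with filter.membershipTest(new Key(...)).
            try (DataOutputStream out = new DataOutputStream(new FileOutputStream("bloom_filter.bin"))) {
                filter.write(out);
            }
        }
    }

In the map-side join that typically follows, each mapper would read the serialized filter back with BloomFilter.readFields and skip records whose key fails membershipTest, which is what makes the false-positive rate and bit-vector size chosen above matter.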