big data 5

 

PRACTICAL NO – 5

Aim: Implementing Database Operations on Hive.

 




To list out the databases in Hive warehouse, enter the command ‘show databases’.

The database creates in a default location of the Hive warehouse. In Cloudera, Hive database store in a /user/hive/warehouse.








The command to use the database is USE <data base name>

Copy the input data to HDFS from local by using the copy From Local command.




When we create a table in hive, it creates in the default location of the hive warehouse. – “/user/hive/warehouse”, after creation of the table we can move the data from HDFS to hive table.

The following command creates a table with in location of “/user/hive/warehouse/retail.db”




Note : retail.db is the database created in the Hive warehouse.


 

Data Manipulation Language (DML )

DML statements are used to retrieve, store, modify, delete, insert and update data in the database.

Example :

LOAD, INSERT Statements. Syntax :

LOAD data <LOCAL> inpath <file path> into table [tablename]

The Load operation is used to move the data into corresponding Hive table. If the

keyword local is specified, then in the load command will give the local file system path. If the keyword local is not specified we have to use the HDFS path of the file.






Here are some examples for the LOAD data LOCAL command

After loading the data into the Hive table we can apply the Data Manipulation Statements or aggregate functions retrieve the data.

Example to count number of records:




Count aggregate function is used count the total number of the records in a table.

‘create external’ Table :

The create external keyword is used to create a table and provides a location where the  table will create, so that Hive does not use a default location for this table. An EXTERNAL table points to any HDFS location for its storage, rather than default storage.


 



Insert Command:

The insert command is used to load the data Hive table. Inserts can be done to a table or a partition.

  INSERT OVERWRITE is used to overwrite the existing data in the table or partition.

 




INSERT INTO is used to append the data into existing data in a table. (Note: INSERT INTO syntax is work from the version 0.8)

 

Example for ‘Partitioned By’ and ‘Clustered By’ Command :




‘Partitioned by‘ is used to divided the table into the Partition and can be divided in to buckets by using the ‘Clustered By‘ command.

When we insert the data Hive throwing errors, the dynamic partition mode is strict  and dynamic partition not enabled (by Jeff at dresshead website). So we need to set the following parameters in Hive shell.

Comments

Popular posts from this blog

ML programs

big data 8

big data 6