Sqoop Import From MySQL Database to HDFS

The 'import' tool imports individual tables from an RDBMS into HDFS. Each row of a table is treated as a record in HDFS. Records are stored as text data in text files, or as binary data in Avro or Sequence files.

Step 1 - Change the directory to /usr/local/hadoop/sbin

$ cd /usr/local/hadoop/sbin

Step 2 - Start all Hadoop daemons. (On recent Hadoop versions, start-all.sh is deprecated; running start-dfs.sh followed by start-yarn.sh does the same job.)

$ start-all.sh

Step 3 - Run jps (Java Virtual Machine Process Status Tool) to check that the daemons are running. Note that jps can only report on JVMs for which it has access permissions.

$ jps
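
If the daemons started successfully, jps lists one line per daemon. The process IDs below are illustrative; yours will differ.

12081 NameNode
12236 DataNode
12433 SecondaryNameNode
12589 ResourceManager
12742 NodeManager
12866 Jps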

Step 4 - Enter the MySQL command-line interface (CLI). Open a terminal (Ctrl + Alt + T) and type the following command.

$ mysql -u root -p

Enter the password when prompted.

Enter password: ****

Step 5 - Create a new database 'userdb'.

mysql> create database userdb;

Step 6 - Use database 'userdb'.

mysql> use userdb;

Step 7 - Create a new table 'employee'.

mysql> create table employee(id int, name varchar(20), deg varchar(20), salary int, dept varchar(20), primary key(id));
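
To double-check the schema, you can describe the table (the Type column display may vary with your MySQL version):

mysql> describe employee;
+--------+-------------+------+-----+---------+-------+
| Field  | Type        | Null | Key | Default | Extra |
+--------+-------------+------+-----+---------+-------+
| id     | int(11)     | NO   | PRI | NULL    |       |
| name   | varchar(20) | YES  |     | NULL    |       |
| deg    | varchar(20) | YES  |     | NULL    |       |
| salary | int(11)     | YES  |     | NULL    |       |
| dept   | varchar(20) | YES  |     | NULL    |       |
+--------+-------------+------+-----+---------+-------+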

Step 8 - Insert data into the employee table. Note that salary is an int column, so the values are unquoted.

mysql> insert into employee values(1201,'gopal','manager',50000,'TP');
mysql> insert into employee values(1202,'manisha','Proof reader',50000,'TP');
mysql> insert into employee values(1203,'khalil','php dev',30000,'AC');
mysql> insert into employee values(1204,'prasanth','php dev',30000,'AC');
mysql> insert into employee values(1205,'kranthi','admin',20000,'TP');


Step 9 - Verify the inserted data.

mysql> select * from employee;
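
Based on the rows inserted above, the output should look like this:

+------+----------+--------------+--------+------+
| id   | name     | deg          | salary | dept |
+------+----------+--------------+--------+------+
| 1201 | gopal    | manager      |  50000 | TP   |
| 1202 | manisha  | Proof reader |  50000 | TP   |
| 1203 | khalil   | php dev      |  30000 | AC   |
| 1204 | prasanth | php dev      |  30000 | AC   |
| 1205 | kranthi  | admin        |  20000 | TP   |
+------+----------+--------------+--------+------+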

Step 10 - Create a new table 'employee_address'.

mysql> create table employee_address(id int, hno varchar(20), street varchar(20), city varchar(20), primary key(id));

Step 11 - Insert data into the employee_address table.

mysql> insert into employee_address values(1201,'288A','vgiri','jublee');
mysql> insert into employee_address values(1202,'108I','aoc','sec-bad');
mysql> insert into employee_address values(1203,'144Z','pgutta','hyd');
mysql> insert into employee_address values(1204,'78B','old city','sec-bad');
mysql> insert into employee_address values(1205,'720X','hitec','sec-bad');


Step 12 - Verify the inserted data.

mysql> select * from employee_address;

Step 13 - Create a new table 'employee_contact'.

mysql> create table employee_contact(id int, phno varchar(20), email varchar(30), primary key(id));

Step 14 - Insert data into the employee_contact table.

mysql> insert into employee_contact values(1201,'2356742','gopal@tp.com');
mysql> insert into employee_contact values(1202,'1661663','manisha@tp.com');
mysql> insert into employee_contact values(1203,'8887776','khalil@ac.com');
mysql> insert into employee_contact values(1204,'9988774','prasanth@ac.com');
mysql> insert into employee_contact values(1205,'2356742','kranthi@tp.com');


Importing a Table

The Sqoop 'import' tool imports table data from an RDBMS table into the Hadoop file system as text or binary files.

Step 15 - Change the directory to /usr/local/sqoop/bin

$ cd /usr/local/sqoop/bin

Step 16 - The following command is used to import the employee table from the MySQL database server to HDFS. The '-m 1' option runs the import with a single mapper, producing a single output file.

$ sqoop import \
--connect jdbc:mysql://localhost/userdb \
--username root \
--password root \
--table employee \
-m 1
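
Note: passing --password on the command line exposes the password in the shell history and process list. Sqoop also supports -P, which prompts for the password interactively, so the command can equivalently be written as:

$ sqoop import \
--connect jdbc:mysql://localhost/userdb \
--username root \
-P \
--table employee \
-m 1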


Step 17 - Verify the import.

$ hdfs dfs -cat /user/hduser/employee/part-m-00000
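
By default Sqoop writes records as comma-delimited text, one record per line, so with the data inserted above the file should contain:

1201,gopal,manager,50000,TP
1202,manisha,Proof reader,50000,TP
1203,khalil,php dev,30000,AC
1204,prasanth,php dev,30000,AC
1205,kranthi,admin,20000,TP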

Importing into Target Directory

We can specify the target directory while importing table data into HDFS using the Sqoop import tool.

Step 18 - The following command is used to import employee_address table data into the '/targetfoldername' directory.

$ sqoop import \
--connect jdbc:mysql://localhost/userdb \
--username root \
--password root \
--table employee_address \
-m 1 \
--target-dir /targetfoldername


Step 19 - Verify the import.

$ hdfs dfs -cat /targetfoldername/part-m-00000

Import Subset of Table Data

We can import a subset of a table using the '--where' option of the Sqoop import tool. Sqoop executes the corresponding SQL query on the database server and stores the result in a target directory in HDFS.

Step 20 - The following command is used to import a subset of the employee_address table data. The query retrieves the addresses of the employees who live in Secunderabad city ('sec-bad').

$ sqoop import \
--connect jdbc:mysql://localhost/userdb \
--username root \
--password root \
--table employee_address \
-m 1 \
--where "city = 'sec-bad'" \
--target-dir /sqoop/wherequery


Step 21 - Verify the import.

$ hdfs dfs -cat /sqoop/wherequery/part-m-00000
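
Only the rows whose city column equals 'sec-bad' should appear:

1202,108I,aoc,sec-bad
1204,78B,old city,sec-bad
1205,720X,hitec,sec-bad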

Incremental Import

Incremental import is a technique that imports only the newly added rows of a table. Add the '--incremental', '--check-column', and '--last-value' options to perform an incremental import.

Step 22 - Insert one more row into the employee table.

mysql> insert into employee values(1206,'satish p','grp des',20000,'GR');

Note - Do not remove the employee directory created in Step 16. An incremental append import adds new part files alongside the existing ones rather than overwriting them, which is what Steps 24 and 25 verify.

Step 23 - The following command is used to perform the incremental import on the employee table, importing only the rows whose id is greater than 1205.

$ sqoop import \
--connect jdbc:mysql://localhost/userdb \
--username root \
--password root \
--table employee \
-m 1 \
--incremental append \
--check-column id \
--last-value 1205
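
At the end of an incremental import, Sqoop prints the value to pass as --last-value on the next run. For recurring imports, a saved Sqoop job can track the last value automatically; a minimal sketch (the job name 'employee_inc' is just an example, and Sqoop will prompt for the database password when the job runs):

$ sqoop job --create employee_inc \
-- import \
--connect jdbc:mysql://localhost/userdb \
--username root \
--table employee \
-m 1 \
--incremental append \
--check-column id \
--last-value 1205

$ sqoop job --exec employee_inc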


Step 24 - The following command is used to verify the previously imported data in the employee directory in HDFS.

$ hdfs dfs -cat /user/hduser/employee/part-m-00000

Step 25 - The following command is used to see the newly appended rows of the employee table.

$ hdfs dfs -cat /user/hduser/employee/part-m-00001
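
This file should contain only the newly appended row:

1206,satish p,grp des,20000,GR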

Import All Tables

The 'import-all-tables' tool imports all the tables from the RDBMS database server into HDFS. Each table's data is stored in a separate directory, and the directory name is the same as the table name. Every table must have a primary key for the default parallel import, and the default output directories must not already exist; since /user/hduser/employee was created in the earlier steps, remove it first (for example, with $ hdfs dfs -rm -r /user/hduser/employee).

Let us take the example of importing all tables from the userdb database. The list of tables in the userdb database is as follows.

mysql> show tables;
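
With the three tables created above, the output is:

+------------------+
| Tables_in_userdb |
+------------------+
| employee         |
| employee_address |
| employee_contact |
+------------------+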

Step 26 - The following command is used to import all the tables from the userdb database.

$ sqoop import-all-tables \
--connect jdbc:mysql://localhost/userdb \
--username root \
--password root


Step 27 - Verify the import.

$ hdfs dfs -ls /user/hduser/
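
The listing should show one directory per table (permissions, owner, and timestamps will vary, and the placeholders below are not literal output):

drwxr-xr-x   - hduser supergroup          0 <timestamp> /user/hduser/employee
drwxr-xr-x   - hduser supergroup          0 <timestamp> /user/hduser/employee_address
drwxr-xr-x   - hduser supergroup          0 <timestamp> /user/hduser/employee_contact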


