This is article 24 in the Big Data series. It demonstrates bidirectional data migration with Sqoop and Hive: importing from MySQL into Hive, then exporting from Hive back to MySQL.
Complete illustrated version: CSDN Original | Juejin
Experimental Environment
Three 2C4G cloud servers (h121, h122, h123) form a Hadoop cluster with YARN, HDFS, and Hive services running; MySQL is deployed on h122.
MySQL → Hive: Data Import
1. Create Target Database and Table in Hive
CREATE DATABASE mydb;
CREATE TABLE mydb.goodtbl (
    gname STRING,
    serialNumber INT,
    price INT,
    stock_number INT,
    create_time DATE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
2. Execute Sqoop Import Command
sqoop import \
--connect jdbc:mysql://h122.wzk.icu:3306/sqoop \
--username hive \
--password hive@wzk.icu \
--table goodtbl \
--hive-import \
--create-hive-table \
--fields-terminated-by "\t" \
--hive-overwrite \
--hive-table mydb.goodtbl \
-m 1
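Note that `--password` exposes the password on the command line and in shell history. Sqoop also accepts `-P` (interactive prompt) or `--password-file` (read from a protected HDFS file); a sketch under the same connection settings, with `/user/hive/.mysql-pass` as a hypothetical storage path:

```shell
# One-time setup: store the password in HDFS with restrictive permissions
# (path /user/hive/.mysql-pass is an arbitrary choice for this sketch)
echo -n "hive@wzk.icu" | hdfs dfs -put - /user/hive/.mysql-pass
hdfs dfs -chmod 400 /user/hive/.mysql-pass

# Reference the file instead of passing --password in plain text
sqoop import \
  --connect jdbc:mysql://h122.wzk.icu:3306/sqoop \
  --username hive \
  --password-file /user/hive/.mysql-pass \
  --table goodtbl \
  --hive-import \
  --hive-overwrite \
  --hive-table mydb.goodtbl \
  --fields-terminated-by "\t" \
  -m 1
```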
Key Parameter Explanations
| Parameter | Description |
|---|---|
| `--hive-import` | Import into Hive rather than writing directly to HDFS |
| `--create-hive-table` | Auto-create the Hive table if it does not exist (fails if it already exists) |
| `--hive-overwrite` | Overwrite any existing data in the Hive table |
| `--hive-table` | Target Hive table, in `database.table` format |
| `--fields-terminated-by` | Field delimiter; must match the Hive table's SerDe configuration |
| `-m 1` | Use a single Mapper; suitable for small data volumes |
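For larger tables, more mappers speed up the import, but Sqoop then needs a numeric column to split the query ranges; if the table has no primary key, `--split-by` must be given explicitly. A sketch of a parallel import, assuming `serialNumber` is a reasonably evenly distributed column:

```shell
# Parallel import: 4 mappers, ranges partitioned on serialNumber
# (the choice of split column is an assumption for this sketch)
sqoop import \
  --connect jdbc:mysql://h122.wzk.icu:3306/sqoop \
  --username hive \
  --password hive@wzk.icu \
  --table goodtbl \
  --hive-import \
  --hive-overwrite \
  --hive-table mydb.goodtbl \
  --fields-terminated-by "\t" \
  --split-by serialNumber \
  -m 4
```

A badly skewed split column leaves some mappers nearly idle while others do all the work, so check the column's value distribution first.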
Execution Principle
Sqoop's Hive import actually happens in two steps:
- First, import the MySQL data into a temporary HDFS directory
- Then, load the data from that directory into the Hive table via `LOAD DATA INPATH`

Therefore, a single MapReduce job appears on YARN. Once it completes, both the Hive metadata and the HDFS data directory are updated.
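The same two steps can be reproduced by hand, which helps when debugging a failed import. A sketch under the same connection settings, with `/tmp/goodtbl_staging` as an arbitrary staging directory chosen for this example:

```shell
# Step 1: import from MySQL into a temporary HDFS directory
sqoop import \
  --connect jdbc:mysql://h122.wzk.icu:3306/sqoop \
  --username hive \
  --password hive@wzk.icu \
  --table goodtbl \
  --target-dir /tmp/goodtbl_staging \
  --fields-terminated-by "\t" \
  -m 1

# Step 2: move the staged files into the Hive table's warehouse directory
hive -e "LOAD DATA INPATH '/tmp/goodtbl_staging' OVERWRITE INTO TABLE mydb.goodtbl;"
```

`LOAD DATA INPATH` moves (not copies) the files, so the staging directory is empty afterwards.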
Verify Results
-- Query in Hive to confirm
SELECT COUNT(*) FROM mydb.goodtbl;
SELECT * FROM mydb.goodtbl LIMIT 5;
Hive → MySQL: Data Export
1. Create Target Table in MySQL
CREATE TABLE sqoop.goodtbl2 (
gname VARCHAR(50),
serialNumber INT,
price INT,
stock_number INT,
create_time DATE
);
2. Execute Sqoop Export Command
sqoop export \
--connect jdbc:mysql://h122.wzk.icu:3306/sqoop \
--username hive \
--password hive@wzk.icu \
--table goodtbl2 \
--num-mappers 1 \
--export-dir /user/hive/warehouse/mydb.db/goodtbl \
--input-fields-terminated-by "\t"
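Because the export runs parallel, non-atomic INSERTs, a mapper failure can leave the target table half-filled. Sqoop's `--staging-table` option writes into an intermediate table first and moves the rows to the target only if all mappers succeed. A sketch, assuming a hypothetical `goodtbl2_stage` table that has been pre-created with the same schema as `goodtbl2`:

```shell
# Atomic-style export via a staging table
# (goodtbl2_stage must already exist and be structurally identical to goodtbl2)
sqoop export \
  --connect jdbc:mysql://h122.wzk.icu:3306/sqoop \
  --username hive \
  --password hive@wzk.icu \
  --table goodtbl2 \
  --staging-table goodtbl2_stage \
  --clear-staging-table \
  --num-mappers 1 \
  --export-dir /user/hive/warehouse/mydb.db/goodtbl \
  --input-fields-terminated-by "\t"
```

`--clear-staging-table` empties any leftovers from a previous failed run before the export starts.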
Export Parameter Explanations
| Parameter | Description |
|---|---|
| `--table` | MySQL target table name |
| `--export-dir` | Actual HDFS path of the Hive table's data |
| `--input-fields-terminated-by` | Field delimiter used when parsing the HDFS files |
| `--num-mappers` | Number of parallel Mappers (each Mapper issues batch INSERTs during export) |
Hive table HDFS path pattern:
`/user/hive/warehouse/<database>.db/<table>`. Use `DESCRIBE FORMATTED mydb.goodtbl;` in Hive to confirm the actual path.
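The path can also be checked directly from the command line before exporting; a quick sketch, assuming the default warehouse location:

```shell
# List the files backing the Hive table; these are what --export-dir will read
hdfs dfs -ls /user/hive/warehouse/mydb.db/goodtbl

# Peek at a few rows to confirm the fields really are tab-delimited
hdfs dfs -cat /user/hive/warehouse/mydb.db/goodtbl/* | head -n 5
```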
Verify Results
-- Confirm data in MySQL
SELECT COUNT(*) FROM sqoop.goodtbl2;
SELECT * FROM sqoop.goodtbl2 LIMIT 5;
Common Issues
Q: `--create-hive-table` fails with a "table already exists" error
A: Remove the parameter, or DROP TABLE in Hive first. If you want to overwrite the data while keeping the table structure, use only `--hive-overwrite`.
Q: NULL columns appear in Hive query results
A: Usually the `--fields-terminated-by` delimiter does not match the Hive table definition; keep the two consistent (`\t` or `\001` is recommended).
Q: Rows arrive out of order when exporting to MySQL
A: Sqoop export runs parallel INSERTs, so row order is not guaranteed. If ordering matters, add ORDER BY when querying in MySQL.
The next article introduces Sqoop incremental import (CDC), syncing only new or changed rows so that a full table scan is not needed on every run.