This is article 24 in the Big Data series. It demonstrates bidirectional data migration with Sqoop and Hive: importing from MySQL into Hive, then exporting from Hive back to MySQL.
Complete illustrated version: CSDN Original | Juejin
Experimental Environment
Three 2C4G cloud servers (h121, h122, h123) form a Hadoop cluster with YARN, HDFS, and Hive services running; MySQL is deployed on h122.
MySQL → Hive: Data Import
1. Create Target Database and Table in Hive
CREATE DATABASE mydb;
CREATE TABLE mydb.goodtbl (
    gname STRING,
    serialNumber INT,
    price INT,
    stock_number INT,
    create_time DATE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
2. Execute Sqoop Import Command
sqoop import \
--connect jdbc:mysql://h122.wzk.icu:3306/sqoop \
--username hive \
--password hive@wzk.icu \
--table goodtbl \
--hive-import \
--create-hive-table \
--fields-terminated-by "\t" \
--hive-overwrite \
--hive-table mydb.goodtbl \
-m 1
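Note that `--password` exposes the password on the command line and in shell history. Sqoop also accepts `-P` (interactive prompt) or `--password-file` (read from a protected HDFS file); a sketch under the same connection settings, with `/user/hive/.mysql-pass` as a hypothetical storage path:

```shell
# One-time setup: store the password in HDFS with restrictive permissions
# (path /user/hive/.mysql-pass is an arbitrary choice for this sketch)
echo -n "hive@wzk.icu" | hdfs dfs -put - /user/hive/.mysql-pass
hdfs dfs -chmod 400 /user/hive/.mysql-pass

# Reference the file instead of passing --password in plain text
sqoop import \
  --connect jdbc:mysql://h122.wzk.icu:3306/sqoop \
  --username hive \
  --password-file /user/hive/.mysql-pass \
  --table goodtbl \
  --hive-import \
  --hive-overwrite \
  --hive-table mydb.goodtbl \
  --fields-terminated-by "\t" \
  -m 1
```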
Key Parameter Explanations
| Parameter | Description |
|---|---|
| `--hive-import` | Import into Hive rather than writing directly to HDFS |
| `--create-hive-table` | Auto-create the Hive table if it does not exist (fails if it already exists) |
| `--hive-overwrite` | Overwrite any existing data in the Hive table |
| `--hive-table` | Target Hive table, in `database.table` format |
| `--fields-terminated-by` | Field delimiter; must match the Hive table's SerDe configuration |
| `-m 1` | Use a single Mapper; suitable for small data volumes |
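For larger tables, more mappers speed up the import, but Sqoop then needs a numeric column to split the query ranges; if the table has no primary key, `--split-by` must be given explicitly. A sketch of a parallel import, assuming `serialNumber` is a reasonably evenly distributed column:

```shell
# Parallel import: 4 mappers, ranges partitioned on serialNumber
# (the choice of split column is an assumption for this sketch)
sqoop import \
  --connect jdbc:mysql://h122.wzk.icu:3306/sqoop \
  --username hive \
  --password hive@wzk.icu \
  --table goodtbl \
  --hive-import \
  --hive-overwrite \
  --hive-table mydb.goodtbl \
  --fields-terminated-by "\t" \
  --split-by serialNumber \
  -m 4
```

A badly skewed split column leaves some mappers nearly idle while others do all the work, so check the column's value distribution first.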
Execution Principle
Sqoop's Hive import actually happens in two steps:
- First, import the MySQL data into a temporary HDFS directory
- Then, load the data from that directory into the Hive table via `LOAD DATA INPATH`

Therefore, a single MapReduce job appears on YARN. Once it completes, both the Hive metadata and the HDFS data directory are updated.
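The same two steps can be reproduced by hand, which helps when debugging a failed import. A sketch under the same connection settings, with `/tmp/goodtbl_staging` as an arbitrary staging directory chosen for this example:

```shell
# Step 1: import from MySQL into a temporary HDFS directory
sqoop import \
  --connect jdbc:mysql://h122.wzk.icu:3306/sqoop \
  --username hive \
  --password hive@wzk.icu \
  --table goodtbl \
  --target-dir /tmp/goodtbl_staging \
  --fields-terminated-by "\t" \
  -m 1

# Step 2: move the staged files into the Hive table's warehouse directory
hive -e "LOAD DATA INPATH '/tmp/goodtbl_staging' OVERWRITE INTO TABLE mydb.goodtbl;"
```

`LOAD DATA INPATH` moves (not copies) the files, so the staging directory is empty afterwards.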
Verify Results
-- Query in Hive to confirm
SELECT COUNT(*) FROM mydb.goodtbl;
SELECT * FROM mydb.goodtbl LIMIT 5;
Hive → MySQL: Data Export
1. Create Target Table in MySQL
CREATE TABLE sqoop.goodtbl2 (
gname VARCHAR(50),
serialNumber INT,
price INT,
stock_number INT,
create_time DATE
);
2. Execute Sqoop Export Command
sqoop export \
--connect jdbc:mysql://h122.wzk.icu:3306/sqoop \
--username hive \
--password hive@wzk.icu \
--table goodtbl2 \
--num-mappers 1 \
--export-dir /user/hive/warehouse/mydb.db/goodtbl \
--input-fields-terminated-by "\t"
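Because the export runs parallel, non-atomic INSERTs, a mapper failure can leave the target table half-filled. Sqoop's `--staging-table` option writes into an intermediate table first and moves the rows to the target only if all mappers succeed. A sketch, assuming a hypothetical `goodtbl2_stage` table that has been pre-created with the same schema as `goodtbl2`:

```shell
# Atomic-style export via a staging table
# (goodtbl2_stage must already exist and be structurally identical to goodtbl2)
sqoop export \
  --connect jdbc:mysql://h122.wzk.icu:3306/sqoop \
  --username hive \
  --password hive@wzk.icu \
  --table goodtbl2 \
  --staging-table goodtbl2_stage \
  --clear-staging-table \
  --num-mappers 1 \
  --export-dir /user/hive/warehouse/mydb.db/goodtbl \
  --input-fields-terminated-by "\t"
```

`--clear-staging-table` empties any leftovers from a previous failed run before the export starts.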
Export Parameter Explanations
| Parameter | Description |
|---|---|
| `--table` | MySQL target table name |
| `--export-dir` | Actual HDFS path of the Hive table's data |
| `--input-fields-terminated-by` | Field delimiter used when parsing the HDFS files |
| `--num-mappers` | Number of parallel Mappers (each Mapper issues batch INSERTs during export) |
Hive table HDFS path pattern:
`/user/hive/warehouse/<database>.db/<table>`. Use `DESCRIBE FORMATTED mydb.goodtbl;` in Hive to confirm the actual path.
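The path can also be checked directly from the command line before exporting; a quick sketch, assuming the default warehouse location:

```shell
# List the files backing the Hive table; these are what --export-dir will read
hdfs dfs -ls /user/hive/warehouse/mydb.db/goodtbl

# Peek at a few rows to confirm the fields really are tab-delimited
hdfs dfs -cat /user/hive/warehouse/mydb.db/goodtbl/* | head -n 5
```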
Verify Results
-- Confirm data in MySQL
SELECT COUNT(*) FROM sqoop.goodtbl2;
SELECT * FROM sqoop.goodtbl2 LIMIT 5;
Common Issues
Q: `--create-hive-table` fails with a "table already exists" error
A: Remove the parameter, or DROP TABLE in Hive first. If you want to overwrite the data while keeping the table structure, use only `--hive-overwrite`.
Q: NULL columns appear in Hive query results
A: Usually the `--fields-terminated-by` delimiter does not match the Hive table definition; keep the two consistent (`\t` or `\001` is recommended).
Q: Rows arrive out of order when exporting to MySQL
A: Sqoop export runs parallel INSERTs, so row order is not guaranteed. If ordering matters, add ORDER BY when querying in MySQL.
The next article introduces Sqoop incremental import (CDC), syncing only new or changed rows so that a full table scan is not needed on every run.