This is article 23 in the Big Data series. It introduces three practical ways to use Sqoop to import partial data from MySQL to HDFS by condition: --query, --columns, and --where.

Complete illustrated version: CSDN Original | Juejin

Why Partial Import is Needed

Full import is useful for initialization, but production often only needs:

  • Specific columns (reduce transfer volume)
  • Rows meeting conditions (e.g., products with price > threshold)
  • Result set after multi-table JOIN

Sqoop provides three ways to achieve partial import.

Method 1: --query Custom Query

Suitable for complex SQL filtering or multi-table JOIN:

sqoop import \
  --connect jdbc:mysql://h122.wzk.icu:3306/sqoop \
  --username hive \
  --password hive@wzk.icu \
  --target-dir /root/sqoop_query \
  --append \
  -m 1 \
  --fields-terminated-by "\t" \
  --query 'select gname, serialNumber, price, stock_number, create_time
           from goodtbl where price > 88 and $CONDITIONS'

Core Precautions:

  • The WHERE clause must include the $CONDITIONS placeholder; Sqoop replaces it with each mapper's split condition
  • With single quotes, $CONDITIONS needs no escaping; with double quotes, write \$CONDITIONS so the shell does not expand it
  • If more than one mapper is used (-m > 1), --split-by must also be specified, otherwise Sqoop cannot split the query
  • --append adds new files to an existing target directory instead of failing when it already exists
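When the query has to be wrapped in double quotes (for example, to interpolate a shell variable), the placeholder must be escaped. A minimal sketch of the same import with double quotes; the MIN_PRICE variable is illustrative, not from the original command:

```shell
# Same query in double quotes: $CONDITIONS must be written as \$CONDITIONS,
# otherwise the shell expands it to an empty string before Sqoop sees it.
MIN_PRICE=88   # illustrative shell variable, the reason double quotes are needed here
sqoop import \
  --connect jdbc:mysql://h122.wzk.icu:3306/sqoop \
  --username hive \
  --password hive@wzk.icu \
  --target-dir /root/sqoop_query \
  --append \
  -m 1 \
  --fields-terminated-by "\t" \
  --query "select gname, serialNumber, price, stock_number, create_time
           from goodtbl where price > ${MIN_PRICE} and \$CONDITIONS"
```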

Method 2: --columns Specify Columns

The simplest way when you only need a subset of columns:

sqoop import \
  --connect jdbc:mysql://h122.wzk.icu:3306/sqoop \
  --username hive \
  --password hive@wzk.icu \
  --target-dir /root/sqoop_columns \
  --delete-target-dir \
  --num-mappers 1 \
  --fields-terminated-by "\t" \
  --columns gname,serialNumber,price \
  --table goodtbl

Note: Separate multiple columns with commas and no spaces; otherwise parsing fails.

Method 3: --where Condition Filter

Uses standard SQL WHERE syntax combined with --table; the most intuitive option:

sqoop import \
  --connect jdbc:mysql://h122.wzk.icu:3306/sqoop \
  --username hive \
  --password hive@wzk.icu \
  --target-dir /root/sqoop_where \
  --delete-target-dir \
  -m 1 \
  --fields-terminated-by "\t" \
  --table goodtbl \
  --where "price >= 68"

Comparison of Three Methods

| Method    | Applicable Scenario           | Multi-table JOIN | Column Filter | Row Filter |
|-----------|-------------------------------|------------------|---------------|------------|
| --query   | Complex SQL, multi-table JOIN | Yes              | Yes           | Yes        |
| --columns | Only some columns needed      | No               | Yes           | No         |
| --where   | Simple row filter             | No               | No            | Yes        |

--columns and --where can be combined, but cannot be used with --query.
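The combination of the two flags can be sketched as a single job, reusing the connection settings from the examples above (the target directory name here is illustrative):

```shell
# Column filter and row filter in one import: --columns and --where both
# attach to --table, so they compose; --query would conflict with both.
sqoop import \
  --connect jdbc:mysql://h122.wzk.icu:3306/sqoop \
  --username hive \
  --password hive@wzk.icu \
  --target-dir /root/sqoop_columns_where \
  --delete-target-dir \
  -m 1 \
  --fields-terminated-by "\t" \
  --table goodtbl \
  --columns gname,price \
  --where "price >= 68"
```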

Choice Between --append and --delete-target-dir

| Parameter           | Behavior                                  | Applicable Scenario                            |
|---------------------|-------------------------------------------|------------------------------------------------|
| --append            | Append files to the existing directory    | Incremental loads; don't overwrite history     |
| --delete-target-dir | Delete the target directory, then write   | Full overwrite; idempotent re-runs             |
| (neither)           | Error and exit if the directory exists    | Default protection against mistakes            |
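The two flags correspond to two run patterns. A sketch of both, assuming the table and directories from the examples above; the create_time cutoff is illustrative:

```shell
# Pattern 1: idempotent full reload. Safe to re-run; each run replaces
# the directory, so repeated runs never duplicate data.
sqoop import \
  --connect jdbc:mysql://h122.wzk.icu:3306/sqoop \
  --username hive --password hive@wzk.icu \
  --table goodtbl \
  --target-dir /root/sqoop_where \
  --delete-target-dir \
  -m 1

# Pattern 2: incremental-style run. New part files land next to the existing
# ones; the --where cutoff (illustrative) keeps already-loaded rows out.
sqoop import \
  --connect jdbc:mysql://h122.wzk.icu:3306/sqoop \
  --username hive --password hive@wzk.icu \
  --table goodtbl \
  --where "create_time > '2024-01-01'" \
  --target-dir /root/sqoop_where \
  --append \
  -m 1
```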

Verify Results

# View HDFS files
hdfs dfs -ls /root/

# Check data content
hdfs dfs -cat /root/sqoop_query/part-m-00000

After a successful import, you can view the MapReduce job's execution details in the YARN Web UI (http://h121.wzk.icu:8088) to confirm mapper task allocation and runtime.
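If the Web UI is not reachable, the same information is available from the YARN CLI; a quick sketch:

```shell
# List finished applications; the Sqoop import shows up as a MapReduce job
yarn application -list -appStates FINISHED

# Drill into one application (replace the placeholder with an id from the list)
yarn application -status application_XXXXXXXXXXXXX_XXXX
```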


The next article covers importing MySQL data directly into Hive tables with Sqoop, plus the reverse operation of exporting Hive data back to MySQL.