This is article 23 in the Big Data series. It introduces three practical ways to import a subset of MySQL data into HDFS with Sqoop, filtering by column, by row condition, or with a custom query.
Complete illustrated version: CSDN Original | Juejin
Why Partial Import is Needed
Full import is useful for initialization, but production often only needs:
- Specific columns (reduce transfer volume)
- Rows meeting conditions (e.g., products with price > threshold)
- Result set after multi-table JOIN
Sqoop provides three ways to achieve partial import.
Method 1: --query Custom Query
Suitable for complex SQL filtering or multi-table JOIN:
sqoop import \
--connect jdbc:mysql://h122.wzk.icu:3306/sqoop \
--username hive \
--password hive@wzk.icu \
--target-dir /root/sqoop_query \
--append \
-m 1 \
--fields-terminated-by "\t" \
--query 'select gname, serialNumber, price, stock_number, create_time
from goodtbl where price > 88 and $CONDITIONS'
Core Precautions:
- The WHERE clause must include the $CONDITIONS placeholder; Sqoop uses it to inject split conditions for each mapper
- Inside single quotes, $CONDITIONS needs no escaping; inside double quotes, write \$CONDITIONS so the shell does not expand it
- --append means append data to the existing directory instead of overwriting it
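To make the placeholder concrete, here is a minimal Python sketch of the substitution (a simplification of my own, not Sqoop's actual Java implementation; real split boundaries come from MIN/MAX of the split column):

```python
# Simplified sketch (not Sqoop's actual code): each mapper runs the same
# free-form query with $CONDITIONS replaced by its own range predicate.
query = ("select gname, serialNumber, price, stock_number, create_time "
         "from goodtbl where price > 88 and $CONDITIONS")

def split_queries(query, split_col, lo, hi, num_mappers):
    """Replace $CONDITIONS with one range predicate per mapper."""
    step = (hi - lo) // num_mappers
    result = []
    for i in range(num_mappers):
        start = lo + i * step
        # last split closes the range with <=; earlier splits use <
        end = hi if i == num_mappers - 1 else start + step
        op = "<=" if i == num_mappers - 1 else "<"
        cond = f"( {split_col} >= {start} ) AND ( {split_col} {op} {end} )"
        result.append(query.replace("$CONDITIONS", cond))
    return result

for q in split_queries(query, "id", 1, 100, 2):
    print(q)
```

With -m 1 there is only one split, so the whole range goes to a single mapper, which is why the example above still needs the placeholder even though it is not split.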
Method 2: --columns Specify Columns
Simplest way when only importing partial columns:
sqoop import \
--connect jdbc:mysql://h122.wzk.icu:3306/sqoop \
--username hive \
--password hive@wzk.icu \
--target-dir /root/sqoop_columns \
--delete-target-dir \
--num-mappers 1 \
--fields-terminated-by "\t" \
--columns gname,serialNumber,price \
--table goodtbl
Note: separate multiple column names with commas only, no spaces; otherwise parsing fails.
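The comma rule can be checked before submitting the job. This is an illustrative pre-flight helper (my own, not part of Sqoop) that mimics a naive comma split:

```python
# Hypothetical pre-flight check (not a Sqoop API): a plain comma split,
# which is why "gname, serialNumber" would yield a name with a leading space.
def parse_columns(arg):
    cols = arg.split(",")
    bad = [c for c in cols if c != c.strip() or not c]
    if bad:
        raise ValueError(f"invalid column names: {bad}")
    return cols

print(parse_columns("gname,serialNumber,price"))
# ['gname', 'serialNumber', 'price']
```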
Method 3: --where Condition Filter
Uses standard SQL WHERE syntax combined with --table; this is the most intuitive method:
sqoop import \
--connect jdbc:mysql://h122.wzk.icu:3306/sqoop \
--username hive \
--password hive@wzk.icu \
--target-dir /root/sqoop_where \
--delete-target-dir \
-m 1 \
--fields-terminated-by "\t" \
--table goodtbl \
--where "price >= 68"
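Conceptually, --table, --columns and --where just assemble a plain SELECT. The sketch below (a simplification of my own, not Sqoop's query generator) shows the statement the options map to:

```python
# Simplified sketch of the SELECT implied by --table, --columns and --where.
def build_select(table, columns=None, where=None):
    col_list = ", ".join(columns) if columns else "*"
    sql = f"SELECT {col_list} FROM {table}"
    if where:
        sql += f" WHERE ({where})"
    return sql

print(build_select("goodtbl", where="price >= 68"))
# SELECT * FROM goodtbl WHERE (price >= 68)
```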
Comparison of Three Methods
| Method | Applicable Scenario | Multi-table JOIN | Column Filter | Row Filter |
|---|---|---|---|---|
| --query | Complex SQL, multi-table JOIN | Yes | Yes | Yes |
| --columns | Only partial columns needed | No | Yes | No |
| --where | Simple row filter | No | No | Yes |

--columns and --where can be combined, but neither can be used with --query.
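The exclusivity rule can be expressed as a small validation sketch (a hypothetical checker of my own; Sqoop performs an equivalent check when parsing arguments):

```python
# Hypothetical argument check mirroring the rule above: --columns/--where may
# be combined with --table, but none of them may appear alongside --query.
def validate(opts: dict):
    if "query" in opts and {"table", "columns", "where"} & opts.keys():
        raise ValueError("--query cannot be combined with --table/--columns/--where")
    return True

print(validate({"table": "goodtbl", "columns": "gname,price", "where": "price >= 68"}))
```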
Choice Between --append and --delete-target-dir
| Parameter | Behavior | Applicable Scenario |
|---|---|---|
| --append | Append files to existing directory | Incremental scenario, don’t overwrite historical data |
| --delete-target-dir | Delete target directory first, then write | Full overwrite, ensures idempotence |
| No parameter | Error and exit if directory exists | Default protection against mistakes |
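The three behaviors can be mimicked locally. Below is an illustrative sketch using the local filesystem as a stand-in for HDFS (my own analogue; actual HDFS semantics are handled by Sqoop and Hadoop):

```python
import shutil
from pathlib import Path

# Local-filesystem analogue of the three target-directory behaviors above
# (illustrative only, not how Sqoop talks to HDFS).
def prepare_target(path, append=False, delete_target_dir=False):
    target = Path(path)
    if target.exists():
        if delete_target_dir:
            shutil.rmtree(target)   # full overwrite: clear old output first
        elif not append:
            # default protection: refuse to write into an existing directory
            raise FileExistsError(f"output directory {target} already exists")
    target.mkdir(parents=True, exist_ok=True)
    return target
```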
Verify Results
# View HDFS files
hdfs dfs -ls /root/
# Check data content
hdfs dfs -cat /root/sqoop_query/part-m-00000
After a successful import, you can view the MapReduce job's execution details in the YARN Web UI (http://h121.wzk.icu:8088) to confirm mapper task allocation and time consumption.
The next article covers Sqoop importing MySQL data directly into a Hive table, and exporting Hive data back to MySQL: bidirectional operations.