This is article 12 in the Big Data series. It introduces the basic concepts of the Hive data warehouse, its architecture, and the complete steps to install Hive 2.3.9 on a three-node Hadoop cluster.
What is Hive
Hive is a data warehouse tool built on Hadoop. Its core capability is mapping structured data files to database tables and providing an SQL-like query language (HiveQL); HiveQL statements are automatically converted into MapReduce jobs for execution.
Hive is not a database: it has no storage engine of its own. Data is stored in HDFS, and computation relies on MapReduce, Tez, or Spark.
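As a minimal sketch of the SQL-to-MapReduce idea (the `employee` table here is hypothetical, not part of this tutorial), a grouped aggregation submitted through the CLI is compiled into one or more MapReduce jobs; `EXPLAIN` prints the generated plan without running it:

```shell
# A HiveQL aggregation; Hive compiles it into MapReduce job(s)
# (the employee table is hypothetical)
hive -e "SELECT dept, COUNT(*) AS cnt FROM employee GROUP BY dept;"

# Show the compiled execution plan without executing the query
hive -e "EXPLAIN SELECT dept, COUNT(*) AS cnt FROM employee GROUP BY dept;"
```

Running `EXPLAIN` first is a cheap way to see which stages Hive will generate before paying the minute-level latency of an actual job.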
Core Features
- Write Once Read Many: Suitable for batch offline analysis
- SQL Dialect: HiveQL syntax close to standard SQL, lower learning curve
- Extensible: Supports custom functions (UDF/UDAF/UDTF)
- Fault Tolerant: Relies on HDFS and YARN’s fault tolerance mechanisms
Hive Architecture
Client (CLI / JDBC / Web UI)
↓
HiveServer2 (Thrift Service)
↓
Driver (Parse → Compile → Optimize → Execute)
↓
Metastore (Metadata: Table Structure, Partitions, HDFS Paths)
↓
Execution Engine (MapReduce / Tez / Spark)
↓
HDFS (Actual Data Storage)
| Component | Responsibility |
|---|---|
| Driver | SQL parsing, logical/physical plan generation |
| Metastore | Stores metadata (embedded Derby by default; MySQL/MariaDB in production) |
| HiveServer2 | Exposes JDBC/Thrift interface externally |
| Execution Engine | Submits physical plan to YARN for execution |
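To make the HiveServer2 row concrete: clients usually connect over its JDBC/Thrift interface with Beeline. A sketch, assuming HiveServer2 runs on this cluster's node `h121.wzk.icu` with the default port 10000:

```shell
# Connect to HiveServer2 over JDBC/Thrift with Beeline
# (hostname h121.wzk.icu and default port 10000 are assumptions for this cluster)
beeline -u jdbc:hive2://h121.wzk.icu:10000 -n hive
```

The classic `hive` CLI used later in this article bypasses HiveServer2 and talks to the Driver directly.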
Pros and Cons Analysis
Pros
- Low learning cost, SQL engineers can get started quickly
- Can handle PB-level data (MapReduce scales horizontally)
- Supports UDF, flexible business logic extension
- Unified metadata management, shares Metastore with Spark/Impala
Cons
- HQL expression capability limited, complex iterative computation difficult
- MapReduce execution efficiency low, high latency (minute-level)
- Auto-generated MR code lacks targeted optimization
- Difficult to tune, requires deep understanding of underlying mechanisms
Install Hive 2.3.9
1. Download and Extract
tar -zxvf apache-hive-2.3.9-bin.tar.gz -C /opt/servers/
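If the tarball is not yet on the machine, it can be fetched first. The Apache archive URL below is one common source (verify it against the official download page before use):

```shell
# Download Hive 2.3.9 (URL assumes the Apache archive layout)
wget https://archive.apache.org/dist/hive/hive-2.3.9/apache-hive-2.3.9-bin.tar.gz

# Extract into the servers directory
tar -zxvf apache-hive-2.3.9-bin.tar.gz -C /opt/servers/
```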
2. Configure Environment Variables
Edit /etc/profile, add:
export HIVE_HOME=/opt/servers/apache-hive-2.3.9-bin
export PATH=$PATH:$HIVE_HOME/bin
Apply:
source /etc/profile
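A quick check that the variables took effect (note that the `hive` launcher also expects Hadoop to be installed and on the PATH):

```shell
echo $HIVE_HOME    # should print /opt/servers/apache-hive-2.3.9-bin
hive --version     # should report Hive 2.3.9 (requires Hadoop on the PATH)
```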
3. Install and Configure MariaDB (Store Metadata)
# Install MariaDB
sudo apt install mariadb-server -y
# Configure remote access (edit bind address)
# Edit /etc/mysql/mariadb.conf.d/50-server.cnf
# Change bind-address = 127.0.0.1 to 0.0.0.0
# Restart MariaDB so the change takes effect
sudo systemctl restart mariadb
# Create Hive metadata database and user
mysql -u root -p
CREATE DATABASE hive_meta DEFAULT CHARACTER SET utf8;
CREATE USER 'hive'@'%' IDENTIFIED BY 'hive123';
GRANT ALL ON hive_meta.* TO 'hive'@'%';
FLUSH PRIVILEGES;
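A quick way to confirm the account works is to log in as the new user. The hostname below matches the ConnectionURL used later in hive-site.xml; adjust it for your cluster:

```shell
# Verify the hive account can reach the metadata database
# (h121.wzk.icu is this tutorial's metastore host)
mysql -u hive -phive123 -h h121.wzk.icu -e "SHOW DATABASES;"
# hive_meta should appear in the output
```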
4. Configure hive-site.xml
Create hive-site.xml in $HIVE_HOME/conf/:
<configuration>
  <!-- Metadata database connection -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://h121.wzk.icu:3306/hive_meta?useSSL=false&amp;characterEncoding=UTF-8</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive123</value>
  </property>
  <!-- Hive data warehouse path -->
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
  </property>
  <!-- CLI display optimization -->
  <property>
    <name>hive.cli.print.current.db</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.cli.print.header</name>
    <value>true</value>
  </property>
</configuration>
Note: the & in the JDBC URL must be written as &amp; inside the XML file, otherwise the file fails to parse.
5. Install MySQL JDBC Driver
Copy mysql-connector-java-5.1.x.jar to $HIVE_HOME/lib/.
6. Initialize Metadata Schema
schematool -dbType mysql -initSchema
On success, schematool creates about 70 Hive metadata tables in the hive_meta database in MariaDB.
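The result can be inspected directly in MariaDB; the credentials follow step 3. Well-known metastore tables such as DBS (databases) and TBLS (tables) should be listed:

```shell
# Inspect the freshly created metastore tables
mysql -u hive -phive123 -e "USE hive_meta; SHOW TABLES;"
# Expect tables such as DBS and TBLS in the output
```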
7. Verify Installation
hive
# Enter Hive CLI
hive> show databases;
# Output: default
Data Warehouse Default Path
Hive defaults to storing data in HDFS under /user/hive/warehouse/:
- Database: /user/hive/warehouse/<db_name>.db/
- Table: /user/hive/warehouse/<db_name>.db/<table_name>/
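A quick smoke test of this layout (the database name test_db is illustrative): create a database, then look for the directory Hive made for it in HDFS.

```shell
# Create a database through Hive
hive -e "CREATE DATABASE IF NOT EXISTS test_db;"

# List the warehouse root in HDFS
hdfs dfs -ls /user/hive/warehouse/
# A test_db.db directory should appear
```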
The next article will cover Hive DDL/DML operations, including database/table creation and data import.