This is article 12 in the Big Data series. It introduces the basic concepts of the Hive data warehouse, its architecture, and the complete steps to install Hive 2.3.9 on a three-node Hadoop cluster.


What is Hive

Hive is a data warehouse tool built on Hadoop. Its core capability is mapping structured data files to database tables and providing an SQL-like query language (HiveQL); HiveQL statements are automatically converted into MapReduce jobs for execution.

Hive is not a database: it has no storage engine of its own. Data is stored in HDFS, and computation relies on MapReduce, Tez, or Spark.
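The "tables over files" idea can be sketched with a minimal HiveQL session. The table name, schema, and HDFS path below are illustrative, not from the original article:

```sql
-- Create a table whose rows map onto plain text files in HDFS
-- (comma-separated fields; schema is hypothetical)
CREATE TABLE user_log (
  user_id BIGINT,
  action  STRING,
  ts      STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';

-- LOAD simply moves the file under /user/hive/warehouse/user_log/;
-- no storage engine of Hive's own is involved
LOAD DATA INPATH '/data/user_log.csv' INTO TABLE user_log;

-- This query is compiled into a MapReduce job and run on YARN
SELECT action, COUNT(*) FROM user_log GROUP BY action;
```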

Core Features

  • Write Once Read Many: Suitable for batch offline analysis
  • SQL Dialect: HiveQL syntax close to standard SQL, lower learning curve
  • Extensible: Supports custom functions (UDF/UDAF/UDTF)
  • Fault Tolerant: Relies on HDFS and YARN’s fault tolerance mechanisms

Hive Architecture

Client (CLI / JDBC / Web UI)
          ↓
HiveServer2 (Thrift Service)
          ↓
Driver (Parse → Compile → Optimize → Execute)
          ↓   (consults Metastore: table structure, partitions, HDFS paths)
Execution Engine (MapReduce / Tez / Spark)
          ↓
HDFS (Actual Data Storage)
Component        | Responsibility
Driver           | SQL parsing, logical/physical plan generation
Metastore        | Stores metadata (typically in MySQL/MariaDB)
HiveServer2      | Exposes a JDBC/Thrift interface to external clients
Execution Engine | Submits the physical plan to YARN for execution
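As a concrete example of the client path above, a JDBC client such as Beeline connects through HiveServer2's Thrift service (port 10000 by default; the hostname reuses the metastore host from the hive-site.xml example later in this article):

```shell
# Connect to HiveServer2 over JDBC (default Thrift port 10000)
beeline -u jdbc:hive2://h121.wzk.icu:10000 -n hive

# Each statement sent here flows through the Driver (parse/compile/
# optimize), consults the Metastore, and is submitted to the
# execution engine before results stream back over Thrift
```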

Pros and Cons Analysis

Pros

  • Low learning cost, SQL engineers can get started quickly
  • Can handle PB-level data (MapReduce scales horizontally)
  • Supports UDF, flexible business logic extension
  • Unified metadata management, shares Metastore with Spark/Impala

Cons

  • Limited HQL expressiveness; complex iterative computation is hard to express
  • Low MapReduce execution efficiency and high latency (minute-level)
  • Auto-generated MR jobs lack query-specific optimization
  • Tuning is difficult and requires a deep understanding of the underlying mechanisms

Install Hive 2.3.9

1. Download and Extract

tar -zxvf apache-hive-2.3.9-bin.tar.gz -C /opt/servers/

2. Configure Environment Variables

Edit /etc/profile, add:

export HIVE_HOME=/opt/servers/apache-hive-2.3.9-bin
export PATH=$PATH:$HIVE_HOME/bin

Apply:

source /etc/profile

3. Install and Configure MariaDB (Store Metadata)

# Install MariaDB
sudo apt install mariadb-server -y

# Configure remote access: edit /etc/mysql/mariadb.conf.d/50-server.cnf
# and change bind-address = 127.0.0.1 to bind-address = 0.0.0.0

# Restart to apply the bind-address change
sudo systemctl restart mariadb

# Create Hive metadata database and user
mysql -u root -p
CREATE DATABASE hive_meta DEFAULT CHARACTER SET utf8;
CREATE USER 'hive'@'%' IDENTIFIED BY 'hive123';
GRANT ALL ON hive_meta.* TO 'hive'@'%';
FLUSH PRIVILEGES;
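One way to confirm the account works before moving on (assuming the statements above succeeded) is to reconnect as the new user; the host reuses the metastore node from the hive-site.xml example:

```shell
# Log in as the hive user; -h points at the MariaDB host
mysql -u hive -phive123 -h h121.wzk.icu -e "SHOW DATABASES;"
# hive_meta should appear in the output
```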

4. Configure hive-site.xml

Create hive-site.xml in $HIVE_HOME/conf/:

<configuration>
  <!-- Metadata database connection -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://h121.wzk.icu:3306/hive_meta?useSSL=false&amp;characterEncoding=UTF-8</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive123</value>
  </property>

  <!-- Hive data warehouse path -->
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
  </property>

  <!-- CLI display optimization -->
  <property>
    <name>hive.cli.print.current.db</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.cli.print.header</name>
    <value>true</value>
  </property>
</configuration>

5. Install MySQL JDBC Driver

Copy mysql-connector-java-5.1.x.jar to $HIVE_HOME/lib/.

6. Initialize Metadata Schema

schematool -dbType mysql -initSchema

On success, about 70 Hive metadata tables are created in the hive_meta database in MariaDB.
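To spot-check the initialized schema, the metadata tables can be listed directly in MariaDB. The table names in the comment (TBLS, DBS, SDS) are part of Hive's standard metastore schema:

```sql
-- Run inside the mysql client against the metadata database
USE hive_meta;
SHOW TABLES;               -- expect tables such as TBLS, DBS, SDS, PARTITIONS
SELECT COUNT(*) FROM TBLS; -- 0 until the first Hive table is created
```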

7. Verify Installation

hive
# Enter Hive CLI
hive> show databases;
# Output: default

Data Warehouse Default Path

Hive defaults to storing data in HDFS under /user/hive/warehouse/:

  • Database: /user/hive/warehouse/<db_name>.db/
  • Table: /user/hive/warehouse/<db_name>.db/<table_name>/
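These paths can be inspected directly with the HDFS CLI (the database and table names below are illustrative):

```shell
# List the warehouse root; each database appears as a .db directory
hdfs dfs -ls /user/hive/warehouse/

# Files under a table directory are the table's actual data files
hdfs dfs -ls /user/hive/warehouse/test.db/user_log/
```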

The next article will cover Hive DDL/DML operations, including database/table creation and data import.