This is article 16 in the Big Data series. It introduces HiveServer2 deployment and configuration, and shows how to use the Beeline client for remote connections.

Complete illustrated version: CSDN Original | Juejin

What is HiveServer2

HiveServer2 (HS2) is a service component provided by Hive that allows remote clients to execute SQL queries over the Thrift protocol and receive results, supporting cross-platform, cross-language access.

Core features:

  • Provides JDBC/ODBC SQL interface
  • Supports concurrent connections from multiple clients, with session isolation
  • Supports Kerberos, LDAP and other authentication methods
  • Integrates with YARN/LLAP for resource management
  • Default ports: 10000 (Thrift), 10002 (Web UI)

Configuration Steps

1. Configure Hadoop Proxy User

HiveServer2 submits jobs on behalf of the connecting user, so Hadoop must be configured to trust it as a proxy user. Add the following to core-site.xml on all nodes (replace root with the OS user that runs HiveServer2):

<property>
  <name>hadoop.proxyuser.root.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.root.groups</name>
  <value>*</value>
</property>
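After editing core-site.xml, the file must reach every node. A minimal sketch follows; the hostnames h122/h123 and the $HADOOP_HOME layout are assumptions based on this series' cluster, and the refresh commands reload just the proxy-user settings without a full daemon restart:

```shell
# Copy the updated core-site.xml to the other nodes
# (h122/h123 and $HADOOP_HOME are assumed names/paths).
for host in h122 h123; do
  scp "$HADOOP_HOME/etc/hadoop/core-site.xml" \
      "$host:$HADOOP_HOME/etc/hadoop/core-site.xml"
done

# Reload proxy-user settings on the NameNode and ResourceManager
# without restarting the daemons.
hdfs dfsadmin -refreshSuperUserGroupsConfiguration
yarn rmadmin -refreshSuperUserGroupsConfiguration
```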

2. Enable WebHDFS

Enable WebHDFS support in hdfs-site.xml:

<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>
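After HDFS is restarted, WebHDFS can be verified with a plain HTTP call. The hostname h121 and the Hadoop 3.x NameNode HTTP port 9870 are assumptions (Hadoop 2.x uses 50070):

```shell
# List the HDFS root directory over WebHDFS; a JSON FileStatuses
# response confirms the REST endpoint is enabled.
curl "http://h121:9870/webhdfs/v1/?op=LISTSTATUS&user.name=root"
```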

3. Configure hive-site.xml

<property>
  <name>hive.server2.thrift.port</name>
  <value>10000</value>
</property>
<property>
  <name>hive.server2.authentication</name>
  <value>NONE</value>
</property>
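Two more properties are commonly set alongside the port and authentication mode; this is a sketch, and the bind-host value is an assumption for this cluster:

```xml
<!-- Host interface HiveServer2 binds to (h121 is an assumed hostname) -->
<property>
  <name>hive.server2.thrift.bind.host</name>
  <value>h121</value>
</property>
<!-- Run queries as the connecting user rather than the hive service user -->
<property>
  <name>hive.server2.enable.doAs</name>
  <value>true</value>
</property>
```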

After these configuration changes, restart HDFS and YARN for them to take effect.
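With the stock Hadoop sbin scripts, the restart can look like this (run on the node hosting the NameNode/ResourceManager):

```shell
# Restart HDFS and YARN so the proxy-user and WebHDFS changes take effect.
stop-yarn.sh
stop-dfs.sh
start-dfs.sh
start-yarn.sh
```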

Start HiveServer2

Start HiveServer2 on master node (h121):

nohup hiveserver2 &

Verify service is listening on port 10000:

lsof -i:10000

You can also visit http://h121:10002 to view the Web UI and confirm the service status.

Use Beeline for Remote Connection

Beeline is the official command-line client for HiveServer2 and supports remote connections from any node. Example: connecting from h122 to the HiveServer2 instance on h121:

beeline
!connect jdbc:hive2://h121.wzk.icu:10000

Enter the username (e.g., root) and a password (may be empty). After the connection succeeds, execute HQL:

show databases;
use default;
show tables;
select * from test limit 10;
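The interactive !connect step can also be collapsed into a single command line; the user root mirrors the example above:

```shell
# One-shot connection: -u gives the JDBC URL, -n the username,
# -e runs a statement and exits.
beeline -u jdbc:hive2://h121.wzk.icu:10000 -n root -e "show databases;"
```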

Introduction to HCatalog

HCatalog is a table storage management layer built on Hive Metastore, exposing Hive metadata to compute frameworks like MapReduce, Pig, and Spark. Benefits of using HCatalog:

  • No need to care about the underlying data format (ORC, Parquet, TextFile)
  • Supports selective reads by partition and column instead of scanning the entire table
  • Provides unified access to the same data across multiple compute engines

HCatalog ships with Hive, so no additional deployment is needed. Once enabled, it can be operated through the hcat command-line tool.
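For example, DDL can be run through hcat directly; this is an illustrative sketch:

```shell
# hcat executes DDL against the metastore; -e takes the statement inline.
hcat -e "SHOW TABLES"
```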