This is article 16 in the Big Data series. It introduces HiveServer2 deployment and configuration, and shows how to use the Beeline client for remote connections.

Complete illustrated version: CSDN Original | Juejin

What is HiveServer2

HiveServer2 (HS2) is a service component provided by Hive that allows remote clients to execute SQL queries over the Thrift protocol and receive results, supporting cross-platform, cross-language access.

Core features:

  • Provides JDBC/ODBC SQL interface
  • Supports concurrent connections from multiple clients, with session isolation
  • Supports Kerberos, LDAP and other authentication methods
  • Integrates with YARN/LLAP for resource management
  • Default ports: 10000 (Thrift), 10002 (Web UI)

Configuration Steps

1. Configure Hadoop Proxy User

HiveServer2 submits jobs on behalf of the connecting user, so Hadoop must be configured to trust it as a proxy user. Add the following to core-site.xml on all nodes (replace root with the OS user that runs HiveServer2):

<property>
  <name>hadoop.proxyuser.root.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.root.groups</name>
  <value>*</value>
</property>
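After editing core-site.xml, the file must reach every node. A minimal sketch follows; the hostnames h122/h123 and the $HADOOP_HOME layout are assumptions based on this series' cluster, and the refresh commands reload just the proxy-user settings without a full daemon restart:

```shell
# Copy the updated core-site.xml to the other nodes
# (h122/h123 and $HADOOP_HOME are assumed names/paths).
for host in h122 h123; do
  scp "$HADOOP_HOME/etc/hadoop/core-site.xml" \
      "$host:$HADOOP_HOME/etc/hadoop/core-site.xml"
done

# Reload proxy-user settings on the NameNode and ResourceManager
# without restarting the daemons.
hdfs dfsadmin -refreshSuperUserGroupsConfiguration
yarn rmadmin -refreshSuperUserGroupsConfiguration
```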

2. Enable WebHDFS

Enable WebHDFS support in hdfs-site.xml:

<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>
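After HDFS is restarted, WebHDFS can be verified with a plain HTTP call. The hostname h121 and the Hadoop 3.x NameNode HTTP port 9870 are assumptions (Hadoop 2.x uses 50070):

```shell
# List the HDFS root directory over WebHDFS; a JSON FileStatuses
# response confirms the REST endpoint is enabled.
curl "http://h121:9870/webhdfs/v1/?op=LISTSTATUS&user.name=root"
```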

3. Configure hive-site.xml

<property>
  <name>hive.server2.thrift.port</name>
  <value>10000</value>
</property>
<property>
  <name>hive.server2.authentication</name>
  <value>NONE</value>
</property>
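Two more properties are commonly set alongside the port and authentication mode; this is a sketch, and the bind-host value is an assumption for this cluster:

```xml
<!-- Host interface HiveServer2 binds to (h121 is an assumed hostname) -->
<property>
  <name>hive.server2.thrift.bind.host</name>
  <value>h121</value>
</property>
<!-- Run queries as the connecting user rather than the hive service user -->
<property>
  <name>hive.server2.enable.doAs</name>
  <value>true</value>
</property>
```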

After these configuration changes, restart HDFS and YARN for them to take effect.
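With the stock Hadoop sbin scripts, the restart can look like this (run on the node hosting the NameNode/ResourceManager):

```shell
# Restart HDFS and YARN so the proxy-user and WebHDFS changes take effect.
stop-yarn.sh
stop-dfs.sh
start-dfs.sh
start-yarn.sh
```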

Start HiveServer2

Start HiveServer2 on master node (h121):

nohup hiveserver2 &

Verify service is listening on port 10000:

lsof -i:10000

You can also visit http://h121:10002 to view the Web UI and confirm the service status.

Use Beeline for Remote Connection

Beeline is the official command-line client for HiveServer2 and supports remote connections from any node. Example: connecting from h122 to the HiveServer2 instance on h121:

beeline
!connect jdbc:hive2://h121.wzk.icu:10000

Enter the username (e.g., root) and a password (may be empty). After the connection succeeds, execute HQL:

show databases;
use default;
show tables;
select * from test limit 10;
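The interactive !connect step can also be collapsed into a single command line; the user root mirrors the example above:

```shell
# One-shot connection: -u gives the JDBC URL, -n the username,
# -e runs a statement and exits.
beeline -u jdbc:hive2://h121.wzk.icu:10000 -n root -e "show databases;"
```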

Introduction to HCatalog

HCatalog is a table storage management layer built on Hive Metastore, exposing Hive metadata to compute frameworks like MapReduce, Pig, and Spark. Benefits of using HCatalog:

  • No need to care about the underlying data format (ORC, Parquet, TextFile)
  • Supports selective reads by partition and column instead of scanning the entire table
  • Provides unified access to the same data across multiple compute engines

HCatalog ships with Hive, so no additional deployment is needed. Once enabled, it can be operated through the hcat command-line tool.
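For example, DDL can be run through hcat directly; this is an illustrative sketch:

```shell
# hcat executes DDL against the metastore; -e takes the statement inline.
hcat -e "SHOW TABLES"
```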