This is article 16 in the Big Data series. It introduces HiveServer2 deployment configuration and how to use the Beeline client for remote connections.
What is HiveServer2
HiveServer2 (HS2) is a service component provided by Hive that allows remote clients to execute SQL queries over the Thrift protocol and retrieve results, supporting cross-platform and cross-language access.
Core features:
- Provides JDBC/ODBC SQL interface
- Supports multiple client concurrent connections with session isolation
- Supports Kerberos, LDAP and other authentication methods
- Integrates with YARN/LLAP for resource management
- Default ports: 10000 (Thrift), 10002 (Web UI)
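Clients reach the Thrift port through a JDBC URL of the form jdbc:hive2://host:port/database. A minimal sketch that assembles the default URL (the hostname is taken from the examples later in this article; the port and URL scheme are the standard defaults):

```shell
# Build the JDBC URL that JDBC/ODBC clients use to reach HiveServer2.
# h121.wzk.icu is the example host used later in this article.
HS2_HOST=h121.wzk.icu
HS2_PORT=10000                 # default Thrift port
JDBC_URL="jdbc:hive2://${HS2_HOST}:${HS2_PORT}/default"
echo "$JDBC_URL"
```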
Configuration Steps
1. Configure Hadoop Proxy User
HiveServer2 submits jobs to the cluster on behalf of the connecting user (impersonation), so Hadoop must allow the user running HiveServer2 (here root) to act as a proxy for other users. Add to core-site.xml on all nodes:
<property>
  <name>hadoop.proxyuser.root.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.root.groups</name>
  <value>*</value>
</property>
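Note that the username embedded in the property names must match the OS user that runs HiveServer2. A small sketch of the naming pattern hadoop.proxyuser.&lt;user&gt;.hosts / .groups (here for root, matching the config above):

```shell
# The proxy-user property names embed the user that runs HiveServer2;
# if the service runs as a different user, substitute it for "root".
PROXY_USER=root
HOSTS_PROP="hadoop.proxyuser.${PROXY_USER}.hosts"
GROUPS_PROP="hadoop.proxyuser.${PROXY_USER}.groups"
echo "$HOSTS_PROP"
echo "$GROUPS_PROP"
```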
2. Enable WebHDFS
Enable WebHDFS support in hdfs-site.xml:
<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>
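Once dfs.webhdfs.enabled is true, HDFS also answers REST calls on the NameNode's HTTP port, which is a quick way to confirm the setting took effect. A sketch that assembles such a URL (the host is an assumption based on this cluster; 9870 is the default NameNode HTTP port in Hadoop 3.x):

```shell
# Assemble a WebHDFS REST URL; fetch it with curl from any node to
# confirm WebHDFS responds (this one lists the /tmp directory).
NN_HOST=h121.wzk.icu           # assumed NameNode host for this cluster
NN_PORT=9870                   # default NameNode HTTP port in Hadoop 3.x
WEBHDFS_URL="http://${NN_HOST}:${NN_PORT}/webhdfs/v1/tmp?op=LISTSTATUS"
echo "$WEBHDFS_URL"
```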
3. Configure hive-site.xml
Set the Thrift port and the authentication mode (NONE disables authentication, which is acceptable for a test cluster):
<property>
  <name>hive.server2.thrift.port</name>
  <value>10000</value>
</property>
<property>
  <name>hive.server2.authentication</name>
  <value>NONE</value>
</property>
After changing these configurations, restart HDFS and YARN for them to take effect.
Start HiveServer2
Start HiveServer2 on master node (h121):
nohup hiveserver2 &
Verify service is listening on port 10000:
lsof -i:10000
You can also visit http://h121:10002 to view Web UI and confirm service status.
Use Beeline for Remote Connection
Beeline is the official command-line client for HiveServer2 and supports remote connections from any node. Example: connecting from h122 to the HiveServer2 instance on h121:
beeline
!connect jdbc:hive2://h121.wzk.icu:10000
When prompted, enter a username (e.g., root); the password can be left empty because authentication is set to NONE. After connecting successfully, execute HQL:
show databases;
use default;
show tables;
select * from test limit 10;
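Beeline can also run non-interactively, which is handy for scripting. The sketch below only assembles and prints the command, since it needs a reachable HiveServer2 to actually run; -u (JDBC URL), -n (username), and -e (HQL to execute) are standard Beeline flags:

```shell
# Assemble a one-shot Beeline invocation; run it on a node that can
# reach h121 to execute the query without an interactive session.
BEELINE_CMD='beeline -u jdbc:hive2://h121.wzk.icu:10000 -n root -e "show databases;"'
echo "$BEELINE_CMD"
```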
Introduction to HCatalog
HCatalog is a table storage management layer built on Hive Metastore, exposing Hive metadata to compute frameworks like MapReduce, Pig, and Spark. Benefits of using HCatalog:
- No need to care about the underlying data format (ORC, Parquet, TextFile)
- Supports selective reads by partition and column instead of scanning the entire table
- Provides unified access to the same data across multiple compute engines
HCatalog is installed along with Hive and needs no additional deployment. Once available, you can operate on metadata with the hcat command-line tool.
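The hcat tool accepts an HQL statement via its -e flag. The sketch below only assembles and prints an example invocation, since actually running it requires a working Hive Metastore:

```shell
# Example hcat invocation: -e executes the given statement against
# the Hive Metastore (the command is only printed here, not executed).
HCAT_CMD='hcat -e "show tables;"'
echo "$HCAT_CMD"
```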