This is article 3 in the Big Data series, covering SSH passwordless login configuration for a Hadoop cluster and writing a file distribution script, in preparation for cluster startup.


Why Passwordless Login is Needed

When starting a Hadoop cluster, the master node connects to each slave node via SSH to execute startup commands. Without passwordless login, you would have to enter a password manually for every connection, and the cluster could not start automatically.

/etc/hosts Configuration (Important)

If hostname resolution is not configured correctly, nodes cannot reach each other by name, so every node needs the same mappings in /etc/hosts:

# Each machine's /etc/hosts needs these three lines
<h121's IP> h121.wzk.icu h121
<h122's IP> h122.wzk.icu h122
<h123's IP> h123.wzk.icu h123

Note for public cloud servers: map each hostname to its public IP, not to a 127.x.x.x loopback address.
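After editing /etc/hosts on every machine, a quick sanity check is worth doing: each hostname should resolve to the IP you configured, never to 127.x.x.x. A minimal loop (using this series' hostnames) might look like:

```shell
# Verify that every cluster hostname resolves; getent consults
# /etc/hosts before DNS, so this checks exactly what ssh will see.
for host in h121.wzk.icu h122.wzk.icu h123.wzk.icu; do
  getent hosts "$host" || echo "WARNING: $host does not resolve"
done
```

If any line shows a 127.x.x.x address or a warning, fix /etc/hosts on that machine before continuing.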

SSH Passwordless Configuration Steps

1. Generate RSA Keys on All Three Nodes

ssh-keygen -t rsa -b 4096
# Press Enter at every prompt; do not set a passphrase

After generation, in ~/.ssh/:

  • id_rsa: Private key
  • id_rsa.pub: Public key
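The interactive command above can also be run non-interactively, which is handy when scripting the setup across all three nodes. A sketch (writing to a temp directory here so an existing ~/.ssh/id_rsa is not overwritten):

```shell
# Non-interactive key generation: -N "" sets an empty passphrase,
# -f chooses the output path, -q suppresses the prompts.
keydir=$(mktemp -d)
ssh-keygen -q -t rsa -b 4096 -N "" -f "$keydir/id_rsa"
ls "$keydir"   # id_rsa (private key) and id_rsa.pub (public key)
```

In real use you would target ~/.ssh/id_rsa directly, but only on a machine that does not already have a key there.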

2. Distribute Public Key to All Nodes

On each node, run the following to distribute that node's public key to all three machines (including itself):

ssh-copy-id h121.wzk.icu
ssh-copy-id h122.wzk.icu
ssh-copy-id h123.wzk.icu
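If ssh-copy-id is not available on your distribution, its effect can be reproduced by hand: append the public key to authorized_keys on the target and set the permissions sshd expects. A sketch (the hostname is one of this series' nodes, shown for illustration):

```shell
# Manual equivalent of ssh-copy-id: pipe the local public key to the
# remote host, append it to authorized_keys, and fix permissions.
cat ~/.ssh/id_rsa.pub | ssh h122.wzk.icu \
  'mkdir -p ~/.ssh && chmod 700 ~/.ssh && \
   cat >> ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys'
```

This still prompts for the password once, which is expected: the key must be delivered before passwordless login can work.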

3. Verify Passwordless Login

ssh h122.wzk.icu
# Success if no password required
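Rather than testing hosts one by one, all three can be checked in a single pass. BatchMode=yes makes ssh fail immediately instead of prompting, so the loop reports exactly which nodes still lack passwordless login (hostnames assumed from this series):

```shell
# BatchMode=yes: never prompt for a password; ConnectTimeout bounds
# the wait for unreachable hosts.
for host in h121.wzk.icu h122.wzk.icu h123.wzk.icu; do
  if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" hostname; then
    echo "$host OK"
  else
    echo "$host FAILED"
  fi
done
```

Every node should print OK; a FAILED line means the public key did not reach that host or its permissions are wrong.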

Cluster File Distribution Script

Cluster operations often require synchronizing configuration files to every node. Write an rsync wrapper script and place it in /usr/local/bin so it is available everywhere:

#!/bin/bash
# Script: xsync, usage: xsync <file or directory>
# Put in /usr/local/bin/xsync and chmod +x

pcount=$#
if ((pcount == 0)); then
  echo "No args..."
  exit 1
fi

p1=$1
fname=$(basename "$p1")
pdir=$(cd -P "$(dirname "$p1")"; pwd)

echo "------sync $pdir/$fname------"

for host in h121.wzk.icu h122.wzk.icu h123.wzk.icu; do
  echo "-------- $host --------"
  rsync -rvl "$pdir/$fname" "$host:$pdir"
done

Install:

sudo cp xsync /usr/local/bin/
sudo chmod +x /usr/local/bin/xsync

Usage:

xsync /opt/servers/hadoop-2.9.2/etc/hadoop/

Common Pitfalls

  1. Permission issues: the ~/.ssh directory must be mode 700 and ~/.ssh/authorized_keys must be mode 600, or sshd will silently ignore the key
  2. Firewall: public cloud servers must allow inbound port 22 (usually open by default)
  3. First login prompt: the first ssh connection to each host asks "Are you sure you want to continue connecting?"; answer yes once so the host key is recorded, and passwordless login works from then on
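Pitfall 1 is the most common cause of "still asks for a password" even after ssh-copy-id succeeds. The fix is two chmod commands, run on every node:

```shell
# Restore the permissions sshd requires before it will trust the key:
# the directory must not be group/world writable, the key file private.
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
```

After fixing permissions, retry the ssh test from step 3; no sshd restart is needed.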

Next article: Big Data 04 - Hadoop Cluster Startup