Hands-on: Building a High-Availability Hadoop Big-Data Cluster (Hadoop 3.1.3 + ZooKeeper 3.5.7 + HBase 2.4.11 + Kafka 3.0.0)
Preface
Pure hands-on, no theory. These are the notes I took while building a test environment for my company; I have set up four or five big-data clusters following these exact steps and have not hit a problem so far.
If you spot a mistake, please point it out — many thanks!
PS: works on CentOS 7.9 and Rocky 9.1.
Cluster hardware
| IP | Hostname | OS | CPU | RAM (GB) | System disk | Data disk | Notes |
|---|---|---|---|---|---|---|---|
| 192.168.22.221 | hadoop1 | CentOS 7.9 | 4 | 16 | 250G | | |
| 192.168.22.222 | hadoop2 | CentOS 7.9 | 4 | 16 | 250G | | |
| 192.168.22.223 | hadoop3 | CentOS 7.9 | 4 | 16 | 250G | | |
Cluster layout
| hadoop1 | hadoop2 | hadoop3 | Notes |
|---|---|---|---|
| NameNode | | NameNode | hadoop |
| JournalNode | JournalNode | JournalNode | |
| DataNode | DataNode | DataNode | |
| | ResourceManager | ResourceManager | |
| NodeManager | NodeManager | NodeManager | |
| JobHistoryServer | | | |
| DFSZKFailoverController | DFSZKFailoverController | DFSZKFailoverController | |
| QuorumPeerMain | QuorumPeerMain | QuorumPeerMain | zookeeper |
| Kafka | Kafka | Kafka | kafka |
| HMaster | | HMaster | HBase |
| HRegionServer | HRegionServer | HRegionServer | |
| Flink | | | |
Basic tools

```shell
yum install -y epel-release
yum install -y net-tools
yum install -y vim
yum install -y rsync
# Stop the firewall and disable it at boot
systemctl stop firewalld
systemctl disable firewalld.service
```
Set the CentOS hostnames

```shell
vim /etc/hostname   # enter this node's hostname, e.g. hadoop1
vim /etc/hosts
192.168.22.221 hadoop1
192.168.22.222 hadoop2
192.168.22.223 hadoop3
```
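If you prefer not to type the host entries by hand, they can be generated from the IP list; this is just a convenience sketch (the IPs and the hadoopN naming mirror the table above, and you would append the output to /etc/hosts on every node):

```shell
# Print the three /etc/hosts lines for this cluster.
# Append the output to /etc/hosts on each node, e.g. `... >> /etc/hosts`.
ips="192.168.22.221 192.168.22.222 192.168.22.223"
n=1
for ip in $ips; do
  printf '%s hadoop%d\n' "$ip" "$n"   # e.g. "192.168.22.221 hadoop1"
  n=$((n + 1))
done
```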
PS: you may want to set up the helper scripts first to make distribution easier — see the script collection at the bottom.
Create a user

```shell
# Create a service user, separate from root
useradd hadoop
passwd hadoop
# Edit /etc/sudoers and add a line below the %wheel line, as shown:
## Allow root to run any commands anywhere
root    ALL=(ALL)     ALL
## Allows people in group wheel to run all commands
%wheel  ALL=(ALL)     ALL
hadoop  ALL=(ALL)     NOPASSWD:ALL
# Create directories to install the software into
mkdir /opt/module
mkdir /opt/software
# Hand them over to the hadoop user
chown hadoop:hadoop /opt/module
chown hadoop:hadoop /opt/software
```
Passwordless SSH login

```shell
# Run as the hadoop user; keys live in /home/hadoop/.ssh
ssh-keygen -t rsa
# Copy the public key to every machine you want passwordless login to
# (repeat on each node)
ssh-copy-id hadoop1
ssh-copy-id hadoop2
ssh-copy-id hadoop3
```
Install the JDK
PS: you can skip ahead to the Environment variables section at the bottom and set them all up in one go.

```shell
# Extract
tar -zxvf /opt/software/jdk-8u212-linux-x64.tar.gz -C /opt/module/
# Configure the environment variable
sudo vim /etc/profile.d/my_env.sh
#JAVA_HOME
export JAVA_HOME=/opt/module/jdk1.8.0_212
export PATH=$PATH:$JAVA_HOME/bin
# Apply it
source /etc/profile
# Verify the installation
java -version
```
ZooKeeper setup

```shell
# Extract to the target directory
tar -zxvf /opt/software/apache-zookeeper-3.5.7-bin.tar.gz -C /opt/module/
# Rename
mv /opt/module/apache-zookeeper-3.5.7-bin /opt/module/zookeeper-3.5.7
# In /opt/module/zookeeper-3.5.7/conf, rename zoo_sample.cfg to zoo.cfg
mv zoo_sample.cfg zoo.cfg
# Open zoo.cfg and change the dataDir path:
vim zoo.cfg
dataDir=/opt/module/zookeeper-3.5.7/zkData
# Add the following cluster configuration
#######################cluster##########################
server.1=hadoop1:2888:3888
server.2=hadoop2:2888:3888
server.3=hadoop3:2888:3888
# Create zkData
mkdir zkData
# Create a file named myid under /opt/module/zookeeper-3.5.7/zkData and put
# the number matching this server's entry in it
# (no blank lines above or below, no spaces around the number)
vi /opt/module/zookeeper-3.5.7/zkData/myid
1
# Distribute the configured zookeeper
xsync zookeeper-3.5.7
# Adjust myid on each node: 2 on hadoop2, 3 on hadoop3
# Start ZooKeeper with its script
bin/zkServer.sh start
# Check the status
bin/zkServer.sh status
# Start the client
bin/zkCli.sh
# If the firewall was not disabled, open the ports
# (add to open a port, remove to close it)
xcall sudo firewall-cmd --zone=public --permanent --add-port=2181/tcp
xcall sudo firewall-cmd --zone=public --permanent --add-port=2888/tcp
xcall sudo firewall-cmd --zone=public --permanent --add-port=3888/tcp
# Reload to apply the added ports:
xcall sudo firewall-cmd --reload
# List open ports again to confirm 2181, 2888 and 3888 are open
xcall sudo firewall-cmd --zone=public --list-ports
```
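Editing myid by hand on every node is easy to get wrong. Since this cluster's hostnames end in the server number, the id can be derived from the hostname instead; this helper is not part of the original steps and only works because of the hadoopN naming convention:

```shell
# Derive the ZooKeeper myid from a hadoopN hostname.
# In real use, replace the hard-coded value with: host=$(hostname)
host=hadoop2
myid=${host#hadoop}   # strip the "hadoop" prefix, leaving the number
echo "$myid"          # → 2
# then: echo "$myid" > /opt/module/zookeeper-3.5.7/zkData/myid
```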
Hadoop setup
Install and configure the environment

```shell
# Extract
tar -zxvf /opt/software/hadoop-3.1.3.tar.gz -C /opt/module/
# Rename
mv hadoop-3.1.3 hadoop
# Configure the environment variables
sudo vim /etc/profile.d/my_env.sh
#HADOOP_HOME
export HADOOP_HOME=/opt/module/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
# Apply
source /etc/profile
# Check the version
hadoop version
# If the firewall was not disabled, open the ports
xcall sudo firewall-cmd --zone=public --permanent --add-port=8020/tcp
xcall sudo firewall-cmd --zone=public --permanent --add-port=9870/tcp
xcall sudo firewall-cmd --zone=public --permanent --add-port=8485/tcp
xcall sudo firewall-cmd --zone=public --permanent --add-port=8088/tcp
xcall sudo firewall-cmd --zone=public --permanent --add-port=8032/tcp
xcall sudo firewall-cmd --zone=public --permanent --add-port=8030/tcp
xcall sudo firewall-cmd --zone=public --permanent --add-port=8031/tcp
xcall sudo firewall-cmd --zone=public --permanent --add-port=19888/tcp
xcall sudo firewall-cmd --zone=public --permanent --add-port=10020/tcp
# Reload to apply the added ports:
xcall sudo firewall-cmd --reload
# List open ports again to confirm the Hadoop ports above are open
xcall sudo firewall-cmd --zone=public --list-ports
```
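The run of near-identical firewall-cmd calls can be collapsed into a loop. The sketch below prints each command as a dry run (drop the leading echo to actually apply them); the port list is the same one used above:

```shell
# Dry-run: print one firewall-cmd per Hadoop port.
# Remove "echo" before xcall to actually open the ports.
for p in 8020 9870 8485 8088 8032 8030 8031 19888 10020; do
  echo xcall sudo firewall-cmd --zone=public --permanent --add-port=${p}/tcp
done
echo xcall sudo firewall-cmd --reload
```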
Configuration files under $HADOOP_HOME/etc/hadoop
1. core-site.xml

```xml
<configuration>
  <!-- Logical name of the HDFS nameservice -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://mycluster</value>
  </property>
  <!-- Base directory for Hadoop data -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/data</value>
  </property>
  <!-- Static user for the HDFS web UI -->
  <property>
    <name>hadoop.http.staticuser.user</name>
    <value>hadoop</value>
  </property>
  <!-- ZooKeeper quorum used for HA -->
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>hadoop1:2181,hadoop2:2181,hadoop3:2181</value>
  </property>
</configuration>
```
2. hdfs-site.xml

```xml
<configuration>
  <!-- NameNode metadata directory -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file://${hadoop.tmp.dir}/name</value>
  </property>
  <!-- DataNode data directory -->
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file://${hadoop.tmp.dir}/data</value>
  </property>
  <!-- JournalNode edits directory -->
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>${hadoop.tmp.dir}/jn</value>
  </property>
  <!-- Logical nameservice -->
  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>
  <!-- NameNodes in the nameservice -->
  <property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>
  </property>
  <!-- RPC addresses -->
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>hadoop1:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>hadoop3:8020</value>
  </property>
  <!-- HTTP (web UI) addresses -->
  <property>
    <name>dfs.namenode.http-address.mycluster.nn1</name>
    <value>hadoop1:9870</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.mycluster.nn2</name>
    <value>hadoop3:9870</value>
  </property>
  <!-- Shared edits directory (JournalNode quorum) -->
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://hadoop1:8485;hadoop2:8485;hadoop3:8485/mycluster</value>
  </property>
  <!-- Client failover proxy provider -->
  <property>
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <!-- Fencing methods tried in order during failover -->
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>
      sshfence
      shell(true)
    </value>
  </property>
  <!-- SSH private key used by sshfence -->
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/hadoop/.ssh/id_rsa</value>
  </property>
  <!-- Automatic failover via ZKFC -->
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
</configuration>
```
3. yarn-site.xml

```xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <!-- Enable ResourceManager HA -->
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>cluster-yarn1</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm2,rm3</value>
  </property>
  <!-- rm2: hadoop2 -->
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>hadoop2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm2</name>
    <value>hadoop2:8088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address.rm2</name>
    <value>hadoop2:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address.rm2</name>
    <value>hadoop2:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address.rm2</name>
    <value>hadoop2:8031</value>
  </property>
  <!-- rm3: hadoop3 -->
  <property>
    <name>yarn.resourcemanager.hostname.rm3</name>
    <value>hadoop3</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm3</name>
    <value>hadoop3:8088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address.rm3</name>
    <value>hadoop3:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address.rm3</name>
    <value>hadoop3:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address.rm3</name>
    <value>hadoop3:8031</value>
  </property>
  <!-- ZooKeeper addresses for RM state -->
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>hadoop1:2181,hadoop2:2181,hadoop3:2181</value>
  </property>
  <property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
  </property>
  <property>
    <name>yarn.nodemanager.env-whitelist</name>
    <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
  </property>
  <!-- Log aggregation -->
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.log.server.url</name>
    <value>http://hadoop1:19888/jobhistory/logs</value>
  </property>
  <!-- Keep aggregated logs for 7 days (604800 s) -->
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>604800</value>
  </property>
  <!-- The class to use as the resource scheduler -->
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
  </property>
  <!-- Number of threads to handle the scheduler interface -->
  <property>
    <name>yarn.resourcemanager.scheduler.client.thread-count</name>
    <value>8</value>
  </property>
  <!-- Enable auto-detection of node capabilities such as memory and CPU -->
  <property>
    <name>yarn.nodemanager.resource.detect-hardware-capabilities</name>
    <value>false</value>
  </property>
  <!-- Whether logical processors (hyperthreads) count as cores; only applies
       on Linux when cpu-vcores is -1 and detect-hardware-capabilities is true -->
  <property>
    <name>yarn.nodemanager.resource.count-logical-processors-as-cores</name>
    <value>false</value>
  </property>
  <!-- Multiplier converting physical cores to vcores when auto-detecting:
       vcores = number of CPUs * multiplier -->
  <property>
    <name>yarn.nodemanager.resource.pcores-vcores-multiplier</name>
    <value>1.0</value>
  </property>
  <!-- Physical memory, in MB, that can be allocated for containers
       (auto-calculated if -1 and detect-hardware-capabilities is true;
       otherwise defaults to 8192 MB) -->
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>4096</value>
  </property>
  <!-- Number of vcores that can be allocated for containers; used by the RM
       scheduler, not to limit the CPUs YARN containers actually use
       (auto-detected if -1 and detect-hardware-capabilities is true;
       otherwise defaults to 8) -->
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>4</value>
  </property>
  <!-- Minimum container allocation in MB; smaller requests are raised to this,
       and a NodeManager with less memory than this is shut down by the RM -->
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>1024</value>
  </property>
  <!-- Maximum container allocation in MB; larger requests throw
       InvalidResourceRequestException -->
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>2048</value>
  </property>
  <!-- Minimum container allocation in vcores; smaller requests are raised to
       this, and a NodeManager with fewer vcores is shut down by the RM -->
  <property>
    <name>yarn.scheduler.minimum-allocation-vcores</name>
    <value>1</value>
  </property>
  <!-- Maximum container allocation in vcores; larger requests throw
       InvalidResourceRequestException -->
  <property>
    <name>yarn.scheduler.maximum-allocation-vcores</name>
    <value>2</value>
  </property>
  <!-- Whether virtual memory limits are enforced for containers -->
  <property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
  </property>
  <!-- Ratio of virtual to physical memory when setting container memory
       limits; allocations are in physical memory, and virtual usage may
       exceed the allocation by this ratio -->
  <property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>2.1</value>
  </property>
</configuration>
```
4. mapred-site.xml

```xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>hadoop1:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hadoop1:19888</value>
  </property>
</configuration>
```
5. workers

```
hadoop1
hadoop2
hadoop3
```
Initialize and start

```shell
# Distribute the configuration!!!
xsync /opt/module/hadoop/etc/hadoop
# Start the JournalNodes, needed to initialize the NameNode
xcall hdfs --daemon start journalnode
# Format HDFS (on hadoop1 only)
hdfs namenode -format
# Start the NameNode on this node first
hdfs --daemon start namenode
# On the other NameNode (hadoop3), sync from the one just started
hdfs namenode -bootstrapStandby
# Initialize ZKFC (ZooKeeper must already be running)
hdfs zkfc -formatZK
# Start the whole cluster with the script
hadoop.sh start
```
Visit the web UI on port 9870 to confirm the NameNodes are up.
Kafka setup
Install

```shell
# Extract the package
tar -zxvf /opt/software/kafka_2.12-3.0.0.tgz -C /opt/module
# Rename
mv kafka_2.12-3.0.0/ kafka
# Go into /opt/module/kafka and edit the config file
cd config/
vim server.properties
# Globally unique broker id; must be a number and must not repeat
broker.id=0
# Number of threads handling network requests
num.network.threads=3
# Number of threads handling disk IO
num.io.threads=8
# Socket send buffer size
socket.send.buffer.bytes=102400
# Socket receive buffer size
socket.receive.buffer.bytes=102400
# Maximum request size the socket accepts
socket.request.max.bytes=104857600
# Where Kafka stores its log (data) files; created automatically if missing;
# multiple disk paths may be given, separated by ","
log.dirs=/opt/module/kafka/datas
# Number of partitions per topic on this broker
num.partitions=1
# Threads used to recover and clean data under log.dirs
num.recovery.threads.per.data.dir=1
# Replication factor for the offsets topic (default 1)
offsets.topic.replication.factor=1
# How long a segment file is retained before deletion
log.retention.hours=168
# Maximum size of each segment file (default 1G)
log.segment.bytes=1073741824
# How often to check for expired data (default every 5 minutes)
log.retention.check.interval.ms=300000
# ZooKeeper connection string (chrooted under /kafka to keep ZK tidy)
zookeeper.connect=hadoop1:2181,hadoop2:2181,hadoop3:2181/kafka

# Distribute the installation
xsync kafka/
# On hadoop2 and hadoop3, change broker.id in
# /opt/module/kafka/config/server.properties to 1 and 2 respectively.
# Note: broker.id must be unique across the whole cluster.
# Configure the environment variables
sudo vim /etc/profile.d/my_env.sh
#KAFKA_HOME
export KAFKA_HOME=/opt/module/kafka
export PATH=$PATH:$KAFKA_HOME/bin
# Refresh the environment
source /etc/profile
# Distribute the env file to the other nodes and source it there too
sudo /home/hadoop/bin/xsync /etc/profile.d/my_env.sh
# If the firewall was not disabled, open the port
xcall sudo firewall-cmd --zone=public --permanent --add-port=9092/tcp
# Reload to apply the added port:
xcall sudo firewall-cmd --reload
# List open ports again to confirm 9092 is open
xcall sudo firewall-cmd --zone=public --list-ports
```
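Like the ZooKeeper myid, broker.id can be derived from the hadoopN hostname instead of edited by hand on each node (hadoop1 → 0, hadoop2 → 1, hadoop3 → 2). A hypothetical sketch, again assuming the hadoopN naming convention:

```shell
# Derive broker.id from the hostname; in real use: host=$(hostname)
host=hadoop3
brokerid=$(( ${host#hadoop} - 1 ))   # hadoop3 -> 2
echo "$brokerid"                     # → 2
# Then rewrite the broker.id line in place, e.g.:
# sed -i "s/^broker.id=.*/broker.id=${brokerid}/" /opt/module/kafka/config/server.properties
```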
Start

```shell
# Start the ZooKeeper cluster first, then Kafka
zk.sh start
# Start the Kafka cluster
kf.sh start
```
HBase setup
Install

```shell
# Extract the tarball
tar -zxvf /opt/software/hbase-2.4.11-bin.tar.gz -C /opt/module
# Rename
mv hbase-2.4.11 hbase
# If the firewall was not disabled, open the ports
xcall sudo firewall-cmd --zone=public --permanent --add-port=16000/tcp
xcall sudo firewall-cmd --zone=public --permanent --add-port=16010/tcp
xcall sudo firewall-cmd --zone=public --permanent --add-port=16020/tcp
xcall sudo firewall-cmd --zone=public --permanent --add-port=16030/tcp
# Reload to apply the added ports:
xcall sudo firewall-cmd --reload
# List open ports again to confirm 16000/16010/16020/16030 are open
xcall sudo firewall-cmd --zone=public --list-ports
```
Configure the environment
hbase-env.sh

```shell
export HBASE_MANAGES_ZK=false
```
hbase-site.xml

```xml
<configuration>
  <!-- ZooKeeper quorum used by HBase -->
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>hadoop1:2181,hadoop2:2181,hadoop3:2181</value>
  </property>
  <!-- Only needed if HBase managed its own ZooKeeper:
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/export/zookeeper</value>
  </property>
  remember to keep this in sync with the ZK configuration file -->
  <!-- The directory shared by RegionServers. Note: this points at a single
       NameNode; with HDFS HA you would normally use hdfs://mycluster/hbase
       and make the Hadoop config visible to HBase. -->
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://hadoop1:8020/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
</configuration>
```
regionservers

```
hadoop1
hadoop2
hadoop3
```
backup-masters (create this file to add a standby HMaster for HA)
Put the standby node's hostname in it — just the host, no extra characters:

```
hadoop3
```
To resolve the log4j compatibility issue between HBase and Hadoop, sideline HBase's jar so Hadoop's is used instead:

```shell
mv /opt/module/hbase/lib/client-facing-thirdparty/slf4j-reload4j-1.7.33.jar /opt/module/hbase/lib/client-facing-thirdparty/slf4j-reload4j-1.7.33.jar.bak
```
Start

```shell
# Configure the environment variables
sudo vim /etc/profile.d/my_env.sh
#HBASE_HOME
export HBASE_HOME=/opt/module/hbase
export PATH=$PATH:$HBASE_HOME/bin
# Apply
source /etc/profile.d/my_env.sh
# Distribute my_env.sh
xsync /etc/profile.d/my_env.sh
# Start the whole cluster
start-hbase.sh
```
Visit http://hadoop1:16010 to check the web UI.
Flink
Omitted.
Script collection
1. xsync

```shell
#!/bin/bash
# 1. Check the argument count
if [ $# -lt 1 ]
then
    echo "Not enough arguments!"
    exit
fi
# 2. Loop over every machine in the cluster
for host in hadoop1 hadoop2 hadoop3
do
    echo ==================== $host ====================
    # 3. Send every requested file/directory in turn
    for file in $@
    do
        # 4. Check that the file exists
        if [ -e $file ]
        then
            # 5. Resolve the parent directory
            pdir=$(cd -P $(dirname $file); pwd)
            # 6. Get the file name
            fname=$(basename $file)
            ssh $host "mkdir -p $pdir"
            rsync -av $pdir/$fname $host:$pdir
        else
            echo "$file does not exist!"
        fi
    done
done
```
2. xcall

```shell
#!/bin/bash
# Grab the command from the arguments
cmd=$*
# Make sure it is not empty
if [ ! -n "$cmd" ]
then
    echo "command can not be null !"
    exit
fi
# Current login user
user=`whoami`
# Run the command on every node; adjust the host list to match your cluster,
# same hostnames as above
for host in hadoop1 hadoop2 hadoop3
do
    echo "================ current host is $host ================"
    echo "--> execute command \"$cmd\""
    ssh $user@$host $cmd
done
```
3. hadoop.sh

```shell
#!/bin/bash
if [ $# -lt 1 ]
then
    echo "No Args Input..."
    exit
fi
case $1 in
"start")
    echo " =================== starting the hadoop cluster ==================="
    echo " --------------- starting hdfs ---------------"
    ssh hadoop1 "/opt/module/hadoop/sbin/start-dfs.sh"
    echo " --------------- starting yarn ---------------"
    ssh hadoop2 "/opt/module/hadoop/sbin/start-yarn.sh"
    echo " --------------- starting historyserver ---------------"
    ssh hadoop1 "/opt/module/hadoop/bin/mapred --daemon start historyserver"
;;
"stop")
    echo " =================== stopping the hadoop cluster ==================="
    echo " --------------- stopping historyserver ---------------"
    ssh hadoop1 "/opt/module/hadoop/bin/mapred --daemon stop historyserver"
    echo " --------------- stopping yarn ---------------"
    ssh hadoop2 "/opt/module/hadoop/sbin/stop-yarn.sh"
    echo " --------------- stopping hdfs ---------------"
    ssh hadoop1 "/opt/module/hadoop/sbin/stop-dfs.sh"
;;
*)
    echo "Input Args Error..."
;;
esac
```
4. zk.sh

```shell
#!/bin/bash
case $1 in
"start"){
    for i in hadoop1 hadoop2 hadoop3
    do
        echo ---------- starting zookeeper on $i ------------
        ssh $i "/opt/module/zookeeper-3.5.7/bin/zkServer.sh start"
    done
};;
"stop"){
    for i in hadoop1 hadoop2 hadoop3
    do
        echo ---------- stopping zookeeper on $i ------------
        ssh $i "/opt/module/zookeeper-3.5.7/bin/zkServer.sh stop"
    done
};;
"status"){
    for i in hadoop1 hadoop2 hadoop3
    do
        echo ---------- zookeeper status on $i ------------
        ssh $i "/opt/module/zookeeper-3.5.7/bin/zkServer.sh status"
    done
};;
esac
```
5. kf.sh

```shell
#!/bin/bash
case $1 in
"start"){
    for i in hadoop1 hadoop2 hadoop3
    do
        echo " -------- starting Kafka on $i --------"
        ssh $i "/opt/module/kafka/bin/kafka-server-start.sh -daemon /opt/module/kafka/config/server.properties"
    done
};;
"stop"){
    for i in hadoop1 hadoop2 hadoop3
    do
        echo " -------- stopping Kafka on $i --------"
        ssh $i "/opt/module/kafka/bin/kafka-server-stop.sh"
    done
};;
esac
```
Environment variables
my_env.sh

```shell
#JAVA_HOME
export JAVA_HOME=/opt/module/jdk1.8.0_212
export PATH=$PATH:$JAVA_HOME/bin
#HADOOP_HOME
export HADOOP_HOME=/opt/module/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export PATH=$PATH:/home/hadoop/bin
#KAFKA_HOME
export KAFKA_HOME=/opt/module/kafka
export PATH=$PATH:$KAFKA_HOME/bin
#HBASE_HOME
export HBASE_HOME=/opt/module/hbase
export PATH=$PATH:$HBASE_HOME/bin
```
This article comes from the web and does not represent the position of 协通编程; if you repost it, please credit the source: https://net2asp.com/dd18c10615.html
