Setting up a pseudo-distributed Hadoop cluster on macOS

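1. Introduction to Hadoop

Hadoop is an open-source distributed computing platform developed by the Apache Software Foundation, used mainly for processing and analyzing big data. Its core design idea is to distribute computation across multiple nodes to achieve high scalability and fault tolerance. It consists mainly of the following parts:

HDFS (Hadoop Distributed File System): HDFS is Hadoop's distributed file system. It offers high read/write throughput, good fault tolerance, and scalability, and provides distributed storage for massive datasets. Storing redundant replicas of each block protects the data against loss.

MapReduce: MapReduce is a software framework (programming model) for processing large datasets in parallel. Users can write MapReduce programs to analyze and process data on the distributed file system without knowing the low-level details, and the framework takes care of running that work efficiently.

YARN (Yet Another Resource Negotiator): YARN is another core component, introduced in Hadoop 2.0. It is a job scheduling and cluster resource management system.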

2. Deployment

2.1 Download

The official download directory is:

https://dlcdn.apache.org/hadoop/common/

This guide uses 3.3.6, the latest release at the time of writing.
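As a rough sketch of this step, the helper below builds the tarball URL from a version number. The layout `hadoop-<version>/hadoop-<version>.tar.gz` under the directory above and the function name `hadoop_tarball_url` are assumptions made for illustration; check the directory listing if a release has since been moved.

```shell
#!/bin/sh
# Hypothetical helper: build the download URL for a given Hadoop version.
# Assumes the layout <base>/hadoop-<version>/hadoop-<version>.tar.gz.
hadoop_tarball_url() {
  version=$1
  base=${2:-https://dlcdn.apache.org/hadoop/common}
  printf '%s/hadoop-%s/hadoop-%s.tar.gz\n' "$base" "$version" "$version"
}

# Example (commented out so that sourcing this file downloads nothing):
# curl -fLO "$(hadoop_tarball_url 3.3.6)"
# tar -xzf hadoop-3.3.6.tar.gz && cd hadoop-3.3.6
```

The optional second argument lets you point at a different mirror without changing the function.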

2.2 Extract

Extract the archive, then check the version from inside the extracted directory:

bin/hadoop version

2.3 Configuration

Set JAVA_HOME in etc/hadoop/hadoop-env.sh:

# Change this to your own JDK path

export JAVA_HOME=/usr/local/develop/java/zulu-jdk17.0.7
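
If you are unsure where your JDK lives, macOS ships a `/usr/libexec/java_home` tool that prints the home directory of an installed JDK. A minimal sketch, assuming you want to keep an already-set JAVA_HOME and fall back to that tool otherwise (`resolve_java_home` is a hypothetical name):

```shell
#!/bin/sh
# Hypothetical helper: print a JDK home to use for Hadoop.
# Prefers an already-set JAVA_HOME; otherwise asks macOS's java_home tool.
resolve_java_home() {
  if [ -n "${JAVA_HOME:-}" ]; then
    printf '%s\n' "$JAVA_HOME"
  elif [ -x /usr/libexec/java_home ]; then
    /usr/libexec/java_home    # optionally: /usr/libexec/java_home -v 17
  else
    echo "no JDK found; set JAVA_HOME manually" >&2
    return 1
  fi
}
```

A hard-coded path as shown above is simpler and more predictable; a lookup like this is mainly useful when the same setup is shared between machines.
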
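Edit etc/hadoop/core-site.xml:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

Edit etc/hadoop/hdfs-site.xml:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
</configuration>

Note that a pseudo-distributed setup has only one DataNode. The official single-node guide sets dfs.replication to 1; a value of 2 still works, but HDFS will report blocks as under-replicated.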
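Both files share the same layout: a `<configuration>` root holding `<property>` elements, each with a `<name>` and a `<value>`. The sketch below prints such a document from name/value pairs, which makes the structure explicit. `site_xml` is a hypothetical helper, not part of Hadoop, and it does not escape XML special characters in values.

```shell
#!/bin/sh
# Hypothetical helper: print a Hadoop *-site.xml document to stdout
# from alternating name/value arguments (values are not XML-escaped).
site_xml() {
  echo '<?xml version="1.0" encoding="UTF-8"?>'
  echo '<configuration>'
  while [ "$#" -ge 2 ]; do
    printf '    <property>\n'
    printf '        <name>%s</name>\n' "$1"
    printf '        <value>%s</value>\n' "$2"
    printf '    </property>\n'
    shift 2
  done
  echo '</configuration>'
}

# Example (commented out because it would overwrite the existing file):
# site_xml fs.defaultFS hdfs://localhost:9000 > etc/hadoop/core-site.xml
```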

Check SSH

Setup passphraseless ssh

Now check that you can ssh to the localhost without a passphrase:
```
ssh localhost
```
If you cannot ssh to localhost without a passphrase, execute the following commands:
```
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
```
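Note: if you already have an SSH key, do not regenerate it. Overwriting it would break whatever you have already set up with the existing key. In that case, skip ssh-keygen and only append the existing public key to authorized_keys.

If ssh localhost still fails on macOS even after enabling SSH access in the system settings, see https://blog.csdn.net/a15835774652/article/details/135572420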
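The warning about existing keys can be turned into a guard. A minimal sketch, assuming the default key path `~/.ssh/id_rsa`; the function names are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: create an SSH key only when none exists yet,
# so an existing key is never overwritten.
ensure_ssh_key() {
  key=${1:-$HOME/.ssh/id_rsa}
  if [ -f "$key" ]; then
    echo "keeping existing key: $key"
    return 0
  fi
  ssh-keygen -t rsa -P '' -f "$key"
}

# Succeeds only if key-based login works; BatchMode=yes makes ssh fail
# instead of prompting for a password.
can_ssh_localhost() {
  ssh -o BatchMode=yes -o ConnectTimeout=5 localhost true
}
```

Running `can_ssh_localhost && echo ok` before starting the daemons gives a quick yes/no answer without an interactive prompt.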

2.4 Starting the services

Start HDFS first. On the first run, format the filesystem:

bin/hdfs namenode -format
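Formatting discards any existing NameNode metadata, so it should only be done once. A minimal sketch of a guard, assuming the default storage location `/tmp/hadoop-<user>/dfs/name` (Hadoop's defaults for hadoop.tmp.dir and dfs.namenode.name.dir); if you have overridden those properties, pass your own directory. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given NameNode directory already
# holds a formatted filesystem (marked by a current/VERSION file).
namenode_is_formatted() {
  name_dir=${1:-/tmp/hadoop-$(id -un)/dfs/name}
  [ -f "$name_dir/current/VERSION" ]
}

# Example (commented out; run from the Hadoop directory):
# namenode_is_formatted || bin/hdfs namenode -format
```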

Start NameNode daemon and DataNode daemon:

sbin/start-dfs.sh

Browse the web interface for the NameNode. The default addresses are:

NameNode: http://localhost:9870/

SecondaryNameNode: http://localhost:9868/

DataNode: http://localhost:9864/datanode.html

Starting YARN

You can run a MapReduce job on YARN in pseudo-distributed mode by setting a few parameters and additionally running the ResourceManager and NodeManager daemons.

The following instructions assume that the HDFS steps above have already been completed.
Configure the parameters as follows:

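etc/hadoop/mapred-site.xml:

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.application.classpath</name>
        <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>localhost:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>localhost:19888</value>
    </property>
</configuration>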

etc/hadoop/yarn-env.sh (note: on JDKs newer than 8, such as the JDK 17 used here, these options are required; without them YARN fails to start):

export JAVA_HOME=/usr/local/develop/java/zulu-jdk17.0.7
export YARN_RESOURCEMANAGER_OPTS="--add-opens java.base/java.lang=ALL-UNNAMED"
export YARN_NODEMANAGER_OPTS="--add-opens java.base/java.lang=ALL-UNNAMED"
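The `--add-opens` option only exists on JDK 9 and later; JDK 8 does not recognise it and refuses to start. If the same env file may be used with different JDKs, the flag can be made conditional. A minimal sketch, with hypothetical function names:

```shell
#!/bin/sh
# Hypothetical helper: reduce a Java version string to its major number.
# "1.8.0_292" -> 8, "17.0.7" -> 17
java_major() {
  v=$1
  case $v in
    1.*) v=${v#1.} ;;   # legacy "1.x" scheme used up to Java 8
  esac
  printf '%s\n' "${v%%[!0-9]*}"
}

# Print the --add-opens flag only for JDK 9 or newer.
add_opens_flag() {
  if [ "$(java_major "$1")" -ge 9 ]; then
    echo "--add-opens java.base/java.lang=ALL-UNNAMED"
  fi
}

# Example use in yarn-env.sh (commented out; needs a working JAVA_HOME):
# ver=$("$JAVA_HOME/bin/java" -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')
# export YARN_RESOURCEMANAGER_OPTS="$(add_opens_flag "$ver")"
# export YARN_NODEMANAGER_OPTS="$(add_opens_flag "$ver")"
```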

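etc/hadoop/yarn-site.xml:

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_HOME,PATH,LANG,TZ,HADOOP_MAPRED_HOME</value>
    </property>
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.log.server.url</name>
        <value>http://localhost:19888/jobhistory/logs</value>
    </property>
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>604800</value>
    </property>
</configuration>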

Start ResourceManager daemon and NodeManager daemon:

sbin/start-yarn.sh

Browse the web interface for the ResourceManager:

ResourceManager – http://localhost:8088/

Starting and stopping the history server

On JDKs newer than 8, edit etc/hadoop/mapred-env.sh first:

export JAVA_HOME=/usr/local/develop/java/zulu-jdk17.0.7
export MAPRED_HISTORYSERVER_OPTS="--add-opens java.base/java.lang=ALL-UNNAMED"

# Start
bin/mapred --daemon start historyserver
# Stop
bin/mapred --daemon stop historyserver

Use jps to list the running daemons.
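With everything started, jps should list six Hadoop processes. The sketch below reads jps-style output and prints whichever of them are missing; `missing_daemons` is a hypothetical helper, and the list assumes HDFS, YARN, and the history server are all supposed to be running.

```shell
#!/bin/sh
# Hypothetical helper: print the expected Hadoop daemons that are absent
# from jps output. Reads lines of the form "<pid> <MainClass>" on stdin.
missing_daemons() {
  jps_names=$(awk '{print $2}')
  for d in NameNode DataNode SecondaryNameNode \
           ResourceManager NodeManager JobHistoryServer; do
    # -x matches whole lines, so NameNode does not match SecondaryNameNode
    printf '%s\n' "$jps_names" | grep -qx "$d" || echo "$d"
  done
}

# Example (commented out; needs a JDK on PATH):
# jps | missing_daemons
```

An empty result means all six daemons are up.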

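HistoryServer console: http://localhost:19888
Alternatively, once the filesystem has been formatted, you can run sbin/start-all.sh, which starts both HDFS and YARN (use with caution in production).
Stopping works the same way: sbin/stop-dfs.sh stops HDFS and sbin/stop-yarn.sh stops YARN, or run sbin/stop-all.sh to stop both (use with caution in production).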

That completes the pseudo-distributed Hadoop setup.

Next, let's try out HDFS and MapReduce.

# Create a directory in HDFS
./bin/hdfs dfs -mkdir -p /user/leon
# Upload a test file into that directory
./bin/hdfs dfs -put ./etc/hadoop/hadoop-env.sh /user/leon

# Try out job scheduling
# (relative HDFS paths such as "input" resolve under /user/<your username>, so that directory must exist)
./bin/hdfs dfs -mkdir input
./bin/hdfs dfs -put etc/hadoop/*.xml input
# Run the example job
./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.6.jar grep input output 'dfs[a-z.]+'
# Fetch the results
./bin/hdfs dfs -get output output
cat output/*
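
To sanity-check the result, it helps to know what the grep example computes: it finds every match of the regular expression in the input files, counts how often each distinct match occurs, and lists the counts in descending order. A rough local equivalent using standard tools, as a sketch only (`local_grep_count` is a hypothetical name, and the ordering of matches with equal counts may differ from the job's output):

```shell
#!/bin/sh
# Hypothetical helper: count regex matches in local files, roughly
# mirroring the output of the MapReduce grep example (count<TAB>match).
local_grep_count() {
  pattern=$1
  shift
  grep -ohE "$pattern" "$@" | sort | uniq -c | sort -rn |
    awk '{print $1 "\t" $2}'
}

# Example (commented out; run from the Hadoop directory):
# local_grep_count 'dfs[a-z.]+' etc/hadoop/*.xml
```

Note that the MapReduce job refuses to run if the output directory already exists in HDFS, so remove it (./bin/hdfs dfs -rm -r output) before running the example again.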

Common problems

ResourceManager does not appear in jps after starting YARN

The log file shows that YARN failed to start: Caused by: java.lang.reflect.InaccessibleObjectException: Unable to make protected final java.lang.Class java.lang.ClassLoader.defineClass(java.lang.String,byte[],int,int,java.security.ProtectionDomain) throws java.lang.ClassFormatError accessible: module java.base does not "opens java.lang" to unnamed module @4d7c417d

Adding the following to yarn-env.sh fixes it:

export JAVA_HOME=/usr/local/develop/java/zulu-jdk17.0.7

export YARN_RESOURCEMANAGER_OPTS="--add-opens java.base/java.lang=ALL-UNNAMED"

export YARN_NODEMANAGER_OPTS="--add-opens java.base/java.lang=ALL-UNNAMED"

An annoying warning message

WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

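The native libraries shipped with Hadoop are compiled for specific platforms and do not necessarily work on every machine.

Solution:

First, add the following to hadoop-env.sh: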

export HADOOP_HOME=/usr/local/develop/hadoop-3.3.6
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$HADOOP_HOME/lib/native"
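
These settings only help if lib/native actually contains a library built for your platform. On macOS the JVM looks for libhadoop.dylib, while on Linux it looks for libhadoop.so. A minimal sketch of a check, with a hypothetical function name:

```shell
#!/bin/sh
# Hypothetical helper: succeed if a libhadoop shared library with the
# given extension exists in the directory. Without an explicit extension
# it picks dylib on macOS and so elsewhere.
native_lib_present() {
  dir=$1
  ext=${2:-}
  if [ -z "$ext" ]; then
    case $(uname -s) in
      Darwin) ext=dylib ;;
      *)      ext=so ;;
    esac
  fi
  [ -f "$dir/libhadoop.$ext" ]
}

# Example (commented out; assumes HADOOP_HOME is set as above):
# native_lib_present "$HADOOP_HOME/lib/native" || echo "no native library for this platform"
```

Hadoop also provides a checknative subcommand (bin/hadoop checknative -a) that reports which native libraries it was able to load.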

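Then restart the cluster and check again. If the warning is gone, the bundled native libraries work on your machine. If it still appears, you will need to build the native libraries from source. Prebuilt libraries are also available online, and if one matches your version you can use it directly. GitHub: https://github.com/silent-night-no-trace/mac-native-hadoop-library

Good day!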
