
Common Problems When Starting Hadoop

Source: 大数据观察 | Editor: hang | Updated: 2016/9/8 9:30:51


  1. HDFS initialized but not 'healthy' yet, waiting...

  This message appears in the JobTracker log when Hadoop starts. Here it means HDFS is in trouble and the DataNodes cannot come up. The fix that worked was to delete everything under the paths the NameNode manages and then rerun namenode -format. The paths to clear are the tmp path for temporary data, the data path, and the name path; after deleting all of them and reformatting, the problem was gone. Note that reformatting destroys everything stored in HDFS.
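  The recovery above can be sketched as follows. The paths are examples only: substitute whatever hadoop.tmp.dir, dfs.name.dir, and dfs.data.dir point at in your core-site.xml and hdfs-site.xml.

```shell
# WARNING: this destroys everything stored in HDFS.
stop-all.sh                    # stop the cluster first
rm -rf /opt/hadoop/tmp/*       # hadoop.tmp.dir (example path)
rm -rf /opt/hadoop/dfs/name/*  # dfs.name.dir  (example path)
rm -rf /opt/hadoop/dfs/data/*  # dfs.data.dir  (example path, clear on every DataNode)
hadoop namenode -format        # reformat the NameNode
start-all.sh                   # restart the cluster
```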

  2. Name node is in safe mode when running a Hadoop program

  This exception is usually printed straight to the IDE console. The main cause is that DataNodes keep losing blocks, so the NameNode forces itself into safe mode, in which data can only be read, not written. The fix is simple: tell the NameNode to leave safe mode.

  ./hadoop dfsadmin -safemode leave
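  You can check the state first before forcing it off; note that on Hadoop 2.x and later the hdfs front end replaces the bare hadoop/dfsadmin form (a sketch; the output wording varies by version):

```shell
./hadoop dfsadmin -safemode get    # reports whether safe mode is ON or OFF
./hadoop dfsadmin -safemode leave  # force the NameNode out of safe mode
# Hadoop 2.x+ equivalent:
hdfs dfsadmin -safemode leave
```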

  3. java.io.FileNotFoundException: /data/dfs/namesecondary/in_use.lock (Permission denied):

  2016-09-07 10:18:42,902 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: SecondaryNameNode metrics system started

  2016-09-07 10:18:43,053 FATAL org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Failed to start secondary namenode

  java.io.FileNotFoundException: /data/dfs/namesecondary/in_use.lock (Permission denied)

  at java.io.RandomAccessFile.open0(Native Method)

  at java.io.RandomAccessFile.open(RandomAccessFile.java:316)

  at java.io.RandomAccessFile.<init>(RandomAccessFile.java:243)

  at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.tryLock(Storage.java:706)

  at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.lock(Storage.java:678)

  at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:499)

  at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$CheckpointStorage.recoverCreate(SecondaryNameNode.java:962)

  at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.initialize(SecondaryNameNode.java:243)

  at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.<init>(SecondaryNameNode.java:192)

  at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.main(SecondaryNameNode.java:671)

  2016-09-07 10:18:43,056 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1

  2016-09-07 10:18:43,057 INFO org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: SHUTDOWN_MSG:

  /************************************************************

  SHUTDOWN_MSG: Shutting down SecondaryNameNode at joyven/192.168.2.35

  ************************************************************/

  This occurs in two scenarios:

  1) Hadoop had been running normally, then was started once by a different user than usual. That run leaves an in_use.lock file in the configured directory; delete it and restart.

  2) The user who formatted Hadoop is not the same user now starting it. Restarting Hadoop as the user that did the formatting also resolves the error.
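  In both scenarios the underlying problem is ownership of the checkpoint directory. A hedged sketch, assuming the /data/dfs/namesecondary path from the log above and a service user named hadoop (substitute your own path and user):

```shell
# remove the stale lock left behind by the other user
sudo rm -f /data/dfs/namesecondary/in_use.lock
# hand the directory back to the user that runs Hadoop
sudo chown -R hadoop:hadoop /data/dfs/namesecondary
# then restart Hadoop as that user
```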

  4. hadoop /tmp/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1

  After starting the cluster, the NameNode came up but none of the DataNodes on the slave nodes did. The NameNode log shows the error:

  INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 9000, call addBlock(/opt/hadoop/tmp/mapred/system/jobtracker.info, DFSClient_502181644) from 127.0.0.1:2278: error: java.io.IOException: File /opt/hadoop/tmp/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1

  java.io.IOException: File /opt/hadoop/tmp/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1

  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1271)

  at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)

  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)

  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

  at java.lang.reflect.Method.invoke(Method.java:597)

  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)

  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)

  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)

  at java.security.AccessController.doPrivileged(Native Method)

  at javax.security.auth.Subject.doAs(Subject.java:396)

  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)

  The exact root cause is unclear. It can appear when the firewall is left running, and also after the whole system crashes and is restarted. One workaround is to reformat on both the master and the slaves.
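  Before resorting to a full reformat, it is worth ruling out the firewall and confirming each slave can reach the NameNode RPC port (9000 in the log above). A sketch for older RHEL/CentOS-style systems; the host name master is an assumption, substitute your NameNode's host:

```shell
# from each slave: is the NameNode RPC port reachable?
nc -zv master 9000
# temporarily stop the firewall to test (restore proper rules afterwards)
sudo service iptables stop
```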

  5. ERROR mapred.JvmManager: Caught Throwable in JVMRunner. Aborting TaskTracker.

  java.lang.OutOfMemoryError: unable to create new native thread

  While a job was running, computation suddenly stopped. The TaskTracker log on the compute node showed the error above. The cause is that the job exceeded the per-user limit on processes/threads (this particular OutOfMemoryError is about creating native threads, not heap memory). Raise the nproc limits by adding the following to /etc/security/limits.conf:

  hadoop soft nproc 10000

  hadoop hard nproc 64000
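  After editing limits.conf, log in again as the hadoop user and verify the limits actually took effect:

```shell
ulimit -u   # max user processes: should now report the raised nproc value
ulimit -n   # max open files, for comparison
```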

  6. NameNode exception

  2013-08-20 14:10:08,946 INFO org.apache.hadoop.hdfs.server.common.Storage: Cannot access storage directory /var/lib/hadoop/cache/hadoop/dfs/name

  2013-08-20 14:10:08,947 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed.

  org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /var/lib/hadoop/cache/hadoop/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible.

  at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:316)

  at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:104)

  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:427)

  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:388)

  at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:277)

  at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:497)

  at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1298)

  at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1307)

  2013-08-20 14:10:08,948 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /var/lib/hadoop/cache/hadoop/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible.

  at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:316)

  at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:104)

  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:427)

  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:388)

  at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:277)

  at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:497)

  at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1298)

  at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1307)
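  The log itself states the cause: the storage directory is missing or unreadable. A hedged sketch of a fix, assuming the path from the log and a service user named hadoop:

```shell
# recreate the name directory and give it to the Hadoop user
sudo mkdir -p /var/lib/hadoop/cache/hadoop/dfs/name
sudo chown -R hadoop:hadoop /var/lib/hadoop/cache/hadoop/dfs/name
# a freshly created (empty) name directory must be formatted before starting
hadoop namenode -format
```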

  7. NameNode fails to start (or SecondaryNameNode fails to start)

  The NameNode log shows the port is already in use:

  2016-09-07 10:18:08,547 INFO org.apache.hadoop.http.HttpServer2: HttpServer.start() threw a non Bind IOException

  java.net.BindException: Port in use: 0.0.0.0:50070

  at org.apache.hadoop.http.HttpServer2.openListeners(HttpServer2.java:919)

  at org.apache.hadoop.http.HttpServer2.start(HttpServer2.java:856)

  at org.apache.hadoop.hdfs.server.namenode.NameNodeHttpServer.start(NameNodeHttpServer.java:142)

  at org.apache.hadoop.hdfs.server.namenode.NameNode.startHttpServer(NameNode.java:752)

  at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:638)

  at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:811)

  at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:795)

  at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1488)

  at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1554)

  Caused by: java.net.BindException: Address already in use

  at sun.nio.ch.Net.bind0(Native Method)

  at sun.nio.ch.Net.bind(Net.java:433)

  at sun.nio.ch.Net.bind(Net.java:425)

  at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)

  at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)

  at org.mortbay.jetty.nio.SelectChannelConnector.open(SelectChannelConnector.java:216)

  at org.apache.hadoop.http.HttpServer2.openListeners(HttpServer2.java:914)

  ... 8 more

  2016-09-07 10:18:08,550 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping NameNode metrics system...

  2016-09-07 10:18:08,550 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system stopped.

  2016-09-07 10:18:08,550 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system shutdown complete.

  2016-09-07 10:18:08,551 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.

  java.net.BindException: Port in use: 0.0.0.0:50070

  at org.apache.hadoop.http.HttpServer2.openListeners(HttpServer2.java:919)

  at org.apache.hadoop.http.HttpServer2.start(HttpServer2.java:856)

  at org.apache.hadoop.hdfs.server.namenode.NameNodeHttpServer.start(NameNodeHttpServer.java:142)

  at org.apache.hadoop.hdfs.server.namenode.NameNode.startHttpServer(NameNode.java:752)

  at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:638)

  at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:811)

  at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:795)

  at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1488)

  at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1554)

  Caused by: java.net.BindException: Address already in use

  at sun.nio.ch.Net.bind0(Native Method)

  at sun.nio.ch.Net.bind(Net.java:433)

  at sun.nio.ch.Net.bind(Net.java:425)

  at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)

  at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)

  at org.mortbay.jetty.nio.SelectChannelConnector.open(SelectChannelConnector.java:216)

  at org.apache.hadoop.http.HttpServer2.openListeners(HttpServer2.java:914)

  ... 8 more

  2016-09-07 10:18:08,552 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1

  2016-09-07 10:18:08,553 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:

  /************************************************************

  SHUTDOWN_MSG: Shutting down NameNode at joyven/192.168.2.35

  ************************************************************/

  Solution:

  Since the log tells us which port is occupied, either move the service to a different port or kill the process holding it.

  To kill the process holding a port on macOS:

  sudo lsof -i:<port> -P

  sudo kill -9 PID

  Taking port 50070 as an example:

  sudo lsof -i:50070 -P

  Console output:

  COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME

  java 6501 root 189u IPv4 0x782e003217773193 0t0 TCP *:50070 (LISTEN)

  Then kill it:

  sudo kill -9 6501
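  On Linux, where lsof may not be installed, netstat or fuser does the same job (a sketch; flags vary slightly across distributions):

```shell
netstat -tlnp | grep :50070   # show the PID/program listening on the port
sudo fuser -k 50070/tcp       # or kill the holder in one step
```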

  8. Why NameNode, SecondaryNameNode, and DataNode fail to start: a summary

  NameNode and SecondaryNameNode errors are mostly caused by configuration, for example in core-site.xml and hdfs-site.xml, typically the IP and port settings.

  Check the log files under logs/ whose names end in namenode.log and secondarynamenode.log; the error is usually stated plainly there.

  DataNode startup errors are generally caused by a wrong data storage path or by a formatting mistake. The main configuration file is core-site.xml, whose hadoop.tmp.dir property sets the data storage path.

  NameNode format command:

  hadoop namenode -format
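  A quick way to double-check where hadoop.tmp.dir actually points (the config path below assumes a Hadoop 2.x layout; on 1.x the file lives under conf/):

```shell
grep -A1 'hadoop.tmp.dir' $HADOOP_HOME/etc/hadoop/core-site.xml
# or ask Hadoop directly:
hdfs getconf -confKey hadoop.tmp.dir
```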

Original article: http://www.jifang360.com/news/201698/n301086709.html