I've just started learning Hadoop, and while configuring HDFS I kept running into puzzling problems. Here's a summary:
1. About hadoop namenode -format
HDFS must be formatted with hadoop namenode -format before it is started for the first time. The step is required, but it is also a frequent source of trouble: the command formats the namenode only, and every run generates a fresh namespaceID, so re-formatting (or formatting on every node, datanodes included) leaves the datanodes holding an old namespaceID that no longer matches the namenode's, and they can no longer connect to it. I found the following workaround in an English-language reference:
Big thanks to Jared Stehler for the following suggestion. I have not tested it myself yet, but feel free to try it out and send me your feedback. This workaround is "minimally invasive" as you only have to edit one file on the problematic datanodes:
1. Stop the datanode.
2. Edit the value of namespaceID in /current/VERSION to match the value of the current namenode.
3. Restart the datanode.
If you followed the instructions in my tutorials, the full path of the relevant file is /usr/local/hadoop-datastore/hadoop-hadoop/dfs/data/current/VERSION (background: dfs.data.dir is by default set to ${hadoop.tmp.dir}/dfs/data, and we set hadoop.tmp.dir to /usr/local/hadoop-datastore/hadoop-hadoop).
If you wonder what the contents of VERSION look like, here's one of mine:
#contents of /current/VERSION
namespaceID=393514426
storageID=DS-1706792599-10.10.10.1-50010-1204306713481
cTime=1215607609074
storageType=DATA_NODE
layoutVersion=-13
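To make the quoted steps concrete, here is a rough shell session for the fix. It is only a sketch: it assumes the tutorial layout quoted above (hadoop.tmp.dir set to /usr/local/hadoop-datastore/hadoop-hadoop) and reuses the example namespaceID, so substitute your own paths and value:

# on the problematic datanode: stop it first
bin/hadoop-daemon.sh stop datanode
# read the namenode's current namespaceID on the master
# (dfs.name.dir defaults to ${hadoop.tmp.dir}/dfs/name):
#   cat /usr/local/hadoop-datastore/hadoop-hadoop/dfs/name/current/VERSION
# overwrite the datanode's namespaceID with that value (393514426 in this example)
sed -i 's/^namespaceID=.*/namespaceID=393514426/' \
  /usr/local/hadoop-datastore/hadoop-hadoop/dfs/data/current/VERSION
# bring the datanode back up
bin/hadoop-daemon.sh start datanode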
Personally, I think setting hadoop.tmp.dir is important, because dfs.data.dir and dfs.name.dir are both derived from it by default, and it points into /tmp unless you change it (on most systems /tmp is cleared on reboot, taking your HDFS data with it). Of course, the two paths can also be set explicitly by adding configuration like the following to hdfs-site.xml:
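Here is a sketch of what that could look like; the /data/hadoop paths are only placeholders, so point them at a directory that survives reboots (dfs.name.dir / dfs.data.dir are the old-style property names used by Hadoop releases of this era; newer releases renamed them):

<!-- hdfs-site.xml: explicit storage paths (example values) -->
<configuration>
  <property>
    <name>dfs.name.dir</name>   <!-- where the namenode keeps its metadata -->
    <value>/data/hadoop/dfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>   <!-- where datanodes store their blocks -->
    <value>/data/hadoop/dfs/data</value>
  </property>
</configuration>

Alternatively, just set hadoop.tmp.dir (in core-site.xml on 0.20+, as the quoted tutorial does) and let both paths fall back to their ${hadoop.tmp.dir}/dfs/... defaults.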