— Distcp using MRv2 (YARN) from a CDH3 cluster to a CDH4 cluster may fail with CRC mismatch errors
Running distcp on a CDH4 YARN (MRv2) cluster with a CDH3 hftp source fails when the checksum type in use is the CDH4 default, CRC32C. The default checksum type was changed in CDH4 from the CDH3 default of CRC32, so the checksums computed against the CDH3 source do not match those expected on the CDH4 destination.
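For reference, the failing scenario is a distcp invocation of roughly this form, run from the CDH4 (YARN) cluster. The hostnames, ports, and paths below are placeholders, not values from the report:

    hadoop distcp hftp://cdh3-namenode:50070/source/path hdfs://cdh4-namenode:8020/destination/path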
Bug: HADOOP-8060
Severity: Medium
Anticipated Resolution: To be fixed in an upcoming release
Workaround: Change the checksum type on the CDH4 cluster to the CDH3 default, CRC32. To do this, set dfs.checksum.type to CRC32 in hdfs-site.xml on the CDH4 cluster.
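A minimal sketch of the workaround setting in hdfs-site.xml, using the property name and value given above (standard Hadoop XML configuration format):

    <property>
      <!-- Revert to the CDH3 default checksum type so distcp checksums match -->
      <name>dfs.checksum.type</name>
      <value>CRC32</value>
    </property>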
Error: java.io.IOException: File copy failed: hftp://xxxxxx:50070/tmp/中文路径测试/part-r-00017 --> hdfs://aaaaaa:54310/tmp/distcp_test14/part-r-00017
at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:262)
at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:229)
at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:45)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:725)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:152)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:147)
Caused by: java.io.IOException: Couldn't run retriable-command: Copying hftp://xxxxxx:50070/tmp/中文路径测试/part-r-00017 to hdfs://aaaaaa:54310/tmp/distcp_test14/part-r-00017
at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:101)
at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:258)
... 10 more
Caused by: org.apache.hadoop.tools.mapred.RetriableFileCopyCommand$CopyReadException: java.io.IOException: HTTP_OK expected, received 500
at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.readBytes(RetriableFileCopyCommand.java:201)
at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.copyBytes(RetriableFileCopyCommand.java:167)
at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.copyToTmpFile(RetriableFileCopyCommand.java:112)
at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doCopy(RetriableFileCopyCommand.java:90)
at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doExecute(RetriableFileCopyCommand.java:71)
at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:87)
... 11 more
Caused by: java.io.IOException: HTTP_OK expected, received 500
at org.apache.hadoop.hdfs.HftpFileSystem$RangeHeaderInputStream.checkResponseCode(HftpFileSystem.java:381)
at org.apache.hadoop.hdfs.ByteRangeInputStream.openInputStream(ByteRangeInputStream.java:121)
at org.apache.hadoop.hdfs.ByteRangeInputStream.getInputStream(ByteRangeInputStream.java:103)
at org.apache.hadoop.hdfs.ByteRangeInputStream.read(ByteRangeInputStream.java:158)
at java.io.DataInputStream.read(DataInputStream.java:132)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
at java.io.FilterInputStream.read(FilterInputStream.java:90)
at org.apache.hadoop.tools.util.ThrottledInputStream.read(ThrottledInputStream.java:70)
at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.readBytes(RetriableFileCopyCommand.java:198)