Today I wanted to delete a folder in HDFS, so I ran the following command:
hdfs dfs -rm -r {PATH_TO_DELETE}
but the execution failed with this error:
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.Arrays.copyOf(Arrays.java:2367)
at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:415)
at java.lang.StringBuffer.append(StringBuffer.java:237)
at java.net.URI.appendAuthority(URI.java:1852)
at java.net.URI.appendSchemeSpecificPart(URI.java:1890)
at java.net.URI.toString(URI.java:1922)
at java.net.URI.<init>(URI.java:749)
at org.apache.hadoop.fs.Path.initialize(Path.java:203)
at org.apache.hadoop.fs.Path.<init>(Path.java:116)
at org.apache.hadoop.fs.Path.<init>(Path.java:94)
at org.apache.hadoop.hdfs.protocol.HdfsFileStatus.getFullPath(HdfsFileStatus.java:230)
at org.apache.hadoop.hdfs.protocol.HdfsFileStatus.makeQualified(HdfsFileStatus.java:263)
at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:732)
at org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:105)
at org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:755)
at org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:751)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:751)
at org.apache.hadoop.fs.shell.PathData.getDirectoryContents(PathData.java:268)
at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:347)
at org.apache.hadoop.fs.shell.CommandWithDestination.recursePath(CommandWithDestination.java:291)
at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:308)
at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:278)
at org.apache.hadoop.fs.shell.CommandWithDestination.processPathArgument(CommandWithDestination.java:243)
at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:260)
at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:244)
at org.apache.hadoop.fs.shell.CommandWithDestination.processArguments(CommandWithDestination.java:220)
at org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:190)
at org.apache.hadoop.fs.shell.Command.run(Command.java:154)
at org.apache.hadoop.fs.FsShell.run(FsShell.java:287)
After some investigation, I found that the folder contains over 3,200,000 subfolders. The hdfs dfs -rm -r command recursively lists every entry under the target folder before deleting it, and with that many subfolders the client runs out of memory, hence the GC overhead error. The subfolders are created by a Spark Streaming application, which writes a new Hive staging folder into this directory every 5 seconds. The listing looks like this (a quick way to confirm the folder count is sketched right after the listing):
drwxrwxrwt - lemmon hive 0 2019-08-22 15:59 /user/hive/warehouse/flow.db/flow_today_data_detail_s/.hive-staging_hive_2019-08-22_15-59-45_031_3742489601469128850-1
drwxrwxrwt - lemmon hive 0 2019-08-22 15:59 /user/hive/warehouse/flow.db/flow_today_data_detail_s/.hive-staging_hive_2019-08-22_15-59-50_027_933578136370010038-1
drwxrwxrwt - lemmon hive 0 2019-08-22 15:59 /user/hive/warehouse/flow.db/flow_today_data_detail_s/.hive-staging_hive_2019-08-22_15-59-55_029_6199493420123626573-1
drwxrwxrwt - lemmon hive 0 2019-08-22 16:00 /user/hive/warehouse/flow.db/flow_today_data_detail_s/.hive-staging_hive_2019-08-22_16-00-00_032_8311927636645071418-1
drwxrwxrwt - lemmon hive 0 2019-08-22 16:00 /user/hive/warehouse/flow.db/flow_today_data_detail_s/.hive-staging_hive_2019-08-22_16-00-05_028_7808119144528074564-1
drwxrwxrwt - lemmon hive 0 2019-08-22 16:00 /user/hive/warehouse/flow.db/flow_today_data_detail_s/.hive-staging_hive_2019-08-22_16-00-20_673_4245775132924454475-1
drwxrwxrwt - lemmon hive 0 2019-08-22 16:00 /user/hive/warehouse/flow.db/flow_today_data_detail_s/.hive-staging_hive_2019-08-22_16-00-22_916_5203721247828982284-1
drwxrwxrwt - lemmon hive 0 2019-08-22 16:00 /user/hive/warehouse/flow.db/flow_today_data_detail_s/.hive-staging_hive_2019-08-22_16-00-23_048_5400359532180400737-1
drwxrwxrwt - lemmon hive 0 2019-08-22 16:00 /user/hive/warehouse/flow.db/flow_today_data_detail_s/.hive-staging_hive_2019-08-22_16-00-25_029_6418682296132408220-1
drwxrwxrwt - lemmon hive 0 2019-08-22 16:00 /user/hive/warehouse/flow.db/flow_today_data_detail_s/.hive-staging_hive_2019-08-22_16-00-30_035_7810140937128597094-1
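Before retrying anything, it is worth confirming how many entries are actually under the table directory. A minimal check, using the standard hdfs dfs -count command on the path from the listing above:

# Output columns: DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME.
# -count asks the NameNode for an aggregated summary, so it does not build the huge
# client-side listing that -rm -r does.
hdfs dfs -count /user/hive/warehouse/flow.db/flow_today_data_detail_s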
So I wanted to delete only the Hive staging folders under this directory. To do this, we first have to adjust the GC settings of the Hadoop client.
Hadoop runtime options:
export HADOOP_OPTS="-XX:-UseGCOverheadLimit"
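A minimal sketch of how this is applied: export the variable in the current shell session and retry the delete. The .hive-staging_hive_* glob is an assumption based on the folder names in the listing above; it is quoted so it is expanded by the HDFS shell rather than by bash.

# Disable the GC overhead limit check for commands started from this shell.
export HADOOP_OPTS="-XX:-UseGCOverheadLimit"
# Retry the delete of the staging folders (the glob pattern is an assumption).
hdfs dfs -rm -r '/user/hive/warehouse/flow.db/flow_today_data_detail_s/.hive-staging_hive_*'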
Adding this option got rid of the GC overhead error, but the command then failed with the following error, this time complaining about a lack of Java heap space.
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:1351)
at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:1413)
at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:1524)
at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:1533)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:557)
at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy15.getListing(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1969)
at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1952)
at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:724)
at org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:105)
at org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:755)
at org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:751)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:751)
at org.apache.hadoop.fs.shell.PathData.getDirectoryContents(PathData.java:268)
at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:347)
at org.apache.hadoop.fs.shell.CommandWithDestination.recursePath(CommandWithDestination.java:291)
at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:308)
at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:278)
at org.apache.hadoop.fs.shell.CommandWithDestination.processPathArgument(CommandWithDestination.java:243)
at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:260)
at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:244)
at org.apache.hadoop.fs.shell.CommandWithDestination.processArguments(CommandWithDestination.java:220)
at org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:190)
at org.apache.hadoop.fs.shell.Command.run(Command.java:154)
at org.apache.hadoop.fs.FsShell.run(FsShell.java:287)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
We modified the above export and tried the following instead. Note that we needed to set HADOOP_CLIENT_OPTS rather than HADOOP_OPTS to fix this error, because all of the hadoop shell commands run as clients: HADOOP_OPTS is meant for tuning the runtime of the actual Hadoop daemons, while HADOOP_CLIENT_OPTS tunes the runtime of the Hadoop command-line client.
export HADOOP_CLIENT_OPTS="-XX:-UseGCOverheadLimit -Xmx24096m"
After setting the HADOOP_CLIENT_OPTS variable, the hdfs command executed successfully.
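If a larger client heap should survive beyond the current shell session, the same variable can also be set in the Hadoop client environment file. A sketch, assuming the stock hadoop-env.sh under $HADOOP_CONF_DIR (the exact location and a sensible heap size vary by distribution):

# In hadoop-env.sh: give every hadoop/hdfs CLI invocation a larger default heap.
# The 4096m value is only an example; pick what the client machine can afford.
export HADOOP_CLIENT_OPTS="-Xmx4096m $HADOOP_CLIENT_OPTS"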