Tag Archives: hdfs

Java API Access to HDFS Error: Permission denied in production environment

An error occurs when the Java API is used to access the HDFS file system without specifying the access user:

        //1. Create the Hadoop Configuration object
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://linux121:9000");
        //2. Create the Hadoop FileSystem object (no user specified)
        FileSystem fileSystem = FileSystem.get(conf);
        //3. Create the directory
        fileSystem.mkdirs(new Path("/tp_user"));
        //4. Close the file system
        fileSystem.close();

The following error message appears:

org.apache.hadoop.security.AccessControlException: Permission denied: user=QI, access=WRITE, inode="/":root:supergroup:drwxr-xr-x
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:350)
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:251)
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:189)
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1753)
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1737)
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkAncestorAccess(FSDirectory.java:1696)
	at org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.mkdirs(FSDirMkdirOp.java:60)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:2990)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:1096)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB

If no user is specified, the client accesses HDFS as the current operating-system user, and an error is reported if that user does not have sufficient permission. HDFS user permissions are weak in any case; they cannot really stop a malicious user from doing damage!

In practice, there are three solutions: specify the user when obtaining the file system object, turn off permission verification on the HDFS cluster, or open up the HDFS directory permissions.

Specify the user when obtaining the file system object
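A minimal sketch of the first approach, reusing the cluster address and the "root" user that appear in the commented-out line of the snippet above (the class name HdfsAsRoot is only for illustration; adjust address and user to your own environment):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsAsRoot {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Pass the user name as the third argument so the client does not
        // fall back to the local OS user (e.g. "QI" in the error above)
        FileSystem fileSystem = FileSystem.get(new URI("hdfs://linux121:9000"), conf, "root");
        fileSystem.mkdirs(new Path("/tp_user"));
        fileSystem.close();
    }
}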

Turn off HDFS cluster permission verification

vim hdfs-site.xml
# Add the following property (false disables permission checking)
<property>
    <name>dfs.permissions</name>
    <value>false</value>
</property>

Given how weak HDFS permission checking is, we can also give up on it entirely. In a real production environment, consider security frameworks such as Kerberos or Sentry to secure the big data cluster. Here we simply change the permission of the HDFS root directory to 777:

hadoop fs -chmod -R 777 /

Hadoop Connect to HDFS Error: could only be replicated to 0 nodes instead of minReplication (=1).

Connecting to HDFS from Hadoop client code reports the error: could only be replicated to 0 nodes instead of minReplication (=1). There are 3 datanode(s) running and 3 node(s) are excluded in this operation:

        FileSystem fileSystem = null;

        //Initialize the FileSystem before each @Test, connecting as user "root"
        @Before
        public void init() throws IOException, InterruptedException, URISyntaxException {
            fileSystem = FileSystem.get(new URI("hdfs://master:8020"), new Configuration(), "root");
        }

        @Test
        public void write() {
            try {
                //Create (or overwrite) the file and write a short test string
                FSDataOutputStream fdos = fileSystem.create(new Path("/testing/file01.txt"), true);
                fdos.writeBytes("Test text for the txt file");
                fdos.flush();
                fdos.close();
                fileSystem.close();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }

The error is as follows:

org.apache.hadoop.ipc.RemoteException(java.io.IOException): File  could only be replicated to 0 nodes instead of minReplication (=1).  There are 3 datanode(s) running and 3 node(s) are excluded in this operation.
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1733)
	at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:265)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2496)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:828)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:506)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:447)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:845)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:788)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2455)

	at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1481)
	at org.apache.hadoop.ipc.Client.call(Client.java:1427)
	at org.apache.hadoop.ipc.Client.call(Client.java:1337)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
	at com.sun.proxy.$Proxy12.addBlock(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:440)

It eventually turned out that the machine running the client could not reach the Hadoop DataNodes directly; after the network environment was fixed, the problem went away.
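As a side note (an assumption on my part, not what was done above): when the client can only reach the DataNodes by hostname, for example through NAT or when the DataNodes register internal IP addresses with the NameNode, a commonly used client-side setting is dfs.client.use.datanode.hostname. A sketch, reusing the same cluster address, path and user as the test code above:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteByHostname {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Ask the HDFS client to connect to DataNodes by hostname instead of the
        // IP addresses returned by the NameNode, which may be unreachable from outside
        conf.set("dfs.client.use.datanode.hostname", "true");
        FileSystem fileSystem = FileSystem.get(new URI("hdfs://master:8020"), conf, "root");
        FSDataOutputStream fdos = fileSystem.create(new Path("/testing/file01.txt"), true);
        fdos.writeBytes("Test text for the txt file");
        fdos.close();
        fileSystem.close();
    }
}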

HDFS problem set (1): a basic command reports an error: com.google.protobuf.ServiceException: java.lang.OutOfMemoryError: Java heap space


This is only my personal practice; if anything is wrong, corrections and discussion are welcome.


I. Cause

Executing the following two basic HDFS commands reports an error:

hdfs dfs -get /home/mr/data/* ./
hdfs dfs -ls /home/mr/data/*

These are two perfectly ordinary HDFS commands, so why do they fail? Let's open the hdfs command script and find out.

II. Analysis

1) Use the following command to find the path of the hdfs command:

which hdfs

Open the script (vim hdfs) and you will find that hdfs dfs uses the HADOOP_CLIENT_OPTS configuration item when it runs. Searching shows that this item is normally set in /etc/hadoop/conf/hadoop-env.sh.

Open the hadoop-env.sh script and you will find that the item uses the default value, namely 256 MB.

2) On inspection, the /home/mr/data directory contains more than 10,000 small files, although their total size is only about 100 MB. My guess is that the sheer number of files produces a large amount of metadata, and loading it on the client runs out of heap (this guess may not be correct; a better explanation is welcome).

III. Solution

Increasing HADOOP_CLIENT_OPTS solves the problem; either of the following forms works:

export HADOOP_CLIENT_OPTS="-Xmx1024m $HADOOP_CLIENT_OPTS"
hdfs dfs -get /home/mr/data/* ./

HADOOP_CLIENT_OPTS="-Xmx1024m" hdfs dfs -get /home/mr/data/* ./

Alternatively, you can make the change permanent by modifying the value in hadoop-env.sh.

Use Sqoop to Store HDFS Data in MySQL, Error: Job job_1566707990804_0002 failed with state FAILED due to: Task failed

Using Sqoop to export HDFS data to a MySQL database reports an error:

Job job_1566707990804_0002 failed with state FAILED due to: Task failed task_1566707990804_0002_m_0

I hit this problem because the MySQL table was created with a VARCHAR(10) column; whenever a value in the exported data is longer than 10 characters, the task fails. Increasing the column size (for example, ALTER TABLE ... MODIFY the column to a larger VARCHAR) solves it.