Tag Archives: Hive

[Solved] hive beeline Connect Error: User:*** is not allowed to impersonate

Beeline connection exception in Hive: User: *** is not allowed to impersonate

1. Error details

When Beeline connects to HiveServer2, an error of the form "User: root is not allowed to impersonate root" is reported.

The command that triggers the error is:

 bin/beeline -u jdbc:hive2://hadoop01:10000 -n root

2. Solution

Add the following configuration to etc/hadoop/core-site.xml under the Hadoop installation directory, then restart the Hadoop cluster (root is the username from the error message):

<property>
        <name>hadoop.proxyuser.root.hosts</name>
        <value>*</value>
</property>
<property>
        <name>hadoop.proxyuser.root.groups</name>
        <value>*</value>
</property>
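
For example, a typical way to apply the change (a sketch; it assumes $HADOOP_HOME points at the Hadoop installation and the standard sbin scripts are in use; copy the updated file to every node of a multi-node cluster):

# Edit core-site.xml and add the two properties above
vi $HADOOP_HOME/etc/hadoop/core-site.xml
# Restart the cluster so the new proxyuser settings take effect
$HADOOP_HOME/sbin/stop-yarn.sh && $HADOOP_HOME/sbin/stop-dfs.sh
$HADOOP_HOME/sbin/start-dfs.sh && $HADOOP_HOME/sbin/start-yarn.sh
# Retry the connection that previously failed
bin/beeline -u jdbc:hive2://hadoop01:10000 -n root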

3. Reference

Hadoop has supported the ProxyUser mechanism since version 2.0. It allows user A's authentication credentials to be used to access the Hadoop cluster in the name of user B. From the server's point of view, user B is the one accessing the cluster, and authorization of the request (including HDFS file-system permissions and YARN queue-submission permissions) is checked against user B.

Assume a superuser named super wants to let a user named joe submit jobs and access HDFS. super holds Kerberos credentials, but joe does not. Jobs must run as joe, and file accesses on the NameNode must also be performed as joe. This requires that joe can establish a connection to the NameNode or JobTracker using super's Kerberos credentials. In other words, super is impersonating joe.

The following core-site.xml settings grant this: the superuser super can impersonate, and only from host1 and host2, users that belong to group1 and group2:

<property>
     <name>hadoop.proxyuser.super.hosts</name>
     <value>host1,host2</value>
</property>
<property>
     <name>hadoop.proxyuser.super.groups</name>
     <value>group1,group2</value>
</property>
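
Once these properties are in place, the impersonated connection can be exercised from beeline. A sketch for illustration (hive.server2.proxy.user is the HiveServer2 connection parameter that requests the session run as another user):

beeline -u "jdbc:hive2://host1:10000/default;hive.server2.proxy.user=joe" -n super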

In our workaround the security requirements are not high, so the wildcard * can be used to allow impersonation from any host and of any user: user root can, from any host, impersonate any user belonging to any group. Strictly speaking, nothing needs to be impersonated here; the authentication information was created for the root user itself, and root simply gains access to the Hadoop cluster:

<property>
        <name>hadoop.proxyuser.root.hosts</name>
        <value>*</value>
</property>
<property>
        <name>hadoop.proxyuser.root.groups</name>
        <value>*</value>
</property>

[Solved] Spark Connect to Hive Error: javax.jdo.JDODataStoreException: Required table missing: "`DBS`" in Catalog "" Schema ""

Today I started Spark, and it kept reporting an error when connecting to Hive. The error is as follows:

21/12/12 19:55:26 ERROR Hive: Cannot initialize metastore due to autoCreate error
javax.jdo.JDODataStoreException: Required table missing : "`DBS`" in Catalog "" Schema "". DataNucleus requires this table to perform its persistence operations. Either your MetaData is incorrect, or you need to enable "datanucleus.schema.autoCreateTables"
Exception in thread "main" org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Hive metastore database is not initialized. Please use schematool (e.g. ./schematool -initSchema -dbType ...) to create the schema. If needed, don't forget to include the option to auto-create the underlying database in your JDBC connection string (e.g. ?createDatabaseIfNotExist=true for mysql));

Looking at the highlighted part at the end of the message, I first re-initialized the metadata (described in my last post). Later I found the metadata was not the problem; rather, datanucleus.schema.autoCreateAll needs to be enabled:

<property>
    <name>datanucleus.schema.autoCreateAll</name>
    <value>true</value>
 </property>

The complete hive-site.xml:

<configuration>
    <!-- Location of metadata generated by Hive-->
<property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
</property>
    <!-- Use a local service to connect to Hive; default is true -->
<property>
    <name>hive.metastore.local</name>
    <value>true</value>
</property>

    <!-- URL address of the database connection JDBC-->
<property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false&amp;serverTimezone=UTC</value>
</property>
    <!-- Database connection driver, i.e. MySQL driver-->
<property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
</property>
    <!-- MySQL database user name-->
<property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
</property>
    <!-- MySQL Database Password-->
<property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123456</value>
 </property>
<property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
 </property>
<property>
    <name>datanucleus.schema.autoCreateAll</name>
    <value>true</value>
 </property>
</configuration>
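
Alternatively, as the error message itself suggests, the metastore schema can be initialized up front with schematool. A minimal sketch, run from the Hive installation directory (-dbType matches the MySQL metastore configured above):

bin/schematool -dbType mysql -initSchema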

Hive Metadata Initialization Error [How to Solve]

I configured Hive today, and an error was reported when initializing the metadata.

First, I modified the hive/conf/hive-site.xml configuration file, adding serverTimezone=UTC to the connection URL:

<property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://192.168.244.1:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false&amp;serverTimezone=UTC</value>
</property>

That still did not fix it.

This error occurs because the database or table name already exists. You need to delete the previous database or create a new one: modify the conf/hive-site.xml configuration file and simply point the connection URL at a database with a new name.
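
A sketch of the "delete the previous one" route, assuming the metastore database is named hive and MySQL runs locally as in the URL above:

# Drop the stale metastore database, then re-initialize the schema from $HIVE_HOME
mysql -uroot -p -e "DROP DATABASE IF EXISTS hive;"
bin/schematool -dbType mysql -initSchema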

When Hive processes data, the job cannot run and reports: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr

Solution:

set hive.exec.mode.local.auto=true;
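
For example, applied to a single invocation (a sketch; my_table is a hypothetical table name):

hive -e "set hive.exec.mode.local.auto=true; select count(*) from my_table;"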

 

Error reason:

The NameNode does not have enough memory: the JVM's remaining memory is insufficient, so running a new job fails.

 

Reference link: https://www.it610.com/article/1297818114901221376.htm

 

Hive Start Error: Exception in thread “main” java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument

Error Messages:
Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
at org.apache.hadoop.conf.Configuration.set(Configuration.java:1380)

at org.apache.hadoop.util.RunJar.main(RunJar.java:236)

Cause:
The guava.jar versions shipped with Hadoop and Hive are inconsistent.
The two jars are located in the following directories:
– /usr/local/hive/lib/
– /usr/local/hadoop/share/hadoop/common/lib/

Solution:
Delete the lower version and copy the higher version into the directory where the lower version was.
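
A sketch of the fix, assuming (as is typical) that Hadoop ships the newer guava; check the actual file names first:

# Compare the two versions
ls /usr/local/hive/lib/guava-*.jar /usr/local/hadoop/share/hadoop/common/lib/guava-*.jar
# Remove the older jar (here assumed to be Hive's) ...
rm /usr/local/hive/lib/guava-*.jar
# ... and copy the newer one from Hadoop in its place
cp /usr/local/hadoop/share/hadoop/common/lib/guava-*.jar /usr/local/hive/lib/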

[Solved] Hive Error: java.sql.SQLException: No suitable driver found for jdbc:hive://localhost:10000/default

Error:

java.sql.SQLException: No suitable driver found for jdbc:hive://localhost:10000/default
    at java.sql.DriverManager.getConnection(DriverManager.java:596)
    at java.sql.DriverManager.getConnection(DriverManager.java:233)
    at demo.utils.JDBCUtils.getConnection(JDBCUtils.java:25)
    at demo.hive.HiveJDBCDemo.main(HiveJDBCDemo.java:16)
Exception in thread "main" java.lang.NullPointerException
    at demo.hive.HiveJDBCDemo.main(HiveJDBCDemo.java:18)

Solution:

In the JDBC URL, change hive to hive2.

before:

private static String driverString = "org.apache.hive.jdbc.HiveDriver";
private static String urlString = "jdbc:hive://localhost:10000/default";

after:

private static String driverString = "org.apache.hive.jdbc.HiveDriver";
private static String urlString = "jdbc:hive2://localhost:10000/default";
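
The hive2 scheme is the same one beeline uses elsewhere in this collection; a quick command-line check of the endpoint:

beeline -u jdbc:hive2://localhost:10000/default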

[Solved] Hive export MYSQL Error: Container [pid=3962,containerID=container_1632883011739_0002_01_000002] is running 270113280B beyond the ‘VIRTUAL’ memory limit.

Problem description

Container [pid=3962,containerID=container_1632883011739_0002_01_000002] is running 270113280B beyond the 'VIRTUAL' memory limit.

Current usage: 91.9 MB of 1 GB physical memory used; 2.4 GB of 2.1 GB virtual memory used. Killing container.

Cause of problem

When YARN runs the container, it checks virtual memory usage; the limit is exceeded, so the container is killed.

Solution:

Modify the $HADOOP_HOME/etc/hadoop/yarn-site.xml file.

Add the following, then save and exit:

<property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>3.0</value>
</property>
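
The new ratio only takes effect after YARN is restarted (a sketch, assuming the standard sbin scripts):

$HADOOP_HOME/sbin/stop-yarn.sh
$HADOOP_HOME/sbin/start-yarn.sh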

Problem solved!

Apache Ranger Integration with Hive Error Records [How to Solve]

Version information is as follows:

Hadoop 2.9.2

Hive 2.x

Ranger 2.1.0 (latest)

After deploying the Ranger plugin on the Hive side, an error is reported when using beeline to connect and query a database. The error information is as follows:

verbose: on
0: jdbc:hive2://192.168.0.9:10000> show databases;
Getting log thread is interrupted, since query is done!
Error: Error running query: java.lang.NoSuchFieldError: REPLLOAD (state=,code=0)
org.apache.hive.service.cli.HiveSQLException: Error running query: java.lang.NoSuchFieldError: REPLLOAD
    at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:264)
    at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:250)
    at org.apache.hive.jdbc.HiveStatement.runAsyncOnServer(HiveStatement.java:309)
    at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:250)
    at org.apache.hive.beeline.Commands.executeInternal(Commands.java:977)
    at org.apache.hive.beeline.Commands.execute(Commands.java:1148)
    at org.apache.hive.beeline.Commands.sql(Commands.java:1063)
    at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:1134)
    at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:965)
    at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:875)
    at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:499)
    at org.apache.hive.beeline.BeeLine.main(BeeLine.java:482)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: org.apache.hive.service.cli.HiveSQLException: Error running query: java.lang.NoSuchFieldError: REPLLOAD
    at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:218)
    at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:269)
    at org.apache.hive.service.cli.operation.Operation.run(Operation.java:324)
    at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:460)
    at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:447)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)
    at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36)
    at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746)
    at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
    at com.sun.proxy.$Proxy52.executeStatementAsync(Unknown Source)
    at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:294)
    at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:497)
    at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1437)
    at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1422)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
    at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NoSuchFieldError: REPLLOAD
    at org.apache.ranger.authorization.hive.authorizer.RangerHiveAuthorizer.checkPrivileges(RangerHiveAuthorizer.java:700)
    at org.apache.hadoop.hive.ql.Driver.doAuthorizationV2(Driver.java:817)
    at org.apache.hadoop.hive.ql.Driver.doAuthorization(Driver.java:604)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:472)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:329)
    at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1158)
    at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1145)
    at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:184)
    ... 27 more

 

Most of these errors are caused by version conflicts or version mismatches

Go to the Ranger admin package and check which Hive version Ranger 2.1.0 ships with:

My ranger-admin:
#cd /data/ranger/ranger-2.1.0-SNAPSHOT-admin/ews/webapp/WEB-INF/classes/ranger-plugins/hive
Check the hive jar information:
#ll
total 41504
-rw-r--r-- 1 ranger ranger   492033 Nov  5 15:18 hive-common-3.1.0.jar
-rw-r--r-- 1 ranger ranger 40603464 Nov  5 15:20 hive-exec-3.1.0.jar
-rw-r--r-- 1 ranger ranger   125261 Nov  5 15:20 hive-jdbc-3.1.0.jar
-rw-r--r-- 1 ranger ranger    36852 Nov  5 15:19 hive-metastore-3.1.0.jar
-rw-r--r-- 1 ranger ranger   566570 Nov  5 15:19 hive-service-3.1.0.jar
-rw-r--r-- 1 ranger ranger   313702 Nov  5 15:19 libfb303-0.9.3.jar
-rw-r--r-- 1 ranger ranger   246445 Nov  5 15:57 libthrift-0.12.0.jar
-rw-r--r-- 1 ranger ranger    99652 Nov  5 17:18 ranger-hive-plugin-2.1.0-SNAPSHOT.jar

The bundled Hive version is 3.x, while the version I am currently using is 2.x, which causes the beeline error.

Solution 1: upgrade Hive to 3.x. After upgrading, beeline connects and queries successfully:

 

[hduser@yjt ~]$ beeline -u jdbc:hive2://192.168.0.230:10000 -n hduser
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/data1/hadoop/hive_3.1.2/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/data1/hadoop/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connecting to jdbc:hive2://192.168.0.230:10000
Connected to: Apache Hive (version 3.1.2)
Driver: Hive JDBC (version 3.1.2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 3.1.2 by Apache Hive
0: jdbc:hive2://192.168.0.230:10000> show databases;
INFO  : Compiling command(queryId=hduser_20191108203220_9ae51d7b-38f0-4f3a-af96-c4ddb91ee9ad): show databases
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:database_name, type:string, comment:from deserializer)], properties:null)
INFO  : Completed compiling command(queryId=hduser_20191108203220_9ae51d7b-38f0-4f3a-af96-c4ddb91ee9ad); Time taken: 1.348 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=hduser_20191108203220_9ae51d7b-38f0-4f3a-af96-c4ddb91ee9ad): show databases
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing command(queryId=hduser_20191108203220_9ae51d7b-38f0-4f3a-af96-c4ddb91ee9ad); Time taken: 0.039 seconds
INFO  : OK
INFO  : Concurrency mode is disabled, not creating a lock manager
+----------------+
| database_name  |
+----------------+
| default        |
+----------------+
1 row selected (1.878 seconds)
0: jdbc:hive2://192.168.0.230:10000> !q
Closing: 0: jdbc:hive2://192.168.0.230:10000
[hduser@yjt ~]$ hive --version
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/data1/hadoop/hive_3.1.2/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/data1/hadoop/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Hive 3.1.2
Git git://HW13934/Users/gates/tmp/hive-branch-3.1/hive -r 8190d2be7b7165effa62bd21b7d60ef81fb0e4af
Compiled by gates on Thu Aug 22 15:01:18 PDT 2019
From source with checksum 0492c08f784b188c349f6afb1d8d9847
[hduser@yjt ~]$

Solution 2: delete the jars under the Ranger admin hive plugin directory, then copy the corresponding jars from the Hive client into that directory (I did not test this).
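
A sketch of what solution 2 would look like (untested, as noted above; the path follows the directory listing shown earlier, and $HIVE_HOME is assumed to point at the 2.x client):

cd /data/ranger/ranger-2.1.0-SNAPSHOT-admin/ews/webapp/WEB-INF/classes/ranger-plugins/hive
# Remove the bundled Hive 3.1.0 jars ...
rm hive-common-3.1.0.jar hive-exec-3.1.0.jar hive-jdbc-3.1.0.jar hive-metastore-3.1.0.jar hive-service-3.1.0.jar
# ... and copy the matching 2.x jars from the Hive client
cp $HIVE_HOME/lib/hive-common-*.jar $HIVE_HOME/lib/hive-exec-*.jar $HIVE_HOME/lib/hive-jdbc-*.jar $HIVE_HOME/lib/hive-metastore-*.jar $HIVE_HOME/lib/hive-service-*.jar .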

 

If this error is encountered while querying with Hive

If you encounter an error like this when querying with Hive: ParseException line 1:78 cannot recognize input near '<EOF>' '<EOF>' '<EOF>' in expression specification

The reason: there is a problem with the quoting at the end of the statement. There are multiple ' or " characters, and the parser cannot tell which quote matches which. When two levels of quotation marks are needed, use ' for one and " for the other.
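
For example, when the query is passed on a shell command line, use one kind of quote for the shell and the other for the HiveQL string literal (a sketch with a hypothetical table t):

# Outer double quotes for the shell, inner single quotes for the string literal
hive -e "select * from t where name = 'abc';"
# Reusing the same quote character inside and out leaves an unterminated string
# and produces the <EOF> ParseException above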