Tag Archives: Spark

[Solved] IDEA Remote Submit Spark Error: java.io.IOException: Failed to connect to DESKTOP-H

IDEA remotely submits a Spark job and fails with java.io.IOException: Failed to connect to DESKTOP-HSVM

1. Error log

Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1713)
	at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:63)
	at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:188)
	at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:293)
	at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
Caused by: org.apache.spark.SparkException: Exception thrown in awaitResult: 
	at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
	at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
	at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:101)
	at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:201)
	at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:64)
	at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:63)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
	... 4 more
Caused by: java.io.IOException: Failed to connect to DESKTOP-HSVM
	at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:245)
	at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:187)
	at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:198)
	at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:194)
	at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:190)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
	Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: DESKTOP-HSVM
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
	at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:323)
	at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:633)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
	at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
	... 1 more
	Caused by: java.net.ConnectException: Connection refused
	... 11 more

2. Cause analysis

When IDEA submits a Spark job to the remote cluster, the executors on the cluster cannot establish a connection back to the local machine (the driver) to return results, because they cannot resolve the driver's hostname.

Caused by: java.io.IOException: Failed to connect to DESKTOP-HSVM

3. Solution

Add the local machine's hostname DESKTOP-HSVM and its IP address to the /etc/hosts file on the remote cluster nodes, for example:
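A minimal sketch of the hosts entry, assuming 192.168.1.100 is a placeholder for the actual IP of the machine running IDEA:

192.168.1.100   DESKTOP-HSVM

If editing /etc/hosts on every cluster node is not practical, another option is to set spark.driver.host in the SparkConf to the driver machine's IP address, so that executors connect back by IP instead of by hostname.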

[Solved] Spark Connect Hive Error: javax.jdo.JDODataStoreException: Required table missing : "`DBS`" in Catalog "" Schema ""

Today I started Spark and kept getting an error when connecting to Hive. The error is as follows:

21/12/12 19:55:26 ERROR Hive: Cannot initialize metastore due to autoCreate error
javax.jdo.JDODataStoreException: Required table missing : "`DBS`" in Catalog "" Schema "". DataNucleus requires this table to perform its persistence operations. Either your MetaData is incorrect, or you need to enable "datanucleus.schema.autoCreateTables"
Exception in thread "main" org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Hive metastore database is not initialized. Please use schematool (e.g. ./schematool -initSchema -dbType ...) to create the schema. If needed, don't forget to include the option to auto-create the underlying database in your JDBC connection string (e.g. ?createDatabaseIfNotExist=true for mysql));

Looking at the hint at the end of the message, I first re-initialized the metastore schema (I wrote about that in my last post). Later I found that it was not a metastore-initialization problem; instead, DataNucleus schema auto-creation needs to be turned on (the datanucleus.schema.autoCreateTables option mentioned in the error; setting datanucleus.schema.autoCreateAll, as below, covers it):

<property>
    <name>datanucleus.schema.autoCreateAll</name>
    <value>true</value>
</property>

The complete hive-site.xml:

<configuration>
    <!-- Default warehouse directory where Hive stores table data -->
    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
    </property>
    <!-- Use the local metastore service to connect to Hive; the default is true -->
    <property>
        <name>hive.metastore.local</name>
        <value>true</value>
    </property>
    <!-- JDBC URL of the metastore database connection -->
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false&amp;serverTimezone=UTC</value>
    </property>
    <!-- Database connection driver, i.e. the MySQL driver -->
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <!-- MySQL database user name -->
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>
    <!-- MySQL database password -->
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>123456</value>
    </property>
    <property>
        <name>hive.metastore.schema.verification</name>
        <value>false</value>
    </property>
    <property>
        <name>datanucleus.schema.autoCreateAll</name>
        <value>true</value>
    </property>
</configuration>
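After updating hive-site.xml, a quick way to check that Spark can reach the metastore (and that the DataNucleus auto-creation took effect) is to start a Hive-enabled session and list the databases. A minimal sketch, assuming Spark 2.x or later with hive-site.xml on the classpath; the object name HiveSmokeTest is just an illustration:

import org.apache.spark.sql.SparkSession

object HiveSmokeTest {
  def main(args: Array[String]): Unit = {
    // enableHiveSupport() tells Spark to use the Hive metastore configured in hive-site.xml
    val spark = SparkSession.builder()
      .appName("HiveSmokeTest")
      .master("local[2]")
      .enableHiveSupport()
      .getOrCreate()

    // With datanucleus.schema.autoCreateAll=true, the metastore tables (DBS, TBLS, ...) are created on first use
    spark.sql("show databases").show()
    spark.stop()
  }
}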

[Solved] Unable to import Maven project: See logs for details when importing Maven dependencies for a Spark program

Issue:
When importing the Maven dependencies for a Spark application, the import fails and reports the error:
0:23 Unable to import maven project: See logs for details

2019-08-23 00:34:05,140 [ 747292] WARN - #org.jetbrains.idea.maven - Cannot reconnect.
java.lang.RuntimeException: Cannot reconnect.
at org.jetbrains.idea.maven.server.RemoteObjectWrapper.perform(RemoteObjectWrapper.java:111)
at org.jetbrains.idea.maven.server.MavenIndexerWrapper.createIndex(MavenIndexerWrapper.java:61)
at org.jetbrains.idea.maven.indices.MavenIndex.createContext(MavenIndex.java:396)
at org.jetbrains.idea.maven.indices.MavenIndex.access$500(MavenIndex.java:48)
at org.jetbrains.idea.maven.indices.MavenIndex$IndexData.<init>(MavenIndex.java:703)
at org.jetbrains.idea.maven.indices.MavenIndex.doOpen(MavenIndex.java:236)
at org.jetbrains.idea.maven.indices.MavenIndex.open(MavenIndex.java:202)
at org.jetbrains.idea.maven.indices.MavenIndex.<init>(MavenIndex.java:104)
at org.jetbrains.idea.maven.indices.MavenIndices.add(MavenIndices.java:92)
at org.jetbrains.idea.maven.indices.MavenIndicesManager.ensureIndicesExist(MavenIndicesManager.java:174)
at org.jetbrains.idea.maven.indices.MavenProjectIndicesManager$3.run(MavenProjectIndicesManager.java:117)
at com.intellij.util.ui.update.MergingUpdateQueue.execute(MergingUpdateQueue.java:337)
at com.intellij.util.ui.update.MergingUpdateQueue.execute(MergingUpdateQueue.java:327)
at com.intellij.util.ui.update.MergingUpdateQueue.lambda$flush$1(MergingUpdateQueue.java:277)
at com.intellij.util.ui.update.MergingUpdateQueue.flush(MergingUpdateQueue.java:291)
at com.intellij.util.ui.update.MergingUpdateQueue.run(MergingUpdateQueue.java:246)
at com.intellij.util.concurrency.QueueProcessor.runSafely(QueueProcessor.java:246)
at com.intellij.util.Alarm$Request.runSafely(Alarm.java:417)
at com.intellij.util.Alarm$Request.access$700(Alarm.java:344)
at com.intellij.util.Alarm$Request$1.run(Alarm.java:384)
at com.intellij.util.Alarm$Request.run(Alarm.java:395)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at com.intellij.util.concurrency.SchedulingWrapper$MyScheduledFutureTask.run(SchedulingWrapper.java:242)
at com.intellij.util.concurrency.BoundedTaskExecutor$2.run(BoundedTaskExecutor.java:212)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.rmi.UnmarshalException: Error unmarshaling return header; nested exception is:
java.net.SocketException: Connection reset

Reason: a Maven version problem. I was originally using Maven 3.6.0, which is incompatible with this version of IDEA.
The Maven dependencies I need to import are as follows:

<properties>
    <scala.version>2.11.8</scala.version>
    <hadoop.version>2.7.4</hadoop.version>
    <spark.version>2.1.3</spark.version>
</properties>
<dependencies>
    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>${scala.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>${spark.version}</version>
    </dependency>
</dependencies>
<build>
    <sourceDirectory>src/main/scala</sourceDirectory>
    <testSourceDirectory>src/test/scala</testSourceDirectory>
    <plugins>
        <plugin>
            <groupId>net.alchim31.maven</groupId>
            <artifactId>scala-maven-plugin</artifactId>
            <version>3.2.2</version>
            <executions>
                <execution>
                    <goals>
                        <goal>compile</goal>
                        <goal>testCompile</goal>
                    </goals>
                    <configuration>
                        <args>
                            <arg>-dependencyfile</arg>
                            <arg>${project.build.directory}/.scala_dependencies</arg>
                        </args>
                    </configuration>
                </execution>
            </executions>
        </plugin>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-shade-plugin</artifactId>
            <version>2.4.3</version>
            <executions>
                <execution>
                    <phase>package</phase>
                    <goals>
                        <goal>shade</goal>
                    </goals>
                    <configuration>
                        <filters>
                            <filter>
                                <artifact>*:*</artifact>
                                <excludes>
                                    <exclude>META-INF/*.SF</exclude>
                                    <exclude>META-INF/*.DSA</exclude>
                                    <exclude>META-INF/*.RSA</exclude>
                                </excludes>
                            </filter>
                        </filters>
                        <transformers>
                            <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                <mainClass></mainClass>
                            </transformer>
                        </transformers>
                    </configuration>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>

Troubleshooting steps:
1. Replaced the repository with an empty one at a shallow path, since I suspected the original repository path was too deep or its contents were corrupted. It didn't work.
2. Removed some of the dependencies and plugins from pom.xml and added them back one by one. No use either.

Solution: switch to the Maven 3.3.9 that comes bundled with IDEA.

IDEA debugs locally, and Spark reports an error when creating a HiveContext

Spark version: 1.6.1

Scala version: 2.10

Problem scenario:

While debugging the program locally in IDEA, an error was reported when creating the HiveContext. The problem did not appear in the morning; in the afternoon, after I added a small daemon to the project, it showed up. Here is my code:

import cn.com.xxx.common.config.SparkConfig
import org.apache.spark.sql.hive.HiveContext

object test{
  def main(args: Array[String]): Unit = {
    SparkConfig.init("local[2]", this.getClass.getName)
    val sc = SparkConfig.getSparkContext()
    sc.setLogLevel("WARN")
    val hqlContext = new HiveContext(sc)
    println("hello")
    sc.stop()
  }
}

 

The error reported (screenshot not reproduced here).

 

After worrying over it all morning, I first thought my dependency had not been downloaded, but the package was indeed listed in my build.sbt. I traced through the source code, searched Baidu, and so on, and finally located the problem: the scope of the spark-hive dependency was set incorrectly.

By the way, the build.sbt file sits in the project root and can be opened directly in IDEA.

The culprit is the dependency's scope, which needs to be changed to "compile". For example:
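In sbt terms (the exact lines from the original screenshot are not reproduced here), the dependency had been given a non-compile configuration such as "provided", so spark-hive was missing from the runtime classpath when debugging locally. A minimal build.sbt sketch of the fix, assuming Spark 1.6.1 and a "provided" scope:

// Before: only on the provided classpath, so creating a HiveContext fails when running from IDEA
// libraryDependencies += "org.apache.spark" %% "spark-hive" % "1.6.1" % "provided"

// After: default (compile) configuration, available at runtime
libraryDependencies += "org.apache.spark" %% "spark-hive" % "1.6.1"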

Then click the import prompt that appears in the lower right corner of IDEA.

Wait for the update to complete, then compile again. Success.

A big pitfall; I am recording it here so I don't run into it again.

 

Spark Program Compilation error: object apache is not a member of package org

Spark Program Compilation error:

[INFO] Compiling 2 source files to E:\Develop\IDEAWorkspace\spark\target\classes at 1567004370534
[ERROR] E:\Develop\IDEAWorkspace\spark\src\main\scala\cn\itcast\wordCount\WordCount.scala:3: error: object apache is not a member of package org
[ERROR] import org.apache.spark.rdd.RDD
[ERROR] ^
[ERROR] E:\Develop\IDEAWorkspace\spark\src\main\scala\cn\itcast\wordCount\WordCount.scala:4: error: object apache is not a member of package org
[ERROR] import org.apache.spark.{SparkConf, SparkContext}
[ERROR] ^
[ERROR] E:\Develop\IDEAWorkspace\spark\src\main\scala\cn\itcast\wordCount\WordCount.scala:12: error: not found: type SparkConf
[ERROR] val sparkConf: SparkConf = new SparkConf().setAppName("WordCount").setMaster("local[2]")
[ERROR] ^
[ERROR] E:\Develop\IDEAWorkspace\spark\src\main\scala\cn\itcast\wordCount\WordCount.scala:12: error: not found: type SparkConf
[ERROR] val sparkConf: SparkConf = new SparkConf().setAppName("WordCount").setMaster("local[2]")
[ERROR] ^
[ERROR] E:\Develop\IDEAWorkspace\spark\src\main\scala\cn\itcast\wordCount\WordCount.scala:14: error: not found: type SparkContext
[ERROR] val sc: SparkContext = new SparkContext(sparkConf)
[ERROR] ^
[ERROR] E:\Develop\IDEAWorkspace\spark\src\main\scala\cn\itcast\wordCount\WordCount.scala:14: error: not found: type SparkContext
[ERROR] val sc: SparkContext = new SparkContext(sparkConf)
[ERROR] ^
[ERROR] E:\Develop\IDEAWorkspace\spark\src\main\scala\cn\itcast\wordCount\WordCount.scala:18: error: not found: type RDD
[ERROR] val data: RDD[String] = sc.textFile("E:\\Study\\BigData\\heima\\stage5\\2spark����\\words.txt")
[ERROR] ^
[ERROR] E:\Develop\IDEAWorkspace\spark\src\main\scala\cn\itcast\wordCount\WordCount.scala:20: error: not found: type RDD
[ERROR] val words: RDD[String] = data.flatMap(_.split(" "))
[ERROR] ^
[ERROR] E:\Develop\IDEAWorkspace\spark\src\main\scala\cn\itcast\wordCount\WordCount.scala:22: error: not found: type RDD
[ERROR] val wordToOne: RDD[(String, Int)] = words.map((_,1))
[ERROR] ^
[ERROR] E:\Develop\IDEAWorkspace\spark\src\main\scala\cn\itcast\wordCount\WordCount.scala:24: error: not found: type RDD
[ERROR] val result: RDD[(String, Int)] = wordToOne.reduceByKey(_+_)
[ERROR] ^
[ERROR] E:\Develop\IDEAWorkspace\spark\src\main\scala\cn\itcast\wordCount\WordCount.scala:27: error: not found: type RDD
[ERROR] val ascResult: RDD[(String, Int)] = result.sortBy(_._2,false) //����
[ERROR] ^
[ERROR] E:\Develop\IDEAWorkspace\spark\src\main\scala\cn\itcast\wordCount\WordCountCluster.scala:3: error: object apache is not a member of package org
[ERROR] import org.apache.spark.{SparkConf, SparkContext}
[ERROR] ^
[ERROR] E:\Develop\IDEAWorkspace\spark\src\main\scala\cn\itcast\wordCount\WordCountCluster.scala:4: error: object apache is not a member of package org
[ERROR] import org.apache.spark.rdd.RDD
[ERROR] ^
[ERROR] E:\Develop\IDEAWorkspace\spark\src\main\scala\cn\itcast\wordCount\WordCountCluster.scala:12: error: not found: type SparkConf
[ERROR] val sparkConf: SparkConf = new SparkConf().setAppName("WordCountCluster")
[ERROR] ^
[ERROR] E:\Develop\IDEAWorkspace\spark\src\main\scala\cn\itcast\wordCount\WordCountCluster.scala:12: error: not found: type SparkConf
[ERROR] val sparkConf: SparkConf = new SparkConf().setAppName("WordCountCluster")
[ERROR] ^
[ERROR] E:\Develop\IDEAWorkspace\spark\src\main\scala\cn\itcast\wordCount\WordCountCluster.scala:14: error: not found: type SparkContext
[ERROR] val sc: SparkContext = new SparkContext(sparkConf)
[ERROR] ^
[ERROR] E:\Develop\IDEAWorkspace\spark\src\main\scala\cn\itcast\wordCount\WordCountCluster.scala:14: error: not found: type SparkContext
[ERROR] val sc: SparkContext = new SparkContext(sparkConf)
[ERROR] ^
[ERROR] E:\Develop\IDEAWorkspace\spark\src\main\scala\cn\itcast\wordCount\WordCountCluster.scala:18: error: not found: type RDD
[ERROR] val data: RDD[String] = sc.textFile(args(0))
[ERROR] ^
[ERROR] E:\Develop\IDEAWorkspace\spark\src\main\scala\cn\itcast\wordCount\WordCountCluster.scala:20: error: not found: type RDD
[ERROR] val words: RDD[String] = data.flatMap(_.split(" "))
[ERROR] ^
[ERROR] E:\Develop\IDEAWorkspace\spark\src\main\scala\cn\itcast\wordCount\WordCountCluster.scala:22: error: not found: type RDD
[ERROR] val wordToOne: RDD[(String, Int)] = words.map((_,1))
[ERROR] ^
[ERROR] E:\Develop\IDEAWorkspace\spark\src\main\scala\cn\itcast\wordCount\WordCountCluster.scala:24: error: not found: type RDD
[ERROR] val result: RDD[(String, Int)] = wordToOne.reduceByKey(_+_)
[ERROR] ^
[ERROR] 21 errors found
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE

Reason: there is a problem with the local repository location. Most likely the original local repository path is too long and too deep; the repository contents themselves are fine, because after I copied the original repository to the E:\Study\BigData\ directory everything worked.

Solution:

The original Spark project's Maven local repository was: E:\Study\BigData\heima\stage5\1scala\scala3\spark course needs maven repository\SparkRepository

I later changed it to E:\Study\BigData\repository and it worked fine.
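The local repository location is set in Maven's settings.xml (and can also be overridden in IDEA's Maven settings). A minimal sketch of the relevant entry, using the shortened path above:

<settings>
    <!-- A short, shallow local repository path avoids the deep-path problem described above -->
    <localRepository>E:\Study\BigData\repository</localRepository>
</settings>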