Spark bug practice (covering ClassCastException, ConnectException, NoClassDefFoundError, RuntimeException, and more)

2022-06-27 22:51:00 wr456wr

Environment

Scala version: 2.11.8
JDK version: 1.8
Spark version: 2.1.0
Hadoop version: 2.7.1
Ubuntu version: 18.04
Windows version: Win10

The Scala code is written on the Windows side; Scala, the JDK, Spark, and Hadoop are all installed on an Ubuntu virtual machine.

Question 1

Problem description: when downloading Scala with wget, "Unable to establish SSL connection" appears.


Solution:

Add the --no-check-certificate parameter to skip certificate verification.
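For example (the URL below is illustrative; substitute the actual Scala package you are downloading):

# Skip SSL certificate verification for this download
wget --no-check-certificate https://downloads.lightbend.com/scala/2.11.8/scala-2.11.8.tgz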

Question 2

Problem description: when testing the wordCount program written in Scala, the following error occurred (the Scala program runs on the host; Spark is installed on the virtual machine):

22/06/20 22:35:38 INFO StandaloneAppClient$ClientEndpoint: Connecting to master spark://192.168.78.128:7077...
22/06/20 22:35:41 WARN StandaloneAppClient$ClientEndpoint: Failed to connect to master 192.168.78.128:7077
org.apache.spark.SparkException: Exception thrown in awaitResult: 
	at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:226)
	...
	Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: no further information: /192.168.78.128:7077
Caused by: java.net.ConnectException: Connection refused: no further information


Solution:

Configure the spark-env.sh file under Spark's conf directory as follows.

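A minimal sketch of what such a spark-env.sh can look like, assuming the VM IP used throughout this article and an illustrative JDK path (adjust both to your own environment):

# JDK used by Spark (assumed install path)
export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_162
# Bind the master to the VM's LAN IP so the Windows host can reach it
export SPARK_MASTER_HOST=192.168.78.128
export SPARK_MASTER_PORT=7077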

After configuring, start the master and worker from the Spark directory:

sbin/start-all.sh

Then the wordCount program was run again, and the following error occurred:

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
22/06/20 22:44:09 INFO SparkContext: Running Spark version 2.4.8
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration$DeprecationDelta
	at org.apache.hadoop.mapreduce.util.ConfigUtil.addDeprecatedKeys(ConfigUtil.java:54)
	at org.apache.hadoop.mapreduce.util.ConfigUtil.loadResources(ConfigUtil.java:42)
	at org.apache.hadoop.mapred.JobConf.<clinit>(JobConf.java:123)


Add the Hadoop dependencies to the pom:

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>3.3.2</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>3.3.2</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-core</artifactId>
    <version>1.2.1</version>
</dependency>

After refreshing the dependencies, running the program produced:

22/06/20 22:50:31 INFO spark.SparkContext: Running Spark version 2.4.8
22/06/20 22:50:31 INFO spark.SparkContext: Submitted application: wordCount
22/06/20 22:50:31 INFO spark.SecurityManager: Changing view acls to: Administrator
22/06/20 22:50:31 INFO spark.SecurityManager: Changing modify acls to: Administrator
22/06/20 22:50:31 INFO spark.SecurityManager: Changing view acls groups to: 
22/06/20 22:50:31 INFO spark.SecurityManager: Changing modify acls groups to: 
22/06/20 22:50:31 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(Administrator); groups with view permissions: Set(); users  with modify permissions: Set(Administrator); groups with modify permissions: Set()
Exception in thread "main" java.lang.NoSuchMethodError: io.netty.buffer.PooledByteBufAllocator.metric()Lio/netty/buffer/PooledByteBufAllocatorMetric;
	at org.apache.spark.network.util.NettyMemoryMetrics.registerMetrics(NettyMemoryMetrics.java:80)
	at org.apache.spark.network.util.NettyMemoryMetrics.<init>(NettyMemoryMetrics.java:76)


Update the spark-core dependency from version 2.1.0 to 2.3.0. The spark-core dependency in the pom is now:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>2.3.0</version>
</dependency>

After refreshing the dependencies, the connection problem appears again. (Note that spark-core_2.10 is the build for Scala 2.10; with Scala 2.11.8 the _2.11 artifact would normally be required, and such Scala version mismatches can themselves cause problems.)


Modify the pom dependency (spark-mesos_2.11 transitively brings in the matching Scala 2.11 build of spark-core):

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-mesos_2.11</artifactId>
    <version>2.1.0</version>
</dependency>

After refreshing, an error is reported again: java.lang.RuntimeException: java.lang.NoSuchFieldException: DEFAULT_TINY_CACHE_SIZE


Add the io.netty dependency:

<dependency>
    <groupId>io.netty</groupId>
    <artifactId>netty-all</artifactId>
    <version>4.0.52.Final</version>
</dependency>

After running again, the connection problem reappears.


Check whether Spark's master has actually started.

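A quick way to verify, assuming the JDK's jps tool is on the PATH of the VM (8080 is the default port of the standalone master's web UI):

# On the VM: both a Master and a Worker process should be listed
jps

# Alternatively, open the master's web UI from the host:
# http://192.168.78.128:8080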

Both started successfully, so a failed startup can be ruled out.

Modify the spark-env.sh file under Spark's conf directory.

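A common fix for this symptom, assuming the master was previously bound to localhost or to a hostname the Windows host cannot resolve, is to bind it explicitly to the VM's LAN IP in conf/spark-env.sh:

# Make the master listen on the externally reachable address
export SPARK_MASTER_HOST=192.168.78.128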

Restart Spark:

sbin/stop-all.sh
sbin/start-all.sh

After starting the program again, it successfully connects to the Spark master on the virtual machine.


Question 3

Problem description:

Running the Scala wordCount produces: com.fasterxml.jackson.databind.JsonMappingException: Incompatible Jackson version: 2.13.0


This is caused by inconsistent versions of the Jackson library. Solution: first exclude the Jackson dependency inside the Kafka dependency, so that Maven does not automatically pull in the higher version, then manually add the lower-version Jackson dependency and re-import.

Add the dependencies:

<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka_2.11</artifactId>
    <version>1.1.1</version>
    <exclusions>
        <exclusion>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>*</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-core</artifactId>
    <version>2.6.6</version>
</dependency>

After re-importing and running the program again, there is another problem:

NoClassDefFoundError: com/fasterxml/jackson/core/util/JacksonFeature


This is caused by an incomplete Jackson dependency set (probably another Jackson artifact on the classpath was compiled against a newer jackson-core that still provides JacksonFeature). Import the Jackson dependencies as a consistent set:

<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-databind</artifactId>
    <version>2.6.7</version>
</dependency>
<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-core</artifactId>
    <version>2.6.7</version>
</dependency>
<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-annotations</artifactId>
    <version>2.6.7</version>
</dependency>

Running again produces the following:

Exception in thread "main" java.net.ConnectException: Call From WIN-P2FQSL3EP74/192.168.78.1 to 192.168.78.128:9000 failed on connection exception: java.net.ConnectException: Connection refused: no further information; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)


Presumably this is Hadoop (HDFS on port 9000) refusing the connection, not a problem connecting to the Spark master.
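One way to confirm on the VM, assuming standard Linux networking tools: if port 9000 is bound to 127.0.0.1, only local connections are accepted and the Windows host will be refused.

# Check which address HDFS (port 9000) is listening on
netstat -tlnp | grep 9000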

Modify the core-site.xml file under Hadoop's etc directory.

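A sketch of the usual change, assuming fs.defaultFS previously pointed at localhost: bind HDFS to the VM's LAN IP so that external hosts can connect.

<!-- core-site.xml: make HDFS listen on the externally reachable address -->
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://192.168.78.128:9000</value>
</property>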

Then restart Hadoop and run the program; a new problem appears:

WARN scheduler.TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1, 0.0.0.0, executor 0): java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD
	at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2301)
	at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1431)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2411)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2329)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2187)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1667)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2405)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2329)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2187)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1667)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:503)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:461)
	at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
	at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:85)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
	at org.apache.spark.scheduler.Task.run(Task.scala:99)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)


From the error you can see that the connection is now fine and the scheduled wordCount tasks have actually started, so this time it is probably a code-level problem. This kind of ClassCastException typically means the executors are deserializing the job with classes that do not match the driver's, for example because the application jar was never shipped to the cluster.

The complete Scala code of wordCount:

import org.apache.spark.{SparkConf, SparkContext}

object WordCount {

  def main(arg: Array[String]): Unit = {

    val ip = "192.168.78.128"
    val inputFile = "hdfs://" + ip + ":9000/hadoop/README.txt"
    val conf = new SparkConf().setMaster("spark://" + ip + ":7077").setAppName("wordCount")
    val sc = new SparkContext(conf)
    val textFile = sc.textFile(inputFile)
    val wordCount = textFile.flatMap(line => line.split(" "))
      .map(word => (word, 1)).reduceByKey((a, b) => a + b)
    wordCount.foreach(println)
  }
}

The application jar has to be shipped to the cluster, so package the project.
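Assuming Maven as the build tool (the project uses a pom), packaging from the command line looks like this; the jar name matches the one referenced in the code below:

# Produces target/Proj-1.0-SNAPSHOT.jar
mvn clean package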


The packaged jar is then under the target path; find the jar and copy its location.


Add the copied path to the configuration via setJars, so that the jar is distributed to the executors. The complete Scala WordCount code:

import org.apache.spark.{SparkConf, SparkContext}

object WordCount {

  def main(arg: Array[String]): Unit = {

    // Path of the packaged jar
    val jar = Array[String]("D:\\IDEA_CODE_F\\com\\BigData\\Proj\\target\\Proj-1.0-SNAPSHOT.jar")
    // Address of the Spark virtual machine
    val ip = "192.168.78.129"
    val inputFile = "hdfs://" + ip + ":9000/hadoop/README.txt"
    val conf = new SparkConf()
      .setMaster("spark://" + ip + ":7077") // master node address
      .setAppName("wordCount")              // Spark application name
      .setSparkHome("/root/spark")          // Spark installation path (probably not necessary)
      .setIfMissing("spark.driver.host", "192.168.1.112") // driver address the executors can reach back to
      .setJars(jar)                         // ship the packaged jar to the executors
    val sc = new SparkContext(conf)
    val textFile = sc.textFile(inputFile)
    val wordCount = textFile.flatMap(line => line.split(" "))
      .map(word => (word, 1)).reduceByKey((a, b) => a + b)
    val str1 = textFile.first()
    println("str: " + str1)
    val l = wordCount.count()
    println(l)
    println("------------------")
    val tuples = wordCount.collect()
    tuples.foreach(println)
    sc.stop()
  }
}

Approximate result of the run: the first line of the file, the number of distinct words, and then the (word, count) tuples are printed.



Ugh, CSDN, when will it be possible to import a complete Markdown document directly? Every time I finish writing, the images cannot be imported and have to be pasted in as screenshots one by one.


Copyright notice
This article was created by [wr456wr]. Please include the original link when reposting. Thanks.
https://yzsam.com/2022/178/202206271957196365.html
