Hit a problem on my first use of Spark, asking for help
2022-06-30 04:06:00 【Xiao Zhu, classmate Wu】
The main question: I'm a novice who has only just started with Spark. When I run the following program in PyCharm, I get no results.
from pyspark import SparkConf, SparkContext

conf = SparkConf().setMaster("local").setAppName("My App")
sc = SparkContext(conf=conf)
sc.setLogLevel('ERROR')
sc.setLogLevel("INFO")
print(sc)
nums = sc.parallelize([1, 2, 3, 4])
squaared = nums.map(lambda x: x * x).collect()
for num in squaared:
    print("%i" % (num))

The environment variables in PyCharm are set the way a senior advised. Typing spark and hadoop at the terminal displays their versions, PYTHONPATH is set to D:\spark\spark-3.1.1-bin-hadoop3.2\python\lib\py4j-0.10.9-src.zip, and winutils.exe has been placed in Hadoop's bin folder.
The error it currently reports is as follows.
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/D:/spark/spark-3.1.1-bin-hadoop3.2/jars/spark-unsafe_2.12-3.1.1.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
<SparkContext master=local appName=My App>
21/03/06 16:28:55 INFO SparkContext: Starting job: collect at D:\pythonfile\spark0305\saprk01\squred.py:8
21/03/06 16:28:55 INFO DAGScheduler: Got job 0 (collect at D:\pythonfile\spark0305\saprk01\squred.py:8) with 1 output partitions
21/03/06 16:28:55 INFO DAGScheduler: Final stage: ResultStage 0 (collect at D:\pythonfile\spark0305\saprk01\squred.py:8)
21/03/06 16:28:55 INFO DAGScheduler: Parents of final stage: List()
21/03/06 16:28:55 INFO DAGScheduler: Missing parents: List()
21/03/06 16:28:55 INFO DAGScheduler: Submitting ResultStage 0 (PythonRDD[1] at collect at D:\pythonfile\spark0305\saprk01\squred.py:8), which has no missing parents
21/03/06 16:28:55 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 4.6 KiB, free 434.4 MiB)
21/03/06 16:28:55 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 3.0 KiB, free 434.4 MiB)
21/03/06 16:28:55 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on LAPTOP-QKHMC2OG:49527 (size: 3.0 KiB, free: 434.4 MiB)
21/03/06 16:28:55 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1383
21/03/06 16:28:55 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (PythonRDD[1] at collect at D:\pythonfile\spark0305\saprk01\squred.py:8) (first 15 tasks are for partitions Vector(0))
21/03/06 16:28:55 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks resource profile 0
21/03/06 16:28:55 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0) (LAPTOP-QKHMC2OG, executor driver, partition 0, PROCESS_LOCAL, 4495 bytes) taskResourceAssignments Map()
21/03/06 16:28:55 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
21/03/06 16:29:05 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
org.apache.spark.SparkException: Python worker failed to connect back.
at org.apache.spark.api.python.PythonWorkerFactory.createSimpleWorker(PythonWorkerFactory.scala:182)
at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:107)
at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:119)
at org.apache.spark.api.python.BasePythonRunner.compute(PythonRunner.scala:145)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:65)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
at java.base/java.lang.Thread.run(Thread.java:832)
Caused by: java.net.SocketTimeoutException: Accept timed out
at java.base/sun.nio.ch.NioSocketImpl.timedAccept(NioSocketImpl.java:708)
at java.base/sun.nio.ch.NioSocketImpl.accept(NioSocketImpl.java:752)
at java.base/java.net.ServerSocket.implAccept(ServerSocket.java:684)
at java.base/java.net.ServerSocket.platformImplAccept(ServerSocket.java:650)
at java.base/java.net.ServerSocket.implAccept(ServerSocket.java:626)
at java.base/java.net.ServerSocket.implAccept(ServerSocket.java:583)
at java.base/java.net.ServerSocket.accept(ServerSocket.java:540)
at org.apache.spark.api.python.PythonWorkerFactory.createSimpleWorker(PythonWorkerFactory.scala:174)
... 14 more
21/03/06 16:29:05 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0) (LAPTOP-QKHMC2OG executor driver): org.apache.spark.SparkException: Python worker failed to connect back.
at org.apache.spark.api.python.PythonWorkerFactory.createSimpleWorker(PythonWorkerFactory.scala:182)
at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:107)
at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:119)
at org.apache.spark.api.python.BasePythonRunner.compute(PythonRunner.scala:145)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:65)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
at java.base/java.lang.Thread.run(Thread.java:832)
Caused by: java.net.SocketTimeoutException: Accept timed out
at java.base/sun.nio.ch.NioSocketImpl.timedAccept(NioSocketImpl.java:708)
at java.base/sun.nio.ch.NioSocketImpl.accept(NioSocketImpl.java:752)
at java.base/java.net.ServerSocket.implAccept(ServerSocket.java:684)
at java.base/java.net.ServerSocket.platformImplAccept(ServerSocket.java:650)
at java.base/java.net.ServerSocket.implAccept(ServerSocket.java:626)
at java.base/java.net.ServerSocket.implAccept(ServerSocket.java:583)
at java.base/java.net.ServerSocket.accept(ServerSocket.java:540)
at org.apache.spark.api.python.PythonWorkerFactory.createSimpleWorker(PythonWorkerFactory.scala:174)
... 14 more
21/03/06 16:29:05 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; aborting job
21/03/06 16:29:05 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
21/03/06 16:29:05 INFO TaskSchedulerImpl: Cancelling stage 0
21/03/06 16:29:05 INFO TaskSchedulerImpl: Killing all running tasks in stage 0: Stage cancelled
21/03/06 16:29:05 INFO DAGScheduler: ResultStage 0 (collect at D:\pythonfile\spark0305\saprk01\squred.py:8) failed in 10.418 s due to Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0) (LAPTOP-QKHMC2OG executor driver): org.apache.spark.SparkException: Python worker failed to connect back.
at org.apache.spark.api.python.PythonWorkerFactory.createSimpleWorker(PythonWorkerFactory.scala:182)
at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:107)
at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:119)
at org.apache.spark.api.python.BasePythonRunner.compute(PythonRunner.scala:145)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:65)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
at java.base/java.lang.Thread.run(Thread.java:832)
Caused by: java.net.SocketTimeoutException: Accept timed out
at java.base/sun.nio.ch.NioSocketImpl.timedAccept(NioSocketImpl.java:708)
at java.base/sun.nio.ch.NioSocketImpl.accept(NioSocketImpl.java:752)
at java.base/java.net.ServerSocket.implAccept(ServerSocket.java:684)
at java.base/java.net.ServerSocket.platformImplAccept(ServerSocket.java:650)
at java.base/java.net.ServerSocket.implAccept(ServerSocket.java:626)
at java.base/java.net.ServerSocket.implAccept(ServerSocket.java:583)
at java.base/java.net.ServerSocket.accept(ServerSocket.java:540)
at org.apache.spark.api.python.PythonWorkerFactory.createSimpleWorker(PythonWorkerFactory.scala:174)
... 14 more
Driver stacktrace:
21/03/06 16:29:05 INFO DAGScheduler: Job 0 failed: collect at D:\pythonfile\spark0305\saprk01\squred.py:8, took 10.461468 s
Traceback (most recent call last):
File "D:\pythonfile\spark0305\saprk01\squred.py", line 8, in <module>
squaared=nums.map(lambda x:x*x).collect()
File "D:\python\1\lib\site-packages\pyspark\rdd.py", line 949, in collect
sock_info = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
File "D:\spark\spark-3.1.1-bin-hadoop3.2\python\lib\py4j-0.10.9-src.zip\py4j\java_gateway.py", line 1304, in __call__
File "D:\spark\spark-3.1.1-bin-hadoop3.2\python\lib\py4j-0.10.9-src.zip\py4j\protocol.py", line 326, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0) (LAPTOP-QKHMC2OG executor driver): org.apache.spark.SparkException: Python worker failed to connect back.
at org.apache.spark.api.python.PythonWorkerFactory.createSimpleWorker(PythonWorkerFactory.scala:182)
at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:107)
at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:119)
at org.apache.spark.api.python.BasePythonRunner.compute(PythonRunner.scala:145)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:65)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
at java.base/java.lang.Thread.run(Thread.java:832)
Caused by: java.net.SocketTimeoutException: Accept timed out
at java.base/sun.nio.ch.NioSocketImpl.timedAccept(NioSocketImpl.java:708)
at java.base/sun.nio.ch.NioSocketImpl.accept(NioSocketImpl.java:752)
at java.base/java.net.ServerSocket.implAccept(ServerSocket.java:684)
at java.base/java.net.ServerSocket.platformImplAccept(ServerSocket.java:650)
at java.base/java.net.ServerSocket.implAccept(ServerSocket.java:626)
at java.base/java.net.ServerSocket.implAccept(ServerSocket.java:583)
at java.base/java.net.ServerSocket.accept(ServerSocket.java:540)
at org.apache.spark.api.python.PythonWorkerFactory.createSimpleWorker(PythonWorkerFactory.scala:174)
... 14 more
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2253)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2202)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2201)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2201)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1078)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1078)
at scala.Option.foreach(Option.scala:407)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1078)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2440)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2382)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2371)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:868)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2202)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2223)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2242)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2267)
at org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:1030)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:414)
at org.apache.spark.rdd.RDD.collect(RDD.scala:1029)
at org.apache.spark.api.python.PythonRDD$.collectAndServe(PythonRDD.scala:180)
at org.apache.spark.api.python.PythonRDD.collectAndServe(PythonRDD.scala)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:564)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.base/java.lang.Thread.run(Thread.java:832)
Caused by: org.apache.spark.SparkException: Python worker failed to connect back.
at org.apache.spark.api.python.PythonWorkerFactory.createSimpleWorker(PythonWorkerFactory.scala:182)
at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:107)
at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:119)
at org.apache.spark.api.python.BasePythonRunner.compute(PythonRunner.scala:145)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:65)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
... 1 more
Caused by: java.net.SocketTimeoutException: Accept timed out
at java.base/sun.nio.ch.NioSocketImpl.timedAccept(NioSocketImpl.java:708)
at java.base/sun.nio.ch.NioSocketImpl.accept(NioSocketImpl.java:752)
at java.base/java.net.ServerSocket.implAccept(ServerSocket.java:684)
at java.base/java.net.ServerSocket.platformImplAccept(ServerSocket.java:650)
at java.base/java.net.ServerSocket.implAccept(ServerSocket.java:626)
at java.base/java.net.ServerSocket.implAccept(ServerSocket.java:583)
at java.base/java.net.ServerSocket.accept(ServerSocket.java:540)
at org.apache.spark.api.python.PythonWorkerFactory.createSimpleWorker(PythonWorkerFactory.scala:174)
... 14 more
21/03/06 16:29:05 INFO SparkContext: Invoking stop() from shutdown hook
Process finished with exit code 1
Does anyone know what to do? I'm pretty helpless here, haha. Thank you.