PySpark operators for spatial data, fully explained (5): how to use spatial operation interfaces in PySpark
2022-07-06 17:37:00 【51CTO】
Spark is a distributed computing framework, and PySpark is really Python calling Spark's underlying framework. So how does that call actually happen? The previous article covered spatial operators implemented in Python with the GDAL package, but what does the whole call chain look like? Let's explore it today.
The first article in this series said that running PySpark requires the Py4J package, whose job is to let Python call objects inside the Java virtual machine. It works as follows:

The Python program sends the job to the Java virtual machine over a socket, and the JVM interprets the request through the Py4J package. Spark then hands the work to its Worker compute nodes, where each task is restored to its Python implementation and executed. When execution finishes, the results travel back along the same path in reverse.
See this article for the details; I won't repeat them here:
http://sharkdtu.com/posts/pyspark-internal.html
As you can see, what you write in Python is ultimately executed on the Worker side by Python code and packages as well. Let's run an experiment:

Use Python's sys package to check the version of the running Python, and the socket package to check the host name of the node. Both packages are specific to Python, so if PySpark only ran Java, this could not work on the different nodes.
I have two machines here, named sparkvm.com and sparkvmslave.com: sparkvm.com is master + worker, and sparkvmslave.com is a worker only.
The run shows that different nodes return different results.
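A minimal sketch of this experiment, assuming a live SparkContext named `sc` (as the pyspark shell provides). The helper names `describe_worker` and `probe_cluster` are hypothetical and are not taken from the author's repository.

```python
import socket
import sys


def describe_worker(_):
    """Return (host name, Python version) of the process that runs this task."""
    return socket.gethostname(), sys.version.split()[0]


def probe_cluster(sc, num_tasks=8):
    """Run describe_worker in num_tasks tasks and return the distinct results.

    `sc` is assumed to be an existing SparkContext.
    """
    rdd = sc.parallelize(range(num_tasks), num_tasks)
    return sorted(set(rdd.map(describe_worker).collect()))
```

Calling `probe_cluster(sc)` returns the distinct (host name, Python version) pairs of the nodes that ran at least one task, so a cluster like the one above would be expected to report more than one host name.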
The experiment shows that on every compute node, what ultimately runs is the Python package. So how do we use spatial analysis algorithms on different nodes?
On Spark, this is done in an algorithm plug-in fashion:

As long as the same Python package is installed on every node, this works. The key point is to configure the system Python, because PySpark calls the system Python by default.
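One way to verify that requirement before running a real job is to ask each node whether it can import the package. The sketch below assumes an existing SparkContext `sc`; `module_available` and `report_missing` are hypothetical helper names.

```python
import importlib.util
import socket


def module_available(name):
    """Return True if `name` can be imported by the current interpreter."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        return False


def report_missing(sc, module_name, num_tasks=8):
    """Return the sorted host names on which `module_name` is not importable.

    `sc` is assumed to be an existing SparkContext.
    """
    def check(_):
        # Runs on the worker, so it inspects the worker's Python environment.
        return socket.gethostname(), module_available(module_name)

    results = sc.parallelize(range(num_tasks), num_tasks).map(check).collect()
    return sorted({host for host, ok in results if not ok})
```

For example, `report_missing(sc, "pygeohash")` would list the nodes that still need the package. If the workers should use an interpreter other than the default one, Spark also reads the `PYSPARK_PYTHON` environment variable to choose it.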
Let's do another experiment :

Then run an example in PySpark:
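A minimal sketch of what such a job might look like, assuming the pygeohash package and an existing SparkContext `sc`. The helper names `geohash_with_host` and `geohash_job` are hypothetical, and this is not the author's original code.

```python
import socket


def geohash_with_host(point, precision=6):
    """Encode a (lat, lon) pair as a geohash and record which host did the work."""
    # Imported inside the function so the import happens on the worker
    # that runs the task, not only on the driver.
    import pygeohash

    lat, lon = point
    return socket.gethostname(), pygeohash.encode(lat, lon, precision=precision)


def geohash_job(sc, points, num_slices=4):
    """Run geohash_with_host over `points` on the cluster and collect the results.

    `sc` is assumed to be an existing SparkContext.
    """
    return sc.parallelize(points, num_slices).map(geohash_with_host).collect()
```

Returning the host name alongside each geohash makes it easy to see which node processed which record.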

There are two nodes, so why was everything executed on one of them? Take a look at the debug log:

It turns out that node 153 threw an exception saying it could not find the pygeohash package.
Next, I installed the pygeohash package on node 153:

Then run the same code again:

Finally, let's run an example that uses the GDAL spatial algorithm interface:
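A minimal sketch of such a call, assuming the GDAL Python bindings (`osgeo`) are installed on every node and that `sc` is an existing SparkContext. The helper names `point_wkt`, `buffer_area` and `buffer_areas` are hypothetical, and this is an illustration rather than the author's example.

```python
def point_wkt(lon, lat):
    """Build the WKT text of a point from a longitude and a latitude."""
    return f"POINT ({lon} {lat})"


def buffer_area(wkt, distance):
    """Buffer a WKT geometry by `distance` and return the area of the buffer."""
    # Imported inside the function so each worker loads GDAL itself.
    from osgeo import ogr

    geom = ogr.CreateGeometryFromWkt(wkt)
    return geom.Buffer(distance).GetArea()


def buffer_areas(sc, lon_lat_pairs, distance=1.0):
    """Compute the buffer area of every point on the cluster.

    `sc` is assumed to be an existing SparkContext.
    """
    wkts = [point_wkt(lon, lat) for lon, lat in lon_lat_pairs]
    return sc.parallelize(wkts).map(lambda w: buffer_area(w, distance)).collect()
```

Passing geometries between nodes as WKT strings keeps the records plain text, and the GDAL objects are only created inside the task that uses them.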

To be continued.
The source code can be downloaded from my Gitee or GitHub:
github: https://github.com/allenlu2008/PySparkDemo
gitee: https://gitee.com/godxia/PySparkDemo