A Complete Guide to Processing Spatial Data with PySpark Operators (5): How to Use Spatial Operation Interfaces in PySpark
2022-07-06 17:37:00 【51CTO】
Spark is a distributed computing framework, and PySpark is really Python calling Spark's underlying framework. So how is that framework invoked? The previous article covered spatial operators implemented in Python with the GDAL package; what does the whole call chain look like when they run? Let's explore that today.
The first article in this series explained that running PySpark requires the Py4J package, whose job is to let Python call objects inside the Java virtual machine. The principle is:

The Python algorithm sends the task to the Java virtual machine over a socket. The JVM parses it through Py4J and then calls Spark's Worker compute nodes, where the task is restored to its Python implementation. Once execution completes, the results travel back along the same path in reverse.
For the details, see the following article; I won't repeat them here:
http://sharkdtu.com/posts/pyspark-internal.html
As you can see, what you write in Python is ultimately executed on the Worker side by Python algorithms or packages as well. Let's run an experiment:

Use Python's sys package to check the version of the running Python, and the socket package to check the machine name of the node. Both packages are specific to Python, so if PySpark ran only Java, this could not work on the different nodes.
I have two machines here, named sparkvm.com and sparkvmslave.com; sparkvm.com is master + worker, and sparkvmslave.com is a worker only.
The final run shows that different nodes return different results.
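The experiment described above can be sketched roughly as follows. This is a minimal sketch, not the original code: the names `node_info` and `collect_node_info` are hypothetical, and `sc` is assumed to be an existing SparkContext, such as the one the pyspark shell provides.

```python
import socket
import sys


def node_info(_):
    """Runs inside a Python worker: report its hostname and Python version."""
    return (socket.gethostname(), sys.version.split()[0])


def collect_node_info(sc, num_tasks=8):
    """Spread a few tiny tasks over the cluster and gather the distinct answers.

    `sc` is assumed to be an existing SparkContext (hypothetical helper).
    """
    rdd = sc.parallelize(range(num_tasks), num_tasks)
    return rdd.map(node_info).distinct().collect()
```

In the pyspark shell, `collect_node_info(sc)` would return one (hostname, version) pair for each node that actually ran a task, so with the two machines above it should return at most two pairs.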
The experiment above shows that the different compute nodes ultimately run Python algorithm packages. So how do we use spatial analysis algorithms on different nodes?
On Spark, this is done in the manner of algorithm plug-ins:

As long as the same Python algorithm package is installed on every node, it works. The key point is to configure the system Python, because PySpark calls the system Python by default.
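One way to check that precondition before running a real job is to ask every worker whether it can find the package. The sketch below is an illustration under assumptions: `probe_package` and `probe_cluster` are hypothetical names, and `sc` is assumed to be an existing SparkContext.

```python
import importlib.util
import socket


def probe_package(package_name):
    """Return (hostname, found) for a top-level package name on this machine."""
    # find_spec returns None when a top-level package is not installed.
    found = importlib.util.find_spec(package_name) is not None
    return (socket.gethostname(), found)


def probe_cluster(sc, package_name, num_tasks=8):
    """Run the probe in several tasks and return the distinct per-node answers.

    `sc` is assumed to be an existing SparkContext (hypothetical helper).
    """
    rdd = sc.parallelize(range(num_tasks), num_tasks)
    return rdd.map(lambda _: probe_package(package_name)).distinct().collect()
```

A node that reports `False` for a package would fail as soon as a task on it tries to import that package.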
Let's run another experiment:

Then run an example in PySpark:
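A run of this kind might look roughly like the sketch below. It is not the author's original code: `geohash_on_node` and `run_geohash` are hypothetical names, the input points are whatever (latitude, longitude) pairs you supply, and `sc` is assumed to be an existing SparkContext. It relies on `pygeohash.encode(latitude, longitude, precision)`.

```python
import socket


def geohash_on_node(point, precision=6):
    """Encode a (lat, lon) pair as a geohash, tagged with the worker's hostname."""
    # Imported inside the function so the import happens on the worker;
    # it raises ImportError on any node where pygeohash is not installed.
    import pygeohash

    lat, lon = point
    return (socket.gethostname(), pygeohash.encode(lat, lon, precision=precision))


def run_geohash(sc, points):
    """Encode every point on the cluster; `sc` is an assumed SparkContext."""
    rdd = sc.parallelize(points, max(1, len(points)))
    return rdd.map(geohash_on_node).collect()
```

Because each result carries the hostname, the output shows which node processed which point.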

There are two nodes, so why did everything execute on one of them? Take a look at the debug log:

It turns out that node 153 threw an exception saying it could not find the pygeohash package.
Next, I installed the pygeohash package on node 153:

Then run the same example again:

Finally, let's run an example that uses the gdal spatial algorithm interface:
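As a hedged illustration of what such a run can look like, the sketch below computes polygon areas on the workers with GDAL's OGR bindings. It is an assumption-laden sketch, not the code from the repository: `polygon_area_on_node` and `run_areas` are hypothetical names, the WKT inputs are supplied by the caller, `sc` is assumed to be an existing SparkContext, and GDAL's Python bindings must be installed on every node.

```python
import socket


def polygon_area_on_node(wkt):
    """Parse a WKT geometry with OGR and return (hostname, area)."""
    # Imported on the worker, so every node needs the GDAL Python bindings.
    from osgeo import ogr

    geom = ogr.CreateGeometryFromWkt(wkt)
    if geom is None:
        raise ValueError("could not parse WKT: %r" % (wkt,))
    return (socket.gethostname(), geom.Area())


def run_areas(sc, wkt_list):
    """Compute the area of each WKT geometry on the cluster.

    `sc` is assumed to be an existing SparkContext (hypothetical helper).
    """
    rdd = sc.parallelize(wkt_list, max(1, len(wkt_list)))
    return rdd.map(polygon_area_on_node).collect()
```

The area is in the squared units of the coordinates, so geometries in degrees would need to be projected first to get a meaningful value.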

To be continued.
The source code can be downloaded from my Gitee or GitHub:
github: https://github.com/allenlu2008/PySparkDemo
gitee: https://gitee.com/godxia/PySparkDemo