Pyspark operator processing spatial data full parsing (5): how to use spatial operation interface in pyspark
2022-07-06 17:37:00 【51CTO】
Spark is a distributed computing framework, and PySpark is essentially Python calling Spark's underlying framework. So how is that call actually made? The previous article showed spatial operators implemented in Python with the GDAL package, but what does the whole call chain look like? Let's explore that today.
As mentioned in the first article of this series, running PySpark requires the Py4J package, which allows Python to call objects inside the Java virtual machine. The principle is:

The Python driver sends tasks to the Java virtual machine over a socket; the JVM parses them through Py4J and dispatches them to Spark's worker nodes, where the tasks are handed back to Python implementations for execution. When execution completes, the results travel back along the same path in reverse.
For a detailed explanation, see this article (I won't repeat it here):
http://sharkdtu.com/posts/pyspark-internal.html
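As a small illustration (not from the original article), here is a minimal sketch of that bridge: PySpark exposes the Py4J gateway through the SparkContext, so the Python driver can call JVM objects directly. The `_jvm` attribute used below is an internal API, shown only to make the mechanism visible.

```python
# Minimal sketch of the Py4J bridge (illustrative, not the article's code).
# sc._jvm is PySpark's internal Py4J gateway to the driver-side JVM.
from pyspark import SparkContext

sc = SparkContext(appName="py4j-demo")

# This call is executed inside the JVM; the return value comes back to
# Python over the Py4J socket connection.
print(sc._jvm.java.lang.System.currentTimeMillis())

sc.stop()
```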
As you can see, code written in Python is ultimately executed on the worker side by Python algorithms or packages as well. Let's verify this with an experiment:

We use Python's sys module to check the running Python version, and the socket module to read each node's hostname. Both modules are specific to Python, so if PySpark only ran Java code on the different nodes, this would be impossible.
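The original code was shown as a screenshot; below is a minimal sketch of that probe under my own assumptions (the application name and partition count are arbitrary):

```python
# Sketch of the experiment: report the Python version and hostname
# seen by each worker task, proving the worker side really runs Python.
import socket
import sys

from pyspark import SparkContext

sc = SparkContext(appName="which-python")

def probe(_):
    # Both sys and socket are pure Python modules, executed on the worker.
    return "python={} host={}".format(sys.version.split()[0],
                                      socket.gethostname())

print(sc.parallelize(range(8), 8).map(probe).distinct().collect())
sc.stop()
```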
I have two machines here, named sparkvm.com and sparkvmslave.com; sparkvm.com is a master plus worker, while sparkvmslave.com is a worker only.
The execution shows that different nodes return different results.
This experiment demonstrates that each compute node ultimately relies on its own Python algorithm packages. So how do we use spatial analysis algorithms on the different nodes?
On Spark, algorithm packages are plugged in like this:

As long as the same Python algorithm packages are installed on every node, it works. The key point is to configure the system Python, because PySpark calls the system Python by default.
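For reference, a sketch of how a specific interpreter can be pinned (the path /usr/bin/python3 below is an assumption; whatever path you use must exist on every node):

```python
# Point PySpark at an explicit interpreter instead of whatever "python"
# resolves to. Either the PYSPARK_PYTHON environment variable or the
# spark.pyspark.python config works; both must be set before the
# SparkContext is created.
import os
from pyspark import SparkConf, SparkContext

os.environ["PYSPARK_PYTHON"] = "/usr/bin/python3"

conf = SparkConf().setAppName("pin-python") \
                  .set("spark.pyspark.python", "/usr/bin/python3")
sc = SparkContext(conf=conf)
```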
Let's do another experiment:

Then run an example in PySpark:
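The example itself was a screenshot in the original post; the sketch below shows the kind of job it was, assuming the pygeohash package and a few made-up coordinates:

```python
# Sketch: encode lon/lat points with pygeohash inside worker tasks and
# record which host processed each record. Coordinates are sample data.
import socket

import pygeohash as pgh
from pyspark import SparkContext

sc = SparkContext(appName="geohash-demo")

points = [(116.39, 39.91), (121.47, 31.23), (113.26, 23.13), (104.07, 30.57)]

def encode(point):
    lon, lat = point
    # pygeohash takes latitude first, then longitude.
    return socket.gethostname(), pgh.encode(lat, lon, precision=7)

print(sc.parallelize(points, 4).map(encode).collect())
sc.stop()
```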

There are two nodes, so why did everything execute on only one? Let's look at the debug log:

It turns out that the 153 node threw an exception saying it could not find the pygeohash package.
So next I installed the pygeohash package on the 153 node (e.g., pip install pygeohash):

Then I ran the same job again:

Finally, let's run an example that uses GDAL's spatial algorithm interface:
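Again, the original example was a screenshot; the following is only a minimal sketch under the assumption that the GDAL Python bindings (osgeo) are installed on every node, using two made-up WKT points:

```python
# Sketch: parse WKT on the workers with GDAL/OGR, buffer each geometry,
# and return the result as WKT. The 0.1-degree buffer distance is arbitrary.
from osgeo import ogr
from pyspark import SparkContext

sc = SparkContext(appName="gdal-demo")

wkt_points = ["POINT (116.39 39.91)", "POINT (121.47 31.23)"]

def buffer_wkt(wkt):
    geom = ogr.CreateGeometryFromWkt(wkt)  # runs in the worker's Python/GDAL
    return geom.Buffer(0.1).ExportToWkt()

print(sc.parallelize(wkt_points).map(buffer_wkt).collect())
sc.stop()
```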

To be continued
The source code can be downloaded from my Gitee or GitHub:
github: https://github.com/allenlu2008/PySparkDemo
gitee: https://gitee.com/godxia/PySparkDemo