PySpark Operators for Spatial Data, Fully Explained (5): How to Use Spatial Operation Interfaces in PySpark
2022-07-06 17:37:00 【51CTO】
Spark is a distributed computing framework, and PySpark is essentially Python calling Spark's underlying (JVM-based) framework. So how are those calls actually made? The previous article showed spatial operators implemented in Python with the GDAL package; what does the whole call chain behind them look like? Let's explore that today.
As the first article in this series mentioned, running PySpark requires the Py4J package, whose job is to let Python call objects inside the Java virtual machine. The principle is:

The Python program sends tasks to the Java virtual machine over a socket; the JVM parses them through Py4J and dispatches them to Spark's worker nodes, where the tasks are turned back into Python and executed; after execution completes, the results travel the same path in reverse.
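The round trip described above can be illustrated with a toy sketch: a task is pickled, pushed through a socket, restored on the other side, and executed by a Python interpreter. This is only an analogy (real PySpark uses Py4J plus its own serializer, and `ship_task`/`square` are names invented here for illustration):

```python
import pickle
import socket

def square(x):
    """A 'task' defined at module level so pickle can ship it by reference."""
    return x * x

def ship_task(func, arg):
    """Send a pickled (func, arg) pair through a socket and run it on the other end."""
    driver, worker = socket.socketpair()       # stands in for the driver <-> worker link
    driver.sendall(pickle.dumps((func, arg)))  # the task leaves the "driver" as bytes
    driver.shutdown(socket.SHUT_WR)            # signal end-of-task
    payload = b""
    while True:
        chunk = worker.recv(4096)
        if not chunk:
            break
        payload += chunk
    f, a = pickle.loads(payload)               # the "worker" restores the Python task...
    result = f(a)                              # ...and executes it with a Python interpreter
    driver.close()
    worker.close()
    return result

print(ship_task(square, 7))  # → 49
```

The point of the sketch is the last step: whatever crosses the wire, it is a Python interpreter on the far side that finally runs the function.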
For the details, see this article; I won't repeat them here:
http://sharkdtu.com/posts/pyspark-internal.html
As you can see, code written in Python is, in the end, also executed on the worker side by Python algorithms or packages. Let's run an experiment:
We use Python's sys package to check the version of the running Python, and the socket package to check each node's hostname. Both packages are specific to Python, so if PySpark only ran Java on the different nodes, this could not work.
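The original snippet isn't reproduced in the text, so here is a minimal sketch of what this experiment might look like. The per-record function is pure Python, so whatever it returns must have been produced by a Python interpreter on the worker (the Spark driver part is guarded because it needs a running cluster):

```python
import socket
import sys

def node_info(_):
    """Runs on a worker: report which host and which Python executed this record."""
    return (socket.gethostname(), sys.version.split()[0])

if __name__ == "__main__":
    # Sketch only: requires pyspark and a running cluster.
    from pyspark import SparkContext
    sc = SparkContext(appName="which-python")
    # With enough partitions, records land on both workers,
    # and distinct() shows one (hostname, version) pair per node.
    print(sc.parallelize(range(8), 8).map(node_info).distinct().collect())
    sc.stop()
```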
I have two machines here, named sparkvm.com and sparkvmslave.com; sparkvm.com is master + worker, while sparkvmslave.com is worker only.
The final run shows that different nodes return different results.
The experiment above shows that on the different compute nodes, it is ultimately Python algorithm packages doing the work. So how do we use spatial analysis algorithms on the different nodes?
On Spark, this is handled by installing the algorithm packages as plug-ins on every node:

As long as the same Python algorithm packages are installed on every node, it works. The key point is that they must go into the system Python, because PySpark calls the system Python by default.
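If you would rather not install packages into the system Python, the interpreter the workers use can be redirected with the standard PYSPARK_PYTHON environment variable. A minimal sketch (the interpreter path is a placeholder; whatever path you use must exist on every node and have the algorithm packages installed):

```python
import os

# Must be set before the SparkContext is created. Every node needs an
# interpreter at this path, with the same algorithm packages installed.
os.environ["PYSPARK_PYTHON"] = "/usr/bin/python3"         # placeholder path
os.environ["PYSPARK_DRIVER_PYTHON"] = "/usr/bin/python3"  # placeholder path

if __name__ == "__main__":
    # Sketch only: requires pyspark and a running cluster.
    from pyspark import SparkContext
    sc = SparkContext(appName="custom-python")
```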
Let's do another experiment:

Then run an example on PySpark:
There are two nodes, so why did everything execute on just one of them? Let's look at the debug log:
It turns out node 153 threw an exception saying it could not find the pygeohash package.
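For context on what the missing package does: pygeohash's encode turns a lat/lon pair into a short base-32 string, and since the call inside map() is ordinary Python, the package must exist on every worker that runs it. Below is a from-scratch sketch of the standard geohash encoding algorithm, not pygeohash's actual code:

```python
# Standard geohash base-32 alphabet (no a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_encode(lat, lon, precision=12):
    """Encode a lat/lon pair as a geohash by bisecting the ranges,
    interleaving one longitude bit and one latitude bit at a time."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    bits = []
    even = True  # even positions take a longitude bit, odd ones a latitude bit
    while len(bits) < precision * 5:
        if even:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits.append(1)
                lon_lo = mid
            else:
                bits.append(0)
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits.append(1)
                lat_lo = mid
            else:
                bits.append(0)
                lat_hi = mid
        even = not even
    chars = []
    for i in range(0, len(bits), 5):      # every 5 bits become one base-32 character
        n = 0
        for b in bits[i:i + 5]:
            n = n * 2 + b
        chars.append(BASE32[n])
    return "".join(chars)

# The classic example point near Skagen, Denmark:
print(geohash_encode(57.64911, 10.40744, 11))  # → u4pruydqqvj
```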
Next, I install the pygeohash package on node 153:
Then run the same job again:
Finally, let's use GDAL's spatial algorithm interface and run an example:
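The original example isn't reproduced in the text; a hedged sketch of a buffer operation with GDAL's OGR bindings inside a PySpark map might look like this. The osgeo package must be installed on every node; the import sits inside the function so it happens on the worker. The names buffer_wkt and to_wkt_point, and the sample coordinates, are made up for this sketch:

```python
def to_wkt_point(lon, lat):
    """Format a lon/lat pair as a WKT point string (pure Python, no GDAL needed)."""
    return "POINT (%s %s)" % (lon, lat)

def buffer_wkt(wkt, dist):
    """Runs on a worker: buffer a WKT geometry with GDAL/OGR, return the result as WKT."""
    from osgeo import ogr  # imported on the worker, so osgeo must exist there
    geom = ogr.CreateGeometryFromWkt(wkt)
    return geom.Buffer(dist).ExportToWkt()

if __name__ == "__main__":
    # Sketch only: requires pyspark, a running cluster, and GDAL on every node.
    from pyspark import SparkContext
    sc = SparkContext(appName="gdal-buffer")
    pts = sc.parallelize([(116.39, 39.91), (121.47, 31.23)])
    print(pts.map(lambda p: buffer_wkt(to_wkt_point(*p), 0.1)).collect())
    sc.stop()
```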
To be continued
The source code can be downloaded from my Gitee or GitHub:
github: https://github.com/allenlu2008/PySparkDemo
gitee: https://gitee.com/godxia/PySparkDemo