[Untitled]
2022-06-26 09:36:00 【半_调_子】
Step 1: Download the full Hadoop binary package
Step 2: Download the Spark package
Step 3: Download Java (the JDK)
Step 4: Download Anaconda
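Once the downloads are unpacked, it can help to confirm that each install directory really exists before pointing Spark at it. The sketch below is only an illustration: `missing_dirs` is a hypothetical helper, and the paths are the ones used in the script later in this post, so replace them with your own locations.

```python
import os


def missing_dirs(paths):
    """Return the names whose path is not an existing directory."""
    return [name for name, path in paths.items() if not os.path.isdir(path)]


if __name__ == "__main__":
    # Example locations taken from the script in this post; adjust to your machine.
    installs = {
        "JAVA_HOME": r"C:\Java\jdk1.8.0_201",
        "SPARK_HOME": r"D:\spark-3.1.2-bin-hadoop2.7",
        "HADOOP_HOME": r"D:\hadoop-2.7.7",
    }
    for name in missing_dirs(installs):
        print(f"{name} does not point to an existing directory")
```

If nothing is printed, all three directories were found.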
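# Create a virtual environment named pyspark, based on Python 3.8
conda create -n pyspark python=3.8
# Switch into the virtual environment
conda activate pyspark
# Install the packages inside the virtual environment
pip install pyhive pyspark jieba -i https://pypi.tuna.tsinghua.edu.cn/simple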
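A quick way to check that the install landed in the active environment is to ask Python whether each module can be found. This is a minimal sketch using only the standard library; `installed` is a hypothetical helper name, and it assumes each package's import name matches its pip name, which is the case for the three packages above.

```python
from importlib.util import find_spec


def installed(packages):
    """Map each top-level module name to True if Python can locate it."""
    return {name: find_spec(name) is not None for name in packages}


if __name__ == "__main__":
    # Run this with the interpreter from the pyspark environment.
    for name, ok in installed(["pyhive", "pyspark", "jieba"]).items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

A `MISSING` line usually means the script was run with a different interpreter than the one `conda activate pyspark` selects.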
Write the code in PyCharm:
# coding:utf8
from pyspark import SparkConf, SparkContext
import os

os.environ['JAVA_HOME'] = r"C:\Java\jdk1.8.0_201"
os.environ['SPARK_HOME'] = r"D:\spark-3.1.2-bin-hadoop2.7"
os.environ['PYSPARK_PYTHON'] = r"D:\anaconda3\envs\pyspark\python.exe"
os.environ['HADOOP_HOME'] = r"D:\hadoop-2.7.7"

if __name__ == '__main__':
    conf = SparkConf().setAppName("helloword")
    # Build the SparkContext from the SparkConf object
    sc = SparkContext(conf=conf)
    # Read the input file; each RDD element is one line of text
    file_rdd = sc.textFile("./myfile.text")
    # Split each line on spaces and flatten the results into one RDD of words
    words_rdd = file_rdd.flatMap(lambda line: line.split(" "))
    # Turn each word into a tuple: the key is the word, the value is the number 1
    words_with_one_rdd = words_rdd.map(lambda x: (x, 1))
    # Group the tuple values by key and aggregate (sum) all values for each key
    result_rdd = words_with_one_rdd.reduceByKey(lambda a, b: a + b)
    # Collect the RDD data with collect() and print the result
    print(result_rdd.collect())
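To see what the three RDD steps compute without starting Spark, here is a plain-Python sketch of the same word count. It is not how Spark runs the job (Spark distributes the work across partitions); `word_count` and the sample lines are made up for illustration.

```python
from collections import defaultdict


def word_count(lines):
    # flatMap: split every line on single spaces and flatten into one list
    words = [word for line in lines for word in line.split(" ")]
    # map: pair each word with the number 1
    pairs = [(word, 1) for word in words]
    # reduceByKey: add up the values that share the same key
    totals = defaultdict(int)
    for word, one in pairs:
        totals[word] += one
    return dict(totals)


if __name__ == "__main__":
    print(word_count(["hello spark", "hello hadoop"]))
    # prints {'hello': 2, 'spark': 1, 'hadoop': 1}
```

Two differences from the Spark version are worth keeping in mind: `collect()` returns a list of `(word, count)` tuples rather than a dict, and the order of those tuples is not guaranteed. Also, because the split is on a single space, consecutive spaces in the input produce empty-string "words" in both versions.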