[Untitled]
2022-06-26 09:36:00 【半_调_子】
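Step 1: Download the Hadoop binary package (the script below uses hadoop-2.7.7).
Step 2: Download the Spark package (the script uses spark-3.1.2-bin-hadoop2.7, a Spark build for Hadoop 2.7).
Step 3: Download Java, i.e. a JDK (the script uses JDK 1.8, jdk1.8.0_201).
Step 4: Download Anaconda.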
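Before wiring everything together, it can help to confirm that each download was unpacked where the script expects it. The following is a minimal sketch that is not part of the original post. The three paths are copied from the PyCharm script in this post and are assumptions about your machine. The only check is that each folder exists and has a `bin` subfolder. On Windows, Spark typically also looks for `winutils.exe` in `%HADOOP_HOME%\bin`, and this check does not cover that file. The code uses `typing.Dict`/`List` so it also runs on Python 3.8.

```python
from pathlib import Path
from typing import Dict, List

# Assumed install locations, taken from the PyCharm script in this post.
# Change them to wherever you unpacked each download.
EXPECTED_HOMES = {
    "JAVA_HOME": r"C:\Java\jdk1.8.0_201",
    "SPARK_HOME": r"D:\spark-3.1.2-bin-hadoop2.7",
    "HADOOP_HOME": r"D:\hadoop-2.7.7",
}


def missing_components(homes: Dict[str, str]) -> List[str]:
    """Return the names whose directory is missing or has no bin/ subfolder."""
    missing = []
    for name, location in homes.items():
        # Each of these distributions keeps its executables under bin/
        if not (Path(location) / "bin").is_dir():
            missing.append(name)
    return missing


if __name__ == "__main__":
    problems = missing_components(EXPECTED_HOMES)
    if problems:
        print("Not found or incomplete:", ", ".join(problems))
    else:
        print("All components found")
```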
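```shell
# Create a virtual environment named pyspark, based on Python 3.8
conda create -n pyspark python=3.8
# Switch into the virtual environment
conda activate pyspark
# Install the packages inside the virtual environment (using the Tsinghua PyPI mirror)
pip install pyhive pyspark jieba -i https://pypi.tuna.tsinghua.edu.cn/simple
```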
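Once the environment is active, you can confirm the install by checking that the three packages are visible to the interpreter PySpark will use. This is a minimal sketch using only the standard library, and it assumes you run it with the `pyspark` environment activated. PyHive is imported under the name `pyhive`. The printed `sys.executable` is the path that `PYSPARK_PYTHON` should point to in the script below.

```python
import importlib.util
import sys
from typing import Iterable, List

# Import names of the packages installed with pip above
# (PyHive is imported as "pyhive").
REQUIRED = ["pyspark", "pyhive", "jieba"]


def missing_packages(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # This path is what PYSPARK_PYTHON should be set to
    print("Interpreter:", sys.executable)
    print("Version:", sys.version.split()[0])
    missing = missing_packages(REQUIRED)
    print("Missing:", ", ".join(missing) if missing else "none")
```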
Write the following word-count program in PyCharm:
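```python
# coding:utf8
import os

from pyspark import SparkConf, SparkContext

# Point PySpark at the local installs; adjust these paths to your own machine.
# They must be set before the SparkContext is created.
os.environ['JAVA_HOME'] = r"C:\Java\jdk1.8.0_201"
os.environ['SPARK_HOME'] = r"D:\spark-3.1.2-bin-hadoop2.7"
os.environ['PYSPARK_PYTHON'] = r"D:\anaconda3\envs\pyspark\python.exe"
os.environ['HADOOP_HOME'] = r"D:\hadoop-2.7.7"

if __name__ == '__main__':
    # Run locally on all available cores
    conf = SparkConf().setMaster("local[*]").setAppName("helloworld")
    # Build the SparkContext from the SparkConf object
    sc = SparkContext(conf=conf)
    # Read the text file into an RDD with one element per line
    file_rdd = sc.textFile("./myfile.text")
    # Split every line on spaces and flatten the result into single words
    words_rdd = file_rdd.flatMap(lambda line: line.split(" "))
    # Turn each word into a tuple: the key is the word, the value is 1
    words_with_one_rdd = words_rdd.map(lambda x: (x, 1))
    # Group the values by key and aggregate them (sum)
    result_rdd = words_with_one_rdd.reduceByKey(lambda a, b: a + b)
    # collect() brings the RDD's data back to the driver so it can be printed
    print(result_rdd.collect())
    # Release the Spark resources
    sc.stop()
```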
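To see what the three transformations in the Spark job compute without starting Spark, you can trace the same pipeline in plain Python. This is a local illustration, not how Spark executes the job. Spark partitions the data and may merge values in any order, which is why the function passed to `reduceByKey` should be associative and commutative, as `a + b` is. The function name `word_count` and the sample lines are hypothetical.

```python
from collections import defaultdict
from functools import reduce
from typing import Dict, Iterable, List, Tuple


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Plain-Python trace of flatMap -> map -> reduceByKey from the Spark job."""
    # flatMap: split each line on a single space and flatten into one list
    words: List[str] = [word for line in lines for word in line.split(" ")]
    # map: pair every word with the number 1
    pairs: List[Tuple[str, int]] = [(word, 1) for word in words]
    # reduceByKey, step 1: group the values by key
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # reduceByKey, step 2: fold each group with the same a + b function
    return {key: reduce(lambda a, b: a + b, values) for key, values in groups.items()}


if __name__ == "__main__":
    sample = ["hello spark", "hello python"]
    print(word_count(sample))  # {'hello': 2, 'spark': 1, 'python': 1}
```

Because `split(" ")` splits on a single space, two consecutive spaces produce an empty-string "word" here and in the Spark version alike. Calling `line.split()` with no argument would skip those. Spark's `collect()` returns a list of `(word, count)` tuples in no guaranteed order, whereas this sketch returns a dict.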