MongoDB Meets Spark (Integration)
2022-07-07 11:17:00, by cui_yonghua
I. Advantages of MongoDB over HDFS
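1. Storage granularity: HDFS stores data in large blocks (64 MB by default in Hadoop 1.x, 128 MB in Hadoop 2.x and later). MongoDB stores data as individual documents, which is far more fine-grained.
2. MongoDB supports indexes and HDFS does not, so MongoDB reads specific records faster.
3. Data is easier to modify in MongoDB. Documents can be updated in place, whereas HDFS files are write-once (append-only).
4. Processing on HDFS typically responds in minutes, while MongoDB responds in milliseconds.
5. MongoDB's powerful aggregation framework (Aggregate) can filter or preprocess data before it reaches Spark.
6. With MongoDB, you no longer need the traditional pattern of computing in the Redis in-memory database and then saving the results to HDFS.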
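Point 5 can be illustrated as follows. With the MongoDB Spark Connector, an aggregation pipeline can be passed as a read option. MongoDB then filters and projects documents on the server before they reach Spark. The sketch below only builds the pipeline as a JSON string in plain Python.

Several details are assumptions. The `pipeline` read option and the `com.mongodb.spark.sql` format name are taken from the 2.x connector used in the example code. The helper name `build_pipeline` is hypothetical, and the `name`/`age` fields come from the sample data.

```python
import json


def build_pipeline(min_age):
    """Hypothetical helper: build an aggregation pipeline that MongoDB
    runs server-side before handing documents to Spark."""
    return [
        # Filter on the server so fewer documents cross the network
        {"$match": {"age": {"$gte": min_age}}},
        # Keep only the fields Spark needs (_id is kept by default)
        {"$project": {"name": 1, "age": 1}},
    ]


# The connector option takes the pipeline as an extended-JSON string
pipeline_json = json.dumps(build_pipeline(100))
print(pipeline_json)

# Assumed usage with the 2.x connector (needs a running Spark and MongoDB):
# df = (spark.read.format("com.mongodb.spark.sql")
#       .option("pipeline", pipeline_json)
#       .load())
```

Pushing the `$match` down to MongoDB lets an index on `age` (point 2) do the filtering. The alternative is loading the whole collection into Spark and filtering there.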
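II. Layered Architecture of a Big Data Platform
MongoDB can replace HDFS as the core storage layer of a big data platform. Such a platform can be layered as follows:
Layer 1 (storage): MongoDB or HDFS;
Layer 2 (resource management): e.g. YARN, Mesos, Kubernetes (K8S);
Layer 3 (compute engine): e.g. MapReduce, Spark;
Layer 4 (programming interfaces): e.g. Pig, Hive, Spark SQL, Spark Streaming, DataFrame.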
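These layers map onto how a job is launched:

- Spark's `--master` URL selects the resource manager (layer 2).
- Spark itself is the compute engine (layer 3).
- The connector's input/output URIs reach the storage layer (layer 1).

The sketch below assumes the 2.x connector configuration keys used in the example later in this post (`spark.mongodb.input.uri`, `spark.mongodb.output.uri`). The helper names `mongo_uri` and `build_spark_submit` are hypothetical, and the host, database and collection values are placeholders.

```python
from urllib.parse import urlencode


def mongo_uri(host, database, collection, **options):
    """Hypothetical helper: build a URI of the form
    mongodb://host/database.collection?option=value (2.x connector style)."""
    uri = f"mongodb://{host}/{database}.{collection}"
    if options:
        uri += "?" + urlencode(options)
    return uri


def build_spark_submit(master, script, input_uri, output_uri,
                       package="org.mongodb.spark:mongo-spark-connector_2.11:2.0.0"):
    """Hypothetical helper: assemble a spark-submit argument list.

    `master` picks the resource manager (layer 2), for example "yarn",
    "k8s://https://<api-server>:<port>", "mesos://<host>:5050" or "local[4]".
    """
    return [
        "./bin/spark-submit",
        "--master", master,
        "--conf", f"spark.mongodb.input.uri={input_uri}",
        "--conf", f"spark.mongodb.output.uri={output_uri}",
        "--packages", package,
        script,
    ]


read_uri = mongo_uri("127.0.0.1", "test", "coll", readPreference="primaryPreferred")
write_uri = mongo_uri("127.0.0.1", "test", "coll")
args = build_spark_submit("yarn", "introduction.py", read_uri, write_uri)
print(" ".join(args))
```

Building the arguments as a list rather than one shell string means they could be handed to `subprocess.run(args)` without quoting the `?` and `=` characters in the URIs.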
Reference:
mongo-python-driver: https://github.com/mongodb/mongo-python-driver/
III. Source Code Walkthrough
mongo-spark/examples/src/test/python/introduction.py
# -*- coding: UTF-8 -*-
#
# Copyright 2016 MongoDB, Inc.
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# To run this example use:
# ./bin/spark-submit --master "local[4]" \
# --conf "spark.mongodb.input.uri=mongodb://127.0.0.1/test.coll?readPreference=primaryPreferred" \
# --conf "spark.mongodb.output.uri=mongodb://127.0.0.1/test.coll" \
# --packages org.mongodb.spark:mongo-spark-connector_2.11:2.0.0 \
# introduction.py
from pyspark.sql import SparkSession

if __name__ == "__main__":
    # The MongoDB input/output URIs come from the --conf options passed to spark-submit
    spark = SparkSession.builder.appName("Python Spark SQL basic example").getOrCreate()
    # Silence console logging through the public API instead of the JVM's log4j objects
    spark.sparkContext.setLogLevel("FATAL")

    # Save some data
    characters = spark.createDataFrame(
        [("Bilbo Baggins", 50), ("Gandalf", 1000), ("Thorin", 195), ("Balin", 178), ("Kili", 77),
         ("Dwalin", 169), ("Oin", 167), ("Gloin", 158), ("Fili", 82), ("Bombur", None)],
        ["name", "age"])
    # Write to the collection named in spark.mongodb.output.uri, replacing its contents
    characters.write.format("com.mongodb.spark.sql").mode("overwrite").save()

    # print the schema
    print("Schema:")
    characters.printSchema()

    # read from MongoDB collection (spark.mongodb.input.uri); the schema is inferred by sampling
    df = spark.read.format("com.mongodb.spark.sql").load()

    # SQL: registerTempTable is deprecated, so register a temporary view instead
    df.createOrReplaceTempView("temp")
    centenarians = spark.sql("SELECT name, age FROM temp WHERE age >= 100")
    print("Centenarians:")
    centenarians.show()
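Consider how `Bombur`, whose age is `None`, is handled. Spark SQL treats a missing age as NULL, and `NULL >= 100` is not true, so that row is filtered out.

The plain-Python sketch below mirrors the `WHERE age >= 100` filter on the same sample data to show which rows to expect. It only illustrates the SQL semantics and is not part of the connector. The helper name `where_age_at_least` is hypothetical. The row order Spark prints after reading from MongoDB is not guaranteed to match this list.

```python
# Same sample rows as in introduction.py
characters = [("Bilbo Baggins", 50), ("Gandalf", 1000), ("Thorin", 195), ("Balin", 178),
              ("Kili", 77), ("Dwalin", 169), ("Oin", 167), ("Gloin", 158), ("Fili", 82),
              ("Bombur", None)]


def where_age_at_least(rows, min_age):
    """Hypothetical helper mimicking SQL `WHERE age >= min_age`:
    a NULL (None) age never satisfies the predicate."""
    return [(name, age) for name, age in rows if age is not None and age >= min_age]


centenarians = where_age_at_least(characters, 100)
for name, age in centenarians:
    print(name, age)
# prints six rows: Gandalf, Thorin, Balin, Dwalin, Oin, Gloin
```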