Saving DataFrame data with pymongo (insert_one, insert_many, multithreaded saving)
2022-07-25 15:46:00 【呆萌的代Ma】
For the basic CRUD operations (create, read, update, delete) with PyMongo, see: "Connecting Python to MongoDB: CRUD operations with pymongo".
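1. Basic method: saving row by row
This is the most basic approach. Each row is converted to a dict, which you can adjust before it is saved, and then the row is written to MongoDB on its own.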
```python
from pymongo import MongoClient
import pandas as pd
import numpy as np


def get_coll(database, collection, host="127.0.0.1"):
    """Return the target collection."""
    mongo_conn = MongoClient(host=host, port=27017)
    mongo_db = mongo_conn.get_database(database)
    coll = mongo_db.get_collection(collection)
    return coll


def _save_or_update_mongodb(coll, dict_value):
    """Look up the document by _id: overwrite its fields if it exists, otherwise insert it."""
    record = coll.find_one({"_id": dict_value["_id"]})
    if not record:
        coll.insert_one(dict_value)
    else:
        # Match on _id only, and keep the immutable _id field out of $set
        fields = {k: v for k, v in dict_value.items() if k != "_id"}
        coll.update_one({"_id": record["_id"]}, {"$set": fields})


def save_dataframe_to_mongo(dataframe):
    coll = get_coll("test_db", "test_collection")
    for index, series in dataframe.iterrows():
        dict_value = series.to_dict()  # one row as {column: value}
        dict_value.update({
            "_id": index,  # use the DataFrame index as the document _id
        })
        _save_or_update_mongodb(coll, dict_value)


if __name__ == '__main__':
    df = pd.DataFrame(np.random.randn(10, 4))
    df.columns = ['a', 'b', 'c', 'd']
    save_dataframe_to_mongo(df)
```
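The `find_one` check followed by `insert_one` or `update_one` costs two round trips per row. Another writer could also insert the same `_id` between the check and the write.

A minimal alternative sketch, assuming "overwrite" may mean replacing the whole document, lets the server make that decision in one call with `replace_one(filter, replacement, upsert=True)`. Unlike `$set`, `replace_one` drops any stored fields that are missing from the new row.

The helper names `save_rows_upsert` and `clean_value` are hypothetical. `clean_value` illustrates the kind of per-row adjustment this method allows: it stores NaN as `None` (BSON null), since MongoDB would otherwise keep it as a floating-point NaN.

```python
import math


def clean_value(value):
    """Hypothetical per-row tweak: store NaN as None (BSON null)."""
    # numpy.float64 is a subclass of float, so DataFrame cells are covered too.
    if isinstance(value, float) and math.isnan(value):
        return None
    return value


def save_rows_upsert(coll, rows):
    """Hypothetical helper: upsert (index, row) pairs, e.g. from DataFrame.iterrows().

    coll is a pymongo Collection (or any object with a compatible replace_one).
    Each row becomes one document whose _id is the row index.
    Returns the number of rows written.
    """
    count = 0
    for index, row in rows:
        # dict() accepts both plain dicts and pandas Series; BSON keys must be str
        doc = {str(key): clean_value(val) for key, val in dict(row).items()}
        doc["_id"] = index
        # One round trip: insert if no document has this _id, otherwise replace it
        coll.replace_one({"_id": doc["_id"]}, doc, upsert=True)
        count += 1
    return count
```

With the listing above, `save_rows_upsert(get_coll("test_db", "test_collection"), df.iterrows())` would write the same ten rows, keyed by the DataFrame index.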
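2. Batch saving with insert_many
insert_many writes a whole list of documents in one call, so the DataFrame can be saved `step` rows at a time instead of row by row. Unlike method 1, these documents get no `_id` from the DataFrame index. pymongo assigns each one a new ObjectId, so running the script twice stores the rows twice.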
```python
from pymongo import MongoClient
import pandas as pd
import numpy as np
import math


def get_coll(database, collection, host="127.0.0.1"):
    """Return the target collection."""
    mongo_conn = MongoClient(host=host, port=27017)
    mongo_db = mongo_conn.get_database(database)
    coll = mongo_db.get_collection(collection)
    return coll


def save_dataframe_to_mongo(dataframe, step=20):
    coll = get_coll("test_db", "test_collection2")
    # One insert_many call per slice of `step` rows
    for i in range(math.ceil(dataframe.shape[0] / step)):
        # orient="records" gives a list of {column: value} dicts (the "record" abbreviation was removed in pandas 2.0)
        dict_list = dataframe.iloc[step * i:step * (i + 1)].to_dict(orient="records")
        coll.insert_many(dict_list)


if __name__ == '__main__':
    df = pd.DataFrame(np.random.randn(900, 4))
    df.columns = ['a', 'b', 'c', 'd']
    save_dataframe_to_mongo(df)
```
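According to the PyMongo documentation, the driver already splits a large `insert_many` call into sub-batches that fit the server's message limits. Manual chunking is therefore mostly useful for progress reporting, retries, or records that are produced lazily.

A minimal sketch of the same chunked loop over any iterable of dicts, assuming records such as `df.to_dict(orient="records")`. The helper names `iter_chunks` and `insert_in_chunks` are hypothetical.

`ordered=False` is a real `insert_many` option. With it, the server keeps inserting after a failing document (for example a duplicate `_id`), and PyMongo raises `BulkWriteError` once the batch has been processed.

```python
from itertools import islice


def iter_chunks(records, step):
    """Hypothetical helper: yield lists of at most `step` items from any iterable."""
    if step < 1:
        raise ValueError("step must be a positive integer")
    iterator = iter(records)
    while True:
        chunk = list(islice(iterator, step))
        if not chunk:
            return
        yield chunk


def insert_in_chunks(coll, records, step=20, ordered=True):
    """Hypothetical helper: call coll.insert_many once per chunk.

    Returns the total number of inserted ids. With ordered=False the server
    continues past failing documents; pymongo still raises BulkWriteError.
    """
    total = 0
    for chunk in iter_chunks(records, step):
        result = coll.insert_many(chunk, ordered=ordered)
        total += len(result.inserted_ids)
    return total
```

For the 900-row example, `insert_in_chunks(get_coll("test_db", "test_collection2"), df.to_dict(orient="records"), step=20)` makes 45 `insert_many` calls.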
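3. Saving data with multiple threads (threading)
PyMongo is thread-safe but not fork-safe. A single MongoClient can be shared by many threads, but a client must not be carried across a fork into child processes; create a new client in each process instead. Multithreaded saving is therefore safe. Because the threads spend most of their time waiting on network I/O, the GIL does not stop them from overlapping. Example code: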
```python
from pymongo import MongoClient
import pandas as pd
import numpy as np
import math
import threading


def get_coll(database, collection, host="127.0.0.1"):
    """Return the target collection."""
    mongo_conn = MongoClient(host=host, port=27017)
    mongo_db = mongo_conn.get_database(database)
    coll = mongo_db.get_collection(collection)
    return coll


def save_dataframe_to_mongo(dataframe, step=20):
    coll = get_coll("test_db", "test_collection2")
    thread_list = []
    for i in range(math.ceil(dataframe.shape[0] / step)):
        dict_list = dataframe.iloc[step * i:step * (i + 1)].to_dict(orient="records")  # data to save
        # One thread per chunk; all threads share the same collection object
        thread = threading.Thread(target=coll.insert_many, args=(dict_list,))
        thread.start()
        thread_list.append(thread)
    # Wait for all threads to finish
    for _thr in thread_list:
        _thr.join()


if __name__ == '__main__':
    df = pd.DataFrame(np.random.randn(900, 4))
    df.columns = ['a', 'b', 'c', 'd']
    save_dataframe_to_mongo(df)
```
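The loop above starts every thread at once: 900 rows with step=20 gives ceil(900 / 20) = 45 simultaneous threads. An exception raised inside a `threading.Thread` target is also only printed, never returned to the caller.

A minimal alternative sketch uses `concurrent.futures.ThreadPoolExecutor` to cap the number of threads and re-raises insert errors through `future.result()`. The value `max_workers=4` is an arbitrary assumption. All workers share one collection object, which MongoClient's thread safety permits, and the client's own connection pool (`maxPoolSize`, 100 by default) bounds the connections they use. The helper name `insert_chunks_threaded` is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def insert_chunks_threaded(coll, chunks, max_workers=4):
    """Hypothetical helper: insert each chunk with coll.insert_many on a bounded pool.

    coll is shared by all worker threads (a pymongo Collection is thread-safe).
    Returns the total number of inserted ids. If a chunk fails, its exception
    is re-raised here (the first failing chunk in submission order).
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(coll.insert_many, chunk) for chunk in chunks]
        # result() blocks until that chunk is written and re-raises its exception
        return sum(len(f.result().inserted_ids) for f in futures)
```

For example, build `chunks = [df.iloc[i:i + 20].to_dict(orient="records") for i in range(0, len(df), 20)]`, then call `insert_chunks_threaded(get_coll("test_db", "test_collection2"), chunks)`.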