当前位置:网站首页>pymongo保存dataframe格式的数据(insert_one, insert_many, 多线程保存)
pymongo保存dataframe格式的数据(insert_one, insert_many, 多线程保存)
2022-07-25 15:46:00 【呆萌的代Ma】
使用Pymongo保存数据的基本方法(增删改查)请参考:Python连接MongoDB,使用pymongo进行增删改查
1. 基本方法: 逐行保存
这是最基本的保存方法,可以对数据本身做微调,然后保存
from pymongo import MongoClient
import pandas as pd
import numpy as np
def get_coll(database, collection, host="127.0.0.1"):
"""目标数据库"""
mongo_conn = MongoClient(host=host, port=27017)
mongo_db = mongo_conn.get_database(database)
coll = mongo_db.get_collection(collection)
return coll
def _save_or_update_mongodb(coll, dict_value):
"""根据检查_id,如果存在就覆盖,如果不存在就新增"""
record = coll.find_one({
"_id": dict_value['_id']})
if not record:
coll.insert_one(dict_value)
else:
coll.update_one(record, {
"$set": dict_value,
})
def save_dataframe_to_mongo(dataframe):
coll = get_coll("test_db", "test_collection")
for index, series in dataframe.iterrows():
dict_value = series.to_dict()
dict_value.update({
"_id": index,
})
_save_or_update_mongodb(coll, dict_value)
if __name__ == '__main__':
df = pd.DataFrame(np.random.randn(10, 4))
df.columns = ['a', 'b', 'c', 'd']
save_dataframe_to_mongo(df)
2. insert_many 批量保存
可以一次性保存一批数据,使用insert_many方法可以批量保存数据
from pymongo import MongoClient
import pandas as pd
import numpy as np
import math
def get_coll(database, collection, host="127.0.0.1"):
"""目标数据库"""
mongo_conn = MongoClient(host=host, port=27017)
mongo_db = mongo_conn.get_database(database)
coll = mongo_db.get_collection(collection)
return coll
def save_dataframe_to_mongo(dataframe, step=20):
coll = get_coll("test_db", "test_collection2")
for i in range(math.ceil(dataframe.shape[0] / step)):
dict_list = dataframe.iloc[step * i:step * (i + 1)].to_dict(orient="record")
coll.insert_many(dict_list)
if __name__ == '__main__':
df = pd.DataFrame(np.random.randn(900, 4))
df.columns = ['a', 'b', 'c', 'd']
save_dataframe_to_mongo(df)
3. Threading 多线程保存数据
Pymongo是多线程安全、多进程不安全的,因此可以肆无忌惮的使用多线程模式保存数据,示例代码如下:
from pymongo import MongoClient
import pandas as pd
import numpy as np
import math
import threading
def get_coll(database, collection, host="127.0.0.1"):
"""目标数据库"""
mongo_conn = MongoClient(host=host, port=27017)
mongo_db = mongo_conn.get_database(database)
coll = mongo_db.get_collection(collection)
return coll
def save_dataframe_to_mongo(dataframe, step=20):
coll = get_coll("test_db", "test_collection2")
thread_list = []
for i in range(math.ceil(dataframe.shape[0] / step)):
dict_list = dataframe.iloc[step * i:step * (i + 1)].to_dict(orient="record") # 待保存数据
# 多线程
thread = threading.Thread(target=coll.insert_many, args=(dict_list,))
thread.start()
thread_list.append(thread)
# 等待全部线程任务执行完成
for _thr in thread_list:
_thr.join()
if __name__ == '__main__':
df = pd.DataFrame(np.random.randn(900, 4))
df.columns = ['a', 'b', 'c', 'd']
save_dataframe_to_mongo(df)
边栏推荐
- Leetcode - 677 key value mapping (Design)*
- Pytoch framework exercise (based on kaggle Titanic competition)
- LeetCode - 677 键值映射(设计)*
- Experimental reproduction of image classification (reasoning only) based on caffe resnet-50 network
- 通用测试用例写作规范
- IDEA—点击文件代码与目录自动同步对应
- SVD singular value decomposition derivation and application and signal recovery
- Basic usage of MFC thread afxbeginthread, passing multiple parameters
- Beyond compare 4 realizes class file comparison [latest]
- MySQL tutorial 65 data in MySQL operation table
猜你喜欢

Window system black window redis error 20creating server TCP listening socket *: 6379: listen: unknown error19-07-28

How matlab produces random complex sequences

「数字安全」警惕 NFT的七大骗局

报表工具的二次革命

Equivalent change of resistance circuit (Ⅱ)

Leetcode - 359 log rate limiter (Design)

Leetcode - 622 design cycle queue (Design)

Leetcode - 362 knock counter (Design)

如何构建面向海量数据、高实时要求的企业级OLAP数据引擎?

Beyond Compare 4 实现class文件对比【最新】
随机推荐
推荐收藏,这或许是最全的类别型特征的编码方法总结
Componentization and modularization
Leetcode - 707 design linked list (Design)
mysql 隔离级别事务
Wechat applet
记得那两句话
ML - Speech - traditional speech model
Leetcode - 380 o (1) time to insert, delete and get random elements (design hash table + array)
Leetcode - 232 realize queue with stack (design double stack to realize queue)
面试8家公司,1周拿了5个offer,分享一下自己的心得
ML - Speech - Introduction to speech processing
Matlab simulation of BPSK modulation system (1)
Leetcode - 303 area and retrieval - array immutable (design prefix and array)
MySQL - Summary of common SQL statements
没错,请求DNS服务器还可以使用UDP协议
R语言使用gt包和gtExtras包漂亮地显示表格数据:gt_bar_plot函数和gt_plt_bar_pct函数可视化百分比条形图、原始数据的百分比条形、缩放后的数据的百分比条形、指定数据对齐宽度
【服务器数据恢复】HP EVA服务器存储意外断电导致RAID信息丢失的数据恢复案例
Leetcode - 641 design cycle double ended queue (Design)*
LeetCode - 359 日志速率限制器 (设计)
MySQL tutorial 68-as setting alias