PyMongo: saving DataFrame data (insert_one, insert_many, multi-threaded saving)
2022-07-25 17:26:00 【Cute Dai Ma】
For the basic PyMongo operations (insert, delete, update, query), see the article "Connecting Python to MongoDB: adding, deleting, modifying and querying with pymongo".
1. Basic method: saving row by row
This is the most basic way to save data. Because each row is handled individually, you can adjust its values before writing it.
from pymongo import MongoClient
import pandas as pd
import numpy as np

def get_coll(database, collection, host="127.0.0.1"):
    """Return the target collection."""
    mongo_conn = MongoClient(host=host, port=27017)
    mongo_db = mongo_conn.get_database(database)
    coll = mongo_db.get_collection(collection)
    return coll

def _save_or_update_mongodb(coll, dict_value):
    """Look up the document by _id: overwrite its fields if it exists, insert it otherwise."""
    record = coll.find_one({"_id": dict_value["_id"]})
    if record is None:
        coll.insert_one(dict_value)
    else:
        # Filter on _id only; $set overwrites the listed fields (_id itself is immutable)
        fields = {k: v for k, v in dict_value.items() if k != "_id"}
        coll.update_one({"_id": dict_value["_id"]}, {"$set": fields})

def save_dataframe_to_mongo(dataframe):
    coll = get_coll("test_db", "test_collection")
    for index, series in dataframe.iterrows():
        dict_value = series.to_dict()
        # Use the DataFrame index as the document _id
        dict_value.update({"_id": index})
        _save_or_update_mongodb(coll, dict_value)

if __name__ == '__main__':
    df = pd.DataFrame(np.random.randn(10, 4))
    df.columns = ['a', 'b', 'c', 'd']
    save_dataframe_to_mongo(df)
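The lookup-then-write helper above costs two round trips per row, and another writer could insert the same _id between the find_one and the insert_one. As an alternative (not the original author's code), here is a minimal sketch that lets the server do the existence check in a single call via Collection.replace_one(..., upsert=True). It assumes coll is a pymongo Collection, for example the one returned by get_coll; the function name save_dataframe_upsert is hypothetical.

```python
import pandas as pd


def save_dataframe_upsert(coll, dataframe: pd.DataFrame) -> int:
    """Upsert every row of `dataframe` into `coll`, keyed by the row index.

    `coll` is assumed to be a pymongo Collection (any object with a
    compatible replace_one method works). Returns the number of rows written.
    """
    written = 0
    for index, row in dataframe.iterrows():
        doc = row.to_dict()  # column name -> value for this row
        # Equality filter on _id: with upsert=True the server inserts a new
        # document with this _id if none matches, otherwise replaces it.
        coll.replace_one({"_id": index}, doc, upsert=True)
        written += 1
    return written
```

Note the difference in semantics: replace_one swaps out the whole stored document, whereas $set only overwrites the listed fields. If fields not present in the DataFrame must survive, update_one({"_id": index}, {"$set": doc}, upsert=True) is the merge-style equivalent.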
2. Saving in batches with insert_many
insert_many writes a whole list of documents in one call, so the DataFrame can be saved one slice at a time (step rows per call, 20 by default):
from pymongo import MongoClient
import pandas as pd
import numpy as np
import math

def get_coll(database, collection, host="127.0.0.1"):
    """Return the target collection."""
    mongo_conn = MongoClient(host=host, port=27017)
    mongo_db = mongo_conn.get_database(database)
    coll = mongo_db.get_collection(collection)
    return coll

def save_dataframe_to_mongo(dataframe, step=20):
    coll = get_coll("test_db", "test_collection2")
    # One insert_many call per slice of `step` rows
    for i in range(math.ceil(dataframe.shape[0] / step)):
        # orient="records" gives one dict per row (pandas 2.0+ no longer accepts "record")
        dict_list = dataframe.iloc[step * i:step * (i + 1)].to_dict(orient="records")
        coll.insert_many(dict_list)

if __name__ == '__main__':
    df = pd.DataFrame(np.random.randn(900, 4))
    df.columns = ['a', 'b', 'c', 'd']
    save_dataframe_to_mongo(df)
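The loop above computes the slice bounds by hand: with 900 rows and step=20 it issues math.ceil(900 / 20) = 45 insert_many calls. As a sketch, the same chunking can be written as a generic generator over any list of documents, for example the output of dataframe.to_dict(orient="records"). The helper names iter_batches and save_in_batches are hypothetical, and coll is assumed to be a pymongo Collection.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def iter_batches(docs: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Yield consecutive lists of at most `size` documents."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(docs)
    while True:
        batch = list(islice(it, size))  # take up to `size` items
        if not batch:
            return  # input exhausted
        yield batch


def save_in_batches(coll, docs: Iterable[dict], size: int = 20) -> int:
    """Call coll.insert_many once per batch; return the number of calls made.

    `coll` is assumed to be a pymongo Collection.
    """
    calls = 0
    for batch in iter_batches(docs, size):
        coll.insert_many(batch)
        calls += 1
    return calls
```

PyMongo already splits a single oversized insert_many into server-sized batches on its own, so manual chunking like this mainly helps to bound memory use or report progress between batches.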
3. Saving data with multiple threads (threading)
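PyMongo's MongoClient is thread-safe but not fork-safe: a client must not be shared across processes, so each child process has to create its own client after forking. Within one process, however, several threads can safely share a single client and write in parallel. Sample code: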
from pymongo import MongoClient
import pandas as pd
import numpy as np
import math
import threading

def get_coll(database, collection, host="127.0.0.1"):
    """Return the target collection."""
    mongo_conn = MongoClient(host=host, port=27017)
    mongo_db = mongo_conn.get_database(database)
    coll = mongo_db.get_collection(collection)
    return coll

def save_dataframe_to_mongo(dataframe, step=20):
    # One client/collection shared by all threads (MongoClient is thread-safe)
    coll = get_coll("test_db", "test_collection2")
    thread_list = []
    for i in range(math.ceil(dataframe.shape[0] / step)):
        # Data to be saved: one dict per row of this slice
        dict_list = dataframe.iloc[step * i:step * (i + 1)].to_dict(orient="records")
        # Start one thread per batch
        thread = threading.Thread(target=coll.insert_many, args=(dict_list,))
        thread.start()
        thread_list.append(thread)
    # Wait for all thread tasks to complete
    for _thr in thread_list:
        _thr.join()

if __name__ == '__main__':
    df = pd.DataFrame(np.random.randn(900, 4))
    df.columns = ['a', 'b', 'c', 'd']
    save_dataframe_to_mongo(df)
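Two details of the threaded version are worth knowing: it starts one thread per batch (45 threads for 900 rows), and an exception raised inside a threading.Thread target is only printed to stderr, so the main thread cannot tell that a batch failed. The sketch below uses the standard library's concurrent.futures.ThreadPoolExecutor to cap the number of concurrent writers and to re-raise insert errors in the caller. It assumes all workers share one pymongo Collection (and therefore one thread-safe MongoClient); the function name save_batches_threaded and the default of 4 workers are illustrative choices, not values from the original.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def save_batches_threaded(coll, batches: List[List[dict]], max_workers: int = 4) -> int:
    """Insert each batch with coll.insert_many on a bounded thread pool.

    `coll` is assumed to be a pymongo Collection shared by all threads.
    Returns the total number of documents submitted. If any insert_many
    call fails, the exception of the first failed batch (in submission
    order) is re-raised here.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(coll.insert_many, batch) for batch in batches]
        # result() blocks until the batch is done and re-raises its exception
        for future in futures:
            future.result()
    return sum(len(batch) for batch in batches)
```

The batches can come from the same slicing as above, e.g. a list of dataframe.iloc[...].to_dict(orient="records") results.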