MongoDB utility to inflate the contents of small collection to a new larger collection

Overview

MongoDB Data Inflater ("data-inflater")

The data-inflater tool is a MongoDB utility to automate the creation of a new large database collection using data sourced from an existing smaller database collection.

By default, the utility will use the Atlas 'sample data set' database collection sample_mflix.movies as the source collection. However, most users will provide parameters to the utility to specify the use of their own database and source collection. If you do want to use the Atlas sample data set, see the sample data manual page for more information.

The data-inflater utility issues multiple concurrent aggregation processes, each copying batches of records in parallel for increased performance. The resulting collection will contain documents with duplicated data but with new unique _id field values. The variance ratio of data in the new collection will approximately reflect the variance ratio of the source collection. Therefore, you should ensure you have supplied at least a few different documents (if not a few hundred or thousand) in the source collection.

If you are running a sharded cluster, the utility will ensure the target collection is sharded with a shard key, and where it can, it will pre-split the chunks to avoid subsequent needless balancer overhead. For example, if you specify the --shardkey parameter for this utility to reference a field (e.g. product_name) as the range based shard key, before creating the target collection, the utility will introspect the spread of values for the shard key field (e.g. product_name). The utility will then create pre-split chunks in the new empty target collection before any data is copied to it, to maximise performance.

How To Run

In a running MongoDB cluster (self-managed or running in Atlas), ensure you have created and populated a source collection with at least one sample record in it (ideally more with varying values for the fields across the different documents to reflect the shape and variance you desire).

Ensure Python3 (version 3.8 or greater) and the MongoDB Python Driver (PyMongo) are already installed on your workstation. Example to install PyMongo:

pip3 install --user pymongo

Ensure the .py script is executable and then execute the following to view the utility's help instructions and the full list of parameters that you can provide:

./data-inflater.py -h

Execute the following to connect to a locally running single server database (default port) to copy and expand the data from an existing source collection, mydb.mySrcColl, to an a new collection, mydb.myDestColl, which will contain 1 million records:

./data-inflater.py --url 'mongodb://localhost:27017' -d 'mydb' -c 'mySrcColl' -t 'myDestColl' -s 1000000

Execute the following to connect to an Atlas cluster (ensure you've already loaded the Atlas sample data set), to inflate the data from the source movies collection to the new movies_big collection, which will contain 100 million records (note, first change the URL username, password and hostname shown, to match the URL of your Atlas cluster):

./data-inflater.py --url 'mongodb+srv://usr:[email protected]/'
Owner
Paul Done
Paul Done
Application for easy configuration of swap file and swappiness priority in slackware and others linux distributions.

Swap File Program created with the objective of assisting in the configuration of swap file in Distributions such as Slackware. Required packages: pyt

Mauricio Ferrari 3 Aug 06, 2022
A dictionary that can be flattened and re-inflated

deflatable-dict A dictionary that can be flattened and re-inflated. Particularly useful if you're interacting with yaml, for example. Installation wit

Lucas Sargent 2 Oct 18, 2021
Backup a folder to an another folder by using mirror update method.

Mirror Update Backup Backup a folder to an another folder by using mirror update method. How to use Install requirement pip install -r requirements.tx

1 Nov 21, 2022
A python module to manipulate XCode projects

This module can read, modify, and write a .pbxproj file from an Xcode 4+ projects. The file is usually called project.pbxproj and can be found inside the .xcodeproj bundle. Because some task cannot b

Ignacio Calderon 1.1k Jan 02, 2023
Brainfuck rollup scaling experiment for fun

Optimistic Brainfuck Ever wanted to run Brainfuck on ethereum? Don't ask, now you can! And at a fraction of the cost, thanks to optimistic rollup tech

Diederik Loerakker 48 Dec 28, 2022
Color box that provides various colors‘ rgb decimal code.

colorbox Color box that provides various colors‘ rgb decimal code

1 Dec 07, 2021
Check username

Checker-Oukee Check username It checks the available usernames and creates a new account for them Doesn't need proxies Create a file with usernames an

4 Jun 05, 2022
Dependency injection lib for Python 3.8+

PyDI Dependency injection lib for python How to use To define the classes that should be injected and stored as bean use decorator @component @compone

Nikita Antropov 2 Nov 09, 2021
Two fast AUC calculation implementations for python

fastauc Two fast AUC calculation implementations for python: python-based is approximately 5X faster than the default sklearn.metrics.roc_auc_score()

Vsevolod Kompantsev 26 Dec 11, 2022
Finds price floor for every single attribute in a given collection

Solana Solanart Scanner Enjoy the Free Code Steps to run Download VS Code

Dalton Nisbett 19 Oct 20, 2022
Simple python module to get the information regarding battery in python.

Battery Stats A python3 module created for easily reading the current parameters of Battery in realtime. It reads battery stats from /sys/class/power_

Shreyas Ashtamkar 5 Oct 21, 2022
extract gene TSS/TES site form gencode/ensembl/gencode database GTF file and export bed format file.

GetTsite python Package extract gene TSS/TES site form gencode/ensembl/gencode database GTF file and export bed format file. Install $ pip install Get

laojunjun 7 Nov 21, 2022
Lock files using python and cmd

Python_Lock_Files Lock files using python and cmd license feel free to do whatever you want to with these files, i dont take any responsibility tho, u

1 Nov 01, 2021
Produce a simulate-able SDF of an arbitrary mesh with convex decomposition.

Mesh-to-SDF converter Given a (potentially nasty, nonconvex) mesh, automatically creates an SDF file that describes that object. The visual geometry i

Greg Izatt 22 Nov 23, 2022
Script to rename and resize folders of images

script to rename and resize folders of images

Tega Brain 2 Oct 29, 2021
Convert any-bit number to decimal number and vise versa.

2deci Convert any-bit number to decimal number and vise versa. --bit n to set bit to n --exp xxx to set expression to xxx --r to run reversely (from d

3 Sep 15, 2021
[P]ython [w]rited [B]inary [C]onverter

pwbinaryc [P]ython [w]rited [Binary] [C]onverter You have rights to: Modify the code and use it private (friends are allowed too) Make a page and redi

0 Jun 21, 2022
A clock app, which helps you with routine tasks.

Clock This app helps you with routine tasks. Alarm Clock Timer Stop Watch World Time (Which city you want) About me Full name: Matin Ardestani Age: 14

Matin Ardestani 13 Jul 30, 2022
VerSign: Easy Signature Verification in Python

VerSign: Easy Signature Verification in Python versign is a small Python package which can be used to perform verification of offline signatures. It a

Muhammad Saif Ullah Khan 3 Dec 01, 2022
EthTx - Ethereum transactions decoder

EthTx - Ethereum transactions decoder Installation pip install ethtx Requirements The package needs a few external resources, defined in EthTxConfig o

398 Dec 25, 2022