Overview

dbt-osmosis

First and foremost, we want dbt documentation to retain a DRY principle. Every time we repeat ourselves, we waste our time. Second, we want to understand column level lineage and automate impact analysis.

Primary Objectives

Standardize organization of schema files (and provide ability to define and conform with code)

  • Config can be set on a per-directory basis if desired, via dbt_project.yml. Every model that is processed requires a direct or inherited +dbt-osmosis: config; if even one directory is missing the config, we exit gracefully and ask the user to update dbt_project.yml. There are no assumed defaults. Placing the config under your dbt project name in models: is enough to set a default for the whole project, since the config applies to all subdirectories.

    Note: You can change these configs as often as you like, or try them all; dbt-osmosis will take care of restructuring your project's schema files, with no human effort required.

    A directory can be configured to conform to any one of the following standards:

    • Can be one schema file to one model file, sharing the same name and directory, e.g.

        staging/
            stg_order.sql
            stg_order.yml
            stg_customer.sql
            stg_customer.yml
      
      • +dbt-osmosis: "model.yml"
    • Can be one schema file per directory, named schema.yml, wherever model files reside, e.g.

        staging/
            stg_order.sql
            stg_customer.sql
            schema.yml
      
      • +dbt-osmosis: "schema.yml"
    • Can be one schema file per directory, named after its containing folder, wherever model files reside, e.g.

        staging/
            stg_order.sql
            stg_customer.sql
            staging.yml
      
      • +dbt-osmosis: "folder.yml"
    • Can be one schema file to one model file, sharing the same name, nested in a schema subdirectory wherever model files reside, e.g.

        staging/
            stg_order.sql
            stg_customer.sql
            schema/
                stg_order.yml
                stg_customer.yml
      
      • +dbt-osmosis: "schema/model.yml"
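Putting it together, a sketch of how these configs might look in dbt_project.yml (the project and folder names here are illustrative, not part of dbt-osmosis):

```yaml
# dbt_project.yml (illustrative project and folder names)
models:
  my_project:
    +dbt-osmosis: "schema/model.yml"   # project-wide default
    staging:
      +dbt-osmosis: "folder.yml"       # override: models/staging gets staging.yml
    marts:
      +dbt-osmosis: "model.yml"        # override: one yml per model in models/marts
```

Per dbt config resolution, the more deeply scoped setting wins for models inside that folder.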

Build and Inject Non-documented models

  • Injected models will automatically conform to the above config, per directory, based on the location of the model file.

  • This means you can focus fully on modelling; documentation, including YAML creation or updates depending on your config, will automatically follow at any time with a simple invocation of dbt-osmosis.

Propagate existing column level documentation downward to children

  • Build a column-level knowledge graph, accumulated and updated from the furthest identifiable origins (ancestors) down to immediate parents

  • Automatically populate undocumented columns of the same name with knowledge passed down and accumulated within the context of the model's upstream dependency tree

  • This means you can freely generate models, and any columns you pull in that have already been documented upstream will be documented automatically. Again, the focus is fully on modelling; any YAML work is an afterthought.
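The inheritance described above can be sketched roughly like this (the function and variable names are ours for illustration, not dbt-osmosis internals):

```python
# Sketch of column-description inheritance (illustrative, not dbt-osmosis code).
# Descriptions flow from documented ancestors to undocumented columns of the
# same name; the first (furthest-origin) documented description wins.

def inherit_column_docs(columns, upstream):
    """columns: {col: description or ""} for the target model;
    upstream: {parent_model: {col: description}} from the dependency tree."""
    knowledge = {}
    for parent_cols in upstream.values():
        for col, desc in parent_cols.items():
            if desc and col not in knowledge:
                knowledge[col] = desc  # first documented origin wins
    # Keep existing descriptions; fill blanks from inherited knowledge.
    return {col: desc or knowledge.get(col, "") for col, desc in columns.items()}

docs = inherit_column_docs(
    {"order_id": "", "amount": "Order total"},
    {"raw_orders": {"order_id": "Primary key of an order"}},
)
# order_id inherits the upstream description; amount keeps its own.
```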

Order Matters

In a full run we will:

  1. Conform dbt project
    • Configuration lives in dbt_project.yml. The config is required to run; it can be set at the root level of models: to apply a default convention to the whole project, or folder by folder. It follows dbt config resolution, where config is overridden by scope. The config is +dbt-osmosis: "folder.yml" | "schema.yml" | "model.yml" | "schema/model.yml"
  2. Bootstrap models to ensure all models exist
  3. Recompile Manifest
  4. Propagate definitions downstream to undocumented models, solely within the context of each model's dependency tree

New workflows enabled!

  1. Build one dbt model or a bunch of them without documenting anything (gasp)

    Run dbt-osmosis

    Automatically construct or update your schema YAML files, with as many definitions pre-populated as possible from upstream dependencies

    Schema YAML is automatically built in exactly the right directories and style, conforming to the standard configured and enforced across your dbt project on a directory-by-directory basis

    Configured using just the dbt_project.yml and +dbt-osmosis: configs

    boom, mic drop

  2. Problem reported by stakeholder with data (WIP)

    Identify column

    Run dbt-osmosis impact --model orders --column price

    Find the originating model and action

  3. Need to score our documentation (WIP)

    Run dbt-osmosis coverage --docs --min-cov 80

    Get a curated list of all the documentation to update in your pre-bootstrapped dbt project

    Sip coffee and engage in documentation

Comments
  • 'DbtYamlManager' object has no attribute 'yaml', No module named 'dbt_osmosis'

    Hi there--thanks for the great tool!

    I'm getting an AttributeError when running dbt-osmosis compose and dbt-osmosis run. See the stacktrace below. My dbt_project.yml is attached, as well.

    > dbt-osmosis run -f acdw.dimensions.acdw__dimensions__item
    INFO     🌊 Executing dbt-osmosis                                                     main.py:78
    INFO     📈 Searching project stucture for required updates and building action plan  osmosis.py:907
    INFO     ...building project structure mapping in memory                              osmosis.py:885
    INFO     [('CREATE', '->', WindowsPath('C:/my_project_path/models/acdw/dimensions/schema/acdw__dimensions__item.yml'))]  osmosis.py:1027
    INFO     👷 Executing action plan and conforming projecting schemas to defined structure  osmosis.py:977
    INFO     🚧 Building schema file acdw__dimensions__item.yml                           osmosis.py:983
    Traceback (most recent call last):
      File "C:\Users\me\Anaconda3\envs\dbt\lib\runpy.py", line 196, in _run_module_as_main
        return _run_code(code, main_globals, None,
      File "C:\Users\me\Anaconda3\envs\dbt\lib\runpy.py", line 86, in _run_code
        exec(code, run_globals)
      File "C:\Users\me\Anaconda3\envs\dbt\Scripts\dbt-osmosis.exe\__main__.py", line 7, in <module>
      File "C:\Users\me\Anaconda3\envs\dbt\lib\site-packages\click\core.py", line 1130, in __call__
        return self.main(*args, **kwargs)
      File "C:\Users\me\Anaconda3\envs\dbt\lib\site-packages\click\core.py", line 1055, in main
        rv = self.invoke(ctx)
      File "C:\Users\me\Anaconda3\envs\dbt\lib\site-packages\click\core.py", line 1657, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
      File "C:\Users\me\Anaconda3\envs\dbt\lib\site-packages\click\core.py", line 1404, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "C:\Users\me\Anaconda3\envs\dbt\lib\site-packages\click\core.py", line 760, in invoke
        return __callback(*args, **kwargs)
      File "C:\Users\me\Anaconda3\envs\dbt\lib\site-packages\dbt_osmosis\main.py", line 89, in run
        if runner.commit_project_restructure_to_disk():
      File "C:\Users\me\Anaconda3\envs\dbt\lib\site-packages\dbt_osmosis\core\osmosis.py", line 987, in commit_project_restructure_to_disk
        self.yaml.dump(structure.output, target)
    AttributeError: 'DbtYamlManager' object has no attribute 'yaml'
    

    Possibly relatedly, I'm not able to run dbt-osmosis workbench, either. Apparently, dbt_osmosis can't find itself.

    > dbt-osmosis workbench
    INFO     🌊 Executing dbt-osmosis                                                     main.py:406
    [...]
    2022-10-06 17:41:59.975 Pandas backend loaded 1.4.2
    2022-10-06 17:41:59.981 Numpy backend loaded 1.21.5
    2022-10-06 17:41:59.982 Pyspark backend NOT loaded
    2022-10-06 17:41:59.983 Python backend loaded
    2022-10-06 17:42:01.020 Uncaught app exception
    Traceback (most recent call last):
      File "C:\Users\me\Anaconda3\lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 562, in _run_script
        exec(code, module.__dict__)
      File "C:\Users\me\Anaconda3\envs\dbt\Lib\site-packages\dbt_osmosis\app.py", line 17, in <module>
        from dbt_osmosis.core.osmosis import DEFAULT_PROFILES_DIR, DbtProject
    ModuleNotFoundError: No module named 'dbt_osmosis'
    

    dbt_project.zip

    opened by coryandrewtaylor 5
  • AttributeError: 'CompiledSqlNode' object has no attribute 'compiled_sql'

    Hello, I'm getting the following error:

    AttributeError: 'CompiledSqlNode' object has no attribute 'compiled_sql'
    Traceback:
      File "/home/dbt/venv_dbt/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 562, in _run_script
        exec(code, module.__dict__)
      File "/home/dbt/venv_dbt/lib/python3.10/site-packages/dbt_osmosis/app.py", line 327, in <module>
        if compile_sql(state[RAW_SQL]) != state[COMPILED_SQL]:
      File "/home/dbt/venv_dbt/lib/python3.10/site-packages/dbt_osmosis/app.py", line 197, in compile_sql
        return ctx.compile_sql(sql)
      File "/home/dbt/venv_dbt/lib/python3.10/site-packages/dbt_osmosis/core/osmosis.py", line 367, in compile_sql
        return self.get_compiled_node(sql).compiled_sql

    Any idea how to fix that?
    
    opened by mencwelp 4
  • IPv6 issue

    Good day,

    When trying to run the server within docker, I'm getting the error

    ERROR:    [Errno 99] error while attempting to bind on address ('::1', 8581, 0, 0): cannot assign requested address
    

    I'm guessing this is because the program binds to IPv6 instead of IPv4. I was wondering if it would be possible to support IPv4 as well.
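For context, the symptom is reproducible with a plain socket, independent of dbt-osmosis: '::1' is the IPv6 loopback, which cannot be bound in a container whose network stack is IPv4-only, while the IPv4 wildcard binds anywhere IPv4 is available (illustrative):

```python
import socket

# Binding the IPv4 wildcard succeeds in IPv4-only containers, where an
# attempt to bind the IPv6 loopback ('::1') raises "cannot assign
# requested address". Port 0 lets the OS pick a free port.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind(("0.0.0.0", 0))
host, port = s.getsockname()
s.close()
```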

    Thanks!

    opened by SBurwash 3
  • Can't run `workbench` - No such file or directory: `streamlit`

    Hi, I installed dbt-osmosis and tried to execute the dbt-osmosis workbench command, but I get the error No such file or directory: 'streamlit'

    Here is the full error message:

    INFO     🌊 Executing dbt-osmosis                                                     main.py:390
                                                                                                            
    Traceback (most recent call last):
      File "/Users/galpolak/Library/Python/3.9/bin/dbt-osmosis", line 8, in <module>
        sys.exit(cli())
      File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1128, in __call__
        return self.main(*args, **kwargs)
      File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1053, in main
        rv = self.invoke(ctx)
      File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1659, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
      File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1395, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "/usr/local/lib/python3.9/site-packages/click/core.py", line 754, in invoke
        return __callback(*args, **kwargs)
      File "/usr/local/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func
        return f(get_current_context(), *args, **kwargs)
      File "/Users/galpolak/Library/Python/3.9/lib/python/site-packages/dbt_osmosis/main.py", line 414, in workbench
        subprocess.run(
      File "/usr/local/Cellar/[email protected]/3.9.13_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/subprocess.py", line 505, in run
        with Popen(*popenargs, **kwargs) as process:
      File "/usr/local/Cellar/[email protected]/3.9.13_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/subprocess.py", line 951, in __init__
        self._execute_child(args, executable, preexec_fn, close_fds,
      File "/usr/local/Cellar/[email protected]/3.9.13_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/subprocess.py", line 1821, in _execute_child
        raise child_exception_type(errno_num, err_msg, err_filename)
    FileNotFoundError: [Errno 2] No such file or directory: 'streamlit'
    
    opened by Nic3Guy 2
  • VS Code - Osmosis Execute DBT SQL feature failing

    Hello, and thanks for looking at my issue. When I click the button after opening a model SQL file in VS Code (see attached screenshot), I get the error shown in the second screenshot.

    The actual path is c:\\Temp\git\au-nvg-business-intelligence\\dbt\navigate_bi\\dbt_project.yml. Because there are single backslashes in two spots (\au-nvg-xxx and \navigate_bi\xxxx), both are being translated to \a and \n. I don't know why a mixture of single and double backslashes is coming in the file name and path.

    I've put a temporary fix in osmosis.py at line 181 to get this working; see project_dir being patched with replace calls:

    args = PseudoArgs(
        threads=threads,
        target=target,
        profiles_dir=profiles_dir,
        # project_dir=project_dir,
        project_dir=project_dir.replace('\t', '\\t').replace('\n', '\\n').replace('\r', '\\r').replace('\a', '\\a'),
    )
    
    opened by MurugR 2
  • Error starting workbench

    Encountering the below page output (and on the CLI as well) when executing dbt-osmosis workbench -m unique_model_name. It seems it's getting a None value for a file path, but I'm not sure where or how exactly. This is my first run of dbt-osmosis workbench.

    AttributeError: 'NoneType' object has no attribute 'patch_path'
    Traceback:
    File "C:\Users\myusername\AppData\Local\Programs\Python\Python310\lib\site-packages\streamlit\scriptrunner\script_runner.py", line 554, in _run_script
        exec(code, module.__dict__)
    File "C:\Users\myusername\AppData\Local\Programs\Python\Python310\lib\site-packages\dbt_osmosis\app.py", line 345, in <module>
        Path(ctx.project_root) / ctx.get_patch_path(st.session_state[BASE_NODE]),
    File "C:\Users\myusername\AppData\Local\Programs\Python\Python310\lib\site-packages\dbt_osmosis\core\osmosis.py", line 179, in get_patch_path
        return Path(node.patch_path.split(FILE_ADAPTER_POSTFIX)[-1])
    
    opened by ghost 2
  • [feat] Optimize Workbench for 2 Usage Patterns

    Let's reduce scope in our workbench to a hyper-tailored experience for A) creating a new model and B) editing an existing model.

    There should (probably) be two separate but similar layouts for this.

    enhancement 
    opened by z3z1ma 2
  • AttributeError 'NoneType' object has no attribute 'current'

    I'm having this error :

        if SCHEMA_FILE.current is None:
    AttributeError: 'NoneType' object has no attribute 'current'
    

    Looking at the code in main.py and app.py, I've noticed several things.

    Here is an extract of the content of the « project » variable (dict): {'name': 'MYPROJ', 'version': '1.0.1', 'project-root': 'my/path/MYPROJ/dbt', ...}

    But in the script, the name and root path of the project are accessed as project.project_name instead of project.name, and project.project_root instead of project.project-root.

    And SCHEMA_FILE is None because the proj variable, which should contain project.project_name => MYPROJ, instead holds my/path/MYPROJ/dbt, while node.get("package_name") = MYPROJ. So the test node.get("package_name") == proj is always False, and SCHEMA_FILE is logically None.

    bug 
    opened by hcylia 2
  • [feat] case insensitivity

    In our environment people write everything in lower case including both the SQL and the YML. It looks like the SNOWFLAKE adapter consistently returns column names as upper case. Can you provide an option to ignore case in comparisons of column names between the YML version and the DB?

    I "fixed" this locally by changing this line to read c.name.lower() in the obvious place: https://github.com/z3z1ma/dbt-osmosis/blob/05ed698a059f855556a0ef59ab3c9cb578ce680e/src/dbt_osmosis/main.py#L438

    But you might instead want to keep that case and modify all the places where you compare columns to do a case insensitive comparison. Looks like you're using python sets for the different column lists. I don't know how you'd make them case insensitive, so just adding the lower() maybe as a configuration would be great.
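One way to sketch a case-insensitive comparison that still preserves the original casing for output (illustrative, not the dbt-osmosis implementation):

```python
# Compare YAML-declared columns against database columns case-insensitively,
# keeping the original spellings for reporting (illustrative sketch).
yaml_cols = {"order_id", "Amount"}
db_cols = {"ORDER_ID", "AMOUNT", "CREATED_AT"}   # Snowflake upper-cases names

def casefold_map(cols):
    return {c.casefold(): c for c in cols}        # folded key -> original name

yaml_map, db_map = casefold_map(yaml_cols), casefold_map(db_cols)
missing_in_yaml = [db_map[c] for c in db_map.keys() - yaml_map.keys()]
extra_in_yaml = [yaml_map[c] for c in yaml_map.keys() - db_map.keys()]
# Only CREATED_AT is undocumented; nothing in the YAML is stale.
```

Normalizing only at comparison time avoids rewriting anyone's YAML casing, which is closer to the "keep that case" option suggested above.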

    good first issue 
    opened by hbwhbw 2
  • [feat] SQLFluff Interface Modification

    TODO: Extend description

    I think our scope does not require us to limit ourselves to "models" and we should attach ourselves to the SQL itself. This simplifies the whole process as can be seen in this PR.

    opened by z3z1ma 1
  • Handle updating empty schema.yml files

    DbtYamlManager.yaml_handler assumed that if a schema.yml file existed, it contained data. This caused a NoneType is not iterable exception when 1) reading an empty file, then 2) checking it for a key.
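The root cause generalizes: parsing an empty YAML document yields None rather than an empty mapping, so later key lookups fail. A minimal sketch of the guard, using a stub parser so it runs standalone (yaml.safe_load behaves the same way for empty input):

```python
# Parsing an empty YAML file yields None, not {}. Guard before key access.
# (Illustrative; a stub stands in for a YAML library's safe_load.)

def safe_load_stub(text):
    """Stands in for yaml.safe_load: empty input parses to None."""
    return None if not text.strip() else {"models": []}

def read_schema_file(text):
    data = safe_load_stub(text) or {}   # the fix: coalesce None to {}
    return data.get("models", [])

assert read_schema_file("") == []       # an empty file no longer raises
```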

    opened by coryandrewtaylor 1
  • AttributeError: 'DbtOsmosis'

    When a user runs dbt-osmosis diff -m some_model ..., they get AttributeError: 'DbtOsmosis':

    dbt-osmosis diff -m some_model --project-dir /path/to/dbt/transformations --profiles-dir /path/to/dbt/transformations/profiles --target target0

    INFO     🌊 Executing dbt-osmosis
    INFO     Injecting macros, please wait...
    Traceback (most recent call last):
      File "/Users/myuser/.pyenv/versions/3.10.7/bin/dbt-osmosis", line 8, in <module>
        sys.exit(cli())
    .....
    AttributeError: 'DbtOsmosis' object has no attribute '_get_exec_node'. Did you mean: 'get_ref_node'?
    

    Note

    Please note that there is no error when running the dbt-osmosis workbench - that one runs nicely.

    Configuration:

    • dbt version: 1.2.0
    • python version: Python 3.10.7
    opened by dimitrios-ev 0
  • dbt-osmosis Public streamlit app is broken

    The sample streamlit app linked in the README returns an error.

    Broken link: https://z3z1ma-dbt-osmosis-srcdbt-osmosisapp-4y67qs.streamlit.app/

    Screenshot: (attached)

    opened by sambradbury 0
  • Support Documentation of Struct Fields in BigQuery

    Proposed Behavior

    When a user runs dbt-osmosis yaml document, it will also pull in and preserve struct-level documentation of fields in BigQuery projects.

    Current behavior

    When there are struct fields in schema.yml or source.yml, the process removes any struct documentation that is specified as a field like name: struct.struct_field
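For reference, struct fields of this kind are typically documented with dotted column names, roughly like this (illustrative schema.yml excerpt; model and field names are made up):

```yaml
models:
  - name: customers
    columns:
      - name: address
        description: "Customer address struct"
      - name: address.city
        description: "City component of the address struct"
```

It is the dotted entries like address.city that the current behaviour drops.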

    Specifications

    • dbt version: 1.2.0
    • python version: 3.8.10
    opened by brandon-segal 0
  • Using dbt query_comment breaks project registration

    Symptoms: My dbt installation uses the query-comment config to add labels to BigQuery queries. In our dbt_project.yml we have

    query-comment:
      comment: "{{ query_comment(node) }}"
      job-label: true
    

    When trying to run a dbt-osmosis server serve ..., the server throws a 500 Could not connect to Database error.

    Diagnosis: Tracing through the code, I reached parse_project in osmosis.py. This inits the DB adapter, loads the manifest, and importantly calls save_macros_to_adapter. BUT... before we get to saving the macros, the adapter setter calls _verify_connection, which tries to query the DB before the adapter is fully initialised...

    Now the adapter errors in this chunk of the dbt-bigquery connection code:

    if (
        hasattr(self.profile, "query_comment")
        and self.profile.query_comment
        and self.profile.query_comment.job_label
    ):
        query_comment = self.query_header.comment.query_comment
        labels = self._labels_from_query_comment(query_comment)

    failing with 'NoneType' object has no attribute 'comment' on line 6 of the snippet. Now this code looks ugly - it checks the contents of self.profile, then tries to access self.query_header - clearly a massive assumption is made here...

    So - we have a conflict. dbt-osmosis is trying to run a query without ensuring the adapter is completely configured - and dbt-bigquery is doing a really wonky test. I'm happy to PR a deferred verify process in dbt-osmosis, or if you consider this to be more of a dbt-bigquery bug, I'll raise it with them.

    opened by dh-richarddean 0
  • How do you run dbt-power-user with dbt-osmosis?

    In this issue, @z3z1ma, you say that the preferred method of development is with the dbt-server, and in the server docs you imply that you can use the server with dbt-power-user. I've tried to stitch together the solution but am coming up short.

    Maybe you could provide an explanation for how to make the integration work and update the docs? Right now dbt-power-user points to this repo for how to set up a dbt REPL, so if you could make that part of the docs more clear I think it could help other people as well.

    opened by evanaze 1
  • Feature request: `dbt-osmosis yaml audit`

    I think it would be pretty useful to have a command to execute the audit standalone. For example, dbt-osmosis yaml audit -f development.test would have the following output:

        ✅ Audit Report
        -------------------------------

        Database: awsdatacatalog
        Schema: development
        Table: test

        Total Columns in Database: 3.0
        Total Documentation Coverage: 100.0%

        Action Log:
        Columns Added to dbt: 2
        Column Knowledge Inherited: 1
        Extra Columns Removed: 0

    Then, one would be able to do whatever they want with it: post it as a comment in a PR or send it as a Slack message, etc

    opened by ireyna-modo 0
Releases (v0.7.6)
  • v0.7.6(Sep 8, 2022)

    This release includes the addition of a server which allows developers to interact with their dbt project independent of the RPC server and in a much more performant way. The interface is also greatly improved. This is intended to be used as a development server -- though it could scale horizontally in theory.

    The interface simply involves POST requests to either /run or /compile with a content type of text/plain, containing your SQL with as much or as little dbt jinja as you would like. The response is a dict: on failure it contains a top-level errors key; otherwise, the compile endpoint returns a top-level result key, and the run endpoint returns top-level column_names, rows, compiled_sql, and raw_sql keys.
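A client interaction might look like the sketch below; the host, port, and error handling are assumptions for illustration, not documented guarantees:

```python
import json
from urllib import request

# Sketch of a client for the /compile endpoint described above.
# Host/port are assumptions; adjust to wherever the server is bound.

def build_compile_request(sql, base="http://localhost:8581"):
    """POST plain-text SQL (with any amount of dbt jinja) to /compile."""
    return request.Request(
        f"{base}/compile",
        data=sql.encode(),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )

def compile_sql(sql, base="http://localhost:8581"):
    with request.urlopen(build_compile_request(sql, base)) as resp:
        payload = json.loads(resp.read())
    if "errors" in payload:          # failure: top-level "errors" key
        raise RuntimeError(payload["errors"])
    return payload["result"]         # success: top-level "result" key
```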

    This integration is fully compatible with the dbt-power-user Datacoves fork here, which adds extended functionality, including a model preview panel that can execute a dbt model and show the results in a web view with a button, or even more conveniently, with Ctrl+Enter as in many SQL IDEs.

    Source code(tar.gz)
    Source code(zip)
    dbt-osmosis-0.7.6.tar.gz(40.36 KB)
    dbt_osmosis-0.7.6-py3-none-any.whl(38.73 KB)
  • v0.6.2(Aug 7, 2022)

  • v0.5.8(Jun 24, 2022)

Owner
Alexander Butler
I wear many hats