DataOps framework for Machine Learning projects.

Overview

Noronha DataOps


Noronha is a Python framework designed to help you orchestrate and manage the life cycle of Machine Learning projects.

It hosts Machine Learning models inside a portable, ready-to-use DataOps architecture, thus helping you benefit from DataOps and MLOps practices without having to change much of your usual workflow.

The architecture consists of three components:

  • File storage: Artifactory, Nexus, Apache Cassandra
    • A raw storage of your choice is used to version ML assets, such as notebooks, datasets and model binaries. Noronha currently supports Artifactory (default) and Nexus for this task, while Apache Cassandra can be used only as model binary storage.
  • Metadata storage: MongoDB
    • Noronha's metadata is modeled as Mongo documents that guide the framework while managing your ML project, so this component cannot be swapped for another technology.
  • Model router: NodeJS (optional)
    • A router can be set up to act as a single entrypoint for all your models. This is especially useful when deploying in Kubernetes, where service exposure comes into play.

These components are internally called isles. Each isle can run in native mode (managed by Noronha) or foreign mode (managed by the user).
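In native mode, isles can be set up individually through the CLI. A minimal sketch using commands that also appear later in this document (use --help on each command for configuration options):

```shell
# set up the metadata storage (MongoDB) isle in native mode
nha -d -p isle mongo setup

# set up the file storage (Artifactory) isle in native mode
nha -d -p isle artif setup
```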

Prerequisites

To use the framework in its most basic configuration, all you need is a working Docker installation, since Noronha runs its isles and project images as containers.

Getting started

pip install noronha-dataops
nha --debug --pretty get-me-started

After installing the framework, the command-line interface noronha (or its alias nha) becomes available. Every command has a --help option; use it whenever in doubt.

The get-me-started option will set up Artifactory and MongoDB instances in native mode.

The --debug and --pretty flags make error messages easier to read and debug. You can use their short forms instead: nha -d -p

Basic usage

Once you have successfully installed Noronha, start with the simplest project structure:

project_home:
+-- Dockerfile
+-- requirements.txt
+-- notebooks/
    +-- training.ipynb
    +-- predict.ipynb

This is what the Dockerfile may look like:

# default public base image for working inside Noronha
FROM noronhadataops/noronha:latest

# project dependencies installation
ADD requirements.txt .
RUN bash -c "source ${CONDA_HOME}/bin/activate ${CONDA_VENV}  && \
    conda install --file requirements.txt"

# deploying the project's code
ADD notebooks ./notebooks

From your project's home folder, record the project in Noronha and build a new image:

nha -d -p proj new --name my-first-proj --desc "Testing project" --home-dir .
nha -d -p proj build --tag develop

Then, launch the Jupyter Notebook interface for editing and testing your code:

nha -d -p note --edit --tag develop --port 9090

--edit mounts your current directory into the container. This is useful if you want to edit code, test it and save it on the local machine (remember to be in the right directory when using this option).

--port sets the host port that will be routed to the notebook's UI.

Go to your browser and enter: http://localhost:9090/
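Once your code works in the notebook, the training notebook can also be executed as a managed training run. A minimal sketch, based on the train command mentioned later in this document (run nha train new --help for the full set of options):

```shell
# launch a managed training execution based on your training notebook
nha -d -p train new
```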

Next steps

For a fully working project template and an end-to-end tutorial, see the iris example.

For more information about Noronha and advanced usage of the framework, check readthedocs.

If you want to know how to run Noronha in Kubernetes, check this guide.

Report issues and request features

If you run into any problem or feel like there is some functionality that should be added, please consider submitting an issue.

We also monitor Stack Overflow questions that use the tag: #noronha-dataops.

If you like mailing lists, here is our Google Group: [email protected].

Contributing

Please read our contributing guide.

Comments
  • Extend kubernetes service support, refactor resource profiles, compatibility when deploying with sidecar

    K8s services can be configured via resource profiles.

    Resource profiles can define any combination of request/limit.

    Noronha Pods are now compatible with the sidecar paradigm.

    opened by g-crocker 1
  • Unfreeze mongoengine version

    mongoengine was frozen a while ago because it had a blocking issue.

    That issue has since been resolved, while the Noronha install started failing due to a conflict between the frozen mongoengine version and setuptools.

    Tests with current mongoengine version run without problems.

    opened by g-crocker 0
  • Build is failing due to conflict between mongoengine and setuptools

    Describe the bug

    Noronha install and build fail on environments running setuptools >= 58.0.0 due to the old version of the mongoengine library being used (mongoengine==0.18.2)

    To Reproduce

    Create an environment with setuptools >= 58.0.0, then run: pip install noronha-dataops

    Log messages + Traceback

    Collecting mongoengine==0.18.2
      Downloading mongoengine-0.18.2.tar.gz (151 kB)
        ERROR: Command errored out with exit status 1:
         command: /etc/miniconda/envs/py3_default/bin/python -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-_i6czwns/mongoengine_59f2504400a24032844b35daec24a8bc/setup.py'"'"'; __file__='"'"'/tmp/pip-install-_i6czwns/mongoengine_59f2504400a24032844b35daec24a8bc/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-pip-egg-info-gahpmg7n
             cwd: /tmp/pip-install-_i6czwns/mongoengine_59f2504400a24032844b35daec24a8bc/
        Complete output (1 lines):
        error in mongoengine setup command: use_2to3 is invalid.
        ----------------------------------------
    WARNING: Discarding https://files.pythonhosted.org/packages/a7/1c/0749992c7a2b6a2f1879ad44ba5285f907d2024838459b4cd635c5e5effd/mongoengine-0.18.2.tar.gz#sha256=fa3e73c966fca2b814cc1103ac4f55bcca7aae05028b112ef0cc8b321ee4a2f7 (from https://pypi.org/simple/mongoengine/). Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
    ERROR: Could not find a version that satisfies the requirement mongoengine==0.18.2 (from versions: 0.4, 0.5, 0.5.1, 0.5.2, 0.6.1, 0.6.3, 0.6.4, 0.6.6, 0.6.7, 0.6.8, 0.6.10, 0.6.12, 0.6.13, 0.6.15, 0.6.16, 0.6.17, 0.6.19, 0.6.20, 0.7.0, 0.7.1, 0.7.2, 0.7.3, 0.7.4, 0.7.5, 0.7.8, 0.7.9, 0.7.10, 0.8.0, 0.8.1, 0.8.2, 0.8.3, 0.8.4, 0.8.5, 0.8.6, 0.8.7, 0.8.7.post2, 0.8.7.1, 0.8.8, 0.9.0, 0.10.0, 0.10.1, 0.10.4, 0.10.5, 0.10.6, 0.10.7, 0.10.9, 0.11.0, 0.12.0, 0.13.0, 0.14.3, 0.15.0, 0.15.3, 0.16.0, 0.16.1, 0.16.2, 0.16.3, 0.17.0, 0.18.0, 0.18.2, 0.19.0, 0.19.1, 0.20.0, 0.21.0, 0.22.0, 0.22.1, 0.23.0, 0.23.1)
    ERROR: No matching distribution found for mongoengine==0.18.2
    
    

    Expected behavior

    Build and install finish successfully

    Environment

    • OS version: Ubuntu 20.04
    • Docker version: 19.03
    • Noronha version: 1.6.2
    bug 
    opened by g-crocker 0
  • Support MongoDB customization of service type in K8s

    Current behavior

    Today Noronha relies on NodePort service exposure in order to reach MongoDB from outside the cluster (i.e. from the user's host).

    Enhancement

    Since there are security issues regarding NodePort usage, it would be useful to allow users to parametrize which type of service exposure to use (ClusterIP, NodePort, LoadBalancer).

    enhancement 
    opened by g-crocker 0
  • Support Artifactory usage through K8s Ingress and enable customization of service type

    Current behavior

    Today Noronha relies on NodePort service exposure in order to reach Artifactory from outside the cluster (i.e. from the user's host).

    Enhancement

    Since Artifactory requests are all HTTP-based, access via Ingress would be a much better approach.

    It would also be useful to allow users to parametrize which type of service exposure to use (ClusterIP, NodePort, LoadBalancer).

    enhancement 
    opened by g-crocker 0
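For illustration, the parametrization requested in the two issues above maps to the type field of a Kubernetes Service manifest. A sketch with hypothetical names (not Noronha's actual manifests):

```yaml
# hypothetical manifest: exposing an Artifactory isle via ClusterIP instead of NodePort
apiVersion: v1
kind: Service
metadata:
  name: nha-isle-artif        # hypothetical name
spec:
  type: ClusterIP             # could instead be NodePort or LoadBalancer, per user choice
  selector:
    app: nha-isle-artif
  ports:
    - port: 8081              # Artifactory's default HTTP port
      targetPort: 8081
```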
  • Model publish fixes and post-training steps cleanup

    Model version publish behaves correctly when executing in IDE.

    Training mongo document is correctly updated after model publish.

    Fixed example for lazy prediction.

    Moving Noronha to 1.6.1

    opened by g-crocker 0
  • Publishing model version from notebook IDE fails

    Describe the bug

    When I train a new model inside the notebook IDE, publishing fails, although the log messages indicate the model files were saved successfully.

    To Reproduce

    Start notebook IDE: nha -d -p note --edit --port 9090

    Train model, save output to tmp_path and call Publisher class:

    import joblib
    from noronha.tools.publish import Publisher
    from noronha.tools.shortcuts import data_path, tmp_path
    from sklearn import svm
    
    clf = svm.SVC(kernel=kernel, gamma=gamma)
    clf.fit(entries, classes)
    joblib.dump(clf, tmp_path('clf.pkl'))
    
    Publisher()(version_name='test', model_name='iris-clf')
    

    Log messages + Traceback

    Uploading file: clf.pkl
    
    ---------------------------------------------------------------------------
    DoesNotExist                              Traceback (most recent call last)
    <ipython-input-10-f0881f50ca51> in <module>
          1 joblib.dump(clf, tmp_path('clf.pkl'))
          2 
    ----> 3 Publisher()(version_name='test', model_name='iris-clf')
    
    /etc/miniconda/envs/py3_default/lib/python3.7/site-packages/noronha/tools/publish.py in __call__(self, src_path, details, version_name, model_name, uses_dataset, dataset_name, uses_pretrained, pretrained_with, lightweight)
        142         )
        143 
    --> 144         self.train.reload()
        145         self.train.update(mover=mv, ds=ds)
        146 
    
    /etc/miniconda/envs/py3_default/lib/python3.7/site-packages/mongoengine/document.py in reload(self, *fields, **kwargs)
        709 
        710         if self.pk is None:
    --> 711             raise self.DoesNotExist('Document does not exist')
        712 
        713         obj = self._qs.read_preference(ReadPreference.PRIMARY).filter(
    
    DoesNotExist: Document does not exist
    

    Expected behavior

    Successfully publish a new model version to Noronha.

    Environment

    • OS version: Ubuntu 16.04
    • Docker version: 19.03.13
    • K8s version: N/A, using Swarm
    • Noronha version: 1.6.0

    Additional context

    If I run a training using nha train new, the Publisher works fine.

    bug 
    opened by g-crocker 0
  • CVE-2007-4559 Patch

    Patching CVE-2007-4559

    Hi, we are security researchers from the Advanced Research Center at Trellix. We have begun a campaign to patch a widespread bug named CVE-2007-4559. CVE-2007-4559 is a 15-year-old bug in the Python tarfile package. By using extract() or extractall() on a tarfile object without sanitizing input, a maliciously crafted .tar file could perform a directory path traversal attack. We found at least one unsanitized extractall() in your codebase and are providing a patch for you via pull request. The patch essentially checks whether all tarfile members will be extracted safely and throws an exception otherwise. We encourage you to use this patch or your own solution to secure against CVE-2007-4559. Further technical information about the vulnerability can be found in this blog.

    If you have further questions you may contact us through this project's lead researcher, Kasimir Schulz.

    opened by TrellixVulnTeam 1
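The safeguard described above can be sketched in pure Python. This is an illustrative version of the check, not the exact patch submitted in the pull request:

```python
import os
import tarfile

def safe_extract(tar: tarfile.TarFile, path: str = ".") -> None:
    """Extract all members, refusing any whose resolved path escapes 'path'."""
    base = os.path.realpath(path)
    for member in tar.getmembers():
        target = os.path.realpath(os.path.join(path, member.name))
        # commonpath equals 'base' only when 'target' stays inside it
        if os.path.commonpath([base, target]) != base:
            raise RuntimeError("Blocked path traversal in tar member: %s" % member.name)
    tar.extractall(path)
```

Note that Python 3.12 added a built-in filter argument to extractall that addresses the same class of attack.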
  • Allow secret injection as environment variable and file

    Use case

    Complex use cases require users to inject files or environment variables that contain sensitive information. Noronha does not provide any functionality to solve this.

    Feature description

    Allow users with previously created secrets to inject those into their Noronha-managed containers. Either as a file or an environment variable.

    feature 
    opened by g-crocker 0
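On Kubernetes, the two injection styles requested above correspond to secretKeyRef environment variables and secret volumes. A sketch with hypothetical names (not an actual Noronha feature):

```yaml
# hypothetical sketch: how a previously created secret could reach a Noronha-managed container
apiVersion: v1
kind: Pod
metadata:
  name: nha-train-pod          # hypothetical name
spec:
  containers:
    - name: main
      image: noronhadataops/noronha:latest
      env:
        - name: DB_PASSWORD    # injected as an environment variable
          valueFrom:
            secretKeyRef:
              name: my-secret  # hypothetical, created beforehand by the user
              key: password
      volumeMounts:
        - name: secret-file    # injected as a file under /etc/secrets
          mountPath: /etc/secrets
          readOnly: true
  volumes:
    - name: secret-file
      secret:
        secretName: my-secret
```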
  • Isle image is not pulled from Docker Hub

    Describe the bug

    When deploying an isle for the first time with nha get-me-started or nha isle <name> setup, Noronha builds the image locally instead of pulling it from Docker Hub.

    To Reproduce

    Make sure there is no isle image in your registry, e.g.: docker images | grep nha-isle-artif. Then run the command to set up Artifactory: nha -d -p isle artif setup

    Expected behavior

    Pull the image from Docker Hub (tagging and pushing it to the private registry, if one is configured), then start the isle container. Only build locally if the --just-build flag is used.

    Environment

    • Noronha version: 1.6.2
    bug 
    opened by g-crocker 0
  • Support a file store isle that easily integrates with cloud native object storage

    Use case

    While Noronha supports Artifactory, Nexus and Cassandra, cloud object storage is also very common in cloud-native solutions. Being able to interact with tools such as Ceph/Rook and MinIO would extend Noronha's use cases.

    Feature description

    Allow users to use a file store isle that relies on cloud-native object storage.

    Personally, I believe adding support to MinIO first would be better, since it's easier to configure and interact with.

    CEPH seems a bit more challenging, although it might be more used by some projects.

    feature 
    opened by g-crocker 0
  • When an island setup deployment fails, the associated volumes are automatically deleted

    Describe the bug

    For example, when we run nha -d -p isle mongo setup and for any reason the deployment fails, Noronha will automatically revert the creation of associated volumes (in K8s, PVCs), even if those volumes were already there before the command was run.

    This can be a major issue if someone already has isles running in their cluster and needs to reconfigure them (e.g. to update the resource profile): if for some reason the setup command fails, Noronha will remove the existing PVCs without even prompting the user, completely erasing its own database.

    Expected behavior

    1. Noronha should prompt the user before performing this action
    2. The default option should be to leave the volumes in place
    bug 
    opened by g-crocker 0
  • Include K8s NFS setup steps in host

    Missing Documentation

    Production guide

    Today the guide doesn't specify how to mount a K8s NFS into a Linux host.

    This is a requirement in order to use Noronha-managed Jupyter notebooks in edit mode while running in K8s, so it makes sense to add it to the docs.

    documentation 
    opened by g-crocker 0
Releases: v1.6.2