🦉 Data Version Control | Git for Data & Models

Overview


Website • Docs • Blog • Twitter • Chat (Community & Support) • Tutorial • Mailing List



Data Version Control or DVC is an open-source tool for data science and machine learning projects. Key features:

  1. Simple command line Git-like experience. Does not require installing and maintaining any databases. Does not depend on any proprietary online services.
  2. Management and versioning of datasets and machine learning models. Data is saved in S3, Google cloud, Azure, Alibaba cloud, SSH server, HDFS, or even local HDD RAID.
  3. Makes projects reproducible and shareable, helping to answer questions about how a model was built.
  4. Helps manage experiments with Git tags/branches and metrics tracking.

DVC aims to replace spreadsheet and document sharing tools (such as Excel or Google Docs), which are frequently used as both knowledge repositories and team ledgers. DVC also replaces ad-hoc scripts for tracking, moving, and deploying different model versions, as well as ad-hoc data file suffixes and prefixes.

How DVC works

We encourage you to read our Get Started guide to better understand what DVC is and how it can fit your scenarios.

The easiest (but not perfect!) analogy to describe it: DVC is Git (or Git-LFS to be precise) & Makefiles made right and tailored specifically for ML and Data Science scenarios.

  1. Git/Git-LFS part - DVC helps store and share data artifacts and models, connecting them with a Git repository.
  2. Makefiles part - DVC describes how one data or model artifact was built from other data and code.

DVC usually runs along with Git. Git is used as usual to store and version code (including DVC meta-files). DVC helps to store data and model files seamlessly out of Git, while preserving almost the same user experience as if they were stored in Git itself. To store and share the data cache, DVC supports multiple remotes - any cloud (S3, Azure, Google Cloud, etc) or any on-premise network storage (via SSH, for example).
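The idea of keeping file content out of Git while leaving a usable file in the workspace can be illustrated with a small content-addressed cache. This is a minimal sketch, not DVC's actual implementation: the function names `file_checksum` and `add_to_cache`, the two-character directory layout, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib
import os
from pathlib import Path


def file_checksum(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fobj:
        for chunk in iter(lambda: fobj.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def add_to_cache(path: Path, cache_dir: Path) -> str:
    """Move a file into a content-addressed cache and hardlink it back.

    Hypothetical layout: cache_dir/<first 2 hex chars>/<remaining chars>.
    The workspace path and the cache must be on the same filesystem.
    """
    checksum = file_checksum(path)
    cached = cache_dir / checksum[:2] / checksum[2:]
    cached.parent.mkdir(parents=True, exist_ok=True)
    if cached.exists():
        path.unlink()  # identical content is already in the cache
    else:
        os.replace(path, cached)  # move the data into the cache
    os.link(cached, path)  # hardlink: the workspace file shares the same data
    return checksum
```

Only the returned checksum would need to be recorded in a small meta-file that Git versions; the data itself stays in the cache directory, which could then be synchronized with a remote.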


The DVC pipelines (computational graph) feature connects code and data together. It is possible to explicitly specify all steps required to produce a model: input dependencies including data, commands to run, and output information to be saved. See the quick start section below or the Get Started tutorial to learn more.
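To make the computational-graph idea concrete, the following sketch models stages with dependencies, a command, and outputs, then derives an execution order and the set of stages that need to be rerun after a change. It is an illustration under stated assumptions, not DVC's internal code: the `Stage` class and the helper functions `execution_order` and `stale_stages` are hypothetical names, and paths are matched by exact string equality.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass(frozen=True)
class Stage:
    """One pipeline step: a command plus the files it reads and writes."""
    name: str
    cmd: str
    deps: tuple = ()
    outs: tuple = ()


def execution_order(stages):
    """Order stages so each runs after the stages that produce its deps."""
    producer = {out: stage.name for stage in stages for out in stage.outs}
    graph = {
        stage.name: {producer[dep] for dep in stage.deps if dep in producer}
        for stage in stages
    }
    return list(TopologicalSorter(graph).static_order())


def stale_stages(stages, changed_paths):
    """Return names of stages that must rerun given a set of changed files."""
    by_name = {stage.name: stage for stage in stages}
    dirty = set(changed_paths)
    stale = []
    for name in execution_order(stages):
        stage = by_name[name]
        if dirty.intersection(stage.deps):
            stale.append(name)
            dirty.update(stage.outs)  # outputs of a rerun stage change too
    return stale


# Hypothetical two-stage pipeline mirroring the quick start commands.
PIPELINE = [
    Stage("unzip", "unzip -q images.zip", deps=("images.zip",), outs=("images/",)),
    Stage("train", "python train.py", deps=("images/", "train.py"), outs=("model.p",)),
]
```

With this pipeline, editing `train.py` marks only the `train` stage as stale, while replacing `images.zip` marks both stages, which is the behavior a reproduce command relies on.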

Quick start

Please read the Get Started guide for the full version. Common workflow commands include:

Step                                Command
Track data                          $ git add train.py
                                    $ dvc add images.zip
Connect code and data by commands   $ dvc run -d images.zip -o images/ unzip -q images.zip
                                    $ dvc run -d images/ -d train.py -o model.p python train.py
Make changes and reproduce          $ vi train.py
                                    $ dvc repro model.p.dvc
Share code                          $ git add .
                                    $ git commit -m 'The baseline model'
                                    $ git push
Share data and ML models            $ dvc remote add myremote -d s3://mybucket/image_cnn
                                    $ dvc push

Installation

There are several options to install DVC: Snap, Chocolatey, Homebrew, Conda (Anaconda), pip, or an OS-specific package. Full instructions are available here.

Snap (Snapcraft/Linux)


snap install dvc --classic

This corresponds to the latest tagged release. Add --beta for the latest tagged release candidate, or --edge for the latest master version.

Choco (Chocolatey/Windows)


choco install dvc

Brew (Homebrew/Mac OS)


brew install dvc

Conda (Anaconda)


conda install -c conda-forge dvc

pip (PyPI)


pip install dvc

Depending on the remote storage type you plan to use to keep and share your data, you might need to specify one of the optional dependencies: s3, gs, azure, oss, ssh, or all to include them all. For example, pip install dvc[s3] automatically installs the AWS S3 dependencies, such as boto3.

To install the development version, run:

pip install git+https://github.com/iterative/dvc

Package


Self-contained packages for Linux, Windows, and Mac are available. The latest version of the packages can be found on the GitHub releases page.

Ubuntu / Debian (deb)

sudo wget https://dvc.org/deb/dvc.list -O /etc/apt/sources.list.d/dvc.list
sudo apt-get update
sudo apt-get install dvc

Fedora / CentOS (rpm)

sudo wget https://dvc.org/rpm/dvc.repo -O /etc/yum.repos.d/dvc.repo
sudo yum update
sudo yum install dvc

Comparison to related technologies

  1. Git-annex - DVC uses the idea of storing the content of large files (which should not be in a Git repository) in a local key-value store, and uses file hardlinks/symlinks instead of copying/duplicating files.
  2. Git-LFS - DVC is compatible with any remote storage (S3, Google Cloud, Azure, SSH, etc). DVC also uses reflinks or hardlinks to avoid copy operations on checkouts, thus handling large data files much more efficiently.
  3. Makefile (and analogues including ad-hoc scripts) - DVC tracks dependencies (in a directed acyclic graph).
  4. Workflow Management Systems - DVC is a workflow management system designed specifically to manage machine learning experiments. DVC is built on top of Git.
  5. DAGsHub - This is a GitHub equivalent for DVC. Pushing Git+DVC based repositories to DAGsHub will produce a high-level project dashboard, including DVC pipelines and metrics visualizations, as well as links to any DVC-managed files present in cloud storage.

Contributing


Contributions are welcome! Please see our Contributing Guide for more details.

Mailing List

Want to stay up to date? Want to help improve DVC by participating in our occasional polls? Subscribe to our mailing list. No spam, really low traffic.

Copyright

This project is distributed under the Apache license version 2.0 (see the LICENSE file in the project root).

By submitting a pull request to this project, you agree to license your contribution under the Apache license version 2.0 to this project.

Citation


Iterative, DVC: Data Version Control - Git for Data & Models (2020) DOI:10.5281/zenodo.012345.

Barrak, A., Eghan, E.E. and Adams, B. On the Co-evolution of ML Pipelines and Source Code - Empirical Study of DVC Projects, in Proceedings of the 28th IEEE International Conference on Software Analysis, Evolution, and Reengineering, SANER 2021. Hawaii, USA.

Comments
  • Reconsider gc implementation


    As pointed out in the discussion in #1691, we should reconsider the gc implementation. Currently, if called without any options, dvc collects the checksums of the current branch's dependencies and outputs and removes everything else, so this command can easily wipe the history of changes. gc should be safer with default options. A straightforward implementation could collect all outputs for all revisions in the Git repo and remove everything that is not on that list.

    As pointed out by @Suor, this approach might be slow for a repository with a long history.
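The difference between the current behavior and the straightforward implementation proposed above can be sketched with plain sets. This is only an illustration, not dvc's gc code: the function `collect_garbage`, its parameters, and the idea of passing a ready-made mapping from revision to used checksums are assumptions. In a real repository, building that mapping is the step that could be slow for a long history.

```python
def collect_garbage(cache, outputs_by_revision, all_revisions=True, current="HEAD"):
    """Return (kept, removed) checksum sets for a cache.

    cache: set of checksums currently stored in the cache.
    outputs_by_revision: mapping of revision name -> set of checksums it uses.
    all_revisions: if True, keep anything used by any revision (the safer
        default proposed in the issue); if False, keep only what the
        current revision uses.
    """
    if all_revisions:
        used = set().union(*outputs_by_revision.values())
    else:
        used = set(outputs_by_revision.get(current, ()))
    kept = cache & used
    return kept, cache - kept
```

With the safer default, a checksum is removed only when no revision references it, so older versions of the data remain recoverable after the command runs.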

    enhancement p1-important ui research 
    opened by pared 73
  • support push/pull/metrics/gc, etc across different commits


    Currently dvc metrics show can show metric values across different branches (-a) and different tags (-T). Can you consider supporting showing different metric values across different commits in the same branch?


    The background of this is (simplified example): say I'm currently training a model, where I'm changing a certain parameter, param1 (for instance, number of trees in a forest). The way I probably would like to work is to find a first value for param1, commit the current state, continue changing param1 and continue committing the successive states that I consider worth saving. At some point I would like to look back and identify the setup that gave me the best results.

    The way DVC currently works forces me to create a new branch/tag for each trial I want to keep track of, and this seems a bit overwhelming.

    Depending on how different the experiments I'm running are and their level of granularity I could decide how to keep track of them (new commits VS new branches/tags).

    Notes:

    • The example above is overly simplified and there are better ways of tuning specific models parameters. But this gets more complicated if I'm changing more stuff (model hyperparameters, data processing, features to use, etc).
    • If dvc were to support what I'm proposing here, an extra argument would probably be required to limit how many commits DVC would look back at. Otherwise it would show all the metric values since the beginning of the repo history, which can be unhelpful and messy.
    feature request p1-important research 
    opened by silverdna 71
  • Unexpected error - Adding files


    Every time I try to add individual files or complete directories, the same unexpected error appears:

    > dvc add -v -R model
    DEBUG: Trying to spawn '['c:\\users\\luisfelipe_melo_mora\\appdata\\local\\programs\\python\\python37-32\\python.exe', 'C:\\Users\\luisfelipe_melo_mora\\AppData\\Local\\Programs\\Python\\Python37-32\\Scripts\\dvc', 'daemon', '-q', 'updater']'
    DEBUG: Spawned '['c:\\users\\luisfelipe_melo_mora\\appdata\\local\\programs\\python\\python37-32\\python.exe', 'C:\\Users\\luisfelipe_melo_mora\\AppData\\Local\\Programs\\Python\\Python37-32\\Scripts\\dvc', 'daemon', '-q',
    'updater']'
    ERROR: unexpected error - Already unlocked
    ------------------------------------------------------------
    Traceback (most recent call last):
      File "c:\users\luisfelipe_melo_mora\appdata\local\programs\python\python37-32\lib\site-packages\dvc\main.py", line 48, in main
        cmd = args.func(args)
      File "c:\users\luisfelipe_melo_mora\appdata\local\programs\python\python37-32\lib\site-packages\dvc\command\base.py", line 48, in __init__
        updater.check()
      File "c:\users\luisfelipe_melo_mora\appdata\local\programs\python\python37-32\lib\site-packages\dvc\updater.py", line 54, in check
        self._with_lock(self._check, "checking")
      File "c:\users\luisfelipe_melo_mora\appdata\local\programs\python\python37-32\lib\site-packages\dvc\updater.py", line 45, in _with_lock
        func()
      File "c:\users\luisfelipe_melo_mora\appdata\local\programs\python\python37-32\lib\site-packages\flufl\lock\_lockfile.py", line 338, in __exit__
        self.unlock()
      File "c:\users\luisfelipe_melo_mora\appdata\local\programs\python\python37-32\lib\site-packages\flufl\lock\_lockfile.py", line 287, in unlock
        raise NotLockedError('Already unlocked')
    flufl.lock._lockfile.NotLockedError: Already unlocked
    ------------------------------------------------------------
    
    
    Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
    

    I have a remote configuration by SSH:

    ['remote "myssh"']
    url = ssh://domain:/path
    user = myuser
    port = 22
    ask_password = true
    [core]
    remote = myssh
    

    And here is the version of dvc that I'm using:

    > dvc version
    DVC version: 0.69.0
    Python version: 3.7.4
    Platform: Windows-10-10.0.17134-SP0
    Binary: False
    Package: pip
    Cache: reflink - False, hardlink - True, symlink - False
    
    

    Thanks for your help!

    bug p0-critical 
    opened by luchoPipe87 69
  • ML experiments and hyperparameters tuning


    UPDATE: Skip to https://github.com/iterative/dvc/issues/2799#issuecomment-650464000 for a summary and updated requirements, and https://github.com/iterative/dvc/issues/2799#issuecomment-652969635 for the beginning of the implementation discussion.

    Problem

    There are a lot of discussions on how to manage ML experiments with DVC. Today's DVC design allows ML experiments through Git-based primitives such as commits and branches. This works nicely for large ML experiments when code writing and testing are required. However, this model is too heavy for the hyperparameter tuning stage, when the user makes dozens of small, one-line changes in config or code. Users don't want to have dozens of Git commits or branches.

    Requirements

    A lightweight abstraction needs to be created in DVC to support hyperparameter-like tiny experiments without Git commits. The hyperparameter tuning stage can be considered a separate user activity outside of the Git workflow. But the result of this activity still needs to be managed by Git, preferably as a single commit.

    High-level requirements for the hyperparameter tuning stage:

    1. Run. Run dozens of experiments without committing any results into Git while keeping track of all the experiments. Each of the experiments includes a small config change or code change (usually, 1-2 lines).
    2. Compare. A user should be able to compare two experiments: see diffs for code (and probably metrics)
    3. Visualize. A user should be able to see all the experiments' results: the metrics that were generated. It might be a table with metrics or a graph. A CSV table needs to be supported for custom visualization.
    4. Propagate. Choose "the best" experiment (not necessarily the one with the highest metrics) and propagate it to the workspace (bring in all the config and code changes; important: without retraining). Then it can be committed to Git. This is the final result of the current hyperparameter tuning stage. After that, the user can continue to work with the project in a regular Git workflow.
    5. Store. Some (or all) of the experiments might still be useful (in addition to "the best" one). A user should be able to commit them to Git as well, preferably in a single commit to keep the Git history clean.
    6. Clean. Not useful experiments should be removed with all the code and data artifacts that were created. A special subcommand of dvc gc might be needed.
    7. [*] Parallel. In some cases, the experiments can be run in parallel which aligns with DVC parallel execution plans: #2212, #755. This might not be implemented now (in the 1st version of this feature) but it is important to support parallel execution by this new lightweight abstraction.
    8. Group. Iterations of hyperparameter tuning might not be related to each other and need to be managed and visualized separately. Experiments need to be grouped somehow.

    What should NOT be covered by this feature?

    This feature is NOT about the hyperparameter grid-search. In most cases, hyperparameters tuning is done by users manually using "smart" assumptions and hypotheses about hyperparameter space. Grid-search can be implemented on top of this feature/command using bash for example.

    1. The ability to run the experiments from bash might be also a requirement for this feature request.

    Possible implementations

    This is an open question but many data scientists create directories for each of the experiments. In some cases, people create directories for a group of experiments and then experiments inside. We can use some of these ideas/practices to better align with users' experience and intuition.

    Actions

    This is a high-level feature request (epic). The requirements and an initial design need to be discussed and more feature requests need to be created. @iterative/engineering please share your feedback. Is something missing here?

    EDITED:

    Related issues

    #2379 https://github.com/iterative/dvc/issues/2532 #1018 can be relevant (?) Discussion

    feature request 
    opened by dmpetrov 68
  • Introduce hyper parameters and config


    For an ML experiment, it is important to know metrics as well as the parameters that were used in order to get the metrics. Today there is no training/processing parameter concept in DVC which creates a problem when a user needs to visualize an experiment for example in some UI.

    A common workaround is to track parameters as metrics. However, the meaning of metrics is different. All the UI tools (including dvc metrics diff) need to show deltas where deltas do not make sense to some types of params. For example, delta for learning rate might be ok to see (values are still better), but delta for a number of layers (32, 64 or 128) does not make sense, the same for not numeric params like strings.

    Also, config/parameters are a pre-requisite for experiment management (#2799 or CI/CD scenarios) when DVC (or other automation tools) need to change training regarding provided parameters.

    Another benefit of the "understanding" parameter - DVC can use this information during repro. For example, DVC can realize that a step process which depends on config file config.json should not be run despite the config file change because the metrics it uses were not changed.

    We need to introduce the experiment config file/parameters file with a fixed structure that DVC can understand.

    Open questions:

    1. Filename. config.json, dvcconfig.json, params.json.
    2. File format: json, text config/ini (name=value), Hydra, ... We can run a survey.
    3. How to track param dependency for stages. We can introduce a new type of dependency: param. If it is given then the stage depends on the file and on particular params values. Like dvc run -p learning_rate -p LL5_levels ....
    4. DVC should probably support groups of params. Param name pattern could be used : dvc run -p 'processing.*' ...
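The param-level dependency idea from the open questions above can be sketched as follows: a stage declares which parameter names (or name patterns such as 'processing.*') it depends on, and it is considered changed only when one of those values differs. This is a hypothetical illustration of the proposal, not an existing DVC feature or API: `flatten` and `changed_params` are made-up helper names, and dotted names for nested parameters are an assumption.

```python
from fnmatch import fnmatchcase


def flatten(params, prefix=""):
    """Flatten nested dicts into dotted names, e.g. {'a': {'b': 1}} -> {'a.b': 1}."""
    flat = {}
    for key, value in params.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def changed_params(old, new, patterns):
    """Return sorted names of tracked params whose values differ.

    patterns: parameter names or shell-style patterns the stage depends on.
    A param that was added or removed also counts as changed.
    """
    old_flat, new_flat = flatten(old), flatten(new)
    missing = object()  # sentinel so that a missing key never equals a value
    return sorted(
        name
        for name in set(old_flat) | set(new_flat)
        if any(fnmatchcase(name, pattern) for pattern in patterns)
        and old_flat.get(name, missing) != new_flat.get(name, missing)
    )
```

A stage would then be rerun only if `changed_params` returns a non-empty list for its declared patterns, so an edit to an unrelated value in the same config file would not invalidate it.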
    feature request discussion product 
    opened by dmpetrov 59
  • store whole DAG in one DVC-file


    I understand the merits of having multiple .dvc files for complex processes, but it would be just great to have the option to store the whole DAG in one Dvcfile!

    I feel it might help the overall readability of the structure

    feature request p2-medium research product 
    opened by Casyfill 56
  • Using dvc only for dataset management (e.g. no dvc run pipeline).


    I am dealing with a large hierarchical data set. One where artifacts are pulled from various directories to generate contiguous data sets that are then fed to ML processes downstream. I don't want to use dvc to reproduce the pipeline, at least not yet. My needs are rather to be able to version the overall image dataset hierarchy, for the purpose of manual inspection of the whole hierarchy and moving images into groups or removing them altogether when necessary.

    This enables folks with less ML expertise control the data set they want to build by grouping the content together that they want to pick up when generating the data set. The data set is not a list of images, rather it is a list of lower dimensional feature vectors extracted from those images.

    I'm finding dvc taking a potentially unreasonable amount of time to just add and commit. Perhaps I don't understand what I'm doing or haven't set my expectations correctly.

    I wanted to keep these operations small in order to ensure things were working well. I have done the following. I have approximately 300K in total in this set right now.

    1. store 60K images on local file system, under the data/ directory.
    2. dvc add data/
    3. dvc push -r remote. I forgot to commit here since things took so long and I wanted to see if pushing worked.
    4. store 120K additional images to another sub directory under the data/ directory.
    5. dvc add data/ -> goes through all of the files in data/ regardless. I ran -v here and showed the previous files.
    6. dvc push -r remote.
    7. dvc commit. Here dvc is taking the greater amount of 99% of system memory (13 GB) and appears to be causing disk thrashing. It's been running nearly for a day so far.

    I am just looking for some guidance in managing a dataset of this nature using dvc in a way that will not eat up so much time, disk, compute, etc. If I'm doing something suboptimal, then I want to shine some light on that.

    question performance research 
    opened by JoeyCarson 54
  • add: --to-remote needed? OR --external needed?


    Follow up to https://github.com/iterative/dvc/pull/5198#issuecomment-774299750, #5301, and https://github.com/iterative/dvc.org/pull/2172#discussion_r573963049:

    Question

    add --to-remote is a bit strange because normally add doesn't move target data, rather tracks it in-place (analog to git add). But --to-remote implies that external data will be moved into the workspace at some point, which we skip for now but "pre-push" (transfer) it to remote storage (for later pull/fetch).

    As of now add --to-remote has a similar result to get-url + add + push + remove, gc. So OK, maybe it's nice to have a shortcut to all that, but we already have import-url (--to-remote) to achieve the same.

    The only difference vs. importing is that the data source is not recorded as a dependency in the .dvc file. So you can't update it or unfreeze+repro it. However I don't see any use cases where you would want to prevent the .dvc from having this dep, as you can simply never update or unfreeze it.

    TLDR: I think import-url --to-remote is enough and what we should recommend for these situations. And add --to-remote breaks the Git analogy. Cc @dberenbaum

    Improvement

    • [x] But if we keep it, an improvement would be to NOT require the --external flag with it (cc @isidentical). This saves the user from typing a flag that is always needed, but also makes sense since the data is not actually being treated as external, in the sense that it won't be tracked/controlled in its original location (requiring external cache, etc.).

    • [x] Finish or close iterative/dvc.org/pull/2172 when this is decided.
    enhancement discussion product 
    opened by jorgeorpinel 47
  • new command to list data artifacts in a DVC project


    Especially useful for "browsing" external DVC projects on Git hosting before using dvc get or dvc import. Looking at the Git repo doesn't show the artifacts because they're only referenced in DVC-files (which can be found anywhere), not tracked by Git.

    Perhaps dvc list or dvc artifacts? (Or/and both dvc get list and dvc import list)

    As mentioned in https://github.com/iterative/dvc.org/pull/611#discussion_r324998285 and other discussions.


    UPDATE: Proposed spec (from https://github.com/iterative/dvc/issues/2509#issuecomment-533019513):

    usage: dvc list [-h] [-q | -v] [--recursive [LEVEL]] [--rev REV | --versions]
                    url [target [target ...]]
    
    positional arguments:
      url         URL of Git repository with DVC project to download from.
      target      Paths to DVC-files or directories within the repository to list outputs
                  for.
    

    UPDATE: Don't forget to update docs AND tab completion scripts when this is implemented.

    feature request p1-important c8-full-day 
    opened by jorgeorpinel 45
  • Incremental processing or streaming in micro-batches


    It seems like it is only possible to replace a dataset entirely and then re-run the analysis. Incremental processing would enable more efficient processing by avoiding recomputation. Here's how Pachyderm does it.

    enhancement feature request p2-medium research 
    opened by kskyten 44
  • dvc/dagascii: Use pager instead of AsciiCanvas._do_draw


    Uses Stdlib's pydoc to draw the output in the interactive mode while doing e.g. dvc pipeline show ...

    Fixes #2807

    • [x] ❗ Have you followed the guidelines in the Contributing to DVC list?

    • [x] 📖 Check this box if this PR does not require documentation updates, or if it does and you have created a separate PR in dvc.org with such updates (or at least opened an issue about it in that repo). Please link below to your PR (or issue) in the dvc.org repo.

    • [x] ❌ Have you checked DeepSource, CodeClimate, and other sanity checks below? We consider their findings recommendatory and don't expect everything to be addressed. Please review them carefully and fix those that actually improve code or fix bugs.

    Thank you for the contribution - we'll try to review it as soon as possible. 🙏

    Related MR: https://github.com/iterative/dvc.org/pull/831

    opened by xliiv 43
  • cloud versioning: fails with cache: false outputs


    Bug Report

    Description

    It looks like cloud versioning is failing to push when stage outputs are marked as cache: false. Might be related to https://github.com/iterative/dvc/issues/4428.

    Reproduce

    Set up a cloud-versioned remote for https://github.com/iterative/example-get-started and push to it.

    Here's the output:

    $ dvc push -v
    2023-01-04 15:09:02,716 DEBUG: indexing latest worktree for 'dave-sandbox-versioning/example-get-started/remote'
    2023-01-04 15:09:03,269 DEBUG: Pushing worktree changes to 'dave-sandbox-versioning/example-get-started/remote'
    2023-01-04 15:09:03,658 ERROR: unexpected error - ('eval', 'live', 'plots')
    ------------------------------------------------------------
    Traceback (most recent call last):
      File "/Users/dave/Code/dvc/dvc/cli/__init__.py", line 185, in main
        ret = cmd.do_run()
      File "/Users/dave/Code/dvc/dvc/cli/command.py", line 22, in do_run
        return self.run()
      File "/Users/dave/Code/dvc/dvc/commands/data_sync.py", line 59, in run
        processed_files_count = self.repo.push(
      File "/Users/dave/Code/dvc/dvc/repo/__init__.py", line 48, in wrapper
        return f(repo, *args, **kwargs)
      File "/Users/dave/Code/dvc/dvc/repo/push.py", line 50, in push
        pushed += _push_worktree(
      File "/Users/dave/Code/dvc/dvc/repo/push.py", line 117, in _push_worktree
        return push_worktree(repo, remote, targets=targets, **kwargs)
      File "/Users/dave/Code/dvc/dvc/repo/worktree.py", line 148, in push_worktree
        _update_out_meta(out, repo.index.data[workspace])
      File "/Users/dave/Code/dvc/dvc/repo/worktree.py", line 179, in _update_out_meta
        entry = index[key]
      File "/Users/dave/Code/dvc-data/src/dvc_data/index/index.py", line 179, in __getitem__
        return self._trie[key]
      File "/Users/dave/miniforge3/envs/dvc/lib/python3.10/site-packages/pygtrie.py", line 859, in __getitem__
        node, _ = self._get_node(key_or_slice)
      File "/Users/dave/miniforge3/envs/dvc/lib/python3.10/site-packages/pygtrie.py", line 552, in _get_node
        raise KeyError(key)
    KeyError: ('eval', 'live', 'plots')
    ------------------------------------------------------------
    2023-01-04 15:09:04,223 DEBUG: Removing '/private/tmp/.C898YREazrnPXGYsWbFsib.tmp'
    2023-01-04 15:09:04,223 DEBUG: Removing '/private/tmp/.C898YREazrnPXGYsWbFsib.tmp'
    2023-01-04 15:09:04,223 DEBUG: Removing '/private/tmp/.C898YREazrnPXGYsWbFsib.tmp'
    2023-01-04 15:09:04,223 DEBUG: Removing '/private/tmp/example-get-started/.dvc/cache/.DGQMVLCJsyvDKpKuGC3DdL.tmp'
    2023-01-04 15:09:04,225 DEBUG: Version info for developers:
    DVC version: 2.38.2.dev23+ga24c38967.d20230104
    ---------------------------------
    Platform: Python 3.10.2 on macOS-13.1-arm64-arm-64bit
    Subprojects:
            dvc_data = 0.28.5.dev1+ge0d19ab
            dvc_objects = 0.14.0
            dvc_render = 0.0.17
            dvc_task = 0.1.9
            dvclive = 1.3.1
            scmrepo = 0.1.5
    Supports:
            azure (adlfs = 2022.9.1, knack = 0.9.0, azure-identity = 1.7.1),
            gdrive (pydrive2 = 1.15.0),
            gs (gcsfs = 2022.11.0),
            hdfs (fsspec = 2022.11.0+18.g0c55724.dirty, pyarrow = 7.0.0),
            http (aiohttp = 3.8.1, aiohttp-retry = 2.8.3),
            https (aiohttp = 3.8.1, aiohttp-retry = 2.8.3),
            oss (ossfs = 2021.8.0),
            s3 (s3fs = 2022.11.0+6.g804057f, boto3 = 1.24.59),
            ssh (sshfs = 2022.6.0),
            webdav (webdav4 = 0.9.4),
            webdavs (webdav4 = 0.9.4),
            webhdfs (fsspec = 2022.11.0+18.g0c55724.dirty)
    Cache types: reflink, hardlink, symlink
    Cache directory: apfs on /dev/disk3s1s1
    Caches: local
    Remotes: https, s3
    Workspace directory: apfs on /dev/disk3s1s1
    Repo: dvc, git
    
    Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
    2023-01-04 15:09:04,225 DEBUG: Analytics is disabled.
    
    bug p1-important A: run-cache A: pipelines A: data-sync A: cloud-versioning 
    opened by dberenbaum 0
  • add --external: fails using Azure remote


    Bug Report

    Description

    I am trying to track existing data from a storage account in Azure following current documentation.

    Reproduce

    1. dvc init
    2. dvc remote add azcore azure://core-container
    3. dvc remote add azdata azure://data-container
    4. dvc add --external remote://azdata/existing-data

    Expected

    I'm not sure what is expected but the output is:

    ERROR: unexpected error - : 'azure'
    

    Environment information

    Output of dvc doctor:

    DVC version: 2.38.1 (pip)
    ---------------------------------
    Platform: Python 3.9.6 on macOS-13.1-x86_64-i386-64bit
    Subprojects:
    	dvc_data = 0.28.4
    	dvc_objects = 0.14.0
    	dvc_render = 0.0.15
    	dvc_task = 0.1.8
    	dvclive = 1.3.1
    	scmrepo = 0.1.4
    Supports:
    	azure (adlfs = 2022.11.2, knack = 0.10.1, azure-identity = 1.12.0),
    	http (aiohttp = 3.8.3, aiohttp-retry = 2.8.3),
    	https (aiohttp = 3.8.3, aiohttp-retry = 2.8.3)
    Cache types: reflink, hardlink, symlink
    Cache directory: apfs on /dev/disk1s5s1
    Caches: local
    Remotes: azure, azure
    Workspace directory: apfs on /dev/disk1s5s1
    Repo: dvc, git
    

    Additional Information:

    2023-01-04 18:58:46,616 ERROR: unexpected error - : 'azure'
    ------------------------------------------------------------
    Traceback (most recent call last):
      File "/Users/rmllopes/dev/auto-document-validation-ai/.venv/lib/python3.9/site-packages/dvc/odbmgr.py", line 65, in __getattr__
        return self._odb[name]
    KeyError: 'azure'
    
    The above exception was the direct cause of the following exception:
    
    Traceback (most recent call last):
      File "/Users/rmllopes/dev/auto-document-validation-ai/.venv/lib/python3.9/site-packages/dvc/cli/__init__.py", line 185, in main
        ret = cmd.do_run()
      File "/Users/rmllopes/dev/auto-document-validation-ai/.venv/lib/python3.9/site-packages/dvc/cli/command.py", line 22, in do_run
        return self.run()
      File "/Users/rmllopes/dev/auto-document-validation-ai/.venv/lib/python3.9/site-packages/dvc/commands/add.py", line 53, in run
        self.repo.add(
      File "/Users/rmllopes/dev/auto-document-validation-ai/.venv/lib/python3.9/site-packages/dvc/utils/collections.py", line 164, in inner
        result = func(*ba.args, **ba.kwargs)
      File "/Users/rmllopes/dev/auto-document-validation-ai/.venv/lib/python3.9/site-packages/dvc/repo/__init__.py", line 48, in wrapper
        return f(repo, *args, **kwargs)
      File "/Users/rmllopes/dev/auto-document-validation-ai/.venv/lib/python3.9/site-packages/dvc/repo/scm_context.py", line 156, in run
        return method(repo, *args, **kw)
      File "/Users/rmllopes/dev/auto-document-validation-ai/.venv/lib/python3.9/site-packages/dvc/repo/add.py", line 190, in add
        stage.save(merge_versioned=True)
      File "/Users/rmllopes/dev/auto-document-validation-ai/.venv/lib/python3.9/site-packages/dvc/stage/__init__.py", line 469, in save
        self.save_outs(
      File "/Users/rmllopes/dev/auto-document-validation-ai/.venv/lib/python3.9/site-packages/dvc/stage/__init__.py", line 512, in save_outs
        out.save()
      File "/Users/rmllopes/dev/auto-document-validation-ai/.venv/lib/python3.9/site-packages/dvc/output.py", line 643, in save
        self.odb,
      File "/Users/rmllopes/dev/auto-document-validation-ai/.venv/lib/python3.9/site-packages/dvc/output.py", line 450, in odb
        odb = getattr(self.repo.odb, odb_name)
      File "/Users/rmllopes/dev/auto-document-validation-ai/.venv/lib/python3.9/site-packages/dvc/odbmgr.py", line 67, in __getattr__
        raise AttributeError from exc
    AttributeError
    ------------------------------------------------------------
    2023-01-04 18:58:46,711 DEBUG: Version info for developers:
    DVC version: 2.38.1 (pip)
    ---------------------------------
    Platform: Python 3.9.6 on macOS-13.1-x86_64-i386-64bit
    Subprojects:
    	dvc_data = 0.28.4
    	dvc_objects = 0.14.0
    	dvc_render = 0.0.15
    	dvc_task = 0.1.8
    	dvclive = 1.3.1
    	scmrepo = 0.1.4
    Supports:
    	azure (adlfs = 2022.11.2, knack = 0.10.1, azure-identity = 1.12.0),
    	http (aiohttp = 3.8.3, aiohttp-retry = 2.8.3),
    	https (aiohttp = 3.8.3, aiohttp-retry = 2.8.3)
    Cache types: <https://error.dvc.org/no-dvc-cache>
    Caches: local
    Remotes: azure, azure
    Workspace directory: apfs on /dev/disk1s5s1
    Repo: dvc, git
    
    Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
    2023-01-04 18:58:46,714 DEBUG: Analytics is enabled.
    2023-01-04 18:58:46,911 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/var/folders/st/05s6bkj55r9cw3hbrrdfvfqh0000gp/T/tmpoxhcmxev']'
    2023-01-04 18:58:46,913 DEBUG: Spawned '['daemon', '-q', 'analytics', '/var/folders/st/05s6bkj55r9cw3hbrrdfvfqh0000gp/T/tmpoxhcmxev']'
    
    opened by rmlopes 0
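
The traceback in the report above shows an attribute-style lookup (`__getattr__` in `dvc/odbmgr.py`) that reads from a dictionary and re-raises the `KeyError` as a bare `AttributeError`, which would explain why the final error carries no message of its own. The sketch below reproduces that pattern in isolation. `ODBRegistry` and `get_odb` are hypothetical names used for illustration, not DVC's actual classes, and the descriptive `ValueError` is only one possible way to surface the missing scheme.

```python
class ODBRegistry:
    """Hypothetical registry of object databases keyed by scheme name."""

    def __init__(self, odbs):
        self._odb = dict(odbs)

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        # Guard private names so a missing `_odb` cannot recurse forever.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return self._odb[name]
        except KeyError as exc:
            # Chaining keeps the KeyError as __cause__, as in the traceback.
            raise AttributeError(name) from exc


def get_odb(registry, scheme):
    """Return the database for `scheme`, or fail with a descriptive message."""
    odb = getattr(registry, scheme, None)  # getattr swallows AttributeError
    if odb is None:
        raise ValueError(f"no object database configured for scheme {scheme!r}")
    return odb
```

With a registry that only knows `local`, `get_odb(registry, "azure")` raises a `ValueError` that names the missing scheme instead of an empty `AttributeError`.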
  • external outputs: broken if pipeline output doesn't exist during stage initialization


    Bug Report

    Description

    S3 external outputs are broken for pipelines since https://github.com/iterative/dvc/commit/7211bd02eda74d5434f1b7996647f7027e6e83b0 because of a bug in s3fs (and probably in other filesystems). They will only break if running a stage for which an output doesn't already exist. When initializing the stage, DVC will try to remove the nonexistent output and raise a FileNotFound error.

    Reproduce

    dvc repro will break if there is an external output and that output does not exist yet.

    In a new repo, using some <s3_path> that doesn't exist yet, do this:

    $ echo 'foo' > foo
    $ dvc stage add --external -n foo -d foo -O <s3_path> 'aws s3 cp params.yaml <s3_path>'
    $ dvc repro -v
    

    Expected

    dvc repro shouldn't fail while removing outputs. In this case, it fails because of what seems like a bug, or at least inconsistent behavior, in fsspec. As mentioned in https://github.com/iterative/dvc/issues/5961#issuecomment-1365822275, output.remove for s3fs and other async filesystems calls _expand_path. When the path doesn't exist and recursive=True, _expand_path raises FileNotFoundError. When recursive=False, it returns the path. It also returns the path for the LocalFileSystem regardless of whether recursive=True, so I'm not sure whether raising an error only in this specific scenario was intended.

    regression 
    opened by dberenbaum 1
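
The report above describes removal failing with FileNotFoundError when the output does not exist yet and the removal is recursive. One way to make such a cleanup step tolerant is to treat "not found" as "already removed". The sketch below is only an illustration under stated assumptions: `FakeRemoteFS` is a hypothetical in-memory stand-in that mimics the described behavior (it is not s3fs or fsspec), and `remove_if_present` is a hypothetical helper, not the fix adopted in DVC.

```python
class FakeRemoteFS:
    """Hypothetical stand-in: recursive rm raises if nothing matches the path."""

    def __init__(self, paths=()):
        self.paths = set(paths)

    def rm(self, path, recursive=False):
        prefix = path.rstrip("/") + "/"
        matches = {
            p for p in self.paths
            if p == path or (recursive and p.startswith(prefix))
        }
        if recursive and not matches:
            # Mirrors the behavior described in the report.
            raise FileNotFoundError(path)
        self.paths -= matches


def remove_if_present(fs, path):
    """Remove `path` recursively; return False if there was nothing to remove."""
    try:
        fs.rm(path, recursive=True)
    except FileNotFoundError:
        return False
    return True
```

Catching the exception rather than checking for existence first avoids a second round trip to the remote and a race between the check and the removal.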
  • update: inconsistency between `--no-download` and `--to-remote`


    Seems like the behavior between --no-download and --to-remote is inconsistent. We can fix it in this PR or follow up with another one. For --to-remote, the outs metadata is updated with the new info but the workspace remains untouched, while --no-download drops the outs metadata and deletes anything in the workspace.

    Originally posted by @dberenbaum in https://github.com/iterative/dvc/issues/8752#issuecomment-1369173097

    p3-nice-to-have A: data-sync 
    opened by dberenbaum 0
  • dvc add is stuck on Adding ... for ~20 hours


    I'm trying to version control my 210 GB of data, which contains 2.41M files. When I run

    dvc -v add data_clean/                                                        
    Adding...
    

    It has been stuck here for 20 hours. Is this supposed to happen?

    My DVC repository is on a GCE instance.

    Thanks

    awaiting response performance 
    opened by mehadi92 1
Releases(2.38.1)
  • 2.38.1(Dec 15, 2022)

  • 2.38.0(Dec 14, 2022)

    What's Changed

    🚀 New Features and Enhancements

    • exp: Generate a human-readable name beforehand. by @daavoo in https://github.com/iterative/dvc/pull/8659

    🐛 Bug Fixes

    • Reset all indices on the brancher iteration by @shcheklein in https://github.com/iterative/dvc/pull/8679

    🔨 Maintenance

    • build(deps-dev): Bump filelock from 3.8.0 to 3.8.2 by @dependabot in https://github.com/iterative/dvc/pull/8666
    • build(deps-dev): Bump pylint from 2.15.7 to 2.15.8 by @dependabot in https://github.com/iterative/dvc/pull/8661
    • build(deps-dev): Bump dvc-task from 0.1.6 to 0.1.8 by @dependabot in https://github.com/iterative/dvc/pull/8686

    Full Changelog: https://github.com/iterative/dvc/compare/2.37.0...2.38.0

    Source code(tar.gz)
    Source code(zip)
    dvc-2.38.0-1.x86_64.rpm(132.94 MB)
    dvc-2.38.0.exe(53.18 MB)
    dvc-2.38.0.pkg(103.07 MB)
    dvc_2.38.0_amd64.deb(133.89 MB)
  • 2.37.0(Dec 9, 2022)

    What's Changed

    🐛 Bug Fixes

    • worktree: fix default worktree remote/odb exception by @pmrowla in https://github.com/iterative/dvc/pull/8672

    🔨 Maintenance

    • deps: bump dvc-data to 0.28.4 by @pmrowla in https://github.com/iterative/dvc/pull/8674

    Other Changes

    • dvc update: support worktree update by @pmrowla in https://github.com/iterative/dvc/pull/8649
    • remote: disable gc/status for versioned remotes by @pmrowla in https://github.com/iterative/dvc/pull/8662
    • cloud versioning: push/fetch behavior cleanup by @pmrowla in https://github.com/iterative/dvc/pull/8667
    • push/fetch: cleanup cloud versioning CLI flags behavior by @pmrowla in https://github.com/iterative/dvc/pull/8673
    • deps: remove 3.11 checks for hydra; has 3.11 support now by @skshetry in https://github.com/iterative/dvc/pull/8677

    Full Changelog: https://github.com/iterative/dvc/compare/2.36.0...2.37.0

    Source code(tar.gz)
    Source code(zip)
    dvc-2.37.0-1.x86_64.rpm(132.91 MB)
    dvc-2.37.0.exe(53.16 MB)
    dvc-2.37.0.pkg(103.04 MB)
    dvc_2.37.0_amd64.deb(133.86 MB)
  • 2.36.0(Dec 1, 2022)

    What's Changed

    🚀 New Features and Enhancements

    • Solve the locking problem in temp and celery dir executor initialization. by @karajan1001 in https://github.com/iterative/dvc/pull/8623
    • exp: Expose baseline and name via run_env. by @daavoo in https://github.com/iterative/dvc/pull/8630
    • exp save: initial implementation by @daavoo in https://github.com/iterative/dvc/pull/8599
    • feat: top level params and metrics by @skshetry in https://github.com/iterative/dvc/pull/8529

    🐛 Bug Fixes

    • index: skip data index load on empty view by @pmrowla in https://github.com/iterative/dvc/pull/8632
    • Solve the unexpected error at the end of the queued tasks running by @karajan1001 in https://github.com/iterative/dvc/pull/8640
    • plots: fix multi-file plots by @dberenbaum in https://github.com/iterative/dvc/pull/8639
    • stage add: don't fail if unable to create .gitignore by @dberenbaum in https://github.com/iterative/dvc/pull/8644

    🔨 Maintenance

    • deps: add support for hdfs in Python 3.11 by @skshetry in https://github.com/iterative/dvc/pull/8627
    • exp list: cleanup and move logic inside repo api by @shcheklein in https://github.com/iterative/dvc/pull/8575
    • deps: bump dvc-data to 0.28.1 by @pmrowla in https://github.com/iterative/dvc/pull/8633
    • deps: bump dvc-data to 0.28.2 by @pmrowla in https://github.com/iterative/dvc/pull/8641
    • build(deps-dev): Bump pylint from 2.15.5 to 2.15.7 by @dependabot in https://github.com/iterative/dvc/pull/8643
    • deps: bump dvc-data to 0.28.3 by @pmrowla in https://github.com/iterative/dvc/pull/8648

    Other Changes

    • remote: separate worktree vs version_aware behavior by @pmrowla in https://github.com/iterative/dvc/pull/8634

    Full Changelog: https://github.com/iterative/dvc/compare/2.35.2...2.36.0

    Source code(tar.gz)
    Source code(zip)
    dvc-2.36.0-1.x86_64.rpm(132.87 MB)
    dvc-2.36.0.exe(53.14 MB)
    dvc-2.36.0.pkg(103.00 MB)
    dvc_2.36.0_amd64.deb(133.81 MB)
  • 2.35.2(Nov 24, 2022)

  • 2.35.0(Nov 23, 2022)

    What's Changed

    🚀 New Features and Enhancements

    • ui: Fix WSL check in open_browser by @daavoo in https://github.com/iterative/dvc/pull/8604

    🔨 Maintenance

    • build: fpm: don't create .build-id/* files by @efiop in https://github.com/iterative/dvc/pull/8611

    Other Changes

    • worktree push: do not push existing versions by @pmrowla in https://github.com/iterative/dvc/pull/8606
    • testing: api: test opening a file in subdir by @efiop in https://github.com/iterative/dvc/pull/8610

    Full Changelog: https://github.com/iterative/dvc/compare/2.34.3...2.35.0

    Source code(tar.gz)
    Source code(zip)
  • 2.34.3(Nov 22, 2022)

    What's Changed

    🐛 Bug Fixes

    • Fix exp list ref heads handling by @shcheklein in https://github.com/iterative/dvc/pull/8554
    • parsing: Escape str interpolation in dict unpacking. by @daavoo in https://github.com/iterative/dvc/pull/8204
    • hydra: Use OmegaConf.to_yaml for dumping .yaml output. by @daavoo in https://github.com/iterative/dvc/pull/8587
    • queue kill: we can manually mark problematic tasks as failure by @karajan1001 in https://github.com/iterative/dvc/pull/8580
    • Solve the wrong checkpoint tip info during executor running by @karajan1001 in https://github.com/iterative/dvc/pull/8596

    🔨 Maintenance

    • build(deps-dev): Bump dvc-render from 0.0.12 to 0.0.13 by @dependabot in https://github.com/iterative/dvc/pull/8568
    • build(deps-dev): Bump dvc-render from 0.0.13 to 0.0.14 by @dependabot in https://github.com/iterative/dvc/pull/8591
    • deps: bump dvc-data, dvc-azure by @pmrowla in https://github.com/iterative/dvc/pull/8594
    • deps: bump dvc-data to 0.28.0 by @pmrowla in https://github.com/iterative/dvc/pull/8605

    Other Changes

    • deps: bump dvc-data to 0.26.0 by @efiop in https://github.com/iterative/dvc/pull/8566
    • import-url: disable push by default for cloud-versioned imports by @pmrowla in https://github.com/iterative/dvc/pull/8578
    • plots: data conversion: adjust for viewer backend by @pared in https://github.com/iterative/dvc/pull/8421
    • worktree: support push: false by @pmrowla in https://github.com/iterative/dvc/pull/8581
    • worktree add: preserve version metadata for unmodified files on dvc add by @pmrowla in https://github.com/iterative/dvc/pull/8595
    • plots: set default x label by @dberenbaum in https://github.com/iterative/dvc/pull/8589

    Full Changelog: https://github.com/iterative/dvc/compare/2.34.2...2.34.3

    Source code(tar.gz)
    Source code(zip)
    dvc-2.34.3-1.x86_64.rpm(131.53 MB)
    dvc-2.34.3.exe(52.90 MB)
    dvc-2.34.3.pkg(102.05 MB)
    dvc_2.34.3_amd64.deb(132.42 MB)
  • 2.34.2(Nov 15, 2022)

    What's Changed

    🐛 Bug Fixes

    • hydra: Raise error when name and sweeps. by @daavoo in https://github.com/iterative/dvc/pull/8556
    • fetch/pull: fix regression when using targeted fetch in repo containing import-url imports by @pmrowla in https://github.com/iterative/dvc/pull/8551

    🔨 Maintenance

    • pyinstaller: use pydrive2 package hooks by @pmrowla in https://github.com/iterative/dvc/pull/8564

    Full Changelog: https://github.com/iterative/dvc/compare/2.34.1...2.34.2

    Source code(tar.gz)
    Source code(zip)
    dvc-2.34.2-1.x86_64.rpm(131.43 MB)
    dvc-2.34.2.exe(52.89 MB)
    dvc-2.34.2.pkg(101.96 MB)
    dvc_2.34.2_amd64.deb(132.33 MB)
  • 2.34.1(Nov 11, 2022)

    What's Changed

    🐛 Bug Fixes

    • Make exp show handle errors better by @karajan1001 in https://github.com/iterative/dvc/pull/8533
    • Solve the crash on getting name of applied experiment branch by @karajan1001 in https://github.com/iterative/dvc/pull/8541
    • Fix some celery queue related ci failure. by @karajan1001 in https://github.com/iterative/dvc/pull/8404

    🔨 Maintenance

    • index: support filtering view by output by @pmrowla in https://github.com/iterative/dvc/pull/8537
    • dvc exceptions CyclicGraphError: add more clear message for the excep… by @ykasimov in https://github.com/iterative/dvc/pull/8263
    • build(deps-dev): Bump dvc-task from 0.1.4 to 0.1.5 by @dependabot in https://github.com/iterative/dvc/pull/8539
    • build(deps-dev): Bump dvc-gs from 2.19.1 to 2.20.0 by @dependabot in https://github.com/iterative/dvc/pull/8548
    • build(deps-dev): Bump mypy from 0.982 to 0.990 by @dependabot in https://github.com/iterative/dvc/pull/8535
    • build(deps-dev): Bump iterative-telemetry from 0.0.5 to 0.0.6 by @dependabot in https://github.com/iterative/dvc/pull/8538

    Other Changes

    • plots: support svg by @blakeNaccarato in https://github.com/iterative/dvc/pull/8542

    New Contributors

    • @blakeNaccarato made their first contribution in https://github.com/iterative/dvc/pull/8542

    Full Changelog: https://github.com/iterative/dvc/compare/2.34.0...2.34.1

    Source code(tar.gz)
    Source code(zip)
    dvc-2.34.1-1.x86_64.rpm(122.11 MB)
    dvc-2.34.1.exe(49.20 MB)
    dvc-2.34.1.pkg(92.38 MB)
    dvc_2.34.1_amd64.deb(122.76 MB)
  • 2.34.0(Nov 7, 2022)

    What's Changed

    🔨 Maintenance

    • hydra: Raise lazy DvcException for Python >= 3.11 by @daavoo in https://github.com/iterative/dvc/pull/8521
    • build(deps-dev): Bump dvc-s3 from 2.20.1 to 2.21.0 by @dependabot in https://github.com/iterative/dvc/pull/8524

    Other Changes

    • plots: allow top-level strings by @dberenbaum in https://github.com/iterative/dvc/pull/8482
    • import-url: include files entry for cloud versioned dir dependencies by @pmrowla in https://github.com/iterative/dvc/pull/8528
    • ci: bench: use 3.11 in benchmarks by @skshetry in https://github.com/iterative/dvc/pull/8525
    • fix hydra_sweeps referenced before assignment by @dberenbaum in https://github.com/iterative/dvc/pull/8530
    • DVCLive 1.0 by @daavoo in https://github.com/iterative/dvc/pull/8532

    Full Changelog: https://github.com/iterative/dvc/compare/2.33.2...2.34.0

    Source code(tar.gz)
    Source code(zip)
    dvc-2.34.0-1.x86_64.rpm(122.91 MB)
    dvc-2.34.0.exe(49.19 MB)
    dvc-2.34.0.pkg(92.39 MB)
    dvc_2.34.0_amd64.deb(123.57 MB)
  • 2.33.2(Nov 3, 2022)

    What's Changed

    🐛 Bug Fixes

    • commit: skip changed_entries check on force commit by @pmrowla in https://github.com/iterative/dvc/pull/8505
    • exp run: catch hydra import in 3.11 by @pmrowla in https://github.com/iterative/dvc/pull/8519

    🔨 Maintenance

    • build(deps-dev): Bump pylint from 2.15.4 to 2.15.5 by @dependabot in https://github.com/iterative/dvc/pull/8463
    • build(deps-dev): Bump pytest from 7.1.3 to 7.2.0 by @dependabot in https://github.com/iterative/dvc/pull/8479
    • build(deps): Bump pyinstaller from 5.0 to 5.6.1 by @dependabot in https://github.com/iterative/dvc/pull/8475
    • build(deps-dev): Bump pytest-xdist from 2.5.0 to 3.0.2 by @dependabot in https://github.com/iterative/dvc/pull/8474
    • build(deps): Bump pyinstaller from 5.6.1 to 5.6.2 by @dependabot in https://github.com/iterative/dvc/pull/8499
    • build: bump pyinstaller packages python version to 3.10 by @skshetry in https://github.com/iterative/dvc/pull/8511
    • deps: bump scmrepo to 0.1.3 by @pmrowla in https://github.com/iterative/dvc/pull/8520

    New Contributors

    • @step-security-bot made their first contribution in https://github.com/iterative/dvc/pull/8496

    Full Changelog: https://github.com/iterative/dvc/compare/2.33.1...2.33.2

    Source code(tar.gz)
    Source code(zip)
    dvc-2.33.2-1.x86_64.rpm(122.92 MB)
    dvc-2.33.2.exe(49.15 MB)
    dvc-2.33.2.pkg(92.40 MB)
    dvc_2.33.2_amd64.deb(123.57 MB)
  • 2.33.1(Oct 31, 2022)

  • 2.33.0(Oct 30, 2022)

  • 2.32.1(Oct 29, 2022)

  • 2.32.0(Oct 29, 2022)

    What's Changed

    • Use celery status as the exp show status by @karajan1001 in https://github.com/iterative/dvc/pull/8369
    • index: support multiple targets within output in IndexView by @efiop in https://github.com/iterative/dvc/pull/8471
    • auto solve corrupted rwlock info by @karajan1001 in https://github.com/iterative/dvc/pull/8469

    Full Changelog: https://github.com/iterative/dvc/compare/2.31.0...2.32.0

    Source code(tar.gz)
    Source code(zip)
    dvc-2.32.0-1.x86_64.rpm(131.08 MB)
    dvc-2.32.0.exe(60.35 MB)
    dvc-2.32.0.pkg(101.38 MB)
    dvc_2.32.0_amd64.deb(131.95 MB)
  • 2.31.0(Oct 21, 2022)

  • 2.30.1(Oct 21, 2022)

    Refer to https://dvc.org/doc/install for installation instructions.

    Changes

    • import-url: use dvc-data index.save() for fetching imports (#8249) @pmrowla
    • [pre-commit.ci] pre-commit autoupdate (#8441) @pre-commit-ci
    • plots: allow definition of plots section as list (#8412) @dtrifiro
    • config: ssh: Add passphrase, ask_passphrase (#8143) @daavoo
    • index: add IndexView, brancher: support index (#8407) @pmrowla
    • ignore: walk: support detail=True (#8398) @efiop

    🚀 New Features and Enhancements

    • exp show: Preserve full branch and tag names. (#8425) @daavoo

    🏇 Optimizations

    • exp show: Use batch call on scm.describe (#8453) @karajan1001

    🐛 Bug Fixes

    • Give lock acquiring more time in concurrency situation. (#8436) @karajan1001
    • exp show: Preserve full branch and tag names. (#8425) @daavoo

    🔨 Maintenance

    • build(deps): Bump dvc-task from 0.1.3 to 0.1.4 (#8447) @dependabot
    • deps: bump dvc-data to 0.20.0 (#8443) @pmrowla
    • build(deps-dev): Bump pylint from 2.15.2 to 2.15.4 (#8424) @dependabot
    • build(deps): Bump dvc-data from 0.18.0 to 0.19.0 (#8442) @dependabot
    • build(deps-dev): Bump pytest-mock from 3.9.0 to 3.10.0 (#8402) @dependabot
    • deps: bump dvc-data to 0.18.0 (#8432) @pmrowla
    • [pre-commit.ci] pre-commit autoupdate (#8422) @pre-commit-ci

    Thanks again to @daavoo, @dependabot, @dependabot[bot], @dtrifiro, @efiop, @karajan1001, @pmrowla, @pre-commit-ci, @pre-commit-ci[bot] and @skshetry for the contributions! 🎉

    Source code(tar.gz)
    Source code(zip)
    dvc-2.30.1-1.x86_64.rpm(126.80 MB)
    dvc-2.30.1.exe(60.14 MB)
    dvc-2.30.1.pkg(101.28 MB)
    dvc_2.30.1_amd64.deb(127.60 MB)
  • 2.30.0(Oct 10, 2022)

    What's Changed

    • exp show: Add --hide-queued and --hide-failed flags by @karajan1001 in https://github.com/iterative/dvc/pull/8318
    • build(deps): Bump dvc-render from 0.0.11 to 0.0.12 by @dependabot in https://github.com/iterative/dvc/pull/8401
    • Refactor dvc get-url by @rlamy in https://github.com/iterative/dvc/pull/8410
    • deps: bump dvc-data to 0.17.1 by @pmrowla in https://github.com/iterative/dvc/pull/8416

    Full Changelog: https://github.com/iterative/dvc/compare/2.29.0...2.30.0

    Source code(tar.gz)
    Source code(zip)
    dvc-2.30.0-1.x86_64.rpm(126.83 MB)
    dvc-2.30.0.exe(60.18 MB)
    dvc-2.30.0.pkg(101.29 MB)
    dvc_2.30.0_amd64.deb(127.66 MB)
  • 2.29.0(Oct 4, 2022)

    Refer to https://dvc.org/doc/install for installation instructions.

    Changes

    • hydra: Fix append and remove sweeps. (#8381) @daavoo
    • Create basic version of dvc ls-url command (#8299) @rlamy
    • deps: bump dvc-data to 0.14.0 (#8389) @efiop
    • dvcfs tests: copy pytest param instead of in-place update (#8388) @skshetry
    • Rename dvc.testing.test_*.py (#8386) @rlamy
    • cli: remove foreach-group from help text (#8383) @dberenbaum
    • [pre-commit.ci] pre-commit autoupdate (#8367) @pre-commit-ci

    🐛 Bug Fixes

    • repo: fix crash while collecting stages with symlinks (#8364) @dtrifiro
    • import: fix rev lock and pull with --no-download (#8341) @dtrifiro
    • config: wrap UnicodeDecodeErrors on load (#8380) @pmrowla

    🔨 Maintenance

    • logger: init logging config before colorama (#8395) @pmrowla
    • build(deps-dev): Bump mypy from 0.981 to 0.982 (#8393) @dependabot
    • build(deps-dev): Bump mypy from 0.971 to 0.981 (#8368) @dependabot
    • config: wrap UnicodeDecodeErrors on load (#8380) @pmrowla
    • build(deps-dev): Bump pytest-mock from 3.8.2 to 3.9.0 (#8378) @dependabot
    • build(deps-dev): Bump pytest-cov from 3.0.0 to 4.0.0 (#8379) @dependabot
    • build(deps): Bump dvc-task from 0.1.2 to 0.1.3 (#8377) @dependabot

    Thanks again to @daavoo, @dberenbaum, @dependabot, @dependabot[bot], @dtrifiro, @efiop, @pmrowla, @pre-commit-ci, @pre-commit-ci[bot], @rlamy and @skshetry for the contributions! 🎉

    Source code(tar.gz)
    Source code(zip)
    dvc-2.29.0-1.x86_64.rpm(126.81 MB)
    dvc-2.29.0.exe(60.15 MB)
    dvc-2.29.0.pkg(101.28 MB)
    dvc_2.29.0_amd64.deb(127.63 MB)
  • 2.28.0(Sep 27, 2022)

    Refer to https://dvc.org/doc/install for installation instructions.

    Changes

    • vscode: support flexible plots (#8282) @pared
    • pull: hide glob option (#8337) @dberenbaum
    • deps: bump codespell (#8199) @pared
    • import/import-url: ignore outs when using --no-download (#8343) @dtrifiro
    • fixed link to "get started: pipelines" docs (#8340) @MartinoMensio

    🚀 New Features and Enhancements

    • exp show: sync state between queue and exp show table (#8158) @karajan1001
    • merge-driver: support removes and changes (#8360) @dberenbaum

    🐛 Bug Fixes

    • cloud-versioning: better handling for directories (#8362) @efiop
    • Solve the on_diverged function not executed error. (#8351) @karajan1001
    • hydra: Fix sweeps on Defaults List. (#8308) @daavoo

    🔨 Maintenance

    • build(deps): Bump dvc-data from 0.10.1 to 0.12.0 (#8346) @dependabot
    • deps: bump dvc-http to 2.27.2 (#8333) @dtrifiro
    • deps: bump dvc-data to 0.10.1 (#8330) @pmrowla

    Thanks again to @MartinoMensio, @daavoo, @dberenbaum, @dependabot, @dependabot[bot], @dtrifiro, @efiop, @karajan1001, @pared, @pmrowla and @skshetry for the contributions! 🎉

    Source code(tar.gz)
    Source code(zip)
    dvc-2.28.0-1.x86_64.rpm(125.80 MB)
    dvc-2.28.0.exe(59.45 MB)
    dvc-2.28.0.pkg(100.24 MB)
    dvc_2.28.0_amd64.deb(126.62 MB)
  • 2.27.2(Sep 19, 2022)

  • 2.27.1(Sep 19, 2022)

  • 2.27.0(Sep 19, 2022)

    Refer to https://dvc.org/doc/install for installation instructions.

    Changes

    • remove mergify (#8319) @skshetry
    • deps: add testing group for dvc.testing requirements (#8314) @dtrifiro
    • deps: bump dvc-data to 0.10.0 (#8313) @efiop
    • dvcfs: rename DvcFileSystem to DVCFileSystem (#8307) @skshetry
    • dvcfs: prevent opening file object in write mode (#8306) @skshetry

    🔨 Maintenance

    • analytics: use iterative-telemetry for user_id lookup (#8317) @efiop
    • deps: bump dvc-azure to 2.20.4 (#8305) @pmrowla
    • build(deps): Bump dvc-render from 0.0.10 to 0.0.11 (#8303) @dependabot

    Thanks again to @dependabot, @dependabot[bot], @dtrifiro, @efiop, @pmrowla and @skshetry for the contributions! 🎉

    Source code(tar.gz)
    Source code(zip)
    dvc-2.27.0-1.x86_64.rpm(125.51 MB)
    dvc-2.27.0.exe(59.39 MB)
    dvc-2.27.0.pkg(99.99 MB)
    dvc_2.27.0_amd64.deb(126.31 MB)
  • 2.26.2(Sep 15, 2022)

  • 2.26.1(Sep 15, 2022)

  • 2.26.0(Sep 15, 2022)

    What's Changed

    • expose dvcfs in dvc.api and add to fsspec's registry by @skshetry in https://github.com/iterative/dvc/pull/8287
    • import-url: pass fs_config down from imp_url to get_cloud_fs by @dtrifiro in https://github.com/iterative/dvc/pull/8286
    • deps: remove unused mock dep by @dtrifiro in https://github.com/iterative/dvc/pull/8290
    • dvcfs: default open to binary mode by @skshetry in https://github.com/iterative/dvc/pull/8295
    • worktree push/fetch: support dirs by @pmrowla in https://github.com/iterative/dvc/pull/8273
    • schema: add strict schema validation for top-level plots by @skshetry in https://github.com/iterative/dvc/pull/8289

    Full Changelog: https://github.com/iterative/dvc/compare/2.25.0...2.26.0

    Source code(tar.gz)
    Source code(zip)
    dvc-2.26.0-1.x86_64.rpm(125.52 MB)
    dvc-2.26.0.exe(59.39 MB)
    dvc-2.26.0.pkg(100.01 MB)
    dvc_2.26.0_amd64.deb(126.32 MB)
  • 2.25.0(Sep 13, 2022)

    Refer to https://dvc.org/doc/install for installation instructions.

    Changes

    • dvc: cloud versioning POC (#8264) @efiop
    • typo in setup config causing versioning errors in poetry (#8229) @jlhbaseball15
    • tests: set celery ping_task_timeout 3x the default (#8221) @skshetry

    🚀 New Features and Enhancements

    • exp run: Support hydra basic sweeper. (#8187) @daavoo
    • data ls: new command to show metadata with outputs (#8252) @skshetry
    • dvcfs: remove config (#8276) @skshetry
    • add metadata support to dvc.yaml (#8251) @skshetry
    • Add support for custom metadata (#8250) @skshetry
    • add metadata fields: label, type to data (#8232) @skshetry
    • add support for foreach target (#8210) @skshetry
    • output: support version ID (#8223) @pmrowla
    • add support for git credentials helpers (#6586, scmrepo#138) @dtrifiro

    🏇 Optimizations

    • Optimise dvc ls -R (#8241) @rlamy

    🐛 Bug Fixes

    • dvc.yaml: preserve outputs' desc on rewrites/updates to the stage (#8247) @skshetry
    • import: fix broken auth https://github.com/iterative/dvc/issues/7898

    🔨 Maintenance

    • build(deps-dev): Bump pylint from 2.15.0 to 2.15.2 (#8268) @dependabot
    • build(deps): Bump scmrepo from 0.1.0 to 0.1.1 (#8269) @dependabot
    • build(deps): Bump dvc-render from 0.0.9 to 0.0.10 (#8254) @dependabot
    • build(deps): Bump dvc-data from 0.4.0 to 0.5.3 (#8237) @dependabot
    • build(deps-dev): Bump pytest from 7.1.2 to 7.1.3 (#8239) @dependabot
    • build(deps-dev): Bump dvc-azure from 2.20.0 to 2.20.2 (#8240) @dependabot
    • deps: bump dvc-data to 0.7.1 (#8266) @efiop
    • deps: bump dvc-data to 0.6.3 (#8257) @efiop
    • deps: bump dvc-azure and dvc-s3 to 2.20.0 (#8224) @efiop

    Thanks again to @daavoo, @dependabot, @dependabot[bot], @dtrifiro, @efiop, @jlhbaseball15, @pmrowla, @rlamy and @skshetry for the contributions! 🎉

    Source code(tar.gz)
    Source code(zip)
    dvc-2.25.0-1.x86_64.rpm(125.50 MB)
    dvc-2.25.0.exe(59.39 MB)
    dvc-2.25.0.pkg(99.98 MB)
    dvc_2.25.0_amd64.deb(126.31 MB)
  • 2.24.0(Sep 1, 2022)

    Refer to https://dvc.org/doc/install for installation instructions.

    Changes

    • deps: bump dvc-data to 0.4.0 by @efiop in https://github.com/iterative/dvc/pull/8219 and https://github.com/iterative/dvc/pull/8213

    🚀 New Features and Enhancements

    • exp run: Support composing and dumping Hydra config. by @daavoo in https://github.com/iterative/dvc/pull/8093

    🐛 Bug Fixes

    • data status: fix path for committed changes in Windows by @skshetry in https://github.com/iterative/dvc/pull/8220

    Full Changelog: https://github.com/iterative/dvc/compare/2.23.0...2.24.0

    Source code(tar.gz)
    Source code(zip)
    dvc-2.24.0-1.x86_64.rpm(125.46 MB)
    dvc-2.24.0.exe(59.34 MB)
    dvc-2.24.0.pkg(99.92 MB)
    dvc_2.24.0_amd64.deb(126.26 MB)
  • 2.23.0(Aug 30, 2022)

    Refer to https://dvc.org/doc/install for installation instructions.

    Changes

    • output.get_obj: catch ObjectCorruptedError (#8212) @skshetry
    • data status: fix quoting on command hints for untracked files (#8211) @skshetry
    • fetch: do not checkout partial imports (#8205) @dtrifiro
    • data status: update hints to include fetch and checkout (#8209) @dberenbaum
    • test on 3.11 (#8196) @skshetry
    • plots: support dirs in top level definitions (#8159) @pared
    • repo: Handle no commits for exp show and plots diff. (#8177) @daavoo
    • plots templates: change ui to not dump to file (#8129) @dberenbaum

    🚀 New Features and Enhancements

    • info: Include subprojects. (#8201) @daavoo
    • data status: remove --withdirs, show unknowns in CLI (#8189) @skshetry
    • api: Add details for params_show stages syntax. (#8167) @daavoo
    • Better error message when specifying file as target for remove (#8044) @alexmojaki

    Thanks again to @alexmojaki, @daavoo, @dberenbaum, @dtrifiro, @pared, @pre-commit-ci[bot] and @skshetry for the contributions! 🎉

    Source code(tar.gz)
    Source code(zip)
    dvc-2.23.0-1.x86_64.rpm(125.02 MB)
    dvc-2.23.0.exe(59.08 MB)
    dvc-2.23.0.pkg(99.51 MB)
    dvc_2.23.0_amd64.deb(125.81 MB)
Owner
Iterative
Developer Tools for Machine Learning
A simple version control system built on top of Git

Gitless Gitless is a version control system built on top of Git, that is easy to learn and use: Simple commit workflow Track or untrack files to contr

Gitless 1.7k Dec 22, 2022
ViewVC is a browser interface for CVS and Subversion version control repositories.

ViewVC - Version Control Browser Interface ViewVC is a browser interface for CVS and Subversion version control repositories. It generates templatized

ViewVC 270 Dec 30, 2022
The new home of rabbitvcs

RabbitVCS RabbitVCS is a set of graphical tools written to provide simple and straightforward access to the version control systems you use. We curren

RabbitVCS 349 Dec 05, 2022
Patchwork is a web-based patch tracking system designed to facilitate the contribution and management of contributions to an open-source project.

Patchwork Patchwork is a patch tracking system for community-based projects. It is intended to make the patch management process easier for both the p

Patchwork 220 Nov 29, 2022
git-cola: The highly caffeinated Git GUI

git-cola: The highly caffeinated Git GUI git-cola is a powerful Git GUI with a slick and intuitive user interface. Copyright (C) 2007-2020, David Agu

git-cola 2k Dec 30, 2022
Trac is an enhanced wiki and issue tracking system for software development projects (mirror)

About Trac Trac is a minimalistic web-based software project management and bug/issue tracking system. It provides an interface to the Git and Subvers

Edgewall Software 442 Dec 10, 2022
🦉Data Version Control | Git for Data & Models

Website • Docs • Blog • Twitter • Chat (Community & Support) • Tutorial • Mailing List Data Version Control or DVC is an open-source tool for data sci

Iterative 10.9k Jan 09, 2023
Mirror of Apache Allura

Apache Allura Allura is an open source implementation of a software "forge", a web site that manages source code repositories, bug reports, discussion

The Apache Software Foundation 106 Dec 21, 2022
docker run klaus / pip install klaus - the first Git web viewer that Just Works™.

klaus: a simple, easy-to-set-up Git web viewer that Just Works™. (If it doesn't Just Work for you, please file a bug.) Super easy to set up -- no conf

Jonas Haag 638 Dec 24, 2022