Reproduce

To reproduce the results of a project, we first clone the repository and then pull the data managed with DVC:

$ git clone https://github.com/veit/dvc-example.git
$ cd dvc-example
$ dvc pull -TR
A       data/data.xml
1 file added
$ ls data/
data.xml    data.xml.dvc

The results can then be reproduced with dvc repro:

$ dvc repro
Verifying data sources in stage: 'data/data.xml.dvc'
Stage 'split' didn't change, skipping
Stage 'featurize' didn't change, skipping
Stage 'train' didn't change, skipping
Stage 'evaluate' didn't change, skipping

You can now, for example, change parameters in the params.yaml file and then run through the pipeline again:

$ dvc repro
Stage 'data/data.xml.dvc' didn't change, skipping
Stage 'split' didn't change, skipping
Running stage 'featurize' with command:
    python src/featurization.py data/splitted data/features

Stage 'train' didn't change, skipping
Stage 'evaluate' didn't change, skipping
To track the changes with git, run:
    git add dvc.lock

In our case, only the featurize stage was run again. The train and evaluate stages were skipped, so changing the parameters had no effect on the result.
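
As an illustration of the kind of parameter change described above, the following sketch edits one value in a scratch copy of a params.yaml file. The file contents and the parameter names (featurize, max_features, ngrams) are assumptions made up for this example; the project's real params.yaml may look different.

```shell
set -eu

# Work in a scratch directory so that no real project file is touched.
workdir=$(mktemp -d)
params="$workdir/params.yaml"

# Hypothetical params.yaml; names and values are assumptions for illustration.
cat > "$params" <<'EOF'
featurize:
  max_features: 100
  ngrams: 1
EOF

# Change a single value (assumed parameter: max_features) and replace the file.
sed 's/max_features: 100/max_features: 200/' "$params" > "$params.new"
mv "$params.new" "$params"

# Show the edited file.
cat "$params"
```

In the real project, the edit would be made in the repository's own params.yaml, followed by another dvc repro.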

Note

DVC recognises changes to dependencies and outputs via md5 hash values in dvc.lock.
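
The idea behind hash-based change detection can be sketched in a few lines of shell. This is not DVC's implementation, only a minimal illustration of the principle: it assumes that md5sum from GNU coreutils is available, and the file name data.txt and the helper file_md5 are made up for the example.

```shell
set -eu

# Scratch directory with a made-up data file.
workdir=$(mktemp -d)
datafile="$workdir/data.txt"
printf 'first version\n' > "$datafile"

# Hypothetical helper: print only the md5 hash of a file.
file_md5() {
    md5sum "$1" | cut -d ' ' -f 1
}

# Record the hash once, similar in spirit to an md5 entry in a lock file.
recorded=$(file_md5 "$datafile")

# Unchanged content yields the same hash, so a rerun could be skipped.
if [ "$(file_md5 "$datafile")" = "$recorded" ]; then
    status_before="unchanged"
else
    status_before="changed"
fi

# Modified content yields a different hash, so a rerun would be needed.
printf 'second version\n' > "$datafile"
if [ "$(file_md5 "$datafile")" = "$recorded" ]; then
    status_after="unchanged"
else
    status_after="changed"
fi

echo "before edit: $status_before"
echo "after edit:  $status_after"
```

The same comparison explains the output above: a stage whose dependencies still match their recorded hashes is reported as unchanged and skipped.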