Create project

A new project can be created and DVC initialised in it with:

$ uv init --package dvc-example
$ cd dvc-example
$ git init
$ git add --all
$ git commit -m ':tada: Initial commit'
$ uv add dvc
$ uv run dvc init
$ git add pyproject.toml .dvc .dvcignore
$ git commit -m ":heavy_plus_sign: Add and initialise DVC"

uv run dvc init

creates a .dvc/ directory containing the config file, a .gitignore file and the cache/ directory.

The first time you run dvc init, you will be informed that DVC collects and transmits anonymised usage statistics. If you want to disable this, you can do so with the command dvc config:

$ uv run dvc config core.analytics false

This will disable it for the project. Alternatively, you can use the --global or --system options of dvc config to disable analytics for the active account or for all accounts in the system.
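The three scopes can be wrapped in a small shell function. This is only a sketch: disable_dvc_analytics is a hypothetical helper name, not part of DVC, and it assumes that uv and DVC are available as in the commands above.

```shell
# Hypothetical helper (not part of DVC): disable analytics for one scope.
# Usage: disable_dvc_analytics [project|global|system]
disable_dvc_analytics() {
    scope="${1:-project}"
    case "$scope" in
        project) flag="" ;;          # only this project (.dvc/config)
        global)  flag="--global" ;;  # the active user account
        system)  flag="--system" ;;  # all accounts; may require elevated rights
        *)
            echo "unknown scope: $scope" >&2
            return 2
            ;;
    esac
    # $flag is intentionally unquoted so that an empty value adds no argument.
    uv run dvc config $flag core.analytics false
}
```

Calling disable_dvc_analytics global, for example, runs the same command as above with the --global option added.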

git add pyproject.toml .dvc .dvcignore

places .dvc/config, .dvc/.gitignore, .dvcignore and the updated pyproject.toml under Git version control.

Configure remote storage

Before using DVC, remote storage should be set up. It must be accessible to everyone who needs the data or models, much like a Git server is for code. Often, however, the remote storage is simply an NFS mount, which can be integrated as follows, for example:

$ mkdir ~/dvc-storage
$ uv run dvc remote add -d local ~/dvc-storage
Setting 'local' as a default remote.
$ git commit .dvc/config -m ":wrench: Configure local remote"
[main 3e0c8fb] :wrench: Configure local remote
 1 file changed, 4 insertions(+)

-d, --default

Sets this remote storage as the default for the project

local

Name of remote storage space

~/dvc-storage

URL of remote storage space

Other protocols are also supported and can be prefixed to the path, including ssh:, hdfs:, https:.

This means that another remote data storage location can easily be added, for example with:

$ uv run dvc remote add webserver https://dvc.cusy.io/dvc-example

The corresponding configuration file .dvc/config then looks like this:

[core]
    remote = local
['remote "local"']
    url = /Users/veit/dvc-storage
['remote "webserver"']
    url = https://dvc.cusy.io/dvc-example
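DVC itself can show this information with dvc remote list and dvc remote default. If the file is to be read without calling DVC, standard tools are enough. The following is a minimal sketch that assumes the simple layout shown above (one key = value pair per line, no spaces inside values); dvc_default_remote and dvc_remote_url are hypothetical helper names.

```shell
# Hypothetical helpers; they assume the simple layout of .dvc/config shown above.

# Print the value of `remote` in the [core] section.
# Usage: dvc_default_remote [CONFIG_FILE]
dvc_default_remote() {
    awk '
        /^\[/ { in_core = ($0 == "[core]"); next }
        in_core && $1 == "remote" && $2 == "=" { print $3 }
    ' "${1:-.dvc/config}"
}

# Print the url of the remote named NAME.
# Usage: dvc_remote_url NAME [CONFIG_FILE]
dvc_remote_url() {
    awk -v header="['remote \"$1\"']" '
        /^\[/ { in_remote = ($0 == header); next }
        in_remote && $1 == "url" && $2 == "=" { print $3 }
    ' "${2:-.dvc/config}"
}
```

Run from the project root, dvc_default_remote prints local for the configuration above, and dvc_remote_url local prints /Users/veit/dvc-storage.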

Configure pre-commit

You can check the data managed by DVC with the pre-commit framework before every git commit and git push, as well as after every git checkout. With dvc install --use-pre-commit-tool, the following hooks are added to the .pre-commit-config.yaml file:

- repo: https://github.com/iterative/dvc
  rev: 3.63.0
  hooks:
  - id: dvc-pre-commit
    additional_dependencies:
    - .[all]
    language_version: python3
    stages:
    - pre-commit
  - id: dvc-pre-push
    additional_dependencies:
    - .[all]
    language_version: python3
    stages:
    - pre-push
  - id: dvc-post-checkout
    additional_dependencies:
    - .[all]
    language_version: python3
    stages:
    - post-checkout
    always_run: true

By default, pre-commit install only sets up the pre-commit hook, so the pre-push and post-checkout hooks must be installed explicitly as well:

$ pre-commit install --hook-type pre-commit --hook-type pre-push --hook-type post-checkout
pre-commit installed at .git/hooks/pre-commit
pre-commit installed at .git/hooks/pre-push
pre-commit installed at .git/hooks/post-checkout
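Whether all three hooks were installed can be checked by looking for the files named in the output above. A minimal sketch, assuming a regular Git repository with a .git/ directory; check_dvc_hooks is a hypothetical helper name.

```shell
# Hypothetical helper: report which of the three hook scripts exist.
# Usage: check_dvc_hooks [REPO_DIR]
# Returns non-zero if at least one hook is missing.
check_dvc_hooks() {
    repo="${1:-.}"
    missing=0
    for hook in pre-commit pre-push post-checkout; do
        if [ -f "$repo/.git/hooks/$hook" ]; then
            echo "ok: $hook"
        else
            echo "missing: $hook" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Called without an argument from the project root, the function checks the current repository. It only tests that the files exist, not that they were written by pre-commit.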