Parameterisation¶
In the next phase of our example, we parameterise the processing and create the
following content in the params.yaml file for this purpose:
featurize:
max_features: 100
ngrams: 1
To read the parameters, -p featurize.max_features,featurize.ngrams is added
to the dvc stage command, in our example:
$ uv run dvc stage add \
-n featurize \
-p featurize.max_features,featurize.ngrams \
-d src/dvc_example/featurization.py -d data/prepared \
-o data/features \
uv run python src/dvc_example/featurization.py data/prepared data/features
This adds the following to the dvc.yaml file:
featurize:
cmd: uv run python src/dvc_example/featurization.py data/prepared
data/features
deps:
- data/prepared
- src/dvc_example/featurization.py
params:
- featurize.max_features
- featurize.ngrams
outs:
- data/features
To enable this phase to be repeated, the MD5 hash and parameter values are
stored in the dvc.lock file:
featurize:
cmd: uv run python src/dvc_example/featurization.py data/prepared
data/features
deps:
- path: data/prepared
hash: md5
md5: 153aad06d376b6595932470e459ef42a.dir
size: 8437363
nfiles: 2
- path: src/dvc_example/featurization.py
hash: md5
md5: e22789fc9581cad11ef7a6fa3aa3f17b
size: 4158
params:
params.yaml:
featurize.max_features: 100
featurize.ngrams: 1
outs:
- path: data/features
hash: md5
md5: 820664b8b793837e74ea3a5d334eb85c.dir
size: 1556292
nfiles: 2
Finally, data/.gitignore, dvc.lock, dvc.yaml,
params.yaml and src/dvc_example/featurization.py must be added
to the Git repository:
$ git add data/.gitignore dvc.lock dvc.yaml src/dvc_example/featurization.py
See also