Getting started

If you want to jump straight in, install pyDVL and then check out the examples. You will probably want to install with support for influence function computation.

We have introductions to the ideas behind Data valuation and Influence functions, as well as a short overview of common applications.

Installing pyDVL

To install the latest release use:

pip install pyDVL

See Extras for optional dependencies, in particular if you are interested in influence functions. You can also install the latest development version from TestPyPI:

pip install pyDVL --index-url https://test.pypi.org/simple/

To check the installation, run:

python -c "import pydvl; print(pydvl.__version__)"
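
If you prefer a check that does not fail with a traceback when the package is missing, the following minimal sketch may help. It only uses the standard library module importlib.metadata; the helper name installed_version is hypothetical and not part of pyDVL.

```python
from importlib import metadata


def installed_version(dist_name="pyDVL"):
    """Return the installed version of a distribution, or None if it is absent.

    `installed_version` is a hypothetical helper for illustration only.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
if version is None:
    print("pyDVL is not installed in this environment")
else:
    print(f"pyDVL {version} is installed")
```

Unlike the one-liner above, this reads the installed package metadata and does not import pydvl itself, so it will not reveal import-time problems.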

Dependencies

pyDVL requires Python >= 3.8, numpy, scikit-learn, scipy, and cvxpy for the core methods, and joblib for local parallelization. Additionally, the influence functions module requires PyTorch (see Extras below).
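
As a rough way to see which of these requirements are already satisfied in an environment, one could run a sketch like the one below. It checks the Python version and whether each core dependency can be found, without importing it. The helper name missing_modules is hypothetical; note that scikit-learn is imported under the name sklearn.

```python
import sys
from importlib.util import find_spec

# Import names of the core dependencies listed above.
# scikit-learn is installed as "scikit-learn" but imported as "sklearn".
CORE_MODULES = ["numpy", "sklearn", "scipy", "cvxpy", "joblib"]


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found.

    `missing_modules` is a hypothetical helper for illustration only.
    """
    return [name for name in names if find_spec(name) is None]


python_ok = sys.version_info >= (3, 8)
missing = missing_modules(CORE_MODULES)

print("Python >= 3.8:", python_ok)
print("Missing core modules:", missing if missing else "none")
```

pip resolves these dependencies automatically when installing pyDVL, so this is mainly useful for diagnosing a broken or partially set up environment.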

Extras

pyDVL has a few extra dependencies that can be optionally installed:

Influence functions

To use the module on influence functions, pydvl.influence, run:

pip install pyDVL[influence]

This extra depends on PyTorch (version 2.0 or later), which is why it is left out by default.
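
To confirm that an existing PyTorch installation satisfies this requirement, a sketch along these lines could be used. It reads the installed version from package metadata rather than importing torch. The helper names major_version and torch_meets_requirement are hypothetical, and the parsing assumes version strings that start with an integer major version, such as 2.1.0 or 2.1.0+cu118.

```python
from importlib import metadata


def major_version(version_string):
    """Extract the integer major version from a string like '2.1.0+cu118'."""
    return int(version_string.split(".")[0])


def torch_meets_requirement(min_major=2):
    """Return True/False if torch is installed, or None if it is not.

    Hypothetical helper: compares only the major version against `min_major`.
    """
    try:
        installed = metadata.version("torch")
    except metadata.PackageNotFoundError:
        return None
    return major_version(installed) >= min_major


status = torch_meets_requirement()
if status is None:
    print("PyTorch is not installed; try: pip install pyDVL[influence]")
elif status:
    print("PyTorch satisfies the >= 2.0 requirement")
else:
    print("PyTorch is older than 2.0")
```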

CuPy

If you have a supported version of CUDA installed (v11.2 to 11.8 as of this writing), you can enable eigenvalue computations for low-rank approximations on the GPU with CuPy by using:

pip install pyDVL[cupy]

This installs cupy-cuda11x.

If you use a different version of CUDA, please install CuPy manually.

Ray

To use Ray to distribute data valuation workloads across the nodes of a cluster, install pyDVL as shown here. Ray can also be used locally, but for that we recommend joblib instead.

pip install pyDVL[ray]

See the intro to parallelization for more details on how to use it.

Memcached

If you want to use Memcached for caching utility evaluations, use:

pip install pyDVL[memcached]

This additionally installs pymemcache. Be aware that you still have to start a memcached server manually; see Setting up the Memcached cache.
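
Because the server is started separately, it can be handy to verify that something is listening before running a long computation. The sketch below is an assumption-laden illustration and uses no pyDVL API: it assumes the server runs on localhost at port 11211 (memcached's default port), and the helper name memcached_reachable is hypothetical. It only tests that a TCP connection can be opened, not that the listener is actually memcached.

```python
import socket


def memcached_reachable(host="localhost", port=11211, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper: 11211 is memcached's default port; adjust it if your
    server is configured differently. A successful connection does not prove
    that the listener is a memcached server.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if memcached_reachable():
    print("A server is accepting connections on localhost:11211")
else:
    print("Nothing is listening on localhost:11211; start memcached first")
```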