Getting started

If you want to jump straight in, install pyDVL and then check out the examples. You will probably want to install with support for influence function computation.

We have introductions to the ideas behind Data valuation and Influence functions, as well as a short overview of common applications.

Installing pyDVL

To install the latest release use:

pip install pyDVL

See Extras for optional dependencies, in particular if you are interested in influence functions. You can also install the latest development version from TestPyPI:

pip install pyDVL --index-url https://test.pypi.org/simple/

To check the installation, run:

python -c "import pydvl; print(pydvl.__version__)"
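The same check can be done from a script without importing the package, which is convenient in setup or CI code that must not crash when pyDVL is absent. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of pyDVL.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    This reads the package metadata written by pip, so the package itself
    is never imported.
    """
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("pyDVL")
    if found is None:
        print("pyDVL is not installed in this environment")
    else:
        print(f"pyDVL {found}")
```

Returning `None` instead of raising lets the caller decide whether a missing installation is an error.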

Dependencies

pyDVL requires Python >= 3.9, numpy, scikit-learn, scipy and cvxpy for the core methods, and joblib for local parallelization. Additionally, the Influence functions module requires PyTorch (see Extras below).
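To see which of these requirements an environment already satisfies, one can probe for the modules without importing them. This is only a sketch: the helper names are hypothetical, and the tuple lists import names rather than PyPI names (scikit-learn is imported as `sklearn`).

```python
import sys
from importlib.util import find_spec

# Import names of the core dependencies listed above.
CORE_IMPORT_NAMES = ("numpy", "sklearn", "scipy", "cvxpy", "joblib")


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


def python_is_supported(version_info=sys.version_info, minimum=(3, 9)):
    """Compare (major, minor) of the interpreter against the minimum."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    print("Python version supported:", python_is_supported())
    print("Missing core modules:", missing_modules(CORE_IMPORT_NAMES))
```

Installing pyDVL with pip normally pulls in these dependencies, so the check is mostly useful for diagnosing a broken or partially provisioned environment.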

Extras

pyDVL has a few extra dependencies that can be optionally installed:

Influence functions

PyTorch dependency

While only pydvl.influence completely depends on PyTorch, some valuation methods in pydvl.valuation use PyTorch as well (e.g. DeepSets). If you want to use these, you can also follow the instructions below.

To use the module on influence functions, pydvl.influence, run:

pip install pyDVL[influence]

This extra depends on PyTorch (version 2.0 or later), which is why it is left out by default.
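Code that uses influence functions only optionally may want to verify the PyTorch requirement before importing `pydvl.influence`. The sketch below assumes that the installed distribution is named `torch` and that its version string starts with `major.minor` (possibly followed by a local suffix such as `+cu118`). Both helper names are hypothetical.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def meets_minimum(version_string, minimum=(2, 0)):
    """Check the leading 'major.minor' of a version string against a minimum."""
    match = re.match(r"(\d+)\.(\d+)", version_string)
    if match is None:
        return False
    return (int(match.group(1)), int(match.group(2))) >= minimum


def torch_is_usable():
    """True if a torch distribution is installed with version >= 2.0."""
    try:
        return meets_minimum(version("torch"))
    except PackageNotFoundError:
        return False


if __name__ == "__main__":
    print("PyTorch >= 2.0 available:", torch_is_usable())
```

Reading the package metadata avoids the cost of importing PyTorch just to inspect its version.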

CuPy

If you have a supported version of CUDA installed (v11.2 to 11.8 as of this writing), you can enable GPU eigenvalue computations for low-rank approximations with CuPy by using:

pip install pyDVL[cupy]

This installs cupy-cuda11x.

If you use a different version of CUDA, please install CuPy manually.

Ray

Ray can distribute data valuation workloads across the nodes of a cluster. It can be used locally as well, but for that we recommend joblib instead. To install pyDVL with Ray support, use:

pip install pyDVL[ray]

See the intro to parallelization for more details on how to use it.

Memcached

If you want to use Memcached for caching utility evaluations, use:

pip install pyDVL[memcached]

This additionally installs pymemcache. Be aware that you still have to start a memcached server manually. See Setting up the Memcached cache.
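Because the server is started separately, it can help to confirm that something is listening before enabling the cache. The sketch below only tests TCP reachability with the standard library. It assumes the server runs on `localhost` at memcached's default port 11211, and it does not verify that the listener actually speaks the memcached protocol. The helper name is hypothetical.

```python
import socket


def server_is_reachable(host="localhost", port=11211, timeout=1.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False


if __name__ == "__main__":
    if server_is_reachable():
        print("A server is listening on localhost:11211")
    else:
        print("No server reachable on localhost:11211; start memcached first")
```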