Installing pyDVL¶
To install the latest release, use:
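```shell
pip install pyDVL
```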
To use all features of influence functions, use instead:
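```shell
# the "influence" extra pulls in PyTorch
pip install "pyDVL[influence]"
```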
This includes a dependency on PyTorch (version 2.0 and above), which is why it is left out by default.
If you have a supported version of CUDA installed (v11.2 to v11.8 as of this writing), you can enable eigenvalue computations for low-rank approximations with CuPy on the GPU by installing:
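```shell
# the name of the optional dependency group is assumed to be "cupy"
pip install "pyDVL[cupy]"
```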
If you use a different version of CUDA, please install CuPy manually.
To check the installation, you can use:
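```shell
python -c "import pydvl; print(pydvl.__version__)"
```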
You can also install the latest development version from TestPyPI:
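```shell
# dependencies are not hosted on TestPyPI, so you may also need
# --extra-index-url https://pypi.org/simple/
pip install pyDVL --index-url https://test.pypi.org/simple/
```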
Dependencies¶
pyDVL requires Python >= 3.8, Memcached for caching, and Ray for parallelization in a cluster (locally it uses joblib). Additionally, the influence functions module requires PyTorch (see Installing pyDVL).
ray is used to distribute workloads across nodes in a cluster (it can be used locally as well, but for this we recommend joblib instead). Please follow the instructions in their documentation to set up the cluster. Once you have a running cluster, you can use it by passing the address of the head node to parallel methods via ParallelConfig.
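A minimal sketch of this, assuming the 0.x import path `pydvl.utils.config` and the `backend`/`address` parameter names:

```python
from pydvl.utils.config import ParallelConfig  # import path may differ across versions

# Point pyDVL's parallel backend at an existing Ray cluster.
# "ray://<head-node-ip>:10001" is a placeholder for your head node's address.
config = ParallelConfig(backend="ray", address="ray://<head-node-ip>:10001")
```

The resulting config object can then be passed to the parallel computation methods that accept it.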
Setting up the cache¶
memcached is an in-memory key-value store accessible over the network. pyDVL uses it to cache computations of the utility function and to speed up some methods (in particular, semi-value computations with the PermutationSampler, but other methods may benefit as well).
You can either install it as a package or run it inside a docker container (the simplest option). For installation instructions, refer to the Getting started section in memcached's wiki. Then you can run it with:
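```shell
# runs in the foreground on the default port 11211; -u sets the user to run as
memcached -u user
```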
To run memcached inside a container in daemon mode instead, do:
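```shell
# "pydvl-cache" is just an illustrative container name
docker container run --rm -d -p 11211:11211 --name pydvl-cache memcached:latest
```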
Using the cache
Continue reading about the cache in the First Steps and the documentation for the caching module.