
First steps

Info

Make sure you have read Getting started before using the library. In particular, read about which extra dependencies you may need.

Main concepts

pyDVL aims to be a repository of production-ready, reference implementations of algorithms for data valuation and influence functions. Read the following sections to get started:

Supported frameworks

  • The module for influence functions is built around PyTorch. Because we rely on the stateless torch.func API, we do not yet support jitted modules (see #640).

  • Up until v0.10.0, pyDVL only supported NumPy arrays for data valuation. From version 0.10.1 onwards, the library also supports PyTorch tensors for most valuation methods. Where possible, the implementation attempts to preserve the input data type of the Dataset throughout computations.

Note that some features have specific requirements or limitations when using tensors. For details on tensor support and its caveats, see the section on tensor support.
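For readers unfamiliar with the stateless torch.func API mentioned in the first bullet above, the following is a minimal sketch of what it looks like. It is not pyDVL's own code: the model, loss and random data are made-up placeholders. The idea is that parameters are passed to the model explicitly as a dictionary via `functional_call`, which lets `torch.func.grad` differentiate a loss with respect to them as an ordinary function argument.

```python
import torch
from torch import nn
from torch.func import functional_call, grad

# Placeholder model: its parameters are stored inside the module as usual.
model = nn.Linear(3, 1)

# Copy the parameters into a plain dict so they can be passed explicitly.
params = {name: p.detach() for name, p in model.named_parameters()}


def loss_fn(params, x, y):
    # functional_call runs the module's forward pass with the given
    # parameters instead of the ones stored on the module ("stateless" call).
    prediction = functional_call(model, params, (x,))
    return nn.functional.mse_loss(prediction, y)


# Placeholder data: 5 samples with 3 features each.
x = torch.randn(5, 3)
y = torch.randn(5, 1)

# grad differentiates loss_fn with respect to its first argument (params)
# and returns a dict with the same keys and shapes as params.
grads = grad(loss_fn)(params, x, y)
for name, g in grads.items():
    print(name, tuple(g.shape))
```

Running this prints the gradient shapes for `weight` and `bias`, which match the shapes of the corresponding parameters of the linear layer.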

Running the examples

If you are somewhat familiar with the concepts of data valuation, you can start by browsing our worked-out examples illustrating pyDVL's capabilities in any of the following ways:

  • In the examples under Basics of data valuation and Computing Influence Values.
  • Using binder notebooks, deployed from each example's page.
  • Locally, by starting a Jupyter server at the root of the project. You will first have to install Jupyter manually, since it is not a dependency of the library.

Advanced usage

Refer to the Advanced usage page for an explanation of how to enable and use parallelization and caching.