Changelog
0.7.0 - 📚🆕 Documentation and IF overhaul, new methods and bug fixes 💥🐞
This is our first β release! We have worked hard to deliver improvements across
the board, with a focus on documentation and usability. We have also reworked
the internals of the influence module, improved parallelism and handling of
randomness.
Added
- Implemented solving the Hessian equation via spectral low-rank approximation PR #365
- Enabled parallel computation for Leave-One-Out values PR #406
- Added more abbreviations to documentation PR #415
- Added seed to functions from pydvl.utils.numeric, pydvl.value.shapley and pydvl.value.semivalues. Introduced new type Seed and conversion function ensure_seed_sequence. PR #396
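To picture why Leave-One-Out values parallelise well, here is a minimal sketch using only the Python standard library. It is not pyDVL's API: `utility`, `loo_value`, `parallel_loo` and the toy label data are hypothetical names and assumptions made for illustration, and a thread pool stands in for whatever parallel backend is configured. The LOO value of point i is u(D) - u(D without i), and each value depends only on its own index, so the tasks are independent.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy dataset: index -> label (an assumption made for this illustration).
LABELS = {0: "a", 1: "a", 2: "b", 3: "c"}


def utility(indices: frozenset) -> float:
    """Hypothetical utility: number of distinct labels covered by the subset."""
    return float(len({LABELS[i] for i in indices}))


def loo_value(i: int, full: frozenset, full_utility: float) -> float:
    """Leave-One-Out value of point i: u(D) - u(D without i)."""
    return full_utility - utility(full - {i})


def parallel_loo(n_workers: int = 2) -> dict:
    """Compute all LOO values, submitting one independent task per data point."""
    full = frozenset(LABELS)
    full_utility = utility(full)  # evaluated once and shared by all tasks
    order = sorted(full)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        values = pool.map(lambda i: loo_value(i, full, full_utility), order)
        return dict(zip(order, values))


loo = parallel_loo()
print(loo)
```

In this toy setting the two points sharing label "a" are redundant and get value 0, while the only carriers of "b" and "c" each get value 1.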
Changed
- Replaced sphinx with mkdocs for documentation. Major overhaul of documentation PR #352
- Made ray an optional dependency, relying on joblib as default parallel backend PR #408
- Decoupled ray.init from ParallelConfig PR #373
- Breaking Changes
  - Signature change: return information about Hessian inversion from compute_influence_factors PR #375
  - Major changes to IF interface and functionality. Foundation for a framework abstraction for IF computation. PR #278 PR #394
  - Renamed semivalues to compute_generic_semivalues PR #413
  - New joblib backend as default instead of ray. Simplify MapReduceJob. PR #355
  - Bump torch dependency for influence package to 2.0 PR #365
Fixed
- Fixes to parallel computation of generic semi-values: properly handle all samplers and stopping criteria, irrespective of parallel backend. PR #372
- Optimises memory usage in IF calculation PR #375
- Fix adding valuation results with overlapping indices and different lengths PR #370
- Fixed bugs in conjugate gradient and linear_solve PR #358
- Fix installation of dev requirements for Python 3.10 PR #382
- Improvements to IF documentation PR #371
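For readers unfamiliar with the conjugate gradient method named in this list, the following is a minimal pure-Python sketch of the textbook algorithm. It is not the library's implementation; the function name `conjugate_gradient` and its parameters are hypothetical. The method solves A x = b for a symmetric positive definite A using only matrix-vector products, which is why it suits influence computations where only Hessian-vector products are available.

```python
def conjugate_gradient(matvec, b, tol=1e-10, max_iter=100):
    """Solve A x = b for symmetric positive definite A, given matvec: x -> A x."""
    x = [0.0] * len(b)
    r = list(b)  # residual b - A x, starting from x = 0
    p = list(r)  # initial search direction
    rs_old = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:  # residual norm small enough: converged
            break
        Ap = matvec(p)
        alpha = rs_old / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = sum(ri * ri for ri in r)
        # New direction is conjugate to the previous ones with respect to A.
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Small symmetric positive definite system used as a toy example.
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
solution = conjugate_gradient(
    lambda v: [sum(a * vi for a, vi in zip(row, v)) for row in A], b
)
print(solution)
```

The exact solution of this 2x2 system is (1/11, 7/11); in exact arithmetic the method reaches it in at most two iterations.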
0.6.1 - 🏗 Bug fixes and small improvements
- Fix parsing keyword arguments of compute_semivalues dispatch function PR #333
- Create new RayExecutor class based on the concurrent.futures API, use the new class to fix an issue with Truncated Monte Carlo Shapley (TMCS) starting too many processes and dying, plus other small changes PR #329
- Fix creation of GroupedDataset objects using the from_arrays and from_sklearn class methods PR #324
- Fix release job not triggering on CI when a new tag is pushed PR #331
- Added alias ApproShapley from Castro et al. 2009 for permutation Shapley PR #332
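As a reminder of what permutation Shapley computes, here is a minimal standard-library sketch of Monte Carlo permutation sampling. It is an illustration under stated assumptions, not pyDVL's code: `permutation_shapley`, `additive` and `WEIGHTS` are hypothetical names. Each sampled permutation is scanned once, and every player is credited with the marginal gain it brings when added to the players preceding it.

```python
import random


def permutation_shapley(n, utility, n_permutations=200, seed=0):
    """Estimate Shapley values of players 0..n-1 by sampling permutations."""
    rng = random.Random(seed)
    values = [0.0] * n
    players = list(range(n))
    for _ in range(n_permutations):
        rng.shuffle(players)
        coalition = set()
        previous = utility(frozenset(coalition))
        for i in players:
            coalition.add(i)
            current = utility(frozenset(coalition))
            values[i] += current - previous  # marginal contribution of i
            previous = current
    return [v / n_permutations for v in values]


# Toy additive game (an assumption): the utility of a set is the sum of weights.
WEIGHTS = [1.0, 2.0, 3.0]


def additive(s):
    return sum(WEIGHTS[i] for i in s)


estimates = permutation_shapley(3, additive)
print(estimates)
```

For an additive game every marginal contribution equals the player's own weight, so the estimates coincide with the weights regardless of which permutations are drawn. For general games the estimates still sum to the utility of the full set minus that of the empty set, because the marginals telescope within each permutation.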
0.6.0 - 🆕 New algorithms, cleanup and bug fixes 🏗
- Fixes in ValuationResult: bugs around data names, semantics of empty(), new method zeros() and normalised random values PR #327
- New method: Implements generalised semi-values for data valuation, including Data Banzhaf and Beta Shapley, with configurable sampling strategies PR #319
- Adds kwargs parameter to from_array and from_sklearn Dataset and GroupedDataset class methods PR #316
- PEP-561 conformance: added py.typed PR #307
- Removed default non-negativity constraint on least core subsidy and added instead a non_negative_subsidy boolean flag. Renamed options to solver_options and pass it as dict. Change default least-core solver to SCS with 10000 max_iters. PR #304
- Cleanup: removed unnecessary decorator @unpackable PR #233
- Stopping criteria: fixed problem with StandardError and enable proper composition of index convergence statuses. Fixed a bug with n_jobs in truncated_montecarlo_shapley. PR #300 and PR #305
- Shuffling code around to allow for simpler user imports, some cleanup and documentation fixes. PR #284
- Bug fix: Warn instead of raising an error when n_iterations is less than the size of the dataset in Monte Carlo Least Core PR #281
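The generalised semi-values mentioned in this release share one structure: a weighted sum of marginal contributions, where the weight depends only on the size of the subset. The sketch below computes them exactly on a tiny game with the standard Shapley and Banzhaf coefficients. All names (`semivalue`, `shapley_coefficient`, `banzhaf_coefficient`, `gloves`) are hypothetical and the code is an illustration, not the library's sampling-based implementation; Beta Shapley would plug in a different coefficient function.

```python
from itertools import combinations
from math import comb


def semivalue(n, utility, coefficient):
    """Exact semi-value: size-weighted sum of marginal contributions."""
    values = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):  # sizes 0 .. n-1 of subsets not containing i
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += coefficient(n, k) * (utility(s | {i}) - utility(s))
        values.append(total)
    return values


def shapley_coefficient(n, k):
    """Shapley weight: uniform over sizes, then uniform over subsets of a size."""
    return 1.0 / (n * comb(n - 1, k))


def banzhaf_coefficient(n, k):
    """Banzhaf weight: uniform over all subsets not containing the player."""
    return 1.0 / 2 ** (n - 1)


def gloves(s):
    """Toy game (assumption): players 0 and 1 hold left gloves, player 2 a right one."""
    return float(min(len(s & {0, 1}), len(s & {2})))


shapley = semivalue(3, gloves, shapley_coefficient)
banzhaf = semivalue(3, gloves, banzhaf_coefficient)
print(shapley, banzhaf)
```

For this game the Shapley values are 1/6, 1/6 and 2/3, while the Banzhaf values are 1/4, 1/4 and 3/4. Only the coefficient function differs between the two calls.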
0.5.0 - 💥 Fixes, nicer interfaces and... more breaking changes 😒
- Fixed parallel and antithetic Owen sampling for Shapley values. Simplified and extended tests. PR #267
- Added Scorer class for a cleaner interface. Fixed minor bugs around Group-Testing Shapley, added more tests and switched to cvxpy for the solver. PR #264
- Generalised stopping criteria for valuation algorithms. Improved classes ValuationResult and Status with more operations. Some minor issues fixed. PR #252
- Fixed a bug whereby compute_shapley_values would only spawn one process when using n_jobs=-1 and Monte Carlo methods. PR #270
- Bugfix in RayParallelBackend: wrong semantics for kwargs. PR #268
- Splitting of problem preparation and solution in Least-Core computation. Umbrella function for LC methods. PR #257
- Operations on ValuationResult and Status and some cleanup PR #248
- Bug fix and minor improvements: Fixes bug in TMCS with remote Ray cluster, raises an error for dummy sequential parallel backend with TMCS, clones model inside Utility before fitting by default, with flag clone_before_fit to disable it, catches all warnings in Utility when show_warnings is False. Adds Miner and Gloves toy games utilities PR #247
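The Owen sampling referred to in this release can be sketched as follows, using only the standard library. This is a simplified illustration of the idea as commonly described, not the library's code, and `owen_shapley`, its parameters and the toy game are hypothetical. The Shapley value of a player is written as an integral over q in [0, 1] of the expected marginal contribution to a random subset that includes every other player independently with probability q. The antithetic variant assumed here restricts q to [0, 0.5] and pairs each sampled subset with its complement. The integral is approximated by a plain average over a uniform grid, which is a crude quadrature chosen for brevity.

```python
import random


def owen_shapley(n, utility, n_q=20, n_samples=20, antithetic=True, seed=0):
    """Estimate Shapley values by sampling subsets with inclusion probability q."""
    rng = random.Random(seed)
    q_max = 0.5 if antithetic else 1.0
    values = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for step in range(n_q + 1):
            q = q_max * step / n_q
            for _ in range(n_samples):
                # Each other player joins the subset independently with prob. q.
                s = frozenset(j for j in others if rng.random() < q)
                marginal = utility(s | {i}) - utility(s)
                if antithetic:
                    # The complement behaves like a sample drawn with prob. 1 - q.
                    c = frozenset(others) - s
                    marginal = 0.5 * (marginal + utility(c | {i}) - utility(c))
                total += marginal
        values.append(total / ((n_q + 1) * n_samples))
    return values


# Toy additive game (an assumption): the utility of a set is the sum of weights.
WEIGHTS = [1.0, 2.0, 3.0]
owen = owen_shapley(3, lambda s: sum(WEIGHTS[i] for i in s))
print(owen)
```

On an additive game every marginal contribution equals the player's weight, so the estimate is exact; on other games the accuracy depends on `n_q` and `n_samples`.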
0.4.0 - 🏭💥 New algorithms and more breaking changes
- GH action to mark issues as stale PR #201
- Disabled caching of Utility values as well as repeated evaluations by default PR #211
- Test and officially support Python version 3.9 and 3.10 PR #208
- Breaking change: Introduces a class ValuationResult to gather and inspect results from all valuation algorithms PR #214
- Fixes bug in Influence calculation with multidimensional input and adds new example notebook PR #195
- Breaking change: Passes the input to MapReduceJob at initialization, removes chunkify_inputs argument from MapReduceJob, removes n_runs argument from MapReduceJob, calls the parallel backend's put() method for each generated chunk in _chunkify(), renames ParallelConfig's num_workers attribute to n_local_workers, fixes a bug in MapReduceJob's chunkification when n_runs >= n_jobs, and defines a sequential parallel backend to run all jobs in the current thread PR #232
- New method: Implements exact and Monte Carlo Least Core for data valuation, adds from_arrays() class method to the Dataset and GroupedDataset classes, adds extra_values argument to ValuationResult, adds compute_removal_score() and compute_random_removal_score() helper functions PR #237
- New method: Group Testing Shapley for valuation, from Jia et al. 2019 PR #240
- Fixes bug in ray initialization in RayParallelBackend class PR #239
- Implements "Egalitarian Least Core", adds cvxpy as a dependency and uses it instead of scipy as optimizer PR #243
0.3.0 - 💥 Breaking changes
- Simplified and fixed powerset sampling and testing PR #181
- Simplified and fixed publishing to PyPI from CI PR #183
- Fixed bug in release script and updated contributing docs. PR #184
- Added Pull Request template PR #185
- Modified Pull Request template to automatically link PR to issue PR #186
- First implementation of Owen Sampling, squashed scores, better testing PR #194
- Improved documentation on caching, Shapley, caveats of values, bibtex PR #194
- Breaking change: Rearranging of modules to accommodate for new methods PR #194
0.2.0 - 📚 Better docs
Mostly API documentation and notebooks, plus some bugfixes.
Added
In PR #161:
- Support for $$ math in sphinx docs.
- Usage of sphinx extension for external links (introducing new directives like
:gh:, :issue: and :tfl: to construct standardised links to external
resources).
- Only update auto-generated documentation files if there are changes. Some
minor additions to update_docs.py.
- Parallelization of exact combinatorial Shapley.
- Integrated KNN shapley into the main interface compute_shapley_values.
Changed
In PR #161:
- Improved main docs and Shapley notebooks. Added or fixed many docstrings, readme and documentation for contributors. Typos, grammar and style in code, documentation and notebooks.
- Internal renaming and rearranging in the parallelization and caching modules.
Fixed
- Bug in random matrix generation PR #161.
- Bugs in MapReduceJob's _chunkify and _backpressure methods PR #176.
0.1.0 - 🎉 first release
This is the very first release of pyDVL.
It contains:
- Data Valuation Methods:
  - Leave-One-Out
  - Influence Functions
  - Shapley:
    - Exact Permutation and Combinatorial
    - Monte Carlo Permutation and Combinatorial
    - Truncated Monte Carlo Permutation
- Caching of results with Memcached
- Parallelization of computations with Ray
- Documentation
- Notebooks containing examples of different use cases
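Caching matters for the valuation methods listed here because they evaluate the utility on many subsets, and the same subset can come up repeatedly. The sketch below shows the idea with an in-process cache from the standard library standing in for Memcached. `cached_utility` and the call counter are hypothetical names, and the square-root utility is a placeholder for an expensive model fit.

```python
from functools import lru_cache

# Counter used only to show how often the expensive function really runs.
CALLS = {"count": 0}


@lru_cache(maxsize=None)
def cached_utility(indices: frozenset) -> float:
    """Placeholder utility; a real one would fit and score a model on the subset."""
    CALLS["count"] += 1
    return float(len(indices)) ** 0.5


# The same subset, built in a different order, hashes to the same cache key.
first = cached_utility(frozenset({0, 1, 2, 3}))
second = cached_utility(frozenset({3, 2, 1, 0}))
print(first, second, CALLS["count"])
```

Keying the cache on a frozenset makes the lookup independent of the order in which indices are listed, so the second call is served from the cache and the utility body runs only once.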
Created: 2021-04-01