Methods

We currently implement the following methods:

Data valuation¶

\(\delta\)-Shapley (Watson et al., 2023)¹
Beta Shapley (Kwon and Zou, 2022)².
Class-Wise Shapley (Schoch et al., 2022)³.
Data Banzhaf and MSR sampling (Wang and Jia, 2023)⁴.
Data Utility Learning (Wang et al., 2022)⁵.
Data-OOB (Kwon and Zou, 2023)⁶.
Group Testing Shapley (Jia et al., 2019)⁷
kNN-Shapley, exact only (Jia et al., 2019)⁸.
Least Core (Yan and Procaccia, 2021)⁹.
Leave-One-Out values.
Owen Shapley (Okhrati and Lipani, 2021)¹⁰.
Permutation Shapley, also called ApproxShapley (Castro et al., 2009)¹¹.
Truncated Monte Carlo Shapley (Ghorbani and Zou, 2019)¹².
Variance-Reduced Data Shapley (Wu et al., 2023)¹³.

Influence functions¶

CG Influence (Koh and Liang, 2017)¹⁴.
Direct Influence (Koh and Liang, 2017)¹⁴.
LiSSA (Agarwal et al., 2017)¹⁵.
Arnoldi Influence (Schioppa et al., 2022)¹⁶.
EKFAC Influence (George et al., 2018; Martens and Grosse, 2015)¹⁷ ¹⁸.
Nyström Influence, based on the ideas in (Hataya and Yamada, 2023)¹⁹ for bi-level optimization.
Inverse-harmonic-mean Influence (Kwon et al., 2023)²⁰.

Watson, L., Kujawa, Z., Andreeva, R., Yang, H.-T., Elahi, T., Sarkar, R., 2023. Accelerated Shapley Value Approximation for Data Evaluation [WWW Document]. https://doi.org/10.48550/arXiv.2311.05346 ↩
Kwon, Y., Zou, J., 2022. Beta Shapley: A Unified and [Noise-reduced Data Valuation Framework]{.nocase} for Machine Learning, in: Proceedings of the 25th International Conference on Artificial Intelligence and Statistics (AISTATS) 2022,. Presented at the AISTATS 2022, PMLR, Valencia, Spain. ↩
Schoch, S., Xu, H., Ji, Y., 2022. CS-Shapley: [Class-wise Shapley Values]{.nocase} for Data Valuation in Classification, in: Proc. Of the Thirty-Sixth Conference on Neural Information Processing Systems (NeurIPS). Presented at the Advances in Neural Information Processing Systems (NeurIPS 2022), New Orleans, Louisiana, USA. ↩
Wang, J.T., Jia, R., 2023. Data Banzhaf: A Robust Data Valuation Framework for Machine Learning, in: Proceedings of The 26th International Conference on Artificial Intelligence and Statistics. Presented at the International Conference on Artificial Intelligence and Statistics, PMLR, pp. 6388--6421. ↩
Wang, T., Yang, Y., Jia, R., 2022. Improving [Cooperative Game Theory-based Data Valuation]{.nocase} via Data Utility Learning. Presented at the International Conference on Learning Representations (ICLR 2022). Workshop on Socially Responsible Machine Learning, arXiv. https://doi.org/10.48550/arXiv.2107.06336 ↩
Kwon, Y., Zou, J., 2023. Data-OOB: [Out-of-bag Estimate]{.nocase} as a Simple and Efficient Data Value, in: Proceedings of the 40th International Conference on Machine Learning. Presented at the International Conference on Machine Learning, PMLR, pp. 18135--18152. ↩
Jia, R., Dao, D., Wang, B., Hubis, F.A., Hynes, N., Gürel, N.M., Li, B., Zhang, C., Song, D., Spanos, C.J., 2019. Towards Efficient Data Valuation Based on the Shapley Value, in: Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics. Presented at the International Conference on Artificial Intelligence and Statistics (AISTATS), PMLR, pp. 1167--1176. ↩
Jia, R., Dao, D., Wang, B., Hubis, F.A., Gurel, N.M., Li, B., Zhang, C., Spanos, C., Song, D., 2019. Efficient task-specific data valuation for nearest neighbor algorithms. Proc. VLDB Endow. 12, 1610--1623. https://doi.org/10.14778/3342263.3342637 ↩
Yan, T., Procaccia, A.D., 2021. If You Like Shapley Then You'll Love the Core, in: Proceedings of the 35th AAAI Conference on Artificial Intelligence. Presented at the AAAI Conference on Artificial Intelligence, Association for the Advancement of Artificial Intelligence, Virtual conference, pp. 5751--5759. https://doi.org/10.1609/aaai.v35i6.16721 ↩
Okhrati, R., Lipani, A., 2021. A Multilinear Sampling Algorithm to Estimate Shapley Values, in: 2020 25th International Conference on Pattern Recognition (ICPR). Presented at the 2020 25th International Conference on Pattern Recognition (ICPR), IEEE, pp. 7992--7999. https://doi.org/10.1109/ICPR48806.2021.9412511 ↩
Castro, J., Gómez, D., Tejada, J., 2009. Polynomial calculation of the Shapley value based on sampling. Computers\ & Operations Research, Selected papers presented at the Tenth International Symposium on Locational Decisions (ISOLDE X) 36, 1726--1730. https://doi.org/10.1016/j.cor.2008.04.004 ↩
Ghorbani, A., Zou, J., 2019. Data Shapley: Equitable Valuation of Data for Machine Learning, in: Proceedings of the 36th International Conference on Machine Learning, PMLR. Presented at the International Conference on Machine Learning (ICML 2019), PMLR, pp. 2242--2251. ↩
Wu, M., Jia, R., Lin, C., Huang, W., Chang, X., 2023. Variance reduced Shapley value estimation for trustworthy data valuation. Computers\ & Operations Research 159, 106305. https://doi.org/10.1016/j.cor.2023.106305 ↩
Koh, P.W., Liang, P., 2017. Understanding [Black-box Predictions]{.nocase} via Influence Functions, in: Proceedings of the 34th International Conference on Machine Learning. Presented at the International Conference on Machine Learning, PMLR, pp. 1885--1894. ↩↩
Agarwal, N., Bullins, B., Hazan, E., 2017. Second-Order Stochastic Optimization for Machine Learning in Linear Time. JMLR 18, 1--40. ↩
Schioppa, A., Zablotskaia, P., Vilar, D., Sokolov, A., 2022. Scaling Up Influence Functions. Proc. AAAI Conf. Artif. Intell. 36, 8179--8186. https://doi.org/10.1609/aaai.v36i8.20791 ↩
George, T., Laurent, C., Bouthillier, X., Ballas, N., Vincent, P., 2018. Fast Approximate Natural Gradient Descent in a Kronecker Factored Eigenbasis, in: Advances in Neural Information Processing Systems. Curran Associates, Inc. ↩
Martens, J., Grosse, R., 2015. Optimizing Neural Networks with [Kronecker-factored Approximate Curvature]{.nocase}, in: Proceedings of the 32nd International Conference on Machine Learning. Presented at the International Conference on Machine Learning, PMLR, pp. 2408--2417. ↩
Hataya, R., Yamada, M., 2023. Nyström Method for Accurate and Scalable Implicit Differentiation, in: Proceedings of The 26th International Conference on Artificial Intelligence and Statistics. Presented at the International Conference on Artificial Intelligence and Statistics, PMLR, pp. 4643--4654. ↩
Kwon, Y., Wu, E., Wu, K., Zou, J., 2023. DataInf: Efficiently Estimating Data Influence in [LoRA-tuned LLMs]{.nocase} and Diffusion Models. Presented at the The Twelfth International Conference on Learning Representations. https://doi.org/10.48550/arXiv.2310.00902 ↩