Methods
We currently implement the following methods:
Data valuation¶
-
LOO.
-
Permutation Shapley (also called ApproxShapley) (Castro et al., 2009)1.
-
Data Banzhaf [@wang_data_2022].
-
Beta Shapley (Kwon and Zou, 2022)3.
-
CS-Shapley (Schoch et al., 2022)4.
-
Least Core (Yan and Procaccia, 2021)5.
-
Owen Sampling (Okhrati and Lipani, 2021)6.
-
Data Utility Learning (Wang et al., 2022)7.
-
kNN-Shapley (Jia et al., 2019)8.
-
Group Testing (Jia et al., 2019)9
Influence functions¶
-
CG Influence. (Koh and Liang, 2017)11.
-
Direct Influence (Koh and Liang, 2017)11.
-
Arnoldi Influence (Schioppa et al., 2022)13.
-
EKFAC Influence (George et al., 2018; Martens and Grosse, 2015)1415.
-
Nyström Influence, based on the ideas in (Hataya and Yamada, 2023)16 for bi-level optimization.
-
Castro, J., Gómez, D., Tejada, J., 2009. Polynomial calculation of the Shapley value based on sampling. Computers & Operations Research, Selected papers presented at the Tenth International Symposium on Locational Decisions (ISOLDE X) 36, 1726–1730. https://doi.org/10.1016/j.cor.2008.04.004 ↩
-
Ghorbani, A., Zou, J., 2019. Data Shapley: Equitable Valuation of Data for Machine Learning, in: Proceedings of the 36th International Conference on Machine Learning, PMLR. Presented at the International Conference on Machine Learning (ICML 2019), PMLR, pp. 2242–2251. ↩
-
Kwon, Y., Zou, J., 2022. Beta Shapley: A Unified and Noise-reduced Data Valuation Framework for Machine Learning, in: Proceedings of the 25th International Conference on Artificial Intelligence and Statistics (AISTATS) 2022,. Presented at the AISTATS 2022, PMLR. ↩
-
Schoch, S., Xu, H., Ji, Y., 2022. CS-Shapley: Class-wise Shapley Values for Data Valuation in Classification, in: Proc. Of the Thirty-Sixth Conference on Neural Information Processing Systems (NeurIPS). Presented at the Advances in Neural Information Processing Systems (NeurIPS 2022). ↩
-
Yan, T., Procaccia, A.D., 2021. If You Like Shapley Then You’ll Love the Core, in: Proceedings of the 35th AAAI Conference on Artificial Intelligence, 2021. Presented at the AAAI Conference on Artificial Intelligence, Association for the Advancement of Artificial Intelligence, pp. 5751–5759. https://doi.org/10.1609/aaai.v35i6.16721 ↩
-
Okhrati, R., Lipani, A., 2021. A Multilinear Sampling Algorithm to Estimate Shapley Values, in: 2020 25th International Conference on Pattern Recognition (ICPR). Presented at the 2020 25th International Conference on Pattern Recognition (ICPR), IEEE, pp. 7992–7999. https://doi.org/10.1109/ICPR48806.2021.9412511 ↩
-
Wang, T., Yang, Y., Jia, R., 2022. Improving Cooperative Game Theory-based Data Valuation via Data Utility Learning. Presented at the International Conference on Learning Representations (ICLR 2022). Workshop on Socially Responsible Machine Learning, arXiv. https://doi.org/10.48550/arXiv.2107.06336 ↩
-
Jia, R., Dao, D., Wang, B., Hubis, F.A., Gurel, N.M., Li, B., Zhang, C., Spanos, C., Song, D., 2019. Efficient task-specific data valuation for nearest neighbor algorithms. Proc. VLDB Endow. 12, 1610–1623. https://doi.org/10.14778/3342263.3342637 ↩
-
Jia, R., Dao, D., Wang, B., Hubis, F.A., Hynes, N., Gürel, N.M., Li, B., Zhang, C., Song, D., Spanos, C.J., 2019. Towards Efficient Data Valuation Based on the Shapley Value, in: Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics. Presented at the International Conference on Artificial Intelligence and Statistics (AISTATS), PMLR, pp. 1167–1176. ↩
-
Kwon, Y., Zou, J., 2023. Data-OOB: Out-of-bag Estimate as a Simple and Efficient Data Value, in: Proceedings of the 40th International Conference on Machine Learning. Presented at the International Conference on Machine Learning, PMLR, pp. 18135–18152. ↩
-
Koh, P.W., Liang, P., 2017. Understanding Black-box Predictions via Influence Functions, in: Proceedings of the 34th International Conference on Machine Learning. Presented at the International Conference on Machine Learning, PMLR, pp. 1885–1894. ↩↩
-
Agarwal, N., Bullins, B., Hazan, E., 2017. Second-Order Stochastic Optimization for Machine Learning in Linear Time. JMLR 18, 1–40. ↩
-
Schioppa, A., Zablotskaia, P., Vilar, D., Sokolov, A., 2022. Scaling Up Influence Functions. Proc. AAAI Conf. Artif. Intell. 36, 8179–8186. https://doi.org/10.1609/aaai.v36i8.20791 ↩
-
George, T., Laurent, C., Bouthillier, X., Ballas, N., Vincent, P., 2018. Fast Approximate Natural Gradient Descent in a Kronecker Factored Eigenbasis, in: Advances in Neural Information Processing Systems. Curran Associates, Inc. ↩
-
Martens, J., Grosse, R., 2015. Optimizing Neural Networks with Kronecker-factored Approximate Curvature, in: Proceedings of the 32nd International Conference on Machine Learning. Presented at the International Conference on Machine Learning, PMLR, pp. 2408–2417. ↩
-
Hataya, R., Yamada, M., 2023. Nyström Method for Accurate and Scalable Implicit Differentiation, in: Proceedings of The 26th International Conference on Artificial Intelligence and Statistics. Presented at the International Conference on Artificial Intelligence and Statistics, PMLR, pp. 4643–4654. ↩