pydvl.utils.array
This module contains utility functions for working with arrays in a type-agnostic way. It provides a consistent interface for operations on both NumPy arrays and PyTorch tensors.
The functions in this module are designed to:
- Detect array types automatically (numpy.ndarray or torch.Tensor)
- Perform operations using the appropriate library
- Preserve the input type in the output, except for functions intended to operate on indices, which always return NDArrays for convenience.
- Minimize unnecessary type conversions
Some examples:
import numpy as np
import torch
from pydvl.utils.array import array_concatenate, is_tensor
# Sample inputs, one pair per library
x_np, zeros_np = np.array([1.0, 2.0]), np.zeros(2)
x_torch, zeros_torch = torch.tensor([1.0, 2.0]), torch.zeros(2)
# Type checking
is_tensor(x_torch)  # Returns True
is_tensor(x_np)  # Returns False
# Operations preserve types
result = array_concatenate([x_np, zeros_np])  # Returns numpy.ndarray
result = array_concatenate([x_torch, zeros_torch])  # Returns torch.Tensor
The module uses a TypeVar ArrayT to ensure type preservation across functions, allowing for proper static type checking with both array types.
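As a rough illustration of how a TypeVar can express type preservation, the sketch below defines its own `ArrayT` and a helper `first_rows`. Both are hypothetical and not taken from PyDVL's source; the real `ArrayT` may well be constrained to the two array types. A static type checker infers the return type from the argument type.

```python
from typing import TypeVar

import numpy as np

# Hypothetical stand-in for the module's ArrayT (assumption: the real
# definition may restrict the variable to numpy.ndarray and torch.Tensor).
ArrayT = TypeVar("ArrayT")


def first_rows(x: ArrayT, n: int) -> ArrayT:
    """Return the first n rows; slicing keeps the container type."""
    return x[:n]  # type: ignore[index]


rows = first_rows(np.arange(6).reshape(3, 2), 2)
print(type(rows).__name__, rows.shape)
```

Passing a torch.Tensor to the same helper would yield a torch.Tensor, because slicing is supported by both libraries.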
Array
Bases: Protocol[DT]
Protocol defining a common interface for NumPy arrays and PyTorch tensors.
This protocol defines the essential methods and properties required for array-like operations in PyDVL. It serves as a structural type for both numpy.ndarray and torch.Tensor, enabling type-safe generic functions that work with either type.
The generic parameter DT represents the data type of the array elements.
Type Preservation
Functions that accept Array types will generally preserve the input type in their outputs. For example, if you pass a torch.Tensor, you'll get a torch.Tensor back; if you pass a numpy.ndarray, you'll get a numpy.ndarray back.
Warning
This is a "best-effort" implementation that covers the methods and properties needed by PyDVL, but it is not a complete representation of all functionality in NumPy and PyTorch arrays.
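To make the idea of a structural type concrete, here is a minimal sketch of a protocol with only two members. The name `ArrayLike` and the helper `n_rows` are hypothetical; the actual Array protocol declares more methods and properties than shown here.

```python
from typing import Protocol, Tuple, runtime_checkable

import numpy as np


@runtime_checkable
class ArrayLike(Protocol):
    """Hypothetical, much smaller cousin of the Array protocol."""

    @property
    def shape(self) -> Tuple[int, ...]: ...

    def __len__(self) -> int: ...


def n_rows(x: ArrayLike) -> int:
    """Works for any object that structurally matches ArrayLike."""
    return x.shape[0]


matches_ndarray = isinstance(np.zeros((4, 2)), ArrayLike)  # has shape and __len__
matches_list = isinstance([1, 2, 3], ArrayLike)  # a list has no shape attribute
print(matches_ndarray, matches_list, n_rows(np.zeros((4, 2))))
```

No inheritance is involved: any class exposing the listed members satisfies the protocol, which is what lets a single annotation cover both libraries.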
array_concatenate
Join a sequence of arrays along an existing axis.
PARAMETER | DESCRIPTION |
---|---|
arrays | Sequence of arrays. |
axis | Axis along which to concatenate. |
RETURNS | DESCRIPTION |
---|---|
NDArray \| Tensor | Concatenated array of the same type as the inputs. |
RAISES | DESCRIPTION |
---|---|
ValueError | If the input list is empty. |
Source code in src/pydvl/utils/array.py
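The dispatch described above can be sketched as follows. `concatenate_sketch` is a hypothetical helper, not the library's implementation; it assumes that an all-tensor input should go through `torch.cat` and everything else through `numpy.concatenate`.

```python
from typing import Sequence

import numpy as np


def concatenate_sketch(arrays: Sequence, axis: int = 0):
    """Hypothetical helper: concatenate with the library the inputs come from."""
    if len(arrays) == 0:
        raise ValueError("Cannot concatenate an empty sequence of arrays")
    try:
        import torch  # optional dependency
    except ImportError:
        torch = None
    if torch is not None and all(isinstance(a, torch.Tensor) for a in arrays):
        return torch.cat(list(arrays), dim=axis)
    # Fallback (assumption): treat everything else as NumPy-compatible input.
    return np.concatenate([np.asarray(a) for a in arrays], axis=axis)


joined = concatenate_sketch([np.array([1, 2]), np.zeros(3)])
print(type(joined).__name__, joined.shape)
```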
array_count_nonzero
Count the number of non-zero elements in the array.
PARAMETER | DESCRIPTION |
---|---|
x | Input array. |
RETURNS | DESCRIPTION |
---|---|
int | Number of non-zero elements. |
Source code in src/pydvl/utils/array.py
array_nonzero
Find the indices of non-zero elements.
PARAMETER | DESCRIPTION |
---|---|
x | Input array. |
RETURNS | DESCRIPTION |
---|---|
Tuple[NDArray[int_], ...] | Tuple of arrays, one for each dimension of x, containing the indices of the non-zero elements in that dimension. |
Source code in src/pydvl/utils/array.py
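The return shapes described for array_count_nonzero and array_nonzero match NumPy's own conventions. The sketch below uses plain NumPy calls rather than the PyDVL functions, on the assumption that the wrappers behave the same way for NumPy inputs.

```python
import numpy as np

x = np.array([[0, 1], [2, 0]])

# One index array per dimension of x, as in the RETURNS table above.
rows, cols = np.nonzero(x)
count = int(np.count_nonzero(x))

# Pair up the per-dimension indices to get coordinates of non-zero entries.
coords = list(zip(rows.tolist(), cols.tolist()))
print(count, coords)
```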
array_unique
array_unique(
array: NDArray | Tensor, return_index: bool = False, **kwargs: Any
) -> Union[NDArray | Tensor, Tuple[NDArray | Tensor, NDArray]]
Return the unique elements in an array, optionally with indices of their first occurrences.
PARAMETER | DESCRIPTION |
---|---|
array | Input array. |
return_index | If True, also return the indices of the unique elements. TYPE: bool |
**kwargs | Extra keyword arguments for the underlying unique function. TYPE: Any |
RETURNS | DESCRIPTION |
---|---|
Union[NDArray \| Tensor, Tuple[NDArray \| Tensor, NDArray]] | A unique set of elements, and optionally the indices (only for numpy arrays; for torch tensors indices are computed manually). |
Source code in src/pydvl/utils/array.py
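For reference, this is what "indices of first occurrences" means in NumPy, together with one possible manual computation. The manual loop is only an illustration of the idea; it is not claimed to be how PyDVL computes indices for torch tensors.

```python
import numpy as np

a = np.array([3, 1, 3, 2, 1])

# NumPy returns sorted unique values and the index of each first occurrence.
values, first_idx = np.unique(a, return_index=True)

# Manual alternative (illustrative): locate the first match for each value.
manual_idx = np.array([np.flatnonzero(a == v)[0] for v in values])

print(values.tolist(), first_idx.tolist(), manual_idx.tolist())
```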
atleast1d
Ensures that the input is at least 1D.
For scalar builtin types, the output is an NDArray. Scalar tensors are converted to 1D tensors.
PARAMETER | DESCRIPTION |
---|---|
a | Input array-like object or a scalar. |
RETURNS | DESCRIPTION |
---|---|
NDArray \| Tensor | The input, as a 1D structure. |
Source code in src/pydvl/utils/array.py
check_X_y
check_X_y(
X: NDArray | Tensor,
y: NDArray | Tensor,
*,
multi_output: bool = False,
estimator: str | object | None = None,
copy: bool = False,
) -> Tuple[NDArray | Tensor, NDArray | Tensor]
Validate X and y mimicking the functionality of sklearn's check_X_y.
For torch tensors, delegates to check_X_y_torch.
PARAMETER | DESCRIPTION |
---|---|
X | Input data (at least 2D). |
y | Target values (1D for single-output or 2D for multi-output if enabled). |
multi_output | Whether multi-output targets are allowed. TYPE: bool |
estimator | The name or instance of the estimator (used in error messages). |
copy | If True, a copy of the arrays is made. TYPE: bool |
RETURNS | DESCRIPTION |
---|---|
Tuple[NDArray \| Tensor, NDArray \| Tensor] | A tuple (X_converted, y_converted). |
Source code in src/pydvl/utils/array.py
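The shape rules in the table can be sketched with a few explicit checks. `check_X_y_sketch` is a hypothetical, NumPy-only helper that covers dimensionality and sample-count checks; the real function (like sklearn's) may perform further validation, and the error messages here are made up.

```python
import numpy as np


def check_X_y_sketch(X, y, *, multi_output: bool = False):
    """Hypothetical validator covering only the shape checks."""
    X, y = np.asarray(X), np.asarray(y)
    if X.ndim < 2:
        raise ValueError(f"Expected X with at least 2 dimensions, got {X.ndim}")
    allowed_y_dims = (1, 2) if multi_output else (1,)
    if y.ndim not in allowed_y_dims:
        raise ValueError(f"Expected y with ndim in {allowed_y_dims}, got {y.ndim}")
    if X.shape[0] != y.shape[0]:
        raise ValueError("X and y have inconsistent numbers of samples")
    return X, y


X_ok, y_ok = check_X_y_sketch([[1.0, 2.0], [3.0, 4.0]], [0, 1])
print(X_ok.shape, y_ok.shape)
```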
check_X_y_torch
check_X_y_torch(
X: Tensor,
y: Tensor,
*,
multi_output: bool = False,
estimator: str | object | None = None,
copy: bool = False,
)
Validate torch tensors X and y similarly to sklearn's check_X_y.
PARAMETER | DESCRIPTION |
---|---|
X | Input tensor (at least 2D). TYPE: Tensor |
y | Target tensor (1D for single-output or 2D for multi-output if allowed). TYPE: Tensor |
multi_output | Whether multi-output targets are allowed. TYPE: bool |
estimator | Estimator name or instance (used in error messages). |
copy | If True, clones the inputs. TYPE: bool |
RETURNS | DESCRIPTION |
---|---|
A tuple (X_converted, y_converted). |
Source code in src/pydvl/utils/array.py
is_categorical
Check if an array contains categorical data (suitable for unique labels).
For numpy arrays, checks if the dtype.kind is in "OSUiub" (Object, byte String, Unicode, signed integer, unsigned integer, boolean).
For torch tensors, checks if the dtype is an integer or boolean type.
PARAMETER | DESCRIPTION |
---|---|
x | Input array to check. |
RETURNS | DESCRIPTION |
---|---|
bool | True if the array contains categorical data, False otherwise. |
Source code in src/pydvl/utils/array.py
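The NumPy branch of this check comes down to a one-line test on `dtype.kind`. The function below is a hypothetical re-creation of that rule for illustration and leaves out the torch branch.

```python
import numpy as np


def is_categorical_sketch(x: np.ndarray) -> bool:
    """Hypothetical NumPy-only version of the dtype.kind rule."""
    return x.dtype.kind in "OSUiub"


checks = {
    "int": is_categorical_sketch(np.array([0, 1, 1])),
    "str": is_categorical_sketch(np.array(["a", "b"])),
    "bool": is_categorical_sketch(np.array([True, False])),
    "float": is_categorical_sketch(np.array([0.1, 0.2])),
}
print(checks)
```

Floating-point arrays have kind "f" and are therefore rejected, even when they happen to hold integer-valued labels.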
is_numpy
is_tensor
stratified_split_indices
stratified_split_indices(
y: ArrayT, train_size: float | int = 0.8, random_state: int | None = None
) -> Tuple[ArrayT, ArrayT]
Compute stratified train/test split indices based on labels.
PARAMETER | DESCRIPTION |
---|---|
y | Labels array (numpy array or torch tensor). TYPE: ArrayT |
train_size | Fraction or absolute number of training samples. |
random_state | Random seed for reproducibility. TYPE: int \| None |
RETURNS | DESCRIPTION |
---|---|
Tuple[ArrayT, ArrayT] | A tuple (train_indices, test_indices) matching the type of y. |
Source code in src/pydvl/utils/array.py
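One way to obtain a stratified split is to shuffle the indices of each class separately and cut every class at the same fraction. The sketch below does this with NumPy only. `stratified_indices_sketch` is hypothetical, handles only a fractional `train_fraction`, and its rounding rule is an assumption, so its output need not match the library's.

```python
from typing import Optional, Tuple

import numpy as np


def stratified_indices_sketch(
    y: np.ndarray, train_fraction: float = 0.8, seed: Optional[int] = None
) -> Tuple[np.ndarray, np.ndarray]:
    """Hypothetical per-class split that keeps label proportions."""
    rng = np.random.default_rng(seed)
    train_parts, test_parts = [], []
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)  # positions of this class
        rng.shuffle(idx)
        n_train = int(round(train_fraction * len(idx)))
        train_parts.append(idx[:n_train])
        test_parts.append(idx[n_train:])
    return np.sort(np.concatenate(train_parts)), np.sort(np.concatenate(test_parts))


labels = np.array([0] * 10 + [1] * 10)
train_idx, test_idx = stratified_indices_sketch(labels, 0.8, seed=0)
print(len(train_idx), len(test_idx))
```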
to_numpy
Convert array to a numpy.ndarray if it's not already.
PARAMETER | DESCRIPTION |
---|---|
array | Input array. |
RETURNS | DESCRIPTION |
---|---|
NDArray | A numpy.ndarray representation of the input. |
Source code in src/pydvl/utils/array.py
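A conversion of this kind typically returns NumPy arrays untouched and detaches tensors before converting them. The helper below is a hypothetical sketch that uses duck typing to avoid importing torch; the library's own function may detect tensors differently.

```python
import numpy as np


def to_numpy_sketch(array):
    """Hypothetical converter: no-op for ndarrays, detach-and-convert otherwise."""
    if isinstance(array, np.ndarray):
        return array  # already a NumPy array, returned as-is
    if hasattr(array, "detach") and hasattr(array, "cpu"):
        # Duck-typed torch.Tensor path: drop autograd info, move to CPU.
        return array.detach().cpu().numpy()
    return np.asarray(array)


original = np.arange(3)
same_object = to_numpy_sketch(original) is original
from_list = to_numpy_sketch([1, 2, 3])
print(same_object, type(from_list).__name__)
```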
to_tensor
Convert array to torch.Tensor if it's not already.
PARAMETER | DESCRIPTION |
---|---|
array | Input array. |
RETURNS | DESCRIPTION |
---|---|
Tensor | A torch.Tensor representation of the input. |
RAISES | DESCRIPTION |
---|---|
ImportError | If PyTorch is not available. |
Source code in src/pydvl/utils/array.py
try_torch_import
try_torch_import(require: bool = False) -> ModuleType | None
Import torch if available, otherwise return None.
PARAMETER | DESCRIPTION |
---|---|
require | If True, raise an ImportError if torch is not available. TYPE: bool |
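The optional-import pattern described here can be sketched generically with `importlib`. `try_import` is a hypothetical helper that takes the module name as an argument, whereas try_torch_import is specific to torch.

```python
import importlib
from types import ModuleType
from typing import Optional


def try_import(name: str, require: bool = False) -> Optional[ModuleType]:
    """Hypothetical helper: return the module, or None if it is missing."""
    try:
        return importlib.import_module(name)
    except ImportError:
        if require:
            raise  # caller asked for a hard failure
        return None


json_mod = try_import("json")  # standard library, always present
missing = try_import("surely_missing_module_xyz")
print(json_mod is not None, missing)
```

Callers can then branch on `None` to skip tensor-specific code paths when the optional dependency is not installed.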