pydvl.utils.config

ParallelConfig dataclass

ParallelConfig(
    backend: Literal["joblib", "ray"] = "joblib",
    address: Optional[Union[str, Tuple[str, int]]] = None,
    n_cpus_local: Optional[int] = None,
    logging_level: Optional[int] = None,
    wait_timeout: float = 1.0,
)

Configuration for the parallel computation backend.

PARAMETER DESCRIPTION
backend

Type of backend to use. Defaults to 'joblib'.

TYPE: Literal['joblib', 'ray'] DEFAULT: 'joblib'

address

(DEPRECATED) Address of existing remote or local cluster to use.

TYPE: Optional[Union[str, Tuple[str, int]]] DEFAULT: None

n_cpus_local

(DEPRECATED) Number of CPUs to use when creating a local ray cluster. This has no effect when using an existing ray cluster.

TYPE: Optional[int] DEFAULT: None

logging_level

(DEPRECATED) Logging level for the parallel backend's worker.

TYPE: Optional[int] DEFAULT: None

wait_timeout

(DEPRECATED) Timeout in seconds for waiting on futures.

TYPE: float DEFAULT: 1.0
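
A minimal usage sketch, assuming ParallelConfig is imported from this module. The ray cluster address is a hypothetical placeholder, and the deprecated fields appear only to illustrate their types:

import logging

from pydvl.utils.config import ParallelConfig

# Default: run everything through joblib on the local machine.
config = ParallelConfig(backend="joblib")

# Deprecated fields, shown for completeness: attach to an existing ray
# cluster instead of spawning a local one, and set the workers' log level.
ray_config = ParallelConfig(
    backend="ray",
    address="ray://head-node:10001",  # hypothetical cluster address
    logging_level=logging.INFO,  # an int, per the Optional[int] type
)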

CachedFuncConfig dataclass

CachedFuncConfig(
    hash_prefix: Optional[str] = None,
    ignore_args: Collection[str] = list(),
    time_threshold: float = 0.3,
    allow_repeated_evaluations: bool = False,
    rtol_stderr: float = 0.1,
    min_repetitions: int = 3,
)

Configuration for cached functions and methods, providing memoization of function calls.

Instances of this class are typically used as arguments for the construction of a Utility.

PARAMETER DESCRIPTION
hash_prefix

Optional string prefix that will be prepended to the cache key. Providing it guarantees cache reuse across runs.

TYPE: Optional[str] DEFAULT: None

ignore_args

Do not take these keyword arguments into account when hashing the wrapped function to build the cache key. This allows sharing the cache among different jobs of the same experiment run when the callable has "nuisance" parameters, like a job_id, which do not affect the result of the computation.

TYPE: Collection[str] DEFAULT: list()

time_threshold

Computations taking less than this many seconds are not cached. A value of 0 means that results are always cached.

TYPE: float DEFAULT: 0.3

allow_repeated_evaluations

If True, repeated calls to a function with the same arguments will be allowed and outputs averaged until the running standard deviation of the mean stabilizes below rtol_stderr * mean.

TYPE: bool DEFAULT: False

rtol_stderr

Relative tolerance for repeated evaluations. More precisely, memcached() will stop evaluating the function once the standard deviation of the mean is smaller than rtol_stderr * mean.

TYPE: float DEFAULT: 0.1

min_repetitions

Minimum number of times a function evaluation on the same arguments is repeated before cached values are returned. Useful for stochastic functions only. If model training is very noisy, set this to a higher value to reduce variance.

TYPE: int DEFAULT: 3
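
A sketch of how these fields fit together for a noisy, stochastic utility: cache everything, share entries across parallel jobs that differ only in a nuisance argument, and average repeated evaluations until the estimate stabilizes. The hash_prefix value and the job_id argument name are illustrative, and the cached_func_options keyword in the trailing comment is an assumption about the Utility constructor, not verified here:

from pydvl.utils.config import CachedFuncConfig

cache_config = CachedFuncConfig(
    hash_prefix="run-2024-01",  # hypothetical prefix for reuse across runs
    ignore_args=["job_id"],  # nuisance argument that does not affect results
    time_threshold=0.0,  # always cache, even very fast evaluations
    allow_repeated_evaluations=True,  # average repeated calls
    rtol_stderr=0.05,  # stop once std. dev. of the mean < 0.05 * mean
    min_repetitions=5,  # repeat at least 5 times before serving the cache
)

# Typically passed when constructing a Utility, e.g.
#   Utility(model, data, scorer, cached_func_options=cache_config)
# (keyword name assumed; check the Utility documentation).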