API#
| Function | Description |
|---|---|
| `ddp_max_oracle` | Take any existing maximization oracle and apply it to multiple devices using a gather-scatter implementation within the distributed data parallel (DDP) framework. |
| `l2_centered_isotonic_regression` | Solution to the isotonic regression problem when using the centered l2 loss. |
| `neg_entropy_centered_isotonic_regression` | Solution to the isotonic regression problem when using the centered negative entropy loss. |
| `make_esrm_spectrum` | Create a spectrum based on the exponential spectral risk measure (ESRM) for `n` samples. |
| `make_extremile_spectrum` | Create a spectrum based on the extremile for `n` samples. |
| `make_spectral_risk_measure` | Create a function which computes the sample weights from a vector of losses when using a spectral risk measure ambiguity set. |
| `make_superquantile_spectrum` | Create a spectrum based on the superquantile (or conditional value-at-risk) for `n` samples. |
| `spectral_risk_measure_maximization_oracle` | Maximization oracle to compute the sample weights based on a particular spectral risk measure objective. |
Create risk measure#
- deshift.make_spectral_risk_measure(spectrum: ndarray, penalty: str = 'chi2', shift_cost: float = 0.0)#
Create a function which computes the sample weights from a vector of losses when using a spectral risk measure ambiguity set.
- Parameters:
  - spectrum – a Numpy array containing the spectrum weights, which should be the same length as the batch size.
  - penalty – either `chi2` or `kl`, indicating which f-divergence to use as the dual regularizer.
  - shift_cost – the non-negative dual regularization parameter.
- Returns:
  - compute_sample_weight – a function that maps `n` losses to a vector of `n` weights on each training example.
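A minimal usage sketch (the batch size, penalty, and shift cost below are illustrative; `make_superquantile_spectrum`, documented later on this page, supplies the spectrum):

```python
import numpy as np

from deshift import make_spectral_risk_measure, make_superquantile_spectrum

# Build a superquantile spectrum for a batch of 8 examples, then wrap it
# in a weight-computation function with a chi2 penalty and shift cost 1.0.
spectrum = make_superquantile_spectrum(batch_size=8, tail_prob=0.5)
compute_sample_weight = make_spectral_risk_measure(
    spectrum, penalty="chi2", shift_cost=1.0
)

losses = np.random.rand(8)               # one loss per example in the batch
weights = compute_sample_weight(losses)  # 8 weights, one per example
```

The returned weights would typically be used to reweight the per-example losses before the backward pass.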
- deshift.spectral_risk_measure_maximization_oracle(spectrum: ndarray, shift_cost: float, penalty: str, losses: ndarray)#
Maximization oracle to compute the sample weights based on a particular spectral risk measure objective.
- Parameters:
  - spectrum – a Numpy array containing the spectrum weights, which should be the same length as the batch size.
  - shift_cost – a non-negative dual regularization parameter.
  - penalty – either `chi2` or `kl`, indicating which f-divergence to use as the dual regularizer.
  - losses – a Numpy array containing the loss incurred by the model on each example in the batch.
- Returns:
  - sample_weight – a vector of `n` weights on each training example.
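A sketch of calling the oracle directly (the numbers are illustrative; the comment about the effect of `shift_cost` is an assumption based on its role as a dual regularizer):

```python
import numpy as np

from deshift import (
    make_superquantile_spectrum,
    spectral_risk_measure_maximization_oracle,
)

spectrum = make_superquantile_spectrum(batch_size=4, tail_prob=0.5)
losses = np.array([0.2, 1.5, 0.7, 0.1])

# The oracle places more weight on the larger losses; increasing
# shift_cost smooths the weights via the dual regularizer.
weights = spectral_risk_measure_maximization_oracle(
    spectrum, shift_cost=1.0, penalty="chi2", losses=losses
)
```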
Dual maximization oracles#
Pool adjacent violators algorithm#
- deshift.l2_centered_isotonic_regression(losses: ndarray, spectrum: ndarray)#
Solution to the isotonic regression problem when using the centered l2 loss.
- Parameters:
  - spectrum – a Numpy array containing the spectrum weights, which should be the same length as the batch size.
  - losses – a Numpy array containing the loss on each example in the batch. These are the labels for isotonic regression.
- Returns:
  - sample_weight – a set of `n` weights on each training example in the batch.
- deshift.neg_entropy_centered_isotonic_regression(losses: ndarray, spectrum: ndarray)#
Solution to the isotonic regression problem when using the centered negative entropy loss.
- Parameters:
  - spectrum – a Numpy array containing the spectrum weights, which should be the same length as the batch size.
  - losses – a Numpy array containing the loss on each example in the batch. These are the labels for isotonic regression.
- Returns:
  - sample_weight – a set of `n` weights on each training example in the batch.
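A sketch of calling the two solvers directly. The pairing of the l2 solver with the `chi2` penalty and the negative-entropy solver with the `kl` penalty is an assumption based on standard duality; sorted losses are passed because the docstrings do not state whether sorting is handled internally:

```python
import numpy as np

from deshift import (
    l2_centered_isotonic_regression,
    make_superquantile_spectrum,
    neg_entropy_centered_isotonic_regression,
)

spectrum = make_superquantile_spectrum(batch_size=4, tail_prob=0.5)
losses = np.array([0.1, 0.3, 0.4, 0.9])  # labels for the isotonic fit

# Assumed correspondence: l2 solver <-> chi2 penalty,
# negative-entropy solver <-> kl penalty.
w_chi2 = l2_centered_isotonic_regression(losses, spectrum)
w_kl = neg_entropy_centered_isotonic_regression(losses, spectrum)
```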
Spectrums#
Extremile#
- deshift.make_extremile_spectrum(batch_size: int, n_draws: float)#
Create a spectrum based on the extremile for `n` samples. The spectrum is chosen so that the expectation of the loss vector under this spectrum equals the uniform expected maximum of `n_draws` elements from the loss vector. See [Daouia (2019)](https://www.tandfonline.com/doi/full/10.1080/01621459.2018.1498348) for more information.
- Parameters:
  - batch_size – the batch size.
  - n_draws – the number of independent draws from the loss vector. It can be fractional.
- Returns:
  - spectrum – a sorted vector of `n` weights on each training example.
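For illustration (the limiting behavior follows from the documented expected-maximum property: with `n_draws=1` the expected maximum of a single draw is the plain average, so the spectrum is uniform):

```python
from deshift import make_extremile_spectrum

# n_draws=1 yields the uniform spectrum; larger n_draws shifts weight
# toward the largest losses.
uniform_like = make_extremile_spectrum(batch_size=5, n_draws=1.0)
tail_heavy = make_extremile_spectrum(batch_size=5, n_draws=2.0)
```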
Superquantile#
- deshift.make_superquantile_spectrum(batch_size: int, tail_prob: float)#
Create a spectrum based on the superquantile (or conditional value-at-risk) for `n` samples.
- Parameters:
  - batch_size – the batch size.
  - tail_prob – the proportion of largest elements to keep in the loss computation, i.e. `k/n` for the top-k loss.
- Returns:
  - spectrum – a sorted vector of `n` weights on each training example.
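For illustration, with a batch of 5 and `tail_prob=0.4` the spectrum should place all weight on the top `k = 2` losses (an assumed illustration; how fractional `k = tail_prob * batch_size` is handled may differ by implementation):

```python
from deshift import make_superquantile_spectrum

# Expected shape (up to implementation details): all mass on the top
# k = 2 losses, e.g. [0., 0., 0., 0.5, 0.5].
spectrum = make_superquantile_spectrum(batch_size=5, tail_prob=0.4)
```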
Exponential spectral risk measure#
- deshift.make_esrm_spectrum(batch_size: int, risk_param: float)#
Create a spectrum based on the exponential spectral risk measure (ESRM) for `n` samples. See [Cotter (2006)](https://www.sciencedirect.com/science/article/pii/S0378426606001373) for more information.
- Parameters:
  - batch_size – the batch size.
  - risk_param – the `R` parameter from Cotter (2006).
- Returns:
  - spectrum – a sorted vector of `n` weights on each training example.
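For illustration (the qualitative effect of `risk_param` follows from the exponential form of the ESRM; the exact normalization is implementation-defined):

```python
from deshift import make_esrm_spectrum

# The ESRM spectrum increases exponentially with the loss rank, so a
# larger risk_param concentrates more weight on the largest losses.
mild = make_esrm_spectrum(batch_size=8, risk_param=0.5)
sharp = make_esrm_spectrum(batch_size=8, risk_param=4.0)
```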
Distributed computations#
- deshift.ddp_max_oracle(max_oracle, losses, src_device=0)#
Take any existing maximization oracle and apply it to multiple devices using a gather-scatter implementation within the distributed data parallel (DDP) framework. Assumes that the process rank is discoverable, e.g. the job is run using `torchrun`.
- Parameters:
  - max_oracle – a function that consumes `n` (full-batch size) loss values and returns `n` weights (where `n == micro_size * n_gpus`).
  - losses – a PyTorch tensor of `micro_size` losses.
- Returns:
  - weights – a vector of weights of size `len(losses)` indicating the weight on each example.
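A sketch of a DDP training step (assumptions: the script is launched with `torchrun`, the NCCL backend is available, and `micro_size` and the weighted-loss formation at the end are illustrative):

```python
import torch
import torch.distributed as dist

from deshift import (
    ddp_max_oracle,
    make_spectral_risk_measure,
    make_superquantile_spectrum,
)

dist.init_process_group("nccl")  # ranks discoverable via torchrun

micro_size = 32                  # per-GPU batch size
n = micro_size * dist.get_world_size()
spectrum = make_superquantile_spectrum(batch_size=n, tail_prob=0.5)
compute_sample_weight = make_spectral_risk_measure(
    spectrum, penalty="chi2", shift_cost=1.0
)

# Inside the training loop: `losses` holds this rank's micro_size losses.
losses = torch.rand(micro_size, device=f"cuda:{dist.get_rank()}")
weights = ddp_max_oracle(compute_sample_weight, losses)

# One common way to use the scattered weights: form a weighted loss
# on this rank and backpropagate through it.
loss = (weights * losses).sum()
```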