πŸ“– API#

ddp_max_oracle(max_oracle,Β losses[,Β src_device])

Take any existing maximization oracle and apply it to multiple devices using a gather-scatter implementation within the distributed data parallel (DDP) framework.

l2_centered_isotonic_regression(losses,Β spectrum)

Solution to the isotonic regression problem when using the centered l2 loss.

neg_entropy_centered_isotonic_regression(...)

Solution to the isotonic regression problem when using the centered negative entropy loss.

make_esrm_spectrum(batch_size,Β risk_param)

Create a spectrum based on the exponential spectral risk measure (ESRM) for n samples.

make_extremile_spectrum(batch_size,Β n_draws)

Create a spectrum based on the extremile for n samples.

make_spectral_risk_measure(spectrum[,Β ...])

Create a function which computes the sample weights from a vector of losses when using a spectral risk measure ambiguity set.

make_superquantile_spectrum(batch_size,Β ...)

Create a spectrum based on the superquantile (or conditional value-at-risk) for n samples.

spectral_risk_measure_maximization_oracle(...)

Maximization oracle to compute the sample weights based on a particular spectral risk measure objective.

Create risk measure#

deshift.make_spectral_risk_measure(spectrum: ndarray, penalty: str = 'chi2', shift_cost: float = 0.0)#

Create a function which computes the sample weights from a vector of losses when using a spectral risk measure ambiguity set.

Parameters:
  • spectrum – a NumPy array containing the spectrum weights, which should have length equal to the batch size.

  • penalty – either β€˜chi2’ or β€˜kl’ indicating which f-divergence to use as the dual regularizer.

  • shift_cost – the non-negative dual regularization parameter.


Returns:

compute_sample_weight

a function that maps n losses to a vector of n weights on each training example.
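
When shift_cost is zero, the maximizer has a simple closed form: assign the (sorted) spectrum weights to the losses in sorted order, so the largest weight falls on the largest loss. A minimal sketch of such a weight-function factory (illustrative only, not deshift's implementation):

```python
import numpy as np

def make_weight_fn(spectrum):
    """Illustrative factory for the unregularized (shift_cost == 0) case:
    the optimal weights are the spectrum permuted to match the order of
    the losses."""
    def compute_sample_weight(losses):
        weights = np.empty_like(spectrum)
        # place the largest spectrum weight on the largest loss, and so on
        weights[np.argsort(losses)] = np.sort(spectrum)
        return weights
    return compute_sample_weight
```

With a positive shift_cost, the weights are instead pulled toward uniform by the chosen f-divergence penalty, which is what the dual maximization oracles solve.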

deshift.spectral_risk_measure_maximization_oracle(spectrum: ndarray, shift_cost: float, penalty: str, losses: ndarray)#

Maximization oracle to compute the sample weights based on a particular spectral risk measure objective.

Parameters:
  • spectrum – a NumPy array containing the spectrum weights, which should have length equal to the batch size.

  • shift_cost – a non-negative dual regularization parameter.

  • penalty – either 'chi2' or 'kl' indicating which f-divergence to use as the dual regularizer.

  • losses – a NumPy array containing the loss incurred by the model on each example in the batch.

Returns:

sample_weight

a vector of n weights on each training example.

Dual maximization oracles#

Pool adjacent violator algorithm#

deshift.l2_centered_isotonic_regression(losses: ndarray, spectrum: ndarray)#

Solution to the isotonic regression problem when using the centered l2 loss.

Parameters:
  • spectrum – a NumPy array containing the spectrum weights, which should have length equal to the batch size.

  • losses – a NumPy array containing the loss on each example in the batch. These are the labels for isotonic regression.

Returns:

sample_weight

a set of n weights on each training example in the batch.
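
The pool-adjacent-violators mechanics behind this oracle can be sketched on the plain (uncentered) nondecreasing l2 problem; the centered variant used here modifies the objective with the spectrum and shift cost, but pools adjacent blocks in the same way. Function name is illustrative:

```python
import numpy as np

def pava_l2(y):
    """Pool-adjacent-violators sketch for nondecreasing L2 isotonic
    regression: scan left to right, and whenever a new point violates
    monotonicity, merge it with the previous block and refit the block
    to its mean."""
    blocks = []  # each entry is [block_mean, block_count]
    for v in y:
        blocks.append([float(v), 1])
        # merge while the last two block means are out of order
        while len(blocks) > 1 and blocks[-2][0] >= blocks[-1][0]:
            m2, c2 = blocks.pop()
            m1, c1 = blocks.pop()
            blocks.append([(m1 * c1 + m2 * c2) / (c1 + c2), c1 + c2])
    return np.concatenate([np.full(c, m) for m, c in blocks])
```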

deshift.neg_entropy_centered_isotonic_regression(losses: ndarray, spectrum: ndarray)#

Solution to the isotonic regression problem when using the centered negative entropy loss.

Parameters:
  • spectrum – a NumPy array containing the spectrum weights, which should have length equal to the batch size.

  • losses – a NumPy array containing the loss on each example in the batch. These are the labels for isotonic regression.

Returns:

sample_weight

a set of n weights on each training example in the batch.

Spectrums#

Extremile#

deshift.make_extremile_spectrum(batch_size: int, n_draws: float)#

Create a spectrum based on the extremile for n samples.

The spectrum is chosen so that the expectation of the loss vector under this spectrum equals the expected maximum of n_draws elements drawn independently and uniformly at random from the loss vector.

See [Daouia et al. (2019)](https://www.tandfonline.com/doi/full/10.1080/01621459.2018.1498348) for more information.

Parameters:
  • batch_size – the batch size.

  • n_draws – the number of independent draws from the loss vector. It can be fractional.

Returns:

spectrum

a sorted vector of n weights on each training example.
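
The construction can be sketched directly: the probability that the maximum of n_draws independent uniform draws lands at or below the i-th smallest loss is (i/n)**n_draws, so the weight on the i-th order statistic is the successive difference of that CDF. An illustrative sketch, assuming this standard parameterization:

```python
import numpy as np

def extremile_spectrum(batch_size, n_draws):
    # CDF of the max of n_draws uniform draws over {1, ..., n},
    # differenced cell by cell to get a weight per order statistic
    i = np.arange(1, batch_size + 1)
    return (i / batch_size) ** n_draws - ((i - 1) / batch_size) ** n_draws
```

With n_draws = 1 this reduces to the uniform weights 1/n; larger n_draws shifts mass toward the largest losses.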

Superquantile#

deshift.make_superquantile_spectrum(batch_size: int, tail_prob: float)#

Create a spectrum based on the superquantile (or conditional value-at-risk) for n samples.

Parameters:
  • batch_size – the batch size.

  • tail_prob – the proportion of largest elements to keep in the loss computation, i.e. k/n for the top-k loss.

Returns:

spectrum

a sorted vector of n weights on each training example.

Exponential spectral risk measure#

deshift.make_esrm_spectrum(batch_size: int, risk_param: float)#

Create a spectrum based on the exponential spectral risk measure (ESRM) for n samples.

See [Cotter and Dowd (2006)](https://www.sciencedirect.com/science/article/pii/S0378426606001373) for more information.

Parameters:
  • batch_size – the batch size.

  • risk_param – the R parameter from Cotter and Dowd (2006).

Returns:

spectrum

a sorted vector of n weights on each training example.
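
Cotter and Dowd's exponential spectrum is phi(u) = R * exp(-R * (1 - u)) / (1 - exp(-R)), which integrates to one over [0, 1]; integrating it over each of the n cells gives discrete weights. A sketch assuming deshift uses this parameterization:

```python
import numpy as np

def esrm_spectrum(batch_size, risk_param):
    # antiderivative of phi evaluated at the cell boundaries i/n,
    # differenced to get one weight per cell
    i = np.arange(batch_size + 1)
    cdf = (np.exp(-risk_param * (1 - i / batch_size))
           - np.exp(-risk_param)) / (1 - np.exp(-risk_param))
    return np.diff(cdf)
```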

Distributed computations#

deshift.ddp_max_oracle(max_oracle, losses, src_device=0)#

Take any existing maximization oracle and apply it to multiple devices using a gather-scatter implementation within the distributed data parallel (DDP) framework. Assumes that the process rank is discoverable, e.g. the job is run using torchrun.

Parameters:
  • max_oracle – a function that consumes n (full-batch size) loss values and returns n weights (where n == micro_size * n_gpus)

  • losses – a PyTorch tensor of micro_size losses

Returns:

weights

a vector of weights of size len(losses) indicating the weight on each example
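
The gather-scatter pattern can be illustrated in a single process, without torch.distributed: concatenate the per-device micro-batches of losses (the gather), run the oracle once on the full batch on the source device, then split the weight vector back into per-device slices (the scatter). A simulation sketch with hypothetical helper names:

```python
import numpy as np

def simulated_ddp_max_oracle(max_oracle, per_device_losses):
    """Single-process simulation of the gather-scatter pattern used by
    ddp_max_oracle: gather micro-batches, apply the oracle to the full
    batch, and scatter each device's slice of the weights back."""
    sizes = [len(l) for l in per_device_losses]
    full_losses = np.concatenate(per_device_losses)        # all-gather
    weights = max_oracle(full_losses)                      # oracle on src
    return np.split(weights, np.cumsum(sizes)[:-1])        # scatter
```

In the real DDP setting the concatenation and split are replaced by collective communication across ranks, but the oracle itself only ever sees the full loss vector.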