draw_strat_sample

mbgdml._gdml.sample.draw_strat_sample(T, n, excl_idxs=None)

Draw a sample from a dataset that preserves the dataset's original distribution.

The distribution is estimated from a histogram where the bin width is determined using the Freedman-Diaconis rule. This rule is designed to roughly minimize the integrated squared difference between the histogram and the underlying theoretical probability density. A reduced histogram is then constructed by sampling uniformly within each bin. The aim is for every populated bin to keep at least one sample in the reduced histogram, even for small training sizes.
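The procedure described above can be sketched with plain NumPy. This is a minimal illustration under stated assumptions, not the mbgdml implementation: the function name `stratified_sample_sketch`, the `seed` argument, the use of the number of candidate points in the Freedman-Diaconis formula, and the way per-bin quotas are rounded are all hypothetical choices made for the example.

```python
import numpy as np


def stratified_sample_sketch(T, n, excl_idxs=None, seed=None):
    """Illustrative stratified draw; NOT the mbgdml implementation."""
    rng = np.random.default_rng(seed)
    T = np.asarray(T, dtype=float).ravel()

    # Candidate indices, minus any the caller wants to exclude.
    idxs = np.arange(T.size)
    if excl_idxs is not None:
        idxs = np.setdiff1d(idxs, excl_idxs)
    if n < 1 or n > idxs.size:
        raise ValueError("n must be between 1 and the number of candidates")
    values = T[idxs]

    # Freedman-Diaconis bin width: h = 2 * IQR / m**(1/3).
    # Assumption: m is the number of candidate points.
    q75, q25 = np.percentile(values, [75, 25])
    h = 2.0 * (q75 - q25) / np.cbrt(values.size)
    lo, hi = values.min(), values.max()
    if h > 0:
        n_bins = int(np.ceil((hi - lo) / h))
        # Map each value to an equal-width bin in 0..n_bins-1.
        scaled = (values - lo) / (hi - lo) * n_bins
        bin_of = np.minimum(scaled.astype(int), n_bins - 1)
    else:
        # Degenerate spread (zero IQR): fall back to a single bin.
        n_bins = 1
        bin_of = np.zeros(values.size, dtype=int)

    # Proportional quota per bin, with at least one sample for every
    # populated bin, capped by what the bin actually contains.
    counts = np.bincount(bin_of, minlength=n_bins)
    quota = np.floor(n * counts / counts.sum()).astype(int)
    quota[(counts > 0) & (quota == 0)] = 1
    quota = np.minimum(quota, counts)

    # Nudge the quotas until they sum to exactly n.
    while quota.sum() > n:
        quota[np.argmax(quota)] -= 1
    while quota.sum() < n:
        quota[np.argmax(counts - quota)] += 1

    # Sample uniformly, without replacement, inside each bin.
    picked = [
        rng.choice(idxs[bin_of == b], size=q, replace=False)
        for b, q in enumerate(quota)
        if q > 0
    ]
    return np.sort(np.concatenate(picked))


# Example: draw 50 indices from 1000 values, skipping the first 100.
data = np.random.default_rng(0).normal(size=1000)
sample = stratified_sample_sketch(data, 50, excl_idxs=np.arange(100), seed=1)
```

When `n` is smaller than the number of populated bins, this sketch cannot give every bin a sample; it then drops bins one at a time until the quotas sum to `n`. How the actual function handles that case is not stated here.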

Parameters:
  • T (numpy.ndarray) – Dataset to sample from.

  • n (int) – Number of examples to draw.

  • excl_idxs (numpy.ndarray, optional) – Array of indices to exclude from sample.

Returns:

Array of indices that form the sample.

Return type:

numpy.ndarray