Data Providers
All data consumed by Hebel models must be provided in the form of
DataProvider objects. DataProviders are classes that provide
iterators which return batches for training. Writing a custom
DataProvider gives a lot of flexibility about where data
can come from and allows any sort of pre-processing of the data. For
example, a user could write a DataProvider that receives data from
the internet or through a pipe from a different process. Or, when
working with text data, a user may define a custom DataProvider to
perform tokenization and stemming on the text before returning it.
A DataProvider is defined by subclassing the
hebel.data_providers.DataProvider class and must implement, at
a minimum, the special methods __iter__ and next.
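The iterator contract described above can be sketched as follows. This is a minimal, standalone illustration rather than Hebel's own code: the class name TokenizingDataProvider, the whitespace tokenization, and the (data, targets) tuple returned for each batch are assumptions made for the example. In real use the class would subclass hebel.data_providers.DataProvider; it is kept independent here so that it runs without Hebel or a GPU.

```python
class TokenizingDataProvider(object):
    """Hypothetical provider that tokenizes text one batch at a time.

    Assumption: in real use this would subclass
    hebel.data_providers.DataProvider. It is standalone here so the
    sketch runs without Hebel installed.
    """

    def __init__(self, texts, targets, batch_size):
        self.texts = texts
        self.targets = targets
        self.batch_size = batch_size
        self.i = 0  # index of the first example of the next batch

    def __iter__(self):
        # Restart from the beginning whenever a new pass is requested.
        self.i = 0
        return self

    def next(self):
        if self.i >= len(self.texts):
            raise StopIteration
        stop = self.i + self.batch_size
        batch_texts = self.texts[self.i:stop]
        batch_targets = self.targets[self.i:stop]
        self.i = stop
        # Pre-processing step: lower-case and split on whitespace
        # (a stand-in for real tokenization and stemming).
        tokens = [text.lower().split() for text in batch_texts]
        return tokens, batch_targets

    # Python 3 names the iterator method __next__; alias it to next.
    __next__ = next


provider = TokenizingDataProvider(
    ["The cat sat", "Dogs bark", "Birds fly south"], [0, 1, 1], batch_size=2)
batches = list(provider)  # two batches: sizes 2 and 1
```

Because __iter__ resets the position, the same provider object can be iterated once per training epoch.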
Abstract Base Class
class hebel.data_providers.DataProvider(data, targets, batch_size)
This is the abstract base class for DataProvider objects. Subclass this class to implement a custom design. At a minimum you must provide an implementation of the next method.
Minibatch Data Provider
class hebel.data_providers.MiniBatchDataProvider(data, targets, batch_size)
This is the standard DataProvider for mini-batch learning with stochastic gradient descent.
Input and target data may be provided either as numpy.array objects or as pycuda.GPUArray objects. The latter is preferred if the data fits in GPU memory and will be much faster, as the data won’t have to be transferred to the GPU for every minibatch. If the data is provided as a numpy.array, then every minibatch is automatically converted to a pycuda.GPUArray and transferred to the GPU.
Parameters:
- data – Input data.
- targets – Target data.
- batch_size – The size of mini-batches.
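To illustrate what mini-batch iteration amounts to, here is a rough sketch that slices input and target data into consecutive batches. It is not Hebel's implementation: the function name iterate_minibatches is hypothetical, the GPU transfer step is left out, and yielding a shorter final batch when the data size is not a multiple of batch_size is an assumption. Plain lists are used so the example runs anywhere; the same slicing works on the first axis of a numpy array.

```python
def iterate_minibatches(data, targets, batch_size):
    """Yield consecutive (data, targets) slices of at most batch_size rows.

    Hypothetical helper for illustration; not part of Hebel.
    """
    if len(data) != len(targets):
        raise ValueError("data and targets must have the same length")
    for start in range(0, len(data), batch_size):
        stop = start + batch_size
        # Slicing past the end is safe: the last batch is simply shorter.
        yield data[start:stop], targets[start:stop]


data = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1], [3.0, 3.1], [4.0, 4.1]]
targets = [0, 1, 0, 1, 0]
minibatches = list(iterate_minibatches(data, targets, batch_size=2))
sizes = [len(x) for x, _ in minibatches]
```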
Multi-Task Data Provider
class hebel.data_providers.MultiTaskDataProvider(data, targets, batch_size=None)
DataProvider for multi-task learning that uses the same training data for multiple targets.
This DataProvider is similar to hebel.data_providers.MiniBatchDataProvider, except that it has not one but multiple targets.
Parameters:
- data – Input data.
- targets – Multiple targets as a list or tuple.
- batch_size – The size of mini-batches.
See also:
hebel.models.MultitaskNeuralNet, hebel.layers.MultitaskTopLayer
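The idea of one input batch paired with several target batches might look like the sketch below. The function name iterate_multitask_minibatches and the shape of each yielded item (a data slice plus a list holding one target slice per task) are assumptions for illustration, not the documented return format of MultiTaskDataProvider.

```python
def iterate_multitask_minibatches(data, targets, batch_size):
    """Yield (data_batch, [target_batch_for_each_task]) pairs.

    Hypothetical helper for illustration; not part of Hebel.
    `targets` is a list or tuple with one target sequence per task.
    """
    for task_targets in targets:
        if len(task_targets) != len(data):
            raise ValueError("every task needs one target per example")
    for start in range(0, len(data), batch_size):
        stop = start + batch_size
        # The same row range is cut from the data and from every task.
        yield data[start:stop], [t[start:stop] for t in targets]


inputs = [[0.0], [1.0], [2.0], [3.0]]
digit_targets = [3, 1, 4, 1]   # task 1: a class label per example
parity_targets = [1, 1, 0, 1]  # task 2: whether that label is odd
mt_batches = list(iterate_multitask_minibatches(
    inputs, (digit_targets, parity_targets), batch_size=3))
```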
Batch Data Provider
class hebel.data_providers.BatchDataProvider(data, targets)
DataProvider for batch learning. Always returns the full data set.
Parameters:
- data – Input data.
- targets – Target data.
See also:
Dummy Data Provider
class hebel.data_providers.DummyDataProvider(*args, **kwargs)
A dummy DataProvider that does not store any data and always returns None.
MNIST Data Provider
class hebel.data_providers.MNISTDataProvider(array, batch_size=None)
DataProvider that automatically provides data from the MNIST data set of hand-written digits.
Depends on the skdata package.
Parameters:
- array – {‘train’, ‘val’, ‘test’} Whether to use the official training, validation, or test data split of MNIST.
- batch_size – The size of mini-batches.