Layers¶
Top Layers¶
Abstract Base Class Top Layer¶
class hebel.layers.TopLayer(n_in, n_units, activation_function='sigmoid', dropout=0.0, parameters=None, weights_scale=None, l1_penalty_weight=0.0, l2_penalty_weight=0.0, lr_multiplier=None)¶

Abstract base class for a top-level layer.
Logistic Layer¶
class hebel.layers.LogisticLayer(n_in, parameters=None, weights_scale=None, l1_penalty_weight=0.0, l2_penalty_weight=0.0, lr_multiplier=None, test_error_fct='class_error')¶

A logistic classification layer for two classes, using a cross-entropy loss function and sigmoid activations.
Parameters:
- n_in : integer
  Number of input units.
- parameters : array_like of GPUArray
  Parameters used to initialize the layer. If this is omitted, then the weights are initialized randomly using Bengio’s rule (uniform distribution with scale \(4 \cdot \sqrt{6 / (\mathtt{n\_in} + \mathtt{n\_out})}\)) and the biases are initialized to zero. If parameters is given, then it must be in the form [weights, biases], where the shape of weights is (n_in, n_out) and the shape of biases is (n_out,). Both weights and biases must be GPUArray.
- weights_scale : float, optional
  If parameters is omitted, then this factor is used as the scale for initializing the weights instead of Bengio’s rule.
- l1_penalty_weight : float, optional
  Weight used for L1 regularization of the weights.
- l2_penalty_weight : float, optional
  Weight used for L2 regularization of the weights.
- lr_multiplier : float, optional
  If this parameter is omitted, then the learning rate for the layer is scaled by \(2 / \sqrt{\mathtt{n\_in}}\). You may specify a different factor here.
- test_error_fct : {class_error, kl_error, cross_entropy_error}, optional
  Which error function to use on the test set. Default is class_error for classification error. Other choices are kl_error, the Kullback-Leibler divergence, or cross_entropy_error.
See also:
hebel.layers.SoftmaxLayer, hebel.models.NeuralNet, hebel.models.NeuralNetRegression, hebel.layers.LinearRegressionLayer
Examples:
# Use the simple initializer and initialize with random weights
logistic_layer = LogisticLayer(1000)

# Sample weights yourself, specify an L1 penalty, and don't
# use learning rate scaling
import numpy as np
from pycuda import gpuarray
n_in = 1000
weights = gpuarray.to_gpu(.01 * np.random.randn(n_in, 1))
biases = gpuarray.to_gpu(np.zeros((1,)))
logistic_layer = LogisticLayer(n_in,
                               parameters=(weights, biases),
                               l1_penalty_weight=.1,
                               lr_multiplier=1.)
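The constructed layer can then be applied to data that lives on the GPU. The following is a minimal usage sketch based only on the methods documented below; the random inputs, the binary target shape (n_examples, 1), and the dtypes are illustrative assumptions rather than requirements stated in this documentation.

import numpy as np
from pycuda import gpuarray

# Illustrative data: 128 examples with n_in features and binary targets
input_data = gpuarray.to_gpu(np.random.randn(128, n_in))
targets = gpuarray.to_gpu(np.random.randint(2, size=(128, 1)).astype(np.float64))

# Forward pass and test error using the methods documented below
activations = logistic_layer.feed_forward(input_data, prediction=True)
error = logistic_layer.test_error(input_data, targets)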
backprop(input_data, targets, cache=None)¶

Backpropagate through the logistic layer.

Parameters:
- input_data : GPUArray
  Input data to compute activations for.
- targets : GPUArray
  The target values of the units.
- cache : list of GPUArray
  Cache obtained from forward pass. If the cache is provided, then the activations are not recalculated.

Returns:
- gradients : tuple of GPUArray
  Gradients with respect to the weights and biases in the form (df_weights, df_biases).
- df_input : GPUArray
  Gradients with respect to the input.
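As a sketch of how the documented return values could be used, the snippet below performs a single plain gradient-descent step on the parameter arrays created in the example above. The learning rate and the manual update rule are illustrative assumptions; in practice training is normally driven by a model and optimizer rather than by hand.

learning_rate = 0.01

# backprop returns ((df_weights, df_biases), df_input), as documented above
(df_weights, df_biases), df_input = logistic_layer.backprop(input_data, targets)

# One plain SGD step on the GPUArray parameters from the example above
weights = weights - learning_rate * df_weights
biases = biases - learning_rate * df_biases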
class_error(input_data, targets, average=True, cache=None, prediction=False)¶

Return the classification error rate.

cross_entropy_error(input_data, targets, average=True, cache=None, prediction=False)¶

Return the cross-entropy error.
feed_forward(input_data, prediction=False)¶

Propagate forward through the layer.

Parameters:
- input_data : GPUArray
  Input data to compute activations for.
- prediction : bool, optional
  Whether to use prediction mode. Only relevant when using dropout. If true, then the weights are multiplied by 1 - dropout if the layer uses dropout.

Returns:
- activations : GPUArray
  The activations of the output units.
test_error(input_data, targets, average=True, cache=None, prediction=True)¶

Compute the test error function given some data and targets.

Uses the error function defined in LogisticLayer.test_error_fct, which may be different from the cross-entropy error function used for training. Alternatively, the other test error functions may be called directly.

Parameters:
- input_data : GPUArray
  Input data to compute the test error function for.
- targets : GPUArray
  The target values of the units.
- average : bool
  Whether to divide the value of the error function by the number of data points given.
- cache : list of GPUArray
  Cache obtained from forward pass. If the cache is provided, then the activations are not recalculated.
- prediction : bool, optional
  Whether to use prediction mode. Only relevant when using dropout. If true, then the weights are multiplied by 1 - dropout if the layer uses dropout.

Returns:
- test_error : float
train_error(input_data, targets, average=True, cache=None, prediction=False)¶

Return the cross-entropy error.
Softmax Layer¶
class hebel.layers.SoftmaxLayer(n_in, n_out, parameters=None, weights_scale=None, l1_penalty_weight=0.0, l2_penalty_weight=0.0, lr_multiplier=None, test_error_fct='class_error')¶

A multiclass classification layer, using a cross-entropy loss function and softmax activations.
Parameters:
- n_in : integer
  Number of input units.
- n_out : integer
  Number of output units (classes).
- parameters : array_like of GPUArray
  Parameters used to initialize the layer. If this is omitted, then the weights are initialized randomly using Bengio’s rule (uniform distribution with scale \(4 \cdot \sqrt{6 / (\mathtt{n\_in} + \mathtt{n\_out})}\)) and the biases are initialized to zero. If parameters is given, then it must be in the form [weights, biases], where the shape of weights is (n_in, n_out) and the shape of biases is (n_out,). Both weights and biases must be GPUArray.
- weights_scale : float, optional
  If parameters is omitted, then this factor is used as the scale for initializing the weights instead of Bengio’s rule.
- l1_penalty_weight : float, optional
  Weight used for L1 regularization of the weights.
- l2_penalty_weight : float, optional
  Weight used for L2 regularization of the weights.
- lr_multiplier : float, optional
  If this parameter is omitted, then the learning rate for the layer is scaled by \(2 / \sqrt{\mathtt{n\_in}}\). You may specify a different factor here.
- test_error_fct : {class_error, kl_error, cross_entropy_error}, optional
  Which error function to use on the test set. Default is class_error for classification error. Other choices are kl_error, the Kullback-Leibler divergence, or cross_entropy_error.
See also:
hebel.layers.LogisticLayer, hebel.models.NeuralNet, hebel.models.NeuralNetRegression, hebel.layers.LinearRegressionLayer
Examples:
# Use the simple initializer and initialize with random weights
softmax_layer = SoftmaxLayer(1000, 10)

# Sample weights yourself, specify an L1 penalty, and don't
# use learning rate scaling
import numpy as np
from pycuda import gpuarray
n_in = 1000
n_out = 10
weights = gpuarray.to_gpu(.01 * np.random.randn(n_in, n_out))
biases = gpuarray.to_gpu(np.zeros((n_out,)))
softmax_layer = SoftmaxLayer(n_in, n_out,
                             parameters=(weights, biases),
                             l1_penalty_weight=.1,
                             lr_multiplier=1.)
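For multiclass data, a common convention, assumed here for illustration only, is to pass the labels as one-hot encoded GPU arrays. The snippet below sketches that encoding and a call to class_error; the target format and dtypes are assumptions, not requirements stated in this documentation.

import numpy as np
from pycuda import gpuarray

# Illustrative data: 128 examples, integer class labels 0..n_out-1
features = np.random.randn(128, n_in)
labels = np.random.randint(n_out, size=128)

# One-hot encode the labels (assumed target format)
one_hot = np.zeros((labels.size, n_out))
one_hot[np.arange(labels.size), labels] = 1.

input_data = gpuarray.to_gpu(features)
targets = gpuarray.to_gpu(one_hot)

error_rate = softmax_layer.class_error(input_data, targets, prediction=True)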
backprop(input_data, targets, cache=None)¶

Backpropagate through the softmax layer.
Parameters:
- input_data : GPUArray
  Input data to compute activations for.
- targets : GPUArray
  The target values of the units.
- cache : list of GPUArray
  Cache obtained from forward pass. If the cache is provided, then the activations are not recalculated.

Returns:
- gradients : tuple of GPUArray
  Gradients with respect to the weights and biases in the form (df_weights, df_biases).
- df_input : GPUArray
  Gradients with respect to the input.
class_error(input_data, targets, average=True, cache=None, prediction=False)¶

Return the classification error rate.

cross_entropy_error(input_data, targets, average=True, cache=None, prediction=False)¶

Return the cross-entropy error.
feed_forward(input_data, prediction=False)¶

Propagate forward through the layer.

Parameters:
- input_data : GPUArray
  Input data to compute activations for.
- prediction : bool, optional
  Whether to use prediction mode. Only relevant when using dropout. If true, then the weights are multiplied by 1 - dropout if the layer uses dropout.

Returns:
- activations : GPUArray
  The activations of the output units.
kl_error(input_data, targets, average=True, cache=None, prediction=True)¶

The KL divergence error.
test_error(input_data, targets, average=True, cache=None, prediction=True)¶

Compute the test error function given some data and targets.

Uses the error function defined in SoftmaxLayer.test_error_fct, which may be different from the cross-entropy error function used for training. Alternatively, the other test error functions may be called directly.

Parameters:
- input_data : GPUArray
  Input data to compute the test error function for.
- targets : GPUArray
  The target values of the units.
- average : bool
  Whether to divide the value of the error function by the number of data points given.
- cache : list of GPUArray
  Cache obtained from forward pass. If the cache is provided, then the activations are not recalculated.
- prediction : bool, optional
  Whether to use prediction mode. Only relevant when using dropout. If true, then the weights are multiplied by 1 - dropout if the layer uses dropout.

Returns:
- test_error : float
train_error(input_data, targets, average=True, cache=None, prediction=False)¶

Return the cross-entropy error.
Linear Regression Layer¶
class hebel.layers.LinearRegressionLayer(n_in, n_out, parameters=None, weights_scale=None, l1_penalty_weight=0.0, l2_penalty_weight=0.0, lr_multiplier=None)¶

Linear regression layer with linear outputs and a squared-loss error function.
Parameters:
- n_in : integer
  Number of input units.
- n_out : integer
  Number of output units.
- parameters : array_like of GPUArray
  Parameters used to initialize the layer. If this is omitted, then the weights are initialized randomly using Bengio’s rule (uniform distribution with scale \(4 \cdot \sqrt{6 / (\mathtt{n\_in} + \mathtt{n\_out})}\)) and the biases are initialized to zero. If parameters is given, then it must be in the form [weights, biases], where the shape of weights is (n_in, n_out) and the shape of biases is (n_out,). Both weights and biases must be GPUArray.
- weights_scale : float, optional
  If parameters is omitted, then this factor is used as the scale for initializing the weights instead of Bengio’s rule.
- l1_penalty_weight : float, optional
  Weight used for L1 regularization of the weights.
- l2_penalty_weight : float, optional
  Weight used for L2 regularization of the weights.
- lr_multiplier : float, optional
  If this parameter is omitted, then the learning rate for the layer is scaled by \(2 / \sqrt{\mathtt{n\_in}}\). You may specify a different factor here.
See also:
hebel.models.NeuralNetRegression, hebel.models.NeuralNet, hebel.layers.LogisticLayer
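This class has no Examples block in the source documentation; the sketch below is an illustrative construction that mirrors the SoftmaxLayer example above and uses only the constructor arguments documented here (the layer sizes are arbitrary).

import numpy as np
from pycuda import gpuarray

# Use the simple initializer and initialize with random weights
regression_layer = LinearRegressionLayer(1000, 5)

# Supply the parameters yourself and add an L2 penalty
n_in = 1000
n_out = 5
weights = gpuarray.to_gpu(.01 * np.random.randn(n_in, n_out))
biases = gpuarray.to_gpu(np.zeros((n_out,)))
regression_layer = LinearRegressionLayer(n_in, n_out,
                                         parameters=(weights, biases),
                                         l2_penalty_weight=.1)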
feed_forward(input_data, prediction=False)¶

Propagate forward through the layer.

Parameters:
- input_data : GPUArray
  Input data to compute activations for.
- prediction : bool, optional
  Whether to use prediction mode. Only relevant when using dropout. If true, then the weights are multiplied by 1 - dropout if the layer uses dropout.

Returns:
- activations : GPUArray
  The activations of the output units.
test_error(input_data, targets, average=True, cache=None, prediction=True)¶

Compute the test error function given some data and targets.

Uses the error function defined in SoftmaxLayer.test_error_fct, which may be different from the cross-entropy error function used for training. Alternatively, the other test error functions may be called directly.

Parameters:
- input_data : GPUArray
  Input data to compute the test error function for.
- targets : GPUArray
  The target values of the units.
- average : bool
  Whether to divide the value of the error function by the number of data points given.
- cache : list of GPUArray
  Cache obtained from forward pass. If the cache is provided, then the activations are not recalculated.
- prediction : bool, optional
  Whether to use prediction mode. Only relevant when using dropout. If true, then the weights are multiplied by 1 - dropout if the layer uses dropout.

Returns:
- test_error : float
Multitask Top Layer¶
class hebel.layers.MultitaskTopLayer(n_in=None, n_out=None, test_error_fct='class_error', l1_penalty_weight=0.0, l2_penalty_weight=0.0, tasks=None, task_weights=None, n_tasks=None, lr_multiplier=None)¶

Top layer for performing multi-task training.
This is a top layer that enables multi-task training, which can be thought of as training multiple models on the same data and sharing weights in all but the final layer. A MultitaskTopLayer has multiple layers as children that are subclasses of hebel.layers.TopLayer. During the forward pass, the input from the previous layer is passed on to all tasks, and during backpropagation the gradients from the different tasks are added together (with different weights if necessary).

There are two ways of initializing MultitaskTopLayer:

- By supplying n_in, n_out, and optionally n_tasks, which will initialize all tasks with hebel.layers.LogisticLayer. If n_tasks is given, n_out must be an integer and n_tasks identical tasks will be created. If n_out is an array_like, then as many tasks will be created as there are elements in n_out, and n_tasks will be ignored.
- If tasks is supplied, then it must be an array_like of objects derived from hebel.layers.TopLayer, one object for each task. In this case n_in, n_out, and n_tasks will be ignored. The user must make sure that all tasks have their n_in member variable set to the same value.
Parameters:
- n_in : integer, optional
  Number of input units. Ignored when tasks is supplied.
- n_out : integer or array_like, optional
  Number of output units. May be an integer (all tasks get the same number of units; n_tasks must be given), or array_like (create as many tasks as elements in n_out, with different sizes; n_tasks is ignored). Always ignored when tasks is supplied.
- test_error_fct : string, optional
  See hebel.layers.LogisticLayer for options. Ignored when tasks is supplied.
- l1_penalty_weight : float or list/tuple of floats, optional
  Weight(s) for L1 regularization. Ignored when tasks is supplied.
- l2_penalty_weight : float or list/tuple of floats, optional
  Weight(s) for L2 regularization. Ignored when tasks is supplied.
- tasks : list/tuple of hebel.layers.TopLayer objects, optional
  Tasks for multitask learning. Overrides n_in, n_out, test_error_fct, l1_penalty_weight, l2_penalty_weight, n_tasks, and lr_multiplier.
- task_weights : list/tuple of floats, optional
  Weights to use when adding the gradients from the different tasks. Default is 1./self.n_tasks. The weights don’t necessarily need to add up to one.
- n_tasks : integer, optional
  Number of tasks. Ignored if n_out is a list or tasks is supplied.
- lr_multiplier : float or list/tuple of floats
  A task-dependent multiplier for the learning rate. If this is omitted, then each task’s default is used. It is ignored when tasks is supplied.
See also:
hebel.layers.TopLayer, hebel.layers.LogisticLayer
Examples:
# Simple form of the constructor
# Creating five tasks with same number of classes
multitask_layer = MultitaskTopLayer(n_in=1000, n_out=10, n_tasks=5)

# Extended form of the constructor
# Initializing every task independently
n_in = 1000  # n_in must be the same for all tasks
tasks = (
    SoftmaxLayer(n_in, 10, l1_penalty_weight=.1),
    SoftmaxLayer(n_in, 15, l2_penalty_weight=.2),
    SoftmaxLayer(n_in, 10),
    SoftmaxLayer(n_in, 10),
    SoftmaxLayer(n_in, 20)
)
task_weights = [1./5, 1./10, 1./10, 2./5, 1./5]
multitask_layer = MultitaskTopLayer(tasks=tasks, task_weights=task_weights)
architecture¶

Returns a dictionary describing the architecture of the layer.
backprop(input_data, targets, cache=None)¶

Compute gradients for each task and combine the results.

Parameters:
- input_data : GPUArray
  Input data to compute activations for.
- targets : GPUArray
  The target values of the units.
- cache : list of GPUArray
  Cache obtained from forward pass. If the cache is provided, then the activations are not recalculated.

Returns:
- gradients : list
  Gradients with respect to the weights and biases for each task.
- df_input : GPUArray
  Gradients with respect to the input, obtained by adding the gradients with respect to the inputs from each task, weighted by MultitaskTopLayer.task_weights.
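As a conceptual illustration (plain NumPy, not the Hebel implementation), the task-weighted combination of the per-task input gradients described above works like this:

import numpy as np

# Two illustrative per-task gradients with respect to the same input batch
df_input_per_task = [np.ones((4, 8)), 2. * np.ones((4, 8))]
task_weights = [0.5, 0.5]

# df_input is the task-weighted sum of the per-task input gradients
df_input = sum(w * g for w, g in zip(task_weights, df_input_per_task))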
cross_entropy_error(input_data, targets, average=True, cache=None, prediction=False, sum_errors=True)¶

Computes the cross-entropy error for all tasks.
feed_forward(input_data, prediction=False)¶

Call feed_forward for each task and combine the activations.

Passes input_data to all tasks and returns the activations as a list.

Parameters:
- input_data : GPUArray
  Input data to compute activations for.
- prediction : bool, optional
  Whether to use prediction mode. Only relevant when using dropout. If true, then the weights are multiplied by 1 - dropout if the layer uses dropout.

Returns:
- activations : list of GPUArray
  The activations of the output units, one element for each task.
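A minimal sketch of consuming the per-task activations, assuming multitask_layer and a GPUArray input_data are set up as in the examples above:

# feed_forward returns one GPUArray of activations per task
activations = multitask_layer.feed_forward(input_data, prediction=True)
for task_idx, task_activations in enumerate(activations):
    print(task_idx, task_activations.shape)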
l1_penalty¶

Compute the L1 penalty for all tasks.

l2_penalty¶

Compute the L2 penalty for all tasks.

parameters¶

Return a list where each element contains the parameters for a task.
test_error(input_data, targets, average=True, cache=None, prediction=False, sum_errors=True)¶

Compute the error function on a test data set.

Parameters:
- input_data : GPUArray
  Input data to compute the test error function for.
- targets : GPUArray
  The target values of the units.
- average : bool
  Whether to divide the value of the error function by the number of data points given.
- cache : list of GPUArray
  Cache obtained from forward pass. If the cache is provided, then the activations are not recalculated.
- prediction : bool, optional
  Whether to use prediction mode. Only relevant when using dropout. If true, then the weights are multiplied by 1 - dropout if the layer uses dropout.
- sum_errors : bool, optional
  Whether to add up the errors from the different tasks. If this option is chosen, the user must make sure that all tasks use the same test error function.

Returns:
- test_error : float or list
  Returns a float when sum_errors == True and a list with the individual errors otherwise.
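For example, to obtain the individual per-task errors instead of their sum (assuming the layer and data from the examples above):

# A list with one error value per task instead of a single summed float
per_task_errors = multitask_layer.test_error(input_data, targets,
                                             sum_errors=False)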
train_error(input_data, targets, average=True, cache=None, prediction=False, sum_errors=True)¶

Computes the cross-entropy error for all tasks.