Layers

Hidden Layer

class hebel.layers.HiddenLayer(n_in, n_units, activation_function='sigmoid', dropout=0.0, parameters=None, weights_scale=None, l1_penalty_weight=0.0, l2_penalty_weight=0.0, lr_multiplier=None)

A fully connected hidden layer.

The HiddenLayer class represents a fully connected hidden layer that can use a multitude of activation functions and supports dropout, L1, and L2 regularization.

Parameters:

n_in : integer
Number of input units.
n_out : integer
Number of hidden units.
activation_function : {sigmoid, tanh, relu, linear}, optional
Which activation function to use. Default is sigmoid.
dropout : float in [0, 1)
Probability of dropping out each hidden unit during training. Default is 0.
parameters : array_like of GPUArray
Parameters used to initialize the layer. If this is omitted, then the weights are initialized randomly using Bengio’s rule (uniform distribution with scale \(4 \cdot \sqrt{6 / (\mathtt{n\_in} + \mathtt{n\_out})}\) if using sigmoid activations and \(\sqrt{6 / (\mathtt{n\_in} + \mathtt{n\_out})}\) if using tanh, relu, or linear activations) and the biases are initialized to zero. If parameters is given, then it must be in the form [weights, biases], where the shape of weights is (n_in, n_out) and the shape of biases is (n_out,). Both weights and biases must be GPUArrays.
weights_scale : float, optional
If parameters is omitted, then this factor is used as scale for initializing the weights instead of Bengio’s rule.
l1_penalty_weight : float, optional
Weight used for L1 regularization of the weights.
l2_penalty_weight : float, optional
Weight used for L2 regularization of the weights.
lr_multiplier : float, optional
If this parameter is omitted, then the learning rate for the layer is scaled by \(2 / \sqrt{\mathtt{n\_in}}\). You may specify a different factor here.

Examples:

# Use the simple initializer and initialize with random weights
hidden_layer = HiddenLayer(500, 10000)

# Sample weights yourself, specify an L1 penalty, and don't
# use learning rate scaling
import numpy as np
from pycuda import gpuarray

n_in = 500
n_out = 1000
weights = gpuarray.to_gpu(.01 * np.random.randn(n_in, n_out))
biases = gpuarray.to_gpu(np.zeros((n_out,)))
hidden_layer = HiddenLayer(n_in, n_out,
                           parameters=(weights, biases),
                           l1_penalty_weight=.1,
                           lr_multiplier=1.)
architecture

Returns a dictionary describing the architecture of the layer.

backprop(input_data, df_output, cache=None)

Backpropagate through the hidden layer

Parameters:

input_data : GPUArray
Input data to compute activations for.
df_output : GPUArray
Gradients with respect to the activations of this layer (received from the layer above).
cache : list of GPUArray
Cache obtained from forward pass. If the cache is provided, then the activations are not recalculated.

Returns:

gradients : tuple of GPUArray
Gradients with respect to the weights and biases in the form (df_weights, df_biases).
df_input : GPUArray
Gradients with respect to the input.
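
The forward and backward passes can also be driven by hand. The following is a minimal sketch based only on the signatures documented on this page; the batch size, learning rate, and random input values are made up for illustration, and in practice parameter updates are performed by Hebel's optimizers rather than manually.

# Minimal forward/backward sketch (illustrative sizes and values)
import numpy as np
from pycuda import gpuarray

n_in, n_units, batch_size = 500, 1000, 128
hidden_layer = HiddenLayer(n_in, n_units)

# A random mini-batch standing in for real input data
input_data = gpuarray.to_gpu(
    np.random.randn(batch_size, n_in).astype(np.float32))

# Forward pass (its result can be reused as the cache argument of backprop)
cache = hidden_layer.feed_forward(input_data)

# Gradients that would normally arrive from the layer above
df_output = gpuarray.to_gpu(
    np.random.randn(batch_size, n_units).astype(np.float32))

# Backward pass: parameter gradients plus the gradient for the layer below
(df_weights, df_biases), df_input = hidden_layer.backprop(
    input_data, df_output, cache=cache)

# A plain SGD-style update computed from the parameters property
weights, biases = hidden_layer.parameters
new_weights = weights - .01 * df_weights
new_biases = biases - .01 * df_biases
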
feed_forward(input_data, prediction=False)

Propagate forward through the layer

Parameters:

input_data : GPUArray
Input data to compute activations for.
prediction : bool, optional
Whether to use prediction mode. Only relevant when using dropout. If true, then the weights are multiplied by 1 - dropout.

Returns:

activations : GPUArray
The activations of the hidden units.
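
When the layer was constructed with a non-zero dropout probability, the two modes can be compared directly; a short sketch with made-up data:

# Training mode drops units at random, prediction mode rescales the weights
import numpy as np
from pycuda import gpuarray

hidden_layer = HiddenLayer(500, 1000, dropout=.5)
input_data = gpuarray.to_gpu(np.random.randn(128, 500).astype(np.float32))

train_output = hidden_layer.feed_forward(input_data)                  # dropout applied
test_output = hidden_layer.feed_forward(input_data, prediction=True)  # weights scaled by 1 - dropout
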
parameters

Return a tuple (weights, biases)

class hebel.layers.InputDropout(n_in, dropout_probability=0.2, compute_input_gradients=False)

This layer performs dropout on the input data.

It does not have any learnable parameters of its own. It should be used as the first layer of a network and performs dropout on the incoming data with the specified dropout probability.

Parameters:

n_in : integer
Number of input units.
dropout_probability : float in [0, 1)
Probability of dropping out each input during training. Default is 0.2.
compute_input_gradients : bool
Whether to compute the gradients with respect to the input data. This is only necessary if you’re training a model where the input itself is learned.
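
A short construction sketch (the layer sizes are made up): InputDropout is placed in front of the first weight layer and uses the same n_in as that layer.

# Dropout on the raw input, followed by a fully connected layer
import numpy as np
from pycuda import gpuarray

input_dropout = InputDropout(784, dropout_probability=.2)
hidden_layer = HiddenLayer(784, 500)

# During training, roughly 20% of each example's inputs are zeroed out
minibatch = gpuarray.to_gpu(np.random.randn(100, 784).astype(np.float32))
dropout_data = input_dropout.feed_forward(minibatch)
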
backprop(input_data, df_output, cache=None)

Backpropagate through the input dropout layer

Parameters:

input_data : GPUArray
Input data to perform dropout on.
df_output : GPUArray
Gradients with respect to the output of this layer (received from the layer above).
cache : list of GPUArray
Cache obtained from forward pass. If the cache is provided, then the activations are not recalculated.

Returns:

gradients : empty tuple
Gradients are empty since this layer has no parameters.
df_input : GPUArray
Gradients with respect to the input.
feed_forward(input_data, prediction=False)

Propagate forward through the layer

Parameters:

input_data : GPUArray
Input data to perform dropout on.
prediction : bool, optional
Whether to use prediction mode. If true, then the data is scaled by 1 - dropout_probability instead of units being dropped out.

Returns:

dropout_data : GPUArray
The data after performing dropout.
class hebel.layers.DummyLayer(n_in)

This class has no hidden units and simply passes through its input

Top Layers

Abstract Base Class Top Layer

class hebel.layers.TopLayer(n_in, n_units, activation_function='sigmoid', dropout=0.0, parameters=None, weights_scale=None, l1_penalty_weight=0.0, l2_penalty_weight=0.0, lr_multiplier=None)

Abstract base class for a top-level layer.

Logistic Layer

class hebel.layers.LogisticLayer(n_in, parameters=None, weights_scale=None, l1_penalty_weight=0.0, l2_penalty_weight=0.0, lr_multiplier=None, test_error_fct='class_error')

A logistic classification layer for two classes, using cross-entropy loss function and sigmoid activations.

Parameters:

n_in : integer
Number of input units.
parameters : array_like of GPUArray
Parameters used to initialize the layer. If this is omitted, then the weights are initialized randomly using Bengio’s rule (uniform distribution with scale \(4 \cdot \sqrt{6 / (\mathtt{n\_in} + \mathtt{n\_out})}\)) and the biases are initialized to zero. If parameters is given, then it must be in the form [weights, biases], where the shape of weights is (n_in, 1) and the shape of biases is (1,). Both weights and biases must be GPUArrays.
weights_scale : float, optional
If parameters is omitted, then this factor is used as scale for initializing the weights instead of Bengio’s rule.
l1_penalty_weight : float, optional
Weight used for L1 regularization of the weights.
l2_penalty_weight : float, optional
Weight used for L2 regularization of the weights.
lr_multiplier : float, optional
If this parameter is omitted, then the learning rate for the layer is scaled by \(2 / \sqrt{\mathtt{n\_in}}\). You may specify a different factor here.
test_error_fct : {class_error, kl_error, cross_entropy_error}, optional
Which error function to use on the test set. Default is class_error for classification error. Other choices are kl_error, the Kullback-Leibler divergence, or cross_entropy_error.

See also:

hebel.layers.SoftmaxLayer, hebel.models.NeuralNet, hebel.models.NeuralNetRegression, hebel.layers.LinearRegressionLayer

Examples:

# Use the simple initializer and initialize with random weights
logistic_layer = LogisticLayer(1000)

# Sample weights yourself, specify an L1 penalty, and don't
# use learning rate scaling
import numpy as np
from pycuda import gpuarray

n_in = 1000
weights = gpuarray.to_gpu(.01 * np.random.randn(n_in, 1))
biases = gpuarray.to_gpu(np.zeros((1,)))
logistic_layer = LogisticLayer(n_in,
                               parameters=(weights, biases),
                               l1_penalty_weight=.1,
                               lr_multiplier=1.)
backprop(input_data, targets, cache=None)

Backpropagate through the logistic layer.

Parameters:

input_data : GPUArray
Input data to compute activations for.
targets : GPUArray
The target values of the units.
cache : list of GPUArray
Cache obtained from forward pass. If the cache is provided, then the activations are not recalculated.

Returns:

gradients : tuple of GPUArray
Gradients with respect to the weights and biases in the form (df_weights, df_biases).
df_input : GPUArray
Gradients with respect to the input.
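
For reference, the gradients returned here take the well-known form for a sigmoid output with cross-entropy loss. The NumPy sketch below illustrates the underlying math only (on the CPU, with made-up data); it is not how Hebel computes the gradients on the GPU.

# CPU illustration of the sigmoid + cross-entropy gradient
import numpy as np

X = np.random.randn(128, 1000)           # input_data, one row per example
W = .01 * np.random.randn(1000, 1)       # weights
b = np.zeros(1)                          # biases
t = np.random.randint(0, 2, (128, 1))    # binary targets

y = 1. / (1. + np.exp(-(X.dot(W) + b)))  # sigmoid activations

delta = y - t                  # error signal of cross-entropy + sigmoid
df_weights = X.T.dot(delta)    # gradient with respect to the weights
df_biases = delta.sum(0)       # gradient with respect to the biases
df_input = delta.dot(W.T)      # gradient passed down to the layer below
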
class_error(input_data, targets, average=True, cache=None, prediction=False)

Return the classification error rate

cross_entropy_error(input_data, targets, average=True, cache=None, prediction=False)

Return the cross entropy error

feed_forward(input_data, prediction=False)

Propagate forward through the layer.

Parameters:

input_data : GPUArray
Input data to compute activations for.
prediction : bool, optional
Whether to use prediction mode. Only relevant when using dropout. If true, then the weights are multiplied by 1 - dropout.

Returns:

activations : GPUArray
The activations of the output units.
test_error(input_data, targets, average=True, cache=None, prediction=True)

Compute the test error function given some data and targets.

Uses the error function defined in LogisticLayer.test_error_fct, which may be different from the cross-entropy error function used for training. Alternatively, the other test error functions may be called directly.

Parameters:

input_data : GPUArray
Input data to compute the test error function for.
targets : GPUArray
The target values of the units.
average : bool
Whether to divide the value of the error function by the number of data points given.
cache : list of GPUArray
Cache obtained from forward pass. If the cache is provided, then the activations are not recalculated.
prediction : bool, optional
Whether to use prediction mode. Only relevant when using dropout. If true, then the weights are multiplied by 1 - dropout.

Returns: test_error : float
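
A short usage sketch with made-up data. The 0/1 column encoding of the targets is an assumption made for illustration; average=False returns the error summed over the examples instead of the mean.

# Classification error on held-out data (illustrative data and labels)
import numpy as np
from pycuda import gpuarray

logistic_layer = LogisticLayer(1000)

X_test = gpuarray.to_gpu(np.random.randn(256, 1000).astype(np.float32))
t_test = gpuarray.to_gpu(
    np.random.randint(0, 2, (256, 1)).astype(np.float32))  # assumed 0/1 encoding

error_rate = logistic_layer.test_error(X_test, t_test)                # averaged
error_sum = logistic_layer.test_error(X_test, t_test, average=False)  # summed over the 256 examples
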

train_error(input_data, targets, average=True, cache=None, prediction=False)

Return the cross entropy error

Softmax Layer

class hebel.layers.SoftmaxLayer(n_in, n_out, parameters=None, weights_scale=None, l1_penalty_weight=0.0, l2_penalty_weight=0.0, lr_multiplier=None, test_error_fct='class_error')

A multiclass classification layer, using cross-entropy loss function and softmax activations.

Parameters:

n_in : integer
Number of input units.
n_out : integer
Number of output units (classes).
parameters : array_like of GPUArray
Parameters used to initialize the layer. If this is omitted, then the weights are initialized randomly using Bengio’s rule (uniform distribution with scale \(4 \cdot \sqrt{6 / (\mathtt{n\_in} + \mathtt{n\_out})}\)) and the biases are initialized to zero. If parameters is given, then it must be in the form [weights, biases], where the shape of weights is (n_in, n_out) and the shape of biases is (n_out,). Both weights and biases must be GPUArrays.
weights_scale : float, optional
If parameters is omitted, then this factor is used as scale for initializing the weights instead of Bengio’s rule.
l1_penalty_weight : float, optional
Weight used for L1 regularization of the weights.
l2_penalty_weight : float, optional
Weight used for L2 regularization of the weights.
lr_multiplier : float, optional
If this parameter is omitted, then the learning rate for the layer is scaled by \(2 / \sqrt{\mathtt{n\_in}}\). You may specify a different factor here.
test_error_fct : {class_error, kl_error, cross_entropy_error}, optional
Which error function to use on the test set. Default is class_error for classification error. Other choices are kl_error, the Kullback-Leibler divergence, or cross_entropy_error.

See also:

hebel.layers.LogisticLayer, hebel.models.NeuralNet, hebel.models.NeuralNetRegression, hebel.layers.LinearRegressionLayer

Examples:

# Use the simple initializer and initialize with random weights
softmax_layer = SoftmaxLayer(1000, 10)

# Sample weights yourself, specify an L1 penalty, and don't
# use learning rate scaling
import numpy as np
from pycuda import gpuarray

n_in = 1000
n_out = 10
weights = gpuarray.to_gpu(.01 * np.random.randn(n_in, n_out))
biases = gpuarray.to_gpu(np.zeros((n_out,)))
softmax_layer = SoftmaxLayer(n_in, n_out,
                             parameters=(weights, biases),
                             l1_penalty_weight=.1,
                             lr_multiplier=1.)
backprop(input_data, targets, cache=None)

Backpropagate through the softmax layer.

Parameters:

input_data : GPUArray
Input data to compute activations for.
targets : GPUArray
The target values of the units.
cache : list of GPUArray
Cache obtained from forward pass. If the cache is provided, then the activations are not recalculated.

Returns:

gradients : tuple of GPUArray
Gradients with respect to the weights and biases in the form (df_weights, df_biases).
df_input : GPUArray
Gradients with respect to the input.
class_error(input_data, targets, average=True, cache=None, prediction=False)

Return the classification error rate

cross_entropy_error(input_data, targets, average=True, cache=None, prediction=False)

Return the cross entropy error

feed_forward(input_data, prediction=False)

Propagate forward through the layer.

Parameters:

input_data : GPUArray
Input data to compute activations for.
prediction : bool, optional
Whether to use prediction mode. Only relevant when using dropout. If true, then the weights are multiplied by 1 - dropout.

Returns:

activations : GPUArray
The activations of the output units.
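
Because the activations come from a softmax, each row of the returned array is a probability distribution over the n_out classes. A quick sanity check with made-up data:

# Each row of the softmax activations sums to one
import numpy as np
from pycuda import gpuarray

softmax_layer = SoftmaxLayer(1000, 10)
X = gpuarray.to_gpu(np.random.randn(64, 1000).astype(np.float32))

activations = softmax_layer.feed_forward(X, prediction=True)
probs = activations.get()                   # copy back to the host as a NumPy array
print(np.allclose(probs.sum(axis=1), 1.))   # True, up to floating point error
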
kl_error(input_data, targets, average=True, cache=None, prediction=True)

The KL divergence error

test_error(input_data, targets, average=True, cache=None, prediction=True)

Compute the test error function given some data and targets.

Uses the error function defined in SoftmaxLayer.test_error_fct, which may be different from the cross-entropy error function used for training. Alternatively, the other test error functions may be called directly.

Parameters:

input_data : GPUArray
Input data to compute the test error function for.
targets : GPUArray
The target values of the units.
average : bool
Whether to divide the value of the error function by the number of data points given.
cache : list of GPUArray
Cache obtained from forward pass. If the cache is provided, then the activations are not recalculated.
prediction : bool, optional
Whether to use prediction mode. Only relevant when using dropout. If true, then the weights are multiplied by 1 - dropout.

Returns: test_error : float
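
The test error function is chosen at construction time via test_error_fct, and the individual error functions can also be called directly. The sketch below assumes one-hot encoded targets, which is an assumption about the data layout rather than something stated in this documentation.

# Evaluate with the Kullback-Leibler error instead of the classification error
import numpy as np
from pycuda import gpuarray

softmax_layer = SoftmaxLayer(1000, 10, test_error_fct='kl_error')

X_test = gpuarray.to_gpu(np.random.randn(256, 1000).astype(np.float32))
labels = np.random.randint(0, 10, 256)
one_hot = np.zeros((256, 10), dtype=np.float32)
one_hot[np.arange(256), labels] = 1.          # assumed one-hot target encoding
t_test = gpuarray.to_gpu(one_hot)

kl = softmax_layer.test_error(X_test, t_test)            # uses kl_error
ce = softmax_layer.cross_entropy_error(X_test, t_test)   # any error function can be called directly
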

train_error(input_data, targets, average=True, cache=None, prediction=False)

Return the cross entropy error

Linear Regression Layer

class hebel.layers.LinearRegressionLayer(n_in, n_out, parameters=None, weights_scale=None, l1_penalty_weight=0.0, l2_penalty_weight=0.0, lr_multiplier=None)

Linear regression layer with linear outputs and squared loss error function.

Parameters:
n_in : integer
Number of input units.
n_out : integer
Number of output units (classes).
parameters : array_like of GPUArray
Parameters used to initialize the layer. If this is omitted, then the weights are initialized randomly using Bengio’s rule (uniform distribution with scale \(4 \cdot \sqrt{6 / (\mathtt{n\_in} + \mathtt{n\_out})}\)) and the biases are initialized to zero. If parameters is given, then it must be in the form [weights, biases], where the shape of weights is (n_in, n_out) and the shape of biases is (n_out,). Both weights and biases must be GPUArrays.
weights_scale : float, optional
If parameters is omitted, then this factor is used as scale for initializing the weights instead of Bengio’s rule.
l1_penalty_weight : float, optional
Weight used for L1 regularization of the weights.
l2_penalty_weight : float, optional
Weight used for L2 regularization of the weights.
lr_multiplier : float, optional
If this parameter is omitted, then the learning rate for the layer is scaled by \(2 / \sqrt{\mathtt{n\_in}}\). You may specify a different factor here.
test_error_fct : {class_error, kl_error, cross_entropy_error}, optional
Which error function to use on the test set. Default is class_error for classification error. Other choices are kl_error, the Kullback-Leibler divergence, or cross_entropy_error.

See also:

hebel.models.NeuralNetRegression, hebel.models.NeuralNet, hebel.layers.LogisticLayer
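
A construction sketch that mirrors the SoftmaxLayer example above; the sizes are made up, and only the signature documented here is used.

# Use the simple initializer and initialize with random weights
regression_layer = LinearRegressionLayer(1000, 5)

# Or supply the parameters yourself
import numpy as np
from pycuda import gpuarray

n_in = 1000
n_out = 5
weights = gpuarray.to_gpu(.01 * np.random.randn(n_in, n_out))
biases = gpuarray.to_gpu(np.zeros((n_out,)))
regression_layer = LinearRegressionLayer(n_in, n_out,
                                         parameters=(weights, biases),
                                         l2_penalty_weight=.01)
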

feed_forward(input_data, prediction=False)

Propagate forward through the layer.

Parameters:

input_data : GPUArray
Input data to compute activations for.
prediction : bool, optional
Whether to use prediction mode. Only relevant when using dropout. If true, then the weights are multiplied by 1 - dropout.

Returns:

activations : GPUArray
The activations of the output units.
test_error(input_data, targets, average=True, cache=None, prediction=True)

Compute the test error function given some data and targets.

Uses the error function defined in SoftmaxLayer.test_error_fct, which may be different from the error function used for training. Alternatively, the other test error functions may be called directly.

Parameters:

input_data : GPUArray
Input data to compute the test error function for.
targets : GPUArray
The target values of the units.
average : bool
Whether to divide the value of the error function by the number of data points given.
cache : list of GPUArray
Cache obtained from forward pass. If the cache is provided, then the activations are not recalculated.
prediction : bool, optional
Whether to use prediction mode. Only relevant when using dropout. If true, then the weights are multiplied by 1 - dropout.

Returns: test_error : float

Multitask Top Layer

class hebel.layers.MultitaskTopLayer(n_in=None, n_out=None, test_error_fct='class_error', l1_penalty_weight=0.0, l2_penalty_weight=0.0, tasks=None, task_weights=None, n_tasks=None, lr_multiplier=None)

Top layer for performing multi-task training.

This is a top layer that enables multi-task training, which can be thought of as training multiple models on the same data and sharing weights in all but the final layer. A MultitaskTopLayer has multiple layers as children that are subclasses of hebel.layers.TopLayer. During the forward pass, the input from the previous layer is passed on to all tasks and during backpropagation, the gradients are added together from the different tasks (with different weights if necessary).

There are two ways of initializing MultitaskTopLayer:

  1. By supplying n_in, n_out, and optionally n_tasks, which will initialize all tasks with hebel.layers.LogisticLayer. If n_tasks is given, n_out must be an integer and n_tasks identical tasks will be created. If n_out is an array_like, then as many tasks will be created as there are elements in n_out and n_tasks will be ignored.
  2. If tasks is supplied, then it must be an array_like of objects derived from hebel.layers.TopLayer, one object for each task. In this case n_in, n_out, and n_tasks will be ignored. The user must make sure that all tasks have their n_in member variable set to the same value.

Parameters:

n_in : integer, optional
Number of input units. Ignored when tasks is supplied.
n_out : integer or array_like, optional
Number of output units. May be an integer (all tasks get the same number of units; n_tasks must be given), or array_like (create as many tasks as elements in n_out, with different sizes; n_tasks is ignored). Always ignored when tasks is supplied.
test_error_fct : string, optional
See hebel.layers.LogisticLayer for options. Ignored when tasks is supplied.
l1_penalty_weight : float or list/tuple of floats, optional
Weight(s) for L1 regularization. Ignored when tasks is supplied.
l2_penalty_weight : float or list/tuple of floats, optional
Weight(s) for L2 regularization. Ignored when tasks is supplied.
tasks : list/tuple of hebel.layers.TopLayer objects, optional
Tasks for multitask learning. Overrides n_in, n_out, test_error_fct, l1_penalty_weight, l2_penalty_weight, n_tasks, and lr_multiplier.
task_weights : list/tuple of floats, optional
Weights to use when adding the gradients from the different tasks. Default is 1./n_tasks. The weights do not necessarily need to add up to one.
n_tasks : integer, optional
Number of tasks. Ignored if n_out is a list, or tasks is supplied.
lr_multiplier : float or list/tuple of floats, optional
A task-dependent multiplier for the learning rate. If this is omitted, then each task’s default is used. Ignored when tasks is supplied.

See also: hebel.layers.TopLayer, hebel.layers.LogisticLayer

Examples:

# Simple form of the constructor
# Creating five tasks with same number of classes
multitask_layer = MultitaskTopLayer(n_in=1000, n_out=10, n_tasks=5)

# Extended form of the constructor
# Initializing every task independently

n_in = 1000              # n_in must be the same for all tasks
tasks = (
    SoftmaxLayer(n_in, 10, l1_penalty_weight=.1),
    SoftmaxLayer(n_in, 15, l2_penalty_weight=.2),
    SoftmaxLayer(n_in, 10),
    SoftmaxLayer(n_in, 10),
    SoftmaxLayer(n_in, 20)
)
task_weights = [1./5, 1./10, 1./10, 2./5, 1./5]
multitask_layer = MultitaskTopLayer(tasks=tasks,
                                    task_weights=task_weights)
architecture

Returns a dictionary describing the architecture of the layer.

backprop(input_data, targets, cache=None)

Compute gradients for each task and combine the results.

Parameters:

input_data : GPUArray
Input data to compute activations for.
targets : GPUArray
The target values of the units.
cache : list of GPUArray
Cache obtained from forward pass. If the cache is provided, then the activations are not recalculated.

Returns:

gradients : list
Gradients with respect to the weights and biases for each task
df_input : GPUArray
Gradients with respect to the input, obtained by adding the gradients with respect to the inputs from each task, weighted by MultitaskTopLayer.task_weights.
cross_entropy_error(input_data, targets, average=True, cache=None, prediction=False, sum_errors=True)

Computes the cross-entropy error for all tasks.

feed_forward(input_data, prediction=False)

Call feed_forward for each task and combine the activations.

Passes input_data to all tasks and returns the activations as a list.

Parameters:

input_data : GPUArray
Input data to compute activations for.
prediction : bool, optional
Whether to use prediction mode. Only relevant when using dropout. If true, then the weights are multiplied by 1 - dropout.

Returns:

activations : list of GPUArray
The activations of the output units, one element for each task.
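
A small sketch with made-up sizes, using the simple form of the constructor; the point is that the result is a list with one activation array per task.

# Forward pass through a multitask layer returns one array per task
import numpy as np
from pycuda import gpuarray

multitask_layer = MultitaskTopLayer(n_in=1000, n_out=10, n_tasks=5)
X = gpuarray.to_gpu(np.random.randn(32, 1000).astype(np.float32))

activations = multitask_layer.feed_forward(X, prediction=True)
print(len(activations))        # 5, one entry per task
print(activations[0].shape)    # (32, 10)
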
l1_penalty

Compute the L1 penalty for all tasks.

l2_penalty

Compute the L2 penalty for all tasks.

parameters

Return a list where each element contains the parameters for a task.

test_error(input_data, targets, average=True, cache=None, prediction=False, sum_errors=True)

Compute the error function on a test data set.

Parameters:

input_data : GPUArray
Input data to compute the test error function for.
targets : GPUArray
The target values of the units.
average : bool
Whether to divide the value of the error function by the number of data points given.
cache : list of GPUArray
Cache obtained from forward pass. If the cache is provided, then the activations are not recalculated.
prediction : bool, optional
Whether to use prediction mode. Only relevant when using dropout. If true, then the weights are multiplied by 1 - dropout.
sum_errors : bool, optional
Whether to add up the errors from the different tasks. If this option is chosen, the user must make sure that all tasks use the same test error function.

Returns:

test_error : float or list
Returns a float when sum_errors == True and a list with the individual errors otherwise.
train_error(input_data, targets, average=True, cache=None, prediction=False, sum_errors=True)

Computes the cross-entropy error for all tasks.