Welcome to Neural-Nets’s documentation!

class neuralnets.neural_nets.NeuralNet(n_neurons, activations='relu', learning_rate=0.005, n_epochs=200, batch_size=64, dropout=1, lambda_reg=0, init_strat=None, solver='sgd', beta_1=0.9, beta_2=0.999, seed=None, check_gradients=False, verbose=False)

A neural network model.

  • n_neurons (list) – List of H + 2 integers indicating the number of neurons in each layer, including the input and output layers. The first value is the number of features of a training example. The last value is either 1 (2 classes, logistic loss) or C > 1 (C > 2 classes, cross entropy), where C is the number of classes. In between, the H values indicate the number of neurons in each of the H hidden layers.
  • activations (str or list of str) – The activation functions to use for each of the H hidden layers. Allowed values are ‘sigmoid’, ‘tanh’, ‘relu’ or ‘linear’ (i.e. no activation). If a str is given, the same activation is used for every hidden layer. If a list of strings is given, it must be of size H. Default is ‘relu’. Note: the activation function of the last layer is automatically inferred from the value of n_neurons[-1]: if the output layer size is 1 then a sigmoid is used, otherwise a softmax.
  • learning_rate (float) – The learning rate for gradient descent. Default is .005.
  • n_epochs (int) – The number of iterations of the gradient descent procedure, i.e. the number of passes over the whole training set. Default is 200.
  • batch_size (int) – The batch size. If 0, the full training set is used. Default is 64.
  • dropout (float) – Probability of keeping a neuron of the hidden layers in dropout. Default is 1, i.e. no dropout is applied.
  • lambda_reg (float) – The regularization constant. Default is 0, i.e. no regularization.
  • init_strat (str) – Initialization strategy for weights. Can be ‘He’ for ‘He’ initialization, recommended for relu layers. Default is None, which reverts to a centered normal distribution * 0.1.
  • solver (str) – Solver to use: either ‘sgd’ or ‘adam’ for SGD or... adam ;). Default is ‘sgd’.
  • beta_1 (float) – Exponential decay rate for the first moment estimate (only used if solver is ‘adam’). Default is .9.
  • beta_2 (float) – Exponential decay rate for the second moment estimate (only used if solver is ‘adam’). Default is .999.
  • seed (int) – A random seed to use for the RNG at weights initialization. Default is None, i.e. no seeding is done.
  • check_gradients (bool) – Whether to check gradients at each iteration, for each parameter. It’s done with np.isclose() with default tolerance values. Default is False.
  • verbose (int) – If not False or 0, the loss is printed every ‘verbose’ epochs. Default is False.

Note: The NeuralNet estimator is (roughly) compliant with the scikit-learn API, so the input X to fit and predict has shape [n_entries, n_features], but internally we use X.T because it seems more convenient.
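
The way n_neurons and activations combine into a stack of layers can be sketched in a few lines. This is an illustration of the rules described for those two parameters, not the library's own code: describe_layers is a hypothetical helper, and the (n_in, n_out, activation) tuple layout is an assumption made for readability.

```python
def describe_layers(n_neurons, activations="relu"):
    """Return one (n_in, n_out, activation) tuple per weight layer.

    Hypothetical helper, not part of the neuralnets package.
    """
    n_hidden = len(n_neurons) - 2  # H hidden layers
    if isinstance(activations, str):
        # A single string applies to every hidden layer.
        activations = [activations] * n_hidden
    if len(activations) != n_hidden:
        raise ValueError("activations must have one entry per hidden layer")
    # The output activation is inferred from the size of the last layer.
    output_activation = "sigmoid" if n_neurons[-1] == 1 else "softmax"
    all_activations = list(activations) + [output_activation]
    return [
        (n_in, n_out, act)
        for n_in, n_out, act in zip(n_neurons[:-1], n_neurons[1:], all_activations)
    ]


# 4 input features, two hidden layers of 8 neurons, 3 classes.
for n_in, n_out, act in describe_layers([4, 8, 8, 3], ["relu", "tanh"]):
    print(f"{n_in} -> {n_out} ({act})")
```

With an output layer of size 1, the last tuple would carry ‘sigmoid’ instead of ‘softmax’.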

adam(dW, db)

Adam step.
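
For reference, a textbook Adam update on a single scalar parameter looks roughly like the sketch below. It assumes the standard formulation with bias correction and a small eps term; whether the library's adam method applies bias correction, and what eps it uses, is not stated here, so adam_step, eps and the time step t are illustrative names only.

```python
import math


def adam_step(param, grad, m, v, t, learning_rate=0.005,
              beta_1=0.9, beta_2=0.999, eps=1e-8):
    """One textbook Adam update for a scalar parameter (illustrative only)."""
    # Exponential moving averages of the gradient and of its square.
    m = beta_1 * m + (1 - beta_1) * grad
    v = beta_2 * v + (1 - beta_2) * grad ** 2
    # Bias correction (assumed; t is the 1-based step counter).
    m_hat = m / (1 - beta_1 ** t)
    v_hat = v / (1 - beta_2 ** t)
    param = param - learning_rate * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# Toy usage: minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w, m, v = 0.0, 0.0, 0.0
for t in range(1, 2001):
    w, m, v = adam_step(w, 2 * (w - 3.0), m, v, t, learning_rate=0.01)
print(w)
```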

backward(X, y, cache)

Backward pass. Returns gradients.

check_gradients(X, y, dW, db)

Do gradient checking for every single parameter. Raises an exception if computed gradients and estimated gradients are not close enough.
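
The idea behind gradient checking can be sketched as follows: each analytical gradient is compared with a centered finite-difference estimate, using the same closeness rule as np.isclose with its default tolerances (rtol=1e-05, atol=1e-08). The function names, the flat parameter list and the eps step are assumptions made for this example and do not mirror the method's actual signature.

```python
def numerical_gradient(loss, params, i, eps=1e-6):
    """Centered finite-difference estimate of d(loss)/d(params[i])."""
    plus = list(params)
    minus = list(params)
    plus[i] += eps
    minus[i] -= eps
    return (loss(plus) - loss(minus)) / (2 * eps)


def is_close(a, b, rtol=1e-5, atol=1e-8):
    """Same rule as np.isclose with default tolerances."""
    return abs(a - b) <= atol + rtol * abs(b)


def gradient_check(loss, params, grads, eps=1e-6):
    """Raise if any analytical gradient disagrees with its estimate."""
    for i, grad in enumerate(grads):
        estimate = numerical_gradient(loss, params, i, eps)
        if not is_close(grad, estimate):
            raise ValueError(
                f"gradient mismatch for parameter {i}: {grad} vs {estimate}"
            )


# Toy loss p0^2 + 3 * p1, whose exact gradient is (2 * p0, 3).
def toy_loss(p):
    return p[0] ** 2 + 3 * p[1]


gradient_check(toy_loss, [1.5, -2.0], [3.0, 3.0])  # passes silently
```

Checking every parameter this way costs two loss evaluations per parameter, so it is best kept for debugging small networks.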

fit(X, y)

Fit model with input X [n_entries, n_features] and output y.


Forward pass. Returns output layer and intermediate values in cache which will be used during backprop.

get_batches(X, y)

Return a list of batches (X_b, y_b) to train on.
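
A minimal sketch of mini-batch splitting is shown below, following the batch_size convention described for the constructor (0 means the full training set). It works on plain lists with one row per example. Shuffling before splitting is an assumption of this sketch, and make_batches is a hypothetical name rather than the method itself.

```python
import random


def make_batches(X, y, batch_size=64, seed=None):
    """Split (X, y) into a list of (X_b, y_b) batches (illustrative only)."""
    if batch_size == 0:
        # Full-batch gradient descent: a single batch with every example.
        return [(list(X), list(y))]
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)  # assumed: examples are shuffled
    batches = []
    for start in range(0, len(indices), batch_size):
        idx = indices[start:start + batch_size]
        batches.append(([X[i] for i in idx], [y[i] for i in idx]))
    return batches


X = [[float(i)] for i in range(10)]
y = list(range(10))
for X_b, y_b in make_batches(X, y, batch_size=4, seed=0):
    print(len(X_b), y_b)
```

The last batch is simply smaller when the number of examples is not a multiple of batch_size.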


Initialize activations.

init_adam(beta_1, beta_2)

Initialize adam parameters.

init_params(seed, init_strat)

Initialize weights and biases.
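
The two initialization strategies described for init_strat can be sketched for a single layer as below. ‘He’ initialization draws weights from a centered normal distribution scaled by sqrt(2 / n_in); the default draws from a centered normal distribution scaled by 0.1. The (n_out, n_in) weight layout, the zero biases and the name init_layer are assumptions of this sketch.

```python
import math
import random


def init_layer(n_in, n_out, init_strat=None, seed=None):
    """Return (W, b) for one layer as nested lists (illustrative only)."""
    rng = random.Random(seed)
    # 'He' scales by sqrt(2 / n_in); the default scales by 0.1.
    scale = math.sqrt(2.0 / n_in) if init_strat == "He" else 0.1
    W = [[rng.gauss(0.0, 1.0) * scale for _ in range(n_in)]
         for _ in range(n_out)]
    b = [0.0] * n_out  # assumed: biases start at zero
    return W, b


W, b = init_layer(n_in=4, n_out=3, init_strat="He", seed=42)
print(len(W), len(W[0]), b)
```

Passing the same seed twice yields the same weights, which is the purpose of the seed parameter.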


Predict outputs of entries in X [n_entries, n_features].

sgd(dW, db)

SGD step.
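
For comparison with Adam, a plain gradient descent update simply moves each parameter against its gradient, scaled by the learning rate. The sketch below works on a flat list of parameters and leaves out regularization; sgd_step is a hypothetical name and the flat-list layout is an assumption.

```python
def sgd_step(params, grads, learning_rate=0.005):
    """Return updated parameters after one plain SGD step (illustrative only)."""
    return [p - learning_rate * g for p, g in zip(params, grads)]


params = [1.0, -2.0]
grads = [10.0, -4.0]
print(sgd_step(params, grads, learning_rate=0.1))
```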