Example 6 with IActivation

Use of org.nd4j.linalg.activations.IActivation in project deeplearning4j by deeplearning4j.

The class GradientCheckUtil, method checkGradients.

/**Check backprop gradients for a ComputationGraph
     * @param graph ComputationGraph to test. This must be initialized.
     * @param epsilon Perturbation size for the central-difference numerical gradient; usually on the order of 1e-4.
     * @param maxRelError Maximum relative error. Usually < 0.01, though it may need to be larger for deep networks.
     * @param minAbsoluteError Minimum absolute error to cause a failure. Numerical gradients can be non-zero due to precision issues.
     *                         For example, 0.0 vs. 1e-18: relative error is 1.0, but not really a failure
     * @param print Whether to print full pass/failure details for each parameter gradient
     * @param exitOnFirstError If true: return upon first failure. If false: continue checking even if
     *  one parameter gradient has failed. Typically use false for debugging, true for unit tests.
     * @param inputs Input arrays to use for forward pass. May be mini-batch data.
     * @param labels Labels/targets (output) arrays to use to calculate backprop gradient. May be mini-batch data.
     * @return true if all gradient checks pass, false otherwise.
     */
public static boolean checkGradients(ComputationGraph graph, double epsilon, double maxRelError, double minAbsoluteError, boolean print, boolean exitOnFirstError, INDArray[] inputs, INDArray[] labels) {
    //Basic sanity checks on input:
    if (epsilon <= 0.0 || epsilon > 0.1)
        throw new IllegalArgumentException("Invalid epsilon: expect epsilon in range (0,0.1], usually 1e-4 or so");
    if (maxRelError <= 0.0 || maxRelError > 0.25)
        throw new IllegalArgumentException("Invalid maxRelativeError: " + maxRelError);
    if (graph.getNumInputArrays() != inputs.length)
        throw new IllegalArgumentException("Invalid input arrays: expect " + graph.getNumInputArrays() + " inputs");
    if (graph.getNumOutputArrays() != labels.length)
        throw new IllegalArgumentException("Invalid labels arrays: expect " + graph.getNumOutputArrays() + " outputs");
    //Check configuration
    for (String vertexName : graph.getConfiguration().getVertices().keySet()) {
        GraphVertex gv = graph.getConfiguration().getVertices().get(vertexName);
        if (!(gv instanceof LayerVertex))
            continue;
        LayerVertex lv = (LayerVertex) gv;
        org.deeplearning4j.nn.conf.Updater u = lv.getLayerConf().getLayer().getUpdater();
        if (u == org.deeplearning4j.nn.conf.Updater.SGD) {
            //Must have LR of 1.0
            double lr = lv.getLayerConf().getLayer().getLearningRate();
            if (lr != 1.0) {
                throw new IllegalStateException("When using SGD updater, must also use lr=1.0 for layer \"" + vertexName + "\"; got " + u);
            }
        } else if (u != org.deeplearning4j.nn.conf.Updater.NONE) {
            throw new IllegalStateException("Must have Updater.NONE (or SGD + lr=1.0) for layer \"" + vertexName + "\"; got " + u);
        }
        double dropout = lv.getLayerConf().getLayer().getDropOut();
        if (lv.getLayerConf().isUseRegularization() && dropout != 0.0) {
            throw new IllegalStateException("Must have dropout == 0.0 for gradient checks - got dropout = " + dropout + " for layer " + layerCount);
        }
        IActivation activation = lv.getLayerConf().getLayer().getActivationFn();
        if (activation != null) {
            if (!VALID_ACTIVATION_FUNCTIONS.contains(activation.getClass())) {
                log.warn("Layer \"" + vertexName + "\" is possibly using an unsuitable activation function: " + activation.getClass() + ". Activation functions for gradient checks must be smooth (like sigmoid, tanh, softmax) and not " + "contain discontinuities like ReLU or LeakyReLU (these may cause spurious failures)");
            }
        }
    }
    for (int i = 0; i < inputs.length; i++) graph.setInput(i, inputs[i]);
    for (int i = 0; i < labels.length; i++) graph.setLabel(i, labels[i]);
    graph.computeGradientAndScore();
    Pair<Gradient, Double> gradAndScore = graph.gradientAndScore();
    ComputationGraphUpdater updater = new ComputationGraphUpdater(graph);
    updater.update(graph, gradAndScore.getFirst(), 0, graph.batchSize());
    //need dup: gradients are a *view* of the full gradient array (which will change every time backprop is done)
    INDArray gradientToCheck = gradAndScore.getFirst().gradient().dup();
    //need dup: params are a *view* of full parameters
    INDArray originalParams = graph.params().dup();
    int nParams = originalParams.length();
    Map<String, INDArray> paramTable = graph.paramTable();
    List<String> paramNames = new ArrayList<>(paramTable.keySet());
    int[] paramEnds = new int[paramNames.size()];
    paramEnds[0] = paramTable.get(paramNames.get(0)).length();
    for (int i = 1; i < paramEnds.length; i++) {
        paramEnds[i] = paramEnds[i - 1] + paramTable.get(paramNames.get(i)).length();
    }
    int currParamNameIdx = 0;
    int totalNFailures = 0;
    double maxError = 0.0;
    MultiDataSet mds = new MultiDataSet(inputs, labels);
    //Assumption here: params is a view that we can modify in-place
    INDArray params = graph.params();
    for (int i = 0; i < nParams; i++) {
        //Get param name
        if (i >= paramEnds[currParamNameIdx]) {
            currParamNameIdx++;
        }
        String paramName = paramNames.get(currParamNameIdx);
        //(w+epsilon): Do forward pass and score
        double origValue = params.getDouble(i);
        params.putScalar(i, origValue + epsilon);
        //training == true for batch norm, etc (scores and gradients need to be calculated on same thing)
        double scorePlus = graph.score(mds, true);
        //(w-epsilon): Do forward pass and score
        params.putScalar(i, origValue - epsilon);
        double scoreMinus = graph.score(mds, true);
        //Reset original param value
        params.putScalar(i, origValue);
        //Calculate numerical parameter gradient:
        double scoreDelta = scorePlus - scoreMinus;
        double numericalGradient = scoreDelta / (2 * epsilon);
        if (Double.isNaN(numericalGradient))
            throw new IllegalStateException("Numerical gradient was NaN for parameter " + i + " of " + nParams);
        double backpropGradient = gradientToCheck.getDouble(i);
        //http://cs231n.github.io/neural-networks-3/#gradcheck
        //Relative error: |backprop - numerical| / (|numerical| + |backprop|) (sum in the denominator, vs. max in the cs231n notes)
        double relError = Math.abs(backpropGradient - numericalGradient) / (Math.abs(numericalGradient) + Math.abs(backpropGradient));
        if (backpropGradient == 0.0 && numericalGradient == 0.0)
            //Edge case, e.g., RNNs with a time series length of 1
            relError = 0.0;
        if (relError > maxError)
            maxError = relError;
        if (relError > maxRelError || Double.isNaN(relError)) {
            double absError = Math.abs(backpropGradient - numericalGradient);
            if (absError < minAbsoluteError) {
                log.info("Param " + i + " (" + paramName + ") passed: grad= " + backpropGradient + ", numericalGrad= " + numericalGradient + ", relError= " + relError + "; absolute error = " + absError + " < minAbsoluteError = " + minAbsoluteError);
            } else {
                if (print)
                    log.info("Param " + i + " (" + paramName + ") FAILED: grad= " + backpropGradient + ", numericalGrad= " + numericalGradient + ", relError= " + relError + ", scorePlus=" + scorePlus + ", scoreMinus= " + scoreMinus);
                if (exitOnFirstError)
                    return false;
                totalNFailures++;
            }
        } else if (print) {
            log.info("Param " + i + " (" + paramName + ") passed: grad= " + backpropGradient + ", numericalGrad= " + numericalGradient + ", relError= " + relError);
        }
    }
    if (print) {
        int nPass = nParams - totalNFailures;
        log.info("GradientCheckUtil.checkGradients(): " + nParams + " params checked, " + nPass + " passed, " + totalNFailures + " failed. Largest relative error = " + maxError);
    }
    return totalNFailures == 0;
}
Also used : LayerVertex(org.deeplearning4j.nn.conf.graph.LayerVertex) Gradient(org.deeplearning4j.nn.gradient.Gradient) ComputationGraphUpdater(org.deeplearning4j.nn.updater.graph.ComputationGraphUpdater) ArrayList(java.util.ArrayList) IActivation(org.nd4j.linalg.activations.IActivation) GraphVertex(org.deeplearning4j.nn.conf.graph.GraphVertex) INDArray(org.nd4j.linalg.api.ndarray.INDArray) MultiDataSet(org.nd4j.linalg.dataset.MultiDataSet)
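
For context, a minimal sketch of how this utility is typically driven from a test. It assumes graph is an already-initialized ComputationGraph built with Updater.NONE (or SGD with learning rate 1.0) and no dropout, as the checks above require; the variable names, data shapes, and threshold values are illustrative, not part of the utility.

//Hypothetical harness around GradientCheckUtil.checkGradients (uses org.nd4j.linalg.factory.Nd4j)
INDArray input = Nd4j.rand(5, 4);                                 //minibatch of 5, nIn = 4 (assumed)
INDArray labels = Nd4j.zeros(5, 3);                               //nOut = 3 (assumed)
for (int i = 0; i < 5; i++) labels.putScalar(new int[] {i, i % 3}, 1.0);  //one-hot targets
boolean ok = GradientCheckUtil.checkGradients(graph,              //initialized ComputationGraph (assumed to exist)
        1e-6,   //epsilon for the central differences
        1e-3,   //maxRelError
        1e-8,   //minAbsoluteError
        true,   //print per-parameter results
        false,  //keep checking after the first failure
        new INDArray[] {input}, new INDArray[] {labels});
System.out.println("Gradient check passed: " + ok);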

Example 7 with IActivation

Use of org.nd4j.linalg.activations.IActivation in project deeplearning4j by deeplearning4j.

The class ComputationGraphConfiguration, method fromJson.

/**
     * Create a computation graph configuration from json
     *
     * @param json the neural net configuration from json
     * @return {@link ComputationGraphConfiguration}
     */
public static ComputationGraphConfiguration fromJson(String json) {
    //As per MultiLayerConfiguration.fromJson()
    ObjectMapper mapper = NeuralNetConfiguration.mapper();
    ComputationGraphConfiguration conf;
    try {
        conf = mapper.readValue(json, ComputationGraphConfiguration.class);
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
    //To maintain backward compatibility after activation function refactoring (configs generated with v0.7.1 or earlier)
    // Previously: enumeration used for activation functions. Now: use classes
    Map<String, GraphVertex> vertexMap = conf.getVertices();
    JsonNode vertices = null;
    for (Map.Entry<String, GraphVertex> entry : vertexMap.entrySet()) {
        if (!(entry.getValue() instanceof LayerVertex)) {
            continue;
        }
        LayerVertex lv = (LayerVertex) entry.getValue();
        if (lv.getLayerConf() != null && lv.getLayerConf().getLayer() != null) {
            Layer layer = lv.getLayerConf().getLayer();
            if (layer.getActivationFn() == null) {
                String layerName = layer.getLayerName();
                try {
                    if (vertices == null) {
                        JsonNode jsonNode = mapper.readTree(json);
                        vertices = jsonNode.get("vertices");
                    }
                    JsonNode vertexNode = vertices.get(layerName);
                    JsonNode layerVertexNode = vertexNode.get("LayerVertex");
                    if (layerVertexNode == null || !layerVertexNode.has("layerConf") || !layerVertexNode.get("layerConf").has("layer")) {
                        continue;
                    }
                    JsonNode layerWrapperNode = layerVertexNode.get("layerConf").get("layer");
                    if (layerWrapperNode == null || layerWrapperNode.size() != 1) {
                        continue;
                    }
                    JsonNode layerNode = layerWrapperNode.elements().next();
                    //Should only have 1 element: "dense", "output", etc
                    JsonNode activationFunction = layerNode.get("activationFunction");
                    if (activationFunction != null) {
                        IActivation ia = Activation.fromString(activationFunction.asText()).getActivationFunction();
                        layer.setActivationFn(ia);
                    }
                } catch (IOException e) {
                    log.warn("Layer with null ActivationFn field or pre-0.7.2 activation function detected: could not parse JSON", e);
                }
            }
        }
    }
    return conf;
}
Also used : LayerVertex(org.deeplearning4j.nn.conf.graph.LayerVertex) JsonNode(org.nd4j.shade.jackson.databind.JsonNode) IOException(java.io.IOException) IActivation(org.nd4j.linalg.activations.IActivation) OutputLayer(org.deeplearning4j.nn.conf.layers.OutputLayer) Layer(org.deeplearning4j.nn.conf.layers.Layer) GraphVertex(org.deeplearning4j.nn.conf.graph.GraphVertex) ObjectMapper(org.nd4j.shade.jackson.databind.ObjectMapper)
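
A minimal round-trip sketch, assuming conf is an existing ComputationGraphConfiguration (the variable names are illustrative, and the final check assumes the configuration class implements value-based equals):

//Serialize an existing configuration and read it back; fromJson also restores
//activation functions for JSON written before the IActivation refactoring (pre-0.7.2)
String json = conf.toJson();
ComputationGraphConfiguration restored = ComputationGraphConfiguration.fromJson(json);
System.out.println("Round trip preserved the configuration: " + conf.equals(restored));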

Example 8 with IActivation

Use of org.nd4j.linalg.activations.IActivation in project deeplearning4j by deeplearning4j.

The class VariationalAutoencoder, method computeGradientAndScore.

@Override
public void computeGradientAndScore() {
    //Forward pass through the encoder and mean for P(Z|X)
    VAEFwdHelper fwd = doForward(true, true);
    IActivation afn = conf().getLayer().getActivationFn();
    //Forward pass through logStd^2 for P(Z|X)
    INDArray pzxLogStd2W = params.get(VariationalAutoencoderParamInitializer.PZX_LOGSTD2_W);
    INDArray pzxLogStd2b = params.get(VariationalAutoencoderParamInitializer.PZX_LOGSTD2_B);
    INDArray pzxLogStd2Pre = fwd.encoderActivations[fwd.encoderActivations.length - 1].mmul(pzxLogStd2W).addiRowVector(pzxLogStd2b);
    INDArray meanZ = fwd.pzxMeanPreOut.dup();
    INDArray logStdev2Z = pzxLogStd2Pre.dup();
    pzxActivationFn.getActivation(meanZ, true);
    pzxActivationFn.getActivation(logStdev2Z, true);
    INDArray pzxSigmaSquared = Transforms.exp(logStdev2Z, true);
    INDArray pzxSigma = Transforms.sqrt(pzxSigmaSquared, true);
    int minibatch = input.size(0);
    int size = fwd.pzxMeanPreOut.size(1);
    Map<String, INDArray> gradientMap = new HashMap<>();
    double scaleFactor = 1.0 / numSamples;
    Level1 blasL1 = Nd4j.getBlasWrapper().level1();
    INDArray[] encoderActivationDerivs = (numSamples > 1 ? new INDArray[encoderLayerSizes.length] : null);
    for (int l = 0; l < numSamples; l++) {
        //Default (and in most cases) numSamples == 1
        //0 for first one (to get rid of previous buffer data), otherwise 1 (for adding)
        double gemmCConstant = (l == 0 ? 0.0 : 1.0);
        INDArray e = Nd4j.randn(minibatch, size);
        //z = mu + sigma * e, with e ~ N(0,1)
        INDArray z = pzxSigma.mul(e).addi(meanZ);
        //Need to do forward pass through decoder layers
        int nDecoderLayers = decoderLayerSizes.length;
        INDArray current = z;
        //Need pre-out for backprop later
        INDArray[] decoderPreOut = new INDArray[nDecoderLayers];
        INDArray[] decoderActivations = new INDArray[nDecoderLayers];
        for (int i = 0; i < nDecoderLayers; i++) {
            String wKey = "d" + i + WEIGHT_KEY_SUFFIX;
            String bKey = "d" + i + BIAS_KEY_SUFFIX;
            INDArray weights = params.get(wKey);
            INDArray bias = params.get(bKey);
            current = current.mmul(weights).addiRowVector(bias);
            decoderPreOut[i] = current.dup();
            afn.getActivation(current, true);
            decoderActivations[i] = current;
        }
        INDArray pxzw = params.get(VariationalAutoencoderParamInitializer.PXZ_W);
        INDArray pxzb = params.get(VariationalAutoencoderParamInitializer.PXZ_B);
        if (l == 0) {
            //Need to add other component of score, in addition to negative log probability
            //Note the negative here vs. the equation in Kingma & Welling: this is because we are minimizing the negative of
            // variational lower bound, rather than maximizing the variational lower bound
            //Unlike log probability (which is averaged over samples) this should be calculated just once
            INDArray temp = meanZ.mul(meanZ).addi(pzxSigmaSquared).negi();
            temp.addi(logStdev2Z).addi(1.0);
            double scorePt1 = -0.5 / minibatch * temp.sumNumber().doubleValue();
            this.score = scorePt1 + (calcL1(false) + calcL2(false)) / minibatch;
        }
        INDArray pxzDistributionPreOut = current.mmul(pxzw).addiRowVector(pxzb);
        double logPTheta = reconstructionDistribution.negLogProbability(input, pxzDistributionPreOut, true);
        this.score += logPTheta / numSamples;
        //If we have any training listeners (for example, for UI StatsListener - pass on activations)
        if (trainingListeners != null && trainingListeners.size() > 0 && l == 0) {
            //Note: only doing this on the *first* sample
            Map<String, INDArray> activations = new LinkedHashMap<>();
            for (int i = 0; i < fwd.encoderActivations.length; i++) {
                activations.put("e" + i, fwd.encoderActivations[i]);
            }
            activations.put(VariationalAutoencoderParamInitializer.PZX_PREFIX, z);
            for (int i = 0; i < decoderActivations.length; i++) {
                activations.put("d" + i, decoderActivations[i]);
            }
            activations.put(VariationalAutoencoderParamInitializer.PXZ_PREFIX, reconstructionDistribution.generateAtMean(pxzDistributionPreOut));
            for (TrainingListener tl : trainingListeners) {
                tl.onForwardPass(this, activations);
            }
        }
        /////////////////////////////////////////////////////////
        //Backprop
        //First: calculate the gradients at the input to the reconstruction distribution
        INDArray dpdpxz = reconstructionDistribution.gradient(input, pxzDistributionPreOut);
        //Do backprop for output reconstruction distribution -> final decoder layer
        INDArray dLdxzw = gradientViews.get(VariationalAutoencoderParamInitializer.PXZ_W);
        INDArray dLdxzb = gradientViews.get(VariationalAutoencoderParamInitializer.PXZ_B);
        INDArray lastDecActivations = decoderActivations[decoderActivations.length - 1];
        Nd4j.gemm(lastDecActivations, dpdpxz, dLdxzw, true, false, scaleFactor, gemmCConstant);
        if (l == 0) {
            //TODO: do this without the assign
            dLdxzb.assign(dpdpxz.sum(0));
            if (numSamples > 1) {
                dLdxzb.muli(scaleFactor);
            }
        } else {
            blasL1.axpy(dLdxzb.length(), scaleFactor, dpdpxz.sum(0), dLdxzb);
        }
        gradientMap.put(VariationalAutoencoderParamInitializer.PXZ_W, dLdxzw);
        gradientMap.put(VariationalAutoencoderParamInitializer.PXZ_B, dLdxzb);
        INDArray epsilon = pxzw.mmul(dpdpxz.transpose()).transpose();
        //Next: chain derivatives backwards through the decoder layers
        for (int i = nDecoderLayers - 1; i >= 0; i--) {
            String wKey = "d" + i + WEIGHT_KEY_SUFFIX;
            String bKey = "d" + i + BIAS_KEY_SUFFIX;
            //TODO activation functions with params
            INDArray currentDelta = afn.backprop(decoderPreOut[i], epsilon).getFirst();
            INDArray weights = params.get(wKey);
            INDArray dLdW = gradientViews.get(wKey);
            INDArray dLdB = gradientViews.get(bKey);
            INDArray actInput;
            if (i == 0) {
                actInput = z;
            } else {
                actInput = decoderActivations[i - 1];
            }
            Nd4j.gemm(actInput, currentDelta, dLdW, true, false, scaleFactor, gemmCConstant);
            if (l == 0) {
                //TODO: do this without the assign
                dLdB.assign(currentDelta.sum(0));
                if (numSamples > 1) {
                    dLdB.muli(scaleFactor);
                }
            } else {
                blasL1.axpy(dLdB.length(), scaleFactor, currentDelta.sum(0), dLdB);
            }
            gradientMap.put(wKey, dLdW);
            gradientMap.put(bKey, dLdB);
            epsilon = weights.mmul(currentDelta.transpose()).transpose();
        }
        //Do backprop through p(z|x)
        INDArray eZXMeanW = params.get(VariationalAutoencoderParamInitializer.PZX_MEAN_W);
        INDArray eZXLogStdev2W = params.get(VariationalAutoencoderParamInitializer.PZX_LOGSTD2_W);
        INDArray dLdz = epsilon;
        //If we were maximizing the equation in Kingma & Welling, this would be a .sub(meanZ). Here: we are minimizing the negative instead
        INDArray dLdmu = dLdz.add(meanZ);
        INDArray dLdLogSigma2 = dLdz.mul(e).muli(pzxSigma).addi(pzxSigmaSquared).subi(1).muli(0.5);
        INDArray dLdPreMu = pzxActivationFn.backprop(fwd.getPzxMeanPreOut().dup(), dLdmu).getFirst();
        INDArray dLdPreLogSigma2 = pzxActivationFn.backprop(pzxLogStd2Pre.dup(), dLdLogSigma2).getFirst();
        //Weight gradients for weights feeding into p(z|x)
        INDArray lastEncoderActivation = fwd.encoderActivations[fwd.encoderActivations.length - 1];
        INDArray dLdZXMeanW = gradientViews.get(VariationalAutoencoderParamInitializer.PZX_MEAN_W);
        INDArray dLdZXLogStdev2W = gradientViews.get(VariationalAutoencoderParamInitializer.PZX_LOGSTD2_W);
        Nd4j.gemm(lastEncoderActivation, dLdPreMu, dLdZXMeanW, true, false, scaleFactor, gemmCConstant);
        Nd4j.gemm(lastEncoderActivation, dLdPreLogSigma2, dLdZXLogStdev2W, true, false, scaleFactor, gemmCConstant);
        //Bias gradients for p(z|x)
        INDArray dLdZXMeanb = gradientViews.get(VariationalAutoencoderParamInitializer.PZX_MEAN_B);
        INDArray dLdZXLogStdev2b = gradientViews.get(VariationalAutoencoderParamInitializer.PZX_LOGSTD2_B);
        //If we were maximizing the equation in Kingma & Welling, this would be a .sub(meanZ). Here: we are minimizing the negative instead
        if (l == 0) {
            dLdZXMeanb.assign(pzxActivationFn.backprop(fwd.getPzxMeanPreOut().dup(), dLdz.add(meanZ)).getFirst().sum(0));
            dLdZXLogStdev2b.assign(dLdPreLogSigma2.sum(0));
            if (numSamples > 1) {
                dLdZXMeanb.muli(scaleFactor);
                dLdZXLogStdev2b.muli(scaleFactor);
            }
        } else {
            blasL1.axpy(dLdZXMeanb.length(), scaleFactor, pzxActivationFn.backprop(fwd.getPzxMeanPreOut().dup(), dLdz.add(meanZ)).getFirst().sum(0), dLdZXMeanb);
            blasL1.axpy(dLdZXLogStdev2b.length(), scaleFactor, dLdPreLogSigma2.sum(0), dLdZXLogStdev2b);
        }
        gradientMap.put(VariationalAutoencoderParamInitializer.PZX_MEAN_W, dLdZXMeanW);
        gradientMap.put(VariationalAutoencoderParamInitializer.PZX_MEAN_B, dLdZXMeanb);
        gradientMap.put(VariationalAutoencoderParamInitializer.PZX_LOGSTD2_W, dLdZXLogStdev2W);
        gradientMap.put(VariationalAutoencoderParamInitializer.PZX_LOGSTD2_B, dLdZXLogStdev2b);
        //Epsilon (dL/dActivation) at output of the last encoder layer:
        //Equivalent to: epsilon = eZXMeanW.mmul(dLdPreMu.transpose()).transpose(); using   (AxB^T)^T = BxA^T
        epsilon = Nd4j.gemm(dLdPreMu, eZXMeanW, false, true);
        //Next line: equivalent to epsilon.addi(eZXLogStdev2W.mmul(dLdPreLogSigma2.transpose()).transpose());       using: (AxB^T)^T = BxA^T
        Nd4j.gemm(dLdPreLogSigma2, eZXLogStdev2W, epsilon, false, true, 1.0, 1.0);
        //Backprop through encoder:
        int nEncoderLayers = encoderLayerSizes.length;
        for (int i = nEncoderLayers - 1; i >= 0; i--) {
            String wKey = "e" + i + WEIGHT_KEY_SUFFIX;
            String bKey = "e" + i + BIAS_KEY_SUFFIX;
            INDArray weights = params.get(wKey);
            INDArray dLdW = gradientViews.get(wKey);
            INDArray dLdB = gradientViews.get(bKey);
            INDArray preOut = fwd.encoderPreOuts[i];
            INDArray currentDelta;
            if (numSamples > 1) {
                //Encoder pre-outputs are identical for every sample; only the errors change, so compute the activation derivatives once (on the first sample) and reuse them
                if (l == 0) {
                    //Not the most elegant implementation (with the Nd4j.ones()), but it works...
                    encoderActivationDerivs[i] = afn.backprop(fwd.encoderPreOuts[i], Nd4j.ones(fwd.encoderPreOuts[i].shape())).getFirst();
                }
                currentDelta = epsilon.muli(encoderActivationDerivs[i]);
            } else {
                currentDelta = afn.backprop(preOut, epsilon).getFirst();
            }
            INDArray actInput;
            if (i == 0) {
                actInput = input;
            } else {
                actInput = fwd.encoderActivations[i - 1];
            }
            Nd4j.gemm(actInput, currentDelta, dLdW, true, false, scaleFactor, gemmCConstant);
            if (l == 0) {
                //TODO: do this without the assign
                dLdB.assign(currentDelta.sum(0));
                if (numSamples > 1) {
                    dLdB.muli(scaleFactor);
                }
            } else {
                blasL1.axpy(dLdB.length(), scaleFactor, currentDelta.sum(0), dLdB);
            }
            gradientMap.put(wKey, dLdW);
            gradientMap.put(bKey, dLdB);
            epsilon = weights.mmul(currentDelta.transpose()).transpose();
        }
    }
    //Insert the gradients into the Gradient map in the correct order, in case we need to flatten the gradient later
    // to match the parameters iteration order
    Gradient gradient = new DefaultGradient(gradientsFlattened);
    Map<String, INDArray> g = gradient.gradientForVariable();
    for (int i = 0; i < encoderLayerSizes.length; i++) {
        String w = "e" + i + VariationalAutoencoderParamInitializer.WEIGHT_KEY_SUFFIX;
        g.put(w, gradientMap.get(w));
        String b = "e" + i + VariationalAutoencoderParamInitializer.BIAS_KEY_SUFFIX;
        g.put(b, gradientMap.get(b));
    }
    g.put(VariationalAutoencoderParamInitializer.PZX_MEAN_W, gradientMap.get(VariationalAutoencoderParamInitializer.PZX_MEAN_W));
    g.put(VariationalAutoencoderParamInitializer.PZX_MEAN_B, gradientMap.get(VariationalAutoencoderParamInitializer.PZX_MEAN_B));
    g.put(VariationalAutoencoderParamInitializer.PZX_LOGSTD2_W, gradientMap.get(VariationalAutoencoderParamInitializer.PZX_LOGSTD2_W));
    g.put(VariationalAutoencoderParamInitializer.PZX_LOGSTD2_B, gradientMap.get(VariationalAutoencoderParamInitializer.PZX_LOGSTD2_B));
    for (int i = 0; i < decoderLayerSizes.length; i++) {
        String w = "d" + i + VariationalAutoencoderParamInitializer.WEIGHT_KEY_SUFFIX;
        g.put(w, gradientMap.get(w));
        String b = "d" + i + VariationalAutoencoderParamInitializer.BIAS_KEY_SUFFIX;
        g.put(b, gradientMap.get(b));
    }
    g.put(VariationalAutoencoderParamInitializer.PXZ_W, gradientMap.get(VariationalAutoencoderParamInitializer.PXZ_W));
    g.put(VariationalAutoencoderParamInitializer.PXZ_B, gradientMap.get(VariationalAutoencoderParamInitializer.PXZ_B));
    this.gradient = gradient;
}
Also used : Gradient(org.deeplearning4j.nn.gradient.Gradient) DefaultGradient(org.deeplearning4j.nn.gradient.DefaultGradient) TrainingListener(org.deeplearning4j.optimize.api.TrainingListener) IActivation(org.nd4j.linalg.activations.IActivation) DefaultGradient(org.deeplearning4j.nn.gradient.DefaultGradient) INDArray(org.nd4j.linalg.api.ndarray.INDArray) Level1(org.nd4j.linalg.api.blas.Level1)
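
The sampling step above is the reparameterization trick, z = mu + sigma * e with e ~ N(0, 1), which keeps sampling differentiable with respect to the encoder outputs. A standalone ND4J sketch of just that step (the shapes and the zero-valued stand-ins for the encoder outputs are illustrative):

//Reparameterization trick in isolation (uses org.nd4j.linalg.factory.Nd4j and
//org.nd4j.linalg.ops.transforms.Transforms); mu and logSigmaSq stand in for encoder outputs
int minibatch = 8, latentSize = 2;
INDArray mu = Nd4j.zeros(minibatch, latentSize);
INDArray logSigmaSq = Nd4j.zeros(minibatch, latentSize);
INDArray sigma = Transforms.sqrt(Transforms.exp(logSigmaSq, true), true);
INDArray e = Nd4j.randn(minibatch, latentSize);   //e ~ N(0,1)
INDArray z = sigma.mul(e).addi(mu);               //z = mu + sigma * e, differentiable w.r.t. mu and sigma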

Example 9 with IActivation

Use of org.nd4j.linalg.activations.IActivation in project deeplearning4j by deeplearning4j.

The class VariationalAutoencoder, method backpropGradient.

@Override
public Pair<Gradient, INDArray> backpropGradient(INDArray epsilon) {
    if (!zeroedPretrainParamGradients) {
        for (Map.Entry<String, INDArray> entry : gradientViews.entrySet()) {
            if (isPretrainParam(entry.getKey())) {
                entry.getValue().assign(0);
            }
        }
        zeroedPretrainParamGradients = true;
    }
    Gradient gradient = new DefaultGradient();
    VAEFwdHelper fwd = doForward(true, true);
    INDArray currentDelta = pzxActivationFn.backprop(fwd.pzxMeanPreOut, epsilon).getFirst();
    //Finally, calculate mean value:
    INDArray meanW = params.get(VariationalAutoencoderParamInitializer.PZX_MEAN_W);
    //f order
    INDArray dLdMeanW = gradientViews.get(VariationalAutoencoderParamInitializer.PZX_MEAN_W);
    INDArray lastEncoderActivation = fwd.encoderActivations[fwd.encoderActivations.length - 1];
    Nd4j.gemm(lastEncoderActivation, currentDelta, dLdMeanW, true, false, 1.0, 0.0);
    INDArray dLdMeanB = gradientViews.get(VariationalAutoencoderParamInitializer.PZX_MEAN_B);
    //TODO: do this without the assign
    dLdMeanB.assign(currentDelta.sum(0));
    gradient.gradientForVariable().put(VariationalAutoencoderParamInitializer.PZX_MEAN_W, dLdMeanW);
    gradient.gradientForVariable().put(VariationalAutoencoderParamInitializer.PZX_MEAN_B, dLdMeanB);
    epsilon = meanW.mmul(currentDelta.transpose()).transpose();
    int nEncoderLayers = encoderLayerSizes.length;
    IActivation afn = conf().getLayer().getActivationFn();
    for (int i = nEncoderLayers - 1; i >= 0; i--) {
        String wKey = "e" + i + WEIGHT_KEY_SUFFIX;
        String bKey = "e" + i + BIAS_KEY_SUFFIX;
        INDArray weights = params.get(wKey);
        INDArray dLdW = gradientViews.get(wKey);
        INDArray dLdB = gradientViews.get(bKey);
        INDArray preOut = fwd.encoderPreOuts[i];
        currentDelta = afn.backprop(preOut, epsilon).getFirst();
        INDArray actInput;
        if (i == 0) {
            actInput = input;
        } else {
            actInput = fwd.encoderActivations[i - 1];
        }
        Nd4j.gemm(actInput, currentDelta, dLdW, true, false, 1.0, 0.0);
        //TODO: do this without the assign
        dLdB.assign(currentDelta.sum(0));
        gradient.gradientForVariable().put(wKey, dLdW);
        gradient.gradientForVariable().put(bKey, dLdB);
        epsilon = weights.mmul(currentDelta.transpose()).transpose();
    }
    return new Pair<>(gradient, epsilon);
}
Also used : Gradient(org.deeplearning4j.nn.gradient.Gradient) DefaultGradient(org.deeplearning4j.nn.gradient.DefaultGradient) DefaultGradient(org.deeplearning4j.nn.gradient.DefaultGradient) INDArray(org.nd4j.linalg.api.ndarray.INDArray) IActivation(org.nd4j.linalg.activations.IActivation) Pair(org.deeplearning4j.berkeley.Pair)
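
The IActivation contract used in all of these examples pairs getActivation (forward pass, applied in place) with backprop (pre-activations plus the incoming error, returning dL/dPreOut as the first element of a Pair). A small standalone sketch with a concrete, parameter-free activation, ActivationTanh from org.nd4j.linalg.activations.impl (shapes and values are illustrative):

//Forward and backward through a single activation function
IActivation afn = new ActivationTanh();
INDArray preOut = Nd4j.rand(3, 4).subi(0.5);                      //pre-activation values
INDArray act = afn.getActivation(preOut.dup(), true);             //forward pass (modifies its argument, hence the dup)
INDArray epsilon = Nd4j.ones(3, 4);                               //incoming dL/dActivation
INDArray delta = afn.backprop(preOut.dup(), epsilon).getFirst();  //dL/dPreOut; the Pair's second element holds activation-parameter gradients, if any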

Example 10 with IActivation

Use of org.nd4j.linalg.activations.IActivation in project deeplearning4j by deeplearning4j.

The class LSTMHelpers, method activateHelper.

/**
     * Returns FwdPassReturn object with activations/INDArrays. Allows activateHelper to be used for forward pass, backward pass
     * and rnnTimeStep whilst being reasonably efficient for all
     */
public static FwdPassReturn activateHelper(final Layer layer, final NeuralNetConfiguration conf,
        final IActivation gateActivationFn, //Activation function for the gates - sigmoid or hard sigmoid (outputs must be in the range 0 to 1)
        final INDArray input,
        final INDArray recurrentWeights, //Shape: [hiddenLayerSize,4*hiddenLayerSize+3]; order: [wI,wF,wO,wG,wFF,wOO,wGG]
        final INDArray originalInputWeights, //Shape: [n^(L-1),4*hiddenLayerSize]; order: [wi,wf,wo,wg]
        final INDArray biases, //Shape: [4,hiddenLayerSize]; order: [bi,bf,bo,bg]^T
        final boolean training, final INDArray originalPrevOutputActivations, final INDArray originalPrevMemCellState,
        boolean forBackprop, boolean forwards, final String inputWeightKey,
        INDArray maskArray) { //Input mask: should only be used with bidirectional RNNs + variable length
    //Data has shape [m,nIn,T]. Layer activations/output has shape [m,nHiddenUnits,T]
    if (input == null || input.length() == 0)
        throw new IllegalArgumentException("Invalid input: not set or 0 length");
    INDArray inputWeights = originalInputWeights;
    INDArray prevOutputActivations = originalPrevOutputActivations;
    //Edge case of T=1, may have shape [m,nIn], equiv. to [m,nIn,1]
    boolean is2dInput = input.rank() < 3;
    int timeSeriesLength = (is2dInput ? 1 : input.size(2));
    int hiddenLayerSize = recurrentWeights.size(0);
    int miniBatchSize = input.size(0);
    INDArray prevMemCellState;
    if (originalPrevMemCellState == null) {
        prevMemCellState = Nd4j.create(new int[] { miniBatchSize, hiddenLayerSize }, 'f');
    } else {
        prevMemCellState = originalPrevMemCellState.dup('f');
    }
    INDArray recurrentWeightsIFOG = recurrentWeights.get(NDArrayIndex.all(), NDArrayIndex.interval(0, 4 * hiddenLayerSize)).dup('f');
    //Apply dropconnect to input (not recurrent) weights only:
    if (conf.isUseDropConnect() && training && conf.getLayer().getDropOut() > 0) {
        inputWeights = Dropout.applyDropConnect(layer, inputWeightKey);
    }
    INDArray wFFTranspose = recurrentWeights.get(NDArrayIndex.all(), interval(4 * hiddenLayerSize, 4 * hiddenLayerSize + 1)).transpose();
    INDArray wOOTranspose = recurrentWeights.get(NDArrayIndex.all(), interval(4 * hiddenLayerSize + 1, 4 * hiddenLayerSize + 2)).transpose();
    INDArray wGGTranspose = recurrentWeights.get(NDArrayIndex.all(), interval(4 * hiddenLayerSize + 2, 4 * hiddenLayerSize + 3)).transpose();
    if (timeSeriesLength > 1 || forBackprop) {
        wFFTranspose = Shape.toMmulCompatible(wFFTranspose);
        wOOTranspose = Shape.toMmulCompatible(wOOTranspose);
        wGGTranspose = Shape.toMmulCompatible(wGGTranspose);
    }
    //Allocate arrays for activations:
    boolean sigmoidGates = gateActivationFn instanceof ActivationSigmoid;
    IActivation afn = conf.getLayer().getActivationFn();
    INDArray outputActivations = null;
    FwdPassReturn toReturn = new FwdPassReturn();
    if (forBackprop) {
        toReturn.fwdPassOutputAsArrays = new INDArray[timeSeriesLength];
        toReturn.memCellState = new INDArray[timeSeriesLength];
        toReturn.memCellActivations = new INDArray[timeSeriesLength];
        toReturn.iz = new INDArray[timeSeriesLength];
        toReturn.ia = new INDArray[timeSeriesLength];
        toReturn.fa = new INDArray[timeSeriesLength];
        toReturn.oa = new INDArray[timeSeriesLength];
        toReturn.ga = new INDArray[timeSeriesLength];
        if (!sigmoidGates) {
            toReturn.fz = new INDArray[timeSeriesLength];
            toReturn.oz = new INDArray[timeSeriesLength];
            toReturn.gz = new INDArray[timeSeriesLength];
        }
    } else {
        //F order to keep time steps together
        outputActivations = Nd4j.create(new int[] { miniBatchSize, hiddenLayerSize, timeSeriesLength }, 'f');
        toReturn.fwdPassOutput = outputActivations;
    }
    Level1 l1BLAS = Nd4j.getBlasWrapper().level1();
    //Input validation: check input data matches nIn
    if (input.size(1) != inputWeights.size(0)) {
        throw new DL4JInvalidInputException("Received input with size(1) = " + input.size(1) + " (input array shape = " + Arrays.toString(input.shape()) + "); input.size(1) must match layer nIn size (nIn = " + inputWeights.size(0) + ")");
    }
    //These can be different if user forgets to call rnnClearPreviousState() between calls of rnnTimeStep
    if (prevOutputActivations != null && prevOutputActivations.size(0) != input.size(0)) {
        throw new DL4JInvalidInputException("Previous activations (stored state) number of examples = " + prevOutputActivations.size(0) + " but input array number of examples = " + input.size(0) + ". Possible cause: using rnnTimeStep() without calling" + " rnnClearPreviousState() between different sequences?");
    }
    //initialize prevOutputActivations to zeroes
    if (prevOutputActivations == null) {
        prevOutputActivations = Nd4j.zeros(new int[] { miniBatchSize, hiddenLayerSize });
    }
    for (int iTimeIndex = 0; iTimeIndex < timeSeriesLength; iTimeIndex++) {
        int time = iTimeIndex;
        if (!forwards) {
            time = timeSeriesLength - iTimeIndex - 1;
        }
        //Expected shape: [m,nIn]. Also deals with edge case of T=1, with 'time series' data of shape [m,nIn], equiv. to [m,nIn,1]
        INDArray miniBatchData = (is2dInput ? input : input.tensorAlongDimension(time, 1, 0));
        miniBatchData = Shape.toMmulCompatible(miniBatchData);
        //Calculate activations for: network input + forget, output, input modulation gates. Next 3 lines are first part of those
        //Shape: [miniBatch,4*layerSize]
        INDArray ifogActivations = miniBatchData.mmul(inputWeights);
        Nd4j.gemm(prevOutputActivations, recurrentWeightsIFOG, ifogActivations, false, false, 1.0, 1.0);
        ifogActivations.addiRowVector(biases);
        INDArray inputActivations = ifogActivations.get(NDArrayIndex.all(), NDArrayIndex.interval(0, hiddenLayerSize));
        if (forBackprop)
            toReturn.iz[time] = inputActivations.dup('f');
        conf.getLayer().getActivationFn().getActivation(inputActivations, training);
        if (forBackprop)
            toReturn.ia[time] = inputActivations;
        INDArray forgetGateActivations = ifogActivations.get(NDArrayIndex.all(), NDArrayIndex.interval(hiddenLayerSize, 2 * hiddenLayerSize));
        INDArray pmcellWFF = prevMemCellState.dup('f').muliRowVector(wFFTranspose);
        //y = a*x + y i.e., forgetGateActivations.addi(pmcellWFF)
        l1BLAS.axpy(pmcellWFF.length(), 1.0, pmcellWFF, forgetGateActivations);
        //Above line: treats matrix as a vector. Can only do this because we're sure both pmcellWFF and forgetGateActivations are f order, offset 0 and have the same strides
        if (forBackprop && !sigmoidGates) {
            //Forget gate pre-out (z)
            toReturn.fz[time] = forgetGateActivations.dup('f');
        }
        gateActivationFn.getActivation(forgetGateActivations, training);
        if (forBackprop)
            toReturn.fa[time] = forgetGateActivations;
        INDArray inputModGateActivations = ifogActivations.get(NDArrayIndex.all(), NDArrayIndex.interval(3 * hiddenLayerSize, 4 * hiddenLayerSize));
        INDArray pmcellWGG = prevMemCellState.dup('f').muliRowVector(wGGTranspose);
        //inputModGateActivations.addi(pmcellWGG)
        l1BLAS.axpy(pmcellWGG.length(), 1.0, pmcellWGG, inputModGateActivations);
        if (forBackprop && !sigmoidGates) {
            //Input modulation gate pre-out (z)
            toReturn.gz[time] = inputModGateActivations.dup('f');
        }
        gateActivationFn.getActivation(inputModGateActivations, training);
        if (forBackprop)
            toReturn.ga[time] = inputModGateActivations;
        //Memory cell state
        INDArray currentMemoryCellState;
        INDArray inputModMulInput;
        if (forBackprop) {
            currentMemoryCellState = prevMemCellState.dup('f').muli(forgetGateActivations);
            inputModMulInput = inputModGateActivations.dup('f').muli(inputActivations);
        } else {
            currentMemoryCellState = forgetGateActivations.muli(prevMemCellState);
            inputModMulInput = inputModGateActivations.muli(inputActivations);
        }
        //currentMemoryCellState.addi(inputModMulInput)
        l1BLAS.axpy(currentMemoryCellState.length(), 1.0, inputModMulInput, currentMemoryCellState);
        INDArray outputGateActivations = ifogActivations.get(NDArrayIndex.all(), NDArrayIndex.interval(2 * hiddenLayerSize, 3 * hiddenLayerSize));
        INDArray pmcellWOO = currentMemoryCellState.dup('f').muliRowVector(wOOTranspose);
        //outputGateActivations.addi(pmcellWOO)
        l1BLAS.axpy(pmcellWOO.length(), 1.0, pmcellWOO, outputGateActivations);
        if (forBackprop && !sigmoidGates) {
            //Output gate pre-out (z)
            toReturn.oz[time] = outputGateActivations.dup('f');
        }
        gateActivationFn.getActivation(outputGateActivations, training);
        if (forBackprop)
            toReturn.oa[time] = outputGateActivations;
        //LSTM unit outputs:
        INDArray currMemoryCellActivation = afn.getActivation(currentMemoryCellState.dup('f'), training);
        INDArray currHiddenUnitActivations;
        if (forBackprop) {
            //Expected shape: [m,hiddenLayerSize]
            currHiddenUnitActivations = currMemoryCellActivation.dup('f').muli(outputGateActivations);
        } else {
            //Expected shape: [m,hiddenLayerSize]
            currHiddenUnitActivations = currMemoryCellActivation.muli(outputGateActivations);
        }
        if (maskArray != null) {
            //Mask array is present: bidirectional RNN -> need to zero out these activations to avoid
            // incorrectly using activations from masked time steps (i.e., want 0 initialization in both directions)
            //We *also* need to apply this to the memory cells, as they are carried forward
            //Mask array has shape [minibatch, timeSeriesLength] -> get column
            INDArray timeStepMaskColumn = maskArray.getColumn(time);
            currHiddenUnitActivations.muliColumnVector(timeStepMaskColumn);
            currentMemoryCellState.muliColumnVector(timeStepMaskColumn);
        }
        if (forBackprop) {
            toReturn.fwdPassOutputAsArrays[time] = currHiddenUnitActivations;
            toReturn.memCellState[time] = currentMemoryCellState;
            toReturn.memCellActivations[time] = currMemoryCellActivation;
        } else {
            outputActivations.tensorAlongDimension(time, 1, 0).assign(currHiddenUnitActivations);
        }
        prevOutputActivations = currHiddenUnitActivations;
        prevMemCellState = currentMemoryCellState;
        toReturn.lastAct = currHiddenUnitActivations;
        toReturn.lastMemCell = currentMemoryCellState;
    }
    return toReturn;
}
Also used : INDArray(org.nd4j.linalg.api.ndarray.INDArray) ActivationSigmoid(org.nd4j.linalg.activations.impl.ActivationSigmoid) Level1(org.nd4j.linalg.api.blas.Level1) IActivation(org.nd4j.linalg.activations.IActivation) DL4JInvalidInputException(org.deeplearning4j.exception.DL4JInvalidInputException) NDArrayIndex.point(org.nd4j.linalg.indexing.NDArrayIndex.point)
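
The gate activation passed into activateHelper must map pre-activations into [0, 1] (sigmoid or hard sigmoid); a tiny sketch of applying such a gate in place, using the same call pattern as gateActivationFn.getActivation(...) above (shapes are illustrative):

//Sigmoid gate applied in place (ActivationSigmoid from org.nd4j.linalg.activations.impl)
IActivation gateActivationFn = new ActivationSigmoid();
INDArray gatePreOut = Nd4j.randn(4, 10);                           //[minibatch, hiddenLayerSize]
INDArray gate = gateActivationFn.getActivation(gatePreOut, true);  //all values now lie in (0, 1)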

Aggregations

IActivation (org.nd4j.linalg.activations.IActivation): 12
INDArray (org.nd4j.linalg.api.ndarray.INDArray): 10
Gradient (org.deeplearning4j.nn.gradient.Gradient): 6
DefaultGradient (org.deeplearning4j.nn.gradient.DefaultGradient): 4
Pair (org.deeplearning4j.berkeley.Pair): 3
Level1 (org.nd4j.linalg.api.blas.Level1): 3
IOException (java.io.IOException): 2
ArrayList (java.util.ArrayList): 2
GraphVertex (org.deeplearning4j.nn.conf.graph.GraphVertex): 2
LayerVertex (org.deeplearning4j.nn.conf.graph.LayerVertex): 2
ComputationGraphUpdater (org.deeplearning4j.nn.updater.graph.ComputationGraphUpdater): 2
ActivationSigmoid (org.nd4j.linalg.activations.impl.ActivationSigmoid): 2
MultiDataSet (org.nd4j.linalg.dataset.MultiDataSet): 2
NDArrayIndex.point (org.nd4j.linalg.indexing.NDArrayIndex.point): 2
JsonNode (org.nd4j.shade.jackson.databind.JsonNode): 2
ObjectMapper (org.nd4j.shade.jackson.databind.ObjectMapper): 2
DL4JInvalidInputException (org.deeplearning4j.exception.DL4JInvalidInputException): 1
Updater (org.deeplearning4j.nn.api.Updater): 1
IOutputLayer (org.deeplearning4j.nn.api.layers.IOutputLayer): 1
NeuralNetConfiguration (org.deeplearning4j.nn.conf.NeuralNetConfiguration): 1