Example 11 with SparkDl4jMultiLayer

Use of org.deeplearning4j.spark.impl.multilayer.SparkDl4jMultiLayer in project deeplearning4j by deeplearning4j.

From the class TestPreProcessedData, method testPreprocessedData.

@Test
public void testPreprocessedData() {
    //Test _loading_ of preprocessed data
    int dataSetObjSize = 5;
    int batchSizePerExecutor = 10;
    String path = FilenameUtils.concat(System.getProperty("java.io.tmpdir"), "dl4j_testpreprocdata");
    File f = new File(path);
    if (f.exists())
        f.delete();
    f.mkdir();
    DataSetIterator iter = new IrisDataSetIterator(5, 150);
    int i = 0;
    while (iter.hasNext()) {
        File f2 = new File(FilenameUtils.concat(path, "data" + (i++) + ".bin"));
        iter.next().save(f2);
    }
    MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
            .updater(Updater.RMSPROP)
            .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
            .iterations(1)
            .list()
            .layer(0, new org.deeplearning4j.nn.conf.layers.DenseLayer.Builder().nIn(4).nOut(3)
                    .activation(Activation.TANH).build())
            .layer(1, new org.deeplearning4j.nn.conf.layers.OutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
                    .nIn(3).nOut(3).activation(Activation.SOFTMAX).build())
            .pretrain(false).backprop(true)
            .build();
    SparkDl4jMultiLayer sparkNet = new SparkDl4jMultiLayer(sc, conf,
            new ParameterAveragingTrainingMaster.Builder(numExecutors(), dataSetObjSize)
                    .batchSizePerWorker(batchSizePerExecutor)
                    .averagingFrequency(1)
                    .repartionData(Repartition.Always)
                    .build());
    sparkNet.setCollectTrainingStats(true);
    sparkNet.fit("file:///" + path.replaceAll("\\\\", "/"));
    SparkTrainingStats sts = sparkNet.getSparkTrainingStats();
    //4 'fits' per averaging round (4 executors, averaging frequency 1); 10 examples per fit -> 40 examples per averaging round. 150/40 = 3 rounds (rounded down); 3*4 = 12 fits
    int expNumFits = 12;
    //Unfortunately: perfect partitioning isn't guaranteed by SparkUtils.balancedRandomSplit (especially if the original
    // partitions are all of size 1, which appears to occur at least some of the time), but we should get close to what we expect...
    assertTrue(Math.abs(expNumFits - sts.getValue("ParameterAveragingWorkerFitTimesMs").size()) < 3);
    assertEquals(3, sts.getValue("ParameterAveragingMasterMapPartitionsTimesMs").size());
}
Also used: IrisDataSetIterator (org.deeplearning4j.datasets.iterator.impl.IrisDataSetIterator), NeuralNetConfiguration (org.deeplearning4j.nn.conf.NeuralNetConfiguration), ParameterAveragingTrainingMaster (org.deeplearning4j.spark.impl.paramavg.ParameterAveragingTrainingMaster), SparkTrainingStats (org.deeplearning4j.spark.api.stats.SparkTrainingStats), MultiLayerConfiguration (org.deeplearning4j.nn.conf.MultiLayerConfiguration), SparkDl4jMultiLayer (org.deeplearning4j.spark.impl.multilayer.SparkDl4jMultiLayer), File (java.io.File), DataSetIterator (org.nd4j.linalg.dataset.api.iterator.DataSetIterator), PortableDataStreamDataSetIterator (org.deeplearning4j.spark.iterator.PortableDataStreamDataSetIterator), BaseSparkTest (org.deeplearning4j.spark.BaseSparkTest), Test (org.junit.Test)
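The fit-count arithmetic in the comment above can be checked directly. A minimal sketch in plain Java, with the test's constants hard-coded (not DL4J code, just the expected-value calculation):

```java
public class FitCountCheck {
    public static int expectedFits(int totalExamples, int batchSizePerExecutor,
                                   int numExecutors, int averagingFrequency) {
        // Examples consumed per averaging round: each executor fits
        // averagingFrequency batches of batchSizePerExecutor examples.
        int examplesPerAveraging = batchSizePerExecutor * numExecutors * averagingFrequency;
        int numAveragings = totalExamples / examplesPerAveraging; // integer division: 150/40 = 3
        // Each averaging round contributes one fit per executor.
        return numAveragings * numExecutors;                      // 3 * 4 = 12
    }

    public static void main(String[] args) {
        System.out.println(expectedFits(150, 10, 4, 1)); // prints 12
    }
}
```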

Example 12 with SparkDl4jMultiLayer

Use of org.deeplearning4j.spark.impl.multilayer.SparkDl4jMultiLayer in project deeplearning4j by deeplearning4j.

From the class TestKryoWarning, method doTestMLN.

private static void doTestMLN(SparkConf sparkConf) {
    JavaSparkContext sc = new JavaSparkContext(sparkConf);
    try {
        MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
                .list()
                .layer(0, new OutputLayer.Builder().nIn(10).nOut(10).build())
                .pretrain(false).backprop(true)
                .build();
        TrainingMaster tm = new ParameterAveragingTrainingMaster.Builder(1).build();
        SparkDl4jMultiLayer sml = new SparkDl4jMultiLayer(sc, conf, tm);
    } finally {
        sc.stop();
    }
}
Also used: OutputLayer (org.deeplearning4j.nn.conf.layers.OutputLayer), MultiLayerConfiguration (org.deeplearning4j.nn.conf.MultiLayerConfiguration), SparkDl4jMultiLayer (org.deeplearning4j.spark.impl.multilayer.SparkDl4jMultiLayer), JavaSparkContext (org.apache.spark.api.java.JavaSparkContext), NeuralNetConfiguration (org.deeplearning4j.nn.conf.NeuralNetConfiguration), ParameterAveragingTrainingMaster (org.deeplearning4j.spark.impl.paramavg.ParameterAveragingTrainingMaster), TrainingMaster (org.deeplearning4j.spark.api.TrainingMaster)

Example 13 with SparkDl4jMultiLayer

Use of org.deeplearning4j.spark.impl.multilayer.SparkDl4jMultiLayer in project deeplearning4j by deeplearning4j.

From the class TestCompareParameterAveragingSparkVsSingleMachine, method testAverageEveryStepCNN.

@Test
public void testAverageEveryStepCNN() {
    //Idea: averaging every step with SGD (SGD updater + optimizer) is mathematically identical to doing the learning
    // on a single machine for synchronous distributed training
    //BUT: this is *ONLY* the case if all workers get an identical number of examples. That won't happen if
    // we use RDD.randomSplit (which is what occurs if we use .fit(JavaRDD<DataSet>) on a data set that needs splitting),
    // as it might give a number of examples that isn't divisible by the number of workers (e.g. 39 examples on 4 executors)
    //This is also ONLY the case using SGD updater
    int miniBatchSizePerWorker = 10;
    int nWorkers = 4;
    for (boolean saveUpdater : new boolean[] { true, false }) {
        JavaSparkContext sc = getContext(nWorkers);
        try {
            //Do training locally, for 3 minibatches
            int[] seeds = { 1, 2, 3 };
            MultiLayerNetwork net = new MultiLayerNetwork(getConfCNN(12345, Updater.SGD));
            net.init();
            INDArray initialParams = net.params().dup();
            for (int i = 0; i < seeds.length; i++) {
                DataSet ds = getOneDataSetCNN(miniBatchSizePerWorker * nWorkers, seeds[i]);
                if (!saveUpdater)
                    net.setUpdater(null);
                net.fit(ds);
            }
            INDArray finalParams = net.params().dup();
            //Do training on Spark with one executor, for 3 separate minibatches
            ParameterAveragingTrainingMaster tm = new ParameterAveragingTrainingMaster.Builder(1)
                    .averagingFrequency(1)
                    .batchSizePerWorker(miniBatchSizePerWorker)
                    .saveUpdater(saveUpdater)
                    .workerPrefetchNumBatches(0)
                    .rddTrainingApproach(RDDTrainingApproach.Export)
                    .build();
            SparkDl4jMultiLayer sparkNet = new SparkDl4jMultiLayer(sc, getConfCNN(12345, Updater.SGD), tm);
            sparkNet.setCollectTrainingStats(true);
            INDArray initialSparkParams = sparkNet.getNetwork().params().dup();
            for (int i = 0; i < seeds.length; i++) {
                List<DataSet> list = getOneDataSetAsIndividalExamplesCNN(miniBatchSizePerWorker * nWorkers, seeds[i]);
                JavaRDD<DataSet> rdd = sc.parallelize(list);
                sparkNet.fit(rdd);
            }
            System.out.println(sparkNet.getSparkTrainingStats().statsAsString());
            INDArray finalSparkParams = sparkNet.getNetwork().params().dup();
            System.out.println("Initial (Local) params:       " + Arrays.toString(initialParams.data().asFloat()));
            System.out.println("Initial (Spark) params:       " + Arrays.toString(initialSparkParams.data().asFloat()));
            System.out.println("Final (Local) params: " + Arrays.toString(finalParams.data().asFloat()));
            System.out.println("Final (Spark) params: " + Arrays.toString(finalSparkParams.data().asFloat()));
            assertArrayEquals(initialParams.data().asFloat(), initialSparkParams.data().asFloat(), 1e-8f);
            assertArrayEquals(finalParams.data().asFloat(), finalSparkParams.data().asFloat(), 1e-6f);
            double sparkScore = sparkNet.getScore();
            assertTrue(sparkScore > 0.0);
            assertEquals(net.score(), sparkScore, 1e-3);
        } finally {
            sc.stop();
        }
    }
}
Also used: DataSet (org.nd4j.linalg.dataset.DataSet), INDArray (org.nd4j.linalg.api.ndarray.INDArray), SparkDl4jMultiLayer (org.deeplearning4j.spark.impl.multilayer.SparkDl4jMultiLayer), JavaSparkContext (org.apache.spark.api.java.JavaSparkContext), MultiLayerNetwork (org.deeplearning4j.nn.multilayer.MultiLayerNetwork), Test (org.junit.Test)
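The "mathematically identical" claim in the comment above holds because averaging the workers' post-step parameters is the same as taking one SGD step with the averaged gradient: mean_i(w - lr*g_i) = w - lr*mean_i(g_i). A toy numeric sketch in plain Java (scalar parameters and made-up per-worker gradients, not DL4J code):

```java
public class AveragingEquivalence {
    // One SGD step per worker from shared params w, then average the results.
    public static double averageOfSteps(double w, double lr, double[] grads) {
        double sum = 0;
        for (double g : grads) sum += (w - lr * g); // each worker computes w - lr * g_i
        return sum / grads.length;
    }

    // Single-machine equivalent: one SGD step with the mean gradient.
    public static double stepWithMeanGrad(double w, double lr, double[] grads) {
        double mean = 0;
        for (double g : grads) mean += g;
        mean /= grads.length;
        return w - lr * mean;
    }

    public static void main(String[] args) {
        double[] grads = {0.3, -0.1, 0.7, 0.5}; // toy per-worker gradients
        // Both are ~0.965: mean gradient is 0.35, so 1.0 - 0.1 * 0.35
        System.out.println(averageOfSteps(1.0, 0.1, grads));
        System.out.println(stepWithMeanGrad(1.0, 0.1, grads));
    }
}
```

This linearity is exactly what breaks for stateful updaters like RMSProp, which is why the test pins the updater to SGD.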

Example 14 with SparkDl4jMultiLayer

Use of org.deeplearning4j.spark.impl.multilayer.SparkDl4jMultiLayer in project deeplearning4j by deeplearning4j.

From the class TestSparkMultiLayerParameterAveraging, method testSmallAmountOfData.

@Test
public void testSmallAmountOfData() {
    //Idea: Test spark training where some executors don't get any data
    //in this case: by having fewer examples (2 DataSets) than executors (local[*])
    MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
            .updater(Updater.RMSPROP)
            .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
            .iterations(1)
            .list()
            .layer(0, new org.deeplearning4j.nn.conf.layers.DenseLayer.Builder().nIn(nIn).nOut(3)
                    .activation(Activation.TANH).build())
            .layer(1, new org.deeplearning4j.nn.conf.layers.OutputLayer.Builder(LossFunctions.LossFunction.MSE)
                    .nIn(3).nOut(nOut).activation(Activation.SOFTMAX).build())
            .build();
    SparkDl4jMultiLayer sparkNet = new SparkDl4jMultiLayer(sc, conf,
            new ParameterAveragingTrainingMaster(true, numExecutors(), 1, 10, 1, 0));
    Nd4j.getRandom().setSeed(12345);
    DataSet d1 = new DataSet(Nd4j.rand(1, nIn), Nd4j.rand(1, nOut));
    DataSet d2 = new DataSet(Nd4j.rand(1, nIn), Nd4j.rand(1, nOut));
    JavaRDD<DataSet> rddData = sc.parallelize(Arrays.asList(d1, d2));
    sparkNet.fit(rddData);
}
Also used: MultiDataSet (org.nd4j.linalg.dataset.MultiDataSet), DataSet (org.nd4j.linalg.dataset.DataSet), NeuralNetConfiguration (org.deeplearning4j.nn.conf.NeuralNetConfiguration), MultiLayerConfiguration (org.deeplearning4j.nn.conf.MultiLayerConfiguration), DenseLayer (org.deeplearning4j.nn.conf.layers.DenseLayer), SparkDl4jMultiLayer (org.deeplearning4j.spark.impl.multilayer.SparkDl4jMultiLayer), BaseSparkTest (org.deeplearning4j.spark.BaseSparkTest), Test (org.junit.Test)
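The scenario above (2 DataSets, more executors) reduces to distributing fewer elements than partitions, which necessarily leaves some partitions empty. A plain-Java sketch of the per-partition counts, using a hypothetical round-robin split for illustration (Spark's `parallelize` actually slices the list contiguously, but either way two of the four partitions end up empty):

```java
import java.util.Arrays;

public class PartitionSketch {
    // Round-robin n elements over p partitions; returns per-partition element counts.
    public static int[] partitionCounts(int n, int p) {
        int[] counts = new int[p];
        for (int i = 0; i < n; i++) counts[i % p]++;
        return counts;
    }

    public static void main(String[] args) {
        // 2 DataSets over 4 executors: two partitions are empty, so the
        // corresponding workers contribute nothing to the parameter average.
        System.out.println(Arrays.toString(partitionCounts(2, 4))); // prints [1, 1, 0, 0]
    }
}
```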

Example 15 with SparkDl4jMultiLayer

Use of org.deeplearning4j.spark.impl.multilayer.SparkDl4jMultiLayer in project deeplearning4j by deeplearning4j.

From the class TestSparkMultiLayerParameterAveraging, method testEvaluation.

@Test
public void testEvaluation() {
    SparkDl4jMultiLayer sparkNet = getBasicNetwork();
    MultiLayerNetwork netCopy = sparkNet.getNetwork().clone();
    Evaluation evalExpected = new Evaluation();
    INDArray outLocal = netCopy.output(input, Layer.TrainingMode.TEST);
    evalExpected.eval(labels, outLocal);
    Evaluation evalActual = sparkNet.evaluate(sparkData);
    assertEquals(evalExpected.accuracy(), evalActual.accuracy(), 1e-3);
    assertEquals(evalExpected.f1(), evalActual.f1(), 1e-3);
    assertEquals(evalExpected.getNumRowCounter(), evalActual.getNumRowCounter(), 1e-3);
    assertMapEquals(evalExpected.falseNegatives(), evalActual.falseNegatives());
    assertMapEquals(evalExpected.falsePositives(), evalActual.falsePositives());
    assertMapEquals(evalExpected.trueNegatives(), evalActual.trueNegatives());
    assertMapEquals(evalExpected.truePositives(), evalActual.truePositives());
    assertEquals(evalExpected.precision(), evalActual.precision(), 1e-3);
    assertEquals(evalExpected.recall(), evalActual.recall(), 1e-3);
    assertEquals(evalExpected.getConfusionMatrix(), evalActual.getConfusionMatrix());
}
Also used: Evaluation (org.deeplearning4j.eval.Evaluation), INDArray (org.nd4j.linalg.api.ndarray.INDArray), SparkDl4jMultiLayer (org.deeplearning4j.spark.impl.multilayer.SparkDl4jMultiLayer), MultiLayerNetwork (org.deeplearning4j.nn.multilayer.MultiLayerNetwork), BaseSparkTest (org.deeplearning4j.spark.BaseSparkTest), Test (org.junit.Test)
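The scalar metrics compared in the test above (precision, recall, f1) reduce to simple functions of the confusion counts whose per-class maps are also asserted. A minimal plain-Java sketch for the binary case, with hypothetical counts (not the dl4j Evaluation API):

```java
public class MetricsSketch {
    public static double precision(int tp, int fp) { return tp / (double) (tp + fp); }
    public static double recall(int tp, int fn)    { return tp / (double) (tp + fn); }

    public static double f1(int tp, int fp, int fn) {
        double p = precision(tp, fp), r = recall(tp, fn);
        return 2 * p * r / (p + r); // harmonic mean of precision and recall
    }

    public static void main(String[] args) {
        // Hypothetical counts: 8 true positives, 2 false positives, 2 false negatives
        System.out.println(precision(8, 2)); // ~0.8
        System.out.println(recall(8, 2));    // ~0.8
        System.out.println(f1(8, 2, 2));     // ~0.8 (precision == recall here, so F1 matches both)
    }
}
```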

Aggregations

SparkDl4jMultiLayer (org.deeplearning4j.spark.impl.multilayer.SparkDl4jMultiLayer): 23 uses
Test (org.junit.Test): 22 uses
DataSet (org.nd4j.linalg.dataset.DataSet): 19 uses
BaseSparkTest (org.deeplearning4j.spark.BaseSparkTest): 18 uses
MultiLayerConfiguration (org.deeplearning4j.nn.conf.MultiLayerConfiguration): 17 uses
MultiLayerNetwork (org.deeplearning4j.nn.multilayer.MultiLayerNetwork): 13 uses
INDArray (org.nd4j.linalg.api.ndarray.INDArray): 13 uses
MultiDataSet (org.nd4j.linalg.dataset.MultiDataSet): 13 uses
NeuralNetConfiguration (org.deeplearning4j.nn.conf.NeuralNetConfiguration): 12 uses
LabeledPoint (org.apache.spark.mllib.regression.LabeledPoint): 10 uses
IrisDataSetIterator (org.deeplearning4j.datasets.iterator.impl.IrisDataSetIterator): 10 uses
DenseLayer (org.deeplearning4j.nn.conf.layers.DenseLayer): 9 uses
JavaSparkContext (org.apache.spark.api.java.JavaSparkContext): 6 uses
DataSetIterator (org.nd4j.linalg.dataset.api.iterator.DataSetIterator): 6 uses
MnistDataSetIterator (org.deeplearning4j.datasets.iterator.impl.MnistDataSetIterator): 5 uses
OutputLayer (org.deeplearning4j.nn.conf.layers.OutputLayer): 5 uses
SparkTrainingStats (org.deeplearning4j.spark.api.stats.SparkTrainingStats): 5 uses
ParameterAveragingTrainingMaster (org.deeplearning4j.spark.impl.paramavg.ParameterAveragingTrainingMaster): 5 uses
File (java.io.File): 3 uses
Evaluation (org.deeplearning4j.eval.Evaluation): 3 uses