
Example 1 with ParameterAveragingTrainingMaster

Use of org.deeplearning4j.spark.impl.paramavg.ParameterAveragingTrainingMaster in the deeplearning4j project.

From the class TestEarlyStoppingSpark, method testBadTuning.

@Test
public void testBadTuning() {
    //Test poor tuning (high LR): should terminate on MaxScoreIterationTerminationCondition
    Nd4j.getRandom().setSeed(12345);
    MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder().seed(12345)
                    .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT).iterations(1)
                    .updater(Updater.SGD)
                    .learningRate(10.0) //Intentionally huge LR
                    .weightInit(WeightInit.XAVIER).list()
                    .layer(0, new OutputLayer.Builder().nIn(4).nOut(3).activation(Activation.IDENTITY)
                                    .lossFunction(LossFunctions.LossFunction.MSE).build())
                    .pretrain(false).backprop(true).build();
    MultiLayerNetwork net = new MultiLayerNetwork(conf);
    net.setListeners(new ScoreIterationListener(1));
    JavaRDD<DataSet> irisData = getIris();
    EarlyStoppingModelSaver<MultiLayerNetwork> saver = new InMemoryModelSaver<>();
    EarlyStoppingConfiguration<MultiLayerNetwork> esConf = new EarlyStoppingConfiguration.Builder<MultiLayerNetwork>()
                    .epochTerminationConditions(new MaxEpochsTerminationCondition(5000))
                    .iterationTerminationConditions(new MaxTimeIterationTerminationCondition(1, TimeUnit.MINUTES),
                                    new MaxScoreIterationTerminationCondition(7.5)) //Initial score is ~2.5
                    .scoreCalculator(new SparkDataSetLossCalculator(irisData, true, sc.sc()))
                    .modelSaver(saver).build();
    IEarlyStoppingTrainer<MultiLayerNetwork> trainer = new SparkEarlyStoppingTrainer(getContext().sc(),
                    new ParameterAveragingTrainingMaster(true, 4, 1, 150 / 4, 1, 0), esConf, net, irisData);
    EarlyStoppingResult result = trainer.fit();
    assertTrue(result.getTotalEpochs() < 5);
    assertEquals(EarlyStoppingResult.TerminationReason.IterationTerminationCondition, result.getTerminationReason());
    String expDetails = new MaxScoreIterationTerminationCondition(7.5).toString();
    assertEquals(expDetails, result.getTerminationDetails());
}
Also used: OutputLayer (org.deeplearning4j.nn.conf.layers.OutputLayer), InMemoryModelSaver (org.deeplearning4j.earlystopping.saver.InMemoryModelSaver), MaxEpochsTerminationCondition (org.deeplearning4j.earlystopping.termination.MaxEpochsTerminationCondition), DataSet (org.nd4j.linalg.dataset.DataSet), SparkEarlyStoppingTrainer (org.deeplearning4j.spark.earlystopping.SparkEarlyStoppingTrainer), SparkDataSetLossCalculator (org.deeplearning4j.spark.earlystopping.SparkDataSetLossCalculator), NeuralNetConfiguration (org.deeplearning4j.nn.conf.NeuralNetConfiguration), ParameterAveragingTrainingMaster (org.deeplearning4j.spark.impl.paramavg.ParameterAveragingTrainingMaster), EarlyStoppingResult (org.deeplearning4j.earlystopping.EarlyStoppingResult), EarlyStoppingConfiguration (org.deeplearning4j.earlystopping.EarlyStoppingConfiguration), MultiLayerConfiguration (org.deeplearning4j.nn.conf.MultiLayerConfiguration), MaxScoreIterationTerminationCondition (org.deeplearning4j.earlystopping.termination.MaxScoreIterationTerminationCondition), MultiLayerNetwork (org.deeplearning4j.nn.multilayer.MultiLayerNetwork), ScoreIterationListener (org.deeplearning4j.optimize.listeners.ScoreIterationListener), MaxTimeIterationTerminationCondition (org.deeplearning4j.earlystopping.termination.MaxTimeIterationTerminationCondition), Test (org.junit.Test)
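
The six positional constructor arguments in the trainer line above are easy to misread. Below is a minimal sketch restating the same configuration with the Builder API that Example 5 uses; the mapping of the positional arguments to (saveUpdater, numWorkers, rddDataSetNumExamples, batchSizePerWorker, averagingFrequency, prefetchNumBatches) is an assumption to check against your deeplearning4j version.

//Sketch only, not from the test source. Assumes the 6-arg constructor corresponds to
//(saveUpdater, numWorkers, rddDataSetNumExamples, batchSizePerWorker, averagingFrequency, prefetchNumBatches).
TrainingMaster tm = new ParameterAveragingTrainingMaster.Builder(4, 1) //numWorkers, examples per DataSet object
                .saveUpdater(true)
                .batchSizePerWorker(150 / 4)
                .averagingFrequency(1)
                .workerPrefetchNumBatches(0)
                .build();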

Example 2 with ParameterAveragingTrainingMaster

Use of org.deeplearning4j.spark.impl.paramavg.ParameterAveragingTrainingMaster in the deeplearning4j project.

From the class TestEarlyStoppingSpark, method testNoImprovementNEpochsTermination.

@Test
public void testNoImprovementNEpochsTermination() {
    //Idea: terminate training if score (test set loss) does not improve for 5 consecutive epochs
    //Simulate this by setting LR = 0.0
    Nd4j.getRandom().setSeed(12345);
    MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder().seed(12345)
                    .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT).iterations(1)
                    .updater(Updater.SGD).learningRate(0.0)
                    .weightInit(WeightInit.XAVIER).list()
                    .layer(0, new OutputLayer.Builder().nIn(4).nOut(3)
                                    .lossFunction(LossFunctions.LossFunction.MCXENT).build())
                    .pretrain(false).backprop(true).build();
    MultiLayerNetwork net = new MultiLayerNetwork(conf);
    net.setListeners(new ScoreIterationListener(1));
    JavaRDD<DataSet> irisData = getIris();
    EarlyStoppingModelSaver<MultiLayerNetwork> saver = new InMemoryModelSaver<>();
    EarlyStoppingConfiguration<MultiLayerNetwork> esConf = new EarlyStoppingConfiguration.Builder<MultiLayerNetwork>()
                    .epochTerminationConditions(new MaxEpochsTerminationCondition(100),
                                    new ScoreImprovementEpochTerminationCondition(5))
                    .iterationTerminationConditions(new MaxScoreIterationTerminationCondition(7.5)) //Initial score is ~2.5
                    .scoreCalculator(new SparkDataSetLossCalculator(irisData, true, sc.sc()))
                    .modelSaver(saver).build();
    IEarlyStoppingTrainer<MultiLayerNetwork> trainer = new SparkEarlyStoppingTrainer(getContext().sc(),
                    new ParameterAveragingTrainingMaster(true, 4, 1, 150 / 10, 1, 0), esConf, net, irisData);
    EarlyStoppingResult result = trainer.fit();
    //Expect no score change due to 0 LR -> terminate after 6 total epochs
    //Normally expect 6 epochs exactly; get a little more than that here due to rounding + order of operations
    assertTrue(result.getTotalEpochs() < 12);
    assertEquals(EarlyStoppingResult.TerminationReason.EpochTerminationCondition, result.getTerminationReason());
    String expDetails = new ScoreImprovementEpochTerminationCondition(5).toString();
    assertEquals(expDetails, result.getTerminationDetails());
}
Also used: InMemoryModelSaver (org.deeplearning4j.earlystopping.saver.InMemoryModelSaver), MaxEpochsTerminationCondition (org.deeplearning4j.earlystopping.termination.MaxEpochsTerminationCondition), DataSet (org.nd4j.linalg.dataset.DataSet), ScoreImprovementEpochTerminationCondition (org.deeplearning4j.earlystopping.termination.ScoreImprovementEpochTerminationCondition), SparkEarlyStoppingTrainer (org.deeplearning4j.spark.earlystopping.SparkEarlyStoppingTrainer), SparkDataSetLossCalculator (org.deeplearning4j.spark.earlystopping.SparkDataSetLossCalculator), NeuralNetConfiguration (org.deeplearning4j.nn.conf.NeuralNetConfiguration), ParameterAveragingTrainingMaster (org.deeplearning4j.spark.impl.paramavg.ParameterAveragingTrainingMaster), EarlyStoppingResult (org.deeplearning4j.earlystopping.EarlyStoppingResult), EarlyStoppingConfiguration (org.deeplearning4j.earlystopping.EarlyStoppingConfiguration), MultiLayerConfiguration (org.deeplearning4j.nn.conf.MultiLayerConfiguration), MaxScoreIterationTerminationCondition (org.deeplearning4j.earlystopping.termination.MaxScoreIterationTerminationCondition), MultiLayerNetwork (org.deeplearning4j.nn.multilayer.MultiLayerNetwork), ScoreIterationListener (org.deeplearning4j.optimize.listeners.ScoreIterationListener), Test (org.junit.Test)
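
ScoreImprovementEpochTerminationCondition(5) terminates once the score has failed to improve for 5 consecutive epochs, which the zero learning rate guarantees here. A hedged variant is sketched below: requiring a minimum amount of improvement rather than any improvement at all. The (int, double) constructor overload is an assumption; verify it exists in your deeplearning4j version.

//Sketch only: stop unless the score improves by at least 0.001 within 5 epochs.
//Assumes a (maxEpochsWithNoImprovement, minImprovement) constructor overload.
EpochTerminationCondition strictNoImprovement = new ScoreImprovementEpochTerminationCondition(5, 0.001);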

Example 3 with ParameterAveragingTrainingMaster

Use of org.deeplearning4j.spark.impl.paramavg.ParameterAveragingTrainingMaster in the deeplearning4j project.

From the class TestEarlyStoppingSpark, method testTimeTermination.

@Test
public void testTimeTermination() {
    //test termination after max time
    Nd4j.getRandom().setSeed(12345);
    MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder().seed(12345)
                    .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT).iterations(1)
                    .updater(Updater.SGD).learningRate(1e-6).weightInit(WeightInit.XAVIER).list()
                    .layer(0, new OutputLayer.Builder().nIn(4).nOut(3)
                                    .lossFunction(LossFunctions.LossFunction.MCXENT).build())
                    .pretrain(false).backprop(true).build();
    MultiLayerNetwork net = new MultiLayerNetwork(conf);
    net.setListeners(new ScoreIterationListener(1));
    JavaRDD<DataSet> irisData = getIris();
    EarlyStoppingModelSaver<MultiLayerNetwork> saver = new InMemoryModelSaver<>();
    EarlyStoppingConfiguration<MultiLayerNetwork> esConf = new EarlyStoppingConfiguration.Builder<MultiLayerNetwork>()
                    .epochTerminationConditions(new MaxEpochsTerminationCondition(10000))
                    .iterationTerminationConditions(new MaxTimeIterationTerminationCondition(3, TimeUnit.SECONDS),
                                    new MaxScoreIterationTerminationCondition(7.5)) //Initial score is ~2.5
                    .scoreCalculator(new SparkDataSetLossCalculator(irisData, true, sc.sc()))
                    .modelSaver(saver).build();
    IEarlyStoppingTrainer<MultiLayerNetwork> trainer = new SparkEarlyStoppingTrainer(getContext().sc(),
                    new ParameterAveragingTrainingMaster(true, 4, 1, 150 / 15, 1, 0), esConf, net, irisData);
    long startTime = System.currentTimeMillis();
    EarlyStoppingResult result = trainer.fit();
    long endTime = System.currentTimeMillis();
    int durationSeconds = (int) ((endTime - startTime) / 1000); //divide before casting; the cast binds tighter than /
    assertTrue("durationSeconds = " + durationSeconds, durationSeconds >= 3);
    assertTrue("durationSeconds = " + durationSeconds, durationSeconds <= 9);
    assertEquals(EarlyStoppingResult.TerminationReason.IterationTerminationCondition, result.getTerminationReason());
    String expDetails = new MaxTimeIterationTerminationCondition(3, TimeUnit.SECONDS).toString();
    assertEquals(expDetails, result.getTerminationDetails());
}
Also used: InMemoryModelSaver (org.deeplearning4j.earlystopping.saver.InMemoryModelSaver), MaxEpochsTerminationCondition (org.deeplearning4j.earlystopping.termination.MaxEpochsTerminationCondition), DataSet (org.nd4j.linalg.dataset.DataSet), SparkEarlyStoppingTrainer (org.deeplearning4j.spark.earlystopping.SparkEarlyStoppingTrainer), SparkDataSetLossCalculator (org.deeplearning4j.spark.earlystopping.SparkDataSetLossCalculator), NeuralNetConfiguration (org.deeplearning4j.nn.conf.NeuralNetConfiguration), ParameterAveragingTrainingMaster (org.deeplearning4j.spark.impl.paramavg.ParameterAveragingTrainingMaster), EarlyStoppingResult (org.deeplearning4j.earlystopping.EarlyStoppingResult), EarlyStoppingConfiguration (org.deeplearning4j.earlystopping.EarlyStoppingConfiguration), MultiLayerConfiguration (org.deeplearning4j.nn.conf.MultiLayerConfiguration), MaxScoreIterationTerminationCondition (org.deeplearning4j.earlystopping.termination.MaxScoreIterationTerminationCondition), MultiLayerNetwork (org.deeplearning4j.nn.multilayer.MultiLayerNetwork), ScoreIterationListener (org.deeplearning4j.optimize.listeners.ScoreIterationListener), MaxTimeIterationTerminationCondition (org.deeplearning4j.earlystopping.termination.MaxTimeIterationTerminationCondition), Test (org.junit.Test)
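
The 3 to 9 second window is deliberately loose: the time condition is only evaluated between training iterations, so overshooting the 3-second limit is expected. For the elapsed-time arithmetic itself, TimeUnit (already imported by this test) sidesteps the cast-precedence pitfall noted in the code; a sketch:

//Equivalent elapsed-time computation via TimeUnit; sketch only, not part of the test.
long durationSeconds = TimeUnit.MILLISECONDS.toSeconds(endTime - startTime);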

Example 4 with ParameterAveragingTrainingMaster

Use of org.deeplearning4j.spark.impl.paramavg.ParameterAveragingTrainingMaster in the deeplearning4j project.

From the class TestSparkComputationGraph, method testBasic.

@Test
public void testBasic() throws Exception {
    JavaSparkContext sc = this.sc;
    RecordReader rr = new CSVRecordReader(0, ",");
    rr.initialize(new FileSplit(new ClassPathResource("iris.txt").getTempFileFromArchive()));
    MultiDataSetIterator iter = new RecordReaderMultiDataSetIterator.Builder(1).addReader("iris", rr)
                    .addInput("iris", 0, 3).addOutputOneHot("iris", 4, 3).build();
    List<MultiDataSet> list = new ArrayList<>(150);
    while (iter.hasNext()) list.add(iter.next());
    ComputationGraphConfiguration config = new NeuralNetConfiguration.Builder()
                    .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT).learningRate(0.1)
                    .graphBuilder().addInputs("in")
                    .addLayer("dense", new DenseLayer.Builder().nIn(4).nOut(2).build(), "in")
                    .addLayer("out", new OutputLayer.Builder(LossFunctions.LossFunction.MCXENT).nIn(2).nOut(3).build(), "dense")
                    .setOutputs("out").pretrain(false).backprop(true).build();
    ComputationGraph cg = new ComputationGraph(config);
    cg.init();
    TrainingMaster tm = new ParameterAveragingTrainingMaster(true, numExecutors(), 1, 10, 1, 0);
    SparkComputationGraph scg = new SparkComputationGraph(sc, cg, tm);
    scg.setListeners(Collections.singleton((IterationListener) new ScoreIterationListener(1)));
    JavaRDD<MultiDataSet> rdd = sc.parallelize(list);
    scg.fitMultiDataSet(rdd);
    //Try: fitting using DataSet
    DataSetIterator iris = new IrisDataSetIterator(1, 150);
    List<DataSet> list2 = new ArrayList<>();
    while (iris.hasNext()) list2.add(iris.next());
    JavaRDD<DataSet> rddDS = sc.parallelize(list2);
    scg.fit(rddDS);
}
Also used: IrisDataSetIterator (org.deeplearning4j.datasets.iterator.impl.IrisDataSetIterator), DataSet (org.nd4j.linalg.dataset.DataSet), MultiDataSet (org.nd4j.linalg.dataset.api.MultiDataSet), RecordReader (org.datavec.api.records.reader.RecordReader), CSVRecordReader (org.datavec.api.records.reader.impl.csv.CSVRecordReader), FileSplit (org.datavec.api.split.FileSplit), TrainingMaster (org.deeplearning4j.spark.api.TrainingMaster), ParameterAveragingTrainingMaster (org.deeplearning4j.spark.impl.paramavg.ParameterAveragingTrainingMaster), RecordReaderMultiDataSetIterator (org.deeplearning4j.datasets.datavec.RecordReaderMultiDataSetIterator), MultiDataSetIterator (org.nd4j.linalg.dataset.api.iterator.MultiDataSetIterator), JavaSparkContext (org.apache.spark.api.java.JavaSparkContext), ComputationGraph (org.deeplearning4j.nn.graph.ComputationGraph), ScoreIterationListener (org.deeplearning4j.optimize.listeners.ScoreIterationListener), ClassPathResource (org.nd4j.linalg.io.ClassPathResource), DenseLayer (org.deeplearning4j.nn.conf.layers.DenseLayer), ComputationGraphConfiguration (org.deeplearning4j.nn.conf.ComputationGraphConfiguration), IterationListener (org.deeplearning4j.optimize.api.IterationListener), DataSetIterator (org.nd4j.linalg.dataset.api.iterator.DataSetIterator), BaseSparkTest (org.deeplearning4j.spark.BaseSparkTest), Test (org.junit.Test)
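
The final scg.fit(rddDS) call accepts a JavaRDD<DataSet> even though a ComputationGraph trains on MultiDataSet. Conceptually the adapter is DataSetToMultiDataSetFn, which appears in the Aggregations list below; a sketch of the explicit conversion, assuming the class implements Spark's Function<DataSet, MultiDataSet>:

//Sketch: explicit DataSet -> MultiDataSet conversion before graph training.
//Assumes DataSetToMultiDataSetFn implements Function<DataSet, MultiDataSet>.
JavaRDD<MultiDataSet> rddMds = rddDS.map(new DataSetToMultiDataSetFn());
scg.fitMultiDataSet(rddMds);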

Example 5 with ParameterAveragingTrainingMaster

Use of org.deeplearning4j.spark.impl.paramavg.ParameterAveragingTrainingMaster in the deeplearning4j project.

From the class TestTrainingStatsCollection, method testStatsCollection.

@Test
public void testStatsCollection() throws Exception {
    int nWorkers = 4;
    SparkConf sparkConf = new SparkConf();
    sparkConf.setMaster("local[" + nWorkers + "]");
    sparkConf.setAppName("Test");
    JavaSparkContext sc = new JavaSparkContext(sparkConf);
    try {
        MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
                        .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT).iterations(1).list()
                        .layer(0, new DenseLayer.Builder().nIn(10).nOut(10).build())
                        .layer(1, new OutputLayer.Builder().nIn(10).nOut(10).build())
                        .pretrain(false).backprop(true).build();
        int miniBatchSizePerWorker = 10;
        int averagingFrequency = 5;
        int numberOfAveragings = 3;
        int totalExamples = nWorkers * miniBatchSizePerWorker * averagingFrequency * numberOfAveragings;
        Nd4j.getRandom().setSeed(12345);
        List<DataSet> list = new ArrayList<>();
        for (int i = 0; i < totalExamples; i++) {
            INDArray f = Nd4j.rand(1, 10);
            INDArray l = Nd4j.rand(1, 10);
            DataSet ds = new DataSet(f, l);
            list.add(ds);
        }
        JavaRDD<DataSet> rdd = sc.parallelize(list);
        rdd = rdd.repartition(4); //repartition returns a new RDD; the original code discarded the result
        ParameterAveragingTrainingMaster tm = new ParameterAveragingTrainingMaster.Builder(nWorkers, 1)
                        .averagingFrequency(averagingFrequency).batchSizePerWorker(miniBatchSizePerWorker)
                        .saveUpdater(true).workerPrefetchNumBatches(0)
                        .repartionData(Repartition.Always) //sic: this is the method's spelling in the API
                        .build();
        SparkDl4jMultiLayer sparkNet = new SparkDl4jMultiLayer(sc, conf, tm);
        sparkNet.setCollectTrainingStats(true);
        sparkNet.fit(rdd);
        //Collect the expected keys:
        List<String> expectedStatNames = new ArrayList<>();
        Class<?>[] classes = new Class[] { CommonSparkTrainingStats.class, ParameterAveragingTrainingMasterStats.class, ParameterAveragingTrainingWorkerStats.class };
        String[] fieldNames = new String[] { "columnNames", "columnNames", "columnNames" };
        for (int i = 0; i < classes.length; i++) {
            Field field = classes[i].getDeclaredField(fieldNames[i]);
            field.setAccessible(true);
            Object f = field.get(null);
            Collection<String> c = (Collection<String>) f;
            expectedStatNames.addAll(c);
        }
        System.out.println(expectedStatNames);
        SparkTrainingStats stats = sparkNet.getSparkTrainingStats();
        Set<String> actualKeySet = stats.getKeySet();
        assertEquals(expectedStatNames.size(), actualKeySet.size());
        for (String s : stats.getKeySet()) {
            assertTrue(expectedStatNames.contains(s));
            assertNotNull(stats.getValue(s));
        }
        String statsAsString = stats.statsAsString();
        System.out.println(statsAsString);
        //One line per stat
        assertEquals(actualKeySet.size(), statsAsString.split("\n").length);
        //Go through nested stats
        //First: master stats
        assertTrue(stats instanceof ParameterAveragingTrainingMasterStats);
        ParameterAveragingTrainingMasterStats masterStats = (ParameterAveragingTrainingMasterStats) stats;
        List<EventStats> exportTimeStats = masterStats.getParameterAveragingMasterExportTimesMs();
        assertEquals(1, exportTimeStats.size());
        assertDurationGreaterZero(exportTimeStats);
        assertNonNullFields(exportTimeStats);
        assertExpectedNumberMachineIdsJvmIdsThreadIds(exportTimeStats, 1, 1, 1);
        List<EventStats> countRddTime = masterStats.getParameterAveragingMasterCountRddSizeTimesMs();
        //occurs once per fit
        assertEquals(1, countRddTime.size());
        assertDurationGreaterEqZero(countRddTime);
        assertNonNullFields(countRddTime);
        //should occur only in master once
        assertExpectedNumberMachineIdsJvmIdsThreadIds(countRddTime, 1, 1, 1);
        List<EventStats> broadcastCreateTime = masterStats.getParameterAveragingMasterBroadcastCreateTimesMs();
        assertEquals(numberOfAveragings, broadcastCreateTime.size());
        assertDurationGreaterEqZero(broadcastCreateTime);
        assertNonNullFields(broadcastCreateTime);
        //only 1 thread for master
        assertExpectedNumberMachineIdsJvmIdsThreadIds(broadcastCreateTime, 1, 1, 1);
        List<EventStats> fitTimes = masterStats.getParameterAveragingMasterFitTimesMs();
        //i.e., number of times fit(JavaRDD<DataSet>) was called
        assertEquals(1, fitTimes.size());
        assertDurationGreaterZero(fitTimes);
        assertNonNullFields(fitTimes);
        //only 1 thread for master
        assertExpectedNumberMachineIdsJvmIdsThreadIds(fitTimes, 1, 1, 1);
        List<EventStats> splitTimes = masterStats.getParameterAveragingMasterSplitTimesMs();
        //Splitting of the data set is executed once only (i.e., one fit(JavaRDD<DataSet>) call)
        assertEquals(1, splitTimes.size());
        assertDurationGreaterEqZero(splitTimes);
        assertNonNullFields(splitTimes);
        //only 1 thread for master
        assertExpectedNumberMachineIdsJvmIdsThreadIds(splitTimes, 1, 1, 1);
        List<EventStats> aggregateTimesMs = masterStats.getParamaterAveragingMasterAggregateTimesMs();
        assertEquals(numberOfAveragings, aggregateTimesMs.size());
        assertDurationGreaterEqZero(aggregateTimesMs);
        assertNonNullFields(aggregateTimesMs);
        //only 1 thread for master
        assertExpectedNumberMachineIdsJvmIdsThreadIds(aggregateTimesMs, 1, 1, 1);
        List<EventStats> processParamsTimesMs = masterStats.getParameterAveragingMasterProcessParamsUpdaterTimesMs();
        assertEquals(numberOfAveragings, processParamsTimesMs.size());
        assertDurationGreaterEqZero(processParamsTimesMs);
        assertNonNullFields(processParamsTimesMs);
        //only 1 thread for master
        assertExpectedNumberMachineIdsJvmIdsThreadIds(processParamsTimesMs, 1, 1, 1);
        List<EventStats> repartitionTimesMs = masterStats.getParameterAveragingMasterRepartitionTimesMs();
        assertEquals(numberOfAveragings, repartitionTimesMs.size());
        assertDurationGreaterEqZero(repartitionTimesMs);
        assertNonNullFields(repartitionTimesMs);
        //only 1 thread for master
        assertExpectedNumberMachineIdsJvmIdsThreadIds(repartitionTimesMs, 1, 1, 1);
        //Second: Common spark training stats
        SparkTrainingStats commonStats = masterStats.getNestedTrainingStats();
        assertNotNull(commonStats);
        assertTrue(commonStats instanceof CommonSparkTrainingStats);
        CommonSparkTrainingStats cStats = (CommonSparkTrainingStats) commonStats;
        List<EventStats> workerFlatMapTotalTimeMs = cStats.getWorkerFlatMapTotalTimeMs();
        assertEquals(numberOfAveragings * nWorkers, workerFlatMapTotalTimeMs.size());
        assertDurationGreaterZero(workerFlatMapTotalTimeMs);
        assertNonNullFields(workerFlatMapTotalTimeMs);
        assertExpectedNumberMachineIdsJvmIdsThreadIds(workerFlatMapTotalTimeMs, 1, 1, nWorkers);
        List<EventStats> workerFlatMapGetInitialModelTimeMs = cStats.getWorkerFlatMapGetInitialModelTimeMs();
        assertEquals(numberOfAveragings * nWorkers, workerFlatMapGetInitialModelTimeMs.size());
        assertDurationGreaterEqZero(workerFlatMapGetInitialModelTimeMs);
        assertNonNullFields(workerFlatMapGetInitialModelTimeMs);
        assertExpectedNumberMachineIdsJvmIdsThreadIds(workerFlatMapGetInitialModelTimeMs, 1, 1, nWorkers);
        List<EventStats> workerFlatMapDataSetGetTimesMs = cStats.getWorkerFlatMapDataSetGetTimesMs();
        int numMinibatchesProcessed = workerFlatMapDataSetGetTimesMs.size();
        //1 for every time we get a data set
        int expectedNumMinibatchesProcessed = numberOfAveragings * nWorkers * averagingFrequency;
        //Sometimes random split is just bad - some executors might miss out on getting the expected amount of data
        assertTrue(numMinibatchesProcessed >= expectedNumMinibatchesProcessed - 5);
        List<EventStats> workerFlatMapProcessMiniBatchTimesMs = cStats.getWorkerFlatMapProcessMiniBatchTimesMs();
        assertTrue(workerFlatMapProcessMiniBatchTimesMs.size() >= numberOfAveragings * nWorkers * averagingFrequency - 5);
        assertDurationGreaterEqZero(workerFlatMapProcessMiniBatchTimesMs);
        assertNonNullFields(workerFlatMapDataSetGetTimesMs);
        assertExpectedNumberMachineIdsJvmIdsThreadIds(workerFlatMapDataSetGetTimesMs, 1, 1, nWorkers);
        //Third: ParameterAveragingTrainingWorker stats
        SparkTrainingStats paramAvgStats = cStats.getNestedTrainingStats();
        assertNotNull(paramAvgStats);
        assertTrue(paramAvgStats instanceof ParameterAveragingTrainingWorkerStats);
        ParameterAveragingTrainingWorkerStats pStats = (ParameterAveragingTrainingWorkerStats) paramAvgStats;
        List<EventStats> parameterAveragingWorkerBroadcastGetValueTimeMs = pStats.getParameterAveragingWorkerBroadcastGetValueTimeMs();
        assertEquals(numberOfAveragings * nWorkers, parameterAveragingWorkerBroadcastGetValueTimeMs.size());
        assertDurationGreaterEqZero(parameterAveragingWorkerBroadcastGetValueTimeMs);
        assertNonNullFields(parameterAveragingWorkerBroadcastGetValueTimeMs);
        assertExpectedNumberMachineIdsJvmIdsThreadIds(parameterAveragingWorkerBroadcastGetValueTimeMs, 1, 1, nWorkers);
        List<EventStats> parameterAveragingWorkerInitTimeMs = pStats.getParameterAveragingWorkerInitTimeMs();
        assertEquals(numberOfAveragings * nWorkers, parameterAveragingWorkerInitTimeMs.size());
        assertDurationGreaterEqZero(parameterAveragingWorkerInitTimeMs);
        assertNonNullFields(parameterAveragingWorkerInitTimeMs);
        assertExpectedNumberMachineIdsJvmIdsThreadIds(parameterAveragingWorkerInitTimeMs, 1, 1, nWorkers);
        List<EventStats> parameterAveragingWorkerFitTimesMs = pStats.getParameterAveragingWorkerFitTimesMs();
        assertTrue(parameterAveragingWorkerFitTimesMs.size() >= numberOfAveragings * nWorkers * averagingFrequency - 5);
        assertDurationGreaterEqZero(parameterAveragingWorkerFitTimesMs);
        assertNonNullFields(parameterAveragingWorkerFitTimesMs);
        assertExpectedNumberMachineIdsJvmIdsThreadIds(parameterAveragingWorkerFitTimesMs, 1, 1, nWorkers);
        assertNull(pStats.getNestedTrainingStats());
        //Finally: try exporting stats
        String tempDir = System.getProperty("java.io.tmpdir");
        String outDir = FilenameUtils.concat(tempDir, "dl4j_testTrainingStatsCollection");
        stats.exportStatFiles(outDir, sc.sc());
        String htmlPlotsPath = FilenameUtils.concat(outDir, "AnalysisPlots.html");
        StatsUtils.exportStatsAsHtml(stats, htmlPlotsPath, sc);
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        StatsUtils.exportStatsAsHTML(stats, baos);
        baos.close();
        byte[] bytes = baos.toByteArray();
        String str = new String(bytes, "UTF-8");
        //System.out.println(str);
    } finally {
        sc.stop();
    }
}
Also used: OutputLayer (org.deeplearning4j.nn.conf.layers.OutputLayer), ParameterAveragingTrainingMasterStats (org.deeplearning4j.spark.impl.paramavg.stats.ParameterAveragingTrainingMasterStats), DataSet (org.nd4j.linalg.dataset.DataSet), CommonSparkTrainingStats (org.deeplearning4j.spark.api.stats.CommonSparkTrainingStats), SparkTrainingStats (org.deeplearning4j.spark.api.stats.SparkTrainingStats), Field (java.lang.reflect.Field), EventStats (org.deeplearning4j.spark.stats.EventStats), MultiLayerConfiguration (org.deeplearning4j.nn.conf.MultiLayerConfiguration), SparkDl4jMultiLayer (org.deeplearning4j.spark.impl.multilayer.SparkDl4jMultiLayer), JavaSparkContext (org.apache.spark.api.java.JavaSparkContext), ParameterAveragingTrainingWorkerStats (org.deeplearning4j.spark.impl.paramavg.stats.ParameterAveragingTrainingWorkerStats), ByteArrayOutputStream (java.io.ByteArrayOutputStream), ParameterAveragingTrainingMaster (org.deeplearning4j.spark.impl.paramavg.ParameterAveragingTrainingMaster), INDArray (org.nd4j.linalg.api.ndarray.INDArray), SparkConf (org.apache.spark.SparkConf), Test (org.junit.Test)
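
All of the count assertions in this test follow from the setup arithmetic; a short sketch making the expected numbers explicit:

//Sanity arithmetic behind testStatsCollection's assertions; sketch only.
int nWorkers = 4, batchSize = 10, avgFreq = 5, nAvg = 3;
int totalExamples = nWorkers * batchSize * avgFreq * nAvg; //600 examples generated
int expectedMinibatches = nAvg * nWorkers * avgFreq;       //60 DataSet fetches across all workers
int expectedMasterRounds = nAvg;                           //3 broadcast/aggregate/repartition events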

Aggregations

ParameterAveragingTrainingMaster (org.deeplearning4j.spark.impl.paramavg.ParameterAveragingTrainingMaster): 13 usages
Test (org.junit.Test): 13 usages
DataSet (org.nd4j.linalg.dataset.DataSet): 13 usages
NeuralNetConfiguration (org.deeplearning4j.nn.conf.NeuralNetConfiguration): 10 usages
ScoreIterationListener (org.deeplearning4j.optimize.listeners.ScoreIterationListener): 10 usages
EarlyStoppingConfiguration (org.deeplearning4j.earlystopping.EarlyStoppingConfiguration): 9 usages
InMemoryModelSaver (org.deeplearning4j.earlystopping.saver.InMemoryModelSaver): 9 usages
MaxEpochsTerminationCondition (org.deeplearning4j.earlystopping.termination.MaxEpochsTerminationCondition): 9 usages
OutputLayer (org.deeplearning4j.nn.conf.layers.OutputLayer): 8 usages
MaxTimeIterationTerminationCondition (org.deeplearning4j.earlystopping.termination.MaxTimeIterationTerminationCondition): 7 usages
ComputationGraphConfiguration (org.deeplearning4j.nn.conf.ComputationGraphConfiguration): 7 usages
ComputationGraph (org.deeplearning4j.nn.graph.ComputationGraph): 7 usages
TrainingMaster (org.deeplearning4j.spark.api.TrainingMaster): 7 usages
EarlyStoppingResult (org.deeplearning4j.earlystopping.EarlyStoppingResult): 6 usages
MaxScoreIterationTerminationCondition (org.deeplearning4j.earlystopping.termination.MaxScoreIterationTerminationCondition): 6 usages
MultiLayerConfiguration (org.deeplearning4j.nn.conf.MultiLayerConfiguration): 6 usages
SparkEarlyStoppingGraphTrainer (org.deeplearning4j.spark.earlystopping.SparkEarlyStoppingGraphTrainer): 5 usages
SparkLossCalculatorComputationGraph (org.deeplearning4j.spark.earlystopping.SparkLossCalculatorComputationGraph): 5 usages
DataSetToMultiDataSetFn (org.deeplearning4j.spark.impl.graph.dataset.DataSetToMultiDataSetFn): 5 usages
MultiLayerNetwork (org.deeplearning4j.nn.multilayer.MultiLayerNetwork): 4 usages