Example 31 with LongWatermark

Use of org.apache.gobblin.source.extractor.extract.LongWatermark in project incubator-gobblin by apache.

From the class FineGrainedWatermarkTrackerBenchmark, method scheduledDelayedAcks.

@Benchmark
@Group("scheduledDelayed")
public void scheduledDelayedAcks(Control control, TrackerState trackerState) throws Exception {
    if (!control.stopMeasurement) {
        final AcknowledgableWatermark wmark = new AcknowledgableWatermark(new DefaultCheckpointableWatermark("0", new LongWatermark(trackerState._index)));
        trackerState._index++;
        int delay = trackerState._random.nextInt(10);
        trackerState._executorService.schedule(new Runnable() {

            @Override
            public void run() {
                wmark.ack();
            }
        }, delay, TimeUnit.MILLISECONDS);
    }
}
Also used: DefaultCheckpointableWatermark (org.apache.gobblin.source.extractor.DefaultCheckpointableWatermark), LongWatermark (org.apache.gobblin.source.extractor.extract.LongWatermark), Group (org.openjdk.jmh.annotations.Group), Benchmark (org.openjdk.jmh.annotations.Benchmark)

Example 32 with LongWatermark

Use of org.apache.gobblin.source.extractor.extract.LongWatermark in project incubator-gobblin by apache.

From the class FineGrainedWatermarkTrackerBenchmark, method scheduledNoRandomDelayedAcks.

@Benchmark
@Group("scheduledNoRandom")
public void scheduledNoRandomDelayedAcks(Control control, TrackerState trackerState) throws Exception {
    if (!control.stopMeasurement) {
        final AcknowledgableWatermark wmark = new AcknowledgableWatermark(new DefaultCheckpointableWatermark("0", new LongWatermark(trackerState._index)));
        trackerState._index++;
        int delay = 10;
        trackerState._executorService.schedule(new Runnable() {

            @Override
            public void run() {
                wmark.ack();
            }
        }, delay, TimeUnit.MILLISECONDS);
    }
}
Also used: DefaultCheckpointableWatermark (org.apache.gobblin.source.extractor.DefaultCheckpointableWatermark), LongWatermark (org.apache.gobblin.source.extractor.extract.LongWatermark), Group (org.openjdk.jmh.annotations.Group), Benchmark (org.openjdk.jmh.annotations.Benchmark)
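
The two scheduled benchmarks above differ only in how the delay is chosen (random below 10 ms versus a fixed 10 ms); both hand the ack to a scheduled executor. A minimal sketch of that pattern follows, using only java.util.concurrent. The class DelayedAckSketch and the method ackAllWithDelay are hypothetical names, and an AtomicInteger counter stands in for AcknowledgableWatermark.ack(); this is not Gobblin code.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: a counter stands in for the watermark being acknowledged.
final class DelayedAckSketch {

    /** Schedules `count` delayed acks and returns how many ran before the timeout. */
    static int ackAllWithDelay(int count, long delayMillis) {
        ScheduledExecutorService executor = Executors.newScheduledThreadPool(2);
        AtomicInteger acked = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(count);
        try {
            for (int i = 0; i < count; i++) {
                // A lambda replaces the anonymous Runnable used in the benchmark.
                executor.schedule(() -> {
                    acked.incrementAndGet();
                    done.countDown();
                }, delayMillis, TimeUnit.MILLISECONDS);
            }
            // Wait for every ack, but give up after five seconds.
            done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            executor.shutdownNow();
        }
        return acked.get();
    }

    public static void main(String[] args) {
        System.out.println(ackAllWithDelay(5, 10L));
    }
}
```

The latch is only there so the sketch can report a result; the benchmark methods themselves fire the ack and return immediately.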

Example 33 with LongWatermark

Use of org.apache.gobblin.source.extractor.extract.LongWatermark in project incubator-gobblin by apache.

From the class FineGrainedWatermarkTrackerBenchmark, method trackImmediateAcks.

@Benchmark
@Group("trackImmediate")
public void trackImmediateAcks(Control control, TrackerState trackerState) throws Exception {
    if (!control.stopMeasurement) {
        AcknowledgableWatermark wmark = new AcknowledgableWatermark(new DefaultCheckpointableWatermark("0", new LongWatermark(trackerState._index)));
        trackerState._watermarkTracker.track(wmark);
        trackerState._index++;
        wmark.ack();
    }
}
Also used: DefaultCheckpointableWatermark (org.apache.gobblin.source.extractor.DefaultCheckpointableWatermark), LongWatermark (org.apache.gobblin.source.extractor.extract.LongWatermark), Group (org.openjdk.jmh.annotations.Group), Benchmark (org.openjdk.jmh.annotations.Benchmark)

Example 34 with LongWatermark

Use of org.apache.gobblin.source.extractor.extract.LongWatermark in project incubator-gobblin by apache.

From the class SequentialTestSource, method getWorkunits.

@Override
public List<WorkUnit> getWorkunits(SourceState state) {
    configureIfNeeded(ConfigFactory.parseProperties(state.getProperties()));
    final List<WorkUnitState> previousWorkUnitStates = state.getPreviousWorkUnitStates();
    if (!previousWorkUnitStates.isEmpty()) {
        List<WorkUnit> newWorkUnits = Lists.newArrayListWithCapacity(previousWorkUnitStates.size());
        for (WorkUnitState workUnitState : previousWorkUnitStates) {
            WorkUnit workUnit;
            if (workUnitState.getWorkingState().equals(WorkUnitState.WorkingState.COMMITTED)) {
                LongWatermark watermark = workUnitState.getActualHighWatermark(LongWatermark.class);
                LongWatermark expectedWatermark = new LongWatermark(watermark.getValue() + numRecordsPerExtract);
                WatermarkInterval watermarkInterval = new WatermarkInterval(watermark, expectedWatermark);
                workUnit = WorkUnit.create(newExtract(tableType, namespace, table), watermarkInterval);
                log.debug("Will be setting watermark interval to " + watermarkInterval.toJson());
                workUnit.setProp(WORK_UNIT_INDEX, workUnitState.getWorkunit().getProp(WORK_UNIT_INDEX));
            } else {
                // Not committed: retry from the work unit's original low watermark.
                LongWatermark watermark = workUnitState.getWorkunit().getLowWatermark(LongWatermark.class);
                LongWatermark expectedWatermark = new LongWatermark(watermark.getValue() + numRecordsPerExtract);
                WatermarkInterval watermarkInterval = new WatermarkInterval(watermark, expectedWatermark);
                workUnit = WorkUnit.create(newExtract(tableType, namespace, table), watermarkInterval);
                log.debug("Will be setting watermark interval to " + watermarkInterval.toJson());
                workUnit.setProp(WORK_UNIT_INDEX, workUnitState.getWorkunit().getProp(WORK_UNIT_INDEX));
            }
            newWorkUnits.add(workUnit);
        }
        return newWorkUnits;
    } else {
        return initialWorkUnits();
    }
}
Also used: WatermarkInterval (org.apache.gobblin.source.extractor.WatermarkInterval), WorkUnitState (org.apache.gobblin.configuration.WorkUnitState), WorkUnit (org.apache.gobblin.source.workunit.WorkUnit), LongWatermark (org.apache.gobblin.source.extractor.extract.LongWatermark)
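
The branch logic in getWorkunits can be isolated as a small sketch: a committed work unit resumes from its actual high watermark, while any other state retries from the original low watermark, and in both cases the expected high watermark is the start plus the number of records per extract. The class WatermarkAdvanceSketch, the nested class Interval, and the method nextInterval are hypothetical names, and plain long values stand in for LongWatermark; none of this is Gobblin API.

```java
// Hypothetical sketch; plain longs stand in for Gobblin's LongWatermark.
final class WatermarkAdvanceSketch {

    /** Where the next extract starts and where it is expected to end. */
    static final class Interval {
        final long low;
        final long expectedHigh;

        Interval(long low, long expectedHigh) {
            this.low = low;
            this.expectedHigh = expectedHigh;
        }
    }

    static Interval nextInterval(boolean committed, long previousLow,
                                 long actualHigh, long recordsPerExtract) {
        // Committed: continue from what was actually reached.
        // Otherwise: retry the same range from its original low watermark.
        long start = committed ? actualHigh : previousLow;
        return new Interval(start, start + recordsPerExtract);
    }

    public static void main(String[] args) {
        Interval advanced = nextInterval(true, 0L, 100L, 100L);
        Interval retried = nextInterval(false, 0L, 40L, 100L);
        System.out.println(advanced.low + " -> " + advanced.expectedHigh); // 100 -> 200
        System.out.println(retried.low + " -> " + retried.expectedHigh);   // 0 -> 100
    }
}
```

Computing the start once and deriving the interval from it also shows that the two branches in the original method differ only in which watermark they read.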

Example 35 with LongWatermark

Use of org.apache.gobblin.source.extractor.extract.LongWatermark in project incubator-gobblin by apache.

From the class SourceHadoopFsEndPoint, method getWatermark.

@Override
public synchronized Optional<ComparableWatermark> getWatermark() {
    if (this.initialized) {
        return this.cachedWatermark;
    }
    try {
        long curTs = -1;
        FileSystem fs = FileSystem.get(rc.getFsURI(), new Configuration());
        Collection<Path> validPaths = ReplicationDataValidPathPicker.getValidPaths(this);
        for (Path p : validPaths) {
            this.allFileStatus.addAll(FileListUtils.listFilesRecursively(fs, p, super.getPathFilter(), super.isApplyFilterToDirectories()));
        }
        for (FileStatus f : this.allFileStatus) {
            if (f.getModificationTime() > curTs) {
                curTs = f.getModificationTime();
            }
        }
        ComparableWatermark result = new LongWatermark(curTs);
        this.cachedWatermark = Optional.of(result);
        // Optional.of(result) is always present, so the cache can be marked initialized directly.
        this.initialized = true;
        return this.cachedWatermark;
    } catch (IOException e) {
        log.error("Error while retrieving the watermark for " + this, e);
        return this.cachedWatermark;
    }
}
Also used: Path (org.apache.hadoop.fs.Path), FileStatus (org.apache.hadoop.fs.FileStatus), ComparableWatermark (org.apache.gobblin.source.extractor.ComparableWatermark), Configuration (org.apache.hadoop.conf.Configuration), FileSystem (org.apache.hadoop.fs.FileSystem), IOException (java.io.IOException), LongWatermark (org.apache.gobblin.source.extractor.extract.LongWatermark)
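
The core of getWatermark is a maximum over file modification times. Note that the method above starts from -1, so an empty listing produces a LongWatermark of -1. The sketch below shows one alternative that makes the empty case explicit with OptionalLong; it is a deliberate departure from the original behaviour, not a description of it. The class LatestModificationSketch and the method latestModificationTime are hypothetical names, and a list of long timestamps stands in for the Hadoop FileStatus objects.

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalLong;

// Hypothetical sketch; a list of timestamps stands in for Hadoop FileStatus entries.
final class LatestModificationSketch {

    /** Returns the newest modification time, or empty when there are no files. */
    static OptionalLong latestModificationTime(List<Long> modificationTimes) {
        return modificationTimes.stream()
                .mapToLong(Long::longValue)
                .max();
    }

    public static void main(String[] args) {
        List<Long> times = Arrays.asList(1000L, 3000L, 2000L);
        System.out.println(latestModificationTime(times));
    }
}
```

A caller could then decide whether an empty result should map to a sentinel watermark such as -1 or to no watermark at all.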

Aggregations

LongWatermark (org.apache.gobblin.source.extractor.extract.LongWatermark): 35
Test (org.testng.annotations.Test): 16
DefaultCheckpointableWatermark (org.apache.gobblin.source.extractor.DefaultCheckpointableWatermark): 12
WorkUnitState (org.apache.gobblin.configuration.WorkUnitState): 10
CheckpointableWatermark (org.apache.gobblin.source.extractor.CheckpointableWatermark): 9
SourceState (org.apache.gobblin.configuration.SourceState): 7
State (org.apache.gobblin.configuration.State): 7
WatermarkInterval (org.apache.gobblin.source.extractor.WatermarkInterval): 6
IOException (java.io.IOException): 5
RecordEnvelope (org.apache.gobblin.stream.RecordEnvelope): 5
WorkUnit (org.apache.gobblin.source.workunit.WorkUnit): 4
Partition (org.apache.hadoop.hive.ql.metadata.Partition): 4
Random (java.util.Random): 3
TreeSet (java.util.TreeSet): 3
AtomicInteger (java.util.concurrent.atomic.AtomicInteger): 3
ComparableWatermark (org.apache.gobblin.source.extractor.ComparableWatermark): 3
Path (org.apache.hadoop.fs.Path): 3
Benchmark (org.openjdk.jmh.annotations.Benchmark): 3
Group (org.openjdk.jmh.annotations.Group): 3
Config (com.typesafe.config.Config): 2