Example 1 with SourceState

use of org.apache.gobblin.configuration.SourceState in project incubator-gobblin by apache.

the class AbstractSourceTest method testGetPreviousWorkUnitStatesOnFullRetryPartialCommit.

/**
 * Test the case where the work unit retry policy is "onfull" but the job commit policy is "partial".
 */
@Test
public void testGetPreviousWorkUnitStatesOnFullRetryPartialCommit() {
    SourceState sourceState = new SourceState(new State(), this.previousWorkUnitStates);
    sourceState.setProp(ConfigurationKeys.WORK_UNIT_RETRY_POLICY_KEY, "onfull");
    sourceState.setProp(ConfigurationKeys.JOB_COMMIT_POLICY_KEY, "partial");
    Assert.assertEquals(this.testSource.getPreviousWorkUnitStatesForRetry(sourceState), Collections.EMPTY_LIST);
}
Also used : SourceState(org.apache.gobblin.configuration.SourceState) WorkUnitState(org.apache.gobblin.configuration.WorkUnitState) State(org.apache.gobblin.configuration.State) WorkingState(org.apache.gobblin.configuration.WorkUnitState.WorkingState) Test(org.testng.annotations.Test)

Example 2 with SourceState

use of org.apache.gobblin.configuration.SourceState in project incubator-gobblin by apache.

the class AbstractSourceTest method testGetPreviousWorkUnitStatesEnabledRetry.

/**
 * Test the always-retry policy, with WORK_UNIT_RETRY_ENABLED_KEY enabled.
 */
@Test
public void testGetPreviousWorkUnitStatesEnabledRetry() {
    SourceState sourceState = new SourceState(new State(), this.previousWorkUnitStates);
    sourceState.setProp(ConfigurationKeys.WORK_UNIT_RETRY_ENABLED_KEY, Boolean.TRUE);
    List<WorkUnitState> returnedWorkUnitStates = this.testSource.getPreviousWorkUnitStatesForRetry(sourceState);
    Assert.assertEquals(returnedWorkUnitStates, this.expectedPreviousWorkUnitStates);
}
Also used : SourceState(org.apache.gobblin.configuration.SourceState) WorkUnitState(org.apache.gobblin.configuration.WorkUnitState) State(org.apache.gobblin.configuration.State) WorkingState(org.apache.gobblin.configuration.WorkUnitState.WorkingState) Test(org.testng.annotations.Test)

Example 3 with SourceState

use of org.apache.gobblin.configuration.SourceState in project incubator-gobblin by apache.

the class HiveSerDeTest method testAvroOrcSerDes.

/**
 * This test uses the Avro SerDe to deserialize data from Avro files, and uses the ORC SerDe
 * to serialize the records into ORC files.
 */
@Test(groups = { "gobblin.serde" })
public void testAvroOrcSerDes() throws IOException, DataRecordException, DataConversionException {
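    Properties properties = new Properties();
    // Close the reader once the properties are loaded so the file handle is not leaked.
    try (FileReader reader = new FileReader("gobblin-core/src/test/resources/serde/serde.properties")) {
        properties.load(reader);
    }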
    SourceState sourceState = new SourceState(new State(properties), ImmutableList.<WorkUnitState>of());
    OldApiWritableFileSource source = new OldApiWritableFileSource();
    List<WorkUnit> workUnits = source.getWorkunits(sourceState);
    Assert.assertEquals(workUnits.size(), 1);
    WorkUnitState wus = new WorkUnitState(workUnits.get(0));
    wus.addAll(sourceState);
    Closer closer = Closer.create();
    HiveWritableHdfsDataWriter writer = null;
    try {
        OldApiWritableFileExtractor extractor = closer.register((OldApiWritableFileExtractor) source.getExtractor(wus));
        HiveSerDeConverter converter = closer.register(new HiveSerDeConverter());
        writer = closer.register((HiveWritableHdfsDataWriter) new HiveWritableHdfsDataWriterBuilder<>().withBranches(1).withWriterId("0").writeTo(Destination.of(DestinationType.HDFS, sourceState)).writeInFormat(WriterOutputFormat.ORC).build());
        converter.init(wus);
        Writable record;
        while ((record = extractor.readRecord(null)) != null) {
            Iterable<Writable> convertedRecordIterable = converter.convertRecordImpl(null, record, wus);
            Assert.assertEquals(Iterators.size(convertedRecordIterable.iterator()), 1);
            writer.write(convertedRecordIterable.iterator().next());
        }
    } catch (Throwable t) {
        throw closer.rethrow(t);
    } finally {
        closer.close();
        if (writer != null) {
            writer.commit();
        }
        Assert.assertTrue(this.fs.exists(new Path(sourceState.getProp(ConfigurationKeys.WRITER_OUTPUT_DIR), sourceState.getProp(ConfigurationKeys.WRITER_FILE_NAME))));
        HadoopUtils.deletePath(this.fs, new Path(sourceState.getProp(ConfigurationKeys.WRITER_OUTPUT_DIR)), true);
    }
}
Also used : Closer(com.google.common.io.Closer) Path(org.apache.hadoop.fs.Path) SourceState(org.apache.gobblin.configuration.SourceState) OldApiWritableFileExtractor(org.apache.gobblin.source.extractor.hadoop.OldApiWritableFileExtractor) WorkUnitState(org.apache.gobblin.configuration.WorkUnitState) HiveSerDeConverter(org.apache.gobblin.converter.serde.HiveSerDeConverter) Writable(org.apache.hadoop.io.Writable) Properties(java.util.Properties) HiveWritableHdfsDataWriterBuilder(org.apache.gobblin.writer.HiveWritableHdfsDataWriterBuilder) HiveWritableHdfsDataWriter(org.apache.gobblin.writer.HiveWritableHdfsDataWriter) State(org.apache.gobblin.configuration.State) FileReader(java.io.FileReader) OldApiWritableFileSource(org.apache.gobblin.source.extractor.hadoop.OldApiWritableFileSource) WorkUnit(org.apache.gobblin.source.workunit.WorkUnit) Test(org.testng.annotations.Test)

Example 4 with SourceState

use of org.apache.gobblin.configuration.SourceState in project incubator-gobblin by apache.

the class RegexBasedPartitionedRetrieverTest method testSnapshotRegex.

@Test
public void testSnapshotRegex() throws IOException {
    String snapshotRegex = "(\\d+)-PT-\\d+";
    RegexBasedPartitionedRetriever r = new RegexBasedPartitionedRetriever("txt");
    SourceState state = new SourceState();
    state.setProp(ConfigurationKeys.SOURCE_FILEBASED_FS_URI, "file:///");
    state.setProp(ConfigurationKeys.SOURCE_FILEBASED_DATA_DIRECTORY, tempDir.toString());
    state.setProp(PartitionedFileSourceBase.DATE_PARTITIONED_SOURCE_PARTITION_PATTERN, snapshotRegex);
    r.init(state);
    List<PartitionAwareFileRetriever.FileInfo> files = r.getFilesToProcess(DateToUse.APR_3_2017.getValue() - 1, 9999);
    Assert.assertEquals(files.size(), 3);
    verifyFile(files.get(0), DateToUse.APR_3_2017.getValue());
    verifyFile(files.get(1), DateToUse.MAY_1_2017.getValue());
    verifyFile(files.get(2), DateToUse.TWENTY_THREE_HOURS_AGO.getValue());
}
Also used : SourceState(org.apache.gobblin.configuration.SourceState) Test(org.testng.annotations.Test)
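
The pattern string passed to the retriever in testSnapshotRegex can be tried on its own with java.util.regex. The following is a minimal sketch, not Gobblin's implementation: the class name SnapshotRegexSketch, the helper leadingDigits, and the sample names are hypothetical, and requiring a full match of the name is an assumption made for this illustration. It only shows that the first capture group isolates the leading run of digits.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SnapshotRegexSketch {
    // Same pattern string as the snapshotRegex variable in testSnapshotRegex.
    private static final Pattern SNAPSHOT = Pattern.compile("(\\d+)-PT-\\d+");

    /**
     * Returns the digits captured by group 1 when the whole name matches the
     * pattern, or an empty Optional otherwise. Full-match semantics are an
     * assumption of this sketch.
     */
    public static Optional<String> leadingDigits(String name) {
        Matcher m = SNAPSHOT.matcher(name);
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical names, not taken from the Gobblin test fixtures.
        System.out.println(leadingDigits("1491202800000-PT-42"));
        System.out.println(leadingDigits("not-a-snapshot"));
    }
}
```

A name that lacks the "-PT-" separator or the trailing digits yields an empty Optional, so callers can skip it without catching an exception.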

Example 5 with SourceState

use of org.apache.gobblin.configuration.SourceState in project incubator-gobblin by apache.

the class QueryBasedSourceTest method testGetTableSpecificPropsFromState.

@Test
public void testGetTableSpecificPropsFromState() {
    SourceState state = new SourceState();
    state.setProp(DatasetUtils.DATASET_SPECIFIC_PROPS, "[{\"dataset\":\"Entity1\", \"value\": 1}, {\"dataset\":\"Table2\", \"value\":2}]");
    // We should look in the dataset specific properties using the entity name, not table name
    SourceEntity se1 = new SourceEntity("Entity1", "Table2");
    SourceEntity se3 = new SourceEntity("Entity3", "Table3");
    Set<SourceEntity> entities = ImmutableSet.of(se1, se3);
    Map<SourceEntity, State> datasetProps = QueryBasedSource.getTableSpecificPropsFromState(entities, state);
    // Value 1 should be returned for se1; no props should be returned for se3
    Assert.assertEquals(datasetProps.size(), 1);
    Assert.assertTrue(datasetProps.containsKey(se1));
    State se1Props = datasetProps.get(se1);
    Assert.assertEquals(se1Props.getProp("value"), "1");
}
Also used : SourceState(org.apache.gobblin.configuration.SourceState) SourceEntity(org.apache.gobblin.source.extractor.extract.QueryBasedSource.SourceEntity) WorkUnitState(org.apache.gobblin.configuration.WorkUnitState) State(org.apache.gobblin.configuration.State) WorkingState(org.apache.gobblin.configuration.WorkUnitState.WorkingState) Test(org.testng.annotations.Test)

Aggregations

SourceState (org.apache.gobblin.configuration.SourceState): 90
Test (org.testng.annotations.Test): 76
WorkUnitState (org.apache.gobblin.configuration.WorkUnitState): 44
WorkUnit (org.apache.gobblin.source.workunit.WorkUnit): 38
State (org.apache.gobblin.configuration.State): 30
WorkingState (org.apache.gobblin.configuration.WorkUnitState.WorkingState): 11
Partition (org.apache.hadoop.hive.ql.metadata.Partition): 8
Table (org.apache.hadoop.hive.ql.metadata.Table): 8
IterableDatasetFinder (org.apache.gobblin.dataset.IterableDatasetFinder): 7
LongWatermark (org.apache.gobblin.source.extractor.extract.LongWatermark): 7
Extract (org.apache.gobblin.source.workunit.Extract): 7
DateTime (org.joda.time.DateTime): 7
Dataset (org.apache.gobblin.dataset.Dataset): 6
PartitionableDataset (org.apache.gobblin.dataset.PartitionableDataset): 6
MultiWorkUnit (org.apache.gobblin.source.workunit.MultiWorkUnit): 6
WorkUnitStream (org.apache.gobblin.source.workunit.WorkUnitStream): 6
IOException (java.io.IOException): 5
Path (org.apache.hadoop.fs.Path): 5
Gson (com.google.gson.Gson): 4
JsonObject (com.google.gson.JsonObject): 4