Example 11 with Extract

Use of org.apache.gobblin.source.workunit.Extract in project incubator-gobblin by apache.

From the class TaskContinuousTest, method getStreamingTaskState:

private TaskState getStreamingTaskState() {
    // Wrap a SNAPSHOT_ONLY extract (namespace = test class name, table = simple class name) in a work unit state.
    WorkUnitState workUnitState = new WorkUnitState(WorkUnit.create(new Extract(Extract.TableType.SNAPSHOT_ONLY, this.getClass().getName(), this.getClass().getSimpleName())));
    workUnitState.setProp(ConfigurationKeys.TASK_KEY_KEY, "1234");
    TaskState taskState = new TaskState(workUnitState);
    // Disable metrics and run the task in streaming execution mode.
    taskState.setProp(ConfigurationKeys.METRICS_ENABLED_KEY, Boolean.toString(false));
    taskState.setProp(TaskConfigurationKeys.TASK_EXECUTION_MODE, ExecutionModel.STREAMING.name());
    taskState.setJobId("1234");
    taskState.setTaskId("testContinuousTaskId");
    return taskState;
}
Also used : WorkUnitState(org.apache.gobblin.configuration.WorkUnitState) Extract(org.apache.gobblin.source.workunit.Extract)

Example 12 with Extract

Use of org.apache.gobblin.source.workunit.Extract in project incubator-gobblin by apache.

From the class TestSource, method getWorkunits:

@Override
public List<WorkUnit> getWorkunits(SourceState state) {
    String nameSpace = state.getProp(ConfigurationKeys.EXTRACT_NAMESPACE_NAME_KEY);
    // Two extracts in the same namespace, one per test table.
    Extract extract1 = createExtract(TableType.SNAPSHOT_ONLY, nameSpace, "TestTable1");
    Extract extract2 = createExtract(TableType.SNAPSHOT_ONLY, nameSpace, "TestTable2");
    String sourceFileList = state.getProp(SOURCE_FILE_LIST_KEY);
    List<String> list = SPLITTER.splitToList(sourceFileList);
    List<WorkUnit> workUnits = Lists.newArrayList();
    // One work unit per source file, alternating between the two extracts.
    for (int i = 0; i < list.size(); i++) {
        WorkUnit workUnit = WorkUnit.create(i % 2 == 0 ? extract1 : extract2);
        workUnit.setProp(SOURCE_FILE_KEY, list.get(i));
        workUnits.add(workUnit);
    }
    // Optionally bundle all work units into a single MultiWorkUnit.
    if (state.getPropAsBoolean("use.multiworkunit", false)) {
        MultiWorkUnit multiWorkUnit = MultiWorkUnit.createEmpty();
        multiWorkUnit.addWorkUnits(workUnits);
        workUnits.clear();
        workUnits.add(multiWorkUnit);
    }
    return workUnits;
}
Also used : MultiWorkUnit(org.apache.gobblin.source.workunit.MultiWorkUnit) Extract(org.apache.gobblin.source.workunit.Extract) WorkUnit(org.apache.gobblin.source.workunit.WorkUnit)

Example 13 with Extract

Use of org.apache.gobblin.source.workunit.Extract in project incubator-gobblin by apache.

From the class HelloWorldSource, method getWorkunits:

@Override
public List<WorkUnit> getWorkunits(SourceState state) {
    // Read the source's own config namespace, falling back to an empty config and a default count.
    Config rootCfg = ConfigUtils.propertiesToConfig(state.getProperties());
    Config cfg = rootCfg.hasPath(CONFIG_NAMESPACE) ? rootCfg.getConfig(CONFIG_NAMESPACE) : ConfigFactory.empty();
    int numHellos = cfg.hasPath(NUM_HELLOS_KEY) ? cfg.getInt(NUM_HELLOS_KEY) : DEFAULT_NUM_HELLOS;
    // A single APPEND_ONLY extract is shared by every work unit.
    Extract extract = new Extract(TableType.APPEND_ONLY, HelloWorldSource.class.getPackage().getName(), HelloWorldSource.class.getSimpleName());
    List<WorkUnit> wus = new ArrayList<>(numHellos);
    // Work unit IDs are 1-based.
    for (int i = 1; i <= numHellos; ++i) {
        WorkUnit wu = new WorkUnit(extract);
        wu.setProp(HELLO_ID_FULL_KEY, i);
        wus.add(wu);
    }
    return wus;
}
Also used : Config(com.typesafe.config.Config) ArrayList(java.util.ArrayList) Extract(org.apache.gobblin.source.workunit.Extract) WorkUnit(org.apache.gobblin.source.workunit.WorkUnit)

Example 14 with Extract

Use of org.apache.gobblin.source.workunit.Extract in project incubator-gobblin by apache.

From the class StressTestingSource, method getWorkunits:

@Override
public List<WorkUnit> getWorkunits(SourceState state) {
    int numWorkUnits = state.getPropAsInt(NUM_WORK_UNITS_KEY, DEFAULT_NUM_WORK_UNITS);
    // A single APPEND_ONLY extract is shared by every work unit.
    Extract extract = new Extract(TableType.APPEND_ONLY, StressTestingSource.class.getPackage().getName(), StressTestingSource.class.getSimpleName());
    List<WorkUnit> wus = new ArrayList<>(numWorkUnits);
    for (int i = 1; i <= numWorkUnits; ++i) {
        WorkUnit wu = new WorkUnit(extract);
        wus.add(wu);
    }
    return wus;
}
Also used : ArrayList(java.util.ArrayList) Extract(org.apache.gobblin.source.workunit.Extract) WorkUnit(org.apache.gobblin.source.workunit.WorkUnit)

Example 15 with Extract

Use of org.apache.gobblin.source.workunit.Extract in project incubator-gobblin by apache.

From the class JobLauncherUtilsTest, method testDeleteStagingDataWithOutWriterFilePath:

@Test
public void testDeleteStagingDataWithOutWriterFilePath() throws IOException {
    FileSystem fs = FileSystem.getLocal(new Configuration());
    String branchName0 = "fork_0";
    String branchName1 = "fork_1";
    String namespace = "gobblin.test";
    String tableName = "test-table";
    Path rootDir = new Path("gobblin-test/job-launcher-utils-test");
    Path writerStagingDir0 = new Path(rootDir, "staging" + Path.SEPARATOR + branchName0);
    Path writerStagingDir1 = new Path(rootDir, "staging" + Path.SEPARATOR + branchName1);
    Path writerOutputDir0 = new Path(rootDir, "output" + Path.SEPARATOR + branchName0);
    Path writerOutputDir1 = new Path(rootDir, "output" + Path.SEPARATOR + branchName1);
    try {
        SourceState sourceState = new SourceState();
        WorkUnitState state = new WorkUnitState(WorkUnit.create(new Extract(sourceState, TableType.APPEND_ONLY, namespace, tableName)));
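        // Configure two fork branches, each with its own staging and output directory on the local file system.
        state.setProp(ConfigurationKeys.FORK_BRANCHES_KEY, "2");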
        state.setProp(ForkOperatorUtils.getPropertyNameForBranch(ConfigurationKeys.FORK_BRANCH_NAME_KEY, 2, 0), branchName0);
        state.setProp(ForkOperatorUtils.getPropertyNameForBranch(ConfigurationKeys.FORK_BRANCH_NAME_KEY, 2, 1), branchName1);
        state.setProp(ForkOperatorUtils.getPropertyNameForBranch(ConfigurationKeys.WRITER_FILE_SYSTEM_URI, 2, 0), ConfigurationKeys.LOCAL_FS_URI);
        state.setProp(ForkOperatorUtils.getPropertyNameForBranch(ConfigurationKeys.WRITER_FILE_SYSTEM_URI, 2, 1), ConfigurationKeys.LOCAL_FS_URI);
        state.setProp(ForkOperatorUtils.getPropertyNameForBranch(ConfigurationKeys.WRITER_STAGING_DIR, 2, 0), writerStagingDir0.toString());
        state.setProp(ForkOperatorUtils.getPropertyNameForBranch(ConfigurationKeys.WRITER_STAGING_DIR, 2, 1), writerStagingDir1.toString());
        state.setProp(ForkOperatorUtils.getPropertyNameForBranch(ConfigurationKeys.WRITER_OUTPUT_DIR, 2, 0), writerOutputDir0.toString());
        state.setProp(ForkOperatorUtils.getPropertyNameForBranch(ConfigurationKeys.WRITER_OUTPUT_DIR, 2, 1), writerOutputDir1.toString());
        // Create the per-branch staging and output paths, derived from the extract's output file path.
        Path writerStagingPath0 = new Path(writerStagingDir0, ForkOperatorUtils.getPathForBranch(state, state.getExtract().getOutputFilePath(), 2, 0));
        fs.mkdirs(writerStagingPath0);
        Path writerStagingPath1 = new Path(writerStagingDir1, ForkOperatorUtils.getPathForBranch(state, state.getExtract().getOutputFilePath(), 2, 1));
        fs.mkdirs(writerStagingPath1);
        Path writerOutputPath0 = new Path(writerOutputDir0, ForkOperatorUtils.getPathForBranch(state, state.getExtract().getOutputFilePath(), 2, 0));
        fs.mkdirs(writerOutputPath0);
        Path writerOutputPath1 = new Path(writerOutputDir1, ForkOperatorUtils.getPathForBranch(state, state.getExtract().getOutputFilePath(), 2, 1));
        fs.mkdirs(writerOutputPath1);
        // The test expects cleanup to remove the staging and output paths of both branches.
        JobLauncherUtils.cleanTaskStagingData(state, LoggerFactory.getLogger(JobLauncherUtilsTest.class));
        Assert.assertFalse(fs.exists(writerStagingPath0));
        Assert.assertFalse(fs.exists(writerStagingPath1));
        Assert.assertFalse(fs.exists(writerOutputPath0));
        Assert.assertFalse(fs.exists(writerOutputPath1));
    } finally {
        fs.delete(rootDir, true);
    }
}
Also used : Path(org.apache.hadoop.fs.Path) SourceState(org.apache.gobblin.configuration.SourceState) Configuration(org.apache.hadoop.conf.Configuration) WorkUnitState(org.apache.gobblin.configuration.WorkUnitState) FileSystem(org.apache.hadoop.fs.FileSystem) Extract(org.apache.gobblin.source.workunit.Extract) Test(org.testng.annotations.Test)

Aggregations

Extract (org.apache.gobblin.source.workunit.Extract) 29
WorkUnit (org.apache.gobblin.source.workunit.WorkUnit) 24
WorkUnitState (org.apache.gobblin.configuration.WorkUnitState) 11
SourceState (org.apache.gobblin.configuration.SourceState) 8
Test (org.testng.annotations.Test) 7
Path (org.apache.hadoop.fs.Path) 6
MultiWorkUnit (org.apache.gobblin.source.workunit.MultiWorkUnit) 4
IOException (java.io.IOException) 3
ArrayList (java.util.ArrayList) 3
Configuration (org.apache.hadoop.conf.Configuration) 3
Gson (com.google.gson.Gson) 2
JsonObject (com.google.gson.JsonObject) 2
Config (com.typesafe.config.Config) 2
InputStreamReader (java.io.InputStreamReader) 2
Type (java.lang.reflect.Type) 2
Map (java.util.Map) 2
State (org.apache.gobblin.configuration.State) 2
WatermarkInterval (org.apache.gobblin.source.extractor.WatermarkInterval) 2
LongWatermark (org.apache.gobblin.source.extractor.extract.LongWatermark) 2
TableType (org.apache.gobblin.source.workunit.Extract.TableType) 2