
Example 1 with JobExecutionPlan

Use of org.apache.gobblin.service.modules.spec.JobExecutionPlan in project incubator-gobblin by apache.

From class MultiHopFlowCompilerTest, method testCompileCombinedDatasetFlow:

@Test(dependsOnMethods = "testCompileMultiDatasetFlow")
public void testCompileCombinedDatasetFlow() throws Exception {
    FlowSpec spec = createFlowSpec("flow/flow4.conf", "HDFS-1", "HDFS-3", true, false);
    Dag<JobExecutionPlan> dag = specCompiler.compileFlow(spec);
    // Should be 2 jobs, each containing 3 datasets
    Assert.assertEquals(dag.getNodes().size(), 2);
    Assert.assertEquals(dag.getEndNodes().size(), 1);
    Assert.assertEquals(dag.getStartNodes().size(), 1);
    String copyJobName = Joiner.on(JobExecutionPlan.Factory.JOB_NAME_COMPONENT_SEPARATION_CHAR).join("testFlowGroup", "testFlowName", "Distcp", "HDFS-1", "HDFS-3", "hdfsToHdfs");
    Config jobConfig = dag.getStartNodes().get(0).getValue().getJobSpec().getConfig();
    String jobName = jobConfig.getString(ConfigurationKeys.JOB_NAME_KEY);
    Assert.assertTrue(jobName.startsWith(copyJobName));
    Assert.assertTrue(jobConfig.getString(ConfigurableGlobDatasetFinder.DATASET_FINDER_PATTERN_KEY).endsWith("{dataset0,dataset1,dataset2}"));
    String retentionJobName = Joiner.on(JobExecutionPlan.Factory.JOB_NAME_COMPONENT_SEPARATION_CHAR).join("testFlowGroup", "testFlowName", "SnapshotRetention", "HDFS-3", "HDFS-3", "hdfsRetention");
    Config jobConfig2 = dag.getEndNodes().get(0).getValue().getJobSpec().getConfig();
    String jobName2 = jobConfig2.getString(ConfigurationKeys.JOB_NAME_KEY);
    Assert.assertTrue(jobName2.startsWith(retentionJobName));
    Assert.assertTrue(jobConfig2.getString(ConfigurableGlobDatasetFinder.DATASET_FINDER_PATTERN_KEY).endsWith("{dataset0,dataset1,dataset2}"));
}
Also used : JobExecutionPlan(org.apache.gobblin.service.modules.spec.JobExecutionPlan) AzkabanProjectConfig(org.apache.gobblin.service.modules.orchestration.AzkabanProjectConfig) Config(com.typesafe.config.Config) FlowSpec(org.apache.gobblin.runtime.api.FlowSpec) Test(org.testng.annotations.Test)
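
The two assertion pairs in this test repeat one pattern: build the expected name prefix with Joiner, then check the resolved job config. Below is a minimal sketch of a shared helper, using only the accessors visible in these examples; the helper name assertJobNamePrefix is hypothetical, not part of the Gobblin test.

import com.google.common.base.Joiner;
import com.typesafe.config.Config;
import org.apache.gobblin.configuration.ConfigurationKeys;
import org.apache.gobblin.service.modules.flowgraph.Dag.DagNode;
import org.apache.gobblin.service.modules.spec.JobExecutionPlan;
import org.testng.Assert;

// Hypothetical helper: asserts that a DAG node's resolved job name starts with
// the prefix built from flow group, flow name, template, and edge components.
private static void assertJobNamePrefix(DagNode<JobExecutionPlan> node, String... components) {
    String expectedPrefix = Joiner.on(JobExecutionPlan.Factory.JOB_NAME_COMPONENT_SEPARATION_CHAR).join(components);
    Config jobConfig = node.getValue().getJobSpec().getConfig();
    Assert.assertTrue(jobConfig.getString(ConfigurationKeys.JOB_NAME_KEY).startsWith(expectedPrefix));
}

With it, the first check above collapses to assertJobNamePrefix(dag.getStartNodes().get(0), "testFlowGroup", "testFlowName", "Distcp", "HDFS-1", "HDFS-3", "hdfsToHdfs").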

Example 2 with JobExecutionPlan

Use of org.apache.gobblin.service.modules.spec.JobExecutionPlan in project incubator-gobblin by apache.

From class MultiHopFlowCompilerTest, method testMissingSourceNodeError:

@Test(dependsOnMethods = "testUnresolvedFlow")
public void testMissingSourceNodeError() throws Exception {
    FlowSpec spec = createFlowSpec("flow/flow5.conf", "HDFS-NULL", "HDFS-3", false, false);
    Dag<JobExecutionPlan> dag = specCompiler.compileFlow(spec);
    Assert.assertNull(dag);
    Assert.assertEquals(spec.getCompilationErrors().size(), 1);
    Assert.assertTrue(spec.getCompilationErrors().stream().anyMatch(s -> s.errorMessage.contains("Flowgraph does not have a node with id")));
}
Also used : JobExecutionPlan(org.apache.gobblin.service.modules.spec.JobExecutionPlan) FlowSpec(org.apache.gobblin.runtime.api.FlowSpec) Test(org.testng.annotations.Test)
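
The bare anyMatch in the original source discards its result, so the message check above is wrapped in Assert.assertTrue. A hedged sketch of that check as a reusable helper follows; the name assertCompilationError is hypothetical, and only the FlowSpec accessors seen in these examples are assumed.

import org.apache.gobblin.runtime.api.FlowSpec;
import org.testng.Assert;

// Hypothetical helper: fails unless at least one compilation error message
// contains the expected fragment.
private static void assertCompilationError(FlowSpec spec, String expectedFragment) {
    Assert.assertTrue(
        spec.getCompilationErrors().stream().anyMatch(error -> error.errorMessage.contains(expectedFragment)),
        "No compilation error containing: " + expectedFragment);
}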

Example 3 with JobExecutionPlan

Use of org.apache.gobblin.service.modules.spec.JobExecutionPlan in project incubator-gobblin by apache.

From class MultiHopFlowCompilerTest, method testUnresolvedFlow:

@Test(dependsOnMethods = "testCompileCombinedDatasetFlow")
public void testUnresolvedFlow() throws Exception {
    FlowSpec spec = createFlowSpec("flow/flow5.conf", "HDFS-1", "HDFS-3", false, false);
    Dag<JobExecutionPlan> dag = specCompiler.compileFlow(spec);
    Assert.assertNull(dag);
    Assert.assertEquals(spec.getCompilationErrors().stream().map(c -> c.errorMessage).collect(Collectors.toSet()).size(), 1);
    Assert.assertTrue(spec.getCompilationErrors().stream().anyMatch(s -> s.errorMessage.contains(AzkabanProjectConfig.USER_TO_PROXY)));
}
Also used : JobExecutionPlan(org.apache.gobblin.service.modules.spec.JobExecutionPlan) AzkabanProjectConfig(org.apache.gobblin.service.modules.orchestration.AzkabanProjectConfig) FlowSpec(org.apache.gobblin.runtime.api.FlowSpec) Test(org.testng.annotations.Test)
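
The Collectors.toSet() call in this test deduplicates error messages before counting, presumably because the compiler can record the same unresolved-property error along more than one candidate path. A short sketch of the same deduplication, useful when debugging a failing compilation (the printing is illustrative, not part of the test):

import java.util.Set;
import java.util.stream.Collectors;

// Collect distinct error messages so repeated per-path errors count once.
Set<String> distinctErrors = spec.getCompilationErrors().stream()
    .map(error -> error.errorMessage)
    .collect(Collectors.toSet());
distinctErrors.forEach(System.out::println);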

Example 4 with JobExecutionPlan

Use of org.apache.gobblin.service.modules.spec.JobExecutionPlan in project incubator-gobblin by apache.

From class MultiHopFlowCompilerTest, method testCompileFlowAfterFirstEdgeDeletion:

@Test(dependsOnMethods = "testCompileFlowWithRetention")
public void testCompileFlowAfterFirstEdgeDeletion() throws URISyntaxException, IOException {
    // Delete the self edge on HDFS-1 that performs convert-to-json-and-encrypt.
    this.flowGraph.deleteFlowEdge("HDFS-1_HDFS-1_hdfsConvertToJsonAndEncrypt");
    FlowSpec spec = createFlowSpec("flow/flow1.conf", "LocalFS-1", "ADLS-1", false, false);
    Dag<JobExecutionPlan> jobDag = this.specCompiler.compileFlow(spec);
    Assert.assertEquals(jobDag.getNodes().size(), 4);
    Assert.assertEquals(jobDag.getStartNodes().size(), 1);
    Assert.assertEquals(jobDag.getEndNodes().size(), 1);
    // Get the 1st hop - Distcp from "LocalFS-1" to "HDFS-2"
    DagNode<JobExecutionPlan> startNode = jobDag.getStartNodes().get(0);
    JobExecutionPlan jobExecutionPlan = startNode.getValue();
    JobSpec jobSpec = jobExecutionPlan.getJobSpec();
    // Ensure the resolved job config for the first hop has the correct substitutions.
    Config jobConfig = jobSpec.getConfig();
    String flowGroup = "testFlowGroup";
    String flowName = "testFlowName";
    String expectedJobName1 = Joiner.on(JobExecutionPlan.Factory.JOB_NAME_COMPONENT_SEPARATION_CHAR).join(flowGroup, flowName, "Distcp", "LocalFS-1", "HDFS-2", "localToHdfs");
    String jobName1 = jobConfig.getString(ConfigurationKeys.JOB_NAME_KEY);
    Assert.assertTrue(jobName1.startsWith(expectedJobName1));
    String from = jobConfig.getString("from");
    String to = jobConfig.getString("to");
    Assert.assertEquals(from, "/data/out/testTeam/testDataset");
    Assert.assertEquals(to, "/data/out/testTeam/testDataset");
    String sourceFsUri = jobConfig.getString("fs.uri");
    Assert.assertEquals(sourceFsUri, "file:///");
    Assert.assertEquals(jobConfig.getString("source.filebased.fs.uri"), sourceFsUri);
    Assert.assertEquals(jobConfig.getString("state.store.fs.uri"), sourceFsUri);
    String targetFsUri = jobConfig.getString("target.filebased.fs.uri");
    Assert.assertEquals(targetFsUri, "hdfs://hadoopnn02.grid.linkedin.com:8888/");
    Assert.assertEquals(jobConfig.getString("writer.fs.uri"), targetFsUri);
    Assert.assertEquals(new Path(jobConfig.getString("gobblin.dataset.pattern")), new Path(from));
    Assert.assertEquals(jobConfig.getString("data.publisher.final.dir"), to);
    Assert.assertEquals(jobConfig.getString("type"), "java");
    Assert.assertEquals(jobConfig.getString("job.class"), "org.apache.gobblin.runtime.local.LocalJobLauncher");
    Assert.assertEquals(jobConfig.getString("launcher.type"), "LOCAL");
    // Ensure the spec executor has the correct configurations
    SpecExecutor specExecutor = jobExecutionPlan.getSpecExecutor();
    Assert.assertEquals(specExecutor.getUri().toString(), "fs:///");
    Assert.assertEquals(specExecutor.getClass().getCanonicalName(), "org.apache.gobblin.runtime.spec_executorInstance.InMemorySpecExecutor");
    // Get the 2nd hop - "HDFS-2 to HDFS-2 : convert avro to json and encrypt"
    Assert.assertEquals(jobDag.getChildren(startNode).size(), 1);
    DagNode<JobExecutionPlan> secondHopNode = jobDag.getChildren(startNode).get(0);
    jobExecutionPlan = secondHopNode.getValue();
    jobConfig = jobExecutionPlan.getJobSpec().getConfig();
    String expectedJobName2 = Joiner.on(JobExecutionPlan.Factory.JOB_NAME_COMPONENT_SEPARATION_CHAR).join(flowGroup, flowName, "ConvertToJsonAndEncrypt", "HDFS-2", "HDFS-2", "hdfsConvertToJsonAndEncrypt");
    String jobName2 = jobConfig.getString(ConfigurationKeys.JOB_NAME_KEY);
    Assert.assertTrue(jobName2.startsWith(expectedJobName2));
    Assert.assertEquals(jobConfig.getString(ConfigurationKeys.JOB_DEPENDENCIES), jobName1);
    from = jobConfig.getString("from");
    to = jobConfig.getString("to");
    Assert.assertEquals(from, "/data/out/testTeam/testDataset");
    Assert.assertEquals(to, "/data/encrypted/testTeam/testDataset");
    Assert.assertEquals(jobConfig.getString("source.filebased.data.directory"), from);
    Assert.assertEquals(jobConfig.getString("data.publisher.final.dir"), to);
    specExecutor = jobExecutionPlan.getSpecExecutor();
    Assert.assertEquals(specExecutor.getUri().toString(), "https://azkaban02.gobblin.net:8443");
    Assert.assertEquals(specExecutor.getClass().getCanonicalName(), "org.apache.gobblin.service.modules.flow.MultiHopFlowCompilerTest.TestAzkabanSpecExecutor");
    // Get the 3rd hop - "Distcp HDFS-2 to HDFS-4"
    Assert.assertEquals(jobDag.getChildren(secondHopNode).size(), 1);
    DagNode<JobExecutionPlan> thirdHopNode = jobDag.getChildren(secondHopNode).get(0);
    jobExecutionPlan = thirdHopNode.getValue();
    jobConfig = jobExecutionPlan.getJobSpec().getConfig();
    String expectedJobName3 = Joiner.on(JobExecutionPlan.Factory.JOB_NAME_COMPONENT_SEPARATION_CHAR).join(flowGroup, flowName, "Distcp", "HDFS-2", "HDFS-4", "hdfsToHdfs");
    String jobName3 = jobConfig.getString(ConfigurationKeys.JOB_NAME_KEY);
    Assert.assertTrue(jobName3.startsWith(expectedJobName3));
    Assert.assertEquals(jobConfig.getString(ConfigurationKeys.JOB_DEPENDENCIES), jobName2);
    from = jobConfig.getString("from");
    to = jobConfig.getString("to");
    Assert.assertEquals(from, "/data/encrypted/testTeam/testDataset");
    Assert.assertEquals(to, "/data/encrypted/testTeam/testDataset");
    Assert.assertEquals(jobConfig.getString("source.filebased.fs.uri"), "hdfs://hadoopnn02.grid.linkedin.com:8888/");
    Assert.assertEquals(jobConfig.getString("target.filebased.fs.uri"), "hdfs://hadoopnn04.grid.linkedin.com:8888/");
    Assert.assertEquals(jobConfig.getString("type"), "hadoopJava");
    Assert.assertEquals(jobConfig.getString("job.class"), "org.apache.gobblin.azkaban.AzkabanJobLauncher");
    Assert.assertEquals(jobConfig.getString("launcher.type"), "MAPREDUCE");
    // Ensure the spec executor has the correct configurations
    specExecutor = jobExecutionPlan.getSpecExecutor();
    Assert.assertEquals(specExecutor.getUri().toString(), "https://azkaban02.gobblin.net:8443");
    Assert.assertEquals(specExecutor.getClass().getCanonicalName(), "org.apache.gobblin.service.modules.flow.MultiHopFlowCompilerTest.TestAzkabanSpecExecutor");
    // Get the 4th hop - "Distcp from HDFS-4 to ADLS-1"
    Assert.assertEquals(jobDag.getChildren(thirdHopNode).size(), 1);
    DagNode<JobExecutionPlan> fourthHopNode = jobDag.getChildren(thirdHopNode).get(0);
    jobExecutionPlan = fourthHopNode.getValue();
    jobConfig = jobExecutionPlan.getJobSpec().getConfig();
    String expectedJobName4 = Joiner.on(JobExecutionPlan.Factory.JOB_NAME_COMPONENT_SEPARATION_CHAR).join(flowGroup, flowName, "DistcpToADL", "HDFS-4", "ADLS-1", "hdfsToAdl");
    String jobName4 = jobConfig.getString(ConfigurationKeys.JOB_NAME_KEY);
    Assert.assertTrue(jobName4.startsWith(expectedJobName4));
    Assert.assertEquals(jobConfig.getString(ConfigurationKeys.JOB_DEPENDENCIES), jobName3);
    from = jobConfig.getString("from");
    to = jobConfig.getString("to");
    Assert.assertEquals(from, "/data/encrypted/testTeam/testDataset");
    Assert.assertEquals(to, "/data/encrypted/testTeam/testDataset");
    Assert.assertEquals(jobConfig.getString("source.filebased.fs.uri"), "hdfs://hadoopnn04.grid.linkedin.com:8888/");
    Assert.assertEquals(jobConfig.getString("target.filebased.fs.uri"), "adl://azuredatalakestore.net/");
    Assert.assertEquals(jobConfig.getString("type"), "hadoopJava");
    Assert.assertEquals(jobConfig.getString("job.class"), "org.apache.gobblin.azkaban.AzkabanJobLauncher");
    Assert.assertEquals(jobConfig.getString("launcher.type"), "MAPREDUCE");
    Assert.assertEquals(jobConfig.getString("dfs.adls.oauth2.client.id"), "1234");
    Assert.assertEquals(jobConfig.getString("writer.encrypted.dfs.adls.oauth2.credential"), "credential");
    Assert.assertEquals(jobConfig.getString("encrypt.key.loc"), "/user/testUser/master.password");
    // Ensure the spec executor has the correct configurations
    specExecutor = jobExecutionPlan.getSpecExecutor();
    Assert.assertEquals(specExecutor.getUri().toString(), "https://azkaban04.gobblin.net:8443");
    Assert.assertEquals(specExecutor.getClass().getCanonicalName(), "org.apache.gobblin.service.modules.flow.MultiHopFlowCompilerTest.TestAzkabanSpecExecutor");
    // Ensure the fourth hop is the last
    Assert.assertEquals(jobDag.getEndNodes().get(0), fourthHopNode);
}
Also used : Path(org.apache.hadoop.fs.Path) JobExecutionPlan(org.apache.gobblin.service.modules.spec.JobExecutionPlan) AzkabanProjectConfig(org.apache.gobblin.service.modules.orchestration.AzkabanProjectConfig) Config(com.typesafe.config.Config) FlowSpec(org.apache.gobblin.runtime.api.FlowSpec) SpecExecutor(org.apache.gobblin.runtime.api.SpecExecutor) AbstractSpecExecutor(org.apache.gobblin.runtime.spec_executorInstance.AbstractSpecExecutor) JobSpec(org.apache.gobblin.runtime.api.JobSpec) Test(org.testng.annotations.Test)
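
This test walks a strictly linear four-hop DAG by repeatedly taking the single child of each node. A hedged sketch of that traversal as a helper, assuming getChildren returns an empty list for end nodes; the name collectLinearJobNames is hypothetical.

import java.util.ArrayList;
import java.util.List;
import org.apache.gobblin.configuration.ConfigurationKeys;
import org.apache.gobblin.service.modules.flowgraph.Dag;
import org.apache.gobblin.service.modules.flowgraph.Dag.DagNode;
import org.apache.gobblin.service.modules.spec.JobExecutionPlan;
import org.testng.Assert;

// Hypothetical helper: walks a linear DAG from its single start node and
// returns the resolved job names in hop order.
private static List<String> collectLinearJobNames(Dag<JobExecutionPlan> dag) {
    List<String> jobNames = new ArrayList<>();
    Assert.assertEquals(dag.getStartNodes().size(), 1);
    DagNode<JobExecutionPlan> node = dag.getStartNodes().get(0);
    while (node != null) {
        jobNames.add(node.getValue().getJobSpec().getConfig().getString(ConfigurationKeys.JOB_NAME_KEY));
        List<DagNode<JobExecutionPlan>> children = dag.getChildren(node);
        Assert.assertTrue(children.size() <= 1, "DAG is not linear");
        node = children.isEmpty() ? null : children.get(0);
    }
    return jobNames;
}

Applied to the DAG above, the returned list would hold the four job names in hop order: the localToHdfs Distcp, the ConvertToJsonAndEncrypt job, the hdfsToHdfs Distcp, and the hdfsToAdl DistcpToADL job.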

Example 5 with JobExecutionPlan

Use of org.apache.gobblin.service.modules.spec.JobExecutionPlan in project incubator-gobblin by apache.

From class MultiHopFlowCompilerTest, method testCompileFlowAfterSecondEdgeDeletion:

@Test(dependsOnMethods = "testCompileFlowAfterFirstEdgeDeletion")
public void testCompileFlowAfterSecondEdgeDeletion() throws URISyntaxException, IOException {
    // Delete the self edge on HDFS-2 that performs convert-to-json-and-encrypt.
    this.flowGraph.deleteFlowEdge("HDFS-2_HDFS-2_hdfsConvertToJsonAndEncrypt");
    FlowSpec spec = createFlowSpec("flow/flow1.conf", "LocalFS-1", "ADLS-1", false, false);
    Dag<JobExecutionPlan> jobDag = this.specCompiler.compileFlow(spec);
    // Ensure no path to destination.
    Assert.assertNull(jobDag);
}
Also used : JobExecutionPlan(org.apache.gobblin.service.modules.spec.JobExecutionPlan) FlowSpec(org.apache.gobblin.runtime.api.FlowSpec) Test(org.testng.annotations.Test)
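
After the only remaining conversion edge is deleted, compileFlow returns null. Mirroring Examples 2 and 3, one would expect a compilation error to be recorded as well; the follow-up check below is an assumption, since the original test stops at the null assertion.

// Assumed follow-up: compileFlow should record at least one compilation error
// when no path to the destination exists, as it does in Examples 2 and 3.
Assert.assertFalse(spec.getCompilationErrors().isEmpty());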

Aggregations

JobExecutionPlan (org.apache.gobblin.service.modules.spec.JobExecutionPlan): 39
Config (com.typesafe.config.Config): 22
FlowSpec (org.apache.gobblin.runtime.api.FlowSpec): 21
Test (org.testng.annotations.Test): 21
JobSpec (org.apache.gobblin.runtime.api.JobSpec): 15
ArrayList (java.util.ArrayList): 12
Dag (org.apache.gobblin.service.modules.flowgraph.Dag): 12
SpecExecutor (org.apache.gobblin.runtime.api.SpecExecutor): 10
AzkabanProjectConfig (org.apache.gobblin.service.modules.orchestration.AzkabanProjectConfig): 8
JobExecutionPlanDagFactory (org.apache.gobblin.service.modules.spec.JobExecutionPlanDagFactory): 8
URI (java.net.URI): 7
Spec (org.apache.gobblin.runtime.api.Spec): 6
TopologySpec (org.apache.gobblin.runtime.api.TopologySpec): 6
IOException (java.io.IOException): 5
DagNode (org.apache.gobblin.service.modules.flowgraph.Dag.DagNode): 5
File (java.io.File): 4
HashSet (java.util.HashSet): 4
Path (org.apache.hadoop.fs.Path): 4
Joiner (com.google.common.base.Joiner): 3
Optional (com.google.common.base.Optional): 3