
Example 6 with ProgramRunInfo

use of io.cdap.cdap.runtime.spi.ProgramRunInfo in project cdap by caskdata.

the class ElasticMapReduceProvisionerTest method testClusterName.

@Test
public void testClusterName() {
    // test basic
    ProgramRunInfo programRunInfo = new ProgramRunInfo.Builder()
        .setNamespace("ns")
        .setApplication("app")
        .setVersion("1.0")
        .setProgramType("workflow")
        .setProgram("program")
        .setRun(UUID.randomUUID().toString())
        .build();
    Assert.assertEquals("cdap-app-" + programRunInfo.getRun(), ElasticMapReduceProvisioner.getClusterName(programRunInfo));
    // test lowercasing, stripping of invalid characters, and truncation
    programRunInfo = new ProgramRunInfo.Builder()
        .setNamespace("ns")
        .setApplication("My@Appl!cation")
        .setVersion("1.0")
        .setProgramType("workflow")
        .setProgram("program")
        .setRun(UUID.randomUUID().toString())
        .build();
    Assert.assertEquals("cdap-myapplcat-" + programRunInfo.getRun(), ElasticMapReduceProvisioner.getClusterName(programRunInfo));
}
Also used : ProgramRunInfo(io.cdap.cdap.runtime.spi.ProgramRunInfo) Test(org.junit.Test)
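The expected values above imply a sanitizer that lowercases the application name, strips everything outside [a-z0-9], and truncates the remainder so the full cdap-<app>-<run> name fits a fixed length cap. Below is a minimal sketch of that logic; the helper name and the 9-character app budget are inferred from the test expectations, not taken from the CDAP API:

```java
import java.util.Locale;

public final class NameSanitizer {

    // Hypothetical budget for the sanitized application portion. Nine characters
    // is what the expected value "cdap-myapplcat-<uuid>" implies; the real cap
    // used by the provisioner is not shown in this excerpt.
    private static final int MAX_APP_CHARS = 9;

    private NameSanitizer() { }

    // Lowercase, drop everything outside [a-z0-9], then truncate to the budget.
    static String sanitize(String name) {
        String cleaned = name.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]", "");
        return cleaned.length() <= MAX_APP_CHARS ? cleaned : cleaned.substring(0, MAX_APP_CHARS);
    }

    public static void main(String[] args) {
        // "My@Appl!cation" -> "myapplcation" -> "myapplcat" (first 9 chars)
        System.out.println(sanitize("My@Appl!cation"));
        // "app" is already clean and under the budget, so it passes through unchanged.
        System.out.println(sanitize("app"));
    }
}
```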

Example 7 with ProgramRunInfo

use of io.cdap.cdap.runtime.spi.ProgramRunInfo in project cdap by cdapio.

the class DataprocRuntimeJobManager method launch.

@Override
public void launch(RuntimeJobInfo runtimeJobInfo) throws Exception {
    String bucket = DataprocUtils.getBucketName(this.bucket);
    ProgramRunInfo runInfo = runtimeJobInfo.getProgramRunInfo();
    LOG.debug("Launching run {} with following configurations: cluster {}, project {}, region {}, bucket {}.", runInfo.getRun(), clusterName, projectId, region, bucket);
    // TODO: CDAP-16408 use fixed directory for caching twill, application, artifact jars
    File tempDir = Files.createTempDirectory("dataproc.launcher").toFile();
    // On the Dataproc bucket, the run root will be <bucket>/cdap-job/<runid>/. All files for this run
    // will be copied under that base directory.
    String runRootPath = getPath(DataprocUtils.CDAP_GCS_ROOT, runInfo.getRun());
    try {
        // step 1: build twill.jar and launcher.jar and add them to files to be copied to gcs
        List<LocalFile> localFiles = getRuntimeLocalFiles(runtimeJobInfo.getLocalizeFiles(), tempDir);
        // step 2: upload all the necessary files to gcs so that those files are available to dataproc job
        List<Future<LocalFile>> uploadFutures = new ArrayList<>();
        for (LocalFile fileToUpload : localFiles) {
            String targetFilePath = getPath(runRootPath, fileToUpload.getName());
            uploadFutures.add(
                provisionerContext.execute(() -> uploadFile(bucket, targetFilePath, fileToUpload))
                    .toCompletableFuture());
        }
        List<LocalFile> uploadedFiles = new ArrayList<>();
        for (Future<LocalFile> uploadFuture : uploadFutures) {
            uploadedFiles.add(uploadFuture.get());
        }
        // step 3: build the hadoop job request to be submitted to dataproc
        SubmitJobRequest request = getSubmitJobRequest(runtimeJobInfo, uploadedFiles);
        // step 4: submit hadoop job to dataproc
        try {
            Job job = getJobControllerClient().submitJob(request);
            LOG.debug("Successfully submitted hadoop job {} to cluster {}.", job.getReference().getJobId(), clusterName);
        } catch (AlreadyExistsException ex) {
            // the job id already exists, ignore the job.
            LOG.warn("The dataproc job {} already exists. Ignoring resubmission of the job.", request.getJob().getReference().getJobId());
        }
        DataprocUtils.emitMetric(provisionerContext, region, "provisioner.submitJob.response.count");
    } catch (Exception e) {
        // delete all uploaded gcs files in case of exception
        DataprocUtils.deleteGCSPath(getStorageClient(), bucket, runRootPath);
        DataprocUtils.emitMetric(provisionerContext, region, "provisioner.submitJob.response.count", e);
        throw new Exception(String.format("Error while launching job %s on cluster %s", getJobId(runInfo), clusterName), e);
    } finally {
        // delete local temp directory
        deleteDirectoryContents(tempDir);
    }
}
Also used : AlreadyExistsException(com.google.api.gax.rpc.AlreadyExistsException) ArrayList(java.util.ArrayList) SubmitJobRequest(com.google.cloud.dataproc.v1beta2.SubmitJobRequest) AlreadyExistsException(com.google.api.gax.rpc.AlreadyExistsException) IOException(java.io.IOException) ApiException(com.google.api.gax.rpc.ApiException) StorageException(com.google.cloud.storage.StorageException) DefaultLocalFile(org.apache.twill.internal.DefaultLocalFile) LocalFile(org.apache.twill.api.LocalFile) Future(java.util.concurrent.Future) HadoopJob(com.google.cloud.dataproc.v1beta2.HadoopJob) Job(com.google.cloud.dataproc.v1beta2.Job) DefaultLocalFile(org.apache.twill.internal.DefaultLocalFile) LocalFile(org.apache.twill.api.LocalFile) File(java.io.File) ProgramRunInfo(io.cdap.cdap.runtime.spi.ProgramRunInfo)
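Two patterns in launch are worth noting: step 2 fans the uploads out as futures and then joins on all of them, and cleanup is split between the catch block (remote GCS files) and the finally block (local temp directory). Here is a self-contained sketch of the submit-then-join pattern using a plain ExecutorService; the file names and the upload method are stand-ins for illustration, not the real provisionerContext.execute or uploadFile:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelUpload {
    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(4);
        List<String> files = List.of("twill.jar", "launcher.jar", "application.jar");
        try {
            // Submit one upload task per file; each future completes when its upload finishes.
            List<Future<String>> uploads = new ArrayList<>();
            for (String file : files) {
                uploads.add(executor.submit(() -> upload(file)));
            }
            // Join: block until every upload has completed, failing fast on the first error.
            for (Future<String> upload : uploads) {
                System.out.println("uploaded " + upload.get());
            }
        } finally {
            executor.shutdown();
        }
    }

    // Stand-in for the real GCS upload; just echoes a target path.
    private static String upload(String file) {
        return "gs://bucket/cdap-job/run-id/" + file;
    }
}
```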

Example 8 with ProgramRunInfo

use of io.cdap.cdap.runtime.spi.ProgramRunInfo in project cdap by cdapio.

the class DataprocProvisionerTest method testClusterReuseOnCreate.

@Test
public void testClusterReuseOnCreate() throws Exception {
    context.addProperty("accountKey", "testKey");
    context.addProperty(DataprocConf.PROJECT_ID_KEY, "testProject");
    context.addProperty("region", "testRegion");
    context.addProperty("idleTTL", "5");
    context.addProperty(DataprocConf.SKIP_DELETE, "true");
    context.setProfileName("testProfile");
    ProgramRunInfo programRunInfo = new ProgramRunInfo.Builder()
        .setNamespace("ns")
        .setApplication("app")
        .setVersion("1.0")
        .setProgramType("workflow")
        .setProgram("program")
        .setRun("runId")
        .build();
    context.setProgramRunInfo(programRunInfo);
    // A. Check with existing client, probably after a retry
    Mockito.when(dataprocClient.getClusters(
        null, Collections.singletonMap(AbstractDataprocProvisioner.LABEL_RUN_KEY, "cdap-app-runId")))
        .thenAnswer(i -> Stream.of(cluster));
    Mockito.when(cluster.getStatus()).thenReturn(ClusterStatus.RUNNING);
    Assert.assertEquals(cluster, provisioner.createCluster(context));
    // B. With preallocated cluster in "bad" state new allocation should happen
    DataprocConf conf = DataprocConf.create(provisioner.createContextProperties(context));
    Mockito.when(cluster.getStatus()).thenReturn(ClusterStatus.FAILED);
    Mockito.when(cluster2.getName()).thenReturn("cluster2");
    Mockito.when(dataprocClient.getClusters(
        Mockito.eq(ClusterStatus.RUNNING),
        Mockito.eq(ImmutableMap.of(
            AbstractDataprocProvisioner.LABEL_VERSON, "6_4",
            AbstractDataprocProvisioner.LABEL_REUSE_UNTIL, "*",
            AbstractDataprocProvisioner.LABEL_REUSE_KEY, conf.getClusterReuseKey(),
            AbstractDataprocProvisioner.LABEL_PROFILE, "testProfile")),
        Mockito.any()))
        .thenAnswer(i -> Stream.of(cluster2));
    Assert.assertEquals(cluster2, provisioner.createCluster(context));
    Mockito.verify(dataprocClient).updateClusterLabels(
        "cluster2",
        Collections.singletonMap(AbstractDataprocProvisioner.LABEL_RUN_KEY, "cdap-app-runId"),
        Collections.singleton(AbstractDataprocProvisioner.LABEL_REUSE_UNTIL));
}
Also used : ProgramRunInfo(io.cdap.cdap.runtime.spi.ProgramRunInfo) Test(org.junit.Test)
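The mocks sketch out a two-phase reuse flow: phase A looks for a cluster already labeled with this run's key (covering retries), and phase B claims an idle cluster whose version, reuse-key, and profile labels match, then swaps its labels so no other run can take it. Below is a toy, self-contained sketch of that control flow; the label keys and every method and type here are illustrative stand-ins, not the real DataprocProvisioner or DataprocClient API:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class ClusterReuseSketch {

    static class Cluster {
        final String name;
        String status; // e.g. "RUNNING" or "FAILED"
        final Map<String, String> labels = new HashMap<>();
        Cluster(String name, String status) { this.name = name; this.status = status; }
    }

    private final List<Cluster> clusters = new ArrayList<>();

    Cluster createCluster(String runKey, String reuseKey, String profile) {
        // Phase A: a retried run may find a cluster already labeled with its run key.
        Optional<Cluster> own = clusters.stream()
            .filter(c -> runKey.equals(c.labels.get("run-key")) && "RUNNING".equals(c.status))
            .findFirst();
        if (own.isPresent()) {
            return own.get();
        }
        // Phase B: claim an idle reusable cluster with matching reuse-key and profile labels,
        // then relabel it (set the run key, drop the reuse-until marker) to take it off the pool.
        Optional<Cluster> idle = clusters.stream()
            .filter(c -> "RUNNING".equals(c.status))
            .filter(c -> reuseKey.equals(c.labels.get("reuse-key")))
            .filter(c -> profile.equals(c.labels.get("profile")))
            .findFirst();
        if (idle.isPresent()) {
            Cluster c = idle.get();
            c.labels.remove("reuse-until");
            c.labels.put("run-key", runKey);
            return c;
        }
        throw new IllegalStateException("no reusable cluster; a real provisioner would create one here");
    }
}
```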

Example 9 with ProgramRunInfo

use of io.cdap.cdap.runtime.spi.ProgramRunInfo in project cdap by cdapio.

the class DataprocProvisionerTest method testRunKey.

@Test
public void testRunKey() throws Exception {
    // test basic
    ProgramRunInfo programRunInfo = new ProgramRunInfo.Builder()
        .setNamespace("ns")
        .setApplication("app")
        .setVersion("1.0")
        .setProgramType("workflow")
        .setProgram("program")
        .setRun(UUID.randomUUID().toString())
        .build();
    Assert.assertEquals("cdap-app-" + programRunInfo.getRun(), new DataprocProvisioner().getRunKey(new MockProvisionerContext(programRunInfo)));
    // test lowercasing, stripping of invalid characters, and truncation
    programRunInfo = new ProgramRunInfo.Builder()
        .setNamespace("ns")
        .setApplication("My@Appl!cation")
        .setVersion("1.0")
        .setProgramType("workflow")
        .setProgram("program")
        .setRun(UUID.randomUUID().toString())
        .build();
    Assert.assertEquals("cdap-myapplcat-" + programRunInfo.getRun(), new DataprocProvisioner().getRunKey(new MockProvisionerContext(programRunInfo)));
}
Also used : ProgramRunInfo(io.cdap.cdap.runtime.spi.ProgramRunInfo) Test(org.junit.Test)

Example 10 with ProgramRunInfo

use of io.cdap.cdap.runtime.spi.ProgramRunInfo in project cdap by cdapio.

the class DataprocRuntimeJobManagerTest method invalidJobNameTest.

@Test(expected = IllegalArgumentException.class)
public void invalidJobNameTest() {
    ProgramRunInfo runInfo = new ProgramRunInfo.Builder()
        .setNamespace("namespace")
        .setApplication("application$$$")
        .setVersion("1.0")
        .setProgramType("workflow")
        .setProgram("program")
        .setRun(UUID.randomUUID().toString())
        .build();
    // getJobId is expected to reject the "$$$" characters with an IllegalArgumentException,
    // so the assertion below should never be reached.
    String jobName = DataprocRuntimeJobManager.getJobId(runInfo);
    Assert.assertTrue(jobName.startsWith("namespace_application_program"));
}
Also used : ProgramRunInfo(io.cdap.cdap.runtime.spi.ProgramRunInfo) Test(org.junit.Test)
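Dataproc job IDs may only contain letters, digits, hyphens, and underscores, with a 100-character cap, which is presumably why the $$$ in the application name makes getJobId throw. A small sketch of that style of validation; buildJobId is a hypothetical stand-in, not the real DataprocRuntimeJobManager.getJobId:

```java
import java.util.regex.Pattern;

public class JobIdValidation {

    // Approximation of the Dataproc job ID rule: letters, digits,
    // hyphens, and underscores only, at most 100 characters.
    private static final Pattern JOB_ID = Pattern.compile("[a-zA-Z0-9_-]{1,100}");

    // Hypothetical stand-in that joins the name parts and validates the result.
    static String buildJobId(String namespace, String application, String program) {
        String jobId = String.join("_", namespace, application, program);
        if (!JOB_ID.matcher(jobId).matches()) {
            throw new IllegalArgumentException(
                "Invalid job id '" + jobId + "': only letters, digits, '-' and '_' are allowed");
        }
        return jobId;
    }

    public static void main(String[] args) {
        System.out.println(buildJobId("namespace", "application", "program")); // ok
        try {
            buildJobId("namespace", "application$$$", "program");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```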

Aggregations

ProgramRunInfo (io.cdap.cdap.runtime.spi.ProgramRunInfo) 22
Test (org.junit.Test) 14
AlreadyExistsException (com.google.api.gax.rpc.AlreadyExistsException) 4
ApiException (com.google.api.gax.rpc.ApiException) 4
HadoopJob (com.google.cloud.dataproc.v1beta2.HadoopJob) 4
Job (com.google.cloud.dataproc.v1beta2.Job) 4
SubmitJobRequest (com.google.cloud.dataproc.v1beta2.SubmitJobRequest) 4
StorageException (com.google.cloud.storage.StorageException) 4
VisibleForTesting (com.google.common.annotations.VisibleForTesting) 4
File (java.io.File) 4
IOException (java.io.IOException) 4
ArrayList (java.util.ArrayList) 4
CredentialsProvider (com.google.api.gax.core.CredentialsProvider) 2
FixedCredentialsProvider (com.google.api.gax.core.FixedCredentialsProvider) 2
StatusCode (com.google.api.gax.rpc.StatusCode) 2
GoogleCredentials (com.google.auth.oauth2.GoogleCredentials) 2
WriteChannel (com.google.cloud.WriteChannel) 2
GetJobRequest (com.google.cloud.dataproc.v1beta2.GetJobRequest) 2
JobControllerClient (com.google.cloud.dataproc.v1beta2.JobControllerClient) 2
JobControllerSettings (com.google.cloud.dataproc.v1beta2.JobControllerSettings) 2