
Example 6 with SparkLogLine

Use of com.microsoft.azure.hdinsight.spark.common.log.SparkLogLine in project azure-tools-for-java by Microsoft.

The class SparkBatchJob, method getSubmissionLog:

@Override
@NotNull
public Observable<SparkLogLine> getSubmissionLog() {
    if (getConnectUri() == null) {
        return Observable.error(new SparkJobNotConfiguredException(
                "Can't get the Spark job connection URI; please configure the Spark cluster "
                        + "to which the Spark job will be submitted."));
    }
    // These header lines are carried in every response; when no value follows them,
    // the line should not be forwarded to the console.
    final Set<String> ignoredEmptyLines = new HashSet<>(Arrays.asList("stdout:", "stderr:", "yarn diagnostics:"));
    return Observable.create(ob -> {
        try {
            final int maxLinesPerGet = 128;
            int linesGot;
            boolean isFetching = true;
            while (isFetching) {
                final int start = nextLivyLogOffset;
                final boolean isAppIdAllocated = !this.getSparkJobApplicationIdObservable()
                        .isEmpty()
                        .toBlocking()
                        .lastOrDefault(true);
                final String logUrl = String.format("%s/%d/log?from=%d&size=%d",
                        this.getConnectUri().toString(), batchId, start, maxLinesPerGet);
                final HttpResponse httpResponse = this.getSubmission().getHttpResponseViaGet(logUrl);
                final SparkJobLog sparkJobLog = ObjectConvertUtils
                        .convertJsonToObject(httpResponse.getMessage(), SparkJobLog.class)
                        .orElseThrow(() -> new UnknownServiceException(
                                "Bad spark log response: " + httpResponse.getMessage()));
                synchronized (livyLogOffsetLock) {
                    if (start != nextLivyLogOffset) {
                        // The offset is moved by another fetching thread, re-do it with new offset
                        continue;
                    }
                    // Forward the fetched lines to the subscriber
                    sparkJobLog.getLog().stream()
                            .filter(line -> !ignoredEmptyLines.contains(line.trim().toLowerCase()))
                            .forEach(line -> ob.onNext(new SparkLogLine(LIVY, Log, line)));
                    linesGot = sparkJobLog.getLog().size();
                    nextLivyLogOffset += linesGot;
                }
                // Retry interval
                if (linesGot == 0) {
                    isFetching = "starting".equals(this.getState()) && !isAppIdAllocated;
                    sleep(TimeUnit.SECONDS.toMillis(this.getDelaySeconds()));
                }
            }
        } catch (final IOException ex) {
            ob.onNext(new SparkLogLine(TOOL, Error, ex.getMessage()));
        } catch (final InterruptedException ignored) {
        } finally {
            ob.onCompleted();
        }
    });
}
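Because the fetch loop above blocks inside Observable.create between polls, consumers generally subscribe on a background scheduler. A minimal usage sketch, assuming an already-configured SparkBatchJob instance named job (the variable name is illustrative):

import rx.schedulers.Schedulers;

// Run the blocking fetch loop on an IO scheduler and print each SparkLogLine as it arrives.
job.getSubmissionLog()
        .subscribeOn(Schedulers.io())
        .subscribe(
                logLine -> System.out.println(logLine),
                err -> System.err.println("Log fetch failed: " + err),
                () -> System.out.println("Submission log completed"));

Note that IO failures inside the loop are mapped to TOOL/Error log lines rather than propagated, so the error callback above rarely fires.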

Example 7 with SparkLogLine

Use of com.microsoft.azure.hdinsight.spark.common.log.SparkLogLine in project azure-tools-for-java by Microsoft.

The class WebHDFSDeploy, method deploy:

@Override
public Observable<String> deploy(File src, Observer<SparkLogLine> logSubject) {
    // Three steps to upload via WebHDFS:
    //   1. PUT request to create the target directory
    //   2. PUT request to obtain the 307 redirect URI from the response
    //   3. PUT request to the redirect URI with the file content as the entity
    final URI dest = getUploadDir();
    final HttpPut req = new HttpPut(dest.toString());
    return http.request(req, null, this.createDirReqParams, null)
            .doOnNext(resp -> {
                if (resp.getStatusLine().getStatusCode() != 200) {
                    Exceptions.propagate(new UnknownServiceException(
                            "Cannot create a directory to save the artifact using the WebHDFS storage type"));
                }
            })
            .map(ignored -> new HttpPut(dest.resolve(src.getName()).toString()))
            .flatMap(put -> http.request(put, null, this.uploadReqParams, null))
            .map(resp -> resp.getFirstHeader("Location").getValue())
            .doOnNext(redirectedUri -> {
                if (StringUtils.isBlank(redirectedUri)) {
                    Exceptions.propagate(new UnknownServiceException(
                            "Cannot get a valid redirect URI using the WebHDFS storage type"));
                }
            })
            .map(HttpPut::new)
            .flatMap(put -> {
                try {
                    final InputStreamEntity reqEntity = new InputStreamEntity(
                            new FileInputStream(src), -1, ContentType.APPLICATION_OCTET_STREAM);
                    reqEntity.setChunked(true);
                    return http.request(put, new BufferedHttpEntity(reqEntity),
                            URLEncodedUtils.parse(put.getURI(), "UTF-8"), null);
                } catch (final IOException ex) {
                    throw new RuntimeException(new IllegalArgumentException(
                            "Cannot read the local artifact for uploading: " + ex));
                }
            })
            .map(ignored -> {
                try {
                    return getArtifactUploadedPath(dest.resolve(src.getName()).toString());
                } catch (final URISyntaxException ex) {
                    throw new RuntimeException(new IllegalArgumentException(
                            "Cannot build a valid artifact upload path: " + ex));
                }
            });
}
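The redirect handshake above follows the standard WebHDFS CREATE flow. Below is a self-contained sketch of the same three-step protocol using only java.net; the name-node URL and target path are hypothetical, and real clusters additionally require authentication:

import java.io.FileInputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public final class WebHdfsPutSketch {
    public static void put(java.io.File src) throws Exception {
        // Steps 1 + 2: PUT ?op=CREATE, but read the 307 Location header instead of following it.
        final URL createUrl = new URL("http://namenode.example.com:50070/webhdfs/v1/tmp/"
                + src.getName() + "?op=CREATE&overwrite=true");
        final HttpURLConnection first = (HttpURLConnection) createUrl.openConnection();
        first.setRequestMethod("PUT");
        first.setInstanceFollowRedirects(false);
        final String redirect = first.getHeaderField("Location");
        first.disconnect();

        // Step 3: PUT the file content to the redirected data-node URI.
        final HttpURLConnection second = (HttpURLConnection) new URL(redirect).openConnection();
        second.setRequestMethod("PUT");
        second.setDoOutput(true);
        second.setChunkedStreamingMode(0);  // stream the jar without buffering it fully in memory
        try (OutputStream out = second.getOutputStream();
             InputStream in = new FileInputStream(src)) {
            final byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        }
        System.out.println("WebHDFS upload status: " + second.getResponseCode());  // 201 on success
    }
}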

Example 8 with SparkLogLine

Use of com.microsoft.azure.hdinsight.spark.common.log.SparkLogLine in project azure-tools-for-java by Microsoft.

The class Session, method deploy:

/*
 * Observable APIs; all are IO operations.
 */
public Observable<Session> deploy() {
    final Deployable deployDelegate = getDeploy();
    if (deployDelegate == null) {
        return Observable.just(this);
    }
    return Observable.from(getArtifactsToDeploy())
            .doOnNext(artifactPath -> ctrlSubject.onNext(
                    new SparkLogLine(TOOL, Info, "Start uploading artifact " + artifactPath)))
            .flatMap(artifactPath -> deployDelegate.deploy(new File(artifactPath), ctrlSubject))
            .doOnNext(uri -> ctrlSubject.onNext(
                    new SparkLogLine(TOOL, Info, "Uploaded to " + uri)))
            .toList()
            .onErrorResumeNext(err -> {
                ctrlSubject.onNext(new SparkLogLine(TOOL, Warning,
                        "Failed to upload artifact: " + err));
                ctrlSubject.onNext(new SparkLogLine(TOOL, Warning,
                        "Trying to start the interactive session without those artifact dependencies..."));
                return Observable.empty();
            })
            .map(uploadedUris -> {
                this.createParameters.uploadedArtifactsUris.addAll(uploadedUris);
                return this;
            })
            .defaultIfEmpty(this);
}
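A usage sketch: subscribing to the control subject makes the Info/Warning lines emitted above visible. Both the session variable and the getCtrlSubject() accessor are assumptions for illustration:

// Hypothetical getCtrlSubject() accessor exposing the PublishSubject<SparkLogLine> used above.
session.getCtrlSubject()
        .subscribe(logLine -> System.out.println(logLine));

session.deploy()
        .subscribe(
                readySession -> System.out.println("Artifact deployment finished"),
                err -> System.err.println("Unexpected failure outside the upload chain: " + err));

Because onErrorResumeNext downgrades upload failures to Warning lines and completes the chain, the error callback here only fires for failures outside the upload itself.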

Example 9 with SparkLogLine

Use of com.microsoft.azure.hdinsight.spark.common.log.SparkLogLine in project azure-tools-for-java by Microsoft.

The class CosmosServerlessSparkBatchJob, method getSubmissionLog:

@NotNull
@Override
public Observable<SparkLogLine> getSubmissionLog() {
    final ImmutableSet<String> ignoredEmptyLines = ImmutableSet.of("stdout:", "stderr:", "yarn diagnostics:");
    final int GET_LIVY_URL_REPEAT_DELAY_MILLISECONDS = 3000;
    final int MAX_LOG_LINES_PER_REQUEST = 128;
    final int GET_LOG_REPEAT_DELAY_MILLISECONDS = 1000;
    // We need to call getSparkBatchJobRequest() repeatedly, because the "livyServerApi" field
    // is not always present in the response; it only appears after a while, and until then we
    // cannot read it.
    ctrlInfo("Trying to get the Livy URL...");
    return getSparkBatchJobRequest()
            .flatMap(batchResp -> getJobSchedulerState(batchResp) == null
                    ? Observable.error(new IOException("Failed to get the scheduler state of the job."))
                    : Observable.just(batchResp))
            .retryWhen(err -> err
                    .zipWith(Observable.range(1, getRetriesMax()), (n, i) -> i)
                    .delay(getDelaySeconds(), TimeUnit.SECONDS))
            .repeatWhen(ob -> ob.delay(GET_LIVY_URL_REPEAT_DELAY_MILLISECONDS, TimeUnit.MILLISECONDS))
            .takeUntil(batchResp -> isJobEnded(batchResp) || StringUtils.isNotEmpty(getLivyAPI(batchResp)))
            .filter(batchResp -> isJobEnded(batchResp) || StringUtils.isNotEmpty(getLivyAPI(batchResp)))
            .flatMap(job -> {
                if (isJobEnded(job)) {
                    final String jobState = getJobState(job);
                    final String schedulerState = getJobSchedulerState(job);
                    final String message = String.format(
                            "Job scheduler state: %s. Job running state: %s.", schedulerState, jobState);
                    return Observable.just(new SparkLogLine(TOOL, Info, message));
                }

                return Observable.just(job)
                        .doOnNext(batchResp -> {
                            ctrlInfo("Successfully got the Livy URL: " + batchResp.properties().livyServerAPI());
                            ctrlInfo("Trying to retrieve the Livy submission logs...");
                            // Testing shows that the batch ID is not provided until the job is in
                            // the running state. However, since only one Spark job runs on the
                            // cluster, the batch ID should always be 0.
                            setBatchId(0);
                        })
                        .map(batchResp -> batchResp.properties().livyServerAPI())
                        .flatMap(livyUrl -> Observable
                                .defer(() -> getSubmissionLogRequest(
                                        livyUrl, getBatchId(), getLogStartIndex(), MAX_LOG_LINES_PER_REQUEST))
                                .map(sparkJobLog -> Optional.ofNullable(sparkJobLog.getLog())
                                        .orElse(Collections.<String>emptyList()))
                                .doOnNext(logs -> setLogStartIndex(getLogStartIndex() + logs.size()))
                                .map(logs -> logs.stream()
                                        .filter(line -> !ignoredEmptyLines.contains(line.trim().toLowerCase()))
                                        .collect(Collectors.toList()))
                                .flatMap(logLines -> {
                                    if (!logLines.isEmpty()) {
                                        // New lines arrived; the job is still starting.
                                        return Observable.just(Triple.of(logLines,
                                                SparkBatchJobState.STARTING.toString(),
                                                SchedulerState.SCHEDULED.toString()));
                                    }
                                    // No new lines; re-query the job for its current states.
                                    return getSparkBatchJobRequest().map(batchResp -> Triple.of(logLines,
                                            getJobState(batchResp), getJobSchedulerState(batchResp)));
                                })
                                .onErrorResumeNext(errors -> getSparkBatchJobRequest()
                                        .delay(getDelaySeconds(), TimeUnit.SECONDS)
                                        .map(batchResp -> Triple.of(new ArrayList<>(),
                                                getJobState(batchResp), getJobSchedulerState(batchResp))))
                                .repeatWhen(ob -> ob.delay(GET_LOG_REPEAT_DELAY_MILLISECONDS, TimeUnit.MILLISECONDS))
                                .takeUntil(logAndStatesTriple -> {
                                    final String jobRunningState = logAndStatesTriple.getMiddle();
                                    final String jobSchedulerState = logAndStatesTriple.getRight();
                                    return (jobRunningState != null
                                            && !jobRunningState.equalsIgnoreCase(SparkBatchJobState.STARTING.toString()))
                                            || (jobSchedulerState != null
                                            && jobSchedulerState.equalsIgnoreCase(SchedulerState.ENDED.toString()));
                                })
                                .flatMap(logAndStatesTriple -> {
                                    final String jobRunningState = logAndStatesTriple.getMiddle();
                                    final String jobSchedulerState = logAndStatesTriple.getRight();
                                    if ((jobRunningState != null
                                            && !jobRunningState.equalsIgnoreCase(SparkBatchJobState.STARTING.toString()))
                                            || (jobSchedulerState != null
                                            && jobSchedulerState.equalsIgnoreCase(SchedulerState.ENDED.toString()))) {
                                        // The job has left the STARTING state, or the scheduler has ended it.
                                        final String message = String.format(
                                                "Job scheduler state: %s. Job running state: %s.",
                                                jobSchedulerState, jobRunningState);
                                        return Observable.just(new SparkLogLine(TOOL, Info, message));
                                    }
                                    return Observable.from(logAndStatesTriple.getLeft())
                                            .map(line -> new SparkLogLine(LIVY, Log, line));
                                }));
            });
}
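The chain above combines three Rx idioms: bounded retry via retryWhen plus zipWith(range), periodic polling via repeatWhen plus delay, and a stop condition applied with takeUntil followed by filter so only the terminal emission passes through. A distilled sketch of that skeleton, where fetchStatus() and isReady() are hypothetical stand-ins:

import java.util.concurrent.TimeUnit;
import rx.Observable;

// Poll fetchStatus() once a second, retry failures up to three times with a
// two-second backoff, and stop as soon as isReady() holds.
Observable<String> polled = Observable
        .defer(() -> fetchStatus())                              // one request per (re)subscription
        .retryWhen(errs -> errs
                .zipWith(Observable.range(1, 3), (err, i) -> i)  // bounded retry: complete after 3 failures
                .delay(2, TimeUnit.SECONDS))
        .repeatWhen(completions -> completions.delay(1, TimeUnit.SECONDS))
        .takeUntil(status -> isReady(status))                    // emit the terminal status, then complete
        .filter(status -> isReady(status));                      // drop the intermediate emissions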

Example 10 with SparkLogLine

Use of com.microsoft.azure.hdinsight.spark.common.log.SparkLogLine in project azure-tools-for-java by Microsoft.

The class JobUtils, method uploadFileToEmulator:

public static String uploadFileToEmulator(@NotNull IClusterDetail selectedClusterDetail,
                                          @NotNull String buildJarPath,
                                          @NotNull Observer<SparkLogLine> logSubject) throws Exception {
    logSubject.onNext(new SparkLogLine(TOOL, Info, String.format("Get target jar from %s.", buildJarPath)));
    final String uniqueFolderId = UUID.randomUUID().toString();
    final String folderPath = String.format("../opt/livy/SparkSubmission/%s", uniqueFolderId);
    return String.format("/opt/livy/SparkSubmission/%s/%s", uniqueFolderId, sftpFileToEmulator(buildJarPath, folderPath, selectedClusterDetail));
}
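A usage sketch, with an illustrative cluster detail and jar path; a PublishSubject serves as the Observer<SparkLogLine> so the progress message above lands on the console:

import rx.subjects.PublishSubject;

final PublishSubject<SparkLogLine> logSubject = PublishSubject.create();
logSubject.subscribe(line -> System.out.println(line));

// uploadFileToEmulator declares `throws Exception`, so the enclosing method must handle or propagate it.
final String uploadedPath = JobUtils.uploadFileToEmulator(
        emulatorClusterDetail,     // an IClusterDetail pointing at the local emulator (assumed)
        "/path/to/artifact.jar",   // the jar produced by the local build (illustrative)
        logSubject);
System.out.println("Jar available on the emulator at " + uploadedPath);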
