
Example 6 with JobContext

Use of org.apache.samza.context.JobContext in project samza by apache.

The class SamzaSqlApplication, method describe.

@Override
public void describe(StreamApplicationDescriptor appDescriptor) {
    try {
        // TODO: Introduce an API to return a dsl string containing one or more sql statements.
        List<String> dslStmts = SamzaSqlDslConverter.fetchSqlFromConfig(appDescriptor.getConfig());
        Map<Integer, TranslatorContext> translatorContextMap = new HashMap<>();
        // 1. Get Calcite plan
        List<String> inputSystemStreams = new LinkedList<>();
        List<String> outputSystemStreams = new LinkedList<>();
        Collection<RelRoot> relRoots = SamzaSqlApplicationConfig.populateSystemStreamsAndGetRelRoots(dslStmts, appDescriptor.getConfig(), inputSystemStreams, outputSystemStreams);
        // 2. Populate configs
        SamzaSqlApplicationConfig sqlConfig = new SamzaSqlApplicationConfig(appDescriptor.getConfig(), inputSystemStreams, outputSystemStreams);
        // 3. Translate Calcite plan to Samza stream operators
        QueryTranslator queryTranslator = new QueryTranslator(appDescriptor, sqlConfig);
        SamzaSqlExecutionContext executionContext = new SamzaSqlExecutionContext(sqlConfig);
        // QueryId is the index of the query when there are multiple query statements; it always starts at 0.
        int queryId = 0;
        for (RelRoot relRoot : relRoots) {
            LOG.info("Translating relRoot {} to samza stream graph with queryId {}", relRoot, queryId);
            TranslatorContext translatorContext = new TranslatorContext(appDescriptor, relRoot, executionContext);
            translatorContextMap.put(queryId, translatorContext);
            queryTranslator.translate(relRoot, sqlConfig.getOutputSystemStreams().get(queryId), translatorContext, queryId);
            queryId++;
        }
        // 4. Set all translator contexts
        /*
         * TODO: When serialization of ApplicationDescriptor is actually needed, this code will need to be updated,
         * since translatorContext is not Serializable. Currently a new ApplicationDescriptor instance is created in
         * each container, so the translatorContext is simply recreated per container and never has to be serialized.
         */
        appDescriptor.withApplicationTaskContextFactory(new ApplicationTaskContextFactory<SamzaSqlApplicationContext>() {

            @Override
            public SamzaSqlApplicationContext create(ExternalContext externalContext, JobContext jobContext, ContainerContext containerContext, TaskContext taskContext, ApplicationContainerContext applicationContainerContext) {
                return new SamzaSqlApplicationContext(translatorContextMap);
            }
        });
    } catch (RuntimeException e) {
        LOG.error("SamzaSqlApplication threw exception.", e);
        throw e;
    }
}
Also used : TaskContext(org.apache.samza.context.TaskContext) HashMap(java.util.HashMap) RelRoot(org.apache.calcite.rel.RelRoot) LinkedList(java.util.LinkedList) ContainerContext(org.apache.samza.context.ContainerContext) ApplicationContainerContext(org.apache.samza.context.ApplicationContainerContext) TranslatorContext(org.apache.samza.sql.translator.TranslatorContext) ExternalContext(org.apache.samza.context.ExternalContext) SamzaSqlExecutionContext(org.apache.samza.sql.data.SamzaSqlExecutionContext) JobContext(org.apache.samza.context.JobContext) QueryTranslator(org.apache.samza.sql.translator.QueryTranslator)
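For reference, the JobContext handed to create() above carries job-level information such as the job name and the resolved Config. Below is a minimal sketch, assuming a hypothetical MyAppTaskContext class and an illustrative config key (neither is part of Samza), of how a factory can read values off the JobContext it receives:

import org.apache.samza.config.Config;
import org.apache.samza.context.ApplicationContainerContext;
import org.apache.samza.context.ApplicationTaskContext;
import org.apache.samza.context.ApplicationTaskContextFactory;
import org.apache.samza.context.ContainerContext;
import org.apache.samza.context.ExternalContext;
import org.apache.samza.context.JobContext;
import org.apache.samza.context.TaskContext;

// Hypothetical per-task context holding values derived from the JobContext.
class MyAppTaskContext implements ApplicationTaskContext {
    private final String jobName;
    private final int batchSize;

    MyAppTaskContext(String jobName, int batchSize) {
        this.jobName = jobName;
        this.batchSize = batchSize;
    }

    int getBatchSize() {
        return batchSize;
    }

    @Override
    public void start() { }

    @Override
    public void stop() { }
}

// Hypothetical factory; mirrors the anonymous factory registered in describe() above.
class MyAppTaskContextFactory implements ApplicationTaskContextFactory<MyAppTaskContext> {
    @Override
    public MyAppTaskContext create(ExternalContext externalContext, JobContext jobContext,
            ContainerContext containerContext, TaskContext taskContext,
            ApplicationContainerContext applicationContainerContext) {
        // JobContext exposes job-level metadata and the resolved configuration.
        Config config = jobContext.getConfig();
        String jobName = jobContext.getJobName();
        // "my.app.batch.size" is an illustrative application-defined key, not a Samza config.
        int batchSize = config.getInt("my.app.batch.size", 100);
        return new MyAppTaskContext(jobName, batchSize);
    }
}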

Example 7 with JobContext

Use of org.apache.samza.context.JobContext in project samza by apache.

The class ContainerStorageManager, method restoreStores.

// Restoration of all stores, in parallel across tasks
private void restoreStores() throws InterruptedException {
    LOG.info("Store Restore started");
    Set<TaskName> activeTasks = getTasks(containerModel, TaskMode.Active).keySet();
    // TODO HIGH dchen verify davinci lifecycle
    // Find all non-side input stores
    Set<String> nonSideInputStoreNames = storageEngineFactories.keySet().stream().filter(storeName -> !sideInputStoreNames.contains(storeName)).collect(Collectors.toSet());
    // Obtain the checkpoints for each task
    Map<TaskName, Map<String, TaskRestoreManager>> taskRestoreManagers = new HashMap<>();
    Map<TaskName, Checkpoint> taskCheckpoints = new HashMap<>();
    containerModel.getTasks().forEach((taskName, taskModel) -> {
        Checkpoint taskCheckpoint = null;
        if (checkpointManager != null && activeTasks.contains(taskName)) {
            // only pass in checkpoints for active tasks
            taskCheckpoint = checkpointManager.readLastCheckpoint(taskName);
            LOG.info("Obtained checkpoint: {} for state restore for taskName: {}", taskCheckpoint, taskName);
        }
        taskCheckpoints.put(taskName, taskCheckpoint);
        Map<String, Set<String>> backendFactoryStoreNames = getBackendFactoryStoreNames(taskCheckpoint, nonSideInputStoreNames, new StorageConfig(config));
        Map<String, TaskRestoreManager> taskStoreRestoreManagers = createTaskRestoreManagers(restoreStateBackendFactories, backendFactoryStoreNames, clock, samzaContainerMetrics, taskName, taskModel);
        taskRestoreManagers.put(taskName, taskStoreRestoreManagers);
    });
    // Initialize each TaskStorageManager
    taskRestoreManagers.forEach((taskName, restoreManagers) -> restoreManagers.forEach((factoryName, taskRestoreManager) -> taskRestoreManager.init(taskCheckpoints.get(taskName))));
    // Start each store consumer once.
    // Note: These consumers are per system and only changelog system store consumers will be started.
    // Some TaskRestoreManagers may not require the consumer to be started, but due to the agnostic nature of
    // ContainerStorageManager we always start the changelog consumer here in case it is required.
    this.storeConsumers.values().stream().distinct().forEach(SystemConsumer::start);
    List<Future<Void>> taskRestoreFutures = new ArrayList<>();
    // Submit restore callable for each taskInstance
    taskRestoreManagers.forEach((taskInstance, restoreManagersMap) -> {
        // Submit for each restore factory
        restoreManagersMap.forEach((factoryName, taskRestoreManager) -> {
            long startTime = System.currentTimeMillis();
            String taskName = taskInstance.getTaskName();
            LOG.info("Starting restore for state for task: {}", taskName);
            CompletableFuture<Void> restoreFuture = taskRestoreManager.restore().handle((res, ex) -> {
                // Close the restore manager once its restore completes. Persistent stores may be compacted
                // on stop, so paralleling stop() also parallelizes their compaction (a time-intensive operation).
                try {
                    taskRestoreManager.close();
                } catch (Exception e) {
                    LOG.error("Error closing restore manager for task: {} after {} restore", taskName, ex != null ? "unsuccessful" : "successful", e);
                // Ignore exceptions from close; the container may still be able to continue processing/backups
                // if the restore manager close fails.
                }
                long timeToRestore = System.currentTimeMillis() - startTime;
                if (samzaContainerMetrics != null) {
                    Gauge taskGauge = samzaContainerMetrics.taskStoreRestorationMetrics().getOrDefault(taskInstance, null);
                    if (taskGauge != null) {
                        taskGauge.set(timeToRestore);
                    }
                }
                if (ex != null) {
                    // log and rethrow exception to communicate restore failure
                    String msg = String.format("Error restoring state for task: %s", taskName);
                    LOG.error(msg, ex);
                    // wrap in unchecked exception to throw from lambda
                    throw new SamzaException(msg, ex);
                } else {
                    return null;
                }
            });
            taskRestoreFutures.add(restoreFuture);
        });
    });
    // Wait for each restore future to complete; catch any exceptions during restore and rethrow them
    // as samza exceptions
    for (Future<Void> future : taskRestoreFutures) {
        try {
            future.get();
        } catch (InterruptedException e) {
            LOG.warn("Received an interrupt during store restoration. Interrupting the restore executor to exit " + "prematurely without restoring full state.");
            restoreExecutor.shutdownNow();
            throw e;
        } catch (Exception e) {
            LOG.error("Exception when restoring state.", e);
            throw new SamzaException("Exception when restoring state.", e);
        }
    }
    // Stop each store consumer once
    this.storeConsumers.values().stream().distinct().forEach(SystemConsumer::stop);
    // Now create persistent non side input stores in read-write mode, leave non-persistent stores as-is
    this.taskStores = createTaskStores(nonSideInputStoreNames, this.containerModel, jobContext, containerContext, storageEngineFactories, serdes, taskInstanceMetrics, taskInstanceCollectors);
    // Add in memory stores
    this.inMemoryStores.forEach((taskName, stores) -> {
        if (!this.taskStores.containsKey(taskName)) {
            taskStores.put(taskName, new HashMap<>());
        }
        taskStores.get(taskName).putAll(stores);
    });
    // Add side input stores
    this.sideInputStores.forEach((taskName, stores) -> {
        if (!this.taskStores.containsKey(taskName)) {
            taskStores.put(taskName, new HashMap<>());
        }
        taskStores.get(taskName).putAll(stores);
    });
    LOG.info("Store Restore complete");
}
Also used : StreamMetadataCache(org.apache.samza.system.StreamMetadataCache) SerdeUtils(org.apache.samza.table.utils.SerdeUtils) LoggerFactory(org.slf4j.LoggerFactory) TaskModel(org.apache.samza.job.model.TaskModel) Future(java.util.concurrent.Future) SystemConsumer(org.apache.samza.system.SystemConsumer) SamzaContainerMetrics(org.apache.samza.container.SamzaContainerMetrics) Map(java.util.Map) TaskInstanceCollector(org.apache.samza.task.TaskInstanceCollector) RoundRobinChooserFactory(org.apache.samza.system.chooser.RoundRobinChooserFactory) Path(java.nio.file.Path) StorageConfig(org.apache.samza.config.StorageConfig) RunLoopTask(org.apache.samza.container.RunLoopTask) TaskName(org.apache.samza.container.TaskName) IncomingMessageEnvelope(org.apache.samza.system.IncomingMessageEnvelope) Collection(java.util.Collection) Set(java.util.Set) DefaultChooser(org.apache.samza.system.chooser.DefaultChooser) Checkpoint(org.apache.samza.checkpoint.Checkpoint) MetricsRegistry(org.apache.samza.metrics.MetricsRegistry) Collectors(java.util.stream.Collectors) Executors(java.util.concurrent.Executors) CountDownLatch(java.util.concurrent.CountDownLatch) List(java.util.List) Optional(java.util.Optional) Config(org.apache.samza.config.Config) MetricsRegistryMap(org.apache.samza.metrics.MetricsRegistryMap) SystemAdmins(org.apache.samza.system.SystemAdmins) ScalaJavaUtil(org.apache.samza.util.ScalaJavaUtil) ThreadFactoryBuilder(com.google.common.util.concurrent.ThreadFactoryBuilder) MessageChooser(org.apache.samza.system.chooser.MessageChooser) CheckpointV2(org.apache.samza.checkpoint.CheckpointV2) JobConfig(org.apache.samza.config.JobConfig) HashMap(java.util.HashMap) CompletableFuture(java.util.concurrent.CompletableFuture) Serde(org.apache.samza.serializers.Serde) SystemStreamPartition(org.apache.samza.system.SystemStreamPartition) Function(java.util.function.Function) SystemStreamMetadata(org.apache.samza.system.SystemStreamMetadata) ArrayList(java.util.ArrayList) HashSet(java.util.HashSet) Gauge(org.apache.samza.metrics.Gauge) ImmutableList(com.google.common.collect.ImmutableList) SerdeManager(org.apache.samza.serializers.SerdeManager) MessageCollector(org.apache.samza.task.MessageCollector) CheckpointManager(org.apache.samza.checkpoint.CheckpointManager) SystemStream(org.apache.samza.system.SystemStream) RunLoop(org.apache.samza.container.RunLoop) SystemConsumersMetrics(org.apache.samza.system.SystemConsumersMetrics) ExecutorService(java.util.concurrent.ExecutorService) MapUtils(org.apache.commons.collections4.MapUtils) JavaConversions(scala.collection.JavaConversions) TaskInstanceMetrics(org.apache.samza.container.TaskInstanceMetrics) Logger(org.slf4j.Logger) TaskConfig(org.apache.samza.config.TaskConfig) JobContext(org.apache.samza.context.JobContext) ContainerContext(org.apache.samza.context.ContainerContext) SystemFactory(org.apache.samza.system.SystemFactory) Clock(org.apache.samza.util.Clock) SystemConsumers(org.apache.samza.system.SystemConsumers) File(java.io.File) SamzaException(org.apache.samza.SamzaException) TimeUnit(java.util.concurrent.TimeUnit) TaskMode(org.apache.samza.job.model.TaskMode) Entry(org.apache.samza.storage.kv.Entry) ReflectionUtil(org.apache.samza.util.ReflectionUtil) ContainerModel(org.apache.samza.job.model.ContainerModel) VisibleForTesting(com.google.common.annotations.VisibleForTesting) KeyValueStore(org.apache.samza.storage.kv.KeyValueStore) Collections(java.util.Collections) SystemConsumer(org.apache.samza.system.SystemConsumer) Set(java.util.Set) 
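The restore loop above follows a pattern worth calling out: each restore runs asynchronously, the completion handler always closes its restore manager (swallowing close failures), the elapsed time is recorded, and any restore failure is rethrown as an unchecked SamzaException so that the blocking future.get() loop surfaces it. The following is a simplified, self-contained sketch of that pattern, assuming a hypothetical RestoreManager interface that stands in for Samza's TaskRestoreManager:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import org.apache.samza.SamzaException;

// Hypothetical stand-in for Samza's TaskRestoreManager.
interface RestoreManager {
    CompletableFuture<Void> restore();
    void close() throws Exception;
}

class RestoreRunner {

    // Runs all restores in parallel, closes each manager when its restore completes,
    // and blocks until every restore has finished (or rethrows the first failure).
    static void restoreAll(List<RestoreManager> managers) {
        List<CompletableFuture<Void>> futures = new ArrayList<>();
        for (RestoreManager manager : managers) {
            long startTime = System.currentTimeMillis();
            CompletableFuture<Void> future = manager.restore().handle((res, ex) -> {
                // Always close the manager, whether the restore succeeded or failed.
                try {
                    manager.close();
                } catch (Exception e) {
                    // Ignore close failures; the restore outcome is what matters here.
                }
                System.out.println("Restore took " + (System.currentTimeMillis() - startTime) + " ms");
                if (ex != null) {
                    // Wrap in an unchecked exception so it can be thrown from the lambda.
                    throw new SamzaException("Error restoring state", ex);
                }
                return null;
            });
            futures.add(future);
        }
        // Block on each future; a failed restore surfaces as an ExecutionException whose
        // cause is the SamzaException thrown in the handler above.
        for (CompletableFuture<Void> future : futures) {
            try {
                future.get();
            } catch (Exception e) {
                throw new SamzaException("State restore failed", e);
            }
        }
    }
}

handle() is used rather than whenComplete() so the original failure can be rewrapped as a SamzaException that the later future.get() surfaces.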

Example 8 with JobContext

Use of org.apache.samza.context.JobContext in project samza by apache.

The class TestLocalTableWrite, method createTable.

private LocalTable createTable(boolean isTimerDisabled) {
    Map<String, String> config = new HashMap<>();
    if (isTimerDisabled) {
        config.put(MetricsConfig.METRICS_TIMER_ENABLED, "false");
    }
    Context context = mock(Context.class);
    JobContext jobContext = mock(JobContext.class);
    when(context.getJobContext()).thenReturn(jobContext);
    when(jobContext.getConfig()).thenReturn(new MapConfig(config));
    ContainerContext containerContext = mock(ContainerContext.class);
    when(context.getContainerContext()).thenReturn(containerContext);
    when(containerContext.getContainerMetricsRegistry()).thenReturn(metricsRegistry);
    LocalTable table = new LocalTable("t1", kvStore);
    table.init(context);
    return table;
}
Also used : JobContext(org.apache.samza.context.JobContext) ContainerContext(org.apache.samza.context.ContainerContext) Context(org.apache.samza.context.Context) HashMap(java.util.HashMap) MapConfig(org.apache.samza.config.MapConfig)
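The mocking pattern above, a mock Context whose getJobContext() and getContainerContext() return further mocks, recurs across Samza's table and operator tests. A small helper along these lines could package it up; TestContextFactory and createMockContext are illustrative names, not Samza APIs:

import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

import java.util.Map;
import org.apache.samza.config.MapConfig;
import org.apache.samza.context.ContainerContext;
import org.apache.samza.context.Context;
import org.apache.samza.context.JobContext;
import org.apache.samza.metrics.MetricsRegistryMap;

class TestContextFactory {

    // Builds a mock Context backed by the given config map and a fresh metrics registry.
    static Context createMockContext(Map<String, String> config) {
        Context context = mock(Context.class);

        // Job-level context: exposes the job configuration.
        JobContext jobContext = mock(JobContext.class);
        when(jobContext.getConfig()).thenReturn(new MapConfig(config));
        when(context.getJobContext()).thenReturn(jobContext);

        // Container-level context: exposes the container metrics registry.
        ContainerContext containerContext = mock(ContainerContext.class);
        when(containerContext.getContainerMetricsRegistry()).thenReturn(new MetricsRegistryMap());
        when(context.getContainerContext()).thenReturn(containerContext);

        return context;
    }
}

With such a helper, createTable above reduces to building the config map, calling createMockContext(config), and passing the result to table.init().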

Example 9 with JobContext

Use of org.apache.samza.context.JobContext in project samza by apache.

The class TestStreamOperatorTask, method testCloseDuringInitializationErrors.

@Test
public void testCloseDuringInitializationErrors() throws Exception {
    Context context = mock(Context.class);
    JobContext jobContext = mock(JobContext.class);
    when(context.getJobContext()).thenReturn(jobContext);
    doThrow(new RuntimeException("Failed to get config")).when(jobContext).getConfig();
    StreamOperatorTask operatorTask = new StreamOperatorTask(mock(OperatorSpecGraph.class), mock(Clock.class));
    try {
        operatorTask.init(context);
    } catch (RuntimeException e) {
        if (e instanceof NullPointerException) {
            fail("Unexpected null pointer exception");
        }
    }
    operatorTask.close();
}
Also used : JobContext(org.apache.samza.context.JobContext) Context(org.apache.samza.context.Context) OperatorSpecGraph(org.apache.samza.operators.OperatorSpecGraph) Clock(org.apache.samza.util.Clock) Test(org.junit.Test)
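The test above checks that close() stays safe even when init() fails partway through (here, because reading the config throws). A minimal sketch of a component written to honor that contract, using a hypothetical SafeCloseTask class, is to null-check anything that may never have been initialized:

import org.apache.samza.config.Config;
import org.apache.samza.context.Context;

// Hypothetical task-like component whose close() tolerates a failed or skipped init().
class SafeCloseTask {

    private AutoCloseable resource; // stays null if init() never completes

    public void init(Context context) {
        // May throw before the resource is created, e.g. if reading the config fails.
        Config config = context.getJobContext().getConfig();
        resource = openResource(config);
    }

    public void close() throws Exception {
        // Guard against close() being called after a failed (or never attempted) init().
        if (resource != null) {
            resource.close();
        }
    }

    private AutoCloseable openResource(Config config) {
        return () -> { }; // placeholder resource
    }
}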

Aggregations

JobContext (org.apache.samza.context.JobContext) 9
ContainerContext (org.apache.samza.context.ContainerContext) 8
HashMap (java.util.HashMap) 6
Context (org.apache.samza.context.Context) 5
MapConfig (org.apache.samza.config.MapConfig) 4
VisibleForTesting (com.google.common.annotations.VisibleForTesting) 3
Config (org.apache.samza.config.Config) 3
Clock (org.apache.samza.util.Clock) 3
File (java.io.File) 2
Map (java.util.Map) 2
Set (java.util.Set) 2
ExecutorService (java.util.concurrent.ExecutorService) 2
Collectors (java.util.stream.Collectors) 2
RelRoot (org.apache.calcite.rel.RelRoot) 2
MapUtils (org.apache.commons.collections4.MapUtils) 2
StorageConfig (org.apache.samza.config.StorageConfig) 2
TaskConfig (org.apache.samza.config.TaskConfig) 2
TaskContext (org.apache.samza.context.TaskContext) 2
ContainerModel (org.apache.samza.job.model.ContainerModel) 2
JobModel (org.apache.samza.job.model.JobModel) 2