
Example 21 with CheckpointId

Use of org.apache.samza.checkpoint.CheckpointId in project samza by apache.

Class TaskStorageCommitManager, method writeCheckpointToStoreDirectories.

/**
 * Writes the {@link Checkpoint} information returned by {@link #upload(CheckpointId, Map)}
 * in each store directory and store checkpoint directory. Written content depends on the type of {@code checkpoint}.
 * For {@link CheckpointV2}, writes the entire task {@link CheckpointV2}.
 * For {@link CheckpointV1}, only writes the changelog ssp offsets in the OFFSET* files.
 *
 * Note: The assumption is that this method will be invoked once for each {@link Checkpoint} version that the
 * task needs to write as determined by {@link org.apache.samza.config.TaskConfig#getCheckpointWriteVersions()}.
 * This is required for upgrade and rollback compatibility.
 *
 * @param checkpoint the latest checkpoint to be persisted to local file system
 */
public void writeCheckpointToStoreDirectories(Checkpoint checkpoint) {
    if (checkpoint instanceof CheckpointV1) {
        LOG.debug("Writing CheckpointV1 to store and checkpoint directories for taskName: {} with checkpoint: {}", taskName, checkpoint);
        // Write CheckpointV1 changelog offsets to store and checkpoint directories
        writeChangelogOffsetFiles(checkpoint.getOffsets());
    } else if (checkpoint instanceof CheckpointV2) {
        LOG.debug("Writing CheckpointV2 to store and checkpoint directories for taskName: {} with checkpoint: {}", taskName, checkpoint);
        storageEngines.forEach((storeName, storageEngine) -> {
            // Only write the checkpoint file if the store is durable and persisted to disk
            if (storageEngine.getStoreProperties().isDurableStore() && storageEngine.getStoreProperties().isPersistedToDisk()) {
                CheckpointV2 checkpointV2 = (CheckpointV2) checkpoint;
                try {
                    File storeDir = storageManagerUtil.getTaskStoreDir(durableStoreBaseDir, storeName, taskName, TaskMode.Active);
                    storageManagerUtil.writeCheckpointV2File(storeDir, checkpointV2);
                    CheckpointId checkpointId = checkpointV2.getCheckpointId();
                    File checkpointDir = Paths.get(storageManagerUtil.getStoreCheckpointDir(storeDir, checkpointId)).toFile();
                    storageManagerUtil.writeCheckpointV2File(checkpointDir, checkpointV2);
                } catch (Exception e) {
                    throw new SamzaException(String.format("Write checkpoint file failed for task: %s, storeName: %s, checkpointId: %s", taskName, storeName, checkpointV2.getCheckpointId()), e);
                }
            }
        });
    } else {
        throw new SamzaException("Unsupported checkpoint version: " + checkpoint.getVersion());
    }
}
Also used : CheckpointV2(org.apache.samza.checkpoint.CheckpointV2) LoggerFactory(org.slf4j.LoggerFactory) HashMap(java.util.HashMap) CompletableFuture(java.util.concurrent.CompletableFuture) SystemStreamPartition(org.apache.samza.system.SystemStreamPartition) ArrayList(java.util.ArrayList) CheckpointV1(org.apache.samza.checkpoint.CheckpointV1) CheckpointManager(org.apache.samza.checkpoint.CheckpointManager) SystemStream(org.apache.samza.system.SystemStream) Map(java.util.Map) ExecutorService(java.util.concurrent.ExecutorService) FutureUtil(org.apache.samza.util.FutureUtil) KafkaChangelogSSPOffset(org.apache.samza.checkpoint.kafka.KafkaChangelogSSPOffset) TaskInstanceMetrics(org.apache.samza.container.TaskInstanceMetrics) TaskName(org.apache.samza.container.TaskName) Logger(org.slf4j.Logger) Partition(org.apache.samza.Partition) IOException(java.io.IOException) FileUtils(org.apache.commons.io.FileUtils) Checkpoint(org.apache.samza.checkpoint.Checkpoint) File(java.io.File) SamzaException(org.apache.samza.SamzaException) CheckpointId(org.apache.samza.checkpoint.CheckpointId) List(java.util.List) TaskMode(org.apache.samza.job.model.TaskMode) FileFilter(java.io.FileFilter) Paths(java.nio.file.Paths) WildcardFileFilter(org.apache.commons.io.filefilter.WildcardFileFilter) VisibleForTesting(com.google.common.annotations.VisibleForTesting) Config(org.apache.samza.config.Config) Collections(java.util.Collections)
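The CheckpointV2 branch of writeCheckpointToStoreDirectories reduces to a filter plus two writes per store. The following is a minimal, self-contained sketch of that shape in plain Java, not Samza's implementation: `StoreInfo`, `CheckpointFileWriterSketch`, the `CHECKPOINT-V2` file name and the `<baseDir>/<store>/<task>` and `<storeDir>-<checkpointId>` directory layout are all assumptions made for illustration, and the checkpoint is passed in already serialized.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a store's properties; not Samza's StoreProperties class.
final class StoreInfo {
  final boolean durable;
  final boolean persistedToDisk;

  StoreInfo(boolean durable, boolean persistedToDisk) {
    this.durable = durable;
    this.persistedToDisk = persistedToDisk;
  }
}

final class CheckpointFileWriterSketch {
  // Assumed file name, chosen for illustration only.
  static final String CHECKPOINT_FILE_NAME = "CHECKPOINT-V2";

  /**
   * Writes the serialized checkpoint into every durable, on-disk store directory and into a
   * sibling "<storeDir>-<checkpointId>" directory (assumed layout). Returns the files written.
   */
  static List<Path> writeToStoreDirs(Path baseDir, String taskName, Map<String, StoreInfo> stores,
      String checkpointId, String serializedCheckpoint) {
    List<Path> written = new ArrayList<>();
    stores.forEach((storeName, info) -> {
      // Same filter as the original: skip stores that are not durable or not persisted to disk.
      if (!(info.durable && info.persistedToDisk)) {
        return;
      }
      Path storeDir = baseDir.resolve(storeName).resolve(taskName);
      Path checkpointDir = storeDir.resolveSibling(storeDir.getFileName() + "-" + checkpointId);
      try {
        for (Path dir : Arrays.asList(storeDir, checkpointDir)) {
          Files.createDirectories(dir);
          Path file = dir.resolve(CHECKPOINT_FILE_NAME);
          Files.write(file, serializedCheckpoint.getBytes(StandardCharsets.UTF_8));
          written.add(file);
        }
      } catch (IOException e) {
        // Rethrow unchecked with enough context to identify the failing store.
        throw new UncheckedIOException(String.format(
            "Write checkpoint file failed for task: %s, storeName: %s, checkpointId: %s",
            taskName, storeName, checkpointId), e);
      }
    });
    return written;
  }
}
```

Only stores that are both durable and persisted to disk get files. An IOException is rethrown unchecked with the task, store and checkpoint id in its message, much as the original wraps failures in SamzaException.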

Example 22 with CheckpointId

Use of org.apache.samza.checkpoint.CheckpointId in project samza by apache.

Class TestCheckpointId, method testSerializationDeserialization.

@Test
public void testSerializationDeserialization() {
    CheckpointId checkpointId = CheckpointId.create();
    CheckpointId deserializedCheckpointId = CheckpointId.deserialize(checkpointId.serialize());
    assertEquals(checkpointId.getMillis(), deserializedCheckpointId.getMillis());
    assertEquals(checkpointId.getNanoId(), deserializedCheckpointId.getNanoId());
    assertEquals(checkpointId, deserializedCheckpointId);
}
Also used : CheckpointId(org.apache.samza.checkpoint.CheckpointId) Test(org.junit.Test)
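The round-trip test relies on CheckpointId exposing a millisecond timestamp and a nano id and serializing both into one string. A hypothetical model of such a value type is sketched below; the class name, the `-` separator and the way `create()` derives the nano id are assumptions, not taken from Samza's source.

```java
import java.util.Objects;

/** Hypothetical value type: wall-clock millis plus a small disambiguating nano id. */
final class CheckpointIdSketch implements Comparable<CheckpointIdSketch> {
  static final String SEPARATOR = "-"; // assumed separator

  private final long millis;
  private final long nanoId;

  CheckpointIdSketch(long millis, long nanoId) {
    this.millis = millis;
    this.nanoId = nanoId;
  }

  static CheckpointIdSketch create() {
    // floorMod keeps nanoId non-negative even if System.nanoTime() is negative,
    // so the serialized form never contains a second separator.
    return new CheckpointIdSketch(System.currentTimeMillis(), Math.floorMod(System.nanoTime(), 1_000_000L));
  }

  long getMillis() { return millis; }

  long getNanoId() { return nanoId; }

  /** Serializes as "<millis>-<nanoId>". */
  String serialize() { return millis + SEPARATOR + nanoId; }

  static CheckpointIdSketch deserialize(String serialized) {
    if (serialized == null || serialized.isEmpty()) {
      throw new IllegalArgumentException("Invalid checkpoint id: " + serialized);
    }
    String[] parts = serialized.split(SEPARATOR);
    if (parts.length != 2) {
      throw new IllegalArgumentException("Invalid checkpoint id: " + serialized);
    }
    // Long.parseLong throws NumberFormatException, a subclass of IllegalArgumentException.
    return new CheckpointIdSketch(Long.parseLong(parts[0]), Long.parseLong(parts[1]));
  }

  @Override
  public int compareTo(CheckpointIdSketch other) {
    int byMillis = Long.compare(millis, other.millis);
    return byMillis != 0 ? byMillis : Long.compare(nanoId, other.nanoId);
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof CheckpointIdSketch)) return false;
    CheckpointIdSketch that = (CheckpointIdSketch) o;
    return millis == that.millis && nanoId == that.nanoId;
  }

  @Override
  public int hashCode() { return Objects.hash(millis, nanoId); }

  @Override
  public String toString() { return serialize(); }
}
```

Because `equals` and `hashCode` cover both fields, the field-by-field and whole-object assertions in testSerializationDeserialization hold for this sketch as well.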

Example 23 with CheckpointId

Use of org.apache.samza.checkpoint.CheckpointId in project samza by apache.

Class TestCheckpointId, method testSerializationFormatForBackwardsCompatibility.

@Test
public void testSerializationFormatForBackwardsCompatibility() {
    CheckpointId checkpointId = CheckpointId.create();
    String serializedCheckpointId = checkpointId.serialize();
    // WARNING: This format is written to persisted remote stores and local files, so changing the format
    // would break backwards compatibility.
    String legacySerializedFormat = serializeLegacy(checkpointId);
    assertEquals(checkpointId, CheckpointId.deserialize(legacySerializedFormat));
}
Also used : CheckpointId(org.apache.samza.checkpoint.CheckpointId) Test(org.junit.Test)
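The `serializeLegacy` helper used by this test is not shown in the excerpt. The idea can be sketched as follows: keep a frozen, independent copy of the on-disk encoding and check that the current parser still reads it. `LegacyFormatCheck`, its methods, and the `<millis>-<nanoId>` encoding are hypothetical.

```java
import java.util.function.Function;

final class LegacyFormatCheck {
  // Frozen copy of the assumed legacy encoding "<millis>-<nanoId>".
  // It deliberately does not call the production serializer.
  static String serializeLegacy(long millis, long nanoId) {
    return millis + "-" + nanoId;
  }

  /** Returns true if the given parser reads a legacy-encoded id back to the same two fields. */
  static boolean readsLegacyFormat(Function<String, long[]> currentParser, long millis, long nanoId) {
    try {
      long[] parsed = currentParser.apply(serializeLegacy(millis, nanoId));
      return parsed.length == 2 && parsed[0] == millis && parsed[1] == nanoId;
    } catch (IllegalArgumentException e) {
      // A parser that rejects the legacy string is incompatible.
      return false;
    }
  }

  // Example "current" parser; in a real test this would be the production deserializer.
  static long[] currentParse(String s) {
    String[] parts = s.split("-");
    if (parts.length != 2) {
      throw new IllegalArgumentException("Invalid checkpoint id: " + s);
    }
    return new long[] {Long.parseLong(parts[0]), Long.parseLong(parts[1])};
  }
}
```

If the frozen copy delegated to the production serializer, a format change would alter both sides of the comparison and the check would still pass; keeping it independent is what makes it a compatibility test.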

Example 24 with CheckpointId

Use of org.apache.samza.checkpoint.CheckpointId in project samza by apache.

Class KafkaChangelogSSPOffset, method fromString.

public static KafkaChangelogSSPOffset fromString(String message) {
    if (StringUtils.isBlank(message)) {
        throw new IllegalArgumentException("Invalid checkpointed changelog message: " + message);
    }
    String[] checkpointIdAndOffset = message.split(SEPARATOR);
    if (checkpointIdAndOffset.length != 2) {
        throw new IllegalArgumentException("Invalid checkpointed changelog offset: " + message);
    }
    CheckpointId checkpointId = CheckpointId.deserialize(checkpointIdAndOffset[0]);
    String offset = null;
    if (!"null".equals(checkpointIdAndOffset[1])) {
        offset = checkpointIdAndOffset[1];
    }
    return new KafkaChangelogSSPOffset(checkpointId, offset);
}
Also used : CheckpointId(org.apache.samza.checkpoint.CheckpointId)
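`fromString` parses a `<checkpointId><SEPARATOR><offset>` string in which the literal `null` marks a missing offset. The sketch below pairs a parser of the same shape with a matching `toString` so the format can be exercised end to end. The class name, the `:` separator and keeping the checkpoint id as an opaque string are assumptions for this sketch.

```java
import java.util.Objects;

/** Hypothetical changelog offset tagged with the id of the checkpoint that produced it. */
final class ChangelogOffsetSketch {
  static final String SEPARATOR = ":"; // assumed separator

  final String checkpointId;
  final String offset; // may be null

  ChangelogOffsetSketch(String checkpointId, String offset) {
    this.checkpointId = Objects.requireNonNull(checkpointId, "checkpointId");
    this.offset = offset;
  }

  @Override
  public String toString() {
    // String concatenation renders a null offset as "null", which fromString maps back to null.
    return checkpointId + SEPARATOR + offset;
  }

  static ChangelogOffsetSketch fromString(String message) {
    if (message == null || message.trim().isEmpty()) {
      throw new IllegalArgumentException("Invalid checkpointed changelog message: " + message);
    }
    String[] parts = message.split(SEPARATOR);
    if (parts.length != 2) {
      throw new IllegalArgumentException("Invalid checkpointed changelog offset: " + message);
    }
    String offset = "null".equals(parts[1]) ? null : parts[1];
    return new ChangelogOffsetSketch(parts[0], offset);
  }
}
```

The `null` sentinel keeps the encoding to exactly two fields, at the cost that an offset whose value is literally the string `null` cannot be represented.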

Example 25 with CheckpointId

Use of org.apache.samza.checkpoint.CheckpointId in project samza by apache.

Class TestBlobStoreUtil, method testPutDir.

// TODO HIGH shesharm test with empty (0 byte) files
@Test
public void testPutDir() throws IOException, InterruptedException, ExecutionException {
    BlobStoreManager blobStoreManager = mock(BlobStoreManager.class);
    // File, dir and recursive dir added, retained and removed in local
    String local = "[a, c, z/1, y/1, p/m/1, q/n/1]";
    String remote = "[a, b, z/1, x/1, p/m/1, p/m/2, r/o/1]";
    String expectedAdded = "[c, y/1, q/n/1]";
    String expectedRetained = "[a, z/1, p/m/1]";
    String expectedRemoved = "[b, x/1, r/o/1, p/m/2]";
    SortedSet<String> expectedAddedFiles = BlobStoreTestUtil.getExpected(expectedAdded);
    SortedSet<String> expectedRetainedFiles = BlobStoreTestUtil.getExpected(expectedRetained);
    SortedSet<String> expectedPresentFiles = new TreeSet<>(expectedAddedFiles);
    expectedPresentFiles.addAll(expectedRetainedFiles);
    SortedSet<String> expectedRemovedFiles = BlobStoreTestUtil.getExpected(expectedRemoved);
    // Set up environment
    Path localSnapshotDir = BlobStoreTestUtil.createLocalDir(local);
    String basePath = localSnapshotDir.toAbsolutePath().toString();
    DirIndex remoteSnapshotDir = BlobStoreTestUtil.createDirIndex(remote);
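    // checkpointId, jobName, jobId, taskName and storeName are declared outside this method (not shown).
    SnapshotMetadata snapshotMetadata = new SnapshotMetadata(checkpointId, jobName, jobId, taskName, storeName);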
    DirDiff dirDiff = DirDiffUtil.getDirDiff(localSnapshotDir.toFile(), remoteSnapshotDir, (localFile, remoteFile) -> localFile.getName().equals(remoteFile.getFileName()));
    SortedSet<String> allUploaded = new TreeSet<>();
    // Set up mocks
    when(blobStoreManager.put(any(InputStream.class), any(Metadata.class))).thenAnswer((Answer<CompletableFuture<String>>) invocation -> {
        Metadata metadata = invocation.getArgumentAt(1, Metadata.class);
        String path = metadata.getPayloadPath();
        allUploaded.add(path.substring(basePath.length() + 1));
        return CompletableFuture.completedFuture(path);
    });
    // Execute
    // EXECUTOR is declared outside this method (not shown).
    BlobStoreUtil blobStoreUtil = new BlobStoreUtil(blobStoreManager, EXECUTOR, null, null);
    CompletionStage<DirIndex> dirIndexFuture = blobStoreUtil.putDir(dirDiff, snapshotMetadata);
    DirIndex dirIndex = null;
    try {
        // Should already be complete; if not, the future composition in putDir is broken.
        dirIndex = dirIndexFuture.toCompletableFuture().get(0, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
        fail("Future returned from putDir should be already complete.");
    }
    SortedSet<String> allPresent = new TreeSet<>();
    SortedSet<String> allRemoved = new TreeSet<>();
    BlobStoreTestUtil.getAllPresentInIndex("", dirIndex, allPresent);
    BlobStoreTestUtil.getAllRemovedInIndex("", dirIndex, allRemoved);
    // Assert
    assertEquals(expectedAddedFiles, allUploaded);
    assertEquals(expectedPresentFiles, allPresent);
    assertEquals(expectedRemovedFiles, allRemoved);
}
Also used : Path(java.nio.file.Path) SortedSet(java.util.SortedSet) FileMetadata(org.apache.samza.storage.blobstore.index.FileMetadata) FileTime(java.nio.file.attribute.FileTime) TimeoutException(java.util.concurrent.TimeoutException) Random(java.util.Random) RetriableException(org.apache.samza.storage.blobstore.exceptions.RetriableException) FileUtil(org.apache.samza.util.FileUtil) Pair(org.apache.commons.lang3.tuple.Pair) Map(java.util.Map) FutureUtil(org.apache.samza.util.FutureUtil) ImmutableSet(com.google.common.collect.ImmutableSet) PosixFileAttributes(java.nio.file.attribute.PosixFileAttributes) ImmutableMap(com.google.common.collect.ImmutableMap) Set(java.util.Set) CompletionException(java.util.concurrent.CompletionException) Checkpoint(org.apache.samza.checkpoint.Checkpoint) DirDiff(org.apache.samza.storage.blobstore.diff.DirDiff) CheckpointId(org.apache.samza.checkpoint.CheckpointId) IOUtils(org.apache.commons.io.IOUtils) List(java.util.List) CompletionStage(java.util.concurrent.CompletionStage) SnapshotIndex(org.apache.samza.storage.blobstore.index.SnapshotIndex) Optional(java.util.Optional) RandomStringUtils(org.apache.commons.lang3.RandomStringUtils) SnapshotMetadata(org.apache.samza.storage.blobstore.index.SnapshotMetadata) MoreExecutors(com.google.common.util.concurrent.MoreExecutors) DirIndex(org.apache.samza.storage.blobstore.index.DirIndex) FileBlob(org.apache.samza.storage.blobstore.index.FileBlob) Matchers(org.mockito.Matchers) CheckpointV2(org.apache.samza.checkpoint.CheckpointV2) HashMap(java.util.HashMap) CompletableFuture(java.util.concurrent.CompletableFuture) TreeSet(java.util.TreeSet) ArrayList(java.util.ArrayList) HashSet(java.util.HashSet) Answer(org.mockito.stubbing.Answer) PosixFilePermissions(java.nio.file.attribute.PosixFilePermissions) ArgumentCaptor(org.mockito.ArgumentCaptor) ImmutableList(com.google.common.collect.ImmutableList) BlobStoreManager(org.apache.samza.storage.blobstore.BlobStoreManager) BlobStoreStateBackendFactory(org.apache.samza.storage.blobstore.BlobStoreStateBackendFactory) ExecutorService(java.util.concurrent.ExecutorService) OutputStream(java.io.OutputStream) FileIndex(org.apache.samza.storage.blobstore.index.FileIndex) Files(java.nio.file.Files) FileOutputStream(java.io.FileOutputStream) IOException(java.io.IOException) FileUtils(org.apache.commons.io.FileUtils) Test(org.junit.Test) Metadata(org.apache.samza.storage.blobstore.Metadata) File(java.io.File) SamzaException(org.apache.samza.SamzaException) ExecutionException(java.util.concurrent.ExecutionException) TimeUnit(java.util.concurrent.TimeUnit) Mockito(org.mockito.Mockito) Ignore(org.junit.Ignore) Paths(java.nio.file.Paths) NullOutputStream(org.apache.commons.io.output.NullOutputStream) CRC32(java.util.zip.CRC32) Assert(org.junit.Assert) Collections(java.util.Collections) InputStream(java.io.InputStream) DeletedException(org.apache.samza.storage.blobstore.exceptions.DeletedException)
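testPutDir builds a local and a remote file set and expects `putDir` to upload only the added files while recording retained and removed ones. A flattened sketch of that classification follows, together with the kind of future composition the test's zero-timeout `get` depends on. `FlatDirDiff` and `uploadAdded` are hypothetical; the real `DirDiffUtil` walks nested directories and takes a comparator, which this sketch replaces with plain relative-path equality. It is also an assumption that the test's `EXECUTOR` runs tasks on the calling thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

/** Hypothetical, flattened local-vs-remote snapshot diff keyed by relative path. */
final class FlatDirDiff {
  final SortedSet<String> added = new TreeSet<>();    // local only: must be uploaded
  final SortedSet<String> retained = new TreeSet<>(); // in both: existing remote blob is reused
  final SortedSet<String> removed = new TreeSet<>();  // remote only: gone from the local snapshot

  static FlatDirDiff of(SortedSet<String> local, SortedSet<String> remote) {
    FlatDirDiff diff = new FlatDirDiff();
    for (String path : local) {
      if (remote.contains(path)) {
        diff.retained.add(path);
      } else {
        diff.added.add(path);
      }
    }
    for (String path : remote) {
      if (!local.contains(path)) {
        diff.removed.add(path);
      }
    }
    return diff;
  }

  /** "Uploads" only the added files on the given executor; the blob id is just a prefixed path. */
  CompletableFuture<List<String>> uploadAdded(Executor executor) {
    List<CompletableFuture<String>> uploads = new ArrayList<>();
    for (String path : added) {
      uploads.add(CompletableFuture.supplyAsync(() -> "blob:" + path, executor));
    }
    // Completes once every upload has completed, then collects the ids in upload order.
    return CompletableFuture.allOf(uploads.toArray(new CompletableFuture<?>[0]))
        .thenApply(ignored -> {
          List<String> blobIds = new ArrayList<>();
          for (CompletableFuture<String> upload : uploads) {
            blobIds.add(upload.join());
          }
          return blobIds;
        });
  }
}
```

With a same-thread executor such as `Runnable::run`, every `supplyAsync` completes before it returns, so `allOf(...).thenApply(...)` hands back an already-completed future; with a thread pool, `get(0, TimeUnit.MILLISECONDS)` could throw TimeoutException. Fed the test's own lists, the sketch yields added `[c, q/n/1, y/1]`, retained `[a, p/m/1, z/1]` and removed `[b, p/m/2, r/o/1, x/1]`.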

Aggregations

CheckpointId (org.apache.samza.checkpoint.CheckpointId): 53
Test (org.junit.Test): 48
File (java.io.File): 44
HashMap (java.util.HashMap): 43
Map (java.util.Map): 42
Path (java.nio.file.Path): 41
ImmutableMap (com.google.common.collect.ImmutableMap): 40
TaskName (org.apache.samza.container.TaskName): 38
Collections (java.util.Collections): 36
Partition (org.apache.samza.Partition): 34
SystemStreamPartition (org.apache.samza.system.SystemStreamPartition): 34
Answer (org.mockito.stubbing.Answer): 34
MapConfig (org.apache.samza.config.MapConfig): 31
TaskMode (org.apache.samza.job.model.TaskMode): 30
FileUtil (org.apache.samza.util.FileUtil): 30
Mockito (org.mockito.Mockito): 30
Set (java.util.Set): 29
ImmutableList (com.google.common.collect.ImmutableList): 28
ImmutableSet (com.google.common.collect.ImmutableSet): 28
SystemStream (org.apache.samza.system.SystemStream): 27