Example 1 with ReadResultMapper

Use of com.datastax.oss.dsbulk.workflow.commons.schema.ReadResultMapper in project dsbulk by DataStax.

The class UnloadWorkflow, method manyWriters:

private Flux<Record> manyWriters() {
    // writeConcurrency and readConcurrency are >= 0.5C here
    int actualConcurrency = Math.min(readConcurrency, writeConcurrency);
    int numThreads = Math.min(numCores * 2, actualConcurrency);
    Scheduler scheduler = Schedulers.newParallel(numThreads, new DefaultThreadFactory("workflow"));
    schedulers.add(scheduler);
    return Flux.fromIterable(readStatements).flatMap(results -> {
        Flux<Record> records =
            Flux.from(executor.readReactive(results))
                .publishOn(scheduler, 500)
                .transform(queryWarningsHandler)
                .transform(totalItemsMonitor)
                .transform(totalItemsCounter)
                .transform(failedReadResultsMonitor)
                .transform(failedReadsHandler)
                .map(readResultMapper::map)
                .transform(failedRecordsMonitor)
                .transform(unmappableRecordsHandler);
        if (actualConcurrency == writeConcurrency) {
            records = records.transform(writer);
        } else {
            // If the actual concurrency is less than the connector's desired write
            // concurrency, we need to give the connector a chance to switch writers
            // frequently so that it can actually redirect records to all the final
            // destinations (to that many files on disk, for example). If the connector
            // is correctly implemented, each window will be redirected to a different
            // destination in a round-robin fashion.
            records = records.window(500).flatMap(window -> window.transform(writer), 1, 500);
        }
        return records.transform(failedRecordsMonitor).transform(failedRecordsHandler);
    }, actualConcurrency, 500);
}
Also used : DefaultThreadFactory(io.netty.util.concurrent.DefaultThreadFactory) ReadResult(com.datastax.oss.dsbulk.executor.api.result.ReadResult) Connector(com.datastax.oss.dsbulk.connectors.api.Connector) BulkReader(com.datastax.oss.dsbulk.executor.api.reader.BulkReader) DriverSettings(com.datastax.oss.dsbulk.workflow.commons.settings.DriverSettings) LoggerFactory(org.slf4j.LoggerFactory) AtomicBoolean(java.util.concurrent.atomic.AtomicBoolean) Workflow(com.datastax.oss.dsbulk.workflow.api.Workflow) Scheduler(reactor.core.scheduler.Scheduler) Function(java.util.function.Function) ExecutorSettings(com.datastax.oss.dsbulk.workflow.commons.settings.ExecutorSettings) SchemaSettings(com.datastax.oss.dsbulk.workflow.commons.settings.SchemaSettings) HashSet(java.util.HashSet) RecordMetadata(com.datastax.oss.dsbulk.connectors.api.RecordMetadata) CqlSession(com.datastax.oss.driver.api.core.CqlSession) ConnectorSettings(com.datastax.oss.dsbulk.workflow.commons.settings.ConnectorSettings) Duration(java.time.Duration) SchemaGenerationStrategy(com.datastax.oss.dsbulk.workflow.commons.settings.SchemaGenerationStrategy) Schedulers(reactor.core.scheduler.Schedulers) Record(com.datastax.oss.dsbulk.connectors.api.Record) Stopwatch(com.datastax.oss.driver.shaded.guava.common.base.Stopwatch) CommonConnectorFeature(com.datastax.oss.dsbulk.connectors.api.CommonConnectorFeature) Logger(org.slf4j.Logger) Config(com.typesafe.config.Config) LogSettings(com.datastax.oss.dsbulk.workflow.commons.settings.LogSettings) Publisher(org.reactivestreams.Publisher) ConvertingCodecFactory(com.datastax.oss.dsbulk.codecs.api.ConvertingCodecFactory) SettingsManager(com.datastax.oss.dsbulk.workflow.commons.settings.SettingsManager) EngineSettings(com.datastax.oss.dsbulk.workflow.commons.settings.EngineSettings) Set(java.util.Set) ClusterInformationUtils(com.datastax.oss.dsbulk.workflow.commons.utils.ClusterInformationUtils) CodecSettings(com.datastax.oss.dsbulk.workflow.commons.settings.CodecSettings) MonitoringSettings(com.datastax.oss.dsbulk.workflow.commons.settings.MonitoringSettings) TimeUnit(java.util.concurrent.TimeUnit) Flux(reactor.core.publisher.Flux) List(java.util.List) CloseableUtils(com.datastax.oss.dsbulk.workflow.commons.utils.CloseableUtils) ReadResultMapper(com.datastax.oss.dsbulk.workflow.commons.schema.ReadResultMapper) DurationUtils(com.datastax.oss.dsbulk.workflow.api.utils.DurationUtils) MetricsManager(com.datastax.oss.dsbulk.workflow.commons.metrics.MetricsManager) Statement(com.datastax.oss.driver.api.core.cql.Statement) LogManager(com.datastax.oss.dsbulk.workflow.commons.log.LogManager)
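
The window/flatMap trick in manyWriters is worth seeing in isolation. The following is a minimal, self-contained Reactor sketch, not dsbulk code: an AtomicInteger stands in for a connector that opens a new destination on each subscription, and flatMap with concurrency 1 consumes one window at a time, so the stand-in destinations rotate in exactly the round-robin fashion the comment above describes.

import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

import reactor.core.publisher.Flux;

public class WindowedWriterDemo {

    public static void main(String[] args) {
        // Stand-in for a connector that opens a new destination per subscription.
        AtomicInteger nextWriter = new AtomicInteger();

        // window(5) splits the stream into consecutive windows of 5 records;
        // flatMap with concurrency 1 subscribes to one window at a time, so a
        // connector that picks a new destination on each subscription sees the
        // windows round-robin.
        List<String> written = Flux.range(1, 20)
            .window(5)
            .flatMap(window -> {
                int writerId = nextWriter.getAndIncrement();
                return window.map(r -> "writer-" + writerId + " <- record " + r);
            }, 1)
            .collectList()
            .block();

        written.forEach(System.out::println);
    }
}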

Example 2 with ReadResultMapper

Use of com.datastax.oss.dsbulk.workflow.commons.schema.ReadResultMapper in project dsbulk by DataStax.

The class SchemaSettingsTest, method should_create_row_mapper_with_inferred_mapping_and_skip_multiple:

@Test
void should_create_row_mapper_with_inferred_mapping_and_skip_multiple() {
    // Infer mapping, but skip C2 and C3.
    Config config = TestConfigUtils.createTestConfig("dsbulk.schema", "keyspace", "ks", "table", "t1", "mapping", "\" *=[-\\\"COL 2\\\", -c3] \"");
    SchemaSettings settings = new SchemaSettings(config, READ_AND_MAP);
    settings.init(session, codecFactory, false, true);
    ReadResultMapper mapper = settings.createReadResultMapper(session, recordMetadata, codecFactory, true);
    assertThat(mapper).isNotNull();
    ArgumentCaptor<String> argument = ArgumentCaptor.forClass(String.class);
    verify(session).prepare(argument.capture());
    assertThat(argument.getValue()).isEqualTo("SELECT c1 FROM ks.t1 WHERE token(c1) > :start AND token(c1) <= :end");
    assertMapping(mapper, C1, C1);
}
Also used : Config(com.typesafe.config.Config) ArgumentMatchers.anyString(org.mockito.ArgumentMatchers.anyString) ReadResultMapper(com.datastax.oss.dsbulk.workflow.commons.schema.ReadResultMapper) ParameterizedTest(org.junit.jupiter.params.ParameterizedTest) Test(org.junit.jupiter.api.Test)
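
For context, the exclusion mapping exercised by this test uses the same syntax a user would pass on the dsbulk command line. The invocation below is an illustrative sketch; the output URL and the exact shell quoting are assumptions, not taken from the test:

dsbulk unload -k ks -t t1 -m '* = [-"COL 2", -c3]' -url ./export

The * token asks dsbulk to infer the mapping from the table's columns, and each -column entry removes one of them, which is why the generated query ends up selecting only c1.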

Example 3 with ReadResultMapper

Use of com.datastax.oss.dsbulk.workflow.commons.schema.ReadResultMapper in project dsbulk by DataStax.

The class SchemaSettingsTest, method should_infer_select_query_without_solr_query_column:

@Test
void should_infer_select_query_without_solr_query_column() {
    ColumnMetadata solrQueryCol = mock(ColumnMetadata.class);
    CqlIdentifier solrQueryColName = CqlIdentifier.fromInternal("solr_query");
    when(solrQueryCol.getName()).thenReturn(solrQueryColName);
    when(solrQueryCol.getType()).thenReturn(DataTypes.TEXT);
    when(table.getColumns()).thenReturn(ImmutableMap.of(C1, col1, C2, col2, C3, col3, solrQueryColName, solrQueryCol));
    IndexMetadata idx = mock(IndexMetadata.class);
    CqlIdentifier idxName = CqlIdentifier.fromInternal("idx");
    when(table.getIndexes()).thenReturn(ImmutableMap.of(idxName, idx));
    when(idx.getClassName()).thenReturn(Optional.of("com.datastax.bdp.search.solr.Cql3SolrSecondaryIndex"));
    Config config = TestConfigUtils.createTestConfig("dsbulk.schema", "keyspace", "ks", "table", "t1");
    SchemaSettings settings = new SchemaSettings(config, READ_AND_MAP);
    settings.init(session, codecFactory, false, true);
    ReadResultMapper mapper = settings.createReadResultMapper(session, recordMetadata, codecFactory, true);
    ArgumentCaptor<String> argument = ArgumentCaptor.forClass(String.class);
    verify(session).prepare(argument.capture());
    assertThat(argument.getValue()).isEqualTo("SELECT c1, \"COL 2\", c3 FROM ks.t1 WHERE token(c1) > :start AND token(c1) <= :end");
    assertMapping(mapper, C1, C1, C2, C2, C3, C3);
}
Also used : ColumnMetadata(com.datastax.oss.driver.api.core.metadata.schema.ColumnMetadata) Config(com.typesafe.config.Config) ArgumentMatchers.anyString(org.mockito.ArgumentMatchers.anyString) IndexMetadata(com.datastax.oss.driver.api.core.metadata.schema.IndexMetadata) CqlIdentifier(com.datastax.oss.driver.api.core.CqlIdentifier) ReadResultMapper(com.datastax.oss.dsbulk.workflow.commons.schema.ReadResultMapper) ParameterizedTest(org.junit.jupiter.params.ParameterizedTest) Test(org.junit.jupiter.api.Test)
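
This test hinges on the driver's schema metadata exposing the index implementation class. Below is a minimal sketch of that kind of check; the helper name is hypothetical and the class-name match is simplified relative to dsbulk's actual logic:

import java.util.Map;

import com.datastax.oss.driver.api.core.CqlIdentifier;
import com.datastax.oss.driver.api.core.metadata.schema.IndexMetadata;
import com.datastax.oss.driver.api.core.metadata.schema.TableMetadata;

public class SearchIndexCheck {

    // Hypothetical helper: returns true if any index on the table is backed by
    // a DSE Search (Solr) implementation class. Tables with such an index
    // expose a synthetic solr_query column that generated SELECT statements
    // should leave out.
    static boolean hasSearchIndex(TableMetadata table) {
        for (Map.Entry<CqlIdentifier, IndexMetadata> entry : table.getIndexes().entrySet()) {
            boolean solrBacked =
                entry.getValue()
                    .getClassName()
                    .filter(className -> className.contains("solr"))
                    .isPresent();
            if (solrBacked) {
                return true;
            }
        }
        return false;
    }
}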

Example 4 with ReadResultMapper

Use of com.datastax.oss.dsbulk.workflow.commons.schema.ReadResultMapper in project dsbulk by DataStax.

The class SchemaSettingsTest, method should_create_row_mapper_when_mapping_is_a_list_and_mapped:

@Test
void should_create_row_mapper_when_mapping_is_a_list_and_mapped() {
    Config config = TestConfigUtils.createTestConfig("dsbulk.schema", "mapping", "\"\\\"COL 2\\\", c1\", ", "keyspace", "ks", "table", "t1");
    SchemaSettings settings = new SchemaSettings(config, READ_AND_MAP);
    settings.init(session, codecFactory, false, true);
    ReadResultMapper mapper = settings.createReadResultMapper(session, recordMetadata, codecFactory, true);
    assertThat(mapper).isNotNull();
    ArgumentCaptor<String> argument = ArgumentCaptor.forClass(String.class);
    verify(session).prepare(argument.capture());
    assertThat(argument.getValue()).isEqualTo("SELECT \"COL 2\", c1 FROM ks.t1 WHERE token(c1) > :start AND token(c1) <= :end");
    assertMapping(mapper, C1, C1, C2, C2);
}
Also used : Config(com.typesafe.config.Config) ArgumentMatchers.anyString(org.mockito.ArgumentMatchers.anyString) ReadResultMapper(com.datastax.oss.dsbulk.workflow.commons.schema.ReadResultMapper) ParameterizedTest(org.junit.jupiter.params.ParameterizedTest) Test(org.junit.jupiter.api.Test)

Example 5 with ReadResultMapper

Use of com.datastax.oss.dsbulk.workflow.commons.schema.ReadResultMapper in project dsbulk by DataStax.

The class SchemaSettingsTest, method should_create_row_mapper_when_mapping_keyspace_and_table_provided:

@Test
void should_create_row_mapper_when_mapping_keyspace_and_table_provided() {
    Config config = TestConfigUtils.createTestConfig("dsbulk.schema", "mapping", "\" 0 = \\\"COL 2\\\" , 2 = c1 \", ", "keyspace", "ks", "table", "t1");
    SchemaSettings settings = new SchemaSettings(config, READ_AND_MAP);
    settings.init(session, codecFactory, true, false);
    ReadResultMapper mapper = settings.createReadResultMapper(session, recordMetadata, codecFactory, true);
    assertThat(mapper).isNotNull();
    ArgumentCaptor<String> argument = ArgumentCaptor.forClass(String.class);
    verify(session).prepare(argument.capture());
    assertThat(argument.getValue()).isEqualTo("SELECT \"COL 2\", c1 FROM ks.t1 WHERE token(c1) > :start AND token(c1) <= :end");
    assertMapping(mapper, "0", C2, "2", C1);
}
Also used : Config(com.typesafe.config.Config) ArgumentMatchers.anyString(org.mockito.ArgumentMatchers.anyString) ReadResultMapper(com.datastax.oss.dsbulk.workflow.commons.schema.ReadResultMapper) ParameterizedTest(org.junit.jupiter.params.ParameterizedTest) Test(org.junit.jupiter.api.Test)
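
Indexed mappings such as 0 = "COL 2", 2 = c1 target connectors whose fields are positional rather than named, for example CSV files written without a header row. A rough command-line equivalent follows; the flags are the usual dsbulk shortcuts and the output URL is an assumption:

dsbulk unload -k ks -t t1 -m '0 = "COL 2", 2 = c1' -header false -url ./export

Field 0 of each record receives the value of "COL 2" and field 2 receives c1, matching the assertMapping(mapper, "0", C2, "2", C1) assertion.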

Aggregations

ReadResultMapper (com.datastax.oss.dsbulk.workflow.commons.schema.ReadResultMapper) 12
Config (com.typesafe.config.Config) 12
Test (org.junit.jupiter.api.Test) 10
ParameterizedTest (org.junit.jupiter.params.ParameterizedTest) 10
ArgumentMatchers.anyString (org.mockito.ArgumentMatchers.anyString) 10
CqlIdentifier (com.datastax.oss.driver.api.core.CqlIdentifier) 2
CqlSession (com.datastax.oss.driver.api.core.CqlSession) 2
Statement (com.datastax.oss.driver.api.core.cql.Statement) 2
ColumnMetadata (com.datastax.oss.driver.api.core.metadata.schema.ColumnMetadata) 2
IndexMetadata (com.datastax.oss.driver.api.core.metadata.schema.IndexMetadata) 2
Stopwatch (com.datastax.oss.driver.shaded.guava.common.base.Stopwatch) 2
ConvertingCodecFactory (com.datastax.oss.dsbulk.codecs.api.ConvertingCodecFactory) 2
CommonConnectorFeature (com.datastax.oss.dsbulk.connectors.api.CommonConnectorFeature) 2
Connector (com.datastax.oss.dsbulk.connectors.api.Connector) 2
Record (com.datastax.oss.dsbulk.connectors.api.Record) 2
RecordMetadata (com.datastax.oss.dsbulk.connectors.api.RecordMetadata) 2
BulkReader (com.datastax.oss.dsbulk.executor.api.reader.BulkReader) 2
ReadResult (com.datastax.oss.dsbulk.executor.api.result.ReadResult) 2
Workflow (com.datastax.oss.dsbulk.workflow.api.Workflow) 2
DurationUtils (com.datastax.oss.dsbulk.workflow.api.utils.DurationUtils) 2