Search in sources :

Example 1 with TransactionCodec

use of org.apache.tephra.TransactionCodec in project cdap by caskdata.

the class HBaseTableTest method testCachedEncodedTransaction.

@Test
public void testCachedEncodedTransaction() throws Exception {
    String tableName = "testEncodedTxTable";
    DatasetProperties props = DatasetProperties.EMPTY;
    getTableAdmin(CONTEXT1, tableName, props).create();
    DatasetSpecification tableSpec = DatasetSpecification.builder(tableName, HBaseTable.class.getName()).build();
    // use a transaction codec that counts the number of times encode() is called
    final AtomicInteger encodeCount = new AtomicInteger();
    final TransactionCodec codec = new TransactionCodec() {

        @Override
        public byte[] encode(Transaction tx) throws IOException {
            encodeCount.incrementAndGet();
            return super.encode(tx);
        }
    };
    // use a table util that creates an HTable that validates the encoded tx on each get
    final AtomicReference<Transaction> txRef = new AtomicReference<>();
    HBaseTableUtil util = new DelegatingHBaseTableUtil(hBaseTableUtil) {

        @Override
        public HTable createHTable(Configuration conf, TableId tableId) throws IOException {
            HTable htable = super.createHTable(conf, tableId);
            return new MinimalDelegatingHTable(htable) {

                @Override
                public Result get(org.apache.hadoop.hbase.client.Get get) throws IOException {
                    Assert.assertEquals(txRef.get().getTransactionId(), codec.decode(get.getAttribute(TxConstants.TX_OPERATION_ATTRIBUTE_KEY)).getTransactionId());
                    return super.get(get);
                }

                @Override
                public Result[] get(List<org.apache.hadoop.hbase.client.Get> gets) throws IOException {
                    for (org.apache.hadoop.hbase.client.Get get : gets) {
                        Assert.assertEquals(txRef.get().getTransactionId(), codec.decode(get.getAttribute(TxConstants.TX_OPERATION_ATTRIBUTE_KEY)).getTransactionId());
                    }
                    return super.get(gets);
                }

                @Override
                public ResultScanner getScanner(org.apache.hadoop.hbase.client.Scan scan) throws IOException {
                    Assert.assertEquals(txRef.get().getTransactionId(), codec.decode(scan.getAttribute(TxConstants.TX_OPERATION_ATTRIBUTE_KEY)).getTransactionId());
                    return super.getScanner(scan);
                }
            };
        }
    };
    HBaseTable table = new HBaseTable(CONTEXT1, tableSpec, Collections.<String, String>emptyMap(), cConf, TEST_HBASE.getConfiguration(), util, codec);
    DetachedTxSystemClient txSystemClient = new DetachedTxSystemClient();
    // test all operations: only the first one encodes
    Transaction tx = txSystemClient.startShort();
    txRef.set(tx);
    table.startTx(tx);
    table.put(b("row1"), b("col1"), b("val1"));
    Assert.assertEquals(0, encodeCount.get());
    table.get(b("row"));
    Assert.assertEquals(1, encodeCount.get());
    table.get(ImmutableList.of(new Get("a"), new Get("b")));
    Assert.assertEquals(1, encodeCount.get());
    Scanner scanner = table.scan(new Scan(null, null));
    Assert.assertEquals(1, encodeCount.get());
    scanner.close();
    table.increment(b("z"), b("z"), 0L);
    Assert.assertEquals(1, encodeCount.get());
    table.commitTx();
    table.postTxCommit();
    // test that for the next tx, we encode again
    tx = txSystemClient.startShort();
    txRef.set(tx);
    table.startTx(tx);
    table.get(b("row"));
    Assert.assertEquals(2, encodeCount.get());
    table.commitTx();
    // test that we encode again, even of postTxCommit was not called
    tx = txSystemClient.startShort();
    txRef.set(tx);
    table.startTx(tx);
    table.get(b("row"));
    Assert.assertEquals(3, encodeCount.get());
    table.commitTx();
    table.rollbackTx();
    // test that rollback does not encode the tx
    Assert.assertEquals(3, encodeCount.get());
    // test that we encode again if the previous tx rolled back
    tx = txSystemClient.startShort();
    txRef.set(tx);
    table.startTx(tx);
    table.get(b("row"));
    Assert.assertEquals(4, encodeCount.get());
    table.commitTx();
    table.close();
    Assert.assertEquals(4, encodeCount.get());
}
Also used : TableId(co.cask.cdap.data2.util.TableId) RegionScanner(org.apache.hadoop.hbase.regionserver.RegionScanner) ResultScanner(org.apache.hadoop.hbase.client.ResultScanner) Scanner(co.cask.cdap.api.dataset.table.Scanner) CConfiguration(co.cask.cdap.common.conf.CConfiguration) Configuration(org.apache.hadoop.conf.Configuration) HTable(org.apache.hadoop.hbase.client.HTable) Result(org.apache.hadoop.hbase.client.Result) DetachedTxSystemClient(org.apache.tephra.inmemory.DetachedTxSystemClient) List(java.util.List) ImmutableList(com.google.common.collect.ImmutableList) DatasetProperties(co.cask.cdap.api.dataset.DatasetProperties) DatasetSpecification(co.cask.cdap.api.dataset.DatasetSpecification) AtomicReference(java.util.concurrent.atomic.AtomicReference) HBaseTableUtil(co.cask.cdap.data2.util.hbase.HBaseTableUtil) Transaction(org.apache.tephra.Transaction) AtomicInteger(java.util.concurrent.atomic.AtomicInteger) TransactionCodec(org.apache.tephra.TransactionCodec) Get(co.cask.cdap.api.dataset.table.Get) Scan(co.cask.cdap.api.dataset.table.Scan) BufferingTableTest(co.cask.cdap.data2.dataset2.lib.table.BufferingTableTest) Test(org.junit.Test)

Example 2 with TransactionCodec

use of org.apache.tephra.TransactionCodec in project cdap by caskdata.

the class HBaseTableExporter method createSubmittableJob.

/**
 * Sets up the actual MapReduce job.
 * @param tx The transaction which needs to be passed to the Scan instance. This transaction is be used by
 *           coprocessors to filter out the data corresonding to the invalid transactions .
 * @param tableName Name of the table which need to be exported as HFiles.
 * @return the configured job
 * @throws IOException
 */
public Job createSubmittableJob(Transaction tx, String tableName) throws IOException {
    Job job = Job.getInstance(hConf, "HBaseTableExporter");
    job.setJarByClass(HBaseTableExporter.class);
    Scan scan = new Scan();
    scan.setCacheBlocks(false);
    // Set the transaction attribute for the scan.
    scan.setAttribute(TxConstants.TX_OPERATION_ATTRIBUTE_KEY, new TransactionCodec().encode(tx));
    job.setNumReduceTasks(0);
    TableMapReduceUtil.initTableMapperJob(tableName, scan, Import.KeyValueImporter.class, null, null, job);
    FileSystem fs = FileSystem.get(hConf);
    Random rand = new Random();
    Path root = new Path(fs.getWorkingDirectory(), "hbasetableexporter");
    fs.mkdirs(root);
    while (true) {
        bulkloadDir = new Path(root, "" + rand.nextLong());
        if (!fs.exists(bulkloadDir)) {
            break;
        }
    }
    HFileOutputFormat2.setOutputPath(job, bulkloadDir);
    HTable hTable = new HTable(hConf, tableName);
    HFileOutputFormat2.configureIncrementalLoad(job, hTable);
    return job;
}
Also used : Path(org.apache.hadoop.fs.Path) Import(org.apache.hadoop.hbase.mapreduce.Import) Random(java.util.Random) TransactionCodec(org.apache.tephra.TransactionCodec) FileSystem(org.apache.hadoop.fs.FileSystem) Scan(org.apache.hadoop.hbase.client.Scan) Job(org.apache.hadoop.mapreduce.Job) HTable(org.apache.hadoop.hbase.client.HTable)

Example 3 with TransactionCodec

use of org.apache.tephra.TransactionCodec in project cdap by caskdata.

the class MapReduceTaskContextProvider method createCacheLoader.

/**
 * Creates a {@link CacheLoader} for the task context cache.
 */
private CacheLoader<ContextCacheKey, BasicMapReduceTaskContext> createCacheLoader(final Injector injector) {
    final DiscoveryServiceClient discoveryServiceClient = injector.getInstance(DiscoveryServiceClient.class);
    final DatasetFramework datasetFramework = injector.getInstance(DatasetFramework.class);
    final SecureStore secureStore = injector.getInstance(SecureStore.class);
    final SecureStoreManager secureStoreManager = injector.getInstance(SecureStoreManager.class);
    final MessagingService messagingService = injector.getInstance(MessagingService.class);
    // Multiple instances of BasicMapReduceTaskContext can share the same program.
    final AtomicReference<Program> programRef = new AtomicReference<>();
    return new CacheLoader<ContextCacheKey, BasicMapReduceTaskContext>() {

        @Override
        public BasicMapReduceTaskContext load(ContextCacheKey key) throws Exception {
            TaskAttemptID taskAttemptId = key.getTaskAttemptID();
            // taskAttemptId could be null if used from a org.apache.hadoop.mapreduce.Partitioner or
            // from a org.apache.hadoop.io.RawComparator, in which case we can get the JobId from the conf. Note that the
            // JobId isn't in the conf for the OutputCommitter#setupJob method, in which case we use the taskAttemptId
            Path txFile = MainOutputCommitter.getTxFile(key.getConfiguration(), taskAttemptId != null ? taskAttemptId.getJobID() : null);
            FileSystem fs = txFile.getFileSystem(key.getConfiguration());
            Preconditions.checkArgument(fs.exists(txFile));
            Transaction tx;
            try (FSDataInputStream txFileInputStream = fs.open(txFile)) {
                byte[] txByteArray = ByteStreams.toByteArray(txFileInputStream);
                tx = new TransactionCodec().decode(txByteArray);
            }
            MapReduceContextConfig contextConfig = new MapReduceContextConfig(key.getConfiguration());
            MapReduceClassLoader classLoader = MapReduceClassLoader.getFromConfiguration(key.getConfiguration());
            Program program = programRef.get();
            if (program == null) {
                // Creation of program is relatively cheap, so just create and do compare and set.
                programRef.compareAndSet(null, createProgram(contextConfig, classLoader.getProgramClassLoader()));
                program = programRef.get();
            }
            WorkflowProgramInfo workflowInfo = contextConfig.getWorkflowProgramInfo();
            DatasetFramework programDatasetFramework = workflowInfo == null ? datasetFramework : NameMappedDatasetFramework.createFromWorkflowProgramInfo(datasetFramework, workflowInfo, program.getApplicationSpecification());
            // Setup dataset framework context, if required
            if (programDatasetFramework instanceof ProgramContextAware) {
                ProgramRunId programRunId = program.getId().run(ProgramRunners.getRunId(contextConfig.getProgramOptions()));
                ((ProgramContextAware) programDatasetFramework).setContext(new BasicProgramContext(programRunId));
            }
            MapReduceSpecification spec = program.getApplicationSpecification().getMapReduce().get(program.getName());
            MetricsCollectionService metricsCollectionService = null;
            MapReduceMetrics.TaskType taskType = null;
            String taskId = null;
            ProgramOptions options = contextConfig.getProgramOptions();
            // from a org.apache.hadoop.io.RawComparator
            if (taskAttemptId != null) {
                taskId = taskAttemptId.getTaskID().toString();
                if (MapReduceMetrics.TaskType.hasType(taskAttemptId.getTaskType())) {
                    taskType = MapReduceMetrics.TaskType.from(taskAttemptId.getTaskType());
                    // if this is not for a mapper or a reducer, we don't need the metrics collection service
                    metricsCollectionService = injector.getInstance(MetricsCollectionService.class);
                    options = new SimpleProgramOptions(options.getProgramId(), options.getArguments(), new BasicArguments(RuntimeArguments.extractScope("task", taskType.toString().toLowerCase(), contextConfig.getProgramOptions().getUserArguments().asMap())), options.isDebug());
                }
            }
            CConfiguration cConf = injector.getInstance(CConfiguration.class);
            TransactionSystemClient txClient = injector.getInstance(TransactionSystemClient.class);
            return new BasicMapReduceTaskContext(program, options, cConf, taskType, taskId, spec, workflowInfo, discoveryServiceClient, metricsCollectionService, txClient, tx, programDatasetFramework, classLoader.getPluginInstantiator(), contextConfig.getLocalizedResources(), secureStore, secureStoreManager, authorizationEnforcer, authenticationContext, messagingService, mapReduceClassLoader);
        }
    };
}
Also used : DiscoveryServiceClient(org.apache.twill.discovery.DiscoveryServiceClient) TaskAttemptID(org.apache.hadoop.mapreduce.TaskAttemptID) NameMappedDatasetFramework(co.cask.cdap.internal.app.runtime.workflow.NameMappedDatasetFramework) DatasetFramework(co.cask.cdap.data2.dataset2.DatasetFramework) TransactionSystemClient(org.apache.tephra.TransactionSystemClient) FileSystem(org.apache.hadoop.fs.FileSystem) SecureStoreManager(co.cask.cdap.api.security.store.SecureStoreManager) BasicArguments(co.cask.cdap.internal.app.runtime.BasicArguments) MapReduceMetrics(co.cask.cdap.app.metrics.MapReduceMetrics) Path(org.apache.hadoop.fs.Path) Program(co.cask.cdap.app.program.Program) DefaultProgram(co.cask.cdap.app.program.DefaultProgram) MetricsCollectionService(co.cask.cdap.api.metrics.MetricsCollectionService) MapReduceSpecification(co.cask.cdap.api.mapreduce.MapReduceSpecification) AtomicReference(java.util.concurrent.atomic.AtomicReference) BasicProgramContext(co.cask.cdap.internal.app.runtime.BasicProgramContext) SecureStore(co.cask.cdap.api.security.store.SecureStore) CConfiguration(co.cask.cdap.common.conf.CConfiguration) SimpleProgramOptions(co.cask.cdap.internal.app.runtime.SimpleProgramOptions) ProgramOptions(co.cask.cdap.app.runtime.ProgramOptions) MessagingService(co.cask.cdap.messaging.MessagingService) Transaction(org.apache.tephra.Transaction) WorkflowProgramInfo(co.cask.cdap.internal.app.runtime.workflow.WorkflowProgramInfo) TransactionCodec(org.apache.tephra.TransactionCodec) FSDataInputStream(org.apache.hadoop.fs.FSDataInputStream) CacheLoader(com.google.common.cache.CacheLoader) ProgramRunId(co.cask.cdap.proto.id.ProgramRunId) SimpleProgramOptions(co.cask.cdap.internal.app.runtime.SimpleProgramOptions) ProgramContextAware(co.cask.cdap.data.ProgramContextAware)

Example 4 with TransactionCodec

use of org.apache.tephra.TransactionCodec in project cdap by caskdata.

the class MainOutputCommitter method setupJob.

@Override
public void setupJob(JobContext jobContext) throws IOException {
    Configuration configuration = jobContext.getConfiguration();
    MapReduceClassLoader classLoader = MapReduceClassLoader.getFromConfiguration(configuration);
    MapReduceTaskContextProvider taskContextProvider = classLoader.getTaskContextProvider();
    Injector injector = taskContextProvider.getInjector();
    cConf = injector.getInstance(CConfiguration.class);
    MapReduceContextConfig contextConfig = new MapReduceContextConfig(jobContext.getConfiguration());
    ProgramId programId = contextConfig.getProgramId();
    LOG.info("Setting up for MapReduce job: namespaceId={}, applicationId={}, program={}, runid={}", programId.getNamespace(), programId.getApplication(), programId.getProgram(), ProgramRunners.getRunId(contextConfig.getProgramOptions()));
    RetryStrategy retryStrategy = SystemArguments.getRetryStrategy(contextConfig.getProgramOptions().getUserArguments().asMap(), contextConfig.getProgramId().getType(), cConf);
    this.txClient = new RetryingLongTransactionSystemClient(injector.getInstance(TransactionSystemClient.class), retryStrategy);
    // We start long-running tx to be used by mapreduce job tasks.
    this.transaction = txClient.startLong();
    // Write the tx somewhere, so that we can re-use it in mapreduce tasks
    Path txFile = getTxFile(configuration, jobContext.getJobID());
    FileSystem fs = txFile.getFileSystem(configuration);
    try (FSDataOutputStream fsDataOutputStream = fs.create(txFile, false)) {
        fsDataOutputStream.write(new TransactionCodec().encode(transaction));
    }
    // we can instantiate the TaskContext after we set the tx above. It's used by the operations below
    taskContext = taskContextProvider.get(taskAttemptContext);
    this.outputs = Outputs.transform(contextConfig.getOutputs(), taskContext);
    super.setupJob(jobContext);
}
Also used : Path(org.apache.hadoop.fs.Path) CConfiguration(co.cask.cdap.common.conf.CConfiguration) Configuration(org.apache.hadoop.conf.Configuration) ProgramId(co.cask.cdap.proto.id.ProgramId) CConfiguration(co.cask.cdap.common.conf.CConfiguration) RetryingLongTransactionSystemClient(co.cask.cdap.data2.transaction.RetryingLongTransactionSystemClient) Injector(com.google.inject.Injector) TransactionCodec(org.apache.tephra.TransactionCodec) FileSystem(org.apache.hadoop.fs.FileSystem) FSDataOutputStream(org.apache.hadoop.fs.FSDataOutputStream) RetryStrategy(co.cask.cdap.common.service.RetryStrategy)

Aggregations

TransactionCodec (org.apache.tephra.TransactionCodec)4 CConfiguration (co.cask.cdap.common.conf.CConfiguration)3 FileSystem (org.apache.hadoop.fs.FileSystem)3 Path (org.apache.hadoop.fs.Path)3 AtomicReference (java.util.concurrent.atomic.AtomicReference)2 Configuration (org.apache.hadoop.conf.Configuration)2 HTable (org.apache.hadoop.hbase.client.HTable)2 Transaction (org.apache.tephra.Transaction)2 DatasetProperties (co.cask.cdap.api.dataset.DatasetProperties)1 DatasetSpecification (co.cask.cdap.api.dataset.DatasetSpecification)1 Get (co.cask.cdap.api.dataset.table.Get)1 Scan (co.cask.cdap.api.dataset.table.Scan)1 Scanner (co.cask.cdap.api.dataset.table.Scanner)1 MapReduceSpecification (co.cask.cdap.api.mapreduce.MapReduceSpecification)1 MetricsCollectionService (co.cask.cdap.api.metrics.MetricsCollectionService)1 SecureStore (co.cask.cdap.api.security.store.SecureStore)1 SecureStoreManager (co.cask.cdap.api.security.store.SecureStoreManager)1 MapReduceMetrics (co.cask.cdap.app.metrics.MapReduceMetrics)1 DefaultProgram (co.cask.cdap.app.program.DefaultProgram)1 Program (co.cask.cdap.app.program.Program)1