Example 1 with DataCopyPath

Use of org.apache.hadoop.hive.ql.parse.EximUtil.DataCopyPath in project hive by apache.

From the class CreateFunctionHandler, the method copyFunctionBinaries:

private void copyFunctionBinaries(List<DataCopyPath> functionBinaryCopyPaths, HiveConf hiveConf) throws MetaException, IOException, LoginException, HiveFatalException {
    if (!functionBinaryCopyPaths.isEmpty()) {
        String distCpDoAsUser = hiveConf.getVar(HiveConf.ConfVars.HIVE_DISTCP_DOAS_USER);
        List<ReplChangeManager.FileInfo> filePaths = new ArrayList<>();
        for (DataCopyPath funcBinCopyPath : functionBinaryCopyPaths) {
            // The source path is a ChangeManager-encoded URI; decode it into its
            // parts before resolving the FileInfo used for the copy.
            String[] decodedURISplits = ReplChangeManager.decodeFileUri(funcBinCopyPath.getSrcPath().toString());
            ReplChangeManager.FileInfo fileInfo = ReplChangeManager.getFileInfo(new Path(decodedURISplits[0]), decodedURISplits[1], decodedURISplits[2], decodedURISplits[3], hiveConf);
            filePaths.add(fileInfo);
            Path destRoot = funcBinCopyPath.getTargetPath().getParent();
            FileSystem dstFs = destRoot.getFileSystem(hiveConf);
            CopyUtils copyUtils = new CopyUtils(distCpDoAsUser, hiveConf, dstFs);
            copyUtils.copyAndVerify(destRoot, filePaths, funcBinCopyPath.getSrcPath(), true, false);
            // Files copied out of the CM location carry CM names; restore the
            // original file names under the destination root.
            copyUtils.renameFileCopiedFromCmPath(destRoot, dstFs, filePaths);
        }
    }
}
Also used : Path (org.apache.hadoop.fs.Path), DataCopyPath (org.apache.hadoop.hive.ql.parse.EximUtil.DataCopyPath), FileSystem (org.apache.hadoop.fs.FileSystem), ArrayList (java.util.ArrayList), ReplChangeManager (org.apache.hadoop.hive.metastore.ReplChangeManager), CopyUtils (org.apache.hadoop.hive.ql.parse.repl.CopyUtils)
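
For orientation, here is a minimal, hypothetical sketch of constructing and inspecting a DataCopyPath. The sample paths and the no-argument ReplicationSpec constructor are assumptions for illustration; the accessors mirror their use in Example 1, and convertToString() mirrors Examples 3 and 4.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.parse.EximUtil.DataCopyPath;
import org.apache.hadoop.hive.ql.parse.ReplicationSpec;

public class DataCopyPathSketch {
    public static void main(String[] args) {
        ReplicationSpec spec = new ReplicationSpec(); // assumption: a default spec
        Path src = new Path("hdfs://nn:8020/udfs/my-udf.jar");            // hypothetical source
        Path target = new Path("hdfs://nn:8020/repl/dump/data/my-udf.jar"); // hypothetical target
        DataCopyPath copyPath = new DataCopyPath(spec, src, target);
        // Accessors used in Example 1 above:
        System.out.println(copyPath.getSrcPath() + " -> " + copyPath.getTargetPath());
        // Serialized form queued into a FileList in Examples 3 and 4:
        System.out.println(copyPath.convertToString());
    }
}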

Example 2 with DataCopyPath

Use of org.apache.hadoop.hive.ql.parse.EximUtil.DataCopyPath in project hive by apache.

From the class CreateFunctionHandler, the method handle:

@Override
public void handle(Context withinContext) throws Exception {
    Function functionObj = eventMessage.getFunctionObj();
    if (functionObj.getResourceUris() == null || functionObj.getResourceUris().isEmpty()) {
        // A function without resource URIs was created without a USING clause,
        // so there are no binaries to replicate.
        LOG.info("Not replicating function: " + functionObj.getFunctionName() + " as it seems to have been created without USING clause");
        return;
    }
    LOG.info("Processing#{} CREATE_FUNCTION message : {}", fromEventId(), eventMessageAsJSON);
    Path metadataPath = new Path(withinContext.eventRoot, EximUtil.METADATA_NAME);
    Path dataPath = new Path(withinContext.eventRoot, EximUtil.DATA_PATH_NAME);
    FileSystem fileSystem = metadataPath.getFileSystem(withinContext.hiveConf);
    // When true, data copy tasks run on the target cluster at load time
    // instead of during the dump.
    boolean copyAtLoad = withinContext.hiveConf.getBoolVar(HiveConf.ConfVars.REPL_RUN_DATA_COPY_TASKS_ON_TARGET);
    List<DataCopyPath> functionBinaryCopyPaths = new ArrayList<>();
    try (JsonWriter jsonWriter = new JsonWriter(fileSystem, metadataPath)) {
        FunctionSerializer serializer = new FunctionSerializer(functionObj, dataPath, copyAtLoad, withinContext.hiveConf);
        serializer.writeTo(jsonWriter, withinContext.replicationSpec);
        // Collect any binaries the serializer flagged for a dump-side copy.
        functionBinaryCopyPaths.addAll(serializer.getFunctionBinaryCopyPaths());
    }
    withinContext.createDmd(this).write();
    copyFunctionBinaries(functionBinaryCopyPaths, withinContext.hiveConf);
}
Also used : Path (org.apache.hadoop.fs.Path), DataCopyPath (org.apache.hadoop.hive.ql.parse.EximUtil.DataCopyPath), Function (org.apache.hadoop.hive.metastore.api.Function), FileSystem (org.apache.hadoop.fs.FileSystem), ArrayList (java.util.ArrayList), FunctionSerializer (org.apache.hadoop.hive.ql.parse.repl.dump.io.FunctionSerializer), JsonWriter (org.apache.hadoop.hive.ql.parse.repl.dump.io.JsonWriter)
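
The write-then-collect pattern in handle() can be reduced to a small helper. The following is a hypothetical sketch, not Hive code: the method name and parameter list are invented, while the calls mirror Example 2 and the types match the "Also used" list above.

// Hypothetical helper: serialize the function metadata first, then return
// whatever binaries still need a dump-side copy.
static List<DataCopyPath> dumpFunctionMetadata(Function functionObj, Path metadataPath,
        Path dataPath, boolean copyAtLoad, ReplicationSpec spec, HiveConf conf) throws Exception {
    FileSystem fs = metadataPath.getFileSystem(conf);
    List<DataCopyPath> pendingCopies = new ArrayList<>();
    try (JsonWriter jsonWriter = new JsonWriter(fs, metadataPath)) {
        FunctionSerializer serializer = new FunctionSerializer(functionObj, dataPath, copyAtLoad, conf);
        serializer.writeTo(jsonWriter, spec);
        pendingCopies.addAll(serializer.getFunctionBinaryCopyPaths());
    } // the metadata writer is closed before any data copy begins
    return pendingCopies;
}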

Example 3 with DataCopyPath

Use of org.apache.hadoop.hive.ql.parse.EximUtil.DataCopyPath in project hive by apache.

From the class PartitionExport, the method write:

List<DataCopyPath> write(final ReplicationSpec forReplicationSpec, boolean isExportTask, FileList fileList, boolean dataCopyAtLoad) throws InterruptedException, HiveException {
    List<Future<?>> futures = new LinkedList<>();
    // Note: in this snippet nothing is added to managedTableCopyPaths; the
    // serialized copy paths go into fileList instead (see below).
    List<DataCopyPath> managedTableCopyPaths = new LinkedList<>();
    ExecutorService producer = Executors.newFixedThreadPool(1, new ThreadFactoryBuilder().setNameFormat("partition-submitter-thread-%d").build());
    futures.add(producer.submit(() -> {
        SessionState.setCurrentSessionState(callersSession);
        for (Partition partition : partitionIterable) {
            try {
                queue.put(partition);
            } catch (InterruptedException e) {
                throw new RuntimeException("Error while queuing up the partitions for export of data files", e);
            }
        }
    }));
    producer.shutdown();
    ThreadFactory namingThreadFactory = new ThreadFactoryBuilder().setNameFormat("partition-dump-thread-%d").build();
    ExecutorService consumer = Executors.newFixedThreadPool(nThreads, namingThreadFactory);
    while (!producer.isTerminated() || !queue.isEmpty()) {
        /*
          Use poll() with a timeout rather than take(): the partitions iterator may be
          empty, yet because the producer and consumer start simultaneously, the while
          loop can run while the producer is alive but producing nothing. The queue is
          then empty, so wait only a bounded time before re-checking; a blocking take()
          here could hang forever.
        */
        Partition partition = queue.poll(1, TimeUnit.SECONDS);
        if (partition == null) {
            continue;
        }
        LOG.debug("scheduling partition dump {}", partition.getName());
        futures.add(consumer.submit(() -> {
            String partitionName = partition.getName();
            String threadName = Thread.currentThread().getName();
            LOG.debug("Thread: {}, start partition dump {}", threadName, partitionName);
            try {
                // Data is copied here in the case of an export task or when dataCopyAtLoad is true
                List<Path> dataPathList = Utils.getDataPathList(partition.getDataLocation(), forReplicationSpec, hiveConf);
                Path rootDataDumpDir = isExportTask ? paths.partitionMetadataExportDir(partitionName) : paths.partitionDataExportDir(partitionName);
                new FileOperations(dataPathList, rootDataDumpDir, distCpDoAsUser, hiveConf, mmCtx).export(isExportTask, dataCopyAtLoad);
                Path dataDumpDir = new Path(paths.dataExportRootDir(), partitionName);
                LOG.debug("Thread: {}, finish partition dump {}", threadName, partitionName);
                if (!(isExportTask || dataCopyAtLoad)) {
                    fileList.add(new DataCopyPath(forReplicationSpec, partition.getDataLocation(), dataDumpDir).convertToString());
                }
            } catch (Exception e) {
                throw new RuntimeException(e.getMessage(), e);
            }
        }));
    }
    consumer.shutdown();
    for (Future<?> future : futures) {
        try {
            future.get();
        } catch (Exception e) {
            LOG.error("failed", e.getCause());
            throw new HiveException(e.getCause().getMessage(), e.getCause());
        }
    }
    // Maybe drive this timeout via configuration as well.
    consumer.awaitTermination(Long.MAX_VALUE, TimeUnit.SECONDS);
    return managedTableCopyPaths;
}
Also used : Path (org.apache.hadoop.fs.Path), DataCopyPath (org.apache.hadoop.hive.ql.parse.EximUtil.DataCopyPath), Partition (org.apache.hadoop.hive.ql.metadata.Partition), ThreadFactory (java.util.concurrent.ThreadFactory), ThreadFactoryBuilder (com.google.common.util.concurrent.ThreadFactoryBuilder), HiveException (org.apache.hadoop.hive.ql.metadata.HiveException), FileOperations (org.apache.hadoop.hive.ql.parse.repl.dump.io.FileOperations), LinkedList (java.util.LinkedList), List (java.util.List), ExecutorService (java.util.concurrent.ExecutorService), Future (java.util.concurrent.Future), FileList (org.apache.hadoop.hive.ql.exec.repl.util.FileList)
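
The poll-with-timeout loop above is the subtle part of write(). Below is a self-contained sketch of the same producer/consumer hand-off; the queue type and capacity, item type, and thread counts are illustrative assumptions, not the values Hive uses.

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PollingHandoffSketch {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(16); // bounded, like the snippet's queue
        ExecutorService producer = Executors.newFixedThreadPool(1);
        producer.submit(() -> {
            for (int i = 0; i < 100; i++) {
                try {
                    queue.put("partition-" + i); // blocks when the queue is full
                } catch (InterruptedException e) {
                    throw new RuntimeException(e);
                }
            }
        });
        producer.shutdown(); // isTerminated() becomes true once the submitted task finishes
        ExecutorService consumer = Executors.newFixedThreadPool(4);
        while (!producer.isTerminated() || !queue.isEmpty()) {
            // poll() with a timeout instead of take(): the producer may still be
            // running without having produced anything yet, and a blocking take()
            // here could wait forever.
            String partition = queue.poll(1, TimeUnit.SECONDS);
            if (partition == null) {
                continue;
            }
            consumer.submit(() -> System.out.println("dumping " + partition));
        }
        consumer.shutdown();
        consumer.awaitTermination(Long.MAX_VALUE, TimeUnit.SECONDS);
    }
}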

Example 4 with DataCopyPath

Use of org.apache.hadoop.hive.ql.parse.EximUtil.DataCopyPath in project hive by apache.

From the class TableExport, the method writeData:

private void writeData(PartitionIterable partitions, boolean isExportTask, FileList fileList, boolean dataCopyAtLoad) throws SemanticException {
    try {
        if (tableSpec.tableHandle.isPartitioned()) {
            if (partitions == null) {
                throw new IllegalStateException("partitions cannot be null for partitionTable :" + tableSpec.getTableName().getTable());
            }
            // Partitioned tables fan out to PartitionExport (see Example 3).
            new PartitionExport(paths, partitions, distCpDoAsUser, conf, mmCtx).write(replicationSpec, isExportTask, fileList, dataCopyAtLoad);
        } else {
            List<Path> dataPathList = Utils.getDataPathList(tableSpec.tableHandle.getDataLocation(), replicationSpec, conf);
            // Queue a DataCopyPath only when the data is neither written inline
            // (export) nor copied at load time on the target.
            if (!(isExportTask || dataCopyAtLoad)) {
                fileList.add(new DataCopyPath(replicationSpec, tableSpec.tableHandle.getDataLocation(), paths.dataExportDir()).convertToString());
            }
            new FileOperations(dataPathList, paths.dataExportDir(), distCpDoAsUser, conf, mmCtx).export(isExportTask, dataCopyAtLoad);
        }
    } catch (Exception e) {
        throw new SemanticException(e.getMessage(), e);
    }
}
Also used : Path (org.apache.hadoop.fs.Path), DataCopyPath (org.apache.hadoop.hive.ql.parse.EximUtil.DataCopyPath), FileOperations (org.apache.hadoop.hive.ql.parse.repl.dump.io.FileOperations), SemanticException (org.apache.hadoop.hive.ql.parse.SemanticException), IOException (java.io.IOException), FileNotFoundException (java.io.FileNotFoundException), NoSuchObjectException (org.apache.hadoop.hive.metastore.api.NoSuchObjectException), HiveException (org.apache.hadoop.hive.ql.metadata.HiveException)
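
Examples 3 and 4 gate the FileList entry on the same condition. A tiny sketch naming that predicate follows; the helper name is hypothetical, and the reasoning in the comment paraphrases the code and comments above.

// Hypothetical helper naming the condition used in writeData() and
// PartitionExport.write(): a DataCopyPath is recorded for later copy only
// when the data is neither written inline by an export task nor deferred
// to load time on the target (dataCopyAtLoad).
static boolean shouldQueueDataCopyPath(boolean isExportTask, boolean dataCopyAtLoad) {
    return !(isExportTask || dataCopyAtLoad);
}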

Aggregations

Path (org.apache.hadoop.fs.Path): 4
DataCopyPath (org.apache.hadoop.hive.ql.parse.EximUtil.DataCopyPath): 4
ArrayList (java.util.ArrayList): 2
FileSystem (org.apache.hadoop.fs.FileSystem): 2
HiveException (org.apache.hadoop.hive.ql.metadata.HiveException): 2
FileOperations (org.apache.hadoop.hive.ql.parse.repl.dump.io.FileOperations): 2
ThreadFactoryBuilder (com.google.common.util.concurrent.ThreadFactoryBuilder): 1
FileNotFoundException (java.io.FileNotFoundException): 1
IOException (java.io.IOException): 1
LinkedList (java.util.LinkedList): 1
List (java.util.List): 1
ExecutorService (java.util.concurrent.ExecutorService): 1
Future (java.util.concurrent.Future): 1
ThreadFactory (java.util.concurrent.ThreadFactory): 1
ReplChangeManager (org.apache.hadoop.hive.metastore.ReplChangeManager): 1
Function (org.apache.hadoop.hive.metastore.api.Function): 1
NoSuchObjectException (org.apache.hadoop.hive.metastore.api.NoSuchObjectException): 1
FileList (org.apache.hadoop.hive.ql.exec.repl.util.FileList): 1
Partition (org.apache.hadoop.hive.ql.metadata.Partition): 1
SemanticException (org.apache.hadoop.hive.ql.parse.SemanticException): 1