
Example 1 with Tuple

Use of org.apache.nifi.util.Tuple in project nifi by apache.

From the class FlowRegistryUtils, method getRestrictedComponents.

public static Set<ConfigurableComponent> getRestrictedComponents(final VersionedProcessGroup group) {
    final Set<ConfigurableComponent> restrictedComponents = new HashSet<>();
    final Set<Tuple<String, BundleCoordinate>> componentTypes = new HashSet<>();
    populateComponentTypes(group, componentTypes);
    for (final Tuple<String, BundleCoordinate> tuple : componentTypes) {
        final ConfigurableComponent component = ExtensionManager.getTempComponent(tuple.getKey(), tuple.getValue());
        if (component == null) {
            throw new NiFiCoreException("Could not create an instance of component " + tuple.getKey() + " using bundle coordinates " + tuple.getValue());
        }

        final boolean isRestricted = component.getClass().isAnnotationPresent(Restricted.class);
        if (isRestricted) {
            restrictedComponents.add(component);
        }
    }

    return restrictedComponents;
}
Also used : NiFiCoreException(org.apache.nifi.web.NiFiCoreException) ConfigurableComponent(org.apache.nifi.components.ConfigurableComponent) BundleCoordinate(org.apache.nifi.bundle.BundleCoordinate) Tuple(org.apache.nifi.util.Tuple) HashSet(java.util.HashSet)
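Example 1 relies on two things: the HashSet of Tuple<String, BundleCoordinate> only collapses repeated component types if the tuple compares by key and value, and isAnnotationPresent only sees annotations retained at runtime. The following is a minimal standalone sketch of both ideas, not NiFi code: Pair, the @Restricted annotation declared here, and the ScriptRunner and AttributeLogger classes are hypothetical stand-ins, and the sketch checks the class directly rather than creating a temporary component instance.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-in for NiFi's @Restricted. RUNTIME retention is what
// makes the annotation visible to isAnnotationPresent() via reflection.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface Restricted {
}

// Hypothetical value-based pair standing in for org.apache.nifi.util.Tuple.
final class Pair<K, V> {
    private final K key;
    private final V value;

    Pair(final K key, final V value) {
        this.key = key;
        this.value = value;
    }

    K getKey() {
        return key;
    }

    V getValue() {
        return value;
    }

    @Override
    public boolean equals(final Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof Pair)) {
            return false;
        }
        final Pair<?, ?> other = (Pair<?, ?>) o;
        return Objects.equals(key, other.key) && Objects.equals(value, other.value);
    }

    @Override
    public int hashCode() {
        return Objects.hash(key, value);
    }
}

@Restricted
class ScriptRunner {
}

class AttributeLogger {
}

final class RestrictedFilter {
    // Collapse duplicate (component class, bundle version) pairs, then keep the
    // classes carrying @Restricted. The set deduplicates only because Pair
    // overrides equals() and hashCode().
    static List<String> restrictedNames(final List<Pair<Class<?>, String>> componentTypes) {
        final Set<Pair<Class<?>, String>> unique = new LinkedHashSet<>(componentTypes);
        final List<String> restricted = new ArrayList<>();
        for (final Pair<Class<?>, String> type : unique) {
            if (type.getKey().isAnnotationPresent(Restricted.class)) {
                restricted.add(type.getKey().getSimpleName() + "@" + type.getValue());
            }
        }
        return restricted;
    }

    public static void main(final String[] args) {
        final List<Pair<Class<?>, String>> types = new ArrayList<>();
        types.add(new Pair<>(ScriptRunner.class, "1.0"));
        types.add(new Pair<>(ScriptRunner.class, "1.0")); // duplicate, collapsed by the set
        types.add(new Pair<>(AttributeLogger.class, "1.0"));
        System.out.println(restrictedNames(types));
    }
}
```

If Pair relied on the default identity-based equals(), both ScriptRunner entries would survive the set and the component would be examined twice.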

Example 2 with Tuple

Use of org.apache.nifi.util.Tuple in project nifi by apache.

From the class Hive2JDBC, method analyze.

public DataSetRefs analyze(AnalysisContext context, ProvenanceEventRecord event) {
    // Replace the colon so that the schema in the URI can be parsed correctly.
    final String transitUri = event.getTransitUri();
    if (transitUri == null) {
        return null;
    }

    final Matcher uriMatcher = URI_PATTERN.matcher(transitUri);
    if (!uriMatcher.matches()) {
        logger.warn("Unexpected transit URI: {}", new Object[] { transitUri });
        return null;
    }

    final String clusterName = context.getClusterResolver().fromHostNames(splitHostNames(uriMatcher.group(1)));
    String connectedDatabaseName = null;
    if (uriMatcher.groupCount() > 1) {
        // Try to find connected database name from connection parameters.
        final String[] connectionParams = uriMatcher.group(2).split(";");
        connectedDatabaseName = connectionParams[0];
    }

    if (StringUtils.isEmpty(connectedDatabaseName)) {
        // If not found, then use "default".
        connectedDatabaseName = "default";
    }

    final Set<Tuple<String, String>> inputTables = parseTableNames(connectedDatabaseName, event.getAttribute(ATTR_INPUT_TABLES));
    final Set<Tuple<String, String>> outputTables = parseTableNames(connectedDatabaseName, event.getAttribute(ATTR_OUTPUT_TABLES));
    if (inputTables.isEmpty() && outputTables.isEmpty()) {
        // If input/output tables are unknown, create database level lineage.
        return getDatabaseRef(event.getComponentId(), event.getEventType(), clusterName, connectedDatabaseName);
    }

    final DataSetRefs refs = new DataSetRefs(event.getComponentId());
    addRefs(refs, true, clusterName, inputTables);
    addRefs(refs, false, clusterName, outputTables);
    return refs;
}
Also used : Matcher(java.util.regex.Matcher) DataSetRefs(org.apache.nifi.atlas.provenance.DataSetRefs) Tuple(org.apache.nifi.util.Tuple)
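Example 2 reads the cluster host list from the first capture group of URI_PATTERN and the database name from the first ';'-separated token of the second, falling back to "default". URI_PATTERN and splitHostNames are not shown on this page, so the pattern and the comma/colon splitting below are assumptions made for illustration; the sketch only mirrors the control flow.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HiveUriSketch {
    // Hypothetical approximation of URI_PATTERN:
    // group 1 = "host:port,host:port", group 2 = "database;param=value;..."
    private static final Pattern URI_PATTERN = Pattern.compile("jdbc:hive2://([^/]+)/?(.*)");

    // Returns [databaseName, host1, host2, ...], or null for a missing or unexpected URI.
    static List<String> parse(final String transitUri) {
        if (transitUri == null) {
            return null;
        }
        final Matcher uriMatcher = URI_PATTERN.matcher(transitUri);
        if (!uriMatcher.matches()) {
            return null;
        }

        // Assumed behaviour of splitHostNames: comma-separated "host:port" entries, port dropped
        final List<String> hosts = new ArrayList<>();
        for (final String hostAndPort : uriMatcher.group(1).split(",")) {
            hosts.add(hostAndPort.split(":")[0]);
        }

        // The database name is the first ';'-separated token after the host list
        final String[] connectionParams = uriMatcher.group(2).split(";");
        String databaseName = connectionParams.length > 0 ? connectionParams[0] : "";
        if (databaseName.isEmpty()) {
            databaseName = "default"; // same fallback as Example 2
        }

        final List<String> result = new ArrayList<>();
        result.add(databaseName);
        result.addAll(hosts);
        return result;
    }

    public static void main(final String[] args) {
        System.out.println(parse("jdbc:hive2://node1:10000,node2:10000/sales;ssl=true"));
    }
}
```

The length check on the split result is a defensive addition: for a URI ending in "/;" the second group is ";", and String.split(";") then returns an empty array.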

Example 3 with Tuple

Use of org.apache.nifi.util.Tuple in project nifi by apache.

From the class PersistentProvenanceRepository, method mergeJournals.

/**
 * <p>
 * Merges all of the given Journal Files into a single, merged Provenance
 * Event Log File. As these records are merged, they will be compressed, if
 * the repository is configured to compress records, and will be indexed.
 * </p>
 * <p>
 * If the repository is configured to compress the data, the file written to
 * may not be the same as the <code>suggestedMergeFile</code>, as a filename
 * extension of '.gz' may be appended. If the journals are successfully
 * merged, the file that they were merged into will be returned. If unable
 * to merge the records (for instance, because the repository has been
 * closed or because the list of journal files was empty), this method will
 * return <code>null</code>.
 * </p>
 * @param journalFiles the journal files to merge
 * @param suggestedMergeFile the file to write the merged records to
 * @param eventReporter the event reporter to report any warnings or errors
 * to; may be null.
 * @return the file that the given journals were merged into, or
 * <code>null</code> if no records were merged.
 * @throws IOException if a problem occurs writing to the mergedFile,
 * reading from a journal, or updating the Lucene Index.
 */
File mergeJournals(final List<File> journalFiles, final File suggestedMergeFile, final EventReporter eventReporter) throws IOException {
    logger.debug("Merging {} to {}", journalFiles, suggestedMergeFile);
    if (this.closed.get()) {
        logger.info("Provenance Repository has been closed; will not merge journal files to {}", suggestedMergeFile);
        return null;
    }

    if (journalFiles.isEmpty()) {
        logger.debug("Couldn't merge journals: Journal Files is empty; won't merge journals");
        return null;
    }
    Collections.sort(journalFiles, new Comparator<File>() {

        @Override
        public int compare(final File o1, final File o2) {
            final String suffix1 = LuceneUtil.substringAfterLast(o1.getName(), ".");
            final String suffix2 = LuceneUtil.substringAfterLast(o2.getName(), ".");
            try {
                final int journalIndex1 = Integer.parseInt(suffix1);
                final int journalIndex2 = Integer.parseInt(suffix2);
                return Integer.compare(journalIndex1, journalIndex2);
            } catch (final NumberFormatException nfe) {
                return o1.getName().compareTo(o2.getName());
            }
        }
    });
    // Search for any missing files. At this point they should all have been written to disk; otherwise we cannot continue.
    // Missing files are most likely due to incomplete cleanup of files after a previous merge.
    final List<File> availableFiles = filterUnavailableFiles(journalFiles);
    final int numAvailableFiles = availableFiles.size();
    // check if we have all of the "partial" files for the journal.
    if (numAvailableFiles > 0) {
        if (suggestedMergeFile.exists()) {
            // we have all "partial" files and there is already a merged file. Delete the data from the index
            // because the merge file may not be fully merged. We will re-merge.
            logger.warn("Merged Journal File {} already exists; however, all partial journal files also exist " + "so assuming that the merge did not finish. Repeating procedure in order to ensure consistency.", suggestedMergeFile);
            final DeleteIndexAction deleteAction = new DeleteIndexAction(this, indexConfig, getIndexManager());
            try {
                deleteAction.execute(suggestedMergeFile);
            } catch (final Exception e) {
                logger.warn("Failed to delete records from Journal File {} from the index; this could potentially result in duplicates. Failure was due to {}", suggestedMergeFile, e.toString());
                if (logger.isDebugEnabled()) {
                    logger.warn("", e);
                }
            }

            // Delete both the partially merged file and its TOC file. Otherwise, we could get the wrong copy and have issues retrieving events.
            if (!suggestedMergeFile.delete()) {
                logger.error("Failed to delete partially written Provenance Journal File {}. This may result in events from this journal " + "file not being able to be displayed. This file should be deleted manually.", suggestedMergeFile);
            }

            final File tocFile = TocUtil.getTocFile(suggestedMergeFile);
            if (tocFile.exists() && !tocFile.delete()) {
                logger.error("Failed to delete .toc file {}; this may result in not being able to read the Provenance Events from the {} Journal File. " + "This can be corrected by manually deleting the {} file", tocFile, suggestedMergeFile, tocFile);
            }
        }
    } else {
        logger.warn("Cannot merge journal files {} because they do not exist on disk", journalFiles);
        return null;
    }
    final long startNanos = System.nanoTime();
    // Map each journal to a RecordReader
    final List<RecordReader> readers = new ArrayList<>();
    int records = 0;
    final boolean isCompress = configuration.isCompressOnRollover();
    final File writerFile = isCompress ? new File(suggestedMergeFile.getParentFile(), suggestedMergeFile.getName() + ".gz") : suggestedMergeFile;
    try {
        for (final File journalFile : availableFiles) {
            try {
                // Use MAX_VALUE for number of chars because we don't want to truncate the value as we write it
                // out. This allows us to later decide that we want more characters and still be able to retrieve
                // the entire event.
                readers.add(RecordReaders.newRecordReader(journalFile, null, Integer.MAX_VALUE));
            } catch (final EOFException eof) {
                // there's nothing here. Skip over it.
            } catch (final IOException ioe) {
                logger.warn("Unable to merge {} with other Journal Files due to {}", journalFile, ioe.toString());
                if (logger.isDebugEnabled()) {
                    logger.warn("", ioe);
                }

                if (eventReporter != null) {
                    eventReporter.reportEvent(Severity.ERROR, EVENT_CATEGORY, "Failed to merge Journal Files due to " + ioe.toString());
                }
            }
        }
        // Create a Map so that the key is the next record available from a reader and the value is the Reader from which
        // the record came. This sorted map is then used so that we are able to always get the first entry, which is the next
        // lowest record id
        final SortedMap<StandardProvenanceEventRecord, RecordReader> recordToReaderMap = new TreeMap<>(new Comparator<StandardProvenanceEventRecord>() {

            @Override
            public int compare(final StandardProvenanceEventRecord o1, final StandardProvenanceEventRecord o2) {
                return Long.compare(o1.getEventId(), o2.getEventId());
            }
        });

        long minEventId = 0L;
        long earliestTimestamp = System.currentTimeMillis();
        for (final RecordReader reader : readers) {
            StandardProvenanceEventRecord record = null;
            try {
                record = reader.nextRecord();
            } catch (final EOFException eof) {
                // record will be null and reader can no longer be used
            } catch (final Exception e) {
                logger.warn("Failed to generate Provenance Event Record from Journal due to " + e + "; it's " + "possible that the record wasn't completely written to the file. This journal will be " + "skipped.");
                if (logger.isDebugEnabled()) {
                    logger.warn("", e);
                }

                if (eventReporter != null) {
                    eventReporter.reportEvent(Severity.WARNING, EVENT_CATEGORY, "Failed to read Provenance Event " + "Record from Journal due to " + e + "; it's possible that the record wasn't " + "completely written to the file. This journal will be skipped.");
                }
            }

            if (record == null) {
                // this journal has nothing (more) to contribute
                continue;
            }

            if (record.getEventTime() < earliestTimestamp) {
                earliestTimestamp = record.getEventTime();
            }

            if (record.getEventId() < minEventId) {
                minEventId = record.getEventId();
            }

            recordToReaderMap.put(record, reader);
        }
        // We want to keep track of the last 1000 events in the files so that we can add them to the repository's ring buffer.
        // However, we don't want to add them directly to that buffer, because once they are added, they are
        // available in query results. As a result, we could end up in a situation where we've not finished indexing the file
        // but try to create the lineage for events in that file. To avoid this, we add the records
        // to a temporary RingBuffer and, after we finish merging, copy the data to the
        // repository's ring buffer (this.latestRecords).
        final RingBuffer<ProvenanceEventRecord> latestRecords = new RingBuffer<>(1000);
        // Loop over each entry in the map, persisting the records to the merged file in order and refilling the map
        // with the next entry from the journal file from which the previous record was written.
        try (final RecordWriter writer = RecordWriters.newSchemaRecordWriter(writerFile, idGenerator, configuration.isCompressOnRollover(), true)) {
            writer.writeHeader(minEventId);

            final IndexingAction indexingAction = createIndexingAction();
            final File indexingDirectory = indexConfig.getWritableIndexDirectory(writerFile, earliestTimestamp);
            long maxId = 0L;
            final BlockingQueue<Tuple<StandardProvenanceEventRecord, Integer>> eventQueue = new LinkedBlockingQueue<>(100);
            final AtomicBoolean finishedAdding = new AtomicBoolean(false);
            final List<Future<?>> futures = new ArrayList<>();
            final EventIndexWriter indexWriter = getIndexManager().borrowIndexWriter(indexingDirectory);
            try {
                final ExecutorService exec = Executors.newFixedThreadPool(configuration.getIndexThreadPoolSize(), new ThreadFactory() {

                    @Override
                    public Thread newThread(final Runnable r) {
                        final Thread t = Executors.defaultThreadFactory().newThread(r);
                        t.setName("Index Provenance Events");
                        return t;
                    }
                });

                final AtomicInteger indexingFailureCount = new AtomicInteger(0);
                try {
                    for (int i = 0; i < configuration.getIndexThreadPoolSize(); i++) {
                        final Callable<Object> callable = new Callable<Object>() {

                            @Override
                            public Object call() throws IOException {
                                while (!eventQueue.isEmpty() || !finishedAdding.get()) {
                                    try {
                                        final Tuple<StandardProvenanceEventRecord, Integer> tuple;
                                        try {
                                            tuple = eventQueue.poll(10, TimeUnit.MILLISECONDS);
                                        } catch (final InterruptedException ie) {
                                            // Restore the interrupt flag and stop this indexing task
                                            Thread.currentThread().interrupt();
                                            return null;
                                        }

                                        if (tuple == null) {
                                            continue;
                                        }

                                        indexingAction.index(tuple.getKey(), indexWriter.getIndexWriter(), tuple.getValue());
                                    } catch (final Throwable t) {
                                        logger.error("Failed to index Provenance Event for " + writerFile + " to " + indexingDirectory, t);
                                        if (indexingFailureCount.incrementAndGet() >= MAX_INDEXING_FAILURE_COUNT) {
                                            return null;
                                        }
                                    }
                                }

                                return null;
                            }
                        };

                        final Future<?> future = exec.submit(callable);
                        futures.add(future);
                    }
                    boolean indexEvents = true;
                    while (!recordToReaderMap.isEmpty()) {
                        final StandardProvenanceEventRecord record = recordToReaderMap.firstKey();
                        final RecordReader reader = recordToReaderMap.get(record);

                        // Persist the lowest pending record, then hand it to the indexing threads
                        writer.writeRecord(record);
                        final int blockIndex = writer.getTocWriter().getCurrentBlockIndex();
                        boolean accepted = false;
                        while (!accepted && indexEvents) {
                            try {
                                accepted = eventQueue.offer(new Tuple<>(record, blockIndex), 10, TimeUnit.MILLISECONDS);
                            } catch (final InterruptedException ie) {
                                Thread.currentThread().interrupt();
                                throw new RuntimeException("Thread interrupted", ie);
                            }

                            // If the record could not be queued, check whether the maximum failure count has been reached. Once it is,
                            // the indexing threads stop consuming, so the queue fills up and offer() keeps failing.
                            if (!accepted && indexingFailureCount.get() >= MAX_INDEXING_FAILURE_COUNT) {
                                // don't add anything else to the queue.
                                indexEvents = false;
                                final String warning = String.format("Indexing Provenance Events for %s has failed %s times. This exceeds the maximum threshold of %s failures, " + "so no more Provenance Events will be indexed for this Provenance file.", writerFile, indexingFailureCount.get(), MAX_INDEXING_FAILURE_COUNT);
                                if (eventReporter != null) {
                                    eventReporter.reportEvent(Severity.WARNING, EVENT_CATEGORY, warning);
                                }
                            }
                        }

                        maxId = record.getEventId();
                        // Keep the record in the temporary ring buffer and count it
                        latestRecords.add(record);
                        records++;

                        // Remove this entry from the map
                        recordToReaderMap.remove(record);

                        // Get the next entry from this reader and add it to the map
                        StandardProvenanceEventRecord nextRecord = null;
                        try {
                            nextRecord = reader.nextRecord();
                        } catch (final EOFException eof) {
                            // record will be null and reader can no longer be used
                        } catch (final Exception e) {
                            logger.warn("Failed to generate Provenance Event Record from Journal due to " + e + "; it's possible that the record wasn't completely written to the file. " + "The remainder of this journal will be skipped.");
                            if (logger.isDebugEnabled()) {
                                logger.warn("", e);
                            }

                            if (eventReporter != null) {
                                eventReporter.reportEvent(Severity.WARNING, EVENT_CATEGORY, "Failed to read " + "Provenance Event Record from Journal due to " + e + "; it's possible " + "that the record wasn't completely written to the file. The remainder " + "of this journal will be skipped.");
                            }
                        }

                        if (nextRecord != null) {
                            recordToReaderMap.put(nextRecord, reader);
                        }
                    }
                } finally {
                    // Tell the indexing threads that no more events are coming, then stop the pool
                    finishedAdding.set(true);
                    exec.shutdown();
                }

                for (final Future<?> future : futures) {
                    try {
                        future.get();
                    } catch (final ExecutionException ee) {
                        final Throwable t = ee.getCause();
                        if (t instanceof RuntimeException) {
                            throw (RuntimeException) t;
                        }

                        throw new RuntimeException(t);
                    } catch (final InterruptedException e) {
                        Thread.currentThread().interrupt();
                        throw new RuntimeException("Thread interrupted");
                    }
                }
            } finally {
                getIndexManager().returnIndexWriter(indexWriter);
            }
        }

        // Records should now be available in the repository, so copy the values from latestRecords to the repository's ring buffer.
        final RingBuffer<ProvenanceEventRecord> latestRecordBuffer = this.latestRecords;
        latestRecords.forEach(new ForEachEvaluator<ProvenanceEventRecord>() {

            @Override
            public boolean evaluate(final ProvenanceEventRecord event) {
                latestRecordBuffer.add(event);
                return true;
            }
        });
    } finally {
        for (final RecordReader reader : readers) {
            try {
                reader.close();
            } catch (final IOException ioe) {
                // ignore failures to close a journal reader
            }
        }
    }
    // Success. Remove all of the journal files, as they're no longer needed, now that they've been merged.
    for (final File journalFile : availableFiles) {
        if (!journalFile.delete() && journalFile.exists()) {
            logger.warn("Failed to remove temporary journal file {}; this file should be cleaned up manually", journalFile.getAbsolutePath());
            if (eventReporter != null) {
                eventReporter.reportEvent(Severity.WARNING, EVENT_CATEGORY, "Failed to remove temporary journal file " + journalFile.getAbsolutePath() + "; this file should be cleaned up manually");
            }
        }

        final File tocFile = TocUtil.getTocFile(journalFile);
        if (!tocFile.delete() && tocFile.exists()) {
            logger.warn("Failed to remove temporary journal TOC file {}; this file should be cleaned up manually", tocFile.getAbsolutePath());
            if (eventReporter != null) {
                eventReporter.reportEvent(Severity.WARNING, EVENT_CATEGORY, "Failed to remove temporary journal TOC file " + tocFile.getAbsolutePath() + "; this file should be cleaned up manually");
            }
        }
    }

    if (records == 0) {
        logger.debug("Couldn't merge journals: No Records to merge");
        return null;
    } else {
        final long nanos = System.nanoTime() - startNanos;
        final long millis = TimeUnit.MILLISECONDS.convert(nanos, TimeUnit.NANOSECONDS);
        logger.info("Successfully merged {} journal files ({} records) into single Provenance Log File {} in {} milliseconds", numAvailableFiles, records, suggestedMergeFile, millis);
    }

    return writerFile;
}
Also used : NamedThreadFactory(org.apache.nifi.provenance.util.NamedThreadFactory) ThreadFactory(java.util.concurrent.ThreadFactory) RecordReader(org.apache.nifi.provenance.serialization.RecordReader) ArrayList(java.util.ArrayList) RingBuffer(org.apache.nifi.util.RingBuffer) LinkedBlockingQueue(java.util.concurrent.LinkedBlockingQueue) Callable(java.util.concurrent.Callable) RecordWriter(org.apache.nifi.provenance.serialization.RecordWriter) DeleteIndexAction(org.apache.nifi.provenance.lucene.DeleteIndexAction) EOFException ExecutionException(java.util.concurrent.ExecutionException) IndexingAction(org.apache.nifi.provenance.lucene.IndexingAction) IOException TreeMap(java.util.TreeMap) IndexNotFoundException(org.apache.lucene.index.IndexNotFoundException) ResourceNotFoundException(org.apache.nifi.web.ResourceNotFoundException) AccessDeniedException(org.apache.nifi.authorization.AccessDeniedException) FileNotFoundException AtomicInteger(java.util.concurrent.atomic.AtomicInteger) AtomicBoolean(java.util.concurrent.atomic.AtomicBoolean) ScheduledExecutorService(java.util.concurrent.ScheduledExecutorService) ExecutorService(java.util.concurrent.ExecutorService) Future(java.util.concurrent.Future) EventIndexWriter(org.apache.nifi.provenance.index.EventIndexWriter) File Tuple(org.apache.nifi.util.Tuple)
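The merge loop in Example 3 is a k-way merge: a map sorted by event ID holds at most one pending record per journal reader, the smallest entry is written out, and the reader it came from supplies its next record. The sketch below shows the same technique on in-memory data, assuming each journal is already sorted by ID and using Iterator<Long> in place of RecordReader. It uses a PriorityQueue instead of the TreeMap in the original; one difference is that a TreeMap whose comparator compares only event IDs treats two records with the same ID as the same key, whereas the queue keeps both.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

final class KWayMergeSketch {
    // One pending id plus the journal it came from (the record -> RecordReader entry in the original)
    private static final class Head {
        final long id;
        final Iterator<Long> source;

        Head(final long id, final Iterator<Long> source) {
            this.id = id;
            this.source = source;
        }
    }

    // Merges journals that are each sorted ascending into one ascending list.
    static List<Long> merge(final List<List<Long>> journals) {
        final PriorityQueue<Head> heads = new PriorityQueue<>((a, b) -> Long.compare(a.id, b.id));
        for (final List<Long> journal : journals) {
            final Iterator<Long> it = journal.iterator();
            if (it.hasNext()) { // an empty journal seeds nothing, like a reader whose first nextRecord() is null
                heads.add(new Head(it.next(), it));
            }
        }

        final List<Long> merged = new ArrayList<>();
        while (!heads.isEmpty()) {
            final Head smallest = heads.poll();   // lowest pending id across all journals
            merged.add(smallest.id);              // "write" it to the merged output
            if (smallest.source.hasNext()) {      // refill from the same journal
                heads.add(new Head(smallest.source.next(), smallest.source));
            }
        }
        return merged;
    }

    public static void main(final String[] args) {
        final List<List<Long>> journals = Arrays.asList(
            Arrays.asList(1L, 4L, 7L),
            Arrays.asList(2L, 5L),
            Arrays.asList(3L, 6L, 8L),
            new ArrayList<Long>());
        System.out.println(merge(journals));
    }
}
```

Each step costs O(log k) for k journals, and only k records are held in memory at once, which is why the original can merge large journal files without loading them fully.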

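Example 3 also hands each written record to the indexing threads through a bounded LinkedBlockingQueue (capacity 100): the merge thread offer()s with a 10 ms timeout, workers poll() with the same timeout until the queue is drained and finishedAdding is set, and a shared failure counter lets both sides stop once MAX_INDEXING_FAILURE_COUNT is reached. A reduced sketch of that handshake, assuming a hypothetical threshold of 3, a queue capacity of 4, and an "indexing" step that simply fails for negative values:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

final class BoundedIndexingSketch {
    // Hypothetical stand-in for MAX_INDEXING_FAILURE_COUNT
    static final int MAX_FAILURES = 3;

    // "Indexes" values on worker threads; negative values simulate indexing failures.
    // Returns the number of values indexed successfully.
    static int indexAll(final List<Integer> values, final int threads) {
        final BlockingQueue<Integer> queue = new LinkedBlockingQueue<>(4); // bounded, like the capacity-100 queue
        final AtomicBoolean finishedAdding = new AtomicBoolean(false);
        final AtomicInteger failures = new AtomicInteger();
        final AtomicInteger indexed = new AtomicInteger();
        final List<Future<?>> futures = new ArrayList<>();
        final ExecutorService exec = Executors.newFixedThreadPool(threads);
        try {
            for (int i = 0; i < threads; i++) {
                futures.add(exec.submit(() -> {
                    // Keep polling until the producer is finished AND the queue has been drained
                    while (!queue.isEmpty() || !finishedAdding.get()) {
                        final Integer value = queue.poll(10, TimeUnit.MILLISECONDS);
                        if (value == null) {
                            continue;
                        }
                        if (value >= 0) {
                            indexed.incrementAndGet();
                        } else if (failures.incrementAndGet() >= MAX_FAILURES) {
                            return null; // too many failures: this worker stops consuming
                        }
                    }
                    return null;
                }));
            }

            boolean indexEvents = true;
            for (final Integer value : values) {
                boolean accepted = false;
                while (!accepted && indexEvents) {
                    try {
                        accepted = queue.offer(value, 10, TimeUnit.MILLISECONDS);
                    } catch (final InterruptedException ie) {
                        Thread.currentThread().interrupt();
                        throw new RuntimeException("Thread interrupted", ie);
                    }
                    // A full queue plus too many failures means the workers may have stopped: stop offering
                    if (!accepted && failures.get() >= MAX_FAILURES) {
                        indexEvents = false;
                    }
                }
            }
        } finally {
            finishedAdding.set(true);
            exec.shutdown();
        }

        for (final Future<?> future : futures) {
            try {
                future.get(); // rethrows anything a worker threw
            } catch (final ExecutionException ee) {
                throw new RuntimeException(ee.getCause());
            } catch (final InterruptedException ie) {
                Thread.currentThread().interrupt();
                throw new RuntimeException("Thread interrupted", ie);
            }
        }
        return indexed.get();
    }

    public static void main(final String[] args) {
        final List<Integer> values = new ArrayList<>();
        for (int i = 1; i <= 20; i++) {
            values.add(i);
        }
        System.out.println(indexAll(values, 2) + " indexed");
    }
}
```

The timed offer() is what keeps the producer from blocking forever: without the timeout, a full queue with no live consumers would never let the failure check run.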
Example 4 with Tuple

Use of org.apache.nifi.util.Tuple in project nifi by apache.

From the class QueryTask, method readDocuments.

private Tuple<List<ProvenanceEventRecord>, Integer> readDocuments(final TopDocs topDocs, final IndexReader indexReader) {
    // If no topDocs is supplied, just provide a Tuple that has no records and a hit count of 0.
    if (topDocs == null || topDocs.totalHits == 0) {
        return new Tuple<>(Collections.<ProvenanceEventRecord>emptyList(), 0);
    }

    final long start = System.nanoTime();
    final List<Long> eventIds = Arrays.stream(topDocs.scoreDocs).mapToInt(scoreDoc -> scoreDoc.doc).mapToObj(docId -> {
        try {
            return indexReader.document(docId, LUCENE_FIELDS_TO_LOAD);
        } catch (final Exception e) {
            throw new SearchFailedException("Failed to read Provenance Events from Event File", e);
        }
    }).map(doc -> doc.getField(SearchableFields.Identifier.getSearchableFieldName()).numericValue().longValue()).collect(Collectors.toList());
    final long endConvert = System.nanoTime();
    final long ms = TimeUnit.NANOSECONDS.toMillis(endConvert - start);
    logger.debug("Converting documents took {} ms", ms);

    List<ProvenanceEventRecord> events;
    try {
        events = eventStore.getEvents(eventIds, authorizer, transformer);
    } catch (IOException e) {
        throw new SearchFailedException("Unable to retrieve events from the Provenance Store", e);
    }

    final long fetchEventMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - endConvert);
    logger.debug("Fetching {} events from Event Store took {} ms ({} events actually fetched)", eventIds.size(), fetchEventMillis, events.size());
    final int totalHits = topDocs.totalHits;
    return new Tuple<>(events, totalHits);
}
Also used : Query TopDocs EventIndexSearcher(org.apache.nifi.provenance.index.EventIndexSearcher) Arrays(java.util.Arrays) EventTransformer(org.apache.nifi.provenance.authorization.EventTransformer) Logger(org.slf4j.Logger) SearchFailedException(org.apache.nifi.provenance.index.SearchFailedException) LoggerFactory(org.slf4j.LoggerFactory) Set(java.util.Set) IOException SearchableFields(org.apache.nifi.provenance.SearchableFields) Collectors File FileNotFoundException TimeUnit(java.util.concurrent.TimeUnit) List(java.util.List) ProvenanceEventRecord(org.apache.nifi.provenance.ProvenanceEventRecord) Tuple(org.apache.nifi.util.Tuple) IndexManager(org.apache.nifi.provenance.lucene.IndexManager) EventStore ProgressiveResult(org.apache.nifi.provenance.ProgressiveResult) Collections(java.util.Collections) IndexReader(org.apache.lucene.index.IndexReader) EventAuthorizer(org.apache.nifi.provenance.authorization.EventAuthorizer)
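readDocuments in Example 4 turns Lucene hits into event IDs with a single stream: ScoreDoc[] to int doc IDs (mapToInt), doc IDs to stored documents (mapToObj), and each document's Identifier field to a Long, with System.nanoTime() timing the conversion. Lucene is left out so the sketch stands alone; ScoreDocLike and the storedIds array (doc ID to event ID) are hypothetical stand-ins for ScoreDoc and indexReader.document(...), and a Map.Entry takes the place of NiFi's Tuple.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

final class ReadDocumentsSketch {
    // Hypothetical stand-in for org.apache.lucene.search.ScoreDoc (only the doc id matters here)
    static final class ScoreDocLike {
        final int doc;

        ScoreDocLike(final int doc) {
            this.doc = doc;
        }
    }

    // Returns (eventIds, totalHits); storedIds[docId] plays the role of the stored Identifier field.
    static Map.Entry<List<Long>, Integer> readIds(final ScoreDocLike[] scoreDocs, final int totalHits, final long[] storedIds) {
        if (scoreDocs == null || totalHits == 0) {
            return new SimpleImmutableEntry<>(Collections.<Long>emptyList(), 0);
        }

        final long start = System.nanoTime();
        final List<Long> eventIds = Arrays.stream(scoreDocs)
            .mapToInt(scoreDoc -> scoreDoc.doc)      // hit -> Lucene doc id
            .mapToObj(docId -> storedIds[docId])     // doc id -> event id (a document lookup in the original)
            .collect(Collectors.toList());
        final long ms = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        System.out.println("Converting documents took " + ms + " ms");

        return new SimpleImmutableEntry<>(eventIds, totalHits);
    }

    public static void main(final String[] args) {
        final ScoreDocLike[] hits = { new ScoreDocLike(2), new ScoreDocLike(0) };
        System.out.println(readIds(hits, 2, new long[] { 100L, 101L, 102L }).getKey());
    }
}
```

As in the original, totalHits is passed separately from the hits because Lucene's total match count can exceed the number of ScoreDocs actually returned.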

Example 5 with Tuple

Use of org.apache.nifi.util.Tuple in project nifi by apache.

From the class LookupRecord, method getFlowFileContext.

protected Tuple<Map<String, RecordPath>, RecordPath> getFlowFileContext(final FlowFile flowFile, final ProcessContext context) {
    final Map<String, RecordPath> recordPaths = new HashMap<>();
    for (final PropertyDescriptor prop : context.getProperties().keySet()) {
        if (!prop.isDynamic()) {
            continue;
        }

        final String pathText = context.getProperty(prop).evaluateAttributeExpressions(flowFile).getValue();
        final RecordPath lookupRecordPath = recordPathCache.getCompiled(pathText);
        recordPaths.put(prop.getName(), lookupRecordPath);
    }

    final RecordPath resultRecordPath;
    if (context.getProperty(RESULT_RECORD_PATH).isSet()) {
        final String resultPathText = context.getProperty(RESULT_RECORD_PATH).evaluateAttributeExpressions(flowFile).getValue();
        resultRecordPath = recordPathCache.getCompiled(resultPathText);
    } else {
        resultRecordPath = null;
    }

    return new Tuple<>(recordPaths, resultRecordPath);
}
Also used : PropertyDescriptor(org.apache.nifi.components.PropertyDescriptor) HashMap(java.util.HashMap) RecordPath(org.apache.nifi.record.path.RecordPath) Tuple(org.apache.nifi.util.Tuple)
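getFlowFileContext in Example 5 keeps only the dynamic (user-added) properties, compiles each value through a cache, and pairs the resulting name-to-path map with an optional result path that is null when RESULT_RECORD_PATH is not set. A standalone sketch of that shape, assuming a hypothetical Property class in place of PropertyDescriptor, java.util.regex.Pattern in place of RecordPath, and a HashMap-based cache in place of recordPathCache:

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

final class LookupContextSketch {
    // Hypothetical stand-in for PropertyDescriptor: a name plus a dynamic flag
    static final class Property {
        final String name;
        final boolean dynamic;

        Property(final String name, final boolean dynamic) {
            this.name = name;
            this.dynamic = dynamic;
        }
    }

    // Hypothetical compile cache standing in for recordPathCache; Pattern stands in for RecordPath
    private final Map<String, Pattern> cache = new HashMap<>();

    Pattern getCompiled(final String text) {
        return cache.computeIfAbsent(text, Pattern::compile);
    }

    // Returns (dynamic property name -> compiled path, result path or null)
    Map.Entry<Map<String, Pattern>, Pattern> context(final Map<Property, String> properties, final String resultPathText) {
        final Map<String, Pattern> paths = new LinkedHashMap<>();
        for (final Map.Entry<Property, String> entry : properties.entrySet()) {
            if (!entry.getKey().dynamic) {
                continue; // only user-added properties describe lookup keys
            }
            paths.put(entry.getKey().name, getCompiled(entry.getValue()));
        }
        final Pattern resultPath = resultPathText == null ? null : getCompiled(resultPathText);
        return new SimpleImmutableEntry<>(paths, resultPath);
    }

    public static void main(final String[] args) {
        final Map<Property, String> props = new LinkedHashMap<>();
        props.put(new Property("Record Reader", false), "not a path");
        props.put(new Property("key", true), "a+b");
        final Map.Entry<Map<String, Pattern>, Pattern> ctx = new LookupContextSketch().context(props, null);
        System.out.println(ctx.getKey().keySet() + " result=" + ctx.getValue());
    }
}
```

Caching the compiled form matters here because the method runs per FlowFile, while the expression text usually repeats across FlowFiles.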
