Example 21 with SimpleFSDirectory

use of org.apache.lucene.store.SimpleFSDirectory in project crate by crate.

the class PersistedClusterStateServiceTests method testFailsIfGlobalMetadataIsMissing.

public void testFailsIfGlobalMetadataIsMissing() throws IOException {
    try (NodeEnvironment nodeEnvironment = newNodeEnvironment(createDataPaths())) {
        try (Writer writer = newPersistedClusterStateService(nodeEnvironment).createWriter()) {
            final ClusterState clusterState = loadPersistedClusterState(newPersistedClusterStateService(nodeEnvironment));
            writeState(writer, 0L, ClusterState.builder(clusterState).version(randomLongBetween(1L, Long.MAX_VALUE)).build(), clusterState);
        }
        final Path brokenPath = randomFrom(nodeEnvironment.nodeDataPaths());
        try (Directory directory = new SimpleFSDirectory(brokenPath.resolve(PersistedClusterStateService.METADATA_DIRECTORY_NAME))) {
            final IndexWriterConfig indexWriterConfig = new IndexWriterConfig();
            indexWriterConfig.setOpenMode(IndexWriterConfig.OpenMode.CREATE);
            try (IndexWriter indexWriter = new IndexWriter(directory, indexWriterConfig)) {
                indexWriter.commit();
            }
        }
        final String message = expectThrows(IllegalStateException.class, () -> newPersistedClusterStateService(nodeEnvironment).loadBestOnDiskState()).getMessage();
        assertThat(message, allOf(containsString("no global metadata found"), containsString(brokenPath.toString())));
    }
}
Also used : Path(java.nio.file.Path) ClusterState(org.elasticsearch.cluster.ClusterState) NodeEnvironment(org.elasticsearch.env.NodeEnvironment) IndexWriter(org.apache.lucene.index.IndexWriter) Matchers.containsString(org.hamcrest.Matchers.containsString) SimpleFSDirectory(org.apache.lucene.store.SimpleFSDirectory) Writer(org.elasticsearch.gateway.PersistedClusterStateService.Writer) Directory(org.apache.lucene.store.Directory) FilterDirectory(org.apache.lucene.store.FilterDirectory) IndexWriterConfig(org.apache.lucene.index.IndexWriterConfig)

Example 22 with SimpleFSDirectory

use of org.apache.lucene.store.SimpleFSDirectory in project zm-mailbox by Zimbra.

the class LuceneDirectory method open.

/**
 * Creates a new {@link LuceneDirectory} with {@code SingleInstanceLockFactory}.
 * <p>
 * You can switch Lucene's {@link FSDirectory} implementation by {@link LC#zimbra_index_lucene_io_impl}.
 * <ul>
 *  <li>{@code null} - Lucene will try to pick the best {@link FSDirectory} implementation given the current
 *      environment. Currently this returns {@link MMapDirectory} for most Solaris and Windows 64-bit JREs,
 *      {@link NIOFSDirectory} for other non-Windows JREs, and {@link SimpleFSDirectory} for other JREs on Windows.
 *  <li>{@code simple} - straightforward implementation using java.io.RandomAccessFile. However, it has poor
 *      concurrent performance (multiple threads will bottleneck) as it synchronizes when multiple threads read from
 *      the same file.
 *  <li>{@code nio} - uses java.nio's FileChannel's positional io when reading to avoid synchronization when reading
 *      from the same file. Unfortunately, due to a Windows-only Sun JRE bug this is a poor choice for Windows, but
 *      on all other platforms this is the preferred choice.
 *  <li>{@code mmap} - uses memory-mapped IO when reading. This is a good choice if you have plenty of virtual
 *      memory relative to your index size, e.g. if you are running on a 64 bit JRE, or you are running on a 32 bit
 *      JRE but your index sizes are small enough to fit into the virtual memory space. Java currently has the
 *      limitation of not being able to unmap files from user code. The files are unmapped when GC releases the
 *      byte buffers. Due to this bug in Sun's JRE, MMapDirectory's IndexInput.close() is unable to close the
 *      underlying OS file handle. Only when GC finally collects the underlying objects, which could be quite some
 *      time later, will the file handle be closed. This will consume additional transient disk usage: on Windows,
 *      attempts to delete or overwrite the files will result in an exception; on other platforms, which typically
 *      have a "delete on last close" semantics, while such operations will succeed, the bytes are still consuming
 *      space on disk. For many applications this limitation is not a problem (e.g. if you have plenty of disk
 *      space, and you don't rely on overwriting files on Windows) but it's still an important limitation to be
 *      aware of. This class supplies a (possibly dangerous) workaround mentioned in the bug report, which may fail
 *      on non-Sun JVMs.
 * </ul>
 *
 * @param path directory path
 */
public static LuceneDirectory open(File path) throws IOException {
    String impl = LC.zimbra_index_lucene_io_impl.value();
    FSDirectory dir;
    if ("nio".equals(impl)) {
        dir = new NIOFSDirectory(path, new SingleInstanceLockFactory());
    } else if ("mmap".equals(impl)) {
        dir = new MMapDirectory(path, new SingleInstanceLockFactory());
    } else if ("simple".equals(impl)) {
        dir = new SimpleFSDirectory(path, new SingleInstanceLockFactory());
    } else {
        dir = FSDirectory.open(path, new SingleInstanceLockFactory());
    }
    ZimbraLog.index.info("OpenLuceneIndex impl=%s,dir=%s", dir.getClass().getSimpleName(), path);
    return new LuceneDirectory(dir);
}
Also used : NIOFSDirectory(org.apache.lucene.store.NIOFSDirectory) SimpleFSDirectory(org.apache.lucene.store.SimpleFSDirectory) FSDirectory(org.apache.lucene.store.FSDirectory) SingleInstanceLockFactory(org.apache.lucene.store.SingleInstanceLockFactory) MMapDirectory(org.apache.lucene.store.MMapDirectory)
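
The `simple` and `nio` options described in the Javadoc above differ mainly in how they read from a file that several threads share. The following is a minimal sketch of that difference using only the JDK; it is not the Lucene or Zimbra implementation, and the class `ReadStrategySketch` and its method names are hypothetical, chosen only for illustration.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration class; not part of Lucene or zm-mailbox.
final class ReadStrategySketch {

    // "simple" style: RandomAccessFile has one shared file pointer,
    // so seek + read must happen under a lock when threads share the file.
    static byte[] readWithSharedPointer(RandomAccessFile file, long offset, int length) throws IOException {
        byte[] out = new byte[length];
        synchronized (file) {
            file.seek(offset);
            file.readFully(out);
        }
        return out;
    }

    // "nio" style: a positional read passes the offset explicitly and
    // never touches the channel's own position, so no lock is needed.
    static byte[] readPositional(FileChannel channel, long offset, int length) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate(length);
        long position = offset;
        while (buffer.hasRemaining()) {
            int read = channel.read(buffer, position);
            if (read < 0) {
                throw new EOFException("read past end of file");
            }
            position += read;
        }
        return buffer.array();
    }

    // Reads the same 4-byte slice both ways from a temporary file
    // and returns the two results joined by '|'.
    static String demo() {
        try {
            Path tmp = Files.createTempFile("read-sketch", ".bin");
            try {
                Files.write(tmp, "0123456789".getBytes(StandardCharsets.US_ASCII));
                try (RandomAccessFile raf = new RandomAccessFile(tmp.toFile(), "r");
                     FileChannel channel = FileChannel.open(tmp, StandardOpenOption.READ)) {
                    String a = new String(readWithSharedPointer(raf, 3, 4), StandardCharsets.US_ASCII);
                    String b = new String(readPositional(channel, 3, 4), StandardCharsets.US_ASCII);
                    return a + "|" + b;
                }
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling `ReadStrategySketch.demo()` returns `3456|3456`: both strategies read the same bytes, but only the positional variant can be called concurrently without a lock.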

Example 23 with SimpleFSDirectory

use of org.apache.lucene.store.SimpleFSDirectory in project languagetool by languagetool-org.

the class Searcher method main.

public static void main(String[] args) throws Exception {
    ensureCorrectUsageOrExit(args);
    long startTime = System.currentTimeMillis();
    String[] ruleIds = args[0].split(",");
    String languageCode = args[1];
    Language language = Languages.getLanguageForShortCode(languageCode);
    File indexDir = new File(args[2]);
    boolean limitSearch = !(args.length > 3 && "--no_limit".equals(args[3]));
    Searcher searcher = new Searcher(new SimpleFSDirectory(indexDir.toPath()));
    if (!limitSearch) {
        searcher.setMaxHits(100_000);
    }
    searcher.limitSearch = limitSearch;
    ContextTools contextTools = getContextTools(140);
    int totalMatches = 0;
    for (String ruleId : ruleIds) {
        long ruleStartTime = System.currentTimeMillis();
        for (PatternRule rule : searcher.getRuleById(ruleId, language)) {
            System.out.println("===== " + rule.getFullId() + " =========================================================");
            SearcherResult searcherResult = searcher.findRuleMatchesOnIndex(rule, language);
            int i = 1;
            if (searcherResult.getMatchingSentences().isEmpty()) {
                System.out.println("[no matches]");
            }
            for (MatchingSentence ruleMatch : searcherResult.getMatchingSentences()) {
                for (RuleMatch match : ruleMatch.getRuleMatches()) {
                    String context = contextTools.getContext(match.getFromPos(), match.getToPos(), ruleMatch.getSentence());
                    if (WIKITEXT_OUTPUT) {
                        ContextTools contextTools2 = getContextTools(0);
                        String coveredText = contextTools2.getContext(match.getFromPos(), match.getToPos(), ruleMatch.getSentence());
                        coveredText = coveredText.replaceFirst("^\\.\\.\\.", "").replaceFirst("\\.\\.\\.$", "");
                        coveredText = coveredText.replaceFirst("^\\*\\*", "").replaceFirst("\\*\\*$", "");
                        String encodedTextWithQuotes = URLEncoder.encode("\"" + coveredText + "\"", "UTF-8");
                        String searchLink = "https://de.wikipedia.org/w/index.php?search=" + encodedTextWithQuotes + "&title=Spezial%3ASuche&go=Artikel";
                        context = context.replaceAll("\\*\\*.*?\\*\\*", "[" + searchLink + " " + coveredText + "]");
                        String encTitle = URLEncoder.encode(ruleMatch.getTitle(), "UTF-8");
                        String encodedText = URLEncoder.encode(coveredText, "UTF-8");
                        System.out.println("# [[" + ruleMatch.getTitle() + "]]: " + context + " ([http://wikipedia.ramselehof.de/wikiblame.php?user_lang=de&lang=de&project=wikipedia&article=" + encTitle + "&needle=" + encodedText + "&skipversions=0&ignorefirst=0&limit=500&searchmethod=int&order=desc&start=Start WikiBlame])");
                    } else {
                        System.out.println(i + ": " + context + " [" + ruleMatch.getSource() + "]");
                    }
                }
                totalMatches += ruleMatch.getRuleMatches().size();
                i++;
            }
            System.out.println("Time: " + (System.currentTimeMillis() - ruleStartTime) + "ms");
        }
    }
    System.out.println("Total time: " + (System.currentTimeMillis() - startTime) + "ms, " + totalMatches + " matches");
}
Also used : PatternRule(org.languagetool.rules.patterns.PatternRule) SimpleFSDirectory(org.apache.lucene.store.SimpleFSDirectory) ContextTools(org.languagetool.tools.ContextTools) RuleMatch(org.languagetool.rules.RuleMatch) Language(org.languagetool.Language) File(java.io.File)

Example 24 with SimpleFSDirectory

use of org.apache.lucene.store.SimpleFSDirectory in project elasticsearch by elastic.

the class HunspellService method loadDictionary.

/**
     * Loads the hunspell dictionary for the given locale.
     *
     * @param locale       The locale of the hunspell dictionary to be loaded.
     * @param nodeSettings The node level settings
     * @param env          The node environment (from which the conf path will be resolved)
     * @return The loaded Hunspell dictionary
     * @throws Exception when loading fails (due to IO errors or malformed dictionary files)
     */
private Dictionary loadDictionary(String locale, Settings nodeSettings, Environment env) throws Exception {
    if (logger.isDebugEnabled()) {
        logger.debug("Loading hunspell dictionary [{}]...", locale);
    }
    Path dicDir = hunspellDir.resolve(locale);
    if (FileSystemUtils.isAccessibleDirectory(dicDir, logger) == false) {
        throw new ElasticsearchException(String.format(Locale.ROOT, "Could not find hunspell dictionary [%s]", locale));
    }
    // merging node settings with hunspell dictionary specific settings
    Settings dictSettings = HUNSPELL_DICTIONARY_OPTIONS.get(nodeSettings);
    nodeSettings = loadDictionarySettings(dicDir, dictSettings.getByPrefix(locale + "."));
    boolean ignoreCase = nodeSettings.getAsBoolean("ignore_case", defaultIgnoreCase);
    Path[] affixFiles = FileSystemUtils.files(dicDir, "*.aff");
    if (affixFiles.length == 0) {
        throw new ElasticsearchException(String.format(Locale.ROOT, "Missing affix file for hunspell dictionary [%s]", locale));
    }
    if (affixFiles.length != 1) {
        throw new ElasticsearchException(String.format(Locale.ROOT, "Too many affix files exist for hunspell dictionary [%s]", locale));
    }
    InputStream affixStream = null;
    Path[] dicFiles = FileSystemUtils.files(dicDir, "*.dic");
    List<InputStream> dicStreams = new ArrayList<>(dicFiles.length);
    try {
        for (int i = 0; i < dicFiles.length; i++) {
            dicStreams.add(Files.newInputStream(dicFiles[i]));
        }
        affixStream = Files.newInputStream(affixFiles[0]);
        try (Directory tmp = new SimpleFSDirectory(env.tmpFile())) {
            return new Dictionary(tmp, "hunspell", affixStream, dicStreams, ignoreCase);
        }
    } catch (Exception e) {
        logger.error((Supplier<?>) () -> new ParameterizedMessage("Could not load hunspell dictionary [{}]", locale), e);
        throw e;
    } finally {
        IOUtils.close(affixStream);
        IOUtils.close(dicStreams);
    }
}
Also used : Path(java.nio.file.Path) Dictionary(org.apache.lucene.analysis.hunspell.Dictionary) InputStream(java.io.InputStream) ArrayList(java.util.ArrayList) ElasticsearchException(org.elasticsearch.ElasticsearchException) SimpleFSDirectory(org.apache.lucene.store.SimpleFSDirectory) IOException(java.io.IOException) Supplier(org.apache.logging.log4j.util.Supplier) ParameterizedMessage(org.apache.logging.log4j.message.ParameterizedMessage) Settings(org.elasticsearch.common.settings.Settings) Directory(org.apache.lucene.store.Directory)

Example 25 with SimpleFSDirectory

use of org.apache.lucene.store.SimpleFSDirectory in project elasticsearch by elastic.

the class Checkpoint method read.

public static Checkpoint read(Path path) throws IOException {
    try (Directory dir = new SimpleFSDirectory(path.getParent())) {
        try (IndexInput indexInput = dir.openInput(path.getFileName().toString(), IOContext.DEFAULT)) {
            // We checksum the entire file before we even go and parse it. If it's corrupted we barf right here.
            CodecUtil.checksumEntireFile(indexInput);
            final int fileVersion = CodecUtil.checkHeader(indexInput, CHECKPOINT_CODEC, INITIAL_VERSION, CURRENT_VERSION);
            if (fileVersion == INITIAL_VERSION) {
                assert indexInput.length() == V1_FILE_SIZE : indexInput.length();
                return Checkpoint.readCheckpointV5_0_0(indexInput);
            } else {
                assert fileVersion == CURRENT_VERSION : fileVersion;
                assert indexInput.length() == FILE_SIZE : indexInput.length();
                return Checkpoint.readCheckpointV6_0_0(indexInput);
            }
        }
    }
}
Also used : IndexInput(org.apache.lucene.store.IndexInput) SimpleFSDirectory(org.apache.lucene.store.SimpleFSDirectory) Directory(org.apache.lucene.store.Directory)
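
The comment in `Checkpoint.read` describes verifying a checksum over the whole file before parsing any field. A minimal sketch of that ordering, using only `java.util.zip.CRC32`, might look like the following. The 16-byte layout (an 8-byte value followed by an 8-byte CRC) and the class `ChecksumFirstSketch` are assumptions made for illustration; this is not the Lucene `CodecUtil` header/footer format.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Hypothetical illustration class; the file layout here is invented for the sketch.
final class ChecksumFirstSketch {

    private static final int FILE_SIZE = 2 * Long.BYTES;

    // Writes the value, then appends a CRC32 computed over the value bytes.
    static byte[] encode(long value) {
        ByteBuffer out = ByteBuffer.allocate(FILE_SIZE);
        out.putLong(value);
        out.putLong(crcOf(out.array(), Long.BYTES));
        return out.array();
    }

    // Verifies length and checksum first; only then parses the payload.
    static long decode(byte[] file) {
        if (file.length != FILE_SIZE) {
            throw new IllegalStateException("unexpected file length: " + file.length);
        }
        ByteBuffer in = ByteBuffer.wrap(file);
        long stored = in.getLong(Long.BYTES);   // absolute read of the trailing CRC
        long actual = crcOf(file, Long.BYTES);
        if (stored != actual) {
            throw new IllegalStateException("checksum mismatch, file is corrupted");
        }
        return in.getLong(0);                   // safe to parse now
    }

    private static long crcOf(byte[] bytes, int length) {
        CRC32 crc = new CRC32();
        crc.update(bytes, 0, length);
        return crc.getValue();
    }
}
```

With this layout, `decode(encode(42L))` returns `42`, while flipping any single bit of the payload makes `decode` throw before the value is interpreted.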

Aggregations

SimpleFSDirectory (org.apache.lucene.store.SimpleFSDirectory) 37
Directory (org.apache.lucene.store.Directory) 23
Path (java.nio.file.Path) 15
IOException (java.io.IOException) 13
File (java.io.File) 9
IndexWriter (org.apache.lucene.index.IndexWriter) 9
FSDirectory (org.apache.lucene.store.FSDirectory) 7
Settings (org.elasticsearch.common.settings.Settings) 7
LockObtainFailedException (org.apache.lucene.store.LockObtainFailedException) 6
CorruptIndexException (org.apache.lucene.index.CorruptIndexException) 5
IndexSearcher (org.apache.lucene.search.IndexSearcher) 5
FilterDirectory (org.apache.lucene.store.FilterDirectory) 5
IndexInput (org.apache.lucene.store.IndexInput) 5
InputStream (java.io.InputStream) 4
ParameterizedMessage (org.apache.logging.log4j.message.ParameterizedMessage) 4
Dictionary (org.apache.lucene.analysis.hunspell.Dictionary) 4
IndexReader (org.apache.lucene.index.IndexReader) 4
IndexWriterConfig (org.apache.lucene.index.IndexWriterConfig) 4
MMapDirectory (org.apache.lucene.store.MMapDirectory) 4
NIOFSDirectory (org.apache.lucene.store.NIOFSDirectory) 4