Example 6 with FSDirectory

use of org.apache.lucene.store.FSDirectory in project languagetool by languagetool-org.

the class GermanUppercasePhraseFinder method main.

public static void main(String[] args) throws IOException {
    if (args.length != 1) {
        System.out.println("Usage: " + GermanUppercasePhraseFinder.class.getSimpleName() + " <ngramIndexDir>");
        System.exit(1);
    }
    JLanguageTool lt = new JLanguageTool(Languages.getLanguageForShortCode("de"));
    FSDirectory fsDir = FSDirectory.open(new File(args[0]).toPath());
    IndexReader reader = DirectoryReader.open(fsDir);
    IndexSearcher searcher = new IndexSearcher(reader);
    Fields fields = MultiFields.getFields(reader);
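    Terms terms = fields.terms("ngram");
    if (terms == null) {
        // Fields.terms returns null when the index has no such field
        throw new IOException("No 'ngram' field found in index " + args[0]);
    }
    TermsEnum termsEnum = terms.iterator();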
    int count = 0;
    BytesRef next;
    while ((next = termsEnum.next()) != null) {
        String term = next.utf8ToString();
        count++;
        //term = "persischer Golf";  // for testing
        String[] parts = term.split(" ");
        boolean useful = true;
        int lcCount = 0;
        List<String> ucParts = new ArrayList<>();
        for (String part : parts) {
            if (part.length() < MIN_TERM_LEN) {
                useful = false;
                break;
            }
            String uc = StringTools.uppercaseFirstChar(part);
            if (!part.equals(uc)) {
                lcCount++;
            }
            ucParts.add(uc);
        }
        if (!useful || lcCount == 0 || lcCount == 2) {
            continue;
        }
        String uppercase = Strings.join(ucParts, " ");
        if (term.equals(uppercase)) {
            continue;
        }
        long thisCount = getOccurrenceCount(reader, searcher, term);
        long thisUpperCount = getOccurrenceCount(reader, searcher, uppercase);
        if (count % 10_000 == 0) {
            System.err.println(count + " @ " + term);
        }
        if (thisCount > LIMIT || thisUpperCount > LIMIT) {
            if (thisUpperCount > thisCount) {
                if (isRelevant(lt, term)) {
                    float factor = (float) thisUpperCount / thisCount;
                    // Pass the values as arguments so that a '%' inside a term is not parsed as a format specifier
                    System.out.printf("%.2f %d %s %d %s\n", factor, thisUpperCount, uppercase, thisCount, term);
                }
            }
        }
    }
}
Also used : IndexSearcher(org.apache.lucene.search.IndexSearcher) JLanguageTool(org.languagetool.JLanguageTool) ArrayList(java.util.ArrayList) FSDirectory(org.apache.lucene.store.FSDirectory) File(java.io.File) BytesRef(org.apache.lucene.util.BytesRef)

Example 7 with FSDirectory

use of org.apache.lucene.store.FSDirectory in project CoreNLP by stanfordnlp.

the class PatternsForEachTokenLucene method setIndexReaderSearcher.

static synchronized void setIndexReaderSearcher() {
    try {
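        if (reader == null) {
            // Open the directory only for the first reader; the refresh path below reuses the reader's directory.
            // NIOFSDirectory.open is the inherited static FSDirectory.open, so Lucene still picks the implementation.
            FSDirectory index = NIOFSDirectory.open(indexDir);
            reader = DirectoryReader.open(index);
            searcher = new IndexSearcher(reader);
        } else {
            // openIfChanged returns null when the index has not changed since the reader was opened
            DirectoryReader newreader = DirectoryReader.openIfChanged(reader);
            if (newreader != null) {
                reader.close();
                reader = newreader;
                searcher = new IndexSearcher(reader);
            }
        }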
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}
Also used : NIOFSDirectory(org.apache.lucene.store.NIOFSDirectory) FSDirectory(org.apache.lucene.store.FSDirectory)

Example 8 with FSDirectory

use of org.apache.lucene.store.FSDirectory in project CoreNLP by stanfordnlp.

the class LuceneSentenceIndex method setIndexReaderSearcher.

void setIndexReaderSearcher() throws IOException {
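    if (reader == null) {
        // Open the directory only for the first reader; the refresh path below reuses the reader's directory.
        FSDirectory index = FSDirectory.open(indexDir);
        reader = DirectoryReader.open(index);
        searcher = new IndexSearcher(reader);
    } else {
        // openIfChanged returns null when the index has not changed since the reader was opened
        DirectoryReader newreader = DirectoryReader.openIfChanged(reader);
        if (newreader != null) {
            reader.close();
            reader = newreader;
            searcher = new IndexSearcher(reader);
        }
    }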
}
Also used : FSDirectory(org.apache.lucene.store.FSDirectory)

Example 9 with FSDirectory

use of org.apache.lucene.store.FSDirectory in project zm-mailbox by Zimbra.

the class LuceneDirectory method open.

/**
 * Creates a new {@link LuceneDirectory} with {@code SingleInstanceLockFactory}.
 * <p>
 * You can switch Lucene's {@link FSDirectory} implementation with {@link LC#zimbra_index_lucene_io_impl}.
 * <ul>
 *  <li>{@code null} - Lucene will try to pick the best {@link FSDirectory} implementation for the current
 *      environment. Currently this returns {@link MMapDirectory} for most Solaris and Windows 64-bit JREs,
 *      {@link NIOFSDirectory} for other non-Windows JREs, and {@link SimpleFSDirectory} for other JREs on Windows.
 *  <li>{@code simple} - straightforward implementation using java.io.RandomAccessFile. However, it has poor
 *      concurrent performance (multiple threads will bottleneck) because it synchronizes when multiple threads
 *      read from the same file.
 *  <li>{@code nio} - uses java.nio FileChannel positional I/O when reading, which avoids synchronization when
 *      reading from the same file. Unfortunately, due to a Windows-only Sun JRE bug this is a poor choice for
 *      Windows, but on all other platforms it is the preferred choice.
 *  <li>{@code mmap} - uses memory-mapped I/O when reading. This is a good choice if you have plenty of virtual
 *      memory relative to your index size, e.g. if you are running on a 64-bit JRE, or you are running on a 32-bit
 *      JRE but your index sizes are small enough to fit into the virtual memory space. Java currently has the
 *      limitation of not being able to unmap files from user code. The files are unmapped when GC releases the
 *      byte buffers. Due to this bug in Sun's JRE, MMapDirectory's IndexInput.close() is unable to close the
 *      underlying OS file handle. Only when GC finally collects the underlying objects, which could be quite some
 *      time later, will the file handle be closed. This consumes additional transient disk space: on Windows,
 *      attempts to delete or overwrite the files will result in an exception; on other platforms, which typically
 *      have "delete on last close" semantics, such operations will succeed but the bytes still consume
 *      space on disk. For many applications this limitation is not a problem (e.g. if you have plenty of disk
 *      space and you don't rely on overwriting files on Windows), but it is still an important limitation to be
 *      aware of. This class supplies a (possibly dangerous) workaround mentioned in the bug report, which may fail
 *      on non-Sun JVMs.
 * </ul>
 *
 * @param path directory path
 */
public static LuceneDirectory open(File path) throws IOException {
    String impl = LC.zimbra_index_lucene_io_impl.value();
    FSDirectory dir;
    if ("nio".equals(impl)) {
        dir = new NIOFSDirectory(path, new SingleInstanceLockFactory());
    } else if ("mmap".equals(impl)) {
        dir = new MMapDirectory(path, new SingleInstanceLockFactory());
    } else if ("simple".equals(impl)) {
        dir = new SimpleFSDirectory(path, new SingleInstanceLockFactory());
    } else {
        dir = FSDirectory.open(path, new SingleInstanceLockFactory());
    }
    ZimbraLog.index.info("OpenLuceneIndex impl=%s,dir=%s", dir.getClass().getSimpleName(), path);
    return new LuceneDirectory(dir);
}
Also used : NIOFSDirectory(org.apache.lucene.store.NIOFSDirectory) SimpleFSDirectory(org.apache.lucene.store.SimpleFSDirectory) FSDirectory(org.apache.lucene.store.FSDirectory) SingleInstanceLockFactory(org.apache.lucene.store.SingleInstanceLockFactory) MMapDirectory(org.apache.lucene.store.MMapDirectory)
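
The Javadoc in Example 9 distinguishes the simple, nio and mmap implementations by the JDK read primitive each one relies on. The following is a minimal sketch of those three primitives using only the JDK, with no Lucene dependency. The class name ReadStrategies, its method names, and the fallback to positional reads for unrecognized values are assumptions made for illustration; they are not part of Lucene or zm-mailbox, and the real directory classes add buffering, locking and lifecycle handling that is not shown here.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical illustration class; not part of Lucene or zm-mailbox. */
public class ReadStrategies {

    /** "simple": seek + read on a RandomAccessFile. The file pointer is shared, mutable state. */
    public static byte[] readSimple(Path file, long pos, int len) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            byte[] out = new byte[len];
            raf.seek(pos);
            raf.readFully(out);
            return out;
        }
    }

    /** "nio": positional FileChannel read; the channel's own position is never moved. */
    public static byte[] readNio(Path file, long pos, int len) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(len);
            long offset = pos;
            while (buf.hasRemaining()) {
                int n = ch.read(buf, offset);
                if (n < 0) {
                    throw new EOFException("read past end of " + file);
                }
                offset += n;
            }
            return buf.array();
        }
    }

    /** "mmap": map the region read-only and copy the bytes out of the mapping. */
    public static byte[] readMmap(Path file, long pos, int len) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, pos, len);
            byte[] out = new byte[len];
            map.get(out);
            return out;
        }
    }

    /** Dispatch on the implementation name; unknown values fall back to "nio" (an assumption). */
    public static byte[] read(String impl, Path file, long pos, int len) throws IOException {
        if ("simple".equals(impl)) {
            return readSimple(file, pos, len);
        } else if ("mmap".equals(impl)) {
            return readMmap(file, pos, len);
        }
        return readNio(file, pos, len);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("read-strategies", ".bin");
        file.toFile().deleteOnExit();
        Files.write(file, "hello lucene".getBytes(StandardCharsets.UTF_8));
        for (String impl : new String[] {"simple", "nio", "mmap"}) {
            byte[] bytes = read(impl, file, 6, 6);
            System.out.println(impl + ": " + new String(bytes, StandardCharsets.UTF_8));
        }
    }
}
```

All three strategies return the same bytes; they differ in how they behave under concurrency and in when the operating system resources are released, which is the trade-off the Javadoc describes.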

Example 10 with FSDirectory

use of org.apache.lucene.store.FSDirectory in project lucene-solr by apache.

the class TestBackwardsCompatibility method testCommandLineArgs.

public void testCommandLineArgs() throws Exception {
    PrintStream savedSystemOut = System.out;
    System.setOut(new PrintStream(new ByteArrayOutputStream(), false, "UTF-8"));
    try {
        for (Map.Entry<String, Directory> entry : oldIndexDirs.entrySet()) {
            String name = entry.getKey();
            int indexCreatedVersion = SegmentInfos.readLatestCommit(entry.getValue()).getIndexCreatedVersionMajor();
            Path dir = createTempDir(name);
            TestUtil.unzip(getDataInputStream("index." + name + ".zip"), dir);
            String path = dir.toAbsolutePath().toString();
            List<String> args = new ArrayList<>();
            if (random().nextBoolean()) {
                args.add("-verbose");
            }
            if (random().nextBoolean()) {
                args.add("-delete-prior-commits");
            }
            if (random().nextBoolean()) {
                // TODO: need to better randomize this, but ...
                //  - LuceneTestCase.FS_DIRECTORIES is private
                //  - newFSDirectory returns BaseDirectoryWrapper
                //  - BaseDirectoryWrapper doesn't expose delegate
                Class<? extends FSDirectory> dirImpl = random().nextBoolean() ? SimpleFSDirectory.class : NIOFSDirectory.class;
                args.add("-dir-impl");
                args.add(dirImpl.getName());
            }
            args.add(path);
            IndexUpgrader upgrader = null;
            try {
                upgrader = IndexUpgrader.parseArgs(args.toArray(new String[0]));
            } catch (Exception e) {
                throw new AssertionError("unable to parse args: " + args, e);
            }
            upgrader.upgrade();
            Directory upgradedDir = newFSDirectory(dir);
            try {
                checkAllSegmentsUpgraded(upgradedDir, indexCreatedVersion);
            } finally {
                upgradedDir.close();
            }
        }
    } finally {
        System.setOut(savedSystemOut);
    }
}
Also used : Path(java.nio.file.Path) PrintStream(java.io.PrintStream) ArrayList(java.util.ArrayList) ByteArrayOutputStream(java.io.ByteArrayOutputStream) BinaryPoint(org.apache.lucene.document.BinaryPoint) DoublePoint(org.apache.lucene.document.DoublePoint) LongPoint(org.apache.lucene.document.LongPoint) IntPoint(org.apache.lucene.document.IntPoint) FloatPoint(org.apache.lucene.document.FloatPoint) IOException(java.io.IOException) Map(java.util.Map) HashMap(java.util.HashMap) Directory(org.apache.lucene.store.Directory) RAMDirectory(org.apache.lucene.store.RAMDirectory) FSDirectory(org.apache.lucene.store.FSDirectory) SimpleFSDirectory(org.apache.lucene.store.SimpleFSDirectory) NIOFSDirectory(org.apache.lucene.store.NIOFSDirectory)

Aggregations

FSDirectory (org.apache.lucene.store.FSDirectory): 43
File (java.io.File): 18
Directory (org.apache.lucene.store.Directory): 12
IOException (java.io.IOException): 10
Path (java.nio.file.Path): 10
IndexSearcher (org.apache.lucene.search.IndexSearcher): 9
FileNotFoundException (java.io.FileNotFoundException): 5
FileSystem (java.nio.file.FileSystem): 5
Document (org.apache.lucene.document.Document): 5
IndexReader (org.apache.lucene.index.IndexReader): 5
MMapDirectory (org.apache.lucene.store.MMapDirectory): 5
NIOFSDirectory (org.apache.lucene.store.NIOFSDirectory): 5
FilterDirectory (org.apache.lucene.store.FilterDirectory): 4
SimpleFSDirectory (org.apache.lucene.store.SimpleFSDirectory): 4
PrintStream (java.io.PrintStream): 3
ArrayList (java.util.ArrayList): 3
DirectoryReader (org.apache.lucene.index.DirectoryReader): 3
Term (org.apache.lucene.index.Term): 3
WindowsFS (org.apache.lucene.mockfile.WindowsFS): 3
TermQuery (org.apache.lucene.search.TermQuery): 3