Example 6 with FSDirectory

use of org.apache.lucene.store.FSDirectory in project languagetool by languagetool-org.

the class GermanUppercasePhraseFinder method main.

public static void main(String[] args) throws IOException {
    if (args.length != 1) {
        System.out.println("Usage: " + GermanUppercasePhraseFinder.class.getSimpleName() + " <ngramIndexDir>");
        System.exit(1);
    }
    JLanguageTool lt = new JLanguageTool(Languages.getLanguageForShortCode("de"));
    // Open the on-disk n-gram index and walk every term of the "ngram" field.
    FSDirectory fsDir = FSDirectory.open(new File(args[0]).toPath());
    IndexReader reader = DirectoryReader.open(fsDir);
    IndexSearcher searcher = new IndexSearcher(reader);
    Fields fields = MultiFields.getFields(reader);
    Terms terms = fields.terms("ngram");
    TermsEnum termsEnum = terms.iterator();
    int count = 0;
    BytesRef next;
    while ((next = termsEnum.next()) != null) {
        String term = next.utf8ToString();
        count++;
        //term = "persischer Golf";  // for testing
        String[] parts = term.split(" ");
        boolean useful = true;
        int lcCount = 0;
        List<String> ucParts = new ArrayList<>();
        for (String part : parts) {
            if (part.length() < MIN_TERM_LEN) {
                useful = false;
                break;
            }
            String uc = StringTools.uppercaseFirstChar(part);
            if (!part.equals(uc)) {
                lcCount++;
            }
            ucParts.add(uc);
        }
        // Skip terms that are too short, already fully capitalized, or have two lowercase parts.
        if (!useful || lcCount == 0 || lcCount == 2) {
            continue;
        }
        String uppercase = Strings.join(ucParts, " ");
        if (term.equals(uppercase)) {
            continue;
        }
        long thisCount = getOccurrenceCount(reader, searcher, term);
        long thisUpperCount = getOccurrenceCount(reader, searcher, uppercase);
        if (count % 10_000 == 0) {
            System.err.println(count + " @ " + term);
        }
        // Report terms whose capitalized variant is more frequent than the term itself.
        if (thisCount > LIMIT || thisUpperCount > LIMIT) {
            if (thisUpperCount > thisCount) {
                if (isRelevant(lt, term)) {
                    float factor = (float) thisUpperCount / thisCount;
                    System.out.printf("%.2f " + thisUpperCount + " " + uppercase + " " + thisCount + " " + term + "\n", factor);
                }
            }
        }
    }
}
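The casing filter inside the loop above does not depend on Lucene, so it can be sketched on its own. This is a minimal sketch under stated assumptions, not LanguageTool's code: the class `CasingCandidate`, the method `uppercaseVariant`, and the simplified `uppercaseFirstChar` helper are hypothetical names introduced here (the real `StringTools.uppercaseFirstChar` may treat some characters differently), and the minimum part length is a parameter because the value of `MIN_TERM_LEN` is not shown in the snippet.

```java
import java.util.ArrayList;
import java.util.List;

final class CasingCandidate {

    // Simplified stand-in for StringTools.uppercaseFirstChar (assumption).
    static String uppercaseFirstChar(String s) {
        if (s.isEmpty()) {
            return s;
        }
        return Character.toUpperCase(s.charAt(0)) + s.substring(1);
    }

    /**
     * Returns the term with every part capitalized, or null when the filter
     * skips the term: a part is shorter than minTermLen, no part starts in
     * lowercase, or exactly two parts start in lowercase.
     */
    static String uppercaseVariant(String term, int minTermLen) {
        int lcCount = 0;
        List<String> ucParts = new ArrayList<>();
        for (String part : term.split(" ")) {
            if (part.length() < minTermLen) {
                return null; // too short to be useful
            }
            String uc = uppercaseFirstChar(part);
            if (!part.equals(uc)) {
                lcCount++; // this part changed when capitalized
            }
            ucParts.add(uc);
        }
        if (lcCount == 0 || lcCount == 2) {
            return null;
        }
        // At least one part changed, so the result always differs from term.
        return String.join(" ", ucParts);
    }

    public static void main(String[] args) {
        System.out.println(uppercaseVariant("persischer Golf", 4)); // Persischer Golf
        System.out.println(uppercaseVariant("persischer golf", 4)); // null
        System.out.println(uppercaseVariant("Persischer Golf", 4)); // null
    }
}
```

Returning null for skipped terms keeps the sketch close to the `continue` statements in the loop; the occurrence counts would then be looked up only for non-null results.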
Example 7 with FSDirectory

use of org.apache.lucene.store.FSDirectory in project CoreNLP by stanfordnlp.

the class PatternsForEachTokenLucene method setIndexReaderSearcher.

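static synchronized void setIndexReaderSearcher() {
    try {
        // indexDir is an assumed name for the index location; the argument is missing in the source.
        FSDirectory index = FSDirectory.open(indexDir);
        if (reader == null) {
            reader = DirectoryReader.open(index);
            searcher = new IndexSearcher(reader);
        } else {
            // openIfChanged returns null when the index has not changed since the reader was opened.
            DirectoryReader newreader = DirectoryReader.openIfChanged(reader);
            if (newreader != null) {
                reader.close(); // openIfChanged does not close the old reader
                reader = newreader;
                searcher = new IndexSearcher(reader);
            }
        }
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}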
Example 8 with FSDirectory

use of org.apache.lucene.store.FSDirectory in project CoreNLP by stanfordnlp.

the class LuceneSentenceIndex method setIndexReaderSearcher.

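void setIndexReaderSearcher() throws IOException {
    // indexDir is an assumed name for the index location; the argument is missing in the source.
    FSDirectory index = FSDirectory.open(indexDir);
    if (reader == null) {
        reader = DirectoryReader.open(index);
        searcher = new IndexSearcher(reader);
    } else {
        // openIfChanged returns null when the index has not changed since the reader was opened.
        DirectoryReader newreader = DirectoryReader.openIfChanged(reader);
        if (newreader != null) {
            reader.close(); // openIfChanged does not close the old reader
            reader = newreader;
            searcher = new IndexSearcher(reader);
        }
    }
}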
Example 9 with FSDirectory

use of org.apache.lucene.store.FSDirectory in project zm-mailbox by Zimbra.

the class LuceneDirectory method open.

/**
 * Creates a new {@link LuceneDirectory} with {@code SingleInstanceLockFactory}.
 * <p>
 * You can switch Lucene's {@link FSDirectory} implementation via {@link LC#zimbra_index_lucene_io_impl}.
 * <ul>
 *  <li>{@code null} - Lucene will try to pick the best {@link FSDirectory} implementation given the current
 *      environment. Currently this returns {@link MMapDirectory} for most Solaris and Windows 64-bit JREs,
 *      {@link NIOFSDirectory} for other non-Windows JREs, and {@link SimpleFSDirectory} for other JREs on Windows.
 *  <li>{@code simple} - straightforward implementation using {@code java.io.RandomAccessFile}. However, it has
 *      poor concurrent performance (multiple threads will bottleneck) as it synchronizes when multiple threads
 *      read from the same file.
 *  <li>{@code nio} - uses java.nio's FileChannel's positional I/O when reading to avoid synchronization when
 *      reading from the same file. Unfortunately, due to a Windows-only Sun JRE bug this is a poor choice for
 *      Windows, but on all other platforms this is the preferred choice.
 *  <li>{@code mmap} - uses memory-mapped I/O when reading. This is a good choice if you have plenty of virtual
 *      memory relative to your index size, e.g. if you are running on a 64-bit JRE, or you are running on a
 *      32-bit JRE but your index sizes are small enough to fit into the virtual memory space. Java currently has
 *      the limitation of not being able to unmap files from user code. The files are unmapped when GC releases
 *      the byte buffers. Due to this bug in Sun's JRE, MMapDirectory's IndexInput.close() is unable to close the
 *      underlying OS file handle. Only when GC finally collects the underlying objects, which could be quite some
 *      time later, will the file handle be closed. This causes additional transient disk usage: on Windows,
 *      attempts to delete or overwrite the files will result in an exception; on other platforms, which
 *      typically have "delete on last close" semantics, such operations will succeed but the bytes still
 *      consume space on disk. For many applications this limitation is not a problem (e.g. if you have plenty
 *      of disk space, and you don't rely on overwriting files on Windows) but it's still an important limitation
 *      to be aware of. This class supplies a (possibly dangerous) workaround mentioned in the bug report, which
 *      may fail on non-Sun JVMs.
 * </ul>
 *
 * @param path directory path
 */
public static LuceneDirectory open(File path) throws IOException {
    String impl = LC.zimbra_index_lucene_io_impl.value();
    FSDirectory dir;
    if ("nio".equals(impl)) {
        dir = new NIOFSDirectory(path, new SingleInstanceLockFactory());
    } else if ("mmap".equals(impl)) {
        dir = new MMapDirectory(path, new SingleInstanceLockFactory());
    } else if ("simple".equals(impl)) {
        dir = new SimpleFSDirectory(path, new SingleInstanceLockFactory());
    } else {
        // No implementation configured: let Lucene pick one for the current platform.
        dir = FSDirectory.open(path, new SingleInstanceLockFactory());
    }
    // The logger receiver is missing in the source; ZimbraLog.index is assumed here.
    ZimbraLog.index.info("OpenLuceneIndex impl=%s,dir=%s", dir.getClass().getSimpleName(), path);
    return new LuceneDirectory(dir);
}
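The three I/O strategies described in the Javadoc above map onto plain JDK classes, so the difference can be sketched without Lucene. The following is only an illustration under stated assumptions: `ReadStrategies`, its methods, and the `demo` helper are hypothetical names, the fallback to positional reads for an unknown or null setting is an arbitrary choice for this sketch (Lucene's own selection depends on the platform), and the code opens a file per call rather than keeping it open as a directory implementation would.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class ReadStrategies {

    // "simple": seek + read on a RandomAccessFile. The file pointer is shared
    // state, so readers sharing one handle would have to take turns.
    static byte[] readSimple(Path file, long pos, int len) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            byte[] out = new byte[len];
            synchronized (raf) { // illustrative: this handle is not actually shared
                raf.seek(pos);
                raf.readFully(out);
            }
            return out;
        }
    }

    // "nio": positional reads carry their own offset, so no shared file
    // pointer is moved and no lock is needed.
    static byte[] readNio(Path file, long pos, int len) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(len);
            long offset = pos;
            while (buf.hasRemaining()) {
                int n = ch.read(buf, offset);
                if (n < 0) {
                    throw new EOFException("range exceeds file size");
                }
                offset += n;
            }
            return buf.array();
        }
    }

    // "mmap": map the region into memory and copy from the mapped buffer.
    // The mapping stays valid until the buffer is garbage collected.
    static byte[] readMmap(Path file, long pos, int len) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, pos, len);
            byte[] out = new byte[len];
            map.get(out);
            return out;
        }
    }

    // Dispatch on a configuration string, in the same if/else style as open().
    static byte[] read(String impl, Path file, long pos, int len) throws IOException {
        if ("simple".equals(impl)) {
            return readSimple(file, pos, len);
        } else if ("mmap".equals(impl)) {
            return readMmap(file, pos, len);
        } else {
            return readNio(file, pos, len); // assumed default for this sketch
        }
    }

    // Writes a small temp file and reads bytes 6..11 with the chosen strategy.
    static String demo(String impl) {
        try {
            Path file = Files.createTempFile("read-strategies", ".bin");
            // deleteOnExit rather than an immediate delete: a file that is
            // still mapped cannot be deleted on Windows.
            file.toFile().deleteOnExit();
            Files.write(file, "hello lucene".getBytes(StandardCharsets.UTF_8));
            return new String(read(impl, file, 6, 6), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (String impl : new String[] {"simple", "nio", "mmap", null}) {
            System.out.println(impl + " -> " + demo(impl));
        }
    }
}
```

All strategies return the same bytes; they differ in how concurrent readers interact and in when the operating system releases the file, which is the trade-off the Javadoc describes.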
Example 10 with FSDirectory

use of org.apache.lucene.store.FSDirectory in project lucene-solr by apache.

the class TestBackwardsCompatibility method testCommandLineArgs.

public void testCommandLineArgs() throws Exception {
    // Silence System.out while IndexUpgrader runs; it is restored in the outer finally block.
    PrintStream savedSystemOut = System.out;
    System.setOut(new PrintStream(new ByteArrayOutputStream(), false, "UTF-8"));
    try {
        for (Map.Entry<String, Directory> entry : oldIndexDirs.entrySet()) {
            String name = entry.getKey();
            int indexCreatedVersion = SegmentInfos.readLatestCommit(entry.getValue()).getIndexCreatedVersionMajor();
            Path dir = createTempDir(name);
            TestUtil.unzip(getDataInputStream("index." + name + ".zip"), dir);
            String path = dir.toAbsolutePath().toString();
            // Build a random combination of IndexUpgrader command-line options.
            List<String> args = new ArrayList<>();
            if (random().nextBoolean()) {
                args.add("-verbose");
            }
            if (random().nextBoolean()) {
                args.add("-delete-prior-commits");
            }
            if (random().nextBoolean()) {
                // TODO: need to better randomize this, but ...
                //  - LuceneTestCase.FS_DIRECTORIES is private
                //  - newFSDirectory returns BaseDirectoryWrapper
                //  - BaseDirectoryWrapper doesn't expose delegate
                Class<? extends FSDirectory> dirImpl = random().nextBoolean() ? SimpleFSDirectory.class : NIOFSDirectory.class;
                args.add("-dir-impl");
                args.add(dirImpl.getName());
            }
            args.add(path);
            IndexUpgrader upgrader = null;
            try {
                upgrader = IndexUpgrader.parseArgs(args.toArray(new String[0]));
            } catch (Exception e) {
                throw new AssertionError("unable to parse args: " + args, e);
            }
            upgrader.upgrade();
            Directory upgradedDir = newFSDirectory(dir);
            try {
                checkAllSegmentsUpgraded(upgradedDir, indexCreatedVersion);
            } finally {
                upgradedDir.close();
            }
        }
    } finally {
        System.setOut(savedSystemOut);
    }
}
