use of org.apache.lucene.store.FSDirectory in project languagetool by languagetool-org.
the class GermanUppercasePhraseFinder method main.
public static void main(String[] args) throws IOException {
if (args.length != 1) {
System.out.println("Usage: " + GermanUppercasePhraseFinder.class.getSimpleName() + " <ngramIndexDir>");
System.exit(1);
}
JLanguageTool lt = new JLanguageTool(Languages.getLanguageForShortCode("de"));
FSDirectory fsDir = FSDirectory.open(new File(args[0]).toPath());
IndexReader reader = DirectoryReader.open(fsDir);
IndexSearcher searcher = new IndexSearcher(reader);
Fields fields = MultiFields.getFields(reader);
Terms terms = fields.terms("ngram");
TermsEnum termsEnum = terms.iterator();
int count = 0;
BytesRef next;
while ((next = termsEnum.next()) != null) {
String term = next.utf8ToString();
count++;
//term = "persischer Golf"; // for testing
String[] parts = term.split(" ");
boolean useful = true;
int lcCount = 0;
List<String> ucParts = new ArrayList<>();
for (String part : parts) {
if (part.length() < MIN_TERM_LEN) {
useful = false;
break;
}
String uc = StringTools.uppercaseFirstChar(part);
if (!part.equals(uc)) {
lcCount++;
}
ucParts.add(uc);
}
if (!useful || lcCount == 0 || lcCount == 2) {
continue;
}
String uppercase = Strings.join(ucParts, " ");
if (term.equals(uppercase)) {
continue;
}
long thisCount = getOccurrenceCount(reader, searcher, term);
long thisUpperCount = getOccurrenceCount(reader, searcher, uppercase);
if (count % 10_000 == 0) {
System.err.println(count + " @ " + term);
}
if (thisCount > LIMIT || thisUpperCount > LIMIT) {
if (thisUpperCount > thisCount) {
if (isRelevant(lt, term)) {
float factor = (float) thisUpperCount / thisCount;
// Pass the values as arguments: a '%' inside an indexed term would otherwise be parsed as a format specifier.
System.out.printf("%.2f %d %s %d %s\n", factor, thisUpperCount, uppercase, thisCount, term);
}
}
}
}
}
use of org.apache.lucene.store.FSDirectory in project CoreNLP by stanfordnlp.
the class PatternsForEachTokenLucene method setIndexReaderSearcher.
static synchronized void setIndexReaderSearcher() {
try {
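if (reader == null) {
// open() is a static method declared on FSDirectory, so calling it through NIOFSDirectory does not force
// the NIO implementation. Open the directory only when a new reader is actually needed.
FSDirectory index = FSDirectory.open(indexDir);
reader = DirectoryReader.open(index);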
searcher = new IndexSearcher(reader);
} else {
DirectoryReader newreader = DirectoryReader.openIfChanged(reader);
if (newreader != null) {
reader.close();
reader = newreader;
searcher = new IndexSearcher(reader);
}
}
} catch (IOException e) {
throw new RuntimeException(e);
}
}
use of org.apache.lucene.store.FSDirectory in project CoreNLP by stanfordnlp.
the class LuceneSentenceIndex method setIndexReaderSearcher.
void setIndexReaderSearcher() throws IOException {
if (reader == null) {
// Open the directory only when a new reader is actually needed, not on every refresh.
FSDirectory index = FSDirectory.open(indexDir);
reader = DirectoryReader.open(index);
searcher = new IndexSearcher(reader);
} else {
DirectoryReader newreader = DirectoryReader.openIfChanged(reader);
if (newreader != null) {
reader.close();
reader = newreader;
searcher = new IndexSearcher(reader);
}
}
}
use of org.apache.lucene.store.FSDirectory in project zm-mailbox by Zimbra.
the class LuceneDirectory method open.
/**
* Creates a new {@link LuceneDirectory} with {@code SingleInstanceLockFactory}.
* <p>
* You can switch Lucene's {@link FSDirectory} implementation by {@link LC#zimbra_index_lucene_io_impl}.
* <ul>
* <li>{@code null} - Lucene will try to pick the best {@link FSDirectory} implementation given the current
* environment. Currently this returns {@link MMapDirectory} for most Solaris and Windows 64-bit JREs,
* {@link NIOFSDirectory} for other non-Windows JREs, and {@link SimpleFSDirectory} for other JREs on Windows.
* <li>{@code simple} - straightforward implementation using java.io.RandomAccessFile. However, it has poor
* concurrent performance (multiple threads will bottleneck) as it synchronizes when multiple threads read from
* the same file.
* <li>{@code nio} - uses the positional I/O of java.nio's FileChannel when reading, which avoids synchronization
* when reading from the same file. Unfortunately, due to a Windows-only Sun JRE bug this is a poor choice for
* Windows, but on all other platforms this is the preferred choice.
* <li>{@code mmap} - uses memory-mapped I/O when reading. This is a good choice if you have plenty of virtual
* memory relative to your index size, e.g. if you are running on a 64-bit JRE, or you are running on a 32-bit
* JRE but your index sizes are small enough to fit into the virtual memory space. Java currently cannot unmap
* files from user code; the files are unmapped only when GC releases the byte buffers. Due to this bug in
* Sun's JRE, MMapDirectory's IndexInput.close() is unable to close the underlying OS file handle. Only when
* GC finally collects the underlying objects, which could be quite some time later, will the file handle be
* closed. This causes additional transient disk usage: on Windows, attempts to delete or overwrite the files
* will result in an exception; on other platforms, which typically have "delete on last close" semantics,
* such operations will succeed, but the bytes still consume space on disk. For many applications this
* limitation is not a problem (e.g. if you have plenty of disk space, and you don't rely on overwriting files
* on Windows) but it's still an important limitation to be aware of. This class supplies a (possibly
* dangerous) workaround mentioned in the bug report, which may fail on non-Sun JVMs.
* </ul>
*
* @param path directory path
*/
public static LuceneDirectory open(File path) throws IOException {
String impl = LC.zimbra_index_lucene_io_impl.value();
FSDirectory dir;
if ("nio".equals(impl)) {
dir = new NIOFSDirectory(path, new SingleInstanceLockFactory());
} else if ("mmap".equals(impl)) {
dir = new MMapDirectory(path, new SingleInstanceLockFactory());
} else if ("simple".equals(impl)) {
dir = new SimpleFSDirectory(path, new SingleInstanceLockFactory());
} else {
dir = FSDirectory.open(path, new SingleInstanceLockFactory());
}
ZimbraLog.index.info("OpenLuceneIndex impl=%s,dir=%s", dir.getClass().getSimpleName(), path);
return new LuceneDirectory(dir);
}
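The three I/O styles described in the Javadoc above (simple, nio, mmap) can be illustrated with plain JDK calls. The sketch below is not Lucene's or Zimbra's implementation; the class name ReadStrategies and its helper methods are hypothetical and only show the access pattern each style relies on: a seek followed by a read on a shared file pointer, a positional FileChannel read, and a read through a memory-mapped buffer.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration class; not part of Lucene or zm-mailbox.
public class ReadStrategies {

    /** "simple" style: seek + read share one file pointer, so concurrent callers must serialize. */
    static byte[] readWithSeek(RandomAccessFile file, long pos, int len) throws IOException {
        byte[] out = new byte[len];
        synchronized (file) { // this lock is the bottleneck the Javadoc mentions
            file.seek(pos);
            file.readFully(out);
        }
        return out;
    }

    /** "nio" style: a positional read does not move the channel position, so no lock is needed. */
    static byte[] readPositional(FileChannel channel, long pos, int len) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(len);
        long offset = pos;
        while (buf.hasRemaining()) {
            int n = channel.read(buf, offset);
            if (n < 0) {
                throw new EOFException("reached end of file at offset " + offset);
            }
            offset += n;
        }
        return buf.array();
    }

    /** "mmap" style: map a region and copy out of it; the mapping is released only by GC. */
    static byte[] readMapped(FileChannel channel, long pos, int len) throws IOException {
        MappedByteBuffer mapped = channel.map(FileChannel.MapMode.READ_ONLY, pos, len);
        byte[] out = new byte[len];
        mapped.get(out);
        return out;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("read-strategies", ".bin");
        // deleteOnExit instead of an explicit delete: a file that is still mapped
        // may not be deletable on Windows, as the Javadoc above describes.
        tmp.toFile().deleteOnExit();
        Files.write(tmp, "hello lucene directory".getBytes(StandardCharsets.UTF_8));

        try (RandomAccessFile raf = new RandomAccessFile(tmp.toFile(), "r");
             FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
            System.out.println(new String(readWithSeek(raf, 6, 6), StandardCharsets.UTF_8));
            System.out.println(new String(readPositional(ch, 6, 6), StandardCharsets.UTF_8));
            System.out.println(new String(readMapped(ch, 6, 6), StandardCharsets.UTF_8));
        }
    }
}
```

All three helpers return the same bytes for the same offset and length; they differ in what happens under concurrent access and in when the operating system resources are released, which is the trade-off the configuration key selects between.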
use of org.apache.lucene.store.FSDirectory in project lucene-solr by apache.
the class TestBackwardsCompatibility method testCommandLineArgs.
public void testCommandLineArgs() throws Exception {
PrintStream savedSystemOut = System.out;
System.setOut(new PrintStream(new ByteArrayOutputStream(), false, "UTF-8"));
try {
for (Map.Entry<String, Directory> entry : oldIndexDirs.entrySet()) {
String name = entry.getKey();
int indexCreatedVersion = SegmentInfos.readLatestCommit(entry.getValue()).getIndexCreatedVersionMajor();
Path dir = createTempDir(name);
TestUtil.unzip(getDataInputStream("index." + name + ".zip"), dir);
String path = dir.toAbsolutePath().toString();
List<String> args = new ArrayList<>();
if (random().nextBoolean()) {
args.add("-verbose");
}
if (random().nextBoolean()) {
args.add("-delete-prior-commits");
}
if (random().nextBoolean()) {
// TODO: need to better randomize this, but ...
// - LuceneTestCase.FS_DIRECTORIES is private
// - newFSDirectory returns BaseDirectoryWrapper
// - BaseDirectoryWrapper doesn't expose delegate
Class<? extends FSDirectory> dirImpl = random().nextBoolean() ? SimpleFSDirectory.class : NIOFSDirectory.class;
args.add("-dir-impl");
args.add(dirImpl.getName());
}
args.add(path);
IndexUpgrader upgrader = null;
try {
upgrader = IndexUpgrader.parseArgs(args.toArray(new String[0]));
} catch (Exception e) {
throw new AssertionError("unable to parse args: " + args, e);
}
upgrader.upgrade();
Directory upgradedDir = newFSDirectory(dir);
try {
checkAllSegmentsUpgraded(upgradedDir, indexCreatedVersion);
} finally {
upgradedDir.close();
}
}
} finally {
System.setOut(savedSystemOut);
}
}