use of org.apache.lucene.store.SimpleFSDirectory in project crate by crate.
the class PersistedClusterStateServiceTests method testFailsIfGlobalMetadataIsMissing.
public void testFailsIfGlobalMetadataIsMissing() throws IOException {
    try (NodeEnvironment nodeEnvironment = newNodeEnvironment(createDataPaths())) {
        try (Writer writer = newPersistedClusterStateService(nodeEnvironment).createWriter()) {
            final ClusterState clusterState = loadPersistedClusterState(newPersistedClusterStateService(nodeEnvironment));
            writeState(writer, 0L, ClusterState.builder(clusterState).version(randomLongBetween(1L, Long.MAX_VALUE)).build(), clusterState);
        }
        final Path brokenPath = randomFrom(nodeEnvironment.nodeDataPaths());
        try (Directory directory = new SimpleFSDirectory(brokenPath.resolve(PersistedClusterStateService.METADATA_DIRECTORY_NAME))) {
            final IndexWriterConfig indexWriterConfig = new IndexWriterConfig();
            indexWriterConfig.setOpenMode(IndexWriterConfig.OpenMode.CREATE);
            try (IndexWriter indexWriter = new IndexWriter(directory, indexWriterConfig)) {
                indexWriter.commit();
            }
        }
        final String message = expectThrows(IllegalStateException.class, () -> newPersistedClusterStateService(nodeEnvironment).loadBestOnDiskState()).getMessage();
        assertThat(message, allOf(containsString("no global metadata found"), containsString(brokenPath.toString())));
    }
}
use of org.apache.lucene.store.SimpleFSDirectory in project zm-mailbox by Zimbra.
the class LuceneDirectory method open.
/**
* Creates a new {@link LuceneDirectory} with {@code SingleInstanceLockFactory}.
* <p>
* You can switch Lucene's {@link FSDirectory} implementation by {@link LC#zimbra_index_lucene_io_impl}.
* <ul>
* <li>{@code null} - Lucene will try to pick the best {@link FSDirectory} implementation given the current
* environment. Currently this returns {@link MMapDirectory} for most Solaris and Windows 64-bit JREs,
* {@link NIOFSDirectory} for other non-Windows JREs, and {@link SimpleFSDirectory} for other JREs on Windows.
* <li>{@code simple} - straightforward implementation using java.io.RandomAccessFile. However, it has poor
* concurrent performance (multiple threads will bottleneck) as it synchronizes when multiple threads read from
* the same file.
* <li>{@code nio} - uses java.nio's FileChannel's positional io when reading to avoid synchronization when reading
* from the same file. Unfortunately, due to a Windows-only Sun JRE bug this is a poor choice for Windows, but
* on all other platforms this is the preferred choice.
* <li>{@code mmap} - uses memory-mapped IO when reading. This is a good choice if you have plenty of virtual
* memory relative to your index size, e.g. if you are running on a 64 bit JRE, or you are running on a 32 bit
* JRE but your index sizes are small enough to fit into the virtual memory space. Java currently cannot
* unmap files from user code. The files are unmapped only when GC releases the
* byte buffers. Due to this bug in Sun's JRE, MMapDirectory's IndexInput.close() is unable to close the
* underlying OS file handle. Only when GC finally collects the underlying objects, which could be quite some
* time later, will the file handle be closed. This causes additional transient disk usage: on Windows,
* attempts to delete or overwrite the files will result in an exception; on other platforms, which typically
* have a "delete on last close" semantics, while such operations will succeed, the bytes are still consuming
* space on disk. For many applications this limitation is not a problem (e.g. if you have plenty of disk
* space, and you don't rely on overwriting files on Windows) but it's still an important limitation to be
* aware of. This class supplies a (possibly dangerous) workaround mentioned in the bug report, which may fail
* on non-Sun JVMs.
* </ul>
*
* @param path directory path
*/
public static LuceneDirectory open(File path) throws IOException {
    String impl = LC.zimbra_index_lucene_io_impl.value();
    FSDirectory dir;
    if ("nio".equals(impl)) {
        dir = new NIOFSDirectory(path, new SingleInstanceLockFactory());
    } else if ("mmap".equals(impl)) {
        dir = new MMapDirectory(path, new SingleInstanceLockFactory());
    } else if ("simple".equals(impl)) {
        dir = new SimpleFSDirectory(path, new SingleInstanceLockFactory());
    } else {
        dir = FSDirectory.open(path, new SingleInstanceLockFactory());
    }
    ZimbraLog.index.info("OpenLuceneIndex impl=%s,dir=%s", dir.getClass().getSimpleName(), path);
    return new LuceneDirectory(dir);
}
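The difference between the {@code simple} and {@code nio} options described in the Javadoc above can be sketched with plain JDK classes, without Lucene on the classpath. This is only an illustration of the two read styles, not Lucene's or Zimbra's code: the class name ReadStrategySketch and both helper methods are hypothetical. A RandomAccessFile has a single shared file pointer, so seek and read must be serialized across threads, while FileChannel's positional read takes the offset as an argument and leaves the channel position untouched.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class ReadStrategySketch {

    // "simple" style: seek + read share one file pointer, so readers
    // of the same file have to take a lock and run one at a time.
    static byte[] readWithRandomAccessFile(RandomAccessFile file, long position, int length) throws IOException {
        byte[] buffer = new byte[length];
        synchronized (file) {
            file.seek(position);
            file.readFully(buffer);
        }
        return buffer;
    }

    // "nio" style: the offset is passed per call and the channel's own
    // position is not modified, so no lock is needed for concurrent reads.
    static byte[] readWithFileChannel(FileChannel channel, long position, int length) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate(length);
        long offset = position;
        while (buffer.hasRemaining()) {
            int read = channel.read(buffer, offset);
            if (read < 0) {
                throw new EOFException("reached end of file at offset " + offset);
            }
            offset += read;
        }
        return buffer.array();
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("read-strategy", ".bin");
        try {
            Files.write(tmp, "0123456789".getBytes(StandardCharsets.US_ASCII));
            try (RandomAccessFile raf = new RandomAccessFile(tmp.toFile(), "r");
                 FileChannel channel = FileChannel.open(tmp, StandardOpenOption.READ)) {
                String viaRaf = new String(readWithRandomAccessFile(raf, 3, 4), StandardCharsets.US_ASCII);
                String viaChannel = new String(readWithFileChannel(channel, 3, 4), StandardCharsets.US_ASCII);
                System.out.println(viaRaf + " " + viaChannel);
            }
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Both helpers return the same bytes for the same offset and length; the difference is only in whether callers have to coordinate, which is the bottleneck the Javadoc attributes to the {@code simple} option.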
use of org.apache.lucene.store.SimpleFSDirectory in project languagetool by languagetool-org.
the class Searcher method main.
public static void main(String[] args) throws Exception {
    ensureCorrectUsageOrExit(args);
    long startTime = System.currentTimeMillis();
    String[] ruleIds = args[0].split(",");
    String languageCode = args[1];
    Language language = Languages.getLanguageForShortCode(languageCode);
    File indexDir = new File(args[2]);
    boolean limitSearch = !(args.length > 3 && "--no_limit".equals(args[3]));
    Searcher searcher = new Searcher(new SimpleFSDirectory(indexDir.toPath()));
    if (!limitSearch) {
        searcher.setMaxHits(100_000);
    }
    searcher.limitSearch = limitSearch;
    ContextTools contextTools = getContextTools(140);
    int totalMatches = 0;
    for (String ruleId : ruleIds) {
        long ruleStartTime = System.currentTimeMillis();
        for (PatternRule rule : searcher.getRuleById(ruleId, language)) {
            System.out.println("===== " + rule.getFullId() + " =========================================================");
            SearcherResult searcherResult = searcher.findRuleMatchesOnIndex(rule, language);
            int i = 1;
            if (searcherResult.getMatchingSentences().size() == 0) {
                System.out.println("[no matches]");
            }
            for (MatchingSentence ruleMatch : searcherResult.getMatchingSentences()) {
                for (RuleMatch match : ruleMatch.getRuleMatches()) {
                    String context = contextTools.getContext(match.getFromPos(), match.getToPos(), ruleMatch.getSentence());
                    if (WIKITEXT_OUTPUT) {
                        ContextTools contextTools2 = getContextTools(0);
                        String coveredText = contextTools2.getContext(match.getFromPos(), match.getToPos(), ruleMatch.getSentence());
                        coveredText = coveredText.replaceFirst("^\\.\\.\\.", "").replaceFirst("\\.\\.\\.$", "");
                        coveredText = coveredText.replaceFirst("^\\*\\*", "").replaceFirst("\\*\\*$", "");
                        String encodedTextWithQuotes = URLEncoder.encode("\"" + coveredText + "\"", "UTF-8");
                        String searchLink = "https://de.wikipedia.org/w/index.php?search=" + encodedTextWithQuotes + "&title=Spezial%3ASuche&go=Artikel";
                        context = context.replaceAll("\\*\\*.*?\\*\\*", "[" + searchLink + " " + coveredText + "]");
                        String encTitle = URLEncoder.encode(ruleMatch.getTitle(), "UTF-8");
                        String encodedText = URLEncoder.encode(coveredText, "UTF-8");
                        System.out.println("# [[" + ruleMatch.getTitle() + "]]: " + context + " ([http://wikipedia.ramselehof.de/wikiblame.php?user_lang=de&lang=de&project=wikipedia&article=" + encTitle + "&needle=" + encodedText + "&skipversions=0&ignorefirst=0&limit=500&searchmethod=int&order=desc&start=Start WikiBlame])");
                    } else {
                        System.out.println(i + ": " + context + " [" + ruleMatch.getSource() + "]");
                    }
                }
                totalMatches += ruleMatch.getRuleMatches().size();
                i++;
            }
            System.out.println("Time: " + (System.currentTimeMillis() - ruleStartTime) + "ms");
        }
    }
    System.out.println("Total time: " + (System.currentTimeMillis() - startTime) + "ms, " + totalMatches + " matches");
}
use of org.apache.lucene.store.SimpleFSDirectory in project elasticsearch by elastic.
the class HunspellService method loadDictionary.
/**
* Loads the hunspell dictionary for the given locale.
*
* @param locale The locale of the hunspell dictionary to be loaded.
* @param nodeSettings The node level settings
* @param env The node environment (from which the conf path will be resolved)
* @return The loaded Hunspell dictionary
* @throws Exception when loading fails (due to IO errors or malformed dictionary files)
*/
private Dictionary loadDictionary(String locale, Settings nodeSettings, Environment env) throws Exception {
    if (logger.isDebugEnabled()) {
        logger.debug("Loading hunspell dictionary [{}]...", locale);
    }
    Path dicDir = hunspellDir.resolve(locale);
    if (FileSystemUtils.isAccessibleDirectory(dicDir, logger) == false) {
        throw new ElasticsearchException(String.format(Locale.ROOT, "Could not find hunspell dictionary [%s]", locale));
    }
    // merging node settings with hunspell dictionary specific settings
    Settings dictSettings = HUNSPELL_DICTIONARY_OPTIONS.get(nodeSettings);
    nodeSettings = loadDictionarySettings(dicDir, dictSettings.getByPrefix(locale + "."));
    boolean ignoreCase = nodeSettings.getAsBoolean("ignore_case", defaultIgnoreCase);
    Path[] affixFiles = FileSystemUtils.files(dicDir, "*.aff");
    if (affixFiles.length == 0) {
        throw new ElasticsearchException(String.format(Locale.ROOT, "Missing affix file for hunspell dictionary [%s]", locale));
    }
    if (affixFiles.length != 1) {
        throw new ElasticsearchException(String.format(Locale.ROOT, "Too many affix files exist for hunspell dictionary [%s]", locale));
    }
    InputStream affixStream = null;
    Path[] dicFiles = FileSystemUtils.files(dicDir, "*.dic");
    List<InputStream> dicStreams = new ArrayList<>(dicFiles.length);
    try {
        for (int i = 0; i < dicFiles.length; i++) {
            dicStreams.add(Files.newInputStream(dicFiles[i]));
        }
        affixStream = Files.newInputStream(affixFiles[0]);
        try (Directory tmp = new SimpleFSDirectory(env.tmpFile())) {
            return new Dictionary(tmp, "hunspell", affixStream, dicStreams, ignoreCase);
        }
    } catch (Exception e) {
        logger.error((Supplier<?>) () -> new ParameterizedMessage("Could not load hunspell dictionary [{}]", locale), e);
        throw e;
    } finally {
        IOUtils.close(affixStream);
        IOUtils.close(dicStreams);
    }
}
use of org.apache.lucene.store.SimpleFSDirectory in project elasticsearch by elastic.
the class Checkpoint method read.
public static Checkpoint read(Path path) throws IOException {
    try (Directory dir = new SimpleFSDirectory(path.getParent())) {
        try (IndexInput indexInput = dir.openInput(path.getFileName().toString(), IOContext.DEFAULT)) {
            // We checksum the entire file before we even go and parse it. If it's corrupted we barf right here.
            CodecUtil.checksumEntireFile(indexInput);
            final int fileVersion = CodecUtil.checkHeader(indexInput, CHECKPOINT_CODEC, INITIAL_VERSION, CURRENT_VERSION);
            if (fileVersion == INITIAL_VERSION) {
                assert indexInput.length() == V1_FILE_SIZE : indexInput.length();
                return Checkpoint.readCheckpointV5_0_0(indexInput);
            } else {
                assert fileVersion == CURRENT_VERSION : fileVersion;
                assert indexInput.length() == FILE_SIZE : indexInput.length();
                return Checkpoint.readCheckpointV6_0_0(indexInput);
            }
        }
    }
}
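The "checksum the entire file before parsing" step in Checkpoint.read can be illustrated with a JDK-only sketch. The layout here is an assumption made for the example (payload bytes followed by an 8-byte big-endian CRC32 of the payload); it is a simplification and not Lucene's actual header and footer format, and the class ChecksumFirstSketch and its methods are hypothetical names. The point it shows is the ordering: the whole file is verified first, and the payload is handed to a parser only if the stored and computed checksums agree.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.zip.CRC32;

class ChecksumFirstSketch {

    // Assumed layout: payload followed by an 8-byte big-endian CRC32 of the payload.
    static byte[] withFooter(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        return ByteBuffer.allocate(payload.length + Long.BYTES)
                .put(payload)
                .putLong(crc.getValue())
                .array();
    }

    // Verifies the footer over the whole payload and only then returns the payload for parsing.
    static byte[] verifyAndStrip(byte[] file) throws IOException {
        if (file.length < Long.BYTES) {
            throw new IOException("file too short to hold a checksum footer");
        }
        int payloadLength = file.length - Long.BYTES;
        CRC32 crc = new CRC32();
        crc.update(file, 0, payloadLength);
        long stored = ByteBuffer.wrap(file, payloadLength, Long.BYTES).getLong();
        if (stored != crc.getValue()) {
            throw new IOException("checksum mismatch: stored=" + stored + " computed=" + crc.getValue());
        }
        return Arrays.copyOf(file, payloadLength);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("checkpoint-sketch", ".bin");
        try {
            byte[] payload = ByteBuffer.allocate(Long.BYTES).putLong(42L).array();
            Files.write(tmp, withFooter(payload));

            // Intact file: verification passes, then the payload is parsed.
            byte[] verified = verifyAndStrip(Files.readAllBytes(tmp));
            System.out.println("parsed value: " + ByteBuffer.wrap(verified).getLong());

            // Flip one bit: verification fails before any parsing happens.
            byte[] corrupted = Files.readAllBytes(tmp);
            corrupted[0] ^= 0x01;
            try {
                verifyAndStrip(corrupted);
            } catch (IOException e) {
                System.out.println("rejected: " + e.getMessage());
            }
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Verifying first means the parsing code never has to defend against arbitrary garbage, which is presumably why the original method calls CodecUtil.checksumEntireFile before CodecUtil.checkHeader.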