
Example 1 with EventBasedExcelExtractor

Use of org.apache.poi.hssf.extractor.EventBasedExcelExtractor in the Apache POI project.

From the class OLE2ExtractorFactory, method createExtractor.

/**
 * Create the Extractor, if possible. Generally needs the Scratchpad jar.
 * Note that this won't check for embedded OOXML resources either; use
 * {@link org.apache.poi.extractor.ExtractorFactory} for that.
 */
public static POITextExtractor createExtractor(DirectoryNode poifsDir) throws IOException {
    // Look for certain entries in the stream to figure out the document type
    for (String workbookName : WORKBOOK_DIR_ENTRY_NAMES) {
        if (poifsDir.hasEntry(workbookName)) {
            if (getPreferEventExtractor()) {
                return new EventBasedExcelExtractor(poifsDir);
            }
            return new ExcelExtractor(poifsDir);
        }
    }
    if (poifsDir.hasEntry(OLD_WORKBOOK_DIR_ENTRY_NAME)) {
        throw new OldExcelFormatException("Old Excel Spreadsheet format (1-95) found. Please call OldExcelExtractor directly for basic text extraction");
    }
    // Ask Scratchpad, or fail trying
    Class<?> cls = getScratchpadClass();
    try {
        Method m = cls.getDeclaredMethod("createExtractor", DirectoryNode.class);
        POITextExtractor ext = (POITextExtractor) m.invoke(null, poifsDir);
        if (ext != null)
            return ext;
    } catch (IllegalArgumentException iae) {
        throw iae;
    } catch (Exception e) {
        throw new IllegalArgumentException("Error creating Scratchpad Extractor", e);
    }
    throw new IllegalArgumentException("No supported documents found in the OLE2 stream");
}
Also used : POITextExtractor(org.apache.poi.POITextExtractor) ExcelExtractor(org.apache.poi.hssf.extractor.ExcelExtractor) EventBasedExcelExtractor(org.apache.poi.hssf.extractor.EventBasedExcelExtractor) Method(java.lang.reflect.Method) OldExcelFormatException(org.apache.poi.hssf.OldExcelFormatException) IOException(java.io.IOException)
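
The reflective fallback at the end of createExtractor can be illustrated in isolation. The following is a minimal sketch, not POI code: ReflectiveFactorySketch, OptionalHelper, and the method name "create" are hypothetical stand-ins for the Scratchpad lookup. One detail it makes explicit is that Method.invoke wraps any exception thrown by the target method in an InvocationTargetException, so the sketch unwraps the cause if the caller should see the original IllegalArgumentException.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

// Hypothetical sketch of a "call an optional static factory via reflection" fallback.
final class ReflectiveFactorySketch {

    // Stand-in for an optional component that may or may not be on the classpath.
    static final class OptionalHelper {
        static String create(String name) {
            if (name.isEmpty()) {
                throw new IllegalArgumentException("empty name");
            }
            return "helper:" + name;
        }
    }

    static String createViaReflection(Class<?> cls, String name) {
        try {
            // Look up the static factory method by name and parameter type.
            Method m = cls.getDeclaredMethod("create", String.class);
            // The receiver is null because the method is static.
            Object result = m.invoke(null, name);
            if (result != null) {
                return (String) result;
            }
        } catch (InvocationTargetException ite) {
            // Exceptions thrown by the target arrive wrapped; unwrap the cause.
            Throwable cause = ite.getCause();
            if (cause instanceof IllegalArgumentException) {
                throw (IllegalArgumentException) cause;
            }
            throw new IllegalArgumentException("Error calling optional helper", cause);
        } catch (ReflectiveOperationException e) {
            // Covers a missing method or an inaccessible one.
            throw new IllegalArgumentException("Optional helper unavailable", e);
        }
        throw new IllegalArgumentException("Helper produced no result");
    }

    public static void main(String[] args) {
        System.out.println(createViaReflection(OptionalHelper.class, "xls")); // prints "helper:xls"
        try {
            createViaReflection(OptionalHelper.class, "");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // prints "empty name"
        }
    }
}
```

In the listing above, an exception thrown inside the reflectively invoked method would instead reach the generic catch block and surface as "Error creating Scratchpad Extractor" with the wrapper as its cause.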

Example 2 with EventBasedExcelExtractor

Use of org.apache.poi.hssf.extractor.EventBasedExcelExtractor in the Apache POI project.

From the class TestExtractorFactory, method testPreferEventBased.

@Test
public void testPreferEventBased() throws Exception {
    assertFalse(ExtractorFactory.getPreferEventExtractor());
    assertFalse(ExtractorFactory.getThreadPrefersEventExtractors());
    assertNull(ExtractorFactory.getAllThreadsPreferEventExtractors());
    ExtractorFactory.setThreadPrefersEventExtractors(true);
    assertTrue(ExtractorFactory.getPreferEventExtractor());
    assertTrue(ExtractorFactory.getThreadPrefersEventExtractors());
    assertNull(ExtractorFactory.getAllThreadsPreferEventExtractors());
    ExtractorFactory.setAllThreadsPreferEventExtractors(false);
    assertFalse(ExtractorFactory.getPreferEventExtractor());
    assertTrue(ExtractorFactory.getThreadPrefersEventExtractors());
    assertEquals(Boolean.FALSE, ExtractorFactory.getAllThreadsPreferEventExtractors());
    ExtractorFactory.setAllThreadsPreferEventExtractors(null);
    assertTrue(ExtractorFactory.getPreferEventExtractor());
    assertTrue(ExtractorFactory.getThreadPrefersEventExtractors());
    assertNull(ExtractorFactory.getAllThreadsPreferEventExtractors());
    // Check we get the right extractors now
    POITextExtractor extractor = ExtractorFactory.createExtractor(new POIFSFileSystem(new FileInputStream(xls)));
    assertTrue(extractor instanceof EventBasedExcelExtractor);
    extractor.close();
    extractor = ExtractorFactory.createExtractor(new POIFSFileSystem(new FileInputStream(xls)));
    assertTrue(extractor.getText().length() > 200);
    extractor.close();
    extractor = ExtractorFactory.createExtractor(OPCPackage.open(xlsx.toString(), PackageAccess.READ));
    assertTrue(extractor instanceof XSSFEventBasedExcelExtractor);
    extractor.close();
    extractor = ExtractorFactory.createExtractor(OPCPackage.open(xlsx.toString(), PackageAccess.READ));
    assertTrue(extractor.getText().length() > 200);
    extractor.close();
    // Put back to normal
    ExtractorFactory.setThreadPrefersEventExtractors(false);
    assertFalse(ExtractorFactory.getPreferEventExtractor());
    assertFalse(ExtractorFactory.getThreadPrefersEventExtractors());
    assertNull(ExtractorFactory.getAllThreadsPreferEventExtractors());
    // And back
    extractor = ExtractorFactory.createExtractor(new POIFSFileSystem(new FileInputStream(xls)));
    assertTrue(extractor instanceof ExcelExtractor);
    extractor.close();
    extractor = ExtractorFactory.createExtractor(new POIFSFileSystem(new FileInputStream(xls)));
    assertTrue(extractor.getText().length() > 200);
    extractor.close();
    extractor = ExtractorFactory.createExtractor(OPCPackage.open(xlsx.toString(), PackageAccess.READ));
    assertTrue(extractor instanceof XSSFExcelExtractor);
    extractor.close();
    extractor = ExtractorFactory.createExtractor(OPCPackage.open(xlsx.toString()));
    assertTrue(extractor.getText().length() > 200);
    extractor.close();
}
Also used : XSSFEventBasedExcelExtractor(org.apache.poi.xssf.extractor.XSSFEventBasedExcelExtractor) POITextExtractor(org.apache.poi.POITextExtractor) XSSFExcelExtractor(org.apache.poi.xssf.extractor.XSSFExcelExtractor) OPOIFSFileSystem(org.apache.poi.poifs.filesystem.OPOIFSFileSystem) POIFSFileSystem(org.apache.poi.poifs.filesystem.POIFSFileSystem) ExcelExtractor(org.apache.poi.hssf.extractor.ExcelExtractor) EventBasedExcelExtractor(org.apache.poi.hssf.extractor.EventBasedExcelExtractor) FileInputStream(java.io.FileInputStream) Test(org.junit.Test)
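
The assertions at the start of testPreferEventBased imply a simple resolution rule: the all-threads setting, when non-null, overrides the per-thread setting. The sketch below reproduces that observable behavior with a ThreadLocal and a nullable Boolean. PreferenceSketch and its method names are hypothetical; this is an assumption about how such a rule could be written, not POI's actual implementation.

```java
// Hypothetical sketch of the preference rule exercised by the test above.
final class PreferenceSketch {

    // Per-thread preference; each thread starts at false.
    private static final ThreadLocal<Boolean> threadPrefers =
            ThreadLocal.withInitial(() -> Boolean.FALSE);

    // Global override; null means "no override, defer to the thread setting".
    private static volatile Boolean allThreadsPrefer = null;

    static boolean getThreadPrefers() {
        return threadPrefers.get();
    }

    static void setThreadPrefers(boolean prefer) {
        threadPrefers.set(prefer);
    }

    static Boolean getAllThreadsPrefer() {
        return allThreadsPrefer;
    }

    static void setAllThreadsPrefer(Boolean prefer) {
        allThreadsPrefer = prefer;
    }

    // Effective preference: the global override wins whenever it is set.
    static boolean getPrefer() {
        Boolean override = allThreadsPrefer;
        return (override != null) ? override : threadPrefers.get();
    }

    public static void main(String[] args) {
        System.out.println(getPrefer()); // prints "false"
        setThreadPrefers(true);
        System.out.println(getPrefer()); // prints "true"
        setAllThreadsPrefer(false);
        System.out.println(getPrefer()); // prints "false"
        setAllThreadsPrefer(null);
        System.out.println(getPrefer()); // prints "true"
        setThreadPrefers(false);         // put back to normal
    }
}
```

The printed sequence mirrors the first four groups of assertions in the test: default off, thread preference on, global override off, override cleared.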

Aggregations

POITextExtractor (org.apache.poi.POITextExtractor): 2
EventBasedExcelExtractor (org.apache.poi.hssf.extractor.EventBasedExcelExtractor): 2
ExcelExtractor (org.apache.poi.hssf.extractor.ExcelExtractor): 2
FileInputStream (java.io.FileInputStream): 1
IOException (java.io.IOException): 1
Method (java.lang.reflect.Method): 1
OldExcelFormatException (org.apache.poi.hssf.OldExcelFormatException): 1
OPOIFSFileSystem (org.apache.poi.poifs.filesystem.OPOIFSFileSystem): 1
POIFSFileSystem (org.apache.poi.poifs.filesystem.POIFSFileSystem): 1
XSSFEventBasedExcelExtractor (org.apache.poi.xssf.extractor.XSSFEventBasedExcelExtractor): 1
XSSFExcelExtractor (org.apache.poi.xssf.extractor.XSSFExcelExtractor): 1
Test (org.junit.Test): 1