Example 11 with ExcelExtractor

Use of org.apache.poi.hssf.extractor.ExcelExtractor in the Apache POI project.

Class TestBugs, method bug51535.

/**
 * Large row numbers and NPOIFS vs POIFS
 */
@SuppressWarnings("resource")
@Test
public void bug51535() throws Exception {
    byte[] data = HSSFITestDataProvider.instance.getTestDataFileContent("51535.xls");
    HSSFWorkbook wbPOIFS = new HSSFWorkbook(new POIFSFileSystem(new ByteArrayInputStream(data)).getRoot(), false);
    HSSFWorkbook wbNPOIFS = new HSSFWorkbook(new NPOIFSFileSystem(new ByteArrayInputStream(data)).getRoot(), false);
    for (HSSFWorkbook wb : new HSSFWorkbook[] { wbPOIFS, wbNPOIFS }) {
        assertEquals(3, wb.getNumberOfSheets());
        // Check directly
        HSSFSheet s = wb.getSheetAt(0);
        assertEquals("Top Left Cell", s.getRow(0).getCell(0).getStringCellValue());
        assertEquals("Top Right Cell", s.getRow(0).getCell(255).getStringCellValue());
        assertEquals("Bottom Left Cell", s.getRow(65535).getCell(0).getStringCellValue());
        assertEquals("Bottom Right Cell", s.getRow(65535).getCell(255).getStringCellValue());
        // Extract and check
        ExcelExtractor ex = new ExcelExtractor(wb);
        String text = ex.getText();
        assertContains(text, "Top Left Cell");
        assertContains(text, "Top Right Cell");
        assertContains(text, "Bottom Left Cell");
        assertContains(text, "Bottom Right Cell");
        ex.close();
    }
}
Also used: NPOIFSFileSystem (org.apache.poi.poifs.filesystem.NPOIFSFileSystem), ByteArrayInputStream (java.io.ByteArrayInputStream), OPOIFSFileSystem (org.apache.poi.poifs.filesystem.OPOIFSFileSystem), POIFSFileSystem (org.apache.poi.poifs.filesystem.POIFSFileSystem), ExcelExtractor (org.apache.poi.hssf.extractor.ExcelExtractor), UnicodeString (org.apache.poi.hssf.record.common.UnicodeString), Test (org.junit.Test)

Example 12 with ExcelExtractor

Use of org.apache.poi.hssf.extractor.ExcelExtractor in the Apache POI project.

Class TestXSSFEventBasedExcelExtractor, method testComparedToOLE2.

/**
 * Test that we return pretty much the same as
 * ExcelExtractor does, when we're both passed
 * the same file, just saved as xls and xlsx
 */
@Test
public void testComparedToOLE2() throws Exception {
    // A fairly simple file - ooxml
    XSSFEventBasedExcelExtractor ooxmlExtractor = getExtractor("SampleSS.xlsx");
    ExcelExtractor ole2Extractor = new ExcelExtractor(HSSFTestDataSamples.openSampleWorkbook("SampleSS.xls"));
    POITextExtractor[] extractors = new POITextExtractor[] { ooxmlExtractor, ole2Extractor };
    for (POITextExtractor extractor : extractors) {
        String text = extractor.getText().replaceAll("[\r\t]", "");
        assertStartsWith(text, "First Sheet\nTest spreadsheet\n2nd row2nd row 2nd column\n");
        Pattern pattern = Pattern.compile(".*13(\\.0+)?\\s+Sheet3.*", Pattern.DOTALL);
        Matcher m = pattern.matcher(text);
        assertTrue(m.matches());
    }
    ole2Extractor.close();
    ooxmlExtractor.close();
}
Also used: Pattern (java.util.regex.Pattern), POITextExtractor (org.apache.poi.POITextExtractor), Matcher (java.util.regex.Matcher), ExcelExtractor (org.apache.poi.hssf.extractor.ExcelExtractor), Test (org.junit.Test)
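The comparison above only requires the two extractors to return "pretty much the same" text: carriage returns and tabs are stripped, and the regular expression accepts the number either as 13 or as 13.0. The following is a minimal, standard-library-only sketch of that tolerant check. The class name, method names, and sample strings are made up for illustration and are not actual extractor output.

```java
import java.util.regex.Pattern;

// Hypothetical helper mirroring the tolerant check used in testComparedToOLE2.
final class SheetTextCheck {
    // Same pattern as in the test: "13", an optional ".0..." suffix,
    // whitespace, then "Sheet3". DOTALL lets '.' match line breaks too,
    // which is needed because the extracted text spans several lines.
    private static final Pattern THIRTEEN_THEN_SHEET3 =
            Pattern.compile(".*13(\\.0+)?\\s+Sheet3.*", Pattern.DOTALL);

    // Strip carriage returns and tabs, as the test does before comparing.
    static String normalize(String raw) {
        return raw.replaceAll("[\r\t]", "");
    }

    static boolean looksLikeSampleSS(String raw) {
        return THIRTEEN_THEN_SHEET3.matcher(normalize(raw)).matches();
    }

    public static void main(String[] args) {
        // Made-up strings standing in for the two extractors' output.
        String withDecimal = "First Sheet\r\nTotal\t13.0\r\nSheet3\r\n";
        String withoutDecimal = "First Sheet\nTotal\t13\nSheet3\n";
        System.out.println(looksLikeSampleSS(withDecimal));
        System.out.println(looksLikeSampleSS(withoutDecimal));
    }
}
```

Both sample strings pass the check, while a value such as 13.5 would not, because the optional group only allows zeros after the decimal point.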

Example 13 with ExcelExtractor

Use of org.apache.poi.hssf.extractor.ExcelExtractor in the Apache POI project.

Class TestXSSFExcelExtractor, method testComparedToOLE2.

/**
 * Test that we return pretty much the same as
 * ExcelExtractor does, when we're both passed
 * the same file, just saved as xls and xlsx
 */
public void testComparedToOLE2() throws IOException {
    // A fairly simple file - ooxml
    XSSFExcelExtractor ooxmlExtractor = getExtractor("SampleSS.xlsx");
    ExcelExtractor ole2Extractor = new ExcelExtractor(HSSFTestDataSamples.openSampleWorkbook("SampleSS.xls"));
    Map<String, POITextExtractor> extractors = new HashMap<String, POITextExtractor>();
    extractors.put("SampleSS.xlsx", ooxmlExtractor);
    extractors.put("SampleSS.xls", ole2Extractor);
    for (final Entry<String, POITextExtractor> e : extractors.entrySet()) {
        String filename = e.getKey();
        POITextExtractor extractor = e.getValue();
        String text = extractor.getText().replaceAll("[\r\t]", "");
        assertStartsWith(filename, text, "First Sheet\nTest spreadsheet\n2nd row2nd row 2nd column\n");
        Pattern pattern = Pattern.compile(".*13(\\.0+)?\\s+Sheet3.*", Pattern.DOTALL);
        Matcher m = pattern.matcher(text);
        assertTrue(filename, m.matches());
    }
    ole2Extractor.close();
    ooxmlExtractor.close();
}
Also used: Pattern (java.util.regex.Pattern), HashMap (java.util.HashMap), POITextExtractor (org.apache.poi.POITextExtractor), Matcher (java.util.regex.Matcher), ExcelExtractor (org.apache.poi.hssf.extractor.ExcelExtractor)

Example 14 with ExcelExtractor

Use of org.apache.poi.hssf.extractor.ExcelExtractor in the Apache POI project.

Class TestExtractorFactory, method testPreferEventBased.

@Test
public void testPreferEventBased() throws Exception {
    assertFalse(ExtractorFactory.getPreferEventExtractor());
    assertFalse(ExtractorFactory.getThreadPrefersEventExtractors());
    assertNull(ExtractorFactory.getAllThreadsPreferEventExtractors());
    ExtractorFactory.setThreadPrefersEventExtractors(true);
    assertTrue(ExtractorFactory.getPreferEventExtractor());
    assertTrue(ExtractorFactory.getThreadPrefersEventExtractors());
    assertNull(ExtractorFactory.getAllThreadsPreferEventExtractors());
    ExtractorFactory.setAllThreadsPreferEventExtractors(false);
    assertFalse(ExtractorFactory.getPreferEventExtractor());
    assertTrue(ExtractorFactory.getThreadPrefersEventExtractors());
    assertEquals(Boolean.FALSE, ExtractorFactory.getAllThreadsPreferEventExtractors());
    ExtractorFactory.setAllThreadsPreferEventExtractors(null);
    assertTrue(ExtractorFactory.getPreferEventExtractor());
    assertTrue(ExtractorFactory.getThreadPrefersEventExtractors());
    assertNull(ExtractorFactory.getAllThreadsPreferEventExtractors());
    // Check we get the right extractors now
    POITextExtractor extractor = ExtractorFactory.createExtractor(new POIFSFileSystem(new FileInputStream(xls)));
    assertTrue(extractor instanceof EventBasedExcelExtractor);
    extractor.close();
    extractor = ExtractorFactory.createExtractor(new POIFSFileSystem(new FileInputStream(xls)));
    assertTrue(extractor.getText().length() > 200);
    extractor.close();
    extractor = ExtractorFactory.createExtractor(OPCPackage.open(xlsx.toString(), PackageAccess.READ));
    assertTrue(extractor instanceof XSSFEventBasedExcelExtractor);
    extractor.close();
    extractor = ExtractorFactory.createExtractor(OPCPackage.open(xlsx.toString(), PackageAccess.READ));
    assertTrue(extractor.getText().length() > 200);
    extractor.close();
    // Put back to normal
    ExtractorFactory.setThreadPrefersEventExtractors(false);
    assertFalse(ExtractorFactory.getPreferEventExtractor());
    assertFalse(ExtractorFactory.getThreadPrefersEventExtractors());
    assertNull(ExtractorFactory.getAllThreadsPreferEventExtractors());
    // And back
    extractor = ExtractorFactory.createExtractor(new POIFSFileSystem(new FileInputStream(xls)));
    assertTrue(extractor instanceof ExcelExtractor);
    extractor.close();
    extractor = ExtractorFactory.createExtractor(new POIFSFileSystem(new FileInputStream(xls)));
    assertTrue(extractor.getText().length() > 200);
    extractor.close();
    extractor = ExtractorFactory.createExtractor(OPCPackage.open(xlsx.toString(), PackageAccess.READ));
    assertTrue(extractor instanceof XSSFExcelExtractor);
    extractor.close();
    extractor = ExtractorFactory.createExtractor(OPCPackage.open(xlsx.toString()));
    assertTrue(extractor.getText().length() > 200);
    extractor.close();
}
Also used: XSSFEventBasedExcelExtractor (org.apache.poi.xssf.extractor.XSSFEventBasedExcelExtractor), POITextExtractor (org.apache.poi.POITextExtractor), XSSFExcelExtractor (org.apache.poi.xssf.extractor.XSSFExcelExtractor), OPOIFSFileSystem (org.apache.poi.poifs.filesystem.OPOIFSFileSystem), POIFSFileSystem (org.apache.poi.poifs.filesystem.POIFSFileSystem), ExcelExtractor (org.apache.poi.hssf.extractor.ExcelExtractor), EventBasedExcelExtractor (org.apache.poi.hssf.extractor.EventBasedExcelExtractor), FileInputStream (java.io.FileInputStream), Test (org.junit.Test)
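The assertions in testPreferEventBased suggest a simple precedence rule: an all-threads setting, when it is non-null, overrides the per-thread setting, and a null all-threads setting falls back to the per-thread one. The sketch below models that rule using only the standard library. It is an illustration inferred from the test, not POI's actual implementation, and the class and method names are hypothetical.

```java
// Hypothetical model of the precedence the test exercises; not POI code.
final class EventExtractorPreference {
    // Per-thread flag, false until a thread sets it.
    private static final ThreadLocal<Boolean> THREAD_PREFERS =
            ThreadLocal.withInitial(() -> Boolean.FALSE);

    // Global override; null means "no override set".
    private static volatile Boolean allThreadsPrefer = null;

    static void setThreadPrefers(boolean prefer) {
        THREAD_PREFERS.set(prefer);
    }

    static void setAllThreadsPrefer(Boolean prefer) {
        allThreadsPrefer = prefer;
    }

    // The global override wins when present; otherwise use the thread's flag.
    static boolean effective() {
        Boolean global = allThreadsPrefer;
        return global != null ? global : THREAD_PREFERS.get();
    }

    public static void main(String[] args) {
        setThreadPrefers(true);
        setAllThreadsPrefer(false);
        System.out.println(effective()); // false: the global override wins
        setAllThreadsPrefer(null);
        System.out.println(effective()); // true: falls back to the thread flag
        setThreadPrefers(false);
    }
}
```

Keeping the override as a nullable Boolean is what lets "unset" be distinguished from an explicit false, which matches the assertNull and assertEquals(Boolean.FALSE, ...) checks in the test.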

Example 15 with ExcelExtractor

Use of org.apache.poi.hssf.extractor.ExcelExtractor in the Apache POI project.

Class TestExtractorFactory, method testPOIFS.

@Test
public void testPOIFS() throws Exception {
    // Excel
    assertTrue(ExtractorFactory.createExtractor(new POIFSFileSystem(new FileInputStream(xls))) instanceof ExcelExtractor);
    assertTrue(ExtractorFactory.createExtractor(new POIFSFileSystem(new FileInputStream(xls))).getText().length() > 200);
    // Word
    assertTrue(ExtractorFactory.createExtractor(new POIFSFileSystem(new FileInputStream(doc))) instanceof WordExtractor);
    assertTrue(ExtractorFactory.createExtractor(new POIFSFileSystem(new FileInputStream(doc))).getText().length() > 120);
    assertTrue(ExtractorFactory.createExtractor(new POIFSFileSystem(new FileInputStream(doc6))) instanceof Word6Extractor);
    assertTrue(ExtractorFactory.createExtractor(new POIFSFileSystem(new FileInputStream(doc6))).getText().length() > 20);
    assertTrue(ExtractorFactory.createExtractor(new POIFSFileSystem(new FileInputStream(doc95))) instanceof Word6Extractor);
    assertTrue(ExtractorFactory.createExtractor(new POIFSFileSystem(new FileInputStream(doc95))).getText().length() > 120);
    // PowerPoint
    assertTrue(ExtractorFactory.createExtractor(new POIFSFileSystem(new FileInputStream(ppt))) instanceof PowerPointExtractor);
    assertTrue(ExtractorFactory.createExtractor(new POIFSFileSystem(new FileInputStream(ppt))).getText().length() > 120);
    // Visio
    assertTrue(ExtractorFactory.createExtractor(new POIFSFileSystem(new FileInputStream(vsd))) instanceof VisioTextExtractor);
    assertTrue(ExtractorFactory.createExtractor(new POIFSFileSystem(new FileInputStream(vsd))).getText().length() > 50);
    // Publisher
    assertTrue(ExtractorFactory.createExtractor(new POIFSFileSystem(new FileInputStream(pub))) instanceof PublisherTextExtractor);
    assertTrue(ExtractorFactory.createExtractor(new POIFSFileSystem(new FileInputStream(pub))).getText().length() > 50);
    // Outlook msg
    assertTrue(ExtractorFactory.createExtractor(new POIFSFileSystem(new FileInputStream(msg))) instanceof OutlookTextExtactor);
    assertTrue(ExtractorFactory.createExtractor(new POIFSFileSystem(new FileInputStream(msg))).getText().length() > 50);
    // Text
    try {
        ExtractorFactory.createExtractor(new POIFSFileSystem(new FileInputStream(txt)));
        fail();
    } catch (IOException e) {
        // Good
    }
}
Also used: OutlookTextExtactor (org.apache.poi.hsmf.extractor.OutlookTextExtactor), XSSFExcelExtractor (org.apache.poi.xssf.extractor.XSSFExcelExtractor), ExcelExtractor (org.apache.poi.hssf.extractor.ExcelExtractor), XSSFEventBasedExcelExtractor (org.apache.poi.xssf.extractor.XSSFEventBasedExcelExtractor), EventBasedExcelExtractor (org.apache.poi.hssf.extractor.EventBasedExcelExtractor), OPOIFSFileSystem (org.apache.poi.poifs.filesystem.OPOIFSFileSystem), POIFSFileSystem (org.apache.poi.poifs.filesystem.POIFSFileSystem), Word6Extractor (org.apache.poi.hwpf.extractor.Word6Extractor), PowerPointExtractor (org.apache.poi.hslf.extractor.PowerPointExtractor), XSLFPowerPointExtractor (org.apache.poi.xslf.extractor.XSLFPowerPointExtractor), PublisherTextExtractor (org.apache.poi.hpbf.extractor.PublisherTextExtractor), IOException (java.io.IOException), VisioTextExtractor (org.apache.poi.hdgf.extractor.VisioTextExtractor), FileInputStream (java.io.FileInputStream), WordExtractor (org.apache.poi.hwpf.extractor.WordExtractor), XWPFWordExtractor (org.apache.poi.xwpf.extractor.XWPFWordExtractor), Test (org.junit.Test)

Aggregations

ExcelExtractor (org.apache.poi.hssf.extractor.ExcelExtractor): 18
XSSFExcelExtractor (org.apache.poi.xssf.extractor.XSSFExcelExtractor): 10
POITextExtractor (org.apache.poi.POITextExtractor): 9
EventBasedExcelExtractor (org.apache.poi.hssf.extractor.EventBasedExcelExtractor): 8
WordExtractor (org.apache.poi.hwpf.extractor.WordExtractor): 8
Test (org.junit.Test): 8
IOException (java.io.IOException): 7
PowerPointExtractor (org.apache.poi.hslf.extractor.PowerPointExtractor): 7
XSSFEventBasedExcelExtractor (org.apache.poi.xssf.extractor.XSSFEventBasedExcelExtractor): 7
XWPFWordExtractor (org.apache.poi.xwpf.extractor.XWPFWordExtractor): 7
OutlookTextExtactor (org.apache.poi.hsmf.extractor.OutlookTextExtactor): 6
POIFSFileSystem (org.apache.poi.poifs.filesystem.POIFSFileSystem): 6
XSLFPowerPointExtractor (org.apache.poi.xslf.extractor.XSLFPowerPointExtractor): 6
FileInputStream (java.io.FileInputStream): 5
VisioTextExtractor (org.apache.poi.hdgf.extractor.VisioTextExtractor): 4
PublisherTextExtractor (org.apache.poi.hpbf.extractor.PublisherTextExtractor): 4
Word6Extractor (org.apache.poi.hwpf.extractor.Word6Extractor): 4
InputStream (java.io.InputStream): 3
OPOIFSFileSystem (org.apache.poi.poifs.filesystem.OPOIFSFileSystem): 3
ByteArrayInputStream (java.io.ByteArrayInputStream): 2