
Example 16 with POITextExtractor

Use of org.apache.poi.POITextExtractor in the Apache POI project.

Class TestZipPackage, method testZipEntityExpansionSharedStringTable.

@Test
public void testZipEntityExpansionSharedStringTable() throws Exception {
    Workbook wb = WorkbookFactory.create(XSSFTestDataSamples.openSamplePackage("poc-shared-strings.xlsx"));
    wb.close();
    POITextExtractor extractor = ExtractorFactory.createExtractor(HSSFTestDataSamples.getSampleFile("poc-shared-strings.xlsx"));
    try {
        assertNotNull(extractor);
        try {
            extractor.getText();
        } catch (IllegalStateException e) {
            // expected due to shared strings expansion
        }
    } finally {
        extractor.close();
    }
}
Also used: POITextExtractor (org.apache.poi.POITextExtractor), Workbook (org.apache.poi.ss.usermodel.Workbook), Test (org.junit.Test)
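
The try/finally around extractor.close() in the example above can also be written with try-with-resources, since POITextExtractor implements java.io.Closeable. The following is only a minimal sketch of that shape: StubExtractor and readQuietly are hypothetical stand-ins, not POI classes, so the snippet runs without POI on the classpath.

```java
import java.io.Closeable;

class TryWithResourcesSketch {
    /** Hypothetical stand-in for a POITextExtractor; not part of POI. */
    static class StubExtractor implements Closeable {
        boolean closed = false;

        String getText() {
            // Simulates the failure that the test above tolerates.
            throw new IllegalStateException("simulated shared strings expansion");
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Calls getText(), ignores the expected exception, and always closes the stub. */
    static void readQuietly(StubExtractor stub) {
        try (StubExtractor extractor = stub) {
            try {
                extractor.getText();
            } catch (IllegalStateException e) {
                // expected in this sketch, as in the test above
            }
        }
    }

    public static void main(String[] args) {
        StubExtractor stub = new StubExtractor();
        readQuietly(stub);
        System.out.println("closed = " + stub.closed);
    }
}
```

The resource is closed when the try block exits, whether getText() returns normally or throws, which is the same guarantee the finally block gives in the original test.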

Example 17 with POITextExtractor

Use of org.apache.poi.POITextExtractor in the Apache POI project.

Class TestXSSFEventBasedExcelExtractor, method testComparedToOLE2.

/**
 * Test that we return pretty much the same as
 * ExcelExtractor does, when we're both passed
 * the same file, just saved as xls and xlsx
 */
@Test
public void testComparedToOLE2() throws Exception {
    // A fairly simple file - ooxml
    XSSFEventBasedExcelExtractor ooxmlExtractor = getExtractor("SampleSS.xlsx");
    ExcelExtractor ole2Extractor = new ExcelExtractor(HSSFTestDataSamples.openSampleWorkbook("SampleSS.xls"));
    POITextExtractor[] extractors = new POITextExtractor[] { ooxmlExtractor, ole2Extractor };
    for (POITextExtractor extractor : extractors) {
        String text = extractor.getText().replaceAll("[\r\t]", "");
        assertStartsWith(text, "First Sheet\nTest spreadsheet\n2nd row2nd row 2nd column\n");
        Pattern pattern = Pattern.compile(".*13(\\.0+)?\\s+Sheet3.*", Pattern.DOTALL);
        Matcher m = pattern.matcher(text);
        assertTrue(m.matches());
    }
    ole2Extractor.close();
    ooxmlExtractor.close();
}
Also used: Pattern (java.util.regex.Pattern), POITextExtractor (org.apache.poi.POITextExtractor), Matcher (java.util.regex.Matcher), ExcelExtractor (org.apache.poi.hssf.extractor.ExcelExtractor), Test (org.junit.Test)
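
The pattern used in the loop accepts both 13 and 13.0 before Sheet3, presumably because the two extractors may format the same numeric cell differently, and Pattern.DOTALL lets .* span the newlines in the extracted text. The sketch below exercises the same pattern and the same [\r\t] stripping on made-up sample strings; the inputs are assumptions for illustration, not real extractor output.

```java
import java.util.regex.Pattern;

class SheetPatternSketch {
    // Same pattern and flag as in the test above.
    static final Pattern PATTERN =
            Pattern.compile(".*13(\\.0+)?\\s+Sheet3.*", Pattern.DOTALL);

    /** Strips carriage returns and tabs, as the test does, then matches the whole text. */
    static boolean matches(String rawText) {
        String text = rawText.replaceAll("[\r\t]", "");
        return PATTERN.matcher(text).matches();
    }

    public static void main(String[] args) {
        // Hypothetical sample strings, not actual extractor output.
        System.out.println(matches("Header\n13\nSheet3\n"));       // integer form
        System.out.println(matches("Header\r\n13.0\nSheet3\n"));   // decimal form
        System.out.println(matches("Header\n13.5\nSheet3\n"));     // different value
    }
}
```

The optional group (\.0+)? only tolerates trailing zeros after the decimal point, so a genuinely different value such as 13.5 does not match.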

Example 18 with POITextExtractor

Use of org.apache.poi.POITextExtractor in the Apache POI project.

Class TestXSSFExcelExtractor, method testComparedToOLE2.

/**
 * Test that we return pretty much the same as
 * ExcelExtractor does, when we're both passed
 * the same file, just saved as xls and xlsx
 */
public void testComparedToOLE2() throws IOException {
    // A fairly simple file - ooxml
    XSSFExcelExtractor ooxmlExtractor = getExtractor("SampleSS.xlsx");
    ExcelExtractor ole2Extractor = new ExcelExtractor(HSSFTestDataSamples.openSampleWorkbook("SampleSS.xls"));
    Map<String, POITextExtractor> extractors = new HashMap<String, POITextExtractor>();
    extractors.put("SampleSS.xlsx", ooxmlExtractor);
    extractors.put("SampleSS.xls", ole2Extractor);
    for (final Entry<String, POITextExtractor> e : extractors.entrySet()) {
        String filename = e.getKey();
        POITextExtractor extractor = e.getValue();
        String text = extractor.getText().replaceAll("[\r\t]", "");
        assertStartsWith(filename, text, "First Sheet\nTest spreadsheet\n2nd row2nd row 2nd column\n");
        Pattern pattern = Pattern.compile(".*13(\\.0+)?\\s+Sheet3.*", Pattern.DOTALL);
        Matcher m = pattern.matcher(text);
        assertTrue(filename, m.matches());
    }
    ole2Extractor.close();
    ooxmlExtractor.close();
}
Also used: Pattern (java.util.regex.Pattern), HashMap (java.util.HashMap), POITextExtractor (org.apache.poi.POITextExtractor), Matcher (java.util.regex.Matcher), ExcelExtractor (org.apache.poi.hssf.extractor.ExcelExtractor)

Aggregations

POITextExtractor (org.apache.poi.POITextExtractor): 18
Test (org.junit.Test): 11
ExcelExtractor (org.apache.poi.hssf.extractor.ExcelExtractor): 9
EventBasedExcelExtractor (org.apache.poi.hssf.extractor.EventBasedExcelExtractor): 6
XSSFEventBasedExcelExtractor (org.apache.poi.xssf.extractor.XSSFEventBasedExcelExtractor): 5
XSSFExcelExtractor (org.apache.poi.xssf.extractor.XSSFExcelExtractor): 5
FileInputStream (java.io.FileInputStream): 4
InputStream (java.io.InputStream): 4
OutlookTextExtactor (org.apache.poi.hsmf.extractor.OutlookTextExtactor): 4
WordExtractor (org.apache.poi.hwpf.extractor.WordExtractor): 4
XWPFWordExtractor (org.apache.poi.xwpf.extractor.XWPFWordExtractor): 4
IOException (java.io.IOException): 3
PowerPointExtractor (org.apache.poi.hslf.extractor.PowerPointExtractor): 3
XSLFPowerPointExtractor (org.apache.poi.xslf.extractor.XSLFPowerPointExtractor): 3
Method (java.lang.reflect.Method): 2
ArrayList (java.util.ArrayList): 2
Matcher (java.util.regex.Matcher): 2
Pattern (java.util.regex.Pattern): 2
POIOLE2TextExtractor (org.apache.poi.POIOLE2TextExtractor): 2
POIXMLException (org.apache.poi.POIXMLException): 2