Example 66 with POIFSFileSystem

Use of org.apache.poi.poifs.filesystem.POIFSFileSystem in project Gargoyle by callakrsos.

Class DocFileParser, method DocFileContentParser.

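public String DocFileContentParser(String fileName) {
    // try-with-resources closes the file stream even if extraction fails
    try (FileInputStream in = new FileInputStream(fileName)) {
        POIFSFileSystem fs = new POIFSFileSystem(in);
        if (fileName.endsWith(".doc")) {
            // Word 97-2003 binary document
            HWPFDocument doc = new HWPFDocument(fs);
            WordExtractor we = new WordExtractor(doc);
            return we.getText();
        } else if (fileName.endsWith(".xls")) {
            // Excel 97-2003 workbook: emit formulas rather than cached results
            ExcelExtractor ex = new ExcelExtractor(fs);
            ex.setFormulasNotResults(true);
            ex.setIncludeSheetNames(true);
            return ex.getText();
        } else if (fileName.endsWith(".ppt")) {
            // PowerPoint 97-2003 presentation
            PowerPointExtractor extractor = new PowerPointExtractor(fs);
            return extractor.getText();
        }
    } catch (Exception e) {
        // Log the cause so unreadable files can be diagnosed
        LOGGER.debug("document file can't be indexed: " + fileName, e);
    }
    // Unsupported extension or failed extraction: nothing to index
    return "";
}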
Also used : HWPFDocument(org.apache.poi.hwpf.HWPFDocument) POIFSFileSystem(org.apache.poi.poifs.filesystem.POIFSFileSystem) ExcelExtractor(org.apache.poi.hssf.extractor.ExcelExtractor) PowerPointExtractor(org.apache.poi.hslf.extractor.PowerPointExtractor) FileInputStream(java.io.FileInputStream) WordExtractor(org.apache.poi.hwpf.extractor.WordExtractor)

Example 67 with POIFSFileSystem

Use of org.apache.poi.poifs.filesystem.POIFSFileSystem in project OpenClinica by OpenClinica.

Class SpreadsheetPreview, method main.

public static void main(String[] args) throws IOException {
    // Simple3.xls , Cancer_History5.xls , Can3.xls
    HSSFWorkbook wb;
    // Close the file stream once the workbook has been read
    try (FileInputStream in = new FileInputStream(new File("/Users/bruceperry/work/OpenClinica-Cancer-Demo-Study/Cancer_History5.xls"))) {
        wb = new HSSFWorkbook(new POIFSFileSystem(in));
    }
    SpreadsheetPreview prev = new SpreadsheetPreview();
    // createSectionsMap createItemsMap
    Map map = prev.createItemsOrSectionMap(wb, "sections");
    Map.Entry me;
    for (Iterator iter = map.entrySet().iterator(); iter.hasNext(); ) {
        me = (Map.Entry) iter.next();
        Map mp = (Map) me.getValue();
        // logger.info(me.getKey() + ": " + me.getValue());
    }
}
Also used : POIFSFileSystem(org.apache.poi.poifs.filesystem.POIFSFileSystem) Iterator(java.util.Iterator) File(java.io.File) HashMap(java.util.HashMap) TreeMap(java.util.TreeMap) Map(java.util.Map) SortedMap(java.util.SortedMap) FileInputStream(java.io.FileInputStream) HSSFWorkbook(org.apache.poi.hssf.usermodel.HSSFWorkbook)

Example 68 with POIFSFileSystem

Use of org.apache.poi.poifs.filesystem.POIFSFileSystem in project OpenClinica by OpenClinica.

Class SpreadsheetPreviewNw, method main.

public static void main(String[] args) throws IOException {
    // Simple3.xls , Cancer_History5.xls , Can3.xls
    HSSFWorkbook wb;
    // Close the file stream once the workbook has been read
    try (FileInputStream in = new FileInputStream(new File("d:/23TestComma2.xls"))) {
        wb = new HSSFWorkbook(new POIFSFileSystem(in));
    }
    SpreadsheetPreviewNw spnw = new SpreadsheetPreviewNw();
    // createSectionsMap createItemsMap
    Map map = spnw.createCrfMetaObject(wb);
    // Map map2 = spnw.createItemsOrSectionMap(wb,"items");
    Map.Entry me;
    for (Iterator iter = map.entrySet().iterator(); iter.hasNext(); ) {
        me = (Map.Entry) iter.next();
        // Dump each key and its nested map for inspection
        logger.debug(me.getKey() + ": " + me.getValue());
    }
}
Also used : POIFSFileSystem(org.apache.poi.poifs.filesystem.POIFSFileSystem) Iterator(java.util.Iterator) File(java.io.File) HashMap(java.util.HashMap) TreeMap(java.util.TreeMap) Map(java.util.Map) SortedMap(java.util.SortedMap) FileInputStream(java.io.FileInputStream) HSSFWorkbook(org.apache.poi.hssf.usermodel.HSSFWorkbook)

Example 69 with POIFSFileSystem

Use of org.apache.poi.poifs.filesystem.POIFSFileSystem in project h2o-2 by h2oai.

Class XlsParser, method streamParse.

@Override
public DataOut streamParse(final InputStream is, final DataOut dout) throws Exception {
    _dout = dout;
    _firstRow = true;
    try {
        _fs = new POIFSFileSystem(is);
        MissingRecordAwareHSSFListener listener = new MissingRecordAwareHSSFListener(this);
        _formatListener = new FormatTrackingHSSFListener(listener);
        HSSFEventFactory factory = new HSSFEventFactory();
        HSSFRequest request = new HSSFRequest();
        request.addListenerForAllRecords(_formatListener);
        factory.processWorkbookEvents(request, _fs);
    } finally {
        try {
            is.close();
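        } catch (IOException e) {
            // Best-effort close: a failure here is deliberately ignored
            // so that it cannot mask an exception thrown while parsing
        }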
    }
    return dout;
}
Also used : POIFSFileSystem(org.apache.poi.poifs.filesystem.POIFSFileSystem)

Example 70 with POIFSFileSystem

Use of org.apache.poi.poifs.filesystem.POIFSFileSystem in project OpenRefine by OpenRefine.

Class ExcelImporter, method parseOneFile.

@Override
public void parseOneFile(Project project, ProjectMetadata metadata, ImportingJob job, String fileSource, InputStream inputStream, int limit, JSONObject options, List<Exception> exceptions) {
    Workbook wb = null;
    if (!inputStream.markSupported()) {
        inputStream = new PushbackInputStream(inputStream, 8);
    }
    try {
        wb = POIXMLDocument.hasOOXMLHeader(inputStream) ? new XSSFWorkbook(inputStream) : new HSSFWorkbook(new POIFSFileSystem(inputStream));
    } catch (IOException e) {
        exceptions.add(new ImportException("Attempted to parse as an Excel file but failed. " + "Try to use Excel to re-save the file as a different Excel version or as TSV and upload again.", e));
        return;
    } catch (ArrayIndexOutOfBoundsException e) {
        exceptions.add(new ImportException("Attempted to parse file as an Excel file but failed. " + "This is probably caused by a corrupt excel file, or due to the file having previously been created or saved by a non-Microsoft application. " + "Please try opening the file in Microsoft Excel and resaving it, then try re-uploading the file. " + "See https://issues.apache.org/bugzilla/show_bug.cgi?id=48261 for further details", e));
        return;
    } catch (IllegalArgumentException e) {
        exceptions.add(new ImportException("Attempted to parse as an Excel file but failed. " + "Only Excel 97 and later formats are supported.", e));
        return;
    } catch (POIXMLException e) {
        exceptions.add(new ImportException("Attempted to parse as an Excel file but failed. " + "Invalid XML.", e));
        return;
    }
    int[] sheets = JSONUtilities.getIntArray(options, "sheets");
    for (int sheetIndex : sheets) {
        final Sheet sheet = wb.getSheetAt(sheetIndex);
        final int lastRow = sheet.getLastRowNum();
        TableDataReader dataReader = new TableDataReader() {

            int nextRow = 0;

            Map<String, Recon> reconMap = new HashMap<String, Recon>();

            @Override
            public List<Object> getNextRowOfCells() throws IOException {
                if (nextRow > lastRow) {
                    return null;
                }
                List<Object> cells = new ArrayList<Object>();
                org.apache.poi.ss.usermodel.Row row = sheet.getRow(nextRow++);
                if (row != null) {
                    short lastCell = row.getLastCellNum();
                    for (short cellIndex = 0; cellIndex < lastCell; cellIndex++) {
                        Cell cell = null;
                        org.apache.poi.ss.usermodel.Cell sourceCell = row.getCell(cellIndex);
                        if (sourceCell != null) {
                            cell = extractCell(sourceCell, reconMap);
                        }
                        cells.add(cell);
                    }
                }
                return cells;
            }
        };
        TabularImportingParserBase.readTable(project, metadata, job, dataReader, fileSource + "#" + sheet.getSheetName(), limit, options, exceptions);
    }
}
Also used : ArrayList(java.util.ArrayList) POIXMLException(org.apache.poi.POIXMLException) PushbackInputStream(java.io.PushbackInputStream) XSSFWorkbook(org.apache.poi.xssf.usermodel.XSSFWorkbook) Cell(com.google.refine.model.Cell) IOException(java.io.IOException) Workbook(org.apache.poi.ss.usermodel.Workbook) HSSFWorkbook(org.apache.poi.hssf.usermodel.HSSFWorkbook) POIFSFileSystem(org.apache.poi.poifs.filesystem.POIFSFileSystem) JSONObject(org.json.JSONObject) Sheet(org.apache.poi.ss.usermodel.Sheet) Recon(com.google.refine.model.Recon) HashMap(java.util.HashMap) Map(java.util.Map)
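
Example 70 wraps the input in an 8-byte PushbackInputStream so that the file header can be inspected before choosing between XSSFWorkbook and HSSFWorkbook. The sketch below illustrates that peek-and-unread idea with the standard library only. It is not POI's implementation: HeaderSniffer, Kind and sniff are hypothetical names introduced here, and the check assumes only the well-known OLE2 and ZIP signatures. A ZIP signature indicates a ZIP container, which an OOXML file is, but it does not by itself prove the file is a valid .xlsx.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;

// Hypothetical helper, not part of Apache POI.
class HeaderSniffer {

    enum Kind { OLE2, ZIP, UNKNOWN }

    // Signature at the start of OLE2 compound documents (.doc, .xls, .ppt)
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // Local file header signature at the start of ZIP archives (OOXML files are ZIP containers)
    private static final byte[] ZIP_MAGIC = { 0x50, 0x4B, 0x03, 0x04 };

    // Reads up to 8 bytes, pushes them back, and classifies the header.
    // The stream must have a pushback buffer of at least 8 bytes.
    static Kind sniff(PushbackInputStream in) throws IOException {
        byte[] head = new byte[8];
        int n = 0;
        while (n < head.length) {
            int r = in.read(head, n, head.length - n);
            if (r < 0) {
                break; // stream ended before 8 bytes were available
            }
            n += r;
        }
        if (n > 0) {
            in.unread(head, 0, n); // leave the stream positioned at its start
        }
        if (startsWith(head, n, OLE2_MAGIC)) {
            return Kind.OLE2;
        }
        if (startsWith(head, n, ZIP_MAGIC)) {
            return Kind.ZIP;
        }
        return Kind.UNKNOWN;
    }

    // True if the first `len` valid bytes of `data` begin with `magic`
    private static boolean startsWith(byte[] data, int len, byte[] magic) {
        if (len < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if (data[i] != magic[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        byte[] sample = { 0x50, 0x4B, 0x03, 0x04, 0x14, 0x00, 0x06, 0x00 };
        PushbackInputStream in =
            new PushbackInputStream(new ByteArrayInputStream(sample), 8);
        System.out.println(sniff(in));
    }
}
```

Because the bytes are pushed back after the check, the same stream can then be handed to whichever parser matches the detected kind, which is why the pushback buffer must be at least as large as the number of bytes inspected.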
