Example 11 with IndexDocument

use of org.wso2.carbon.registry.indexing.solr.IndexDocument in project carbon-apimgt by wso2.

the class XMLIndexer method getIndexedDocument.

public IndexDocument getIndexedDocument(File2Index fileData) throws SolrException, RegistryException {
    // Intended to register both the content as-is and the text-only content;
    // contentOnly is never populated here, so the third argument below is an empty string
    String xmlAsStr = RegistryUtils.decodeBytes(fileData.data);
    final StringBuilder contentOnly = new StringBuilder();
    IndexDocument indexDocument = new IndexDocument(fileData.path, xmlAsStr, contentOnly.toString());
    Map<String, List<String>> attributes = new HashMap<String, List<String>>();
    attributes.put("path", Arrays.asList(fileData.path));
    if (fileData.mediaType != null) {
        attributes.put(IndexingConstants.FIELD_MEDIA_TYPE, Arrays.asList(fileData.mediaType));
    }
    indexDocument.setFields(attributes);
    return indexDocument;
}
Also used : IndexDocument(org.wso2.carbon.registry.indexing.solr.IndexDocument) HashMap(java.util.HashMap) List(java.util.List)

Example 12 with IndexDocument

use of org.wso2.carbon.registry.indexing.solr.IndexDocument in project carbon-apimgt by wso2.

the class WSDLIndexer method getIndexedDocument.

@Override
public IndexDocument getIndexedDocument(File2Index fileData) throws SolrException, RegistryException {
    if (log.isDebugEnabled()) {
        log.debug("Indexing wsdl");
    }
    String xmlAsStr = RegistryUtils.decodeBytes(fileData.data);
    if (log.isDebugEnabled()) {
        log.debug("Indexing string " + xmlAsStr);
    }
    final StringBuilder contentOnly = new StringBuilder();
    IndexDocument indexDocument = new IndexDocument(fileData.path, xmlAsStr, contentOnly.toString());
    Map<String, List<String>> attributes = new HashMap<String, List<String>>();
    attributes.put("path", Arrays.asList(fileData.path));
    if (fileData.mediaType != null) {
        attributes.put(IndexingConstants.FIELD_MEDIA_TYPE, Arrays.asList(fileData.mediaType));
    }
    indexDocument.setFields(attributes);
    return indexDocument;
}
Also used : IndexDocument(org.wso2.carbon.registry.indexing.solr.IndexDocument) HashMap(java.util.HashMap) List(java.util.List)
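The XMLIndexer and WSDLIndexer examples above build the same field map: the path is always added, and the media type only when it is non-null. Below is a minimal, self-contained sketch of that pattern using only the JDK. The class `IndexFieldsSketch`, the helper `buildFields`, the sample path, and the `"mediaType"` key (standing in for `IndexingConstants.FIELD_MEDIA_TYPE`) are assumptions made for illustration, not part of the carbon-apimgt or carbon-registry API.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class IndexFieldsSketch {

    // Assumed stand-in for IndexingConstants.FIELD_MEDIA_TYPE (hypothetical value).
    static final String FIELD_MEDIA_TYPE = "mediaType";

    // Hypothetical helper: builds the attribute map the indexers pass to setFields(...).
    static Map<String, List<String>> buildFields(String path, String mediaType) {
        Map<String, List<String>> fields = new HashMap<>();
        // The path is always indexed.
        fields.put("path", Collections.singletonList(path));
        // The media type is indexed only when the caller supplied one.
        if (mediaType != null) {
            fields.put(FIELD_MEDIA_TYPE, Collections.singletonList(mediaType));
        }
        return fields;
    }

    public static void main(String[] args) {
        // Without a media type, only the "path" field is present.
        System.out.println(buildFields("/sample/resource.wsdl", null));
        // With a media type, both fields are present.
        System.out.println(buildFields("/sample/resource.wsdl", "application/wsdl"));
    }
}
```

With this shape, a lookup of the media type field returns `null` when no media type was set, which is the case a caller has to handle before calling `get(0)` on the result.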

Example 13 with IndexDocument

use of org.wso2.carbon.registry.indexing.solr.IndexDocument in project carbon-apimgt by wso2.

the class MSPowerpointIndexerTest method testShouldReturnIndexedDocumentWhenParameterCorrect.

@Test
public void testShouldReturnIndexedDocumentWhenParameterCorrect() throws Exception {
    POIFSFileSystem ppExtractor = Mockito.mock(POIFSFileSystem.class);
    PowerPointExtractor powerPointExtractor = Mockito.mock(PowerPointExtractor.class);
    XSLFPowerPointExtractor xslfExtractor = Mockito.mock(XSLFPowerPointExtractor.class);
    XMLSlideShow xmlSlideShow = Mockito.mock(XMLSlideShow.class);
    PowerMockito.whenNew(POIFSFileSystem.class).withParameterTypes(InputStream.class).withArguments(Mockito.any(InputStream.class)).thenThrow(OfficeXmlFileException.class).thenReturn(ppExtractor).thenThrow(APIManagementException.class);
    PowerMockito.whenNew(PowerPointExtractor.class).withParameterTypes(POIFSFileSystem.class).withArguments(ppExtractor).thenReturn(powerPointExtractor);
    PowerMockito.whenNew(XMLSlideShow.class).withParameterTypes(InputStream.class).withArguments(Mockito.any()).thenReturn(xmlSlideShow);
    PowerMockito.whenNew(XSLFPowerPointExtractor.class).withArguments(xmlSlideShow).thenReturn(xslfExtractor);
    Mockito.when(powerPointExtractor.getText()).thenReturn("");
    Mockito.when(xslfExtractor.getText()).thenReturn("");
    MSPowerpointIndexer indexer = new MSPowerpointIndexer();
    IndexDocument ppDoc = indexer.getIndexedDocument(file2Index);
    // should return the default media type when media type is not defined in file2Index
    Assert.assertEquals("application/vnd.ms-powerpoint", ppDoc.getFields().get(IndexingConstants.FIELD_MEDIA_TYPE).get(0));
    // should return the media type we have set in the file2Index
    file2Index.mediaType = "text/html";
    ppDoc = indexer.getIndexedDocument(file2Index);
    Assert.assertEquals("text/html", ppDoc.getFields().get(IndexingConstants.FIELD_MEDIA_TYPE).get(0));
    // should return the media type we have set in the file2Index even if an exception occurred while reading the file
    ppDoc = indexer.getIndexedDocument(file2Index);
    Assert.assertEquals("text/html", ppDoc.getFields().get(IndexingConstants.FIELD_MEDIA_TYPE).get(0));
}
Also used : IndexDocument(org.wso2.carbon.registry.indexing.solr.IndexDocument) POIFSFileSystem(org.apache.poi.poifs.filesystem.POIFSFileSystem) PowerPointExtractor(org.apache.poi.hslf.extractor.PowerPointExtractor) XSLFPowerPointExtractor(org.apache.poi.xslf.extractor.XSLFPowerPointExtractor) XMLSlideShow(org.apache.poi.xslf.usermodel.XMLSlideShow) Test(org.junit.Test) PrepareForTest(org.powermock.core.classloader.annotations.PrepareForTest)

Example 14 with IndexDocument

use of org.wso2.carbon.registry.indexing.solr.IndexDocument in project carbon-apimgt by wso2.

the class WSDLIndexerTest method testShouldReturnIndexedDocumentWhenParameterCorrect.

@Test
public void testShouldReturnIndexedDocumentWhenParameterCorrect() throws RegistryException {
    String mediaType = "application/wsdl";
    final String MEDIA_TYPE = "mediaType";
    AsyncIndexer.File2Index file2Index = new AsyncIndexer.File2Index("".getBytes(), null, "", -1234, "");
    WSDLIndexer indexer = new WSDLIndexer();
    // no media type field should be set when media type is not defined in file2Index
    IndexDocument xml = indexer.getIndexedDocument(file2Index);
    Assert.assertNull(xml.getFields().get(MEDIA_TYPE));
    // should return the media type we have set in the file2Index
    file2Index.mediaType = mediaType;
    xml = indexer.getIndexedDocument(file2Index);
    Assert.assertEquals(mediaType, xml.getFields().get(MEDIA_TYPE).get(0));
}
Also used : IndexDocument(org.wso2.carbon.registry.indexing.solr.IndexDocument) AsyncIndexer(org.wso2.carbon.registry.indexing.AsyncIndexer) Test(org.junit.Test)

Example 15 with IndexDocument

use of org.wso2.carbon.registry.indexing.solr.IndexDocument in project carbon-apimgt by wso2.

the class PlainTextIndexerTest method testShouldReturnIndexedDocumentWhenParameterCorrect.

@Test
public void testShouldReturnIndexedDocumentWhenParameterCorrect() throws RegistryException {
    String mediaType = "text/txt";
    final String MEDIA_TYPE = "mediaType";
    AsyncIndexer.File2Index file2Index = new AsyncIndexer.File2Index("".getBytes(), null, "", -1234, "");
    PlainTextIndexer indexer = new PlainTextIndexer();
    // should return the default media type when media type is not defined in file2Index
    IndexDocument text = indexer.getIndexedDocument(file2Index);
    Assert.assertEquals("text/(.)", text.getFields().get(MEDIA_TYPE).get(0));
    // should return the media type we have set in the file2Index
    file2Index.mediaType = mediaType;
    text = indexer.getIndexedDocument(file2Index);
    Assert.assertEquals(mediaType, text.getFields().get(MEDIA_TYPE).get(0));
}
Also used : IndexDocument(org.wso2.carbon.registry.indexing.solr.IndexDocument) AsyncIndexer(org.wso2.carbon.registry.indexing.AsyncIndexer) Test(org.junit.Test)

Aggregations

IndexDocument (org.wso2.carbon.registry.indexing.solr.IndexDocument) 15
List (java.util.List) 9
HashMap (java.util.HashMap) 7
Test (org.junit.Test) 6
IOException (java.io.IOException) 5
POIFSFileSystem (org.apache.poi.poifs.filesystem.POIFSFileSystem) 4
SolrException (org.apache.solr.common.SolrException) 4
OfficeXmlFileException (org.apache.poi.poifs.filesystem.OfficeXmlFileException) 3
AsyncIndexer (org.wso2.carbon.registry.indexing.AsyncIndexer) 3
ByteArrayInputStream (java.io.ByteArrayInputStream) 2
COSDocument (org.apache.pdfbox.cos.COSDocument) 2
PDFParser (org.apache.pdfbox.pdfparser.PDFParser) 2
PDDocument (org.apache.pdfbox.pdmodel.PDDocument) 2
PDFTextStripper (org.apache.pdfbox.text.PDFTextStripper) 2
PowerPointExtractor (org.apache.poi.hslf.extractor.PowerPointExtractor) 2
WordExtractor (org.apache.poi.hwpf.extractor.WordExtractor) 2
XSLFPowerPointExtractor (org.apache.poi.xslf.extractor.XSLFPowerPointExtractor) 2
XMLSlideShow (org.apache.poi.xslf.usermodel.XMLSlideShow) 2
XWPFWordExtractor (org.apache.poi.xwpf.extractor.XWPFWordExtractor) 2
XWPFDocument (org.apache.poi.xwpf.usermodel.XWPFDocument) 2