
Example 6 with SAXTransformerFactory

use of javax.xml.transform.sax.SAXTransformerFactory in project intellij-community by JetBrains.

the class ExportTestResultsAction method getOutputText.

@Nullable
private String getOutputText(ExportTestResultsConfiguration config) throws IOException, TransformerException, SAXException {
    ExportTestResultsConfiguration.ExportFormat exportFormat = config.getExportFormat();
    SAXTransformerFactory transformerFactory = (SAXTransformerFactory) SAXTransformerFactory.newInstance();
    TransformerHandler handler;
    if (exportFormat == ExportTestResultsConfiguration.ExportFormat.Xml) {
        handler = transformerFactory.newTransformerHandler();
        handler.getTransformer().setOutputProperty(OutputKeys.INDENT, "yes");
        handler.getTransformer().setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
    } else {
        Source xslSource;
        if (config.getExportFormat() == ExportTestResultsConfiguration.ExportFormat.BundledTemplate) {
            URL bundledXsltUrl = getClass().getResource("intellij-export.xsl");
            xslSource = new StreamSource(URLUtil.openStream(bundledXsltUrl));
        } else {
            File xslFile = new File(config.getUserTemplatePath());
            if (!xslFile.isFile()) {
                showBalloon(myRunConfiguration.getProject(), MessageType.ERROR, ExecutionBundle.message("export.test.results.custom.template.not.found", xslFile.getPath()), null);
                return null;
            }
            xslSource = new StreamSource(xslFile);
        }
        handler = transformerFactory.newTransformerHandler(xslSource);
        handler.getTransformer().setParameter("TITLE", ExecutionBundle.message("export.test.results.filename", myRunConfiguration.getName(), myRunConfiguration.getType().getDisplayName()));
    }
    StringWriter w = new StringWriter();
    handler.setResult(new StreamResult(w));
    try {
        TestResultsXmlFormatter.execute(myModel.getRoot(), myRunConfiguration, myModel.getProperties(), handler);
    } catch (ProcessCanceledException e) {
        return null;
    }
    return w.toString();
}
Also used : TransformerHandler(javax.xml.transform.sax.TransformerHandler) StringWriter(java.io.StringWriter) StreamResult(javax.xml.transform.stream.StreamResult) StreamSource(javax.xml.transform.stream.StreamSource) SAXTransformerFactory(javax.xml.transform.sax.SAXTransformerFactory) VirtualFile(com.intellij.openapi.vfs.VirtualFile) File(java.io.File) Source(javax.xml.transform.Source) URL(java.net.URL) Nullable(org.jetbrains.annotations.Nullable)
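The stylesheet branch of the example above can be reduced to a small standalone sketch. It assumes the JDK's built-in XSLT processor is the one returned by TransformerFactory.newInstance(). The class name StylesheetHandlerSketch, the inline stylesheet, and the report element are hypothetical and only stand in for the bundled or user-supplied template. SAX events are fed to the handler by hand where the original delegates to TestResultsXmlFormatter.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical illustration class, not part of any project listed here.
class StylesheetHandlerSketch {

    // Made-up stylesheet: prints the TITLE parameter and the tests attribute as text.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:param name='TITLE'/>"
      + "<xsl:template match='/report'>"
      + "<xsl:value-of select='$TITLE'/>: <xsl:value-of select='@tests'/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static String render(String title, int tests) throws Exception {
        SAXTransformerFactory factory =
            (SAXTransformerFactory) TransformerFactory.newInstance();

        // A handler built from a Source applies that stylesheet to incoming SAX events.
        TransformerHandler handler =
            factory.newTransformerHandler(new StreamSource(new StringReader(XSL)));
        handler.getTransformer().setParameter("TITLE", title);

        StringWriter out = new StringWriter();
        handler.setResult(new StreamResult(out));

        // Emit a one-element document: <report tests="..."/>
        AttributesImpl attrs = new AttributesImpl();
        attrs.addAttribute("", "tests", "tests", "CDATA", Integer.toString(tests));
        handler.startDocument();
        handler.startElement("", "report", "report", attrs);
        handler.endElement("", "report", "report");
        handler.endDocument();

        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(render("Nightly", 3));
    }
}
```

The transformation runs when endDocument() is called, so the StringWriter only holds the full result after that point.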

Example 7 with SAXTransformerFactory

use of javax.xml.transform.sax.SAXTransformerFactory in project tika by apache.

the class TikaResource method produceOutput.

private StreamingOutput produceOutput(final InputStream is, final MultivaluedMap<String, String> httpHeaders, final UriInfo info, final String format) {
    final Parser parser = createParser();
    final Metadata metadata = new Metadata();
    final ParseContext context = new ParseContext();
    fillMetadata(parser, metadata, context, httpHeaders);
    fillParseContext(context, httpHeaders, parser);
    logRequest(LOG, info, metadata);
    return new StreamingOutput() {

        @Override
        public void write(OutputStream outputStream) throws IOException, WebApplicationException {
            Writer writer = new OutputStreamWriter(outputStream, UTF_8);
            ContentHandler content;
            try {
                SAXTransformerFactory factory = (SAXTransformerFactory) SAXTransformerFactory.newInstance();
                TransformerHandler handler = factory.newTransformerHandler();
                handler.getTransformer().setOutputProperty(OutputKeys.METHOD, format);
                handler.getTransformer().setOutputProperty(OutputKeys.INDENT, "yes");
                handler.getTransformer().setOutputProperty(OutputKeys.ENCODING, UTF_8.name());
                handler.setResult(new StreamResult(writer));
                content = new ExpandedTitleContentHandler(handler);
            } catch (TransformerConfigurationException e) {
                throw new WebApplicationException(e);
            }
            parse(parser, LOG, info.getPath(), is, content, metadata, context);
        }
    };
}
Also used : TransformerHandler(javax.xml.transform.sax.TransformerHandler) StreamResult(javax.xml.transform.stream.StreamResult) TransformerConfigurationException(javax.xml.transform.TransformerConfigurationException) WebApplicationException(javax.ws.rs.WebApplicationException) OutputStream(java.io.OutputStream) Metadata(org.apache.tika.metadata.Metadata) SAXTransformerFactory(javax.xml.transform.sax.SAXTransformerFactory) StreamingOutput(javax.ws.rs.core.StreamingOutput) BoilerpipeContentHandler(org.apache.tika.parser.html.BoilerpipeContentHandler) ExpandedTitleContentHandler(org.apache.tika.sax.ExpandedTitleContentHandler) BodyContentHandler(org.apache.tika.sax.BodyContentHandler) ContentHandler(org.xml.sax.ContentHandler) RichTextContentHandler(org.apache.tika.sax.RichTextContentHandler) Parser(org.apache.tika.parser.Parser) HtmlParser(org.apache.tika.parser.html.HtmlParser) AutoDetectParser(org.apache.tika.parser.AutoDetectParser) DigestingParser(org.apache.tika.parser.DigestingParser) ParseContext(org.apache.tika.parser.ParseContext) OutputStreamWriter(java.io.OutputStreamWriter) Writer(java.io.Writer)

Example 8 with SAXTransformerFactory

use of javax.xml.transform.sax.SAXTransformerFactory in project tika by apache.

the class TikaCLI method getTransformerHandler.

/**
 * Returns a transformer handler that serializes incoming SAX events
 * to XHTML or HTML (depending on the given method) using the given output
 * encoding.
 *
 * @see <a href="https://issues.apache.org/jira/browse/TIKA-277">TIKA-277</a>
 * @param output output stream
 * @param method "xml" or "html"
 * @param encoding output encoding,
 *                 or <code>null</code> to leave the transformer's default encoding in place
 * @param prettyPrint whether to indent the serialized output
 * @return transformer handler that writes to the given output stream
 * @throws TransformerConfigurationException
 *         if the transformer can not be created
 */
private static TransformerHandler getTransformerHandler(OutputStream output, String method, String encoding, boolean prettyPrint) throws TransformerConfigurationException {
    SAXTransformerFactory factory = (SAXTransformerFactory) SAXTransformerFactory.newInstance();
    TransformerHandler handler = factory.newTransformerHandler();
    handler.getTransformer().setOutputProperty(OutputKeys.METHOD, method);
    handler.getTransformer().setOutputProperty(OutputKeys.INDENT, prettyPrint ? "yes" : "no");
    if (encoding != null) {
        handler.getTransformer().setOutputProperty(OutputKeys.ENCODING, encoding);
    }
    handler.setResult(new StreamResult(output));
    return handler;
}
Also used : TransformerHandler(javax.xml.transform.sax.TransformerHandler) StreamResult(javax.xml.transform.stream.StreamResult) SAXTransformerFactory(javax.xml.transform.sax.SAXTransformerFactory)
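To see what such an identity handler produces, the following minimal sketch drives one by hand instead of through a Tika parser. It assumes the JDK's default TransformerFactory implementation, which supports the SAX extension. The class name IdentityHandlerSketch and the greeting element are hypothetical, and OMIT_XML_DECLARATION is set here only to keep the output short; the original method does not set it.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical illustration class, not part of Tika.
class IdentityHandlerSketch {

    static String serialize(String method, boolean prettyPrint) throws Exception {
        TransformerFactory base = TransformerFactory.newInstance();
        // The cast is only safe if the factory advertises the SAX extension.
        if (!base.getFeature(SAXTransformerFactory.FEATURE)) {
            throw new IllegalStateException("SAXTransformerFactory is not supported");
        }
        SAXTransformerFactory factory = (SAXTransformerFactory) base;

        // No stylesheet: the handler copies SAX events straight to the result.
        TransformerHandler handler = factory.newTransformerHandler();
        handler.getTransformer().setOutputProperty(OutputKeys.METHOD, method);
        handler.getTransformer().setOutputProperty(OutputKeys.INDENT, prettyPrint ? "yes" : "no");
        handler.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter out = new StringWriter();
        handler.setResult(new StreamResult(out));

        // Emit <greeting>hello</greeting> as SAX events.
        handler.startDocument();
        handler.startElement("", "greeting", "greeting", new AttributesImpl());
        char[] text = "hello".toCharArray();
        handler.characters(text, 0, text.length);
        handler.endElement("", "greeting", "greeting");
        handler.endDocument();

        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serialize("xml", false));
    }
}
```

Output properties have to be set before the first SAX event arrives, which is why both this sketch and the original configure the transformer before calling setResult and handing the handler to a parser.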

Example 9 with SAXTransformerFactory

use of javax.xml.transform.sax.SAXTransformerFactory in project tika by apache.

the class OOXMLParserTest method testEmbeddedPDF.

// TIKA-989:
@Test
public void testEmbeddedPDF() throws Exception {
    Metadata metadata = new Metadata();
    StringWriter sw = new StringWriter();
    SAXTransformerFactory factory = (SAXTransformerFactory) SAXTransformerFactory.newInstance();
    TransformerHandler handler = factory.newTransformerHandler();
    handler.getTransformer().setOutputProperty(OutputKeys.METHOD, "xml");
    handler.getTransformer().setOutputProperty(OutputKeys.INDENT, "no");
    handler.setResult(new StreamResult(sw));
    try (InputStream input = OOXMLParserTest.class.getResourceAsStream("/test-documents/testWORD_embedded_pdf.docx")) {
        new OOXMLParser().parse(input, handler, metadata, new ParseContext());
    }
    String xml = sw.toString();
    int i = xml.indexOf("Here is the pdf file:");
    int j = xml.indexOf("<div class=\"embedded\" id=\"rId5\"/>");
    int k = xml.indexOf("Bye Bye");
    int l = xml.indexOf("<div class=\"embedded\" id=\"rId6\"/>");
    int m = xml.indexOf("Bye for real.");
    assertTrue(i != -1);
    assertTrue(j != -1);
    assertTrue(k != -1);
    assertTrue(l != -1);
    assertTrue(m != -1);
    assertTrue(i < j);
    assertTrue(j < k);
    assertTrue(k < l);
    assertTrue(l < m);
}
Also used : TransformerHandler(javax.xml.transform.sax.TransformerHandler) StringWriter(java.io.StringWriter) StreamResult(javax.xml.transform.stream.StreamResult) TikaInputStream(org.apache.tika.io.TikaInputStream) InputStream(java.io.InputStream) Metadata(org.apache.tika.metadata.Metadata) SAXTransformerFactory(javax.xml.transform.sax.SAXTransformerFactory) ParseContext(org.apache.tika.parser.ParseContext) ExcelParserTest(org.apache.tika.parser.microsoft.ExcelParserTest) Test(org.junit.Test) TikaTest(org.apache.tika.TikaTest) WordParserTest(org.apache.tika.parser.microsoft.WordParserTest)

Example 10 with SAXTransformerFactory

use of javax.xml.transform.sax.SAXTransformerFactory in project tika by apache.

the class OutlookParserTest method testOutlookHTMLVersion.

@Test
public void testOutlookHTMLVersion() throws Exception {
    Parser parser = new AutoDetectParser();
    Metadata metadata = new Metadata();
    // Check the HTML version
    StringWriter sw = new StringWriter();
    SAXTransformerFactory factory = (SAXTransformerFactory) SAXTransformerFactory.newInstance();
    TransformerHandler handler = factory.newTransformerHandler();
    handler.getTransformer().setOutputProperty(OutputKeys.METHOD, "xml");
    handler.getTransformer().setOutputProperty(OutputKeys.INDENT, "yes");
    handler.setResult(new StreamResult(sw));
    try (InputStream stream = OutlookParserTest.class.getResourceAsStream("/test-documents/testMSG_chinese.msg")) {
        parser.parse(stream, handler, metadata, new ParseContext());
    }
    // As the HTML version should have been processed, ensure
    //  we got some of the links
    String content = sw.toString();
    assertContains("<dd>tests.chang@fengttt.com</dd>", content);
    assertContains("<p>Alfresco MSG format testing", content);
    assertContains("<li>1", content);
    assertContains("<li>2", content);
    // Make sure we don't have nested html docs
    assertEquals(2, content.split("<body>").length);
    assertEquals(2, content.split("<\\/body>").length);
    // Make sure that the Chinese actually came through
    assertContains("張毓倫", metadata.get(TikaCoreProperties.CREATOR));
    assertContains("陳惠珍", content);
    assertEquals("tests.chang@fengttt.com", metadata.get(Message.MESSAGE_TO_EMAIL));
    assertEquals("Tests Chang@FT (張毓倫)", metadata.get(Office.MAPI_FROM_REPRESENTING_NAME));
    assertEquals("/O=FT GROUP/OU=FT/CN=RECIPIENTS/CN=LYDIACHANG", metadata.get(Office.MAPI_FROM_REPRESENTING_EMAIL));
}
Also used : TransformerHandler(javax.xml.transform.sax.TransformerHandler) StringWriter(java.io.StringWriter) StreamResult(javax.xml.transform.stream.StreamResult) InputStream(java.io.InputStream) Metadata(org.apache.tika.metadata.Metadata) SAXTransformerFactory(javax.xml.transform.sax.SAXTransformerFactory) ParseContext(org.apache.tika.parser.ParseContext) AutoDetectParser(org.apache.tika.parser.AutoDetectParser) Parser(org.apache.tika.parser.Parser) Test(org.junit.Test) TikaTest(org.apache.tika.TikaTest)

Aggregations

SAXTransformerFactory (javax.xml.transform.sax.SAXTransformerFactory) 27
TransformerHandler (javax.xml.transform.sax.TransformerHandler) 22
StreamResult (javax.xml.transform.stream.StreamResult) 22
AttributesImpl (org.xml.sax.helpers.AttributesImpl) 7
InputStream (java.io.InputStream) 6
StringWriter (java.io.StringWriter) 6
Metadata (org.apache.tika.metadata.Metadata) 6
File (java.io.File) 5
IOException (java.io.IOException) 5
Transformer (javax.xml.transform.Transformer) 5
ParseContext (org.apache.tika.parser.ParseContext) 5
OutputStream (java.io.OutputStream) 4
OutputStreamWriter (java.io.OutputStreamWriter) 4
TikaTest (org.apache.tika.TikaTest) 4
AutoDetectParser (org.apache.tika.parser.AutoDetectParser) 4
Test (org.junit.Test) 4
ByteArrayOutputStream (java.io.ByteArrayOutputStream) 3
FileOutputStream (java.io.FileOutputStream) 3
Map (java.util.Map) 3
TransformerConfigurationException (javax.xml.transform.TransformerConfigurationException) 3