Example 1 with HtmlFormatFamily

Use of org.talend.dataprep.schema.html.HtmlFormatFamily in project data-prep by Talend.

From the class CompositeFormatDetectorTest, method guess_html_format_fail.

@Test
public void guess_html_format_fail() throws Exception {
    // Despite its .html extension, this resource is expected NOT to be detected as HTML.
    String fileName = "html/foo.html";
    // The metadata is prepared with a UTF-16 encoding, but it is not passed to detect() below.
    DataSetMetadata datasetMetadata = ioTestUtils.getSimpleDataSetMetadata();
    datasetMetadata.setEncoding("UTF-16");
    Format actual = formatDetector.detect(this.getClass().getResourceAsStream(fileName));
    assertFalse(actual.getFormatFamily() instanceof HtmlFormatFamily);
}
Also used: HtmlFormatFamily (org.talend.dataprep.schema.html.HtmlFormatFamily), DataSetMetadata (org.talend.dataprep.api.dataset.DataSetMetadata), Test (org.junit.Test)

Example 2 with HtmlFormatFamily

Use of org.talend.dataprep.schema.html.HtmlFormatFamily in project data-prep by Talend.

From the class CompositeFormatDetectorTest, method guess_html_format_success.

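@Test
public void guess_html_format_success() throws Exception {
    // Resource with an .xls extension that the detector is expected to classify as HTML.
    String fileName = "html/sales-force.xls";
    // The metadata is prepared with a UTF-16 encoding, but it is not passed to detect() below.
    DataSetMetadata datasetMetadata = ioTestUtils.getSimpleDataSetMetadata();
    datasetMetadata.setEncoding("UTF-16");
    // Tika's HtmlEncodingDetector runs on the same resource; the resulting charset is not asserted.
    Charset charset = new HtmlEncodingDetector().detect(this.getClass().getResourceAsStream(fileName), new Metadata());
    Format actual = formatDetector.detect(this.getClass().getResourceAsStream(fileName));
    // The detected format must be HTML and must report UTF-16 as its encoding.
    assertTrue(actual.getFormatFamily() instanceof HtmlFormatFamily);
    assertTrue(StringUtils.equals("UTF-16", actual.getEncoding()));
}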
Also used: Metadata (org.apache.tika.metadata.Metadata), DataSetMetadata (org.talend.dataprep.api.dataset.DataSetMetadata), HtmlFormatFamily (org.talend.dataprep.schema.html.HtmlFormatFamily), Charset (java.nio.charset.Charset), HtmlEncodingDetector (org.apache.tika.parser.html.HtmlEncodingDetector), Test (org.junit.Test)
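
The success test above expects a file with an .xls extension to be recognised as HTML with a UTF-16 encoding, which suggests that detection looks at content, not the file name. As a rough illustration of that idea only, here is a minimal JDK-only sketch that picks a charset from a byte-order mark and then looks for HTML markers in the decoded head of the file. The class HtmlSniffer and its methods are hypothetical names invented for this sketch; this is not how data-prep's format detector or Tika's HtmlEncodingDetector is implemented.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/** Hypothetical helper for illustration; not part of data-prep or Tika. */
final class HtmlSniffer {

    private HtmlSniffer() {
    }

    /** Picks a charset from a UTF-16 byte-order mark, falling back to UTF-8 (an assumption). */
    static Charset charsetFromBom(byte[] head) {
        if (head.length >= 2) {
            int b0 = head[0] & 0xFF;
            int b1 = head[1] & 0xFF;
            if (b0 == 0xFE && b1 == 0xFF) {
                return StandardCharsets.UTF_16BE;
            }
            if (b0 == 0xFF && b1 == 0xFE) {
                return StandardCharsets.UTF_16LE;
            }
        }
        return StandardCharsets.UTF_8;
    }

    /** Returns true when the decoded head of the content contains an HTML marker. */
    static boolean looksLikeHtml(byte[] head) {
        String text = new String(head, charsetFromBom(head)).toLowerCase(Locale.ROOT);
        return text.contains("<html") || text.contains("<!doctype html");
    }
}
```

With this sketch, UTF-16 content that starts with a byte-order mark and contains an `<html` tag is reported as HTML regardless of the file extension, while plain comma-separated text is not.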

Aggregations

Test (org.junit.Test): 2
DataSetMetadata (org.talend.dataprep.api.dataset.DataSetMetadata): 2
HtmlFormatFamily (org.talend.dataprep.schema.html.HtmlFormatFamily): 2
Charset (java.nio.charset.Charset): 1
Metadata (org.apache.tika.metadata.Metadata): 1
HtmlEncodingDetector (org.apache.tika.parser.html.HtmlEncodingDetector): 1