
Example 6 with FileMetadata

Use of com.thinkbiganalytics.kylo.metadata.file.FileMetadata in project kylo by Teradata.

Class TikaParserTest, method test.

@Test
public void test() throws Exception {
    String file = "test.xml";
    FileMetadata type = FileMetadataService.detectFromStream(getFile(file).getInputStream(), file);
    Assert.assertEquals("application/xml", type.getMimeType());
    Assert.assertEquals("catalog", type.getProperties().get("rowTag"));
    file = "test2.xml";
    type = FileMetadataService.detectFromStream(getFile(file).getInputStream(), file);
    Assert.assertEquals("application/xml", type.getMimeType());
    Assert.assertEquals("some-books", type.getProperties().get("rowTag"));
    file = "MOCK_DATA.commasep.txt";
    type = FileMetadataService.detectFromStream(getFile(file).getInputStream(), file);
    Assert.assertEquals("text/csv", type.getMimeType());
    Assert.assertEquals(",", type.getProperties().get("delimiter"));
    file = "MOCK_DATA.pipe.txt";
    type = FileMetadataService.detectFromStream(getFile(file).getInputStream(), file);
    Assert.assertEquals("text/csv", type.getMimeType());
    Assert.assertEquals("|", type.getProperties().get("delimiter"));
    file = "test.parquet";
    type = FileMetadataService.detectFromStream(getFile(file).getInputStream(), file);
    Assert.assertEquals("application/parquet", type.getMimeType());
    file = "books1.json";
    type = FileMetadataService.detectFromStream(getFile(file).getInputStream(), file);
    Assert.assertEquals("application/json", type.getMimeType());
    file = "userdata1.avro";
    type = FileMetadataService.detectFromStream(getFile(file).getInputStream(), file);
    Assert.assertEquals("application/avro", type.getMimeType());
    file = "userdata1.orva";
    type = FileMetadataService.detectFromStream(getFile(file).getInputStream(), file);
    Assert.assertEquals("application/avro", type.getMimeType());
    file = "userdata1_orc";
    type = FileMetadataService.detectFromStream(getFile(file).getInputStream(), file);
    Assert.assertEquals("application/orc", type.getMimeType());
}
Also used: FileMetadata (com.thinkbiganalytics.kylo.metadata.file.FileMetadata), Test (org.junit.Test)
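
The test above expects a "delimiter" property for text/csv files. The listing does not show how that value is computed, so the following is only a minimal sketch of one common heuristic, not Kylo's implementation: sample the first lines and keep the candidate character that occurs the same non-zero number of times on each of them. The class name DelimiterSketch, the candidate set, and the sample size are all assumptions made for illustration, and quoted fields are deliberately ignored.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

public class DelimiterSketch {

    // Hypothetical candidate set and sample size; neither is taken from Kylo.
    private static final char[] CANDIDATES = {',', '|', '\t', ';'};
    private static final int SAMPLE_LINES = 10;

    /**
     * Returns a property map holding "delimiter" when one candidate occurs
     * the same non-zero number of times on every sampled line; otherwise
     * the map is empty. Quoted fields are not handled.
     */
    public static Map<String, String> detect(InputStream in) throws IOException {
        int[] expected = new int[CANDIDATES.length];
        boolean[] consistent = new boolean[CANDIDATES.length];
        Arrays.fill(consistent, true);

        BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        int sampled = 0;
        String line;
        while (sampled < SAMPLE_LINES && (line = reader.readLine()) != null) {
            if (line.isEmpty()) {
                continue; // blank lines carry no information
            }
            for (int i = 0; i < CANDIDATES.length; i++) {
                int count = countOf(line, CANDIDATES[i]);
                if (sampled == 0) {
                    expected[i] = count;       // first line sets the expected count
                } else if (count != expected[i]) {
                    consistent[i] = false;     // count differs between lines
                }
            }
            sampled++;
        }

        // Among consistent candidates, prefer the one with the most occurrences.
        int best = -1;
        for (int i = 0; i < CANDIDATES.length; i++) {
            if (consistent[i] && expected[i] > 0
                    && (best < 0 || expected[i] > expected[best])) {
                best = i;
            }
        }
        Map<String, String> properties = new LinkedHashMap<>();
        if (best >= 0) {
            properties.put("delimiter", String.valueOf(CANDIDATES[best]));
        }
        return properties;
    }

    private static int countOf(String line, char c) {
        int n = 0;
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) == c) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) throws IOException {
        String csv = "id,name\n1,Ann\n2,Bob\n";
        InputStream in = new ByteArrayInputStream(csv.getBytes(StandardCharsets.UTF_8));
        System.out.println(detect(in).get("delimiter"));
    }
}
```

Requiring the same count on every line is what lets a pipe-separated file whose header happens to contain a comma still resolve to "|". A real detector would also need to deal with quoting and character-set detection.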

Example 7 with FileMetadata

Use of com.thinkbiganalytics.kylo.metadata.file.FileMetadata in project kylo by Teradata.

Class SparkFileMetadataExtractor, method parse.

@Override
public List<FileMetadata> parse(String[] filePaths) {
    // Load each path through the custom "file.metadata" data source, one Dataset per file.
    List<Dataset> dataFrameList = new ArrayList<>();
    for (String path : filePaths) {
        Dataset df = (Dataset) sqlContext.read().format("com.thinkbiganalytics.spark.file.metadata").load(path);
        dataFrameList.add(df);
    }
    // Merge the per-file results into a single Dataset.
    Dataset unionDf = unionAll(dataFrameList);
    // Map each row onto the FileMetadata bean and collect the results on the driver.
    Encoder<FileMetadata> encoder = Encoders.bean(FileMetadata.class);
    Dataset fileData = unionDf.as(encoder);
    return fileData.collectAsList();
}
Also used: Dataset (org.apache.spark.sql.Dataset), ArrayList (java.util.ArrayList), FileMetadata (com.thinkbiganalytics.kylo.metadata.file.FileMetadata)
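
The parse method relies on a unionAll helper whose body is not part of this listing. A plausible shape for such a helper, sketched here under that assumption, is a left-to-right fold that combines the per-file results pairwise. To keep the sketch runnable without Spark, plain lists stand in for Dataset and a concatenating concat method stands in for Dataset's union (which likewise keeps duplicate rows); UnionAllSketch and concat are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;

public class UnionAllSketch {

    /** Folds the parts from left to right using the supplied union operation. */
    public static <T> T unionAll(List<T> parts, BinaryOperator<T> union) {
        if (parts.isEmpty()) {
            throw new IllegalArgumentException("nothing to union");
        }
        T result = parts.get(0);
        for (T part : parts.subList(1, parts.size())) {
            result = union.apply(result, part);
        }
        return result;
    }

    /** Stand-in for a Dataset union: appends the rows of b to a, keeping duplicates. */
    public static List<String> concat(List<String> a, List<String> b) {
        List<String> out = new ArrayList<>(a);
        out.addAll(b);
        return out;
    }

    public static void main(String[] args) {
        // One single-row "dataset" per input file.
        List<List<String>> perFile = Arrays.asList(
                Arrays.asList("a.csv"),
                Arrays.asList("b.xml"),
                Arrays.asList("c.json"));
        System.out.println(unionAll(perFile, UnionAllSketch::concat));
    }
}
```

Rejecting an empty list is a choice made for this sketch; the original helper may treat the case of no input paths differently.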

Aggregations

FileMetadata (com.thinkbiganalytics.kylo.metadata.file.FileMetadata): 7
ArrayList (java.util.ArrayList): 3
Test (org.junit.Test): 3
Dataset (org.apache.spark.sql.Dataset): 2
FileParserFactory (com.thinkbiganalytics.discovery.FileParserFactory): 1
SchemaParserDescriptor (com.thinkbiganalytics.discovery.model.SchemaParserDescriptor): 1
FileSchemaParser (com.thinkbiganalytics.discovery.parser.FileSchemaParser): 1
SampleFileSparkScript (com.thinkbiganalytics.discovery.parser.SampleFileSparkScript): 1
SparkFileSchemaParser (com.thinkbiganalytics.discovery.parser.SparkFileSchemaParser): 1
SchemaParserAnnotationTransformer (com.thinkbiganalytics.discovery.rest.controller.SchemaParserAnnotationTransformer): 1
AbstractTransformResponseModifier (com.thinkbiganalytics.spark.rest.controller.AbstractTransformResponseModifier): 1
FileMetadataResponse (com.thinkbiganalytics.spark.rest.model.FileMetadataResponse): 1
ModifiedTransformResponse (com.thinkbiganalytics.spark.rest.model.ModifiedTransformResponse): 1
Charset (java.nio.charset.Charset): 1
Arrays (java.util.Arrays): 1
List (java.util.List): 1
Map (java.util.Map): 1
Optional (java.util.Optional): 1
Collectors (java.util.stream.Collectors): 1
QName (javax.xml.namespace.QName): 1