Example 1 with WorkUnitState

Use of gobblin.configuration.WorkUnitState in the incubator-gobblin project by apache.

From class JsonStringToJsonIntermediateConverterTest, method setUp.

@BeforeClass
public static void setUp() throws SchemaConversionException {
    converter = new JsonStringToJsonIntermediateConverter();
    WorkUnitState workUnit = new WorkUnitState();
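    // Set the flag explicitly: the earlier getPropAsBoolean(key, true) call only read the value and discarded the result.
    workUnit.setProp("gobblin.converter.jsonStringToJsonIntermediate.unpackComplexSchemas", true);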
    converter.convertSchema("[]", workUnit);
    Type jsonType = new TypeToken<JsonObject>() {
    }.getType();
    Gson gson = new Gson();
    testJsonData = gson.fromJson(new InputStreamReader(JsonStringToJsonIntermediateConverterTest.class.getResourceAsStream("/converter/JsonStringToJsonIntermediateConverter.json")), jsonType);
}
Also used : Type(java.lang.reflect.Type) InputStreamReader(java.io.InputStreamReader) WorkUnitState(gobblin.configuration.WorkUnitState) JsonObject(com.google.gson.JsonObject) Gson(com.google.gson.Gson) BeforeClass(org.testng.annotations.BeforeClass)
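
A note on property getters that take a default: in Properties-style configuration objects, such a getter normally returns the default when the key is absent but does not store it. The sketch below illustrates this with plain java.util.Properties as a stand-in. The class PropertyDefaultSketch and its helper getPropAsBoolean are hypothetical and are not Gobblin's implementation. It assumes, without confirmation from the listing, that WorkUnitState follows the same convention.

```java
import java.util.Properties;

// Hypothetical stand-in; not part of Gobblin.
final class PropertyDefaultSketch {

    static final String KEY = "gobblin.converter.jsonStringToJsonIntermediate.unpackComplexSchemas";

    // Reads a boolean property, falling back to def when the key is absent.
    // The fallback is returned only; it is never written back into props.
    static boolean getPropAsBoolean(Properties props, String key, boolean def) {
        return Boolean.parseBoolean(props.getProperty(key, String.valueOf(def)));
    }

    public static void main(String[] args) {
        Properties props = new Properties();

        // Reading with a default yields the default...
        boolean value = getPropAsBoolean(props, KEY, true);
        System.out.println("value read with default: " + value);

        // ...but the key is still absent afterwards.
        System.out.println("key stored after read: " + props.containsKey(KEY));

        // An explicit set is needed to make the value visible to other readers.
        props.setProperty(KEY, "true");
        System.out.println("key stored after set: " + props.containsKey(KEY));
    }
}
```

If a test needs a component to see a specific value, setting the property before handing the state over is the safer choice than relying on the getter's default.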

Example 2 with WorkUnitState

Use of gobblin.configuration.WorkUnitState in the incubator-gobblin project by apache.

From class GobblinMCEPublisherTest, method testPublishGMCEWithoutFile.

@Test(dependsOnMethods = { "testPublishGMCEForAvro" })
public void testPublishGMCEWithoutFile() throws IOException {
    GobblinMCEProducer producer = Mockito.mock(GobblinMCEProducer.class);
    Mockito.doCallRealMethod().when(producer).getGobblinMetadataChangeEvent(anyMap(), anyList(), anyList(), anyMap(), any(), any());
    Mockito.doAnswer(new Answer() {

        @Override
        public Object answer(InvocationOnMock invocation) throws Throwable {
            Object[] args = invocation.getArguments();
            GobblinMetadataChangeEvent gmce = producer.getGobblinMetadataChangeEvent((Map<Path, Metrics>) args[0], null, null, (Map<String, String>) args[1], OperationType.change_property, SchemaSource.NONE);
            Assert.assertEquals(gmce.getNewFiles().size(), 1);
            Assert.assertNull(gmce.getOldFiles());
            Assert.assertNull(gmce.getOldFilePrefixes());
            Assert.assertEquals(gmce.getOperationType(), OperationType.change_property);
            return null;
        }
    }).when(producer).sendGMCE(anyMap(), anyList(), anyList(), anyMap(), any(), any());
    WorkUnitState state = new WorkUnitState();
    setGMCEPublisherStateWithoutNewFile(state);
    Mockito.doCallRealMethod().when(producer).setState(state);
    producer.setState(state);
    GobblinMCEPublisher publisher = new GobblinMCEPublisher(state, producer);
    publisher.publishData(Arrays.asList(state));
}
Also used : Answer(org.mockito.stubbing.Answer) GobblinMetadataChangeEvent(org.apache.gobblin.metadata.GobblinMetadataChangeEvent) InvocationOnMock(org.mockito.invocation.InvocationOnMock) WorkUnitState(gobblin.configuration.WorkUnitState) GobblinMCEProducer(org.apache.gobblin.iceberg.GobblinMCEProducer) Map(java.util.Map) Test(org.testng.annotations.Test)

Example 3 with WorkUnitState

Use of gobblin.configuration.WorkUnitState in the incubator-gobblin project by apache.

From class GobblinMCEPublisherTest, method testPublishGMCEForAvro.

@Test
public void testPublishGMCEForAvro() throws IOException {
    GobblinMCEProducer producer = Mockito.mock(GobblinMCEProducer.class);
    Mockito.doCallRealMethod().when(producer).getGobblinMetadataChangeEvent(anyMap(), anyList(), anyList(), anyMap(), any(), any());
    Mockito.doAnswer(new Answer() {

        @Override
        public Object answer(InvocationOnMock invocation) throws Throwable {
            Object[] args = invocation.getArguments();
            GobblinMetadataChangeEvent gmce = producer.getGobblinMetadataChangeEvent((Map<Path, Metrics>) args[0], null, null, (Map<String, String>) args[1], OperationType.add_files, SchemaSource.SCHEMAREGISTRY);
            Assert.assertEquals(gmce.getNewFiles().size(), 1);
            FileSystem fs = FileSystem.get(new Configuration());
            Assert.assertEquals(gmce.getNewFiles().get(0).getFilePath(), new Path(dataFile.getAbsolutePath()).makeQualified(fs.getUri(), new Path("/")).toString());
            return null;
        }
    }).when(producer).sendGMCE(anyMap(), anyList(), anyList(), anyMap(), any(), any());
    WorkUnitState state = new WorkUnitState();
    setGMCEPublisherStateForAvroFile(state);
    Mockito.doCallRealMethod().when(producer).setState(state);
    producer.setState(state);
    GobblinMCEPublisher publisher = new GobblinMCEPublisher(state, producer);
    publisher.publishData(Arrays.asList(state));
}
Also used : Path(org.apache.hadoop.fs.Path) GobblinMetadataChangeEvent(org.apache.gobblin.metadata.GobblinMetadataChangeEvent) Configuration(org.apache.hadoop.conf.Configuration) WorkUnitState(gobblin.configuration.WorkUnitState) GobblinMCEProducer(org.apache.gobblin.iceberg.GobblinMCEProducer) Answer(org.mockito.stubbing.Answer) InvocationOnMock(org.mockito.invocation.InvocationOnMock) FileSystem(org.apache.hadoop.fs.FileSystem) Map(java.util.Map) Test(org.testng.annotations.Test)

Example 4 with WorkUnitState

Use of gobblin.configuration.WorkUnitState in the incubator-gobblin project by apache.

From class GrokToJsonConverterTest, method convertOutputWithNullableFields.

@Test
public void convertOutputWithNullableFields() throws Exception {
    JsonParser parser = new JsonParser();
    String inputRecord = "10.121.123.104 - - [01/Nov/2012:21:01:17 +0100] \"GET /cpc/auth.do?loginsetup=true&targetPage=%2Fcpc%2F HTTP/1.1\" 302 466";
    JsonElement jsonElement = parser.parse(new InputStreamReader(getClass().getResourceAsStream("/converter/grok/schemaWithNullableFields.json")));
    JsonArray outputSchema = jsonElement.getAsJsonArray();
    GrokToJsonConverter grokToJsonConverter = new GrokToJsonConverter();
    WorkUnitState workUnitState = new WorkUnitState();
    workUnitState.setProp(GrokToJsonConverter.GROK_PATTERN, "^%{IPORHOST:clientip} (?:-|%{USER:ident}) (?:-|%{USER:auth}) \\[%{HTTPDATE:timestamp}\\] \\\"(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|-)\\\" %{NUMBER:response} (?:-|%{NUMBER:bytes})");
    grokToJsonConverter.init(workUnitState);
    JsonObject actual = grokToJsonConverter.convertRecord(outputSchema, inputRecord, workUnitState).iterator().next();
    JsonObject expected = parser.parse(new InputStreamReader(getClass().getResourceAsStream("/converter/grok/convertedRecord.json"))).getAsJsonObject();
    Assert.assertEquals(actual, expected);
    grokToJsonConverter.close();
}
Also used : JsonArray(com.google.gson.JsonArray) InputStreamReader(java.io.InputStreamReader) JsonElement(com.google.gson.JsonElement) WorkUnitState(gobblin.configuration.WorkUnitState) JsonObject(com.google.gson.JsonObject) JsonParser(com.google.gson.JsonParser) Test(org.testng.annotations.Test)

Example 5 with WorkUnitState

Use of gobblin.configuration.WorkUnitState in the incubator-gobblin project by apache.

From class GrokToJsonConverterTest, method convertWithNullStringSet.

@Test
public void convertWithNullStringSet() throws Exception {
    JsonParser parser = new JsonParser();
    String inputRecord = "79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be mybucket [06/Feb/2014:00:00:38 +0000] 192.0.2.3 79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be 3E57427F3EXAMPLE REST.GET.VERSIONING - \"GET /mybucket?versioning HTTP/1.1\" 200 - 113 - 7 - \"-\" \"S3Console/0.4\" -";
    JsonElement jsonElement = parser.parse(new InputStreamReader(getClass().getResourceAsStream("/converter/grok/s3AccessLogSchema.json")));
    JsonArray outputSchema = jsonElement.getAsJsonArray();
    GrokToJsonConverter grokToJsonConverter = new GrokToJsonConverter();
    WorkUnitState workUnitState = new WorkUnitState();
    // Grok expression was taken from https://github.com/logstash-plugins/logstash-patterns-core/blob/master/patterns/aws
    workUnitState.setProp(GrokToJsonConverter.GROK_PATTERN, "%{WORD:owner} %{NOTSPACE:bucket} \\[%{HTTPDATE:timestamp}\\] %{IP:clientip} %{NOTSPACE:requester} %{NOTSPACE:request_id} %{NOTSPACE:operation} %{NOTSPACE:key} (?:\"(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})\"|-) (?:%{INT:response:int}|-) (?:-|%{NOTSPACE:error_code}) (?:%{INT:bytes:int}|-) (?:%{INT:object_size:int}|-) (?:%{INT:request_time_ms:int}|-) (?:%{INT:turnaround_time_ms:int}|-) (?:%{QS:referrer}|-) (?:\"?%{QS:agent}\"?|-) (?:-|%{NOTSPACE:version_id})");
    workUnitState.setProp(GrokToJsonConverter.NULLSTRING_REGEXES, "[\\s-]");
    grokToJsonConverter.init(workUnitState);
    JsonObject actual = grokToJsonConverter.convertRecord(outputSchema, inputRecord, workUnitState).iterator().next();
    JsonObject expected = parser.parse(new InputStreamReader(getClass().getResourceAsStream("/converter/grok/convertedS3AccessLogRecord.json"))).getAsJsonObject();
    Assert.assertEquals(actual, expected);
    grokToJsonConverter.close();
}
Also used : JsonArray(com.google.gson.JsonArray) InputStreamReader(java.io.InputStreamReader) JsonElement(com.google.gson.JsonElement) WorkUnitState(gobblin.configuration.WorkUnitState) JsonObject(com.google.gson.JsonObject) JsonParser(com.google.gson.JsonParser) Test(org.testng.annotations.Test)

Aggregations

WorkUnitState (gobblin.configuration.WorkUnitState): 7
Test (org.testng.annotations.Test): 6
JsonObject (com.google.gson.JsonObject): 4
InputStreamReader (java.io.InputStreamReader): 4
JsonArray (com.google.gson.JsonArray): 3
JsonElement (com.google.gson.JsonElement): 3
JsonParser (com.google.gson.JsonParser): 3
Map (java.util.Map): 3
GobblinMCEProducer (org.apache.gobblin.iceberg.GobblinMCEProducer): 3
GobblinMetadataChangeEvent (org.apache.gobblin.metadata.GobblinMetadataChangeEvent): 3
InvocationOnMock (org.mockito.invocation.InvocationOnMock): 3
Answer (org.mockito.stubbing.Answer): 3
Configuration (org.apache.hadoop.conf.Configuration): 2
FileSystem (org.apache.hadoop.fs.FileSystem): 2
Path (org.apache.hadoop.fs.Path): 2
Gson (com.google.gson.Gson): 1
Type (java.lang.reflect.Type): 1
Charset (java.nio.charset.Charset): 1
CharsetEncoder (java.nio.charset.CharsetEncoder): 1
BeforeClass (org.testng.annotations.BeforeClass): 1