Example 6 with Relation

use of co.cask.cdap.data2.metadata.lineage.Relation in project cdap by caskdata.

the class LineageCollapserTest method testCollapseMulti.

@Test
public void testCollapseMulti() throws Exception {
    Set<Relation> relations = ImmutableSet.of(
        new Relation(data1, flow1, AccessType.READ, runId1, ImmutableSet.of(flowlet11)),
        new Relation(data1, flow1, AccessType.WRITE, runId1, ImmutableSet.of(flowlet11)),
        new Relation(data1, flow1, AccessType.READ, runId1, ImmutableSet.of(flowlet12)),
        new Relation(data1, flow2, AccessType.READ, runId1, ImmutableSet.of(flowlet11)),
        new Relation(data1, flow2, AccessType.READ, runId1, ImmutableSet.of(flowlet11)),
        new Relation(data2, flow1, AccessType.READ, runId1, ImmutableSet.of(flowlet11)),
        new Relation(data2, flow1, AccessType.READ, runId1, ImmutableSet.of(flowlet11)));
    // Collapse on access
    Assert.assertEquals(
        toSet(
            new CollapsedRelation(data1, flow1, toSet(AccessType.READ, AccessType.WRITE), toSet(runId1), toSet(flowlet11)),
            new CollapsedRelation(data1, flow1, toSet(AccessType.READ), toSet(runId1), toSet(flowlet12)),
            new CollapsedRelation(data1, flow2, toSet(AccessType.READ), toSet(runId1), toSet(flowlet11)),
            new CollapsedRelation(data2, flow1, toSet(AccessType.READ), toSet(runId1), toSet(flowlet11))),
        LineageCollapser.collapseRelations(relations, ImmutableSet.of(CollapseType.ACCESS)));
}
Also used : CollapsedRelation(co.cask.cdap.data2.metadata.lineage.CollapsedRelation) Relation(co.cask.cdap.data2.metadata.lineage.Relation) Test(org.junit.Test)

Example 7 with Relation

use of co.cask.cdap.data2.metadata.lineage.Relation in project cdap by caskdata.

the class LineageCollapserTest method testCollapseAccess.

@Test
public void testCollapseAccess() throws Exception {
    Set<Relation> relations = ImmutableSet.of(
        new Relation(data1, flow1, AccessType.READ, runId1, ImmutableSet.of(flowlet11)),
        new Relation(data1, flow1, AccessType.WRITE, runId1, ImmutableSet.of(flowlet11)),
        new Relation(data1, flow1, AccessType.READ, runId1, ImmutableSet.of(flowlet12)));
    // Collapse on access
    Assert.assertEquals(
        toSet(
            new CollapsedRelation(data1, flow1, toSet(AccessType.READ, AccessType.WRITE), toSet(runId1), toSet(flowlet11)),
            new CollapsedRelation(data1, flow1, toSet(AccessType.READ), toSet(runId1), toSet(flowlet12))),
        LineageCollapser.collapseRelations(relations, ImmutableSet.of(CollapseType.ACCESS)));
}
Also used : CollapsedRelation(co.cask.cdap.data2.metadata.lineage.CollapsedRelation) Relation(co.cask.cdap.data2.metadata.lineage.Relation) Test(org.junit.Test)
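
The two collapse tests above expect relations that differ only in access type to be merged into one entry carrying a set of access types. The following is a minimal, self-contained sketch of that grouping idea, not the CDAP implementation: the class CollapseOnAccessSketch, the simplified Rel type (plain strings in place of the CDAP id types), and the method collapseOnAccess are all hypothetical names introduced here for illustration.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CollapseOnAccessSketch {
    enum Access { READ, WRITE, UNKNOWN }

    // Hypothetical stand-in for Relation: strings replace the CDAP id types,
    // and a single component replaces the set of components.
    static final class Rel {
        final String data;
        final String program;
        final Access access;
        final String run;
        final String component;

        Rel(String data, String program, Access access, String run, String component) {
            this.data = data;
            this.program = program;
            this.access = access;
            this.run = run;
            this.component = component;
        }
    }

    // Groups relations by every field except access and collects the access
    // types seen for each group. Duplicate relations add nothing new.
    static Map<List<String>, Set<Access>> collapseOnAccess(Collection<Rel> relations) {
        Map<List<String>, Set<Access>> collapsed = new LinkedHashMap<>();
        for (Rel r : relations) {
            List<String> key = Arrays.asList(r.data, r.program, r.run, r.component);
            collapsed.computeIfAbsent(key, k -> EnumSet.noneOf(Access.class)).add(r.access);
        }
        return collapsed;
    }

    public static void main(String[] args) {
        // Same three relations as testCollapseAccess, in simplified form.
        List<Rel> relations = Arrays.asList(
            new Rel("data1", "flow1", Access.READ, "runId1", "flowlet11"),
            new Rel("data1", "flow1", Access.WRITE, "runId1", "flowlet11"),
            new Rel("data1", "flow1", Access.READ, "runId1", "flowlet12"));
        collapseOnAccess(relations).forEach((key, access) -> System.out.println(key + " -> " + access));
    }
}
```

With the three relations from testCollapseAccess, the sketch yields two groups: one for flowlet11 with both READ and WRITE, and one for flowlet12 with READ only, which mirrors the two CollapsedRelation values the test asserts.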

Example 8 with Relation

use of co.cask.cdap.data2.metadata.lineage.Relation in project cdap by caskdata.

the class LineageAdmin method getRollupRelations.

private Multimap<RelationKey, Relation> getRollupRelations(Multimap<RelationKey, Relation> relations, Map<ProgramRunId, RunRecordMeta> runRecordMap, Map<String, ProgramRunId> workflowIdMap) throws NotFoundException {
    Multimap<RelationKey, Relation> relationsNew = HashMultimap.create();
    for (Map.Entry<RelationKey, Collection<Relation>> entry : relations.asMap().entrySet()) {
        for (Relation relation : entry.getValue()) {
            ProgramRunId workflowProgramRunId = getWorkflowProgramRunid(relation, runRecordMap, workflowIdMap);
            if (workflowProgramRunId == null) {
                relationsNew.put(entry.getKey(), relation);
            } else {
                ProgramId workflowProgramId = new ProgramId(workflowProgramRunId.getNamespace(), workflowProgramRunId.getApplication(), workflowProgramRunId.getType(), workflowProgramRunId.getProgram());
                Relation workflowRelation;
                NamespacedEntityId data = relation.getData();
                if (data instanceof DatasetId) {
                    workflowRelation = new Relation((DatasetId) data, workflowProgramId, relation.getAccess(), RunIds.fromString(workflowProgramRunId.getRun()));
                } else {
                    workflowRelation = new Relation((StreamId) data, workflowProgramId, relation.getAccess(), RunIds.fromString(workflowProgramRunId.getRun()));
                }
                relationsNew.put(entry.getKey(), workflowRelation);
            }
        }
    }
    return relationsNew;
}
Also used : Relation(co.cask.cdap.data2.metadata.lineage.Relation) NamespacedEntityId(co.cask.cdap.proto.id.NamespacedEntityId) StreamId(co.cask.cdap.proto.id.StreamId) Collection(java.util.Collection) ProgramRunId(co.cask.cdap.proto.id.ProgramRunId) ProgramId(co.cask.cdap.proto.id.ProgramId) HashMap(java.util.HashMap) Map(java.util.Map) DatasetId(co.cask.cdap.proto.id.DatasetId)

Example 9 with Relation

use of co.cask.cdap.data2.metadata.lineage.Relation in project cdap by caskdata.

the class LineageAdmin method getWorkflowIds.

private Set<String> getWorkflowIds(Multimap<RelationKey, Relation> relations, Map<ProgramRunId, RunRecordMeta> runRecordMap) throws NotFoundException {
    final Set<String> workflowIDs = new HashSet<>();
    for (Relation relation : Iterables.concat(relations.values())) {
        RunRecordMeta runRecord = runRecordMap.get(new ProgramRunId(relation.getProgram().getNamespace(), relation.getProgram().getApplication(), relation.getProgram().getType(), relation.getProgram().getProgram(), relation.getRun().getId()));
        if (runRecord != null && runRecord.getProperties().containsKey("workflowrunid")) {
            String workflowRunId = runRecord.getProperties().get("workflowrunid");
            workflowIDs.add(workflowRunId);
        }
    }
    return workflowIDs;
}
Also used : Relation(co.cask.cdap.data2.metadata.lineage.Relation) RunRecordMeta(co.cask.cdap.internal.app.store.RunRecordMeta) ProgramRunId(co.cask.cdap.proto.id.ProgramRunId) HashSet(java.util.HashSet)
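
Examples 8 and 9 together suggest a rollup step: a program run whose run record carries a "workflowrunid" property is reported under its parent workflow run, and any other run is kept as-is. The sketch below illustrates only that lookup, under the assumption that the property alone decides the rollup; the body of getWorkflowProgramRunid is not shown in the source, so the class WorkflowRollupSketch, the method rollUp, and the use of plain strings for run ids are hypothetical simplifications.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class WorkflowRollupSketch {
    // Property key taken from Example 9.
    static final String WORKFLOW_RUN_ID = "workflowrunid";

    // Maps each run id to the run id that lineage should report:
    // the parent workflow run if the run record names one, else the run itself.
    // runProperties is a hypothetical stand-in for the run record map.
    static Map<String, String> rollUp(Collection<String> runIds,
                                      Map<String, Map<String, String>> runProperties) {
        Map<String, String> rolledUp = new LinkedHashMap<>();
        for (String runId : runIds) {
            Map<String, String> properties = runProperties.get(runId);
            String workflowRun = properties == null ? null : properties.get(WORKFLOW_RUN_ID);
            rolledUp.put(runId, workflowRun != null ? workflowRun : runId);
        }
        return rolledUp;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> runProperties = new HashMap<>();
        // run-a was started by a workflow; run-b was started directly.
        runProperties.put("run-a", Collections.singletonMap(WORKFLOW_RUN_ID, "workflow-run-1"));
        runProperties.put("run-b", Collections.<String, String>emptyMap());

        System.out.println(rollUp(Arrays.asList("run-a", "run-b", "run-c"), runProperties));
    }
}
```

Runs with no run record at all (run-c above) fall through unchanged, matching the null check on runRecord in Example 9.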

Example 10 with Relation

use of co.cask.cdap.data2.metadata.lineage.Relation in project cdap by caskdata.

the class LineageTestRun method testFlowLineage.

@Test
public void testFlowLineage() throws Exception {
    NamespaceId namespace = new NamespaceId("testFlowLineage");
    ApplicationId app = namespace.app(AllProgramsApp.NAME);
    ProgramId flow = app.flow(AllProgramsApp.NoOpFlow.NAME);
    DatasetId dataset = namespace.dataset(AllProgramsApp.DATASET_NAME);
    StreamId stream = namespace.stream(AllProgramsApp.STREAM_NAME);
    namespaceClient.create(new NamespaceMeta.Builder().setName(namespace).build());
    try {
        appClient.deploy(namespace, createAppJarFile(AllProgramsApp.class));
        // Add metadata to application
        ImmutableMap<String, String> appProperties = ImmutableMap.of("app-key1", "app-value1");
        addProperties(app, appProperties);
        Assert.assertEquals(appProperties, getProperties(app, MetadataScope.USER));
        ImmutableSet<String> appTags = ImmutableSet.of("app-tag1");
        addTags(app, appTags);
        Assert.assertEquals(appTags, getTags(app, MetadataScope.USER));
        // Add metadata to flow
        ImmutableMap<String, String> flowProperties = ImmutableMap.of("flow-key1", "flow-value1");
        addProperties(flow, flowProperties);
        Assert.assertEquals(flowProperties, getProperties(flow, MetadataScope.USER));
        ImmutableSet<String> flowTags = ImmutableSet.of("flow-tag1", "flow-tag2");
        addTags(flow, flowTags);
        Assert.assertEquals(flowTags, getTags(flow, MetadataScope.USER));
        // Add metadata to dataset
        ImmutableMap<String, String> dataProperties = ImmutableMap.of("data-key1", "data-value1");
        addProperties(dataset, dataProperties);
        Assert.assertEquals(dataProperties, getProperties(dataset, MetadataScope.USER));
        ImmutableSet<String> dataTags = ImmutableSet.of("data-tag1", "data-tag2");
        addTags(dataset, dataTags);
        Assert.assertEquals(dataTags, getTags(dataset, MetadataScope.USER));
        // Add metadata to stream
        ImmutableMap<String, String> streamProperties = ImmutableMap.of("stream-key1", "stream-value1");
        addProperties(stream, streamProperties);
        Assert.assertEquals(streamProperties, getProperties(stream, MetadataScope.USER));
        ImmutableSet<String> streamTags = ImmutableSet.of("stream-tag1", "stream-tag2");
        addTags(stream, streamTags);
        Assert.assertEquals(streamTags, getTags(stream, MetadataScope.USER));
        long startTime = TimeMathParser.nowInSeconds();
        RunId flowRunId = runAndWait(flow);
        // Wait a few seconds so that the stop time (in seconds) is greater than the start time.
        TimeUnit.SECONDS.sleep(2);
        waitForStop(flow, true);
        long stopTime = TimeMathParser.nowInSeconds();
        // Fetch dataset lineage
        LineageRecord lineage = fetchLineage(dataset, startTime, stopTime, 10);
        LineageRecord expected = LineageSerializer.toLineageRecord(
            startTime, stopTime,
            new Lineage(ImmutableSet.of(
                new Relation(dataset, flow, AccessType.UNKNOWN, flowRunId, ImmutableSet.of(flow.flowlet(AllProgramsApp.A.NAME))),
                new Relation(stream, flow, AccessType.READ, flowRunId, ImmutableSet.of(flow.flowlet(AllProgramsApp.A.NAME))))),
            Collections.<CollapseType>emptySet());
        Assert.assertEquals(expected, lineage);
        // Fetch dataset lineage with time strings
        lineage = fetchLineage(dataset, "now-1h", "now+1h", 10);
        Assert.assertEquals(expected.getRelations(), lineage.getRelations());
        // Fetch stream lineage
        lineage = fetchLineage(stream, startTime, stopTime, 10);
        // same as dataset's lineage
        Assert.assertEquals(expected, lineage);
        // Fetch stream lineage with time strings
        lineage = fetchLineage(stream, "now-1h", "now+1h", 10);
        // same as dataset's lineage
        Assert.assertEquals(expected.getRelations(), lineage.getRelations());
        // Assert metadata
        // Id.Flow needs conversion to Id.Program JIRA - CDAP-3658
        Assert.assertEquals(
            toSet(
                new MetadataRecord(app, MetadataScope.USER, appProperties, appTags),
                new MetadataRecord(flow, MetadataScope.USER, flowProperties, flowTags),
                new MetadataRecord(dataset, MetadataScope.USER, dataProperties, dataTags),
                new MetadataRecord(stream, MetadataScope.USER, streamProperties, streamTags)),
            fetchRunMetadata(flow.run(flowRunId.getId())));
        // Assert that a time range after the flow run returns no results
        long laterStartTime = stopTime + 1000;
        long laterEndTime = stopTime + 5000;
        // Fetch stream lineage
        lineage = fetchLineage(stream, laterStartTime, laterEndTime, 10);
        Assert.assertEquals(LineageSerializer.toLineageRecord(laterStartTime, laterEndTime, new Lineage(ImmutableSet.<Relation>of()), Collections.<CollapseType>emptySet()), lineage);
        // Assert that a time range before the flow run returns no results
        long earlierStartTime = startTime - 5000;
        long earlierEndTime = startTime - 1000;
        // Fetch stream lineage
        lineage = fetchLineage(stream, earlierStartTime, earlierEndTime, 10);
        Assert.assertEquals(LineageSerializer.toLineageRecord(earlierStartTime, earlierEndTime, new Lineage(ImmutableSet.<Relation>of()), Collections.<CollapseType>emptySet()), lineage);
        // Test bad time ranges
        fetchLineage(dataset, "sometime", "sometime", 10, BadRequestException.class);
        fetchLineage(dataset, "now+1h", "now-1h", 10, BadRequestException.class);
        // Test non-existent run
        assertRunMetadataNotFound(flow.run(RunIds.generate(1000).getId()));
    } finally {
        namespaceClient.delete(namespace);
    }
}
Also used : StreamId(co.cask.cdap.proto.id.StreamId) CollapseType(co.cask.cdap.proto.metadata.lineage.CollapseType) Lineage(co.cask.cdap.data2.metadata.lineage.Lineage) AllProgramsApp(co.cask.cdap.client.app.AllProgramsApp) ProgramId(co.cask.cdap.proto.id.ProgramId) DatasetId(co.cask.cdap.proto.id.DatasetId) Relation(co.cask.cdap.data2.metadata.lineage.Relation) LineageRecord(co.cask.cdap.proto.metadata.lineage.LineageRecord) NamespaceMeta(co.cask.cdap.proto.NamespaceMeta) NamespaceId(co.cask.cdap.proto.id.NamespaceId) ApplicationId(co.cask.cdap.proto.id.ApplicationId) RunId(org.apache.twill.api.RunId) MetadataRecord(co.cask.cdap.common.metadata.MetadataRecord) Test(org.junit.Test)

Aggregations

Relation (co.cask.cdap.data2.metadata.lineage.Relation) 20
Test (org.junit.Test) 15
Lineage (co.cask.cdap.data2.metadata.lineage.Lineage) 10
ProgramRunId (co.cask.cdap.proto.id.ProgramRunId) 10
Store (co.cask.cdap.app.store.Store) 7
LineageStore (co.cask.cdap.data2.metadata.lineage.LineageStore) 7
MetadataStore (co.cask.cdap.data2.metadata.store.MetadataStore) 7
DatasetId (co.cask.cdap.proto.id.DatasetId) 7
ProgramId (co.cask.cdap.proto.id.ProgramId) 6
RunId (org.apache.twill.api.RunId) 6
CollapsedRelation (co.cask.cdap.data2.metadata.lineage.CollapsedRelation) 5
StreamId (co.cask.cdap.proto.id.StreamId) 5
MetadataRecord (co.cask.cdap.common.metadata.MetadataRecord) 4
NamespaceId (co.cask.cdap.proto.id.NamespaceId) 4
HashSet (java.util.HashSet) 4
ApplicationId (co.cask.cdap.proto.id.ApplicationId) 3
NamespacedEntityId (co.cask.cdap.proto.id.NamespacedEntityId) 3
HashMap (java.util.HashMap) 3
AllProgramsApp (co.cask.cdap.client.app.AllProgramsApp) 2
RunRecordMeta (co.cask.cdap.internal.app.store.RunRecordMeta) 2