Apache NiFi - Data Provenance



Apache NiFi logs and stores information about every event that occurs on the data ingested in a flow. The data provenance repository stores this information, and the UI lets you search it. Data provenance can be viewed for the whole NiFi instance or for an individual processor.

[Image: Data Provenance]

The following table lists the fields shown in the NiFi Data Provenance event list:

| S.No. | Field Name | Description |
|---|---|---|
| 1 | Date/Time | Date and time of the event. |
| 2 | Type | Type of the event, such as 'CREATE'. |
| 3 | FlowFileUuid | UUID of the flowfile on which the event was performed. |
| 4 | Size | Size of the flowfile. |
| 5 | Component Name | Name of the component that performed the event. |
| 6 | Component Type | Type of the component. |
| 7 | Show lineage | The last column holds the show lineage icon, which displays the flowfile lineage, as shown in the image below. |
[Image: Lineage Icon]
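
To make the shape of one row in the event list concrete, here is a minimal Python sketch. The `ProvenanceEvent` class, the `events_of_type` helper, and the sample values are hypothetical and exist only for illustration; they are not part of any NiFi API.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ProvenanceEvent:
    """One row of the event list (hypothetical model, not a NiFi class)."""
    date_time: datetime      # Date/Time column
    event_type: str          # Type column, e.g. "CREATE"
    flowfile_uuid: str       # FlowFileUuid column
    size_bytes: int          # Size column, assumed here to be in bytes
    component_name: str      # Component Name column
    component_type: str      # Component Type column


def events_of_type(events, event_type):
    """Return the events whose type matches, ignoring case."""
    wanted = event_type.upper()
    return [e for e in events if e.event_type.upper() == wanted]


# Made-up sample rows, used only to exercise the helper.
sample_events = [
    ProvenanceEvent(datetime(2024, 1, 1, 12, 0, 0), "CREATE",
                    "00000000-0000-0000-0000-000000000001", 1024,
                    "GenerateFlowFile", "GenerateFlowFile"),
    ProvenanceEvent(datetime(2024, 1, 1, 12, 0, 1), "DROP",
                    "00000000-0000-0000-0000-000000000001", 1024,
                    "LogAttribute", "LogAttribute"),
]

created = events_of_type(sample_events, "create")
print(len(created), created[0].component_name)
```

Filtering on the event type mirrors what the search in the provenance UI does for the Type field, although the real search runs against the indexed repository rather than an in-memory list.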

To get more information about an event, click the information icon in the first column of the NiFi Data Provenance UI.

The following properties in the nifi.properties file control the NiFi Data Provenance repository:

| S.No. | Property Name | Default Value | Description |
|---|---|---|---|
| 1 | nifi.provenance.repository.directory.default | ./provenance_repository | Default path of the provenance repository. |
| 2 | nifi.provenance.repository.max.storage.time | 24 hours | Maximum time that provenance data is retained. |
| 3 | nifi.provenance.repository.max.storage.size | 1 GB | Maximum amount of storage that provenance data may use. |
| 4 | nifi.provenance.repository.rollover.time | 30 secs | Time to wait before rolling over the latest provenance data. |
| 5 | nifi.provenance.repository.rollover.size | 100 MB | Amount of provenance data to roll over at a time. |
| 6 | nifi.provenance.repository.indexed.fields | EventType, FlowFileUUID, Filename, ProcessorID, Relationship | Comma-separated list of fields that are indexed and can be searched. |
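
As a rough illustration of how these settings appear in nifi.properties, the Python sketch below embeds the defaults from the table above as sample text and extracts the provenance entries. The `provenance_settings` helper is hypothetical, and the parsing is deliberately simplified: it assumes plain `key=value` lines and ignores features of the Java properties format such as line continuations.

```python
PROVENANCE_PREFIX = "nifi.provenance.repository."

# Sample text using the default values listed in the table above.
SAMPLE_PROPERTIES = """\
# Provenance repository settings
nifi.provenance.repository.directory.default=./provenance_repository
nifi.provenance.repository.max.storage.time=24 hours
nifi.provenance.repository.max.storage.size=1 GB
nifi.provenance.repository.rollover.time=30 secs
nifi.provenance.repository.rollover.size=100 MB
nifi.provenance.repository.indexed.fields=EventType, FlowFileUUID, Filename, ProcessorID, Relationship
"""


def provenance_settings(properties_text):
    """Collect provenance repository settings, keyed by the name after the prefix."""
    settings = {}
    for raw_line in properties_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith(("#", "!")):
            continue
        key, separator, value = line.partition("=")
        if not separator:
            continue  # not a key=value line
        key = key.strip()
        if key.startswith(PROVENANCE_PREFIX):
            settings[key[len(PROVENANCE_PREFIX):]] = value.strip()
    return settings


settings = provenance_settings(SAMPLE_PROPERTIES)
indexed_fields = [f.strip() for f in settings["indexed.fields"].split(",")]

for name, value in settings.items():
    print(f"{name} = {value}")
```

To inspect a real installation, the same helper could be given the contents of the nifi.properties file from the NiFi conf directory instead of the sample text.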