What is Tuple ID Propagation?

Tuple ID propagation is a technique for performing a virtual join, which greatly improves the efficiency of multirelational classification. Instead of physically joining relations, they are virtually joined by attaching the IDs of target tuples to tuples in non-target relations.

In this way, predicates can be evaluated as if a physical join had been performed. Tuple ID propagation is flexible and efficient, because IDs can easily be propagated between any two relations, requiring only a small amount of data transfer and some extra storage space. As a result, predicates in different relations can be evaluated with little redundant computation.
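The idea can be illustrated with a minimal Python sketch. The toy Loan (target) and Account (non-target) tuples, the `frequency` attribute, and the helper names `propagate_ids` and `covered_ids` are assumptions made for illustration only; they are not taken from any particular implementation.

```python
from collections import defaultdict

# Hypothetical toy data: Loan is the target relation, Account is non-target.
loans = [
    {"loan_ID": 1, "account_ID": 124, "label": "+"},
    {"loan_ID": 2, "account_ID": 124, "label": "+"},
    {"loan_ID": 3, "account_ID": 108, "label": "-"},
    {"loan_ID": 4, "account_ID": 45, "label": "-"},
]
accounts = [
    {"account_ID": 124, "frequency": "monthly"},
    {"account_ID": 108, "frequency": "weekly"},
    {"account_ID": 45, "frequency": "monthly"},
    {"account_ID": 67, "frequency": "weekly"},
]


def propagate_ids(target, target_key, join_attr, other, other_attr):
    """Return, for each tuple of `other`, the set of target IDs joinable with it."""
    ids_by_value = defaultdict(set)
    for t in target:
        ids_by_value[t[join_attr]].add(t[target_key])
    # .get avoids creating empty entries for values with no matching target tuple
    return [ids_by_value.get(o[other_attr], set()) for o in other]


def covered_ids(other, idsets, predicate):
    """Target IDs whose joined tuples satisfy `predicate`, without a physical join."""
    covered = set()
    for tup, ids in zip(other, idsets):
        if predicate(tup):
            covered |= ids
    return covered


idsets = propagate_ids(loans, "loan_ID", "account_ID", accounts, "account_ID")
monthly = covered_ids(accounts, idsets, lambda a: a["frequency"] == "monthly")

# Count positive and negative target tuples covered by the predicate.
label_of = {t["loan_ID"]: t["label"] for t in loans}
n_pos = sum(1 for i in monthly if label_of[i] == "+")
n_neg = sum(1 for i in monthly if label_of[i] == "-")
```

Only the small ID sets travel from Loan to Account; the Account tuples themselves are never copied or duplicated, which is where the savings over a physical join come from.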

Tuple ID propagation must be applied under certain constraints. There are two cases where such propagation can be counterproductive −

  • propagation via large fan-outs

  • propagation via long, weak links.

The first case occurs when, after propagating the IDs to a relation R, it is discovered that every tuple in R is joined with many target tuples and every target tuple is joined with many tuples in R. The semantic link between R and the target relation is then very weak, because the link is unselective.

For instance, propagation among people via birth-country links is unlikely to be productive. The second case occurs when the propagation goes through long links (e.g., linking a student with his car dealer's pet is unlikely to be productive). For the sake of efficiency and accuracy, propagation via such links is discouraged.
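One way to detect the large fan-out case is to measure the fan-out in both directions after propagation. The sketch below is only an illustration: the function names and the `max_fan_out` threshold are hypothetical, and the text above does not prescribe a specific cutoff.

```python
from collections import Counter


def fan_out(idsets):
    """Return (avg target IDs per tuple in R, avg tuples in R per target ID)."""
    per_tuple = [len(ids) for ids in idsets]
    per_target = Counter(tid for ids in idsets for tid in ids)
    avg_ids_per_tuple = sum(per_tuple) / len(per_tuple) if per_tuple else 0.0
    avg_tuples_per_target = (
        sum(per_target.values()) / len(per_target) if per_target else 0.0
    )
    return avg_ids_per_tuple, avg_tuples_per_target


def is_too_unselective(idsets, max_fan_out=3.0):
    """Flag a link as weak when fan-out is large in both directions.

    max_fan_out is an assumed, illustrative threshold.
    """
    ids_per_tuple, tuples_per_target = fan_out(idsets)
    return ids_per_tuple > max_fan_out and tuples_per_target > max_fan_out


# A selective link: each tuple in R joins with at most one target tuple.
selective = [{1}, {2}, {3}, set()]

# A birth-country style link: six people who all share one country,
# so every tuple in R joins with every target tuple.
everyone = {1, 2, 3, 4, 5, 6}
unselective = [set(everyone) for _ in range(6)]
```

Here `is_too_unselective(selective)` is false, while `is_too_unselective(unselective)` is true, so propagation along the second link would be skipped.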

CrossMine is a method that uses tuple ID propagation for multirelational classification. To better integrate the information from ID propagation, CrossMine uses complex predicates as the elements of rules. A complex predicate, p, consists of two parts, which are as follows −

prop-path − This denotes how to propagate IDs. For instance, the path “Loan.account_ID → Account.account_ID” denotes propagating IDs from Loan to Account using account_ID. If no ID propagation is involved, prop-path is null.

Constraint − This is a predicate denoting the constraint on the relation to which the IDs are propagated. It can be categorical or numerical.
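The two parts of a complex predicate map naturally onto a small data structure. The following is a sketch under assumptions: the class names `PropPath` and `ComplexPredicate`, the set of supported operators, and the `frequency` and `amount` attributes are all hypothetical and chosen only to mirror the description above.

```python
import operator
from dataclasses import dataclass
from typing import Optional, Tuple

# Comparison operators supported by this sketch (an assumed, minimal set).
_OPS = {"==": operator.eq, "<=": operator.le, ">=": operator.ge}


@dataclass(frozen=True)
class PropPath:
    """How IDs are propagated: (relation, attribute) -> (relation, attribute)."""
    source: Tuple[str, str]
    destination: Tuple[str, str]


@dataclass(frozen=True)
class ComplexPredicate:
    prop_path: Optional[PropPath]  # None when no ID propagation is involved
    relation: str                  # relation the constraint applies to
    attribute: str
    op: str                        # one of the keys of _OPS
    value: object                  # categorical or numerical constant

    def holds(self, tup):
        """Check the constraint part against one tuple of `relation`."""
        return _OPS[self.op](tup[self.attribute], self.value)


# Categorical constraint reached by propagating IDs from Loan to Account.
p1 = ComplexPredicate(
    prop_path=PropPath(("Loan", "account_ID"), ("Account", "account_ID")),
    relation="Account",
    attribute="frequency",
    op="==",
    value="monthly",
)

# Numerical constraint on the target relation itself, so prop_path is null.
p2 = ComplexPredicate(None, "Loan", "amount", ">=", 1000)
```

For example, `p1.holds({"account_ID": 124, "frequency": "monthly"})` evaluates the constraint on a single Account tuple, while `p1.prop_path` records which propagation must have happened before the constraint can be related back to target tuples.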

CrossMine builds a classifier consisting of a set of rules, each containing a list of complex predicates and a class label. Like FOIL, CrossMine is a sequential covering algorithm: it builds rules one at a time. After a rule r is constructed, all positive target tuples satisfying r are removed from the data set.

To build a rule, CrossMine repeatedly searches for the best complex predicate and appends it to the current rule, until the stop criterion is met. A relation is active if it appears in the current rule. Before searching for the next best predicate, each active relation is required to have the IDset of propagated IDs for each of its tuples.
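The sequential covering loop can be sketched as follows. This is a deliberately simplified, single-relation illustration and not CrossMine itself: it assumes FOIL gain as the scoring function and "no gain above `min_gain`" as the stop criterion, neither of which is specified above, and it omits ID propagation entirely. All names and the toy data are hypothetical.

```python
import math


def foil_gain(p0, n0, p1, n1):
    """FOIL gain of a predicate that narrows (p0, n0) coverage to (p1, n1)."""
    if p1 == 0:
        return 0.0
    return p1 * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))


def learn_one_rule(pos, neg, candidates, min_gain=1e-9):
    """Greedily append the best predicate until no negatives remain or no gain."""
    rule, used = [], set()
    pos, neg = list(pos), list(neg)
    while neg:
        best, best_gain = None, min_gain
        for name, test in candidates:
            if name in used:
                continue
            p1 = sum(1 for t in pos if test(t))
            n1 = sum(1 for t in neg if test(t))
            gain = foil_gain(len(pos), len(neg), p1, n1)
            if gain > best_gain:
                best, best_gain = (name, test), gain
        if best is None:  # assumed stop criterion: no predicate helps
            break
        rule.append(best)
        used.add(best[0])
        pos = [t for t in pos if best[1](t)]
        neg = [t for t in neg if best[1](t)]
    return rule


def sequential_covering(tuples, candidates):
    """Build rules one at a time, removing covered positive tuples after each."""
    pos = [t for t in tuples if t["label"] == "+"]
    neg = [t for t in tuples if t["label"] == "-"]
    rules = []
    while pos:
        rule = learn_one_rule(pos, neg, candidates)
        if not rule:
            break
        rules.append([name for name, _ in rule])
        pos = [t for t in pos if not all(test(t) for _, test in rule)]
    return rules


# Hypothetical toy data and candidate predicates.
data = [
    {"frequency": "monthly", "amount": 900, "label": "+"},
    {"frequency": "monthly", "amount": 800, "label": "+"},
    {"frequency": "weekly", "amount": 3000, "label": "+"},
    {"frequency": "weekly", "amount": 500, "label": "-"},
    {"frequency": "weekly", "amount": 800, "label": "-"},
]
candidates = [
    ("frequency = monthly", lambda t: t["frequency"] == "monthly"),
    ("amount >= 1000", lambda t: t["amount"] >= 1000),
]
rules = sequential_covering(data, candidates)
```

On this toy data the first rule covers the two monthly tuples, which are then removed, and a second rule is learned for the remaining positive tuple. In the multirelational setting, the candidate predicates would instead be complex predicates evaluated through the IDsets held by the active relations.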

Updated on 17-Feb-2022 11:49:00