What are the challenges of link mining?

Link mining faces several challenges, which are as follows −

  • Logical versus statistical dependencies − Two types of dependencies reside in the graph: link structures (representing the logical relationships between objects) and probabilistic dependencies (representing statistical relationships, such as the correlation between attributes of objects where, generally, such objects are logically related).

    The coherent handling of these dependencies is also a challenge for multi-relational data mining, where the data to be mined exist in multiple tables. We must search over the different possible logical relationships between objects, in addition to the standard search over probabilistic dependencies among attributes. This produces a huge search space, which further complicates finding a reasonable mathematical model. Methods developed in inductive logic programming, which focus on search over logical relationships, may be applied here.

  • Feature construction − In link-based classification, we consider the attributes of an object as well as the attributes of the objects linked to it. In addition, the links themselves may have attributes. The objective of feature construction is to construct a single feature representing these attributes. This can involve feature selection and feature aggregation. In feature selection, only the most discriminating features are included. In feature aggregation, the attribute values of the linked objects are combined into one value, for example their most frequent value.

  • Instances versus classes − This concerns whether the model refers explicitly to individuals or to classes (generic categories) of individuals. An advantage of the former model is that it can be used to connect specific individuals with high probability. An advantage of the latter model is that it can be used to generalize to new situations, with different individuals.

  • Effective use of labeled and unlabeled data − A recent strategy in learning is to incorporate a mix of both labeled and unlabeled data. Unlabeled data can help infer the object attribute distribution. Links between unlabeled (test) data allow us to use the attributes of linked objects. Links between labeled (training) data and unlabeled (test) data induce dependencies that can help make more accurate inferences.

  • Link prediction − A challenge in link prediction is that the prior probability of a particular link between objects is typically extremely low. Approaches to link prediction have been proposed based on several measures for analyzing the proximity of nodes in a network. Probabilistic models have been proposed as well. For huge data sets, it can be more efficient to model links at a higher level.

  • Closed versus open-world assumption − Most traditional approaches assume that we know all the potential entities in the domain. This “closed world” assumption is unrealistic in real-world applications. Work in this area includes the introduction of a language for defining probability distributions over relational structures that involve varying sets of objects.
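
The feature construction point above can be illustrated with a small sketch of feature aggregation. This is only one possible illustration, not a standard implementation: the function name `aggregate_neighbor_attribute`, the toy `links` graph, and the `topic` attribute are all assumptions made for this example. It summarizes the attribute values of the objects linked to a node as their most frequent value (the mode), together with the raw counts.

```python
from collections import Counter


def aggregate_neighbor_attribute(links, attribute, node):
    """Summarize one attribute over the objects linked to `node`.

    links:     dict mapping each node to a list of linked nodes (hypothetical format)
    attribute: dict mapping a node to its known attribute value
    Returns the mode, the per-value counts, and the number of links.
    """
    neighbors = links.get(node, ())
    # Keep only neighbors whose attribute value is known.
    values = [attribute[n] for n in neighbors if n in attribute]
    counts = Counter(values)
    mode = counts.most_common(1)[0][0] if counts else None
    return {"mode": mode, "counts": dict(counts), "degree": len(neighbors)}


# Toy data (assumed): paper p1 is linked to three other papers.
links = {"p1": ["p2", "p3", "p4"], "p2": ["p1"], "p3": ["p1"], "p4": ["p1"]}
topic = {"p2": "databases", "p3": "databases", "p4": "networks"}

features = aggregate_neighbor_attribute(links, topic, "p1")
print(features)
```

The resulting values could then be passed to an ordinary classifier alongside the object's own attributes.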
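
For the link prediction point above, the following sketch shows three widely used proximity measures: common neighbors, the Jaccard coefficient, and the Adamic-Adar score. The text does not name specific measures, so the choice of these three, the function names, and the toy adjacency structure `adj` are assumptions made for illustration. A higher score suggests that a link between the two nodes is more plausible.

```python
import math


def common_neighbors(adj, u, v):
    """Number of neighbors shared by u and v."""
    return len(adj[u] & adj[v])


def jaccard(adj, u, v):
    """Shared neighbors divided by all neighbors of either node."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def adamic_adar(adj, u, v):
    """Shared neighbors, each weighted by 1 / log(degree)."""
    return sum(
        1.0 / math.log(len(adj[w]))
        for w in adj[u] & adj[v]
        if len(adj[w]) > 1  # guard against log(1) == 0
    )


# Toy undirected graph (assumed): each node maps to its set of neighbors.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c"},
}

# Score the currently unlinked pair (a, d).
for name, score in [
    ("common neighbors", common_neighbors(adj, "a", "d")),
    ("jaccard", jaccard(adj, "a", "d")),
    ("adamic-adar", adamic_adar(adj, "a", "d")),
]:
    print(name, score)
```

Because true links are rare, such scores are typically used to rank candidate pairs rather than to classify each pair in isolation.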