Why is Naïve Bayesian classification called Naïve?

Bayesian classifiers are statistical classifiers. They can predict class membership probabilities, such as the probability that a given sample belongs to a particular class. Bayesian classifiers have also exhibited high accuracy and speed when applied to large databases.

Once classes are defined, the system should infer rules that govern the classification; that is, it should be able to find a description of each class. The descriptions should refer only to the predicting attributes of the training set, so that only the positive examples satisfy a description and the negative examples do not. A rule is said to be correct if its description covers all the positive examples of a class and none of the negative examples.

Naïve Bayes classification is a simple classification scheme that assumes the contributions of all attributes are independent and that each contributes equally to the classification problem. By analyzing the contribution of each “independent” attribute, a conditional probability is determined. A classification is made by combining the impact that the different attributes have on the prediction to be made.

Naïve Bayes classification is called naïve because it assumes class conditional independence: the effect of an attribute value on a given class is independent of the values of the other attributes. This assumption is made to reduce computational cost, and it is in this sense that the method is considered naïve.
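The independence assumption can be sketched in a few lines of Python: under it, P (X|C) factorizes into a product of one probability per attribute, each estimated by simple counting. The tiny training set below (age, income, buys_computer) is hypothetical and only serves to illustrate the factorization.

```python
from collections import Counter

# Hypothetical training set: (age, income) -> buys_computer
data = [
    ("youth",  "high",   "no"),
    ("youth",  "medium", "no"),
    ("middle", "high",   "yes"),
    ("senior", "medium", "yes"),
    ("senior", "low",    "yes"),
    ("youth",  "low",    "yes"),
]

class_counts = Counter(row[-1] for row in data)

def cond_prob(attr_index, value, label):
    """Estimate P(attribute = value | class = label) by counting."""
    in_class = [row for row in data if row[-1] == label]
    matches = sum(1 for row in in_class if row[attr_index] == value)
    return matches / len(in_class)

def naive_bayes_score(x, label):
    """P(C) * product of P(x_i | C) -- the 'naive' factorization."""
    score = class_counts[label] / len(data)
    for i, value in enumerate(x):
        score *= cond_prob(i, value, label)
    return score

x = ("youth", "low")
scores = {c: naive_bayes_score(x, c) for c in class_counts}
prediction = max(scores, key=scores.get)
```

Because each attribute is treated independently, only one small count per attribute-value pair is needed, rather than counts over every combination of attribute values; this is exactly the computational saving the assumption buys.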

Bayes Theorem − Let X be a data tuple. In Bayesian terms, X is considered “evidence.” Let H be some hypothesis, such as that the data tuple X belongs to a specified class C. To classify the data, the probability P (H|X) is determined: the probability that hypothesis H holds given the “evidence,” i.e., the observed data tuple X.

P (H|X) is the posterior probability of H conditioned on X. For example, suppose a world of data tuples is confined to customers described by the attributes age and income, and that X is a 30-year-old customer with an income of Rs. 20,000. Suppose that H is the hypothesis that the customer will buy a computer. Then P (H|X) reflects the probability that customer X will buy a computer given that the customer’s age and income are known.

P (H) is the prior probability of H. For example, this is the probability that any given customer will buy a computer, regardless of age, income, or any other information. The posterior probability P (H|X) is based on more information than the prior probability P (H), which is independent of X.

Similarly, P (X|H) is the posterior probability of X conditioned on H. It is the probability that a customer X is 30 years old and earns Rs. 20,000, given that we know the customer will buy a computer.

P (H), P (X|H), and P (X) can be estimated from the given data. Bayes theorem provides a way of calculating the posterior probability P (H|X) from P (H), P (X|H), and P (X). It is given by

P (H|X) = P (X|H) P (H) / P (X)
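The formula can be checked with a quick numeric example. The probability values below are hypothetical, chosen only to show the arithmetic: a prior P (H), a likelihood P (X|H), and an evidence probability P (X) combine to give the posterior.

```python
# Hypothetical estimates (not from any real data):
p_h = 0.3          # P(H): prior probability a customer buys a computer
p_x_given_h = 0.4  # P(X|H): P(age 30, Rs. 20,000 income | buys)
p_x = 0.2          # P(X): probability of observing this age/income profile

# Bayes theorem: P(H|X) = P(X|H) * P(H) / P(X)
p_h_given_x = p_x_given_h * p_h / p_x  # 0.4 * 0.3 / 0.2 = 0.6
```

Here the evidence raises the probability of buying from the prior 0.3 to a posterior 0.6, illustrating how P (H|X) incorporates more information than P (H) alone.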