What is Hypothesis Testing?



Hypothesis testing is the simplest approach to integrating data into a company’s decision-making processes. The purpose of hypothesis testing is to substantiate or disprove preconceived ideas, and it is a part of almost all data mining endeavors.

Data miners provide bounce back and forth among methods, first thinking up possible descriptions for observed behavior and letting those hypotheses dictate the data be computed.

Hypothesis testing is what scientists and statisticians traditionally spend their lives doing. A hypothesis is a proposed explanation whose validity can be tested by analyzing data. Such information can easily be collected by observation or created through an experiment, including a test mailing.

Hypothesis testing is most valuable when it discloses that the assumptions that have been guiding an organization’s actions in the industry area are wrong. For instance, consider that an organization’s advertising depends on several hypotheses about the target market for a product or service and the feature of the responses. It is worth testing whether these hypotheses are borne out by actual responses.

One approach is to use different call-in numbers in different ads and record the number that each responder dials. Information collected during the call can then be compared with the profile of the population the advertisement was designed to reach.

The key to generating hypotheses is getting diverse input from throughout the organization and, where appropriate, outside it as well. Often, all that is needed to start the ideas flowing is a clear statement of the problem itself—especially if it is something that has not previously been recognized as a problem.

It happens more often than one might suppose that problems go unrecognized because they are not captured by the metrics being used to evaluate the organization’s performance.

If an organization has always computed its sales force on the multiple fresh sales made each month, the salesperson can never have given much thought to the question of how long new users remain active or how much they spend throughout their associationship with the organization.

Hypothesis testing is certainly useful, but there comes a time when it is not sufficient. The data mining techniques described in the rest of this book are all designed for learning new things by creating models based on data.

In the most general sense, a model is an explanation or description of how something works that reflects reality well enough that it can be used to make inferences about the real world.


Advertisements