What are Generalized Linear Models?


Generalized linear models provide the theoretical foundation on which linear regression can be applied to the modeling of categorical response variables. In generalized linear models, the variance of the response variable, y, is a function of the mean value of y, unlike in linear regression, where the variance of y is constant.
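The statement about variance can be written compactly. The notation here (μ for the mean of y, φ for a dispersion parameter, V for the variance function) is the usual exponential-family convention and is an assumption of this sketch, not notation taken from the text above:

```latex
\operatorname{Var}(y) = \phi \, V(\mu), \qquad \mu = \mathbb{E}[y]

% Variance functions for three familiar cases
V(\mu) = 1            % normal response (ordinary linear regression): constant variance
V(\mu) = \mu          % Poisson response: variance grows with the mean
V(\mu) = \mu(1 - \mu) % Bernoulli response (logistic regression)
```

Only the first case gives a variance that does not depend on the mean, which is the sense in which ordinary linear regression is a special case.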

Generalized linear models (GLMs) are an extension of traditional linear models. The fitting algorithm fits a generalized linear model to the data by maximizing the log-likelihood. An elastic net penalty can be used for parameter regularization. The model-fitting computation is parallel, very fast, and scales very well for models with a limited number of predictors with non-zero coefficients.
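As a rough illustration of what "maximizing the log-likelihood with an elastic net penalty" means, one common way to write the objective is shown below. The symbols λ (overall penalty strength) and α (mix between the L1 and L2 terms) are assumptions of this sketch; the text does not say which parameterization or scaling the algorithm uses, and conventions differ between implementations:

```latex
\hat{\beta} = \arg\max_{\beta} \left\{ \ell(\beta)
  - \lambda \left[ \alpha \lVert \beta \rVert_1
  + \frac{1 - \alpha}{2} \lVert \beta \rVert_2^2 \right] \right\}
```

Here ℓ(β) is the log-likelihood of the GLM. Setting α = 1 gives a pure lasso penalty, which drives some coefficients exactly to zero, and α = 0 gives a pure ridge penalty.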

Two common types of generalized linear models are logistic regression and Poisson regression. Logistic regression models the probability of some event occurring through a linear function of a set of predictor variables (more precisely, the log-odds of the event are linear in the predictors). Count data frequently exhibit a Poisson distribution and are commonly modeled using Poisson regression.
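To make the logistic regression case concrete, here is a minimal sketch that fits a one-predictor model by plain gradient ascent on the log-likelihood. The data set (hours studied versus pass/fail), the function names, the learning rate, and the iteration count are all hypothetical choices for illustration; production GLM software typically uses faster methods such as iteratively reweighted least squares.

```python
import math

def sigmoid(z):
    """Logistic function: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit P(y=1 | x) = sigmoid(b0 + b1*x) by gradient ascent
    on the average log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(b0 + b1 * x)  # observed minus predicted
            g0 += err                       # gradient w.r.t. intercept
            g1 += err * x                   # gradient w.r.t. slope
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

# Hypothetical toy data: hours studied and whether the exam was passed.
# The classes overlap, so the maximum-likelihood estimate is finite.
hours = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
passed = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic(hours, passed)
p_low = sigmoid(b0 + b1 * 1.0)   # estimated pass probability at 1 hour
p_high = sigmoid(b0 + b1 * 5.0)  # estimated pass probability at 5 hours
print(f"intercept={b0:.3f} slope={b1:.3f} p(1h)={p_low:.3f} p(5h)={p_high:.3f}")
```

The fitted slope is positive, so the estimated probability of passing rises with hours studied while always staying between 0 and 1, which a straight-line fit to the 0/1 response could not guarantee.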

Log-linear models approximate discrete multidimensional probability distributions. They can be used to estimate the probability value associated with data cube cells. For instance, suppose we are given data for the attributes city, item, year, and sales. In the log-linear approach, all attributes must be categorical, so continuous-valued attributes (like sales) must first be discretized.

The method can then be used to estimate the probability of each cell in the 4-D base cuboid for the given attributes, based on the 2-D cuboids for city and item, city and year, and city and sales, and the 3-D cuboid for item, year, and sales. In this way, an iterative technique can be used to build higher-order data cubes from lower-order ones.
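The text does not name the iterative technique. One widely used option for log-linear models is iterative proportional fitting (IPF), sketched below on a smaller, hypothetical case: a 2x2x2 table over (city, item, year) is estimated from its three 2-D margins. The counts, the function names, and the iteration count are assumptions for illustration only.

```python
from itertools import product

def margins(t):
    """Return the 2-D margins (ab, ac, bc) of a 3-D table t[a][b][c]."""
    na, nb, nc = len(t), len(t[0]), len(t[0][0])
    ab = [[sum(t[a][b]) for b in range(nb)] for a in range(na)]
    ac = [[sum(t[a][b][c] for b in range(nb)) for c in range(nc)]
          for a in range(na)]
    bc = [[sum(t[a][b][c] for a in range(na)) for c in range(nc)]
          for b in range(nb)]
    return ab, ac, bc

def ipf_3d(ab, ac, bc, iters=200):
    """Estimate a 3-D table from its 2-D margins by iterative
    proportional fitting. Assumes every margin cell is positive."""
    na, nb, nc = len(ab), len(ab[0]), len(ac[0])
    cells = list(product(range(na), range(nb), range(nc)))
    t = [[[1.0] * nc for _ in range(nb)] for _ in range(na)]
    for _ in range(iters):
        cur_ab, _, _ = margins(t)          # rescale to match the (a, b) margin
        for a, b, c in cells:
            t[a][b][c] *= ab[a][b] / cur_ab[a][b]
        _, cur_ac, _ = margins(t)          # rescale to match the (a, c) margin
        for a, b, c in cells:
            t[a][b][c] *= ac[a][c] / cur_ac[a][c]
        _, _, cur_bc = margins(t)          # rescale to match the (b, c) margin
        for a, b, c in cells:
            t[a][b][c] *= bc[b][c] / cur_bc[b][c]
    return t

# Hypothetical counts indexed as observed[city][item][year].
observed = [[[10.0, 20.0], [30.0, 40.0]],
            [[20.0, 10.0], [25.0, 45.0]]]

target = margins(observed)                 # the lower-order cuboids
estimate = ipf_3d(*target)                 # reconstructed higher-order cuboid
fitted = margins(estimate)

max_err = max(abs(x - y)
              for m_fit, m_tgt in zip(fitted, target)
              for row_fit, row_tgt in zip(m_fit, m_tgt)
              for x, y in zip(row_fit, row_tgt))
total = sum(v for plane in estimate for row in plane for v in row)
print(f"largest margin mismatch: {max_err:.2e}, total count: {total:.1f}")
```

The estimated table reproduces all three 2-D margins, but its individual cells need not equal the observed cells: only the pairwise structure is retained, so the 3-D table is approximated from the smaller cuboids.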

The technique scales up well to allow for many dimensions. Aside from prediction, the log-linear model is useful for data compression (because the smaller-order cuboids together typically occupy less space than the base cuboid) and data smoothing (because cell estimates in the smaller-order cuboids are less subject to sampling variations than cell estimates in the base cuboid).

Decision tree induction can be adapted to predict continuous (ordered) values, rather than class labels. There are two main types of trees for prediction: regression trees and model trees. Regression trees were proposed as a component of the CART learning system.

Each regression tree leaf stores a continuous-valued prediction, which is the average value of the predicted attribute for the training tuples that reach the leaf. By contrast, in model trees, each leaf holds a regression model, that is, a multivariate linear equation for the predicted attribute. Regression and model trees tend to be more accurate than linear regression when the data are not represented well by a simple linear model.
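The difference between the two leaf types can be seen in a deliberately tiny sketch: a tree with a single split on one predictor, compared against one global regression line. The piecewise-linear data and all function names are hypothetical, and the model-tree leaves here fit a one-variable line rather than a full multivariate equation. The split is chosen by minimizing the squared error around each side's mean, in the spirit of CART.

```python
def mean(values):
    return sum(values) / len(values)

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return my - slope * mx, slope

def sse(preds, ys):
    """Sum of squared errors between predictions and targets."""
    return sum((p - y) ** 2 for p, y in zip(preds, ys))

def best_split(xs, ys):
    """Threshold that minimizes total squared error when each side
    predicts its own mean."""
    pairs = sorted(zip(xs, ys))
    best_cost, best_thr = float("inf"), None
    for i in range(1, len(pairs)):
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        cost = (sse([mean(left)] * len(left), left)
                + sse([mean(right)] * len(right), right))
        if cost < best_cost:
            best_cost = cost
            best_thr = (pairs[i - 1][0] + pairs[i][0]) / 2
    return best_thr

def leaf_mean_sse(part):
    """Regression-tree leaf: predict the mean of the leaf's targets."""
    ys_part = [y for _, y in part]
    return sse([mean(ys_part)] * len(ys_part), ys_part)

def leaf_line_sse(part):
    """Model-tree leaf: predict with a line fitted inside the leaf."""
    xs_part = [x for x, _ in part]
    ys_part = [y for _, y in part]
    a, b = fit_line(xs_part, ys_part)
    return sse([a + b * x for x in xs_part], ys_part)

# Hypothetical data: two linear regimes with a jump between x=4 and x=5.
xs = list(range(10))
ys = [2 * x if x < 5 else 30 - x for x in xs]

a, b = fit_line(xs, ys)
sse_linear = sse([a + b * x for x in xs], ys)

thr = best_split(xs, ys)
left = [(x, y) for x, y in zip(xs, ys) if x < thr]
right = [(x, y) for x, y in zip(xs, ys) if x >= thr]
sse_regression_tree = leaf_mean_sse(left) + leaf_mean_sse(right)
sse_model_tree = leaf_line_sse(left) + leaf_line_sse(right)

print(thr, sse_linear, sse_regression_tree, sse_model_tree)
```

On this data the single global line has the largest error, the regression tree with mean-valued leaves does better, and the model tree with a line in each leaf fits essentially exactly. The data were constructed to show that ordering, so it should not be read as a general guarantee.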

Updated on 16-Feb-2022 11:52:19