House Price Prediction using Machine Learning in Python

Machine learning models can estimate house prices from property attributes such as size, room counts, and location. This guide builds a linear regression model in Python with pandas and scikit-learn, showing how such predictions are produced and evaluated so that buyers, sellers, and investors can make better-informed decisions.

Linear Regression for House Price Prediction

Linear regression is a widely used technique for house price prediction due to its simplicity and interpretability. It assumes a linear relationship between independent variables (bedrooms, bathrooms, square footage) and the dependent variable (house price).

By fitting a linear regression model to historical data, we estimate one coefficient per feature plus an intercept. A prediction for a new house is the intercept plus the sum of each feature value multiplied by its coefficient. Each coefficient is the estimated change in price for a one-unit increase in that feature with the other features held fixed, which is what makes the model easy to interpret.
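
As a minimal sketch of that calculation, the snippet below computes a prediction by hand. The intercept and coefficient values are hypothetical numbers chosen for illustration, not results from a fitted model, and `predict_price` is a helper name introduced here.

```python
# Hypothetical coefficients (dollars per unit of each feature); illustration only.
coefficients = {
    "bedrooms": 20000.0,
    "bathrooms": 15000.0,
    "sqft_living": 100.0,
}
intercept = 50000.0  # hypothetical baseline price


def predict_price(house, coefficients, intercept):
    """Return intercept + sum(coefficient * feature value)."""
    return intercept + sum(coefficients[name] * house[name] for name in coefficients)


house = {"bedrooms": 3, "bathrooms": 2, "sqft_living": 1500}
print(predict_price(house, coefficients, intercept))  # 290000.0
```

Here 50,000 + 3 × 20,000 + 2 × 15,000 + 1,500 × 100 = 290,000. A fitted scikit-learn model performs the same arithmetic using its `intercept_` and `coef_` attributes.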

Dataset Overview

We'll use the King County house sales dataset from Kaggle (kc_house_data.csv), which contains sale prices for homes in King County, Washington, including Seattle. The target is the price column; the features used here are:

  • bedrooms - Number of bedrooms

  • bathrooms - Number of bathrooms

  • sqft_living - Square footage of living space

  • sqft_lot - Square footage of the lot

  • floors - Number of floors

  • zipcode - ZIP code location

Implementation Steps

Follow these steps to build a house price prediction model:

Step 1: Import Libraries and Load Data

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
import numpy as np

# Load the dataset
data = pd.read_csv('kc_house_data.csv')
print("Dataset shape:", data.shape)
print("\nFirst 5 rows:")
print(data.head())

Step 2: Feature Selection and Data Preparation

# Select features and target variable
features = ['bedrooms', 'bathrooms', 'sqft_living', 'sqft_lot', 'floors', 'zipcode']
target = 'price'

X = data[features]
y = data[target]

print("Features shape:", X.shape)
print("Target shape:", y.shape)
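
Before fitting, it can help to check the selected columns for missing values, since scikit-learn's LinearRegression raises an error when the input contains NaN. The sketch below uses a tiny made-up DataFrame (the values are illustrative, not taken from the Kaggle file) and drops incomplete rows; whether dropping or imputing is the better choice depends on the data.

```python
import numpy as np
import pandas as pd

# Tiny made-up frame standing in for the real data; values are illustrative only.
sample = pd.DataFrame({
    "bedrooms": [3, 2, np.nan, 4],
    "sqft_living": [1800, 1200, 2400, np.nan],
    "price": [450000.0, 300000.0, 610000.0, 720000.0],
})

# Count missing values per column
missing_counts = sample.isna().sum()
print(missing_counts)

# Drop rows with a missing value in any modelling column
clean = sample.dropna(subset=["bedrooms", "sqft_living", "price"])
print("Rows before:", len(sample), "after:", len(clean))
```

The same two calls can be applied to `data[features + [target]]` once the real file is loaded.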

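Step 3: Split Data and Train Model

The code below is self-contained so that it runs without the Kaggle file: it generates a synthetic dataset with the same column names and trains on that. The outputs shown in the rest of this article therefore describe the synthetic data, not actual King County sales.

# Create synthetic sample data for demonstration (replaces the DataFrame loaded in Step 1)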
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Generate sample house data
np.random.seed(42)
n_samples = 1000

bedrooms = np.random.randint(1, 6, n_samples)
bathrooms = np.random.uniform(1, 4, n_samples)
sqft_living = np.random.randint(500, 5000, n_samples)
sqft_lot = np.random.randint(1000, 10000, n_samples)
floors = np.random.randint(1, 4, n_samples)
zipcode = np.random.choice([98001, 98002, 98003, 98004, 98005], n_samples)

# Create price based on features (with some noise)
price = (bedrooms * 20000 + bathrooms * 15000 + sqft_living * 100 + 
         sqft_lot * 5 + floors * 10000 + np.random.normal(0, 20000, n_samples))

# Create DataFrame
data = pd.DataFrame({
    'bedrooms': bedrooms,
    'bathrooms': bathrooms,
    'sqft_living': sqft_living,
    'sqft_lot': sqft_lot,
    'floors': floors,
    'zipcode': zipcode,
    'price': price
})

# Select features and target
features = ['bedrooms', 'bathrooms', 'sqft_living', 'sqft_lot', 'floors', 'zipcode']
X = data[features]
y = data['price']

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and train the model
model = LinearRegression()
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Evaluate the model on the held-out test set.
# Stored as r2 so the name does not shadow sklearn.metrics.r2_score.
r2 = model.score(X_test, y_test)
print("Model R² Score:", round(r2, 4))

# Predict price for a new house
new_house = pd.DataFrame({
    'bedrooms': [3], 
    'bathrooms': [2.5], 
    'sqft_living': [2000], 
    'sqft_lot': [5000], 
    'floors': [2], 
    'zipcode': [98004]
})

predicted_price = model.predict(new_house)
print("Predicted Price: $", round(predicted_price[0], 2))

Output:

Model R² Score: 0.9476
Predicted Price: $ 346532.44

Model Performance Analysis

R² alone does not show the size of the errors. Mean squared error (MSE) and root mean squared error (RMSE) do, and RMSE is in the same units as the price (dollars). This block continues from Step 3 and reuses model, y_test, y_pred, and r2.

from sklearn.metrics import mean_squared_error
import numpy as np

# Calculate additional metrics from the test-set predictions
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)

print("Model Performance Metrics:")
print("R² Score:", round(r2, 4))
print("Mean Squared Error:", round(mse, 2))
print("Root Mean Squared Error:", round(rmse, 2))

# Display feature coefficients
feature_importance = pd.DataFrame({
    'Feature': features,
    'Coefficient': model.coef_
})
print("\nFeature Coefficients:")
print(feature_importance.round(2))

Output:

Model Performance Metrics:
R² Score: 0.9476
Mean Squared Error: 417688362.57
Root Mean Squared Error: 20437.88

Feature Coefficients:
       Feature  Coefficient
0     bedrooms     19661.49
1    bathrooms     15424.85
2  sqft_living       100.11
3     sqft_lot         4.99
4       floors     10066.29
5      zipcode        -0.25

Key Insights

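  • R² Score - An R² of 0.9476 means the model explains about 95% of the variance in price on the test set.

  • Feature Impact - Coefficients are expressed per unit of each feature, so they are not directly comparable. bedrooms has the largest raw coefficient (about $19,661 per bedroom), but sqft_living, at about $100 per square foot over a range of 500 to 5,000 square feet, accounts for the largest price differences between houses.

  • Model Limitations - Linear regression assumes linear relationships, which may not capture complex market dynamics. It also treats zipcode as a numeric quantity even though ZIP codes are labels, so the zipcode coefficient (-0.25) should not be interpreted.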
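
Because raw coefficients depend on each feature's units, one way to compare feature influence is to standardize the features before fitting, so that every coefficient reads as "price change per one standard deviation". The sketch below illustrates this on a small synthetic dataset generated for the purpose (two features only, with made-up parameters similar in spirit to Step 3); it is not a rerun of the article's model.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 500

# Synthetic features and price; all parameters here are illustrative assumptions
bedrooms = rng.integers(1, 6, n)
sqft_living = rng.integers(500, 5000, n)
price = bedrooms * 20000 + sqft_living * 100 + rng.normal(0, 20000, n)

X = np.column_stack([bedrooms, sqft_living]).astype(float)

# Same model fitted on raw features and on standardized features
raw = LinearRegression().fit(X, price)
scaled = LinearRegression().fit(StandardScaler().fit_transform(X), price)

for name, b_raw, b_std in zip(["bedrooms", "sqft_living"], raw.coef_, scaled.coef_):
    print(f"{name:12s} raw={b_raw:10.2f}  per-std-dev={b_std:12.2f}")
```

On data like this, bedrooms has the larger raw coefficient while sqft_living has the larger standardized one, which is the distinction between "dollars per unit" and "influence across the observed range".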

Improving the Model

To enhance prediction accuracy, consider:

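  • Feature Engineering - Create new features such as the age of the house or the ratio of living space to lot size. Avoid features derived from the target, such as price per square foot, because they leak the answer into the inputs.

  • Data Preprocessing - Handle outliers, normalize features, and one-hot encode categorical columns such as zipcode

  • Advanced Models - Try Random Forest, Gradient Boosting, or Neural Networks

  • Cross-validation - Use k-fold cross-validation for more robust evaluation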
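
Several of these suggestions can be combined in one scikit-learn pipeline. The sketch below one-hot encodes zipcode, then compares linear regression with a random forest using 5-fold cross-validation. The data is synthetic and generated here for illustration; in particular, the per-zipcode price premium is an assumption added so that the encoding has something to capture.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Synthetic data for illustration; the zipcode premium values are assumptions
rng = np.random.default_rng(42)
n = 600
df = pd.DataFrame({
    "sqft_living": rng.integers(500, 5000, n),
    "bedrooms": rng.integers(1, 6, n),
    "zipcode": rng.choice([98001, 98002, 98003], n),
})
premium = df["zipcode"].map({98001: 0, 98002: 80000, 98003: -40000})
df["price"] = (df["sqft_living"] * 100 + df["bedrooms"] * 20000
               + premium + rng.normal(0, 20000, n))

X, y = df.drop(columns="price"), df["price"]

# One-hot encode zipcode (a label, not a quantity); pass other columns through
preprocess = ColumnTransformer(
    [("zip", OneHotEncoder(handle_unknown="ignore"), ["zipcode"])],
    remainder="passthrough",
)

models = {
    "linear": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
}

cv_scores = {}
for name, estimator in models.items():
    pipeline = make_pipeline(preprocess, estimator)
    # 5-fold cross-validated R² for each model
    cv_scores[name] = cross_val_score(pipeline, X, y, cv=5, scoring="r2")
    print(name, "mean R²:", round(cv_scores[name].mean(), 3))
```

Keeping the encoder inside the pipeline means it is refitted on each training fold, so no information from the validation fold reaches the model during cross-validation.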

Conclusion

Linear regression is a simple, interpretable starting point for house price prediction in Python: its coefficients show how each feature relates to price, and metrics such as R² and RMSE quantify how well it fits. With proper data preprocessing and feature engineering, and more flexible models where needed, these predictions can provide valuable input for real estate decisions.

Updated on: 2026-03-27T07:51:45+05:30
