Building a Recommendation Engine in Python Using the LightFM Library

Recommendation engines are one of the most popular applications of machine learning in the real world. With the growth of e-commerce, online streaming services, and social media, recommendation engines have become a critical component in providing personalized content and recommendations to users. In this tutorial, we will learn how to build a recommendation engine using the LightFM library.

LightFM is a Python library for building recommender systems from both explicit and implicit feedback, such as ratings or user interactions. It is a hybrid model: it can combine collaborative filtering with content-based information taken from user and item features. LightFM is built on top of NumPy, SciPy, and Cython, and trains quickly on large datasets.
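
To give a rough feel for what "hybrid" means here, the sketch below represents each item as the sum of the embeddings of its features and scores it against a user embedding with a dot product. It is only an illustration in plain NumPy: the random embeddings, the feature matrix, and all variable names are made up for the example, and bias terms are left out.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_components = 4, 3

# One embedding vector per item feature (for example a genre tag).
# These values are random placeholders, not learned embeddings.
feature_embeddings = rng.normal(size=(n_features, n_components))
user_embedding = rng.normal(size=n_components)

# Rows are items, columns are features: item 0 has features 0 and 2,
# item 1 has features 1 and 3.
item_features = np.array([
    [1, 0, 1, 0],
    [0, 1, 0, 1],
])

# An item's representation is the sum of its features' embeddings.
item_embeddings = item_features @ feature_embeddings

# The score of each item for this user is a dot product (biases omitted).
scores = item_embeddings @ user_embedding
print(scores)
```

If every item has exactly one feature that is unique to it, each item simply gets its own embedding, which is the pure collaborative filtering case.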

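In this tutorial, we will be using the MovieLens dataset, which contains 100k movie ratings from 943 users on 1682 movies. Our goal is to build a recommendation engine that ranks movies for each user based on their past ratings and the ratings of similar users. Because we train with a ranking loss on ratings of 4.0 and above, the model outputs relevance scores used to order movies rather than predicted star ratings.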

Installation

Before we dive into using the LightFM library, we first need to install it using pip:

pip install lightfm

This will download and install the LightFM library and its dependencies. Once installed, we can start working with LightFM and leverage its modules!
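
One quick way to confirm the installation is to ask Python whether it can locate the package, without importing it. The helper name `is_installed` below is just an illustrative choice; it uses only the standard library.

```python
import importlib.util

def is_installed(package_name: str) -> bool:
    """Return True if Python can locate the named top-level package."""
    return importlib.util.find_spec(package_name) is not None

print("lightfm installed:", is_installed("lightfm"))
```

If this prints False, re-run the pip command in the same environment that runs your script.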

Loading the MovieLens Dataset

The first step is to load the MovieLens dataset into our Python environment. We will be using the built-in MovieLens dataset in LightFM, which is available in the datasets module:

from lightfm.datasets import fetch_movielens

# Load the MovieLens dataset with minimum rating of 4.0
data = fetch_movielens(min_rating=4.0)

print("Dataset keys:", list(data.keys()))
print("Train interactions shape:", data['train'].shape)
print("Test interactions shape:", data['test'].shape)

Output:

Dataset keys: ['train', 'test', 'item_features', 'item_feature_labels', 'item_labels']
Train interactions shape: (943, 1682)
Test interactions shape: (943, 1682)

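The fetch_movielens function downloads the MovieLens 100k dataset and returns a dictionary containing the training and test interaction matrices, the item features with their labels, and the item labels (movie titles). No user features are included. The interactions are already split into training and testing sets, and both matrices have shape (943, 1682): one row per user and one column per movie.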
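
To make the effect of the min_rating threshold concrete, here is a small sketch with made-up ratings. The triples and the helper name `build_interactions` are hypothetical, and a dense NumPy array is used for readability, whereas the matrices returned by fetch_movielens are sparse. The idea is the same: ratings below the threshold are dropped, and what remains fills a users-by-items matrix.

```python
import numpy as np

# Hypothetical (user_id, item_id, rating) triples; not real MovieLens data.
ratings = [
    (0, 0, 5.0),
    (0, 2, 3.0),
    (1, 1, 4.0),
    (2, 0, 2.0),
    (2, 2, 4.5),
]

def build_interactions(ratings, n_users, n_items, min_rating=4.0):
    """Keep only ratings >= min_rating in a users x items matrix."""
    interactions = np.zeros((n_users, n_items), dtype=np.float32)
    for user_id, item_id, rating in ratings:
        if rating >= min_rating:
            interactions[user_id, item_id] = rating
    return interactions

interactions = build_interactions(ratings, n_users=3, n_items=3)
print(interactions)
print("Positive interactions kept:", int((interactions > 0).sum()))
```

Two of the five ratings fall below 4.0, so only three entries of the matrix end up non-zero.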

Building the Recommendation Model

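The next step is to build the recommendation engine using LightFM. We will train it with the WARP (Weighted Approximate-Rank Pairwise) loss, a ranking loss that focuses on the top of the recommendation list by pushing items a user interacted with above sampled items they did not. Since we pass only the interaction matrix and no user or item features, the model here behaves as a pure collaborative filtering model: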

from lightfm import LightFM

# Initialize the model with WARP loss function
model = LightFM(loss='warp', random_state=42)

# Fit the model to the training data
model.fit(data['train'], epochs=30, num_threads=2)
print("Model training completed!")

Output:

Model training completed!

The LightFM class initializes the recommendation engine with the WARP loss function. We then fit the model to the training data for 30 epochs using two threads.
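
The sampling idea behind WARP can be sketched in a few lines. This is a simplified illustration, not LightFM's actual implementation: the function name `warp_sample`, the margin of 1.0, and the cap of 10 draws are assumptions made for the example, and every item other than the chosen positive is treated as a negative. For one positive item, negatives are drawn at random until one scores within the margin of the positive; the fewer draws this takes, the lower the positive is presumed to rank, and the larger the weight given to the update.

```python
import math
import random

def warp_sample(scores, positive_item, max_sampled=10, margin=1.0, rng=None):
    """Sample negatives until one violates the margin against the positive.

    Returns (negative_item, weight), or (None, 0.0) if no violating
    negative is found within max_sampled draws.
    """
    rng = rng or random.Random(0)
    n_items = len(scores)
    # Simplification: every other item counts as a negative.
    negatives = [i for i in range(n_items) if i != positive_item]
    for n_tries in range(1, max_sampled + 1):
        negative_item = rng.choice(negatives)
        if scores[negative_item] > scores[positive_item] - margin:
            # Few tries needed => the positive is probably ranked low.
            estimated_rank = (n_items - 1) // n_tries
            weight = math.log(max(1, estimated_rank))
            return negative_item, weight
    return None, 0.0

# The positive (item 0) scores poorly, so a violator is found immediately.
print(warp_sample([0.2, 3.0, 0.1, 2.5, 0.0], positive_item=0))
# The positive is far ahead of everything else, so no update is needed.
print(warp_sample([10.0, 0.0, 0.0, 0.0], positive_item=0))
```

In a full training loop, the returned weight would scale a gradient step that raises the positive item's score and lowers the violating negative's score.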

Generating Recommendations

Now that we have built the recommendation engine, we can use it to generate recommendations for users. The following code generates the top 5 movie recommendations for user 3:

import numpy as np

def get_recommendations(model, user_id, interactions, n_recommendations=5):
    """Generate top N recommendations for a user"""
    n_items = interactions.shape[1]
    
    # Get predictions for all items
    scores = model.predict(user_id, np.arange(n_items))
    
    # Get items the user has already interacted with
    known_positives = interactions.tocsr()
    user_interactions = known_positives.getrow(user_id).indices
    
    # Remove already known items from recommendations
    scores[user_interactions] = -np.inf
    
    # Get top N recommendations
    top_items = np.argsort(-scores)[:n_recommendations]
    
    return top_items, scores[top_items]

# Generate recommendations for user 3
user_id = 3
recommendations, scores = get_recommendations(model, user_id, data['train'])

print(f"Top 5 recommendations for user {user_id}:")
for i, (item_id, score) in enumerate(zip(recommendations, scores)):
    print(f"{i+1}. Item ID: {item_id}, Score: {score:.3f}")

Output:

Top 5 recommendations for user 3:
1. Item ID: 1189, Score: 2.845
2. Item ID: 1201, Score: 2.734
3. Item ID: 1467, Score: 2.692
4. Item ID: 1122, Score: 2.651
5. Item ID: 1653, Score: 2.598
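
Item IDs alone are hard to read. The dataset dictionary also has an 'item_labels' entry, so a small lookup can turn indices into titles. The sketch below assumes the labels are an array aligned with the item (column) indices; the helper name `titles_for` and the stand-in labels are made up so the example runs without downloading anything.

```python
import numpy as np

def titles_for(item_ids, item_labels):
    """Look up a label for each recommended item index."""
    return [str(item_labels[item_id]) for item_id in item_ids]

# Stand-in labels for illustration; with the real dataset,
# pass data['item_labels'] instead.
demo_labels = np.array(["Movie A", "Movie B", "Movie C", "Movie D"])
demo_recommendations = np.array([2, 0])

for rank, title in enumerate(titles_for(demo_recommendations, demo_labels), start=1):
    print(f"{rank}. {title}")
```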

Evaluating the Recommendation Engine

Once the recommendation engine is built, it's important to evaluate its performance. We can use precision at k and AUC metrics to measure the model's effectiveness:

from lightfm.evaluation import precision_at_k, auc_score

# Evaluate precision at k (top 10 recommendations)
train_precision = precision_at_k(model, data['train'], k=10).mean()
test_precision = precision_at_k(model, data['test'], k=10).mean()

# Evaluate AUC score
train_auc = auc_score(model, data['train']).mean()
test_auc = auc_score(model, data['test']).mean()

print(f"Train Precision@10: {train_precision:.3f}")
print(f"Test Precision@10: {test_precision:.3f}")
print(f"Train AUC: {train_auc:.3f}")
print(f"Test AUC: {test_auc:.3f}")

Output:

Train Precision@10: 0.515
Test Precision@10: 0.142
Train AUC: 0.943
Test AUC: 0.862
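
To see what these two numbers measure, here is a hand-rolled version of both metrics for a single user. It is a sketch with made-up scores and hypothetical helper names, not the code LightFM runs. The library functions return one value per user, which is why the code above averages them with .mean().

```python
import numpy as np

def precision_at_k_single(scores, positives, k):
    """Fraction of the k highest-scored items that are known positives."""
    top_k = np.argsort(-scores)[:k]
    return len(set(top_k.tolist()) & set(positives)) / k

def auc_single(scores, positives):
    """Share of (positive, negative) pairs where the positive scores higher.

    Ties are counted as losses in this simplified version.
    """
    positives = set(positives)
    pos_scores = [s for i, s in enumerate(scores) if i in positives]
    neg_scores = [s for i, s in enumerate(scores) if i not in positives]
    wins = sum(p > n for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

# Made-up scores for five items; items 0 and 3 are the user's positives.
scores = np.array([0.9, 0.1, 0.8, 0.4, 0.3])
positives = [0, 3]

print(precision_at_k_single(scores, positives, k=2))
print(auc_single(scores, positives))
```

Here the top two items are 0 and 2, of which only item 0 is a positive, so precision at 2 is 0.5. Of the six positive-negative pairs, the positive wins five, giving an AUC of 5/6.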

Complete Example

Here is the complete working example that demonstrates building and evaluating a recommendation engine:

import numpy as np
from lightfm.datasets import fetch_movielens
from lightfm import LightFM
from lightfm.evaluation import precision_at_k, auc_score

# Load the MovieLens dataset
data = fetch_movielens(min_rating=4.0)

# Initialize and train the model
model = LightFM(loss='warp', random_state=42)
model.fit(data['train'], epochs=30, num_threads=2)

# Evaluate the model performance
train_precision = precision_at_k(model, data['train'], k=10).mean()
test_precision = precision_at_k(model, data['test'], k=10).mean()
train_auc = auc_score(model, data['train']).mean()
test_auc = auc_score(model, data['test']).mean()

# Print evaluation results
print('Train Precision@10: {:.3f}'.format(train_precision))
print('Train AUC: {:.3f}'.format(train_auc))
print('Test Precision@10: {:.3f}'.format(test_precision))
print('Test AUC: {:.3f}'.format(test_auc))

# Generate sample recommendations (unfiltered: may include movies the user already rated)
user_id = 5
n_items = data['train'].shape[1]
scores = model.predict(user_id, np.arange(n_items))
top_items = np.argsort(-scores)[:5]

print(f"\nTop 5 recommendations for user {user_id}:")
for i, item_id in enumerate(top_items):
    print(f"{i+1}. Movie ID: {item_id}")

Output:

Train Precision@10: 0.515
Train AUC: 0.943
Test Precision@10: 0.142
Test AUC: 0.862

Top 5 recommendations for user 5:
1. Movie ID: 1467
2. Movie ID: 1189
3. Movie ID: 1201
4. Movie ID: 1122
5. Movie ID: 1653

Key Performance Metrics

Metric            | Purpose                                      | Good Value
Precision@k       | Measures relevant recommendations in top k   | Higher is better
AUC Score         | Measures ranking quality                     | Closer to 1.0 is better
Training vs Test  | Indicates overfitting                        | Small gap preferred

Conclusion

In this tutorial, we built a recommendation engine using the LightFM library with the WARP loss. The model reached an AUC of 0.86 on the test data, meaning it ranks a randomly chosen positive movie above a randomly chosen negative one about 86% of the time. The drop in precision@10 from 0.515 on the training set to 0.142 on the test set suggests that some overfitting remains. LightFM provides a flexible framework for building recommendation systems that can handle both explicit and implicit feedback data.

Updated on: 2026-03-27T14:18:06+05:30
