Building a Recommendation Engine in Python Using the LightFM library


Recommendation engines are one of the most popular applications of machine learning in the real world. With the growth of e-commerce, online streaming services, and social media, recommendation engines have become a critical component in providing personalized content and recommendations to users. In this tutorial, we will learn how to build a recommendation engine using the LightFM library.

LightFM is a Python library for building recommender systems from both explicit feedback (such as ratings) and implicit feedback (such as clicks or purchases). It implements a hybrid model: it learns from user-item interactions, as collaborative filtering does, and it can also use user and item metadata, as content-based methods do. LightFM is built on NumPy, SciPy, and Cython, which keeps training fast even on large datasets.
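The hybrid idea can be made concrete with a small NumPy sketch of the scoring rule described in the LightFM paper: each user and each item is represented as the sum of the embeddings (and biases) of its features, and the score for a pair is the dot product of the two representations plus both biases. The function name hybrid_scores and the random toy arrays are hypothetical; this illustrates the model, not LightFM's own code.

```python
import numpy as np

def hybrid_scores(user_features, item_features,
                  user_emb, item_emb, user_bias, item_bias):
    """Score every (user, item) pair in a LightFM-style hybrid model.

    user_features: (n_users, n_user_feats) feature-weight matrix
    item_features: (n_items, n_item_feats) feature-weight matrix
    user_emb / item_emb: (n_feats, d) one embedding per feature
    user_bias / item_bias: (n_feats,) one bias per feature
    """
    # A user's latent vector is the weighted sum of its feature embeddings.
    q = user_features @ user_emb
    # The same holds for items.
    p = item_features @ item_emb
    # Biases are summed over features in the same way.
    b_u = user_features @ user_bias
    b_i = item_features @ item_bias
    # Score = dot product of the representations plus both biases.
    return q @ p.T + b_u[:, None] + b_i[None, :]

rng = np.random.default_rng(0)
n_users, n_items, d = 3, 4, 2
user_emb = rng.normal(size=(n_users, d))
item_emb = rng.normal(size=(n_items, d))
user_bias = rng.normal(size=n_users)
item_bias = rng.normal(size=n_items)

# With identity feature matrices (one indicator feature per user/item),
# the hybrid model reduces to plain matrix factorization.
scores = hybrid_scores(np.eye(n_users), np.eye(n_items),
                       user_emb, item_emb, user_bias, item_bias)
print(scores.shape)  # (3, 4)
```

When no user_features or item_features are passed to fit, LightFM uses identity feature matrices like these, which is the setting used in this tutorial; adding metadata columns (for example, movie genres) would let items share embedding parameters.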

In this tutorial, we will be using the MovieLens 100K dataset, which contains 100k movie ratings from 943 users on 1682 movies. Our goal is to build a recommendation engine that ranks movies for each user based on that user's past ratings and the ratings of similar users.

Getting Started

Before we dive into using LightFM, we need to install it, since it does not come built into Python. This can be done with the pip package manager.

To install the lightfm library, open your terminal and type the following command:

pip install lightfm

This downloads and installs the lightfm library and its dependencies. Once it is installed, we can import lightfm and start using its modules.

Step 1: Loading the Data

The first step is to load the MovieLens dataset into our Python environment. LightFM's lightfm.datasets module provides a fetch_movielens helper that downloads the dataset and loads it for us:

from lightfm.datasets import fetch_movielens
data = fetch_movielens(min_rating=4.0)

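The fetch_movielens function downloads the MovieLens 100K data (if it is not already cached) and returns a dictionary. Its 'train' and 'test' entries are SciPy sparse interaction matrices with one row per user and one column per movie (943 users by 1682 movies); the dictionary also holds item features and item labels (the movie titles). Because we passed min_rating=4.0, only ratings of 4 or 5 are kept, so each non-zero entry marks a movie the user liked. We are only interested in the interaction matrices for this tutorial.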
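To see what a min_rating filter does, here is a small sketch with made-up (user, item, rating) triples: ratings below the threshold are dropped, and the remaining ones become non-zero entries of a sparse user-by-item matrix. This mirrors the idea behind the train and test matrices, but the triples and the helper name build_interactions are hypothetical, not LightFM code.

```python
import scipy.sparse as sp

def build_interactions(triples, n_users, n_items, min_rating):
    """Keep ratings >= min_rating as entries of a sparse user x item matrix."""
    rows, cols, vals = [], [], []
    for user, item, rating in triples:
        if rating >= min_rating:
            rows.append(user)
            cols.append(item)
            vals.append(rating)
    return sp.coo_matrix((vals, (rows, cols)), shape=(n_users, n_items))

# Hypothetical ratings on a 1-5 scale: (user, item, rating).
triples = [(0, 0, 5), (0, 2, 3), (1, 1, 4), (1, 3, 2), (2, 0, 4)]
interactions = build_interactions(triples, n_users=3, n_items=4, min_rating=4.0)
print(interactions.nnz)  # 3
print(interactions.toarray())
```

The entries that survive the filter are the positive interactions the model learns from; missing entries are treated as unobserved rather than as negative ratings.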

Step 2: Building the Recommendation Engine

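The next step is to build the recommendation engine using LightFM. We will train it with the WARP (Weighted Approximate-Rank Pairwise) loss, a ranking loss that focuses on pushing items the user liked toward the top of that user's recommendation list. WARP is a loss function, not a separate model: the hybrid behavior comes from the LightFM model itself, and since we pass no user or item features here, the model acts as a pure collaborative filtering model. We can build the recommendation engine using the following code: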

from lightfm import LightFM
model = LightFM(loss='warp')
model.fit(data['train'], epochs=30, num_threads=2)

Creating LightFM(loss='warp') sets up a model that is trained with the WARP loss. Calling fit() then trains it on the training interactions for 30 epochs (full passes over the data) using two threads.
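The core of WARP can be sketched in a few lines. For a (user, positive item) pair, it samples items the user has not interacted with until one scores within a margin of the positive item. The fewer draws that takes, the more items probably outrank the positive, so the update gets a larger weight. The sketch below follows the original WARP formulation (Weston et al.), with estimated rank floor((n_items - 1) / trials) and weight 1 + 1/2 + ... + 1/rank. LightFM's Cython implementation differs in details (it exposes the sampling cap as the max_sampled constructor argument, for instance), so treat this as an illustration only; the function name warp_weight is hypothetical.

```python
import numpy as np

def warp_weight(scores, pos_item, positives, rng, max_sampled=10, margin=1.0):
    """Return (weight, violating_item) for one WARP update, or (0.0, None).

    scores: model scores for every item, for one user
    positives: set of items the user interacted with
    """
    n_items = len(scores)
    negatives = [j for j in range(n_items) if j not in positives]
    for trials in range(1, max_sampled + 1):
        neg = negatives[rng.integers(len(negatives))]
        # A violation: the negative scores within `margin` of the positive.
        if scores[neg] > scores[pos_item] - margin:
            # Few trials needed -> many violators -> positive ranked low.
            est_rank = (n_items - 1) // trials
            # Rank-dependent weight L(r) = 1 + 1/2 + ... + 1/r.
            weight = sum(1.0 / k for k in range(1, est_rank + 1))
            return weight, neg
    # No violation within max_sampled draws: no gradient update.
    return 0.0, None

rng = np.random.default_rng(0)
# Item 0 is the user's positive, but every other item outscores it.
w_bad, _ = warp_weight(np.array([0.0, 5.0, 4.0, 3.0, 2.0]), 0, {0}, rng)
# Item 0 already beats every negative by more than the margin.
w_good, _ = warp_weight(np.array([10.0, 0.0, 0.0, 0.0, 0.0]), 0, {0}, rng)
print(round(w_bad, 4), w_good)  # 2.0833 0.0
```

This weighting is why WARP concentrates on the top of the ranking: a badly ranked positive triggers large updates, while a positive that already sits above all sampled negatives triggers none.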

Step 3: Generating Recommendations

Now that we have trained the model, we can use it to generate recommendations. LightFM does not have a dedicated recommend method; instead, its predict method takes user_ids and item_ids and returns a score for each (user, item) pair. To recommend movies to a user, we score every movie for that user and keep the highest-scoring ones. The following code generates 10 recommendations for user 3:

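import numpy as np

# Score every movie for user 3 (LightFM ids are the row indices of the matrix)
user_id = 3
n_items = data['train'].shape[1]
scores = model.predict(user_id, np.arange(n_items))

# Sort by descending score and keep the 10 best items
top_items = np.argsort(-scores)[:10]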

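The predict method returns one score per movie for the given user. These scores are only meaningful relative to each other (a higher score means the movie is more likely to be relevant); they are not predicted star ratings. Sorting them in descending order and taking the first 10 indices gives the top 10 recommended movies.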
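In practice, the ranking above can still contain movies the user already rated in the training set, and these are usually filtered out before recommending. The sketch below shows that masking step in plain NumPy with made-up scores. With the tutorial's objects, the scores would come from model.predict and the already-rated movies could be read from data['train'].tocsr()[user_id].indices. The helper name top_k_unseen is hypothetical.

```python
import numpy as np

def top_k_unseen(scores, seen_items, k):
    """Indices of the k highest-scoring items the user has not seen."""
    masked = np.asarray(scores, dtype=float).copy()
    seen = list(seen_items)
    if seen:
        # Push already-seen items to the bottom of the ranking.
        masked[seen] = -np.inf
    # argsort on the negated scores orders items from highest to lowest.
    return np.argsort(-masked)[:k]

scores = np.array([0.9, 0.1, 0.8, 0.3, 0.7])
print(top_k_unseen(scores, seen_items=[0, 2], k=2))  # [4 3]
```

The returned indices can then be mapped to movie titles, for example with data['item_labels'][top_items].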

Step 4: Evaluating the Recommendation Engine

Once the recommendation engine is built, it's important to evaluate its performance. We will use two metrics: precision at k and AUC. Precision at k is the fraction of the top k recommended items that the user actually interacted with (here, rated 4 or higher). AUC is the area under the receiver operating characteristic (ROC) curve, which plots the true positive rate against the false positive rate; for a ranking model, it equals the probability that a randomly chosen positive item is ranked above a randomly chosen negative one. Both metrics range from 0 to 1, and an AUC of 0.5 corresponds to a random ordering.
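To make the two definitions concrete, here is a per-user sketch of both metrics in NumPy. The function names user_precision_at_k and user_auc are hypothetical, ties are counted as half a win here, and LightFM's own implementation may handle ties differently.

```python
import numpy as np

def user_precision_at_k(scores, positives, k):
    """Fraction of the k top-scored items that are known positives."""
    top_k = np.argsort(-scores)[:k]
    return np.isin(top_k, list(positives)).mean()

def user_auc(scores, positives):
    """Probability that a random positive outscores a random negative."""
    pos_mask = np.zeros(len(scores), dtype=bool)
    pos_mask[list(positives)] = True
    pos, neg = scores[pos_mask], scores[~pos_mask]
    # Compare every positive score with every negative score.
    wins = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return wins + 0.5 * ties

scores = np.array([0.9, 0.1, 0.8, 0.3])
positives = {0, 3}
print(user_precision_at_k(scores, positives, k=2), user_auc(scores, positives))  # 0.5 0.75
```

Here the top 2 items are 0 and 2, and only item 0 is a positive, so precision at 2 is 0.5; three of the four (positive, negative) pairs are ordered correctly, so AUC is 0.75.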

LightFM implements both metrics as functions in its lightfm.evaluation module: precision_at_k and auc_score. Each returns an array with one value per user, so we take the mean to get a single number. Here's how to compute them on the training and test sets:

from lightfm.evaluation import precision_at_k, auc_score

# Reuse the model trained in Step 2; calling fit() again would
# discard it and train a new model from scratch.

# Evaluate precision at k (mean over users)
print("Train precision at k:", precision_at_k(model, data['train'], k=5).mean())
print("Test precision at k:", precision_at_k(model, data['test'], k=5).mean())

# Evaluate AUC score (mean over users)
print("Train AUC score:", auc_score(model, data['train']).mean())
print("Test AUC score:", auc_score(model, data['test']).mean())

In this example, we evaluate the model trained in Step 2 on both the training and test interaction matrices. We use k = 5, meaning that only the top 5 recommendations for each user are considered. Scores on the training set are expected to be higher, since the model has already seen those interactions.
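One caveat about the test numbers: a user's training positives usually sit at the top of that user's ranking, so they crowd out test items and pull test precision down. LightFM's precision_at_k and auc_score accept a train_interactions argument, for example precision_at_k(model, data['test'], train_interactions=data['train'], k=5), which omits known training positives from the calculation. The NumPy sketch below shows the effect for a single hypothetical user with made-up scores; held_out_precision_at_k is a hypothetical helper, not the library function.

```python
import numpy as np

def held_out_precision_at_k(scores, train_items, test_items, k):
    """Precision@k against test positives, ignoring train positives."""
    masked = np.asarray(scores, dtype=float).copy()
    train = list(train_items)
    if train:
        # Remove known training positives from the ranking.
        masked[train] = -np.inf
    top_k = np.argsort(-masked)[:k]
    return np.isin(top_k, list(test_items)).mean()

scores = np.array([0.9, 0.8, 0.7, 0.6, 0.1])
train_items, test_items = {0, 1}, {2, 3}
# Without masking, the two training items fill the top 2 slots.
print(held_out_precision_at_k(scores, [], test_items, 2))           # 0.0
# With masking, the test items move up into the top 2.
print(held_out_precision_at_k(scores, train_items, test_items, 2))  # 1.0
```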

Complete Code

Here is the complete code:

import numpy as np
from lightfm import LightFM
from lightfm.datasets import fetch_movielens
from lightfm.evaluation import auc_score, precision_at_k

# Load the MovieLens dataset.
data = fetch_movielens(min_rating=4.0)

# Define the model and fit it to the data.
model = LightFM(loss='warp')
model.fit(data['train'], epochs=30, num_threads=2)

# Evaluate the model on the training data.
train_precision = np.mean(precision_at_k(model, data['train'], k=10, num_threads=2))
train_auc = np.mean(auc_score(model, data['train'], num_threads=2))

# Evaluate the model on the test data.
test_precision = np.mean(precision_at_k(model, data['test'], k=10, num_threads=2))
test_auc = np.mean(auc_score(model, data['test'], num_threads=2))

# Print the evaluation results.
print('Train precision: {:.2f}'.format(train_precision))
print('Train AUC: {:.2f}'.format(train_auc))
print('Test precision: {:.2f}'.format(test_precision))
print('Test AUC: {:.2f}'.format(test_auc))

Output

Train precision: 0.51
Train AUC: 0.94
Test precision: 0.14
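Test AUC: 0.86

The exact values will differ slightly from run to run, because LightFM initializes its embeddings randomly and no random_state is fixed here.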

Conclusion

In this tutorial, we've learned how to build a recommendation engine with the LightFM library. We started by loading the MovieLens interaction matrices with fetch_movielens. We then trained a LightFM model with the WARP loss and used its predict method to rank movies for a user. Finally, we evaluated the performance of the recommendation engine using the precision at k and AUC metrics.

The LightFM library offers a powerful and flexible way to build recommendation engines, with support for both implicit and explicit feedback data. With its ability to handle large datasets and incorporate side information, it's an excellent choice for many real-world recommendation scenarios. By following the steps in this tutorial, you'll be well on your way to building your own recommendation engine with LightFM.

S Vijay Balaji

Updated on: 31-Aug-2023
