Recommender Systems – II

Introduction

Next

In Part II, we will:

  • Study one modern method:
    • Deep Learning Recommendation Model (DLRM)
  • Reflect on the impact of recommender systems

This section draws heavily on

  • Deep Learning Recommendation Model for Personalization and Recommendation Systems (Naumov et al. 2019)

Deep Learning for Recommender Systems

Besides collaborative filtering and matrix factorization, another popular family of approaches to building recommender systems uses deep learning.

We’ll look at the Deep Learning Recommendation Model (DLRM) proposed by Facebook in 2019 (Naumov et al. 2019), which has an accompanying GitHub repository.

DLRM Architecture

  • Components (Figure 1):
    1. Embeddings: Dense representations for categorical data.
    2. Bottom MLP: Transforms dense continuous features into the same latent space as the embeddings.
    3. Feature Interaction: Pairwise dot products among the embeddings and the processed dense features.
    4. Top MLP: Processes interactions and outputs probabilities.

Let’s look at each of these components in turn.

Embeddings

Embeddings: Map categorical inputs to a latent factor space.

  • A learned embedding matrix \(W \in \mathbb{R}^{m \times d}\) for each categorical feature, where \(m\) is the number of categories and \(d\) is the embedding dimension
  • A one-hot vector \(e_i\) whose \(i\text{-th}\) entry is 1 and whose other entries are 0
  • The embedding of \(e_i\) is the \(i\text{-th}\) row of \(W\), i.e., \(w_i^T = e_i^T W\)

We can also use a weighted combination of multiple items, represented by a multi-hot vector of weights \(a^T = [0, \ldots, a_{i_1}, \ldots, a_{i_k}, \ldots, 0]\).

The embedding of this multi-hot vector is then \(a^T W\).
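
To make the lookup concrete, here is a minimal sketch that checks both identities numerically. The sizes (`m = 5`, `d = 3`), the random matrix standing in for a learned `W`, and the weights in `a` are all illustrative assumptions, not values from the DLRM paper.

```python
import torch

torch.manual_seed(0)
m, d = 5, 3                # m categories, embedding dimension d (illustrative sizes)
W = torch.randn(m, d)      # random stand-in for a learned embedding matrix

# One-hot lookup: e_i^T W selects row i of W
i = 2
e_i = torch.zeros(m)
e_i[i] = 1.0
w_i = e_i @ W

# Multi-hot lookup: a^T W is a weighted sum of the selected rows
a = torch.zeros(m)
a[1], a[3] = 0.25, 0.75    # hypothetical weights for items 1 and 3
pooled = a @ W

print(torch.allclose(w_i, W[i]))
print(torch.allclose(pooled, 0.25 * W[1] + 0.75 * W[3]))
```

In practice the matrix product is never formed explicitly; frameworks index the rows of `W` directly, which is far cheaper when `m` is large.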

Figure 1: DLRM Architecture

PyTorch has a convenient way to do this using EmbeddingBag, which besides summing can combine embeddings via mean or max pooling.

Here’s an example with 5 embeddings of dimension 3:

import torch
import torch.nn as nn

# Example embedding matrix: 5 embeddings, each of dimension 3
embedding_matrix = nn.EmbeddingBag(num_embeddings=5, embedding_dim=3, mode='mean')

# Input: Indices into the embedding matrix
input_indices = torch.tensor([1, 2, 3, 4])  # Flat list of indices
offsets = torch.tensor([0, 2])  # Start new bag at position 0 and 2 in input_indices

# Forward pass
output = embedding_matrix(input_indices, offsets)

print("Embedding Matrix:\n", embedding_matrix.weight)
print("Output:\n", output)
Embedding Matrix:
 Parameter containing:
tensor([[ 0.7934, -1.0148,  0.7098],
        [-0.0186, -0.0246, -1.0278],
        [-0.3661,  0.1436,  1.3483],
        [ 2.1198, -0.3008,  0.4787],
        [-1.0232,  1.6579,  0.4457]], requires_grad=True)
Output:
 tensor([[-0.1923,  0.0595,  0.1602],
        [ 0.5483,  0.6786,  0.4622]], grad_fn=<EmbeddingBagBackward0>)
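
The weighted multi-hot combination \(a^T W\) can also be expressed with EmbeddingBag by passing `per_sample_weights`, which PyTorch supports for `mode='sum'`. The following is a small sketch with hypothetical indices and weights; it compares the pooled result against the explicit product \(a^T W\).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bag = nn.EmbeddingBag(num_embeddings=5, embedding_dim=3, mode='sum')

indices = torch.tensor([1, 3])          # a single bag containing items 1 and 3
offsets = torch.tensor([0])             # the bag starts at position 0
weights = torch.tensor([0.25, 0.75])    # hypothetical per-item weights

pooled = bag(indices, offsets, per_sample_weights=weights)

# Explicit multi-hot vector a = [0, 0.25, 0, 0.75, 0] for comparison
a = torch.tensor([0.0, 0.25, 0.0, 0.75, 0.0])
expected = a @ bag.weight

print(torch.allclose(pooled[0], expected, atol=1e-6))
```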

Dense Features

An advantage of the DLRM architecture is that it can also take continuous (dense) features as input, such as the user’s age or the time of day.

There is a bottom MLP that transforms these dense features into a latent space of the same dimension \(d\).
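
As a rough sketch of what such a bottom MLP might look like, the snippet below maps a batch of continuous features to vectors of dimension \(d\). The layer widths, the number of dense features, and the batch size are assumptions chosen for illustration; the paper's configurations differ.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
num_dense, d = 4, 3    # hypothetical: 4 continuous features, embedding dimension 3

# Bottom MLP: the final layer has width d so its output can be
# combined with the embedding vectors
bottom_mlp = nn.Sequential(
    nn.Linear(num_dense, 16),
    nn.ReLU(),
    nn.Linear(16, d),
    nn.ReLU(),
)

dense_x = torch.randn(8, num_dense)   # synthetic batch of 8 examples
z = bottom_mlp(dense_x)               # shape: (8, d)
print(z.shape)
```

The only hard constraint is that the output width equals \(d\), so the processed dense vector can take part in dot products with the embeddings.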

Optional Sparse Feature MLPs

Optionally, one can add MLPs to transform the sparse features as well.

Feature Interactions

The 2nd order interactions are modeled via dot-products of all pairs from the collections of embedding vectors and processed dense features.

The results of the dot-product interactions are concatenated with the processed dense vectors.
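
One way to compute these interactions is with a batched matrix product followed by selecting each unordered pair once. This is a sketch under assumed sizes with random stand-ins for the embeddings and the processed dense vector; it is not a copy of the reference implementation.

```python
import torch

torch.manual_seed(0)
batch, d, num_sparse = 8, 3, 2

z = torch.randn(batch, d)                                   # processed dense features (stand-in)
embs = [torch.randn(batch, d) for _ in range(num_sparse)]   # one embedding per sparse feature

# Stack all vectors: T has shape (batch, n, d) with n = 1 + num_sparse
T = torch.stack([z] + embs, dim=1)

# All pairwise dot products per example: Z has shape (batch, n, n)
Z = torch.bmm(T, T.transpose(1, 2))

# Keep the strictly lower triangle so each pair appears exactly once
n = T.shape[1]
rows, cols = torch.tril_indices(n, n, offset=-1)
pairs = Z[:, rows, cols]                                    # shape: (batch, n*(n-1)/2)

# Concatenate the interactions with the processed dense vector
interaction = torch.cat([z, pairs], dim=1)                  # shape: (batch, d + n*(n-1)/2)
print(interaction.shape)
```

With `n = 3` vectors there are 3 distinct pairs, so each example yields a vector of length `d + 3 = 6`.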

Top MLP

The concatenated vector is then passed through a final MLP, followed by a sigmoid function, to produce the final prediction (e.g., the probability score of a recommendation).

This entire model is trained end-to-end using standard deep learning techniques.
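
The sketch below illustrates the final stage and a single gradient step, assuming a binary click label and binary cross-entropy loss. The input width, layer sizes, optimizer, learning rate, and synthetic data are all assumptions made for illustration. In a full model the optimizer would also cover the embedding tables and the bottom MLP, so that everything is trained jointly.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch, in_dim = 8, 6    # in_dim: hypothetical width of the concatenated interaction vector

# Top MLP producing one logit per example
top_mlp = nn.Sequential(
    nn.Linear(in_dim, 16),
    nn.ReLU(),
    nn.Linear(16, 1),
)

x = torch.randn(batch, in_dim)                   # stand-in for the concatenated vector
y = torch.randint(0, 2, (batch, 1)).float()      # synthetic binary labels

optimizer = torch.optim.SGD(top_mlp.parameters(), lr=0.1)
loss_fn = nn.BCEWithLogitsLoss()                 # applies the sigmoid internally

# One training step
logits = top_mlp(x)
loss = loss_fn(logits, y)
optimizer.zero_grad()
loss.backward()
optimizer.step()

# At inference time, apply the sigmoid to obtain probabilities
probs = torch.sigmoid(top_mlp(x))
print(probs.shape)
```

Using `BCEWithLogitsLoss` on raw logits rather than `BCELoss` on sigmoid outputs is a common choice for numerical stability.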

Training Results

Figure 2: DLRM Training Results

Figure 2 shows the training (solid) and validation (dashed) accuracies of DLRM on the Criteo Ad Kaggle dataset.

Accuracy is compared with that of the Deep & Cross Network (DCN) (Wang et al. 2017).

Other Modern Approaches

There are many other modern approaches to recommender systems, for example:

  1. Graph-Based Recommender Systems:
    • Leverage graph structures to capture relationships between users and items.
    • Use techniques like Graph Neural Networks (GNNs) to enhance recommendation accuracy.
  2. Context-Aware Recommender Systems:
    • Incorporate contextual information such as time, location, and user mood to provide more personalized recommendations.
    • Contextual data can be integrated using various machine learning models.
  3. Hybrid Recommender Systems:
    • Combine multiple recommendation techniques, such as collaborative filtering and content-based filtering, to improve performance.
    • Aim to leverage the strengths of different methods while mitigating their weaknesses.
  4. Reinforcement Learning-Based Recommender Systems:
    • Use reinforcement learning to optimize long-term user engagement and satisfaction.
    • Models learn to make sequential recommendations by interacting with users and receiving feedback.

These approaches often leverage advancements in machine learning and data processing to provide more accurate and personalized recommendations.

See Ricci, Rokach, and Shapira (2022) for a comprehensive overview of recommender systems.

Impact of Recommender Systems

Filter Bubbles

There are a number of concerns with the widespread use of recommender systems and personalization in society.

First, recommender systems are accused of creating filter bubbles.

A filter bubble is the tendency for recommender systems to limit the variety of information presented to the user.

The concern is that a user’s past expression of interests will guide the algorithm in continuing to provide “more of the same.”

This is believed to increase polarization in society, and to reinforce confirmation bias.

Maximizing Engagement

Second, recommender systems in modern usage are often tuned to maximize engagement.

In other words, the objective function of the system is not to present the user’s most favored content, but rather the content that will be most likely to keep the user on the site.

The incentive to maximize engagement arises on sites that are supported by advertising revenue.

More engagement time means more revenue for the site.

Extreme Content

However, many studies have shown that sites that strive to maximize engagement do so in large part by guiding users toward extreme content:

  • content that is shocking,
  • or feeds conspiracy theories,
  • or presents extreme views on popular topics.

Given this tendency of modern recommender systems, third parties have an incentive to create “clickbait” content of this kind, and one of the easiest ways to do so is to present false claims.

Methods for addressing these issues are an active area of study.

They can be addressed:

  • via technology
  • via public policy

Recap and References

BU CS/CDS Research

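You can read about some of the work done in Professor Mark Crovella’s group on this topic in (Spinelli and Crovella 2017), (Rastegarpanah, Gummadi, and Crovella 2019), and (Spinelli and Crovella 2020).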

Recap

  • Introduction to recommender systems and their importance in modern society.
  • Explanation of collaborative filtering (CF) and its two main approaches: user-user similarity and item-item similarity.
  • Discussion on the challenges of recommender systems, including scalability and data sparsity.
  • Introduction to matrix factorization (MF) as an improvement over CF, using latent vectors and alternating least squares (ALS) for optimization.
  • Practical implementation of ALS for matrix factorization on a subset of Amazon movie reviews.
  • Review of the Deep Learning Recommendation Model (DLRM) architecture and its components.
  • Discussion on the societal impact of recommender systems, including filter bubbles and engagement maximization.

References

Naumov, Maxim et al. 2019. “Deep Learning Recommendation Model for Personalization and Recommendation Systems.” arXiv Preprint arXiv:1906.00091, May. http://arxiv.org/abs/1906.00091.
Rastegarpanah, Bashir, Krishna P. Gummadi, and Mark Crovella. 2019. “Fighting Fire with Fire: Using Antidote Data to Improve Polarization and Fairness of Recommender Systems.” In Proceedings of WSDM. http://www.cs.bu.edu/faculty/crovella/paper-archive/wsdm19-antidote-data.pdf.
Ricci, Francesco, Lior Rokach, and Bracha Shapira, eds. 2022. Recommender Systems Handbook. New York, NY: Springer US. https://doi.org/10.1007/978-1-0716-2197-4.
Spinelli, Larissa, and Mark Crovella. 2017. “Closed-Loop Opinion Formation.” In Proceedings of the 9th International ACM Web Science Conference (WebSci). http://www.cs.bu.edu/faculty/crovella/paper-archive/netsci17-filterbubble.pdf.
———. 2020. “How YouTube Leads Privacy-Seeking Users Away from Reliable Information.” In Proceedings of the Workshop on Fairness in User Modeling, Adaptation, and Personalization (FairUMAP). http://www.cs.bu.edu/faculty/crovella/paper-archive/youtube-fairumap20.pdf.
Wang, Ruoxi, Bin Fu, Gang Fu, and Mingliang Wang. 2017. “Deep & Cross Network for Ad Click Predictions.” In Proc. ADKDD, 12.