Building an Anime Recommendation System with PySpark in SageMaker

Published at
3/17/2024
Categories
pyspark
sagemarker
aws
demo
Author
akin

Demonstration of an Anime Recommendation System with PySpark in SageMaker v1.0.0-alpha01

Understanding the Dataset

Our journey begins with understanding the dataset. We will be using the MyAnimeList dataset sourced from Kaggle, which contains valuable information about anime titles, user ratings, and more. This dataset will serve as the foundation for our recommendation system.

Preprocessing the Data

Before diving into model building, we will preprocess the dataset to ensure it is clean and structured for analysis. While the preprocessing steps have already been completed, we will briefly discuss the importance of data preprocessing in the context of recommendation systems.

For the detailed preprocessing steps, see the notebook anime-recommendation-system.ipynb.

TODO: Add all preprocessing step-by-step with explanations


Visualization

Visualizing the preprocessed data can provide valuable insights into the distribution of anime ratings, user preferences, and other patterns. We will utilize various visualization techniques to gain a better understanding of our dataset.

The notebook anime-recommendation-system.ipynb applies several data visualization techniques to explore the content of the data. One of them is shown below as an example.

image

TODO: Add all visualizations step-by-step with explanations and insights


Recommendation System

We used the Alternating Least Squares (ALS) algorithm to build the recommendation system with PySpark. To find the best model, we used mean squared error (MSE) as the selection metric: we trained the model on one part of the data and measured how closely its predictions matched the actual ratings in the remaining, held-out part.
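
The evaluation described above can be sketched in plain Python. This is a minimal illustration, not the notebook's code: the helper names `split_ratings` and `mean_squared_error`, the 20% test fraction, and the `predict` callable are all assumptions made for the example. In PySpark, the predictions would typically come from the trained model (for instance via `predictAll`) rather than from a Python callable.

```python
import random


def split_ratings(ratings, test_fraction=0.2, seed=5047):
    """Shuffle a copy of the ratings and split it into (train, test) lists.

    `ratings` is a list of (user_id, item_id, rating) tuples. The 20% test
    fraction is an assumed value, not one taken from the article.
    """
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def mean_squared_error(test_ratings, predict):
    """Average squared difference between actual and predicted ratings.

    `predict` is any callable taking (user_id, item_id) and returning a
    predicted rating; here it stands in for the trained model.
    """
    if not test_ratings:
        raise ValueError("test_ratings must not be empty")
    total = 0.0
    for user_id, item_id, actual in test_ratings:
        total += (actual - predict(user_id, item_id)) ** 2
    return total / len(test_ratings)
```

A lower value means the predictions are closer to the held-out ratings, which is why the metric can be used to compare candidate models.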

The parameters below gave the lowest mean squared error, so we trained the final model with them to obtain more accurate results.

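from pyspark.mllib.recommendation import ALS
# Parameters with the lowest mean squared error; `iterations` avoids shadowing the built-in iter()
rank, iterations, lambda_ = 50, 10, 0.1
model = ALS.train(rating, rank=rank, iterations=iterations, lambda_=lambda_, seed=5047)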
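
One way such parameters could be chosen is an exhaustive search over a small grid, keeping the combination with the lowest error. The sketch below is an assumption about the procedure, not the notebook's actual code: `select_best_params` is a hypothetical helper, and `evaluate` is a hypothetical callable that would train an ALS model with the given values and return its mean squared error on the held-out data.

```python
from itertools import product


def select_best_params(ranks, iteration_counts, lambdas, evaluate):
    """Try every (rank, iterations, lambda_) combination and keep the best.

    `evaluate(rank, iterations, lambda_)` must return the mean squared error
    of a model trained with those values; lower is better.
    """
    best_params, best_mse = None, float("inf")
    for rank, iterations, lambda_ in product(ranks, iteration_counts, lambdas):
        mse = evaluate(rank, iterations, lambda_)
        if mse < best_mse:
            best_params, best_mse = (rank, iterations, lambda_), mse
    return best_params, best_mse
```

The search cost grows with the product of the grid sizes, since every combination trains a full model, so in practice the candidate lists are usually kept short.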

Cloud Infrastructure Setup with Terraform

First, we need to upload the preprocessed data to an S3 bucket with the following commands. The upload script, upload_data.py, is in the sagemaker directory.

$ cd sagemaker
$ python upload_data.py -n 'anime-recommendation-system' -r 'eu-central-1' -f '../preprocessed_data'

../preprocessed_data\user.csv  4579798 / 4579798.0  (100.00%)
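
For readers who want to see roughly what such an upload script involves, here is a minimal sketch. It is not the contents of upload_data.py: the short flags mirror the command above, but the long option names, the helper names, and the assumption that the bucket already exists are all made up for this example.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse the command-line options used in the upload command."""
    parser = argparse.ArgumentParser(description="Upload a local folder to S3.")
    # The long option names below are assumptions; only -n, -r, -f appear above.
    parser.add_argument("-n", "--name", required=True, help="bucket name")
    parser.add_argument("-r", "--region", required=True, help="AWS region")
    parser.add_argument("-f", "--folder", required=True, help="local folder")
    return parser.parse_args(argv)


def list_uploads(folder):
    """Return (local_path, s3_key) pairs for every file under `folder`."""
    pairs = []
    for root, _dirs, files in os.walk(folder):
        for filename in files:
            local_path = os.path.join(root, filename)
            # S3 keys use forward slashes regardless of the local OS.
            key = os.path.relpath(local_path, folder).replace(os.sep, "/")
            pairs.append((local_path, key))
    return pairs


def upload_folder(bucket, region, folder):
    """Upload every file under `folder` to an existing bucket."""
    import boto3  # imported lazily so the helpers above work without AWS access

    s3 = boto3.client("s3", region_name=region)
    for local_path, key in list_uploads(folder):
        s3.upload_file(local_path, bucket, key)


# Example usage (requires AWS credentials and an existing bucket):
# args = parse_args()
# upload_folder(args.name, args.region, args.folder)
```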

After that, run the following Terraform commands to create an Amazon SageMaker notebook instance:

instance.tf

$ terraform init
Initializing the backend...

Initializing provider plugins...
- Finding latest version of hashicorp/aws...
- Installing hashicorp/aws v5.41.0...
- Installed hashicorp/aws v5.41.0 (signed by HashiCorp)
...
$ terraform plan
...
Plan: 4 to add, 0 to change, 0 to destroy.
$ terraform apply --auto-approve
aws_iam_policy.sagemaker_s3_full_access: Creating...
aws_iam_role.sagemaker_role: Creating...
aws_iam_policy.sagemaker_s3_full_access: Creation complete after 1s [id=arn:aws:iam::749270828329:policy/SageMaker_S3FullAccessPoliciy]
aws_iam_role.sagemaker_role: Creation complete after 1s [id=AnimeRecommendation_SageMakerRole]
aws_iam_role_policy_attachment.sagemaker_s3_policy_attachment: Creating...
aws_sagemaker_notebook_instance.notebookinstance: Creating...
aws_iam_role_policy_attachment.sagemaker_s3_policy_attachment: Creation complete after 1s [id=AnimeRecommendation_SageMakerRole-20240317120654686300000001]
aws_sagemaker_notebook_instance.notebookinstance: Still creating... [10s elapsed]
...
Apply complete! Resources: 4 added, 0 changed, 0 destroyed.

Create a new conda_python3 notebook and run sagemaker-anime-recommendation-system.ipynb step by step on the notebook instance.

You can look at the sagemaker-anime-recommendation-system-test.html file to see the output.

The following code saves the model data locally and then uploads it to an S3 bucket.

import os

import boto3
from pyspark import SparkContext

# Save the trained model to a local directory named 'model'
model.save(SparkContext.getOrCreate(), 'model')

# Initialize the S3 client
s3 = boto3.client('s3')

# Upload every file under ./model to the bucket created earlier
bucketname = 'anime-recommendation-system'
local_directory = './model'
destination = 'model/'
for root, dirs, files in os.walk(local_directory):
    for filename in files:
        # Construct the full local path
        local_path = os.path.join(root, filename)

        # Build the S3 key from the path relative to the model directory;
        # S3 keys always use forward slashes
        relative_path = os.path.relpath(local_path, local_directory)
        s3_path = destination + relative_path.replace(os.sep, '/')

        s3.upload_file(local_path, bucketname, s3_path)
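
To confirm that the model files reached the bucket, the keys under the `model/` prefix can be listed. This is a small sketch under assumptions: `list_keys` is a hypothetical helper, and it expects a boto3 S3 client to be passed in, such as the `s3` client created above.

```python
def list_keys(s3_client, bucket, prefix):
    """Return every object key in `bucket` that starts with `prefix`.

    Uses the list_objects_v2 paginator so that results spanning several
    pages are all collected.
    """
    keys = []
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # A page without matching objects has no "Contents" entry.
        for obj in page.get("Contents", []):
            keys.append(obj["Key"])
    return keys


# Example usage with the client and bucket from the upload step:
# for key in list_keys(s3, "anime-recommendation-system", "model/"):
#     print(key)
```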

image

image

To load and test the model, run the test_another_instance.ipynb Jupyter notebook.

The following code retrieves the model data from the S3 bucket and loads it with MatrixFactorizationModel.

import os

import boto3
from pyspark import SparkContext
from pyspark.mllib.recommendation import MatrixFactorizationModel

# Download every object under the 'model/' prefix, keeping the same relative paths
s3_resource = boto3.resource('s3')
bucket = s3_resource.Bucket('anime-recommendation-system')
for obj in bucket.objects.filter(Prefix='model/'):
    if obj.key.endswith('/'):
        continue  # skip folder placeholder keys, which have no file content
    os.makedirs(os.path.dirname(obj.key), exist_ok=True)
    bucket.download_file(obj.key, obj.key)

# Load the model from the downloaded local 'model' directory
m = MatrixFactorizationModel.load(SparkContext.getOrCreate(), 'model')
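
Once the model is loaded, it can be queried for recommendations. The helper below is a hypothetical sketch, not code from the notebook: it assumes the object passed in is a `MatrixFactorizationModel`, whose `recommendProducts(user, num)` method returns `Rating` tuples with `user`, `product`, and `rating` fields.

```python
def top_n_for_user(model, user_id, n=10):
    """Return (product_id, predicted_rating) pairs for the user's top-n items.

    `model` is expected to be a pyspark.mllib MatrixFactorizationModel.
    """
    recommendations = model.recommendProducts(user_id, n)
    return [(rec.product, rec.rating) for rec in recommendations]


# Example usage with the model loaded above (user ID 1 is a made-up value):
# for anime_id, score in top_n_for_user(m, user_id=1, n=5):
#     print(anime_id, round(score, 2))
```

Returning plain tuples keeps the result easy to join with an anime title lookup or to serialize, without depending on Spark types downstream.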

In this article, I showed how to create the model, how to make it easier to reuse by loading the saved model data in a separate environment, and how to run the workflow on a SageMaker notebook instance. See you in another article... 👋👋
