Sarcasm Detection AI Model (97% Accuracy) Trained With Reddit Comments - Part 1

Published at
7/7/2024
Categories
machin
reddit
sarcasm
Author
stevenmathew

I have trained a Sarcasm Detection AI model using Reddit comments. This is how you can do it too.

Requirements:
Google Colab
Reddit API Credentials
Lots of time
Coffee

  1. First we will import the necessary libraries.
import asyncio  # For asynchronous programming in Python.
import asyncpraw  # Python Reddit API Wrapper for asynchronous Reddit API interactions.
import pandas as pd  # Data manipulation and analysis tool.
import nest_asyncio  # Necessary for allowing nested asyncio run loops.
import re  # Regular expressions for pattern matching and text manipulation.
from sklearn.model_selection import train_test_split  # Splits data into training and testing sets.
from sklearn.feature_extraction.text import TfidfVectorizer  # Converts text data into TF-IDF feature vectors.
from sklearn.ensemble import RandomForestClassifier  # Random Forest classifier for machine learning.
from sklearn.metrics import accuracy_score, classification_report  # Metrics for evaluating model performance.
from imblearn.over_sampling import SMOTE  # Oversampling technique for handling class imbalance.
from sklearn.pipeline import Pipeline  # Constructs a pipeline of transformations and estimators.
from sklearn.model_selection import GridSearchCV  # Performs grid search over specified parameter values.
  2. Connecting to the Reddit API. Get your API credentials from https://www.reddit.com/prefs/apps
`import praw  # Synchronous Reddit client; not part of the import list above

client_id = 'your_client_id'
client_secret = 'your_client_secret'
user_agent = 'MyRedditApp/0.1 by your_username'

reddit = praw.Reddit(client_id=client_id,
                     client_secret=client_secret,
                     user_agent=user_agent)`

This code sets up authentication credentials (client_id, client_secret, user_agent) to create a Reddit API connection using praw. The Reddit object initializes a connection to Reddit's API, allowing the Python script to interact with Reddit, retrieve data, and perform various actions programmatically on the platform.
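Hard-coded credentials are easy to leak when a notebook is shared. The following is a minimal sketch of one alternative, assuming the credentials are stored in environment variables. The variable names (`REDDIT_CLIENT_ID`, `REDDIT_CLIENT_SECRET`, `REDDIT_USER_AGENT`) and the helper `load_reddit_credentials` are hypothetical and not part of the original tutorial.

```python
import os


def load_reddit_credentials(env=os.environ):
    """Read Reddit API credentials from a mapping of environment variables.

    The variable names are hypothetical; use whatever names you set yourself.
    """
    names = ("REDDIT_CLIENT_ID", "REDDIT_CLIENT_SECRET", "REDDIT_USER_AGENT")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {
        "client_id": env["REDDIT_CLIENT_ID"],
        "client_secret": env["REDDIT_CLIENT_SECRET"],
        "user_agent": env["REDDIT_USER_AGENT"],
    }


# Demo with a plain dict standing in for os.environ.
demo_env = {
    "REDDIT_CLIENT_ID": "your_client_id",
    "REDDIT_CLIENT_SECRET": "your_client_secret",
    "REDDIT_USER_AGENT": "MyRedditApp/0.1 by your_username",
}
credentials = load_reddit_credentials(demo_env)
print(sorted(credentials))
```

The returned dictionary uses the same keyword names as the connection code above, so it can be unpacked directly, for example `praw.Reddit(**credentials)`.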

  3. Initialization and Setup
`nest_asyncio.apply()`

This line patches asyncio so that event loops can be nested. This is necessary in environments that already have an event loop running, such as Jupyter or Google Colab notebooks.
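To see the problem that nest_asyncio works around, here is a small sketch using only the standard library. It tries to start a second event loop from inside a running one, which plain asyncio refuses to do. The function names `inner` and `outer` are made up for this illustration.

```python
import asyncio


async def inner():
    return "done"


async def outer():
    # We are already inside a running event loop here.
    coro = inner()
    try:
        asyncio.run(coro)  # Plain asyncio does not allow nested loops
    except RuntimeError as exc:
        coro.close()  # Avoid a "coroutine was never awaited" warning
        return str(exc)
    return "no error"


message = asyncio.run(outer())
print(message)
```

A notebook is in the same position as `outer`: its cell code already runs inside an event loop, which is why the patch is applied before any async Reddit calls.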

Asynchronous Function Definition

`async def collect_reddit_comments(subreddit_name, keyword, limit=1000):
    reddit = asyncpraw.Reddit(
        client_id=client_id,
        client_secret=client_secret,
        user_agent=user_agent
    )`

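Defines an asynchronous function collect_reddit_comments to retrieve comments from Reddit. It initializes a Reddit instance using asyncpraw, passing in credentials (client_id, client_secret, user_agent) for API authentication. The function as shown never closes this client, so consider calling `await reddit.close()` once the comments have been collected.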

Fetching Subreddit and Comments

`subreddit = await reddit.subreddit(subreddit_name)
comments = []
count = 0
after = None`

Asynchronously fetches the subreddit object based on subreddit_name. Initializes an empty list comments to store comment data, and sets counters (count) and pagination marker (after) for comment retrieval.

Looping Through Submissions and Comments

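`while len(comments) < limit:
    try:
        async for submission in subreddit.search(keyword, limit=None, params={'after': after}):
            await submission.load()  # Fetch the submission together with its comments
            submission.comment_limit = 0
            # replace_more is a coroutine in asyncpraw, so it must be awaited
            await submission.comments.replace_more(limit=0)`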

Enters a loop to fetch submissions matching keyword within the specified subreddit. Asynchronously loads each submission along with its comments. Calling replace_more with limit=0 removes the "load more comments" placeholders instead of fetching them, so only the comments returned by the initial load are kept.
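The `async for` pattern used above can be tried without network access. In this sketch, `fake_search` is a hypothetical stand-in for `subreddit.search`: an async generator that yields made-up results one at a time. It is only meant to show how the loop consumes items and stops early, not how Reddit's search behaves.

```python
import asyncio


async def fake_search(keyword, total=5):
    """Hypothetical stand-in for subreddit.search(): yields results one by one."""
    for i in range(total):
        await asyncio.sleep(0)  # Hand control back to the event loop, like a network wait
        yield {"id": f"post{i}", "title": f"{keyword} example {i}"}


async def first_titles(keyword, limit):
    titles = []
    async for submission in fake_search(keyword):
        titles.append(submission["title"])
        if len(titles) >= limit:
            break  # Stop early once enough items are collected
    return titles


titles = asyncio.run(first_titles("sarcastic", limit=3))
print(titles)
```

In a notebook, `await first_titles("sarcastic", limit=3)` can be used in place of the `asyncio.run(...)` call.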

Collecting and Storing Comments

`           for comment in submission.comments.list():
                if isinstance(comment, asyncpraw.models.Comment):
                    author_name = comment.author.name if comment.author else '[deleted]'
                    comments.append([comment.body, author_name, comment.created_utc])
                    count += 1

                    if count >= limit:
                        break

            # Reddit's 'after' parameter expects a fullname such as 't3_abc123', not a bare id
            after = submission.fullname

            if count >= limit:
                break`

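Iterates through each comment in the submission, checking that it is a Comment object. Collects the comment body, the author name ('[deleted]' when the author is missing), and the creation time (created_utc). The count and limit checks stop collection once the requested number of comments is reached. Note that the outer while loop only ends when limit comments have been collected, so a search that yields fewer comments than limit keeps repeating.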
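The collection logic can be checked offline with stand-in objects. This is a sketch only: `FakeAuthor`, `FakeComment`, and `collect_rows` are hypothetical names that mimic the three attributes the tutorial reads (body, author, created_utc), and the sample comments are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeAuthor:
    name: str


@dataclass
class FakeComment:
    body: str
    author: Optional[FakeAuthor]  # None models a deleted account
    created_utc: float


def collect_rows(comment_lists, limit):
    """Flatten comments from several submissions into rows, stopping at `limit`."""
    rows = []
    for submission_comments in comment_lists:
        for comment in submission_comments:
            author_name = comment.author.name if comment.author else "[deleted]"
            rows.append([comment.body, author_name, comment.created_utc])
            if len(rows) >= limit:
                return rows  # Returning leaves both loops at once
    return rows


submissions = [
    [
        FakeComment("Oh great, another Monday.", FakeAuthor("alice"), 1720310400.0),
        FakeComment("Sure, that will work.", None, 1720310460.0),
    ],
    [FakeComment("What a surprise.", FakeAuthor("bob"), 1720310520.0)],
]
rows = collect_rows(submissions, limit=2)
print(rows)
```

Returning from a helper function replaces the pair of `break` statements in the original loop, which is one way to avoid checking the limit twice.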

Handling API Exceptions

    `except asyncpraw.exceptions.APIException as e:
        print(f"API exception occurred: {e}")
        wait_time = 60  # Wait for 1 minute before retrying
        print(f"Waiting for {wait_time} seconds before retrying...")
        await asyncio.sleep(wait_time)`

Catches and handles API exceptions that may occur during Reddit API interactions. Prints the exception message, waits for a minute (wait_time) before retrying, and then resumes fetching comments.
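The handler above retries for as long as errors keep occurring. As a variation, the sketch below caps the number of attempts. It is not the author's method: `TransientAPIError`, `retry_with_wait`, and `flaky_fetch` are hypothetical names, and the exception class stands in for whichever API exception you decide is worth retrying.

```python
import asyncio


class TransientAPIError(Exception):
    """Hypothetical stand-in for an API exception worth retrying."""


async def retry_with_wait(operation, max_attempts=3, wait_time=60):
    """Await `operation()`; on TransientAPIError, sleep and try again."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await operation()
        except TransientAPIError as exc:
            if attempt == max_attempts:
                raise  # Give up instead of looping forever
            print(f"Attempt {attempt} failed ({exc}); waiting {wait_time}s")
            await asyncio.sleep(wait_time)


calls = {"count": 0}


async def flaky_fetch():
    """Fails twice, then succeeds, to exercise the retry path."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientAPIError("rate limited")
    return "ok"


# A short wait keeps the demo fast; the tutorial waits 60 seconds.
result = asyncio.run(retry_with_wait(flaky_fetch, wait_time=0.01))
print(result, calls["count"])
```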

Returning Results

`return comments[:limit]  # Returns up to 'limit' comments`

Returns a list of collected comments, limited by the specified limit, ensuring only the required number of comments are returned.

Main Function to Execute Collection

async def main():
    comments = await collect_reddit_comments('sarcasm', 'sarcastic', limit=5000)  # Adjust limit as needed
    df = pd.DataFrame(comments, columns=['comment', 'author', 'created_utc'])
    df.to_csv('reddit_comments.csv', index=False)
    print(f"Total comments collected: {len(df)}")
    print(df.head())

Defines an asynchronous main function to orchestrate the comment collection process. Calls collect_reddit_comments with parameters subreddit_name='sarcasm', keyword='sarcastic', and limit=5000 (can be adjusted). Converts collected comments into a Pandas DataFrame (df), stores it as a CSV file (reddit_comments.csv), and prints summary information about the collected data.
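The created_utc column holds Unix timestamps (seconds since 1970-01-01 UTC), which are hard to read in the saved file. The sketch below reads a CSV with the same three columns and converts the timestamps, using only the standard library. The sample rows are made up, and an in-memory string stands in for reddit_comments.csv.

```python
import csv
import io
from datetime import datetime, timezone

# Made-up sample with the same columns as reddit_comments.csv.
sample_csv = io.StringIO(
    "comment,author,created_utc\n"
    '"Oh great, another Monday.",alice,1720310400.0\n'
    "What a surprise.,[deleted],1720310520.0\n"
)

rows = []
for row in csv.DictReader(sample_csv):
    # created_utc is a Unix timestamp: seconds since 1970-01-01 UTC.
    created = datetime.fromtimestamp(float(row["created_utc"]), tz=timezone.utc)
    rows.append((row["comment"], row["author"], created.isoformat()))

for row in rows:
    print(row)
```

With pandas, the equivalent conversion on the DataFrame is `pd.to_datetime(df['created_utc'], unit='s', utc=True)`.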

Running the Main Function

`await main()`

Executes the main function, which collects the Reddit comments, loads them into a DataFrame, saves them to a CSV file, and prints the number of comments collected along with a preview of the data. A top-level await like this works in notebooks such as Google Colab; in a regular Python script, use asyncio.run(main()) instead.

Read Part 2 - Sarcasm Detection From Reddit Comments : Cleaning & Saving The Data

GITHUB: https://github.com/stevie1mat/Sarcasm-Detection-With-Reddit-Comments

Author: Steven Mathew
