Collection of tools to view, search and create Reddit archives

Published at
10/28/2023
Categories
reddit
Author
mohammadtaseenkhan
redarc

A self-hosted solution to search, view and create your own Reddit archives.

Features:

  • Ingest pushshift dumps
  • View threads/comments
  • Fulltext search via PostgresFTS
  • Submit threads to be archived via API (Completely untested. Developed with mock data and the PRAW documentation)
  • Periodically fetch rising, new and hot threads from specified subreddits
  • Download i.redd.it images from threads.

Please abide by the Reddit Terms of Service and User Agreement if you are using their API.

Download pushshift dumps

https://the-eye.eu/redarcs/

All data 2005-06 to 2022-12:

magnet:?xt=urn:btih:7c0645c94321311bb05bd879ddee4d0eba08aaee&tr=https%3A%2F%2Facademictorrents.com%2Fannounce.php&tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A6969&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce

Top 20,000 subreddits:

magnet:?xt=urn:btih:c398a571976c78d346c325bd75c47b82edf6124e&tr=https%3A%2F%2Facademictorrents.com%2Fannounce.php&tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A6969&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce
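
Before handing a magnet link to a torrent client, it can help to see what it contains. The following is a minimal sketch using only the Python standard library; the helper name parse_magnet is hypothetical and not part of redarc. It splits a magnet URI into its info hash and decoded tracker URLs (the example link is shortened to two of the trackers listed above).

```python
from urllib.parse import urlsplit, parse_qs


def parse_magnet(uri: str) -> dict:
    """Split a magnet URI into its BitTorrent info hash and tracker URLs."""
    parts = urlsplit(uri)
    if parts.scheme != "magnet":
        raise ValueError("not a magnet URI")
    # parse_qs also percent-decodes values, so trackers come back as plain URLs.
    params = parse_qs(parts.query)
    xt = params.get("xt", [""])[0]
    prefix = "urn:btih:"
    if not xt.startswith(prefix):
        raise ValueError("missing urn:btih info hash")
    return {"info_hash": xt[len(prefix):], "trackers": params.get("tr", [])}


# "Top 20,000 subreddits" link, shortened to two trackers for readability.
magnet = (
    "magnet:?xt=urn:btih:c398a571976c78d346c325bd75c47b82edf6124e"
    "&tr=https%3A%2F%2Facademictorrents.com%2Fannounce.php"
    "&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce"
)
info = parse_magnet(magnet)
print(info["info_hash"])
for tracker in info["trackers"]:
    print(tracker)
```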

Installation:

The master branch is unstable. Please check out a release.

Docker

Install Docker: https://docs.docker.com/engine/install

Services:

  • postgres: Main database for threads, comments and subreddits
  • postgres_fts: Database for full-text searching
  • redarc: API backend and React frontend
    • Requires: redis, reddit_worker if INGEST_ENABLED
  • redis: Required for any service that uses a task queue
  • image_downloader: Asynchronously downloads images from Reddit if DOWNLOAD_IMAGES
    • Requires: redis, reddit_worker
  • index_worker: Indexes threads/comments into postgres_fts
    • Requires: postgres_fts and postgres
  • reddit_worker: Asynchronously fetches threads/comments from Reddit
    • Requires: redis, image_downloader
  • subreddit_worker: Asynchronously fetches hot/new/rising thread IDs from subreddits
    • Requires: reddit_worker and redis
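
The "Requires" lines above form a small dependency graph, and reddit_worker and image_downloader list each other. As a rough sketch (the mapping is transcribed from the list, the conditions INGEST_ENABLED and DOWNLOAD_IMAGES are ignored, and the function name required_services is hypothetical), the following walks the graph to show everything a given service pulls in:

```python
# "Requires" entries transcribed from the services list above.
# Conditional requirements (INGEST_ENABLED, DOWNLOAD_IMAGES) are treated as always on.
REQUIRES = {
    "postgres": [],
    "postgres_fts": [],
    "redarc": ["redis", "reddit_worker"],
    "redis": [],
    "image_downloader": ["redis", "reddit_worker"],
    "index_worker": ["postgres_fts", "postgres"],
    "reddit_worker": ["redis", "image_downloader"],
    "subreddit_worker": ["reddit_worker", "redis"],
}


def required_services(service: str) -> set[str]:
    """Return every service reachable from `service` through 'Requires' links."""
    seen: set[str] = set()
    stack = list(REQUIRES[service])
    while stack:
        dep = stack.pop()
        if dep not in seen:  # the visited set also guards against the cycle
            seen.add(dep)
            stack.extend(REQUIRES[dep])
    seen.discard(service)  # a cycle can lead back to the starting service
    return seen


for name in ("subreddit_worker", "index_worker"):
    print(name, "->", sorted(required_services(name)))
```

A plain topological sort would fail on this graph because of the mutual requirement, which is why the sketch uses a reachability walk instead.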

If you wish to change the postgres password, make sure POSTGRES_PASSWORD and PGPASSWORD are the same.

If you are using redarc on your personal machine, set docker envars REDARC_API=http://localhost/api and SERVER_NAME=localhost.

REDARC_API is the URL of your API server; it must end with /api. eg: http://redarc.mysite.org/api

REDARC_FE_API is the URL of the API server you want the frontend to send requests to. If you are not using a reverse proxy, it should be the same as REDARC_API.

SERVER_NAME is the URL your redarc instance is running on. eg: redarc.mysite.org

Setting an INGEST_PASSWORD and ADMIN_PASSWORD in your API is highly recommended to prevent abuse.

IMAGE_PATH is the path where you want the image_downloader worker to save images. This is the same path the API backend fetches images from.

INDEX_DELAY is how often you want index_worker to index comments/threads.

SUBREDDITS is a list of subreddits you want subreddit_worker to fetch threads from. It is delimited by commas.

FETCH_DELAY is how often you want subreddit_worker to fetch threads.

NUM_THREADS is the number of threads you want downloaded from hot, rising or new.
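
The rules above are easy to get wrong when editing .env by hand. Below is a minimal pre-flight sketch, not part of redarc: check_env and parse_subreddits are hypothetical helpers, the sample values are placeholders, and tolerating spaces around the commas in SUBREDDITS is an assumption about how the list may be written.

```python
def check_env(env: dict[str, str]) -> list[str]:
    """Return human-readable problems found in a redarc-style environment."""
    problems = []
    if env.get("POSTGRES_PASSWORD") != env.get("PGPASSWORD"):
        problems.append("POSTGRES_PASSWORD and PGPASSWORD must be the same")
    if not env.get("REDARC_API", "").endswith("/api"):
        problems.append("REDARC_API must end with /api")
    if not env.get("INGEST_PASSWORD") or not env.get("ADMIN_PASSWORD"):
        problems.append("INGEST_PASSWORD and ADMIN_PASSWORD should be set")
    return problems


def parse_subreddits(value: str) -> list[str]:
    """Split the comma-delimited SUBREDDITS value, dropping empty entries."""
    return [name.strip() for name in value.split(",") if name.strip()]


# Placeholder values for a personal machine, following the advice above.
local_env = {
    "POSTGRES_PASSWORD": "test1234",
    "PGPASSWORD": "test1234",
    "REDARC_API": "http://localhost/api",
    "SERVER_NAME": "localhost",
    "INGEST_PASSWORD": "change-me",
    "ADMIN_PASSWORD": "change-me-too",
    "SUBREDDITS": "python, rust,golang",
}
print(check_env(local_env))
print(parse_subreddits(local_env["SUBREDDITS"]))
```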

Docker compose (Recommended):

Modify envars as needed

$ git clone https://github.com/Yakabuff/redarc.git
$ cd redarc
$ git fetch --all --tags
$ git checkout tags/v0.5.1 -b v0.5.1
$ cp default.env .env
# Modify .env as needed
$ docker compose up -d

Manual installation:

$ git clone https://github.com/Yakabuff/redarc.git
$ cd redarc

1) Provision Postgres database

$ docker pull postgres
$ docker run \
  --name pgsql-dev \
  -e POSTGRES_PASSWORD=test1234 \
  -d \
  -v postgres-docker:/var/lib/postgresql/data \
  -p 5432:5432 postgres 


$ docker run \
  --name pgsql-fts \
  -e POSTGRES_PASSWORD=test1234 \
  -d \
  -v postgresfts-docker:/var/lib/postgresql/data \
  -p 5433:5432 postgres 


psql -h localhost -U postgres -a -f scripts/db_submissions.sql
psql -h localhost -U postgres -a -f scripts/db_comments.sql
psql -h localhost -U postgres -a -f scripts/db_subreddits.sql
psql -h localhost -U postgres -a -f scripts/db_submissions_index.sql
psql -h localhost -U postgres -a -f scripts/db_comments_index.sql
psql -h localhost -U postgres -a -f scripts/db_status_comments.sql
psql -h localhost -U postgres -a -f scripts/db_status_comments_index.sql
psql -h localhost -U postgres -a -f scripts/db_status_submissions.sql
psql -h localhost -U postgres -a -f scripts/db_status_submissions_index.sql
psql -h localhost -U postgres -p 5433 -a -f scripts/db_fts.sql
psql -h localhost -U postgres -a -f scripts/db_progress.sql

2) Process dump and insert rows into postgres database with the load_sub/load_comments scripts

Note: Be sure the ingest and Reddit workers are disabled

python3 scripts/load_sub.py <path_to_submission_file>
python3 scripts/load_comments.py <path_to_comment_file>
python3 scripts/load_sub_fts.py <path_to_submission_file>
python3 scripts/load_comments_fts.py <path_to_comment_file>
python3 scripts/index.py [subreddit_name]
python3 scripts/unlist.py <subreddit> <true|false>

3) Start the API server.

$ cd api
$ python -m venv venv
$ source venv/bin/activate
$ pip install gunicorn
$ pip install falcon
$ pip install rq
$ pip install python-dotenv
$ pip install psycopg2-binary
$ gunicorn app

4) Start the frontend

cd ../redarc-frontend
mv sample.env .env

Set address for API server in the .env file

VITE_API_DOMAIN=http://my-api-server.com/api/


npm i
npm run dev # Dev server

5) Provision NGINX (Optional)

Edit nginx/redarc_original.conf (the file moved in the command below) with your own values

$ cd ..
$ mv nginx/redarc_original.conf /etc/nginx/conf.d/redarc.conf


cd redarc-frontend
npm run build 
cp -R dist/* /var/www/html/redarc/
systemctl restart nginx

6) Setup submission workers

Fill in .env files with your own credentials.

$ docker pull redis
$ docker run --name some-redis -d redis
$ cd redarc/ingest
$ python -m venv venv
$ source venv/bin/activate
$ pip install rq
$ pip install python-dotenv
$ pip install praw
$ pip install psycopg2-binary
$ pip install gallery-dl
$ python3 ingest/reddit_worker/reddit_worker.py
$ python3 ingest/index_worker/index_worker.py
$ python3 ingest/subreddit_worker/subreddit_worker.py
$ python3 ingest/image_downloader/image_downloader.py

Ingest data:

Postgres:

Note: Be sure the ingest and Reddit workers are disabled

Ensure python3, pip and psycopg2-binary are installed:

# Decompress dumps

$ unzstd <submission_file>.zst

$ unzstd <comment_file>.zst

$ pip install psycopg2-binary

# Change database credentials if needed

$ python3 scripts/load_sub.py <path_to_submission_file>

$ python3 scripts/load_sub_fts.py <path_to_submission_file>

$ python3 scripts/load_comments.py <path_to_comment_file>

$ python3 scripts/load_comments_fts.py <path_to_comment_file>

$ python3 scripts/index.py [subreddit_name]

# Optional
$ python3 scripts/unlist.py <subreddit> <true|false>
$ python3 scripts/backfill_images.py <subreddit> <after timestamp utc> <num urls>
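
Before running the load scripts on a large file, a quick sanity check of the decompressed dump can save time. The sketch below assumes the decompressed file is newline-delimited JSON with a "subreddit" field on each record, which is typical of pushshift dumps but is not stated above; count_by_subreddit is a hypothetical helper and the sample records are made up.

```python
import json
from collections import Counter
from typing import Iterable


def count_by_subreddit(lines: Iterable[str]) -> Counter:
    """Count records per subreddit in newline-delimited JSON, one line at a time."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        counts[record.get("subreddit", "<unknown>")] += 1
    return counts


# Made-up records standing in for a decompressed submission file.
sample = [
    '{"id": "a1", "subreddit": "python", "title": "first"}',
    '{"id": "a2", "subreddit": "rust", "title": "second"}',
    '{"id": "a3", "subreddit": "python", "title": "third"}',
]
counts = count_by_subreddit(sample)
print(counts.most_common())

# With a real file, pass the open file object so it is streamed, not loaded whole:
# with open("<path_to_submission_file>", encoding="utf-8") as fh:
#     print(count_by_subreddit(fh).most_common(10))
```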

Web:

  • Submit Reddit URL using the web form /submit to be fetched by reddit_worker
  • Add subreddits to the SUBREDDITS envar (delimited by commas) to be periodically fetched by subreddit_worker

API:

search/comments?

  • [unflatten = <True/False>]
  • [subreddit = <name>]
  • [id = <id>]
  • [before = <utc_timestamp>]
  • [after = <utc_timestamp>]
  • [parent_id = <parent_id>]
  • [link_id = <link_id>]
  • [sort = <ASC/DESC>]
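
As an illustration of how these optional parameters combine, here is a small sketch that builds a search/comments URL. It assumes the endpoint is served under REDARC_API (http://localhost/api for a local install, as configured earlier); the helper name comments_url is hypothetical.

```python
from urllib.parse import urlencode

API_BASE = "http://localhost/api"  # assumption: REDARC_API for a local install

# Parameter names taken from the search/comments list above.
ALLOWED = {"unflatten", "subreddit", "id", "before", "after",
           "parent_id", "link_id", "sort"}


def comments_url(**params) -> str:
    """Build a search/comments URL, rejecting parameters not listed above."""
    unknown = set(params) - ALLOWED
    if unknown:
        raise ValueError(f"unsupported parameters: {sorted(unknown)}")
    query = {key: value for key, value in params.items() if value is not None}
    return f"{API_BASE}/search/comments?{urlencode(query)}"


url = comments_url(subreddit="python", after=1640995200, sort="ASC")
print(url)
```

The resulting URL can be fetched with any HTTP client; the response format is not described above, so it is left out of this sketch.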

search/submissions?

  • [subreddit = <name>]
  • [id = <id>]
  • [before = <utc_timestamp>]
  • [after = <utc_timestamp>]
  • [sort = <ASC|DESC>]
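
The before and after parameters take UTC timestamps. Assuming these are Unix epoch seconds (the search endpoint below calls them "unix timestamp"), a date window can be computed like this; utc_timestamp is a hypothetical helper.

```python
from datetime import datetime, timezone


def utc_timestamp(year: int, month: int, day: int) -> int:
    """Return the Unix timestamp (seconds) for midnight UTC on the given date."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


# Window covering January 2022.
window = {
    "after": utc_timestamp(2022, 1, 1),
    "before": utc_timestamp(2022, 2, 1),
}
print(window)
```

Passing tzinfo explicitly matters: a naive datetime would be interpreted in the machine's local timezone and shift the window.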

search/subreddits

search?

  • <subreddit = <subreddit>>
  • [before = <unix timestamp>]
  • [after = <unix timestamp>]
  • [sort = <asc|desc>]
  • [query = <search phrase>]
  • <type = <comment|submission>>
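
In the list above, subreddit and type are written in angle brackets rather than square brackets, which this sketch reads as "required" (an interpretation, not something stated explicitly). The helper search_url and the local base URL are assumptions for illustration.

```python
from typing import Optional
from urllib.parse import urlencode


def search_url(api_base: str, subreddit: str, kind: str,
               query: Optional[str] = None, before: Optional[int] = None,
               after: Optional[int] = None, sort: Optional[str] = None) -> str:
    """Build a full-text search URL; subreddit and type are treated as required."""
    if not subreddit:
        raise ValueError("subreddit is required")
    if kind not in ("comment", "submission"):
        raise ValueError("type must be 'comment' or 'submission'")
    params = {"subreddit": subreddit, "type": kind, "query": query,
              "before": before, "after": after, "sort": sort}
    # Leave out optional parameters that were not supplied.
    params = {key: value for key, value in params.items() if value is not None}
    return f"{api_base}/search?{urlencode(params)}"


url = search_url("http://localhost/api", "python", "comment",
                 query="zstd dump", sort="desc")
print(url)
```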

License:

Redarc is licensed under the MIT license
