Crawling dev.to with Colly while learning Golang
Published at
11/1/2022
Categories
go
nginx
crawler
colly
Author
chieund
I'd like to share a website I made while learning Golang.
My website, http://techdaily.info, is for learning the Go language.
Besides dev.to, I also crawl other websites such as freecodecamp.com, medium.com, hashnode.com, logrocket.com, and infoq.com.
So the site is built around crawling content from other sites.
Technologies I used:
- Golang
- Colly
- Nginx
- systemd services
- Docker
- MySQL
- An actions runner that deploys to the server
- Cron jobs for scheduled crawling
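The Colly scraping code itself is not reproduced here. The sketch below covers only the bookkeeping around it. It assumes the two-step design implied by the `crawl-article` and `crawl-article-detail` commands in the crontab later in this post: one step records article URLs found on listing pages, and the other picks up URLs whose detail pages have not been fetched yet. The `Store` type and its methods are hypothetical. Its in-memory map stands in for the MySQL table, where a UNIQUE index on the URL column would do the same de-duplication.

```go
package main

import (
	"fmt"
	"sort"
)

// Store is a hypothetical in-memory stand-in for the MySQL articles table.
// Key: article URL. Value: whether its detail page has been crawled yet.
type Store struct {
	articles map[string]bool
}

func NewStore() *Store {
	return &Store{articles: make(map[string]bool)}
}

// AddLinks records URLs found on a listing page (the "crawl-article" step)
// and returns how many were new. Known URLs are skipped, much like an
// INSERT IGNORE against a UNIQUE column.
func (s *Store) AddLinks(urls []string) int {
	added := 0
	for _, u := range urls {
		if _, seen := s.articles[u]; !seen {
			s.articles[u] = false
			added++
		}
	}
	return added
}

// Pending returns up to limit URLs whose details are not crawled yet
// (the input for the "crawl-article-detail" step), sorted for stable output.
func (s *Store) Pending(limit int) []string {
	var out []string
	for u, done := range s.articles {
		if !done {
			out = append(out, u)
		}
	}
	sort.Strings(out)
	if len(out) > limit {
		out = out[:limit]
	}
	return out
}

// MarkDone flags an article once its detail page has been saved.
func (s *Store) MarkDone(url string) {
	if _, ok := s.articles[url]; ok {
		s.articles[url] = true
	}
}

func main() {
	s := NewStore()
	fmt.Println(s.AddLinks([]string{"https://dev.to/a", "https://dev.to/b"}))
	fmt.Println(s.AddLinks([]string{"https://dev.to/b", "https://dev.to/c"}))
	for _, u := range s.Pending(2) {
		s.MarkDone(u)
	}
	fmt.Println(s.Pending(10))
}
```

Running it prints `2`, then `1` (the duplicate `/b` is ignored), then `[https://dev.to/c]`. The limit on `Pending` lets each detail run handle a bounded batch.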
Build and Run Locally
Copy app_example.yaml to app.yaml:
cp app_example.yaml app.yaml
Build with Docker:
docker-compose up --build
Install the Go packages:
docker-compose exec crawl go mod tidy
Create the vendor folder:
docker-compose exec crawl go mod vendor
Run the crawler:
docker-compose exec crawl go run cmd/main.go
Use air for live reloading:
docker-compose exec crawl air -c .air.conf
Deploy
Run the Makefile targets to build the project into the bin folder:
make copy_template build_app_web build_app_crawl
Create services to run in the background
Create the service for the web app and run it:
sudo nano /lib/systemd/system/app_web.service
Paste in this content:
[Unit]
Description=App Web
[Service]
Type=simple
Restart=always
RestartSec=5s
WorkingDirectory=/root/actions-runner/crawl/crawl/crawl/bin
ExecStart=/root/actions-runner/crawl/crawl/crawl/bin/app_web
[Install]
WantedBy=multi-user.target
Enable, start, and check the service:
sudo systemctl enable app_web
sudo systemctl start app_web
sudo systemctl status app_web
Run the crawler app
./app_crawl
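The crontab entries in this post call the binary with a subcommand, either `crawl-article` or `crawl-article-detail`. Below is a minimal sketch of how such a binary might dispatch on `os.Args`. The handler bodies are hypothetical placeholders for the Colly code. In this sketch, running it with no subcommand prints a usage error and exits with status 1, which may differ from the real `app_crawl`.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// Hypothetical placeholders for the real Colly-based crawl steps.
func crawlArticles() error {
	fmt.Println("collecting article links from listing pages")
	return nil
}

func crawlArticleDetails() error {
	fmt.Println("fetching details for pending articles")
	return nil
}

// commands maps each subcommand name (as used in the crontab) to its handler.
var commands = map[string]func() error{
	"crawl-article":        crawlArticles,
	"crawl-article-detail": crawlArticleDetails,
}

// run dispatches args (without the program name). A missing or unknown
// subcommand is an error, so a bad cron entry exits with a non-zero status.
func run(args []string) error {
	if len(args) < 1 {
		return errors.New("usage: app_crawl <crawl-article|crawl-article-detail>")
	}
	cmd, ok := commands[args[0]]
	if !ok {
		return fmt.Errorf("unknown command %q", args[0])
	}
	return cmd()
}

func main() {
	if err := run(os.Args[1:]); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Keeping both steps in one binary means a single `make` target and a single deploy, with cron choosing which step runs.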
Add a crontab entry
crontab -e
Add the schedules (crawl-article runs once an hour, crawl-article-detail every 20 minutes):
*/60 * * * * /root/actions-runner/crawl/crawl/crawl/bin/app_crawl crawl-article
*/20 * * * * /root/actions-runner/crawl/crawl/crawl/bin/app_crawl crawl-article-detail
Reload cron:
sudo service cron reload
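In the minute field, `*/N` matches every Nth minute starting at 0, within the range 0 to 59. So `*/60` matches only minute 0 (once an hour), and `*/20` matches minutes 0, 20 and 40. The hypothetical helper below expands a `*/N` field to make this concrete. It handles only that one form, not full cron syntax.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// expandStep lists the values a cron "*/N" field matches over the inclusive
// range [lo, hi]. Other field forms (lists, ranges, names) are rejected.
func expandStep(field string, lo, hi int) ([]int, error) {
	if !strings.HasPrefix(field, "*/") {
		return nil, fmt.Errorf("unsupported field %q", field)
	}
	step, err := strconv.Atoi(strings.TrimPrefix(field, "*/"))
	if err != nil || step <= 0 {
		return nil, fmt.Errorf("invalid step in %q", field)
	}
	var out []int
	// Start at the bottom of the range and add the step until we pass the top.
	for v := lo; v <= hi; v += step {
		out = append(out, v)
	}
	return out, nil
}

func main() {
	for _, f := range []string{"*/60", "*/20"} {
		mins, err := expandStep(f, 0, 59)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%s -> minutes %v\n", f, mins)
	}
}
```

Running it prints `*/60 -> minutes [0]` and `*/20 -> minutes [0 20 40]`. If hourly runs are the intent, `0 * * * *` expresses that more directly.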