How to scrape wss data with a Python crawler
Published at: 5/6/2023
Categories: python, crawler, websocket
Author: dragon72463399
Target site
https://dexscreener.com
Target data
List page data
Problems
- Many posts online recommend the aiowebsocket library, but in my tests it simply did not work. It cost me a full day of debugging; at first I assumed I just had not passed the anti-bot checks, but the problem was actually the library itself. The project has not been updated in four years and is effectively abandoned, so I do not recommend it (a minimal sketch of a working alternative follows after the import lines below).
# from aiowebsocket.converses import AioWebSocket
# https://github.com/asyncins/aiowebsocket
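A minimal sketch of the working alternative, using the actively maintained websocket-client package instead of aiowebsocket (the echo server URL below is only an illustrative placeholder, not the target site):
# Minimal sketch with the websocket-client package (pip install websocket-client).
# The echo server URL is only a placeholder for illustration.
import websocket

ws = websocket.WebSocket()
ws.connect('wss://echo.websocket.events')
ws.send('ping')
print(ws.recv())  # read one frame back from the server
ws.close()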
- The Sec-WebSocket-Key parameter: it is generated randomly, but must follow a fixed rule (16 random bytes, Base64-encoded). Generate it according to the rule and the connection succeeds; stray from the rule and the connection fails. Code examples:
function generateWebSocketKey() {
    // Generate a 16-byte random value
    const buffer = new Uint8Array(16);
    window.crypto.getRandomValues(buffer);
    // Base64-encode the random value
    const key = btoa(String.fromCharCode.apply(null, buffer));
    return key;
}
import os
import base64

def generate_websocket_key():
    # Generate a 16-byte random value
    buffer = os.urandom(16)
    # Base64-encode the random value
    key = base64.b64encode(buffer).decode('utf-8')
    return key

print(generate_websocket_key())
- The headers are validated: the server checks the User-Agent.
Solution code example:
import json
import os
import ssl
import time
import base64
import logging

import websocket  # pip install websocket-client

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


class ListPage:
    def __init__(self):
        self.base_url = 'wss://io.dexscreener.com/dex/screener/pairs/h24/{}?rankBy[key]=txns&rankBy[order]=desc&filters[liquidity][min]=100000&filters[marketCap][min]=200000&filters[txns][h24][min]=100'

    def generate_websocket_key(self):
        # Generate a 16-byte random value
        buffer = os.urandom(16)
        # Base64-encode the random value
        key = base64.b64encode(buffer).decode('utf-8')
        return key

    def open_connection(self, num):
        """
        Open the connection
        """
        header = {
            # Used for client uniqueness and anti-bot checks, similar in spirit to a MAC address.
            # The value is random but must follow the spec (16 random bytes, Base64-encoded);
            # a random value outside the spec is rejected and the connection cannot be established.
            "Sec-WebSocket-Key": self.generate_websocket_key(),
            # Same purpose as in an HTTP request: identifies the client role; mainly anti-bot.
            "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36"
        }
        ws = websocket.WebSocket(sslopt={"cert_reqs": ssl.CERT_NONE})
        # remote = 'wss://io.dexscreener.com/dex/screener/pairs/h24/4??rankBy[key]=txns&rankBy[order]=desc&filters[liquidity][min]=100000&filters[marketCap][min]=200000&filters[txns][h24][min]=100'
        url = self.base_url.format(num)
        logger.info(f'request {num}: {url}')
        # The headers must be sent explicitly: many servers reject the Sec-WebSocket-Key
        # that Python wss clients generate automatically.
        ws.connect(url, header=header)
        return ws

    def get_data(self, num):
        """
        Send the request and fetch the data
        """
        retry_num = 0
        while True:
            retry_num += 1
            if retry_num > 3:
                logger.error('Too many retries!')
                return
            try:
                ws = self.open_connection(num)
                # message = input("Enter message: ")
                # Tested: the connection does not seem to expire (roughly 20 hours). This line may raise.
                ws.send('pong')
                break
            except Exception:
                logger.warning('wss communication failed, retrying...')
                time.sleep(1)
        response = ws.recv()
        logger.info(f'response: {num}')
        data = json.loads(response)
        return data
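A minimal usage sketch under the assumption that the number passed in selects which list page to fetch (the value 4 below is only illustrative; the payload structure is not documented here, so inspect it before parsing further):
if __name__ == '__main__':
    page = ListPage()
    # The page number 4 is only an illustrative value.
    data = page.get_data(4)
    # Peek at the decoded payload before writing any parsing logic.
    print(type(data), str(data)[:200])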