
How to Scrape wss Data with a Python Crawler

Published at: 5/6/2023
Categories: python, crawler, websocket
Author: dragon72463399

Target site

https://dexscreener.com

Target data

List-page data

Problems

  • Many posts online recommend the aiowebsocket library, but in my tests it simply could not connect to this site. It cost me a full day of debugging: at first I assumed I was failing the site's anti-bot checks, but the real problem was the library itself. The project has not been updated in four years and is effectively abandoned, so I don't recommend it (a sketch of a maintained alternative follows the snippet below).
# from aiowebsocket.converses import AioWebSocket
# https://github.com/asyncins/aiowebsocket
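If you specifically want an asyncio client, the actively maintained websockets package covers what aiowebsocket promised. A minimal sketch, assuming a recent release of websockets (the custom-header argument is named extra_headers up to version 13 and additional_headers from version 14 on, so check your installed version; the library also generates a spec-compliant Sec-WebSocket-Key for you):

import asyncio
import websockets  # pip install websockets -- actively maintained, unlike aiowebsocket

async def fetch_page(url: str) -> str:
    # extra_headers carries the User-Agent the site checks
    # (rename to additional_headers on websockets >= 14)
    async with websockets.connect(
        url,
        extra_headers={
            "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
                          "AppleWebKit/537.36 (KHTML, like Gecko) "
                          "Chrome/112.0.0.0 Safari/537.36"
        },
    ) as ws:
        await ws.send('pong')
        return await ws.recv()

# asyncio.run(fetch_page(url))  # url built from the wss endpoint shown later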
  • The Sec-WebSocket-Key handshake parameter: it is randomly generated, but it must follow a fixed rule (16 random bytes, Base64-encoded). Generate it according to the rule and the connection succeeds; otherwise the handshake fails. Code examples:
function generateWebSocketKey() {
  // Generate 16 random bytes
  const buffer = new Uint8Array(16);
  window.crypto.getRandomValues(buffer);

  // Base64-encode the random bytes
  const key = btoa(String.fromCharCode.apply(null, buffer));

  return key;
}
import os
import base64

def generate_websocket_key():
    # Generate 16 random bytes
    buffer = os.urandom(16)

    # Base64-encode the random bytes
    key = base64.b64encode(buffer).decode('utf-8')

    return key

print(generate_websocket_key())
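Why this exact rule? RFC 6455 requires the key to be the Base64 encoding of exactly 16 random bytes; the server proves it understood the handshake by replying with Sec-WebSocket-Accept = base64(sha1(key + a fixed GUID)), and compliant servers reject keys that don't decode to 16 bytes. A sketch of that server-side computation, handy for sanity-checking a generated key (generate_websocket_key is the function defined just above):

import base64
import hashlib

# Fixed GUID defined in RFC 6455
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

def expected_accept(key: str) -> str:
    # The Sec-WebSocket-Accept value the server must echo back for `key`
    digest = hashlib.sha1((key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")

print(expected_accept(generate_websocket_key()))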
  • The headers are checked: the server validates the User-Agent, so you need to send a real browser UA.

Solution code example

import base64
import json
import os
import ssl
import time

import websocket  # pip install websocket-client
from loguru import logger  # the snippet assumes a `logger` object; any logging setup works


class ListPage:
    def __init__(self):
        self.base_url = 'wss://io.dexscreener.com/dex/screener/pairs/h24/{}?rankBy[key]=txns&rankBy[order]=desc&filters[liquidity][min]=100000&filters[marketCap][min]=200000&filters[txns][h24][min]=100'

    def generate_websocket_key(self):
        # Generate 16 random bytes
        buffer = os.urandom(16)
        # Base64-encode the random bytes
        key = base64.b64encode(buffer).decode('utf-8')
        return key

    def open_connection(self, num):
        """
        Open the connection.
        """
        header = {
            # Uniquely identifies the client for anti-bot checks -- think of it
            # as playing the role of a MAC address. The value is randomly
            # generated, but it must follow a fixed format; a random value
            # outside that format is rejected and the connection cannot be
            # established.
            "Sec-WebSocket-Key": self.generate_websocket_key(),
            # Same purpose as in HTTP requests: identifies the client role; mainly anti-bot
            "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36"
        }

        ws = websocket.WebSocket(sslopt={"cert_reqs": ssl.CERT_NONE})
        # remote = 'wss://io.dexscreener.com/dex/screener/pairs/h24/4?rankBy[key]=txns&rankBy[order]=desc&filters[liquidity][min]=100000&filters[marketCap][min]=200000&filters[txns][h24][min]=100'
        url = self.base_url.format(num)
        logger.info(f'request {num}: {url}')
        # The headers must be sent: many servers reject the Sec-WebSocket-Key
        # that Python WebSocket libraries generate automatically.
        ws.connect(url, header=header)
        return ws

    def get_data(self, num):
        """
        Send the request and fetch the data.
        """
        retry_num = 0
        while True:
            retry_num += 1
            if retry_num > 3:
                logger.error('Retry limit exceeded!')
                return
            try:
                ws = self.open_connection(num)
                # In testing the connection didn't seem to expire (~20 hours);
                # this send can still raise, hence the retry loop.
                ws.send('pong')
                break
            except Exception:
                logger.warning('wss communication failed, retrying...')
                time.sleep(1)
        response = ws.recv()
        logger.info(f'response: {num}')
        data = json.loads(response)
        return data
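A minimal usage sketch; the page number 1 and the top-level 'pairs' key are assumptions about the endpoint, inferred from the URL template above:

if __name__ == '__main__':
    page = ListPage()
    data = page.get_data(1)  # page number is an assumed example value
    if data:
        # 'pairs' is an assumed key in the returned JSON payload
        print(len(data.get('pairs', [])))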