
How to Scrape wss Data with a Python Crawler

Published at: 5/6/2023
Categories: python, crawler, websocket
Author: dragon72463399

Target site

https://dexscreener.com

Target data

List-page data

Problems

  • Many posts online recommend the aiowebsocket library, but in my tests it simply could not connect to this site. It cost me a full day of debugging: at first I assumed I was failing the site's anti-bot checks, but the real problem was the library itself. The project has not been updated in four years and is effectively abandoned, so I don't recommend it (a sketch of a maintained alternative follows the snippet below).
# from aiowebsocket.converses import AioWebSocket
# https://github.com/asyncins/aiowebsocket
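If you specifically want an asyncio client, the actively maintained websockets package covers what aiowebsocket promised. A minimal sketch, assuming a recent release of websockets (the custom-header argument is named extra_headers up to version 13 and additional_headers from version 14 on, so check your installed version; the library also generates a spec-compliant Sec-WebSocket-Key for you):

import asyncio
import websockets  # pip install websockets -- actively maintained, unlike aiowebsocket

async def fetch_page(url: str) -> str:
    # extra_headers carries the User-Agent the site checks
    # (rename to additional_headers on websockets >= 14)
    async with websockets.connect(
        url,
        extra_headers={
            "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
                          "AppleWebKit/537.36 (KHTML, like Gecko) "
                          "Chrome/112.0.0.0 Safari/537.36"
        },
    ) as ws:
        await ws.send('pong')
        return await ws.recv()

# asyncio.run(fetch_page(url))  # url built from the wss endpoint shown later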
  • The Sec-WebSocket-Key handshake parameter: it is randomly generated, but it must follow a fixed rule (16 random bytes, Base64-encoded). Generate it according to the rule and the connection succeeds; otherwise the handshake fails. Code examples:
function generateWebSocketKey() {
  // Generate 16 random bytes
  const buffer = new Uint8Array(16);
  window.crypto.getRandomValues(buffer);

  // Base64-encode the random bytes
  const key = btoa(String.fromCharCode.apply(null, buffer));

  return key;
}
import os
import base64

def generate_websocket_key():
    # Generate 16 random bytes
    buffer = os.urandom(16)

    # Base64-encode the random bytes
    key = base64.b64encode(buffer).decode('utf-8')

    return key

print(generate_websocket_key())
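Why this exact rule? RFC 6455 requires the key to be the Base64 encoding of exactly 16 random bytes; the server proves it understood the handshake by replying with Sec-WebSocket-Accept = base64(sha1(key + a fixed GUID)), and compliant servers reject keys that don't decode to 16 bytes. A sketch of that server-side computation, handy for sanity-checking a generated key (generate_websocket_key is the function defined just above):

import base64
import hashlib

# Fixed GUID defined in RFC 6455
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

def expected_accept(key: str) -> str:
    # The Sec-WebSocket-Accept value the server must echo back for `key`
    digest = hashlib.sha1((key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")

print(expected_accept(generate_websocket_key()))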
  • The headers are checked: the server validates the User-Agent, so you need to send a real browser UA.

Solution code example

import base64
import json
import os
import ssl
import time

import websocket  # pip install websocket-client
from loguru import logger  # the snippet assumes a `logger` object; any logging setup works


class ListPage:
    def __init__(self):
        self.base_url = 'wss://io.dexscreener.com/dex/screener/pairs/h24/{}?rankBy[key]=txns&rankBy[order]=desc&filters[liquidity][min]=100000&filters[marketCap][min]=200000&filters[txns][h24][min]=100'

    def generate_websocket_key(self):
        # Generate 16 random bytes
        buffer = os.urandom(16)
        # Base64-encode the random bytes
        key = base64.b64encode(buffer).decode('utf-8')
        return key

    def open_connection(self, num):
        """
        Open the connection.
        """
        header = {
            # Uniquely identifies the client for anti-bot checks -- think of it
            # as playing the role of a MAC address. The value is randomly
            # generated, but it must follow a fixed format; a random value
            # outside that format is rejected and the connection cannot be
            # established.
            "Sec-WebSocket-Key": self.generate_websocket_key(),
            # Same purpose as in HTTP requests: identifies the client role; mainly anti-bot
            "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36"
        }

        ws = websocket.WebSocket(sslopt={"cert_reqs": ssl.CERT_NONE})
        # remote = 'wss://io.dexscreener.com/dex/screener/pairs/h24/4?rankBy[key]=txns&rankBy[order]=desc&filters[liquidity][min]=100000&filters[marketCap][min]=200000&filters[txns][h24][min]=100'
        url = self.base_url.format(num)
        logger.info(f'request {num}: {url}')
        # The headers must be sent: many servers reject the Sec-WebSocket-Key
        # that Python WebSocket libraries generate automatically.
        ws.connect(url, header=header)
        return ws

    def get_data(self, num):
        """
        Send the request and fetch the data.
        """
        retry_num = 0
        while True:
            retry_num += 1
            if retry_num > 3:
                logger.error('Retry limit exceeded!')
                return
            try:
                ws = self.open_connection(num)
                # In testing the connection didn't seem to expire (~20 hours);
                # this send can still raise, hence the retry loop.
                ws.send('pong')
                break
            except Exception:
                logger.warning('wss communication failed, retrying...')
                time.sleep(1)
        response = ws.recv()
        logger.info(f'response: {num}')
        data = json.loads(response)
        return data
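A minimal usage sketch; the page number 1 and the top-level 'pairs' key are assumptions about the endpoint, inferred from the URL template above:

if __name__ == '__main__':
    page = ListPage()
    data = page.get_data(1)  # page number is an assumed example value
    if data:
        # 'pairs' is an assumed key in the returned JSON payload
        print(len(data.get('pairs', [])))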