
Using selenium to read web pages that require a member login

Published at: 12/5/2024
Categories: selenium, python
Author: codemee

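Writing a crawler with selenium to read web page content is a common need, but you may run into pages that require you to log in before their content can be scraped. Restricted (adults-only) books on 博客來 (books.com.tw) are one example: instead of the book page, the site shows a dialog that stops you until you log in.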

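Because the site uses cookies to record that you are logged in, we can read such pages by logging in manually once and exporting the cookies to a file. Later, when selenium crawls the page, we add the exported cookies back and refresh the page, and the content can then be read normally.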

Exporting the cookies

First, use selenium to open the page you want to read. This article uses 博客來 as the example:

>>> from selenium import webdriver
>>> driver = webdriver.Edge()
>>> driver.get('https://www.books.com.tw')

Now log in to your member account the way you normally would, then install the Cookie-Editor browser extension.

Remember to switch back to the 博客來 home page, then use the extension to export all cookies in JSON format.

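The extension copies the cookie data to the clipboard. Paste it into a text editor and save it to a file yourself (the examples below assume the file is named cookies.json).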
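Before moving on, it can help to confirm that the saved file is valid JSON and to see which domains and sameSite values it contains, since both matter when the cookies are added back later. The following is only a minimal sketch: `summarize_cookies` is a hypothetical helper name, and it assumes the file is a JSON array of cookie objects with `domain` and `sameSite` keys, as in the export shown later in this article.

```python
import json
from collections import Counter


def summarize_cookies(path):
    """Return (domain counts, sameSite value counts) for an exported cookie file."""
    with open(path, encoding="utf-8") as f:
        cookies = json.load(f)  # expected: a JSON array of cookie objects
    # Count how many cookies belong to each domain.
    domains = Counter(c.get("domain") for c in cookies)
    # Count each sameSite value; a JSON null shows up as Python None.
    same_site = Counter(c.get("sameSite") for c in cookies)
    return domains, same_site
```

Calling `summarize_cookies('cookies.json')` and printing the two counters gives a quick overview of what was exported.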

Adding the cookies back

Next, close selenium and start it again:

>>> from selenium import webdriver
>>> driver = webdriver.Edge()
>>> driver.get('https://www.books.com.tw')

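A cookie can only be added while the browser is on a page in the same domain as the cookie, so remember to open the 博客來 site first. Then open the file where the cookies were saved and load it into Python as a list of dictionaries: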

>>> import json
>>> with open('cookies.json') as f:
...     cookies = json.load(f)

Then add the cookies back one by one:

>>> for cookie in cookies:
...     driver.add_cookie(cookie)

At this point you should see the following error:

Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
  File "C:\Users\meebo\code\python\poetry_env\py310\.venv\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 670, in add_cookie
    assert cookie_dict["sameSite"] in ["Strict", "Lax", "None"]
AssertionError

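This happens because, in the data exported by Cookie-Editor, the sameSite attribute is written as null or "no_restriction", both meaning the cookie is not restricted to the same site. selenium, however, only recognizes the three values "Strict", "Lax", and "None", so the assertion fails. We have to edit the JSON file by hand and change every sameSite value to "None" (note that this is a string, not null). Also check for any cookie whose domain is not ".books.com.tw" and delete it. Before editing, the exported file looks like this: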

[
    {
        "domain": ".books.com.tw",
        "expirationDate": 1767941747.633402,
        "hostOnly": false,
        "httpOnly": false,
        "name": "_ga_TR763QQ559",
        "path": "/",
        "sameSite": null,
        "secure": false,
        "session": false,
        "storeId": null,
        "value": "GS1.1.1733381542.1.1.1733381747.0.0.0"
    },
    ...
    {
        "domain": ".books.com.tw",
        "expirationDate": 1748933733,
        "hostOnly": false,
        "httpOnly": false,
        "name": "__eoi",
        "path": "/",
        "sameSite": "no_restriction",
        "secure": true,
        "session": false,
        "storeId": null,
        "value": "ID=7f42c4647467b5fb:T=1733381733:RT=1733381733:S=AA-AfjbpJCe1kw2klEX0xW55n9CY"
    },
    ...
]

After making these changes, reload the file and add the cookies again; this time no error is raised.
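The same cleanup could also be done in Python instead of editing the file by hand. The sketch below is an illustration, not part of the original procedure: `normalize_cookies` and `ACCEPTED` are hypothetical names, and keeping a lowercase "lax" or "strict" as "Lax" or "Strict" is an assumption about what the exporter might write. Every other value, including null and "no_restriction", becomes the string "None" as described above.

```python
# The three sameSite values that selenium's add_cookie accepts.
ACCEPTED = ("Strict", "Lax", "None")


def normalize_cookies(cookies, domain=".books.com.tw"):
    """Keep cookies for `domain` and coerce sameSite to an accepted string."""
    cleaned = []
    for cookie in cookies:
        if cookie.get("domain") != domain:
            continue  # drop cookies that belong to other domains
        fixed = dict(cookie)  # copy so the caller's data is left untouched
        value = fixed.get("sameSite")
        # Assumption: the export may contain lowercase "lax" / "strict".
        candidate = value.capitalize() if isinstance(value, str) else None
        # Anything not recognized (null, "no_restriction", ...) becomes "None".
        fixed["sameSite"] = candidate if candidate in ACCEPTED else "None"
        cleaned.append(fixed)
    return cleaned
```

With the list loaded by `json.load`, the loop would then iterate over `normalize_cookies(cookies)` instead of `cookies`.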

Refreshing the page so the cookies take effect

After the cookies are added, the page you see still looks like you are not logged in.

You have to refresh the page for the cookies to take effect:

>>> driver.refresh()

The page now shows the logged-in member view.

With that, selenium can read pages that require a member login.
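Put together, the steps above can be sketched as one function. This is a rough outline rather than a definitive implementation: `restore_session` is a hypothetical name, and it assumes the cookie file has already been corrected as described earlier. It relies only on the `get`, `add_cookie`, and `refresh` methods of a selenium driver, which are the ones used in this article.

```python
import json


def restore_session(driver, url, cookie_path):
    """Open `url`, add the saved cookies, then refresh so they take effect."""
    # Cookies can only be added while on a page of their own domain.
    driver.get(url)
    with open(cookie_path, encoding="utf-8") as f:
        cookies = json.load(f)  # a list of cookie dictionaries
    for cookie in cookies:
        driver.add_cookie(cookie)
    # Reload so the site sees the restored cookies.
    driver.refresh()
    return len(cookies)  # number of cookies that were added
```

With a live driver created by `webdriver.Edge()`, this would be called as `restore_session(driver, 'https://www.books.com.tw', 'cookies.json')`.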

One last reminder: cookies expire. If the saved cookies no longer log you in after a while, repeat the steps above to export a fresh set.
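One way to tell whether a saved file is stale is to compare each cookie's expirationDate with the current time. The sketch below assumes, based on the export shown earlier, that expirationDate is a Unix timestamp in seconds and that cookies without one are session cookies. `expired_cookie_names` is a hypothetical helper name. A site may also invalidate a login before the recorded expiry, so an empty result does not guarantee the cookies still work.

```python
import time


def expired_cookie_names(cookies, now=None):
    """Return the names of cookies whose expirationDate is in the past."""
    # Use the current time unless the caller supplies a timestamp.
    now = time.time() if now is None else now
    return [
        c["name"]
        for c in cookies
        # Cookies without an expirationDate are skipped (treated as session cookies).
        if c.get("expirationDate") is not None and c["expirationDate"] < now
    ]
```

If the returned list contains the cookies the site uses for login, that is a hint that it is time to export them again.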
