Send a From Header When You Crawl

Published at
11/23/2024
Categories
crawling
bestpractices
http
Author
calebhearth

Sending a From header is part of building a polite crawler, along with respecting robots.txt and sending a unique User-Agent. The From header simply contains an email address that the site’s owner can use to reach you if your bot is causing problems for them.
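As a rough illustration of the robots.txt part of that checklist, here is a minimal sketch using Python's standard urllib.robotparser. The bot name example-bot, the sample rules, and the is_allowed helper are all hypothetical, made up for this example; a real crawler would fetch the site's actual robots.txt first.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical bot name and robots.txt body, used only for illustration.
USER_AGENT = "example-bot"
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("https://example.com/posts/1", "https://example.com/private/notes"):
        print(url, is_allowed(SAMPLE_ROBOTS_TXT, USER_AGENT, url))
```

Checking each URL against the parsed rules before requesting it keeps the crawler out of paths the site owner has asked bots to avoid.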

RFC 9110, which describes HTTP semantics, says that a robotic user agent SHOULD send a From header. MDN says that the From header “must” be sent in that circumstance, but doesn’t cite a spec that defines that requirement.

I’ve been sending From headers for a while now, and I’m even including it as a requirement in a type-driven Swift API client I’ve been using across a few projects. I’m probably sending it in more places than I need to, or than would be expected, but since I use scraping and API requests to automate my /now page, it seems prudent in case that causes issues for the services I’m using.
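The idea of making the header a requirement at the type level can be sketched loosely in Python. This is not the Swift client mentioned above, only an analogue under assumed names: CrawlerIdentity, its fields, and the headers method are hypothetical. The point is that the identity cannot be constructed without a contact address, so request code that depends on it cannot forget the From header.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CrawlerIdentity:
    """Identity a client must be given before it can build any request headers."""

    user_agent: str
    contact_email: str

    def __post_init__(self) -> None:
        # Deliberately loose check: just enough to catch an empty or obviously wrong value.
        local, sep, domain = self.contact_email.partition("@")
        if not (local and sep and domain):
            raise ValueError("contact_email must look like an email address")

    def headers(self) -> dict[str, str]:
        """Headers to attach to every outgoing request."""
        return {"User-Agent": self.user_agent, "From": self.contact_email}


if __name__ == "__main__":
    # Placeholder values; substitute your own bot name and a monitored mailbox.
    identity = CrawlerIdentity("example-bot/1.0", "bot-owner@example.com")
    print(identity.headers())
```

Omitting contact_email raises a TypeError at construction time, and an empty or malformed value raises a ValueError, so the mistake surfaces before any request is sent.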

Setting a From header is easy: its syntax is From: <email>. If you’re building any kind of scraper…
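For a concrete picture, here is a minimal sketch of attaching the header with Python's standard urllib.request. The build_request helper, the URL, the bot name, and the email address are placeholders invented for this example, and nothing is sent over the network until the request is passed to urlopen.

```python
from urllib.request import Request


def build_request(url: str, contact_email: str, user_agent: str) -> Request:
    """Build a GET request that identifies the bot and how to reach its operator."""
    return Request(
        url,
        headers={
            "From": contact_email,     # address the site owner can write to
            "User-Agent": user_agent,  # unique name for this crawler
        },
    )


if __name__ == "__main__":
    # Placeholder values for illustration only.
    request = build_request(
        "https://example.com/",
        "bot-owner@example.com",
        "example-bot/1.0",
    )
    print(request.get_header("From"))
```

To actually send it, pass the request to urllib.request.urlopen, ideally after confirming that robots.txt allows the URL.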
