dev-resources.site
What happens when you type a URL into your browser?
In my experience, if you are a frontend dev being interviewed by a backend dev, which often happens when you plan to join a small company, there is quite a high chance that you'll have to answer these two questions:
- what happens when you type a URL into your browser
- describe security attacks that you know.
Not so long ago I had to refresh my notes on the first one, and I would like to share them with you. So,
What happens when you type a URL into your browser?
I'll omit the basics that people usually remember and focus only on the things that I usually have to re-read.
TOC
- Browser looks up IP address for the domain
- User initiates communication (Application layer)
- Data encapsulation begins (Transport layer)
- Data is sent (Network layer)
- Link layer, Physical layer
- Data arrives at the server
- Extra details you might want to know
- The links I used
Browser looks up IP address for the domain
Each device on the Internet has a unique IP address. An IPv4 address consists of four numbers separated by dots, e.g. 203.0.113.0. When you're loading a webpage from a domain your browser has never visited before, your browser may need to make a DNS request to resolve the IP address associated with the domain name. A DNS request sent through the Internet typically uses this stack of protocols: DNS (application layer) on top of UDP (transport layer) on top of IP (network layer).
Then your browser will make an HTTPS request to fetch the webpage. An HTTPS request sent through the Internet typically uses this protocol stack: HTTP secured by TLS, on top of TCP (transport layer), on top of IP (network layer).
Nobody wants to do DNS requests too often, so different kinds of cache exist: browser cache, OS cache, router cache, ISP (Internet Service Provider) cache. If the browser cannot find the IP address at any of those cache layers, the DNS server on your corporate network or at your ISP does a recursive DNS lookup. A recursive DNS lookup asks multiple DNS servers around the Internet, which in turn ask more DNS servers for the DNS record until it is found.
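The lookup order described above can be sketched roughly like this. It is only an illustration: the `resolve` function, the cache contents, and the `fake_recursive_lookup` stand-in are hypothetical, and record expiry (TTL) is left out.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical cache layers, checked from nearest to farthest.
CacheLayer = Tuple[str, Dict[str, str]]


def resolve(domain: str,
            layers: List[CacheLayer],
            recursive_lookup: Callable[[str], str]) -> Tuple[str, str]:
    """Return (ip, source), trying each cache before the recursive lookup."""
    for name, cache in layers:
        if domain in cache:
            return cache[domain], name
    # No cache had the answer: fall back to the slow recursive lookup.
    ip = recursive_lookup(domain)
    # Remember the answer in every layer so the next lookup is fast.
    for _, cache in layers:
        cache[domain] = ip
    return ip, "recursive lookup"


layers: List[CacheLayer] = [
    ("browser cache", {}),
    ("OS cache", {}),
    ("router cache", {}),
    ("ISP cache", {"example.com": "203.0.113.0"}),
]


def fake_recursive_lookup(domain: str) -> str:
    # Stand-in for asking DNS servers around the Internet.
    return "203.0.113.7"
```

Calling `resolve("example.org", layers, fake_recursive_lookup)` twice shows the point of caching: the first call falls through to the recursive lookup, the second is answered by the browser cache.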
User initiates communication (Application layer)
A user on one host sends a message or issues a command that must access a remote host. The application protocol associated with the command or message formats the packet so that it can be handled by the appropriate transport layer protocol, TCP or UDP. Besides HTTPS, there are thousands of other application-layer protocols: SMTP, IMAP, and POP3 for email; XMPP, IRC, ICQ for chat; Telnet, SSH, RDP for remote administration; etc.
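To make "the application protocol formats the message" a bit more concrete, here is a minimal sketch of what a browser-like client could hand to the transport layer for a plain HTTP/1.1 GET. The helper name `build_http_get` is made up for this example, and real browsers send many more headers.

```python
def build_http_get(host: str, path: str = "/") -> bytes:
    """Format a minimal HTTP/1.1 GET request as bytes for the transport layer."""
    lines = [
        f"GET {path} HTTP/1.1",   # request line: method, path, protocol version
        f"Host: {host}",          # required header in HTTP/1.1
        "Connection: close",      # ask the server to close the connection afterwards
        "",                       # empty line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


request = build_http_get("example.com")
```

The result is just text with `\r\n` line endings; everything below the application layer treats it as opaque bytes.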
When the data contains private information, it needs to be transported securely from the sender to the destination. The Transport Layer Security (TLS) protocol uses algorithms to encrypt the data, while certificate authorities help users trust the encryption.
When a user connects to a website, the server sends over its TLS certificate (often still called an SSL certificate), which contains the public key necessary to start the secure session. The two computers, the client and the server, then go through a process called an SSL/TLS handshake.
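As a small illustration of the client side, Python's standard `ssl` module can show what a default TLS client insists on before the handshake succeeds. This is only a sketch of the defaults and makes no network connection; the variable names are mine.

```python
import ssl

# Default client-side settings from the standard library.
context = ssl.create_default_context()

# The server must present a certificate that chains to a trusted CA...
certificate_required = context.verify_mode == ssl.CERT_REQUIRED
# ...and the name in that certificate must match the host we asked for.
hostname_checked = context.check_hostname
```

With a real connection, `context.wrap_socket(sock, server_hostname="example.com")` is the call that performs the handshake on an already connected TCP socket.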
Many website hosting providers and other services will offer TLS/SSL certificates for a fee. These certificates will often be shared amongst many customers. More expensive certificates are available which can be individually registered to particular web properties.
Data encapsulation begins (Transport layer)
Data needs to be broken up into small packets, which are then reassembled at the destination. Two main transport protocols are TCP and UDP.
If TCP determines that IP packets are lost, duplicated, or out of sequence, it will request retransmission of the missing data, discard duplicates, and put out-of-order data back in sequence. UDP offers no guarantees of delivery, ordering, or duplicate protection. Its lack of retransmission delays makes it suitable for real-time applications such as Voice over IP (VoIP), online games, live video streaming, and video conferencing. All of these require transferring data as fast as possible, even if it results in a glitch or two. Because TCP prioritizes accuracy over timeliness, it is common for Internet essentials like sending emails, sharing files, text messaging and accessing web pages.
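Here is a toy sketch of the reordering and de-duplication part of that job. It is not how a real TCP stack is written: the function names are hypothetical, the sequence number is simply the byte offset of each chunk, and lost segments and retransmission are not modelled.

```python
import random
from typing import List, Tuple

Segment = Tuple[int, bytes]  # (sequence number, payload)


def split_into_segments(data: bytes, size: int) -> List[Segment]:
    """Cut data into chunks; the sequence number is the chunk's byte offset."""
    return [(offset, data[offset:offset + size])
            for offset in range(0, len(data), size)]


def reassemble(segments: List[Segment]) -> bytes:
    """Drop duplicates, restore the order by sequence number, join the payloads."""
    unique = dict(segments)  # a repeated sequence number is kept only once
    return b"".join(unique[seq] for seq in sorted(unique))


message = b"What happens when you type a URL into your browser?"
segments = split_into_segments(message, size=8)

# Simulate an unreliable network: one duplicate, arbitrary arrival order.
received = segments + [segments[2]]
random.Random(42).shuffle(received)
```

`reassemble(received)` gives back the original `message` even though the segments arrived shuffled and one of them arrived twice.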
Both TCP and UDP divide the data received from the application layer into segments and attach a header to each segment. Both TCP and UDP headers contain the sending and receiving ports and a checksum. The checksum is used to determine whether the data was transferred without error. TCP headers also include segment ordering information, such as sequence and acknowledgment numbers (there is no need to memorise the exact header layout).
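To show how a checksum can catch a transfer error, here is a sketch of the 16-bit one's-complement sum in the style of RFC 1071. It is simplified: the real TCP and UDP checksums also cover a pseudo-header with the IP addresses, which is skipped here, and the function name is my own.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add each big-endian 16-bit word
    # Fold any carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


payload = b"hello!"  # even length keeps the demo simple
checksum = internet_checksum(payload)

# The receiver sums payload plus checksum; 0 means no error was detected.
intact = internet_checksum(payload + checksum.to_bytes(2, "big"))
corrupted = internet_checksum(b"jello!" + checksum.to_bytes(2, "big"))
```

Here `intact` is 0, while `corrupted` is non-zero because one byte changed in transit.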
When TCP wants to establish a connection, it sends a segment called a SYN to the peer TCP protocol running on the receiving host. The receiving TCP returns a segment called a SYN-ACK, which acknowledges the successful receipt of the SYN and carries the receiver's own SYN. The sending TCP then replies with an ACK segment and proceeds to send the data. This exchange of control information is referred to as a three-way handshake. UDP does not use the three-way handshake.
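The three segments of the handshake can be modelled in a few lines. Note that the second segment carries both the SYN and the ACK flag, which is why it is usually called SYN-ACK. This is a simulation, not a real TCP implementation: the `ControlSegment` class and the initial sequence numbers 1000 and 5000 are invented for the example (real stacks pick unpredictable initial numbers).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ControlSegment:
    sender: str
    flags: str
    seq: int
    ack: int = 0  # only meaningful when the ACK flag is set


def three_way_handshake(client_isn: int, server_isn: int) -> List[ControlSegment]:
    """Return the three control segments that open a TCP connection."""
    syn = ControlSegment("client", "SYN", seq=client_isn)
    # The server acknowledges the client's SYN and sends its own SYN.
    syn_ack = ControlSegment("server", "SYN-ACK", seq=server_isn, ack=syn.seq + 1)
    # The client acknowledges the server's SYN; data can flow after this.
    ack = ControlSegment("client", "ACK", seq=syn.seq + 1, ack=syn_ack.seq + 1)
    return [syn, syn_ack, ack]


handshake = three_way_handshake(client_isn=1000, server_isn=5000)
```

Each acknowledgment number is the other side's sequence number plus one, which is how both hosts confirm they saw each other's SYN.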
Data is sent (Network layer)
The network layer is dominated by the Internet Protocol (IP), the protocol that describes how to route messages from one computer to another computer on the network. Each message is split up into packets, and the packets hop from router to router on the way to their destination.
Computers send the first packet to the nearest router. You probably have a router in your home right now, and that's the first stop for your current computer's packets. When the router receives a packet, it looks at its IP header. The most important field is the destination IP address, which tells the router where the packet wants to end up.
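For the curious, this is roughly what "looking at the IP header" means in code. The sketch assumes a plain 20-byte IPv4 header without options; the sample header is hand-built for the demo (checksum left at 0) and `destination_of` is a made-up helper name.

```python
import ipaddress
import struct

# Fixed 20-byte IPv4 header layout (no options), network byte order.
IPV4_HEADER = struct.Struct("!BBHHHBBH4s4s")


def destination_of(packet: bytes) -> ipaddress.IPv4Address:
    """Read the destination address from the first 20 bytes of an IPv4 packet."""
    fields = IPV4_HEADER.unpack(packet[:IPV4_HEADER.size])
    return ipaddress.IPv4Address(fields[9])  # last field is the destination


# Hand-built sample header; both addresses come from documentation ranges.
sample_header = IPV4_HEADER.pack(
    (4 << 4) | 5,   # version 4, header length 5 x 32-bit words
    0,              # DSCP/ECN
    20,             # total length: header only
    0,              # identification
    0,              # flags and fragment offset
    64,             # time to live
    6,              # protocol number 6 = TCP
    0,              # header checksum (left at 0 in this sketch)
    ipaddress.IPv4Address("198.51.100.7").packed,  # source
    ipaddress.IPv4Address("203.0.113.0").packed,   # destination
)
```

`destination_of(sample_header)` returns `203.0.113.0`, the only field the router really needs to pick the next hop.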
The router has multiple paths it could send a packet along, and its goal is to send the packet to a router that's closer to its final destination. How does it decide? The router has a forwarding table that helps it pick the next path based on the destination IP address. That table does not have a row for every possible IP address; there are too many of them. Instead, the table has rows for IP address prefixes (for example, one row for all addresses that start with 203.0.113).
IP addresses are hierarchical. When two IP addresses start with the same prefix, that often means they're on the same large network. Router forwarding tables take advantage of that fact so that they can store far less information.
Once the router locates the most specific row in the table for the destination IP address, it sends the packet along that path. If all goes well, the packet should eventually arrive at a router that knows exactly where to send it.
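Picking "the most specific row" is known as longest-prefix matching, and a minimal sketch of it fits in a few lines. The forwarding table and the next-hop labels below are hypothetical, and real routers use much faster data structures than a linear scan.

```python
import ipaddress
from typing import Dict

# Hypothetical forwarding table: prefix -> next hop label.
FORWARDING_TABLE: Dict[str, str] = {
    "0.0.0.0/0": "default route (upstream ISP)",
    "203.0.0.0/8": "router A",
    "203.0.113.0/24": "router B",
}


def next_hop(destination: str) -> str:
    """Pick the most specific (longest) prefix that contains the destination."""
    address = ipaddress.ip_address(destination)
    matches = [
        network
        for network in map(ipaddress.ip_network, FORWARDING_TABLE)
        if address in network
    ]
    # The default route 0.0.0.0/0 matches every IPv4 address, so the list
    # is never empty for IPv4 destinations.
    best = max(matches, key=lambda network: network.prefixlen)
    return FORWARDING_TABLE[str(best)]
```

For example, `next_hop("203.0.113.5")` matches all three rows but goes to "router B" because /24 is the longest prefix, while `next_hop("198.51.100.1")` only matches the default route.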
Link layer, Physical layer
OK, but who transmits the bits? To do that work, the network layer logic in a host or router must hand off the packet to the data link layer protocols, which, in turn, ask the physical layer to actually send the data.
Data-link layer protocols are classified as LAN protocols, WAN protocols, or protocols that can be used for both LANs and WANs. LANs connect devices within a limited area, such as an office, while WANs connect devices that are spread across a large area, such as an entire country or even the world.
Data arrives at the server
On the opposite side of the communication channel is the server, which serves the document as requested by the client. A server appears to the client as a single machine, but it may actually be a collection of servers sharing the load (load balancing), or a complex piece of software interrogating other computers (like a cache, a DB server, or e-commerce servers) and totally or partially generating the document on demand.
A server is not necessarily a single machine, and conversely, several server software instances can be hosted on the same machine.
But how can a remote computer communicate with a particular program running on your computer? Giving every program its own IP address would not work, since there are not enough IP addresses for that. So people came up with the idea of "ports". Basically, a port is a number in the range from 0 to 65535, associated with a specific process or service. An app can open a port to announce something like "if a message comes for port 5190, it's for me".
Ports are software-based and managed by a computer's operating system. During data transfer, only a transport protocol such as TCP or UDP can indicate which port a packet should go to. TCP and UDP headers have a section for indicating port numbers.
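As an illustration of that header section, here is a sketch that builds a UDP datagram and reads the destination port back out of it. The helper names are invented, and the checksum is left at 0 (which UDP over IPv4 treats as "not computed") to keep the example short.

```python
import struct

# UDP header: source port, destination port, length, checksum (2 bytes each).
UDP_HEADER = struct.Struct("!HHHH")


def build_udp_datagram(src_port: int, dst_port: int, payload: bytes) -> bytes:
    """Prepend an 8-byte UDP header to the payload."""
    length = UDP_HEADER.size + len(payload)
    # Checksum 0 means "not computed" for UDP over IPv4; fine for a sketch.
    return UDP_HEADER.pack(src_port, dst_port, length, 0) + payload


def destination_port(datagram: bytes) -> int:
    """The number the OS uses to decide which process gets the payload."""
    _src, dst, _length, _checksum = UDP_HEADER.unpack(datagram[:UDP_HEADER.size])
    return dst


datagram = build_udp_datagram(src_port=50000, dst_port=5190, payload=b"hi")
```

Since each port field is 16 bits wide, the largest value it can hold is 65535, which is where the port range comes from.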
Extra details you might want to know
IP addresses
One way to find out your computer's IP address is by searching Google for "IP address". Google knows your IP address, since your computer sends a message to the Google computers as soon as it loads google.com.
Your IP address might be different tomorrow than it is today. Each ISP has a range of addresses they can assign, and they might give you a different one of those addresses each time they see your computer pop up on the network. That's called a dynamic IP address.
Switching to a different Wi-Fi network will usually give you a new IP address, since each Wi-Fi provider has its own range of addresses that it can give out. Computers that act as servers, like the computers that power Google.com, often have static IP addresses. That makes it easier for computers to quickly send search requests to the Google servers.
There are actually two versions of the Internet Protocol in use today:
- IPv4, the first version ever used on the Internet
- IPv6, its successor, which is not directly backwards-compatible with IPv4.
Back when the Internet protocols were first invented, the creators didn't anticipate how popular it would become and that there would eventually be so many devices wanting to connect to the Internet. When it became obvious in the 1990s that the IPv4 addresses were running out, the IPv6 protocol was proposed with a much longer addressing scheme.
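The difference in size is easy to check with Python's standard `ipaddress` module: IPv4 addresses are 32 bits long, IPv6 addresses are 128 bits long. The sample addresses below come from ranges reserved for documentation.

```python
import ipaddress

# Count every address each version can express.
ipv4_total = ipaddress.ip_network("0.0.0.0/0").num_addresses  # 32-bit addresses
ipv6_total = ipaddress.ip_network("::/0").num_addresses       # 128-bit addresses

sample_v4 = ipaddress.ip_address("203.0.113.0")
sample_v6 = ipaddress.ip_address("2001:db8::1")
```

`ipv4_total` is 2**32 (about 4.3 billion), which is fewer addresses than there are people on the planet, while `ipv6_total` is 2**128.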
MAC addresses
A MAC address and an IP address each identify network devices, but they do the job at different levels. The MAC address is assigned by the manufacturer of the hardware interface, while the IP address is assigned by the network administrator or Internet Service Provider (ISP).
The MAC address identifies the device locally, while the IP address identifies it globally. The MAC address is only relevant to the Local Area Network (LAN) to which it's connected and isn't part of the data stream when the packets leave the device's network.
Proxies
Between the Web browser and the server, numerous computers and machines relay the HTTP messages. Those operating at the application layer are generally called proxies. These can be transparent, forwarding on the requests they receive without altering them in any way, or non-transparent, in which case they will change the request in some way before passing it along to the server. Proxies may perform numerous functions:
- caching (the cache can be public or private, like the browser cache)
- filtering (like an antivirus scan or parental controls)
- load balancing (to allow multiple servers to serve different requests)
- authentication (to control access to different resources)
- logging (allowing the storage of historical information)
Ports and Firewalls
Some attackers try to send malicious traffic to random ports in the hopes that those ports have been left "open," meaning they are able to receive traffic. This action is somewhat like a car thief walking down the street and trying the doors of parked vehicles, hoping one of them is unlocked.
A firewall is a security system that blocks or allows network traffic based on a set of security rules. Firewalls usually sit between a trusted network and an untrusted network; often the untrusted network is the Internet.
Properly configured firewalls block traffic to all ports by default except for a few predetermined ports known to be in common use. For instance, a corporate firewall could leave open only ports 25 (email), 80 (web traffic), 443 (web traffic), and a few others, allowing internal employees to use these essential services, and block the rest of the 65,000+ ports.
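The "block everything except a short allow list" idea can be sketched like this. It is a toy rule check, not a real firewall configuration; `ALLOWED_PORTS` and `is_allowed` are names made up for the example, and only the three ports named in the text are on the list.

```python
from typing import FrozenSet

# Explicit allow list; every other port is denied by default.
ALLOWED_PORTS: FrozenSet[int] = frozenset({25, 80, 443})


def is_allowed(port: int) -> bool:
    """Default-deny rule: only explicitly listed ports pass."""
    if not 0 <= port <= 65535:
        raise ValueError(f"not a valid port number: {port}")
    return port in ALLOWED_PORTS


# Every valid port number that this rule would block.
blocked = [port for port in range(65536) if not is_allowed(port)]
```

With three ports open, `blocked` contains the remaining 65,533 port numbers, so an attacker trying random ports will almost always hit a closed one.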
If you are still reading, I'll talk about security and encryption in another post. This one got too lengthy to digest already!
The links I used
cloudflare, oracle, mozilla, hypertecsp, sciencedirect, guru99, simplilearn, wowza, imperva, quora, aws, trueconf, khanacademy - internet protocols, khanacademy - ip-v4-v6-addresses