What's in an HTTP request?

Source: Baidu Wenku (百度文库)  Editor: 神马文学网  Time: 2024/07/02 17:53:20

Whenever your web browser fetches a file (a page, a picture, etc.) from a web server, it does so using HTTP - that's "Hypertext Transfer Protocol".  HTTP is a request/response protocol, which means your computer sends a request for some file (e.g. "Get me the file 'home.html'"), and the web server sends back a response ("Here's the file", followed by the file itself).

The request that your computer sends to the web server contains all sorts of (potentially) interesting information.  We'll now examine the HTTP request your computer just sent to this web server, see what it contains, and find out what it tells me about you.

The raw information

The following HTTP request was received from IP address 115.181.64.82 (port 30892) by IP address 91.84.196.2 (port 80):

    GET /dumprequest HTTP/1.1
    Host: djce.org.uk
    Connection: keep-alive
    Accept: application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
    User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN) AppleWebKit/533.9 (KHTML, like Gecko) Maxthon/3.0 Safari/533.9
    Accept-Encoding: gzip,deflate
    Accept-Language: zh-CN
    Accept-Charset: iso-8859-1,*,utf-8
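As a rough illustration of how a server might take such a request apart, here is a minimal Python sketch. The function name parse_request is made up for this example, the sample is a shortened copy of the request above, and the sketch assumes a well-formed HTTP/1.x request with CRLF line endings; a real server does a good deal more validation.

```python
def parse_request(raw: str):
    """Split a raw HTTP/1.x request head into request line and headers.

    Simplified sketch: if a header name is repeated, the last value wins.
    """
    head = raw.split("\r\n\r\n", 1)[0]       # drop any request body
    lines = head.split("\r\n")
    method, target, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, target, version, headers


# Shortened copy of the request shown above.
RAW = (
    "GET /dumprequest HTTP/1.1\r\n"
    "Host: djce.org.uk\r\n"
    "Connection: keep-alive\r\n"
    "Accept-Language: zh-CN\r\n"
    "\r\n"
)

method, target, version, headers = parse_request(RAW)
print(method, target, version)
print(headers["host"])
```

Header names are lower-cased here because HTTP header names are case-insensitive.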

The analysis

Source IP address, port and proxy

Source IP address: 115.181.64.82
Source port: 30892
Via: not present
X-Forwarded-For: not present

In order to send the appropriate response back to your computer, the web server necessarily knows your computer's IP address, and a port number to which to send the response.  Your IP address seems to be 115.181.64.82, and the port number used was 30892.

On the other hand, there could be one or more proxy servers between your computer and the web server.  If the HTTP request includes the header "Via", or "X-Forwarded-For", then that's a strong indication that there is at least one proxy server somewhere along the line. 

If neither of those headers is present, that could mean that no proxy servers were involved, or it could mean that they just chose not to "reveal" themselves by adding those headers. 

In this case there is neither a "Via" header nor an "X-Forwarded-For" header, so there quite possibly isn't a proxy between your computer and the web server.  However, this isn't definite - a proxy may simply have chosen not to add either header.
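The check described above might look something like the following sketch. The function name proxy_hints and the example header values are invented for illustration; as the text says, an empty result does not prove that no proxy was involved.

```python
def proxy_hints(headers: dict) -> list:
    """Return which proxy-revealing headers are present in a request."""
    present = {name.lower() for name in headers}
    return [h for h in ("via", "x-forwarded-for") if h in present]


# The request examined on this page carried neither header.
print(proxy_hints({"Host": "djce.org.uk", "Connection": "keep-alive"}))

# A hypothetical request that passed through a proxy which announced itself.
print(proxy_hints({"Via": "1.1 proxy.example", "X-Forwarded-For": "203.0.113.7"}))
```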

Your IP address

For now we'll assume your IP address is 115.181.64.82.  Let's see what we know about that address.

(Note: this section has nothing to do with HTTP in particular; it is just an example of what information can be determined from an IP address.)

IP address: 115.181.64.82
DNS name: none

Lots more interesting information can be learned from your IP address.  For example, whereabouts you are on the Internet, (roughly) what city you're in, and who your ISP is.
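As a small sketch of where the "DNS name" comes from, the Python below inspects the address with the standard ipaddress module and shows the reverse-DNS (PTR) name that would be looked up. The helper reverse_dns is a made-up name for this example; it needs network access, and the result depends on whatever the DNS returns at the time, so it is only defined here, not called.

```python
import ipaddress
import socket


def reverse_dns(address: str):
    """Return the DNS name for an address, or None if the lookup fails.

    Needs network access; the answer depends on the live DNS.
    """
    try:
        return socket.gethostbyaddr(address)[0]
    except OSError:
        return None


ip = ipaddress.ip_address("115.181.64.82")
print(ip.version)          # IPv4 or IPv6
print(ip.is_private)       # True for private ranges such as 10.0.0.0/8
print(ip.reverse_pointer)  # the name a reverse-DNS lookup queries
```

Working out the city or the ISP needs an external database or registry lookup, which is beyond this sketch.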

Destination IP address, port, host and protocol

Destination IP address: 91.84.196.2
Destination port: 80
Host: djce.org.uk
Protocol: HTTP/1.1

These headers tell us which web server you were trying to contact.  If that seems odd, bear in mind that many web sites can be "hosted" on a single server, so when a request arrives the server needs to know which web site you were attempting to access.

The protocol used will almost always be either "HTTP/1.1" or "HTTP/1.0", and is a property of your computer's web browser and any proxies through which the request might have passed.

Requested URI

Requested URI: /dumprequest

Together with the 'Host' header and the destination port number (above), this specifies the document which should be retrieved. 

Given all these values we can determine that the URL of the document which is being retrieved is: http://djce.org.uk/dumprequest
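The way those values combine can be sketched as follows. The function name build_url is invented for this example, and the sketch assumes the Host header carries no port of its own; it simply leaves the port out of the URL when it is the default for the scheme.

```python
def build_url(host: str, port: int, uri: str, scheme: str = "http") -> str:
    """Combine the Host header, destination port and requested URI."""
    default_port = {"http": 80, "https": 443}[scheme]
    # The port only appears in the URL when it is not the scheme's default.
    authority = host if port == default_port else f"{host}:{port}"
    return f"{scheme}://{authority}{uri}"


print(build_url("djce.org.uk", 80, "/dumprequest"))
```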

Request method and content

Request method: GET
Data: none

The request method is usually either "GET" or "POST".  Basically, if you fill in and submit a form on a web page, it might generate a POST request (or it might be "GET"), whereas if you just click on a link, or activate one of your browser's "bookmarks" or "favourites", the request method will always be "GET".

Therefore, if it's "POST", we can tell that a form was almost certainly submitted (scripts running in a page can also send POST requests).  The contents of the form would appear here, and there would also be some "Content-" headers describing the data.

Web browsers generate two kinds of "POST" data: either "multipart/form-data", which is used when uploading files to a web server, or the more common "application/x-www-form-urlencoded".
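To give a feel for the more common of the two, here is a short sketch of how form fields are encoded as "application/x-www-form-urlencoded" and decoded again, using Python's standard urllib.parse module. The field names and values are made up for the example.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical form fields, as a browser might submit them.
fields = {"name": "Fred Bloggs", "lang": "en-gb"}

body = urlencode(fields)   # what travels in the request body
print(body)                # name=Fred+Bloggs&lang=en-gb

decoded = parse_qs(body)   # what the server recovers
print(decoded)
```

Note that parse_qs returns a list for each field, because a form can legitimately send the same field name more than once.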

User agent

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN) AppleWebKit/533.9 (KHTML, like Gecko) Maxthon/3.0 Safari/533.9
Accept: application/xml, application/xhtml+xml, text/html;q=0.9, text/plain;q=0.8, image/png, */*;q=0.5
Accept-Charset: iso-8859-1,*,utf-8
Accept-Encoding: gzip,deflate
Accept-Language: zh-CN

The User-Agent header describes your web browser.  Typically it contains the browser name and version (e.g. Firefox 1.0.7), your Operating System and version (e.g. Windows XP), and possibly additional information (such as which "service packs" you have installed).

The "Accept" headers describe what sort of things the web browser can handle, and what it would prefer to be given if there's a choice. 

The "Accept" header itself describes which document types the web browser can handle, so for example we can tell whether your browser is capable of handling "image/png" graphics.

The "Accept-Charset" header describes what character sets are acceptable, so we can make some guesses as to what part of the world you might be in, and what language you might speak.  For example, western European or North American users quite possibly only understand the "iso-8859-1", "us-ascii" and "utf-8" character sets, whereas "big5" would suggest that you might be Chinese.

"Accept-Encoding" describes the ability of your web browser to handle compressed transfer of documents.  Nothing too interesting there, but it's another snippet of information about the browser you're using.

"Accept-Language" is more interesting though; it tells us what language(s) you prefer to receive your documents in - again, if the web server offers a choice.  For example, if the header tells us that your preference is for "en-gb" followed by "en", that means you're probably an English-speaking Briton. "pt-br" on the other hand would suggest a Portuguese-speaking Brazilian.
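The preferences in the "Accept" headers are expressed with "q" values between 0 and 1, where a missing "q" means 1.  The sketch below shows one simple way to read them; the function name parse_preferences is invented for this example, and it only orders entries by "q" value, whereas a real server would also take into account how specific each entry is.

```python
def parse_preferences(header: str) -> list:
    """Parse an Accept-style header into (value, q) pairs, best first."""
    prefs = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        value = parts[0]
        if not value:
            continue
        q = 1.0                      # a missing q means full preference
        for param in parts[1:]:
            key, _, val = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(val)
                except ValueError:
                    q = 0.0          # treat an unreadable q as "not wanted"
        prefs.append((value, q))
    # sorted() is stable, so equal q values keep their original order.
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


print(parse_preferences("en-gb,en;q=0.8"))
print(parse_preferences(
    "application/xml,application/xhtml+xml,text/html;q=0.9,"
    "text/plain;q=0.8,image/png,*/*;q=0.5"
))
```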

Referring page

Referer: not present

The "referer" header tells us which document referred you to us - in essence, if you followed a link to get to this page, it is the URL of the page you came from to get here.

If on the other hand you didn't follow a link - maybe you clicked on a browser "bookmark", or maybe you just typed the address of this page directly into your browser - then the "referer" will be missing.  And yes, that isn't how it should be spelt.   :-(

Cookies

Cookie: not present

Every time a web server provides you with a response (a page, a graphic, etc), it has the opportunity to send your browser a "cookie".  These cookies are small pieces of information which your browser stores, and then sends back to that same web server whenever you subsequently request a document. 

So there are two important points here: (1) each cookie is only sent back to the same web site as it came from in the first place, and (2) the "contents" of the cookie (the data it contains) can only be made up of whatever information the web server already knew anyway.  For example, a web server can't just say "send me a cookie containing your e-mail address" unless that same web server had already sent you that information in the first place.
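For illustration, this is roughly how a server might read a "Cookie" header, sketched with Python's standard http.cookies module. The request examined on this page carried no cookies, so the header value below is invented: it reuses the two example cookies discussed in the summary at the end of this page.

```python
from http.cookies import SimpleCookie

# Hypothetical Cookie header, as a browser would send it back.
header_value = "LastLoginName=fred; TGIDX=wl4o6ulhw48lw845yh68hylohw45"

cookie = SimpleCookie()
cookie.load(header_value)

# Each cookie is a name plus a value; the server alone knows the meaning.
values = {name: morsel.value for name, morsel in cookie.items()}
print(values)
```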

Connection control

Connection: keep-alive
Keep-Alive: not present

These headers are used to fine-tune the network traffic between you and the web server.  They don't tell us much, except a little about the capabilities of your web browser.

Cache control

Pragma: not present
Cache-Control: not present
If-Modified-Since: not present

These headers control caching of the document.  By examining them we can detect whether you used your browser's "refresh" button to force the page to reload.

For example, Mozilla (Netscape 6) sets "Cache-Control" to "max-age=0" when you use the "reload" button.  MSIE 5.5 sets it to "no-cache" if you do a "hard" reload (while holding down the "control" key).
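A server-side check along those lines might be sketched like this. The function name reload_hint and the labels it returns are made up for the example, and the rules simply follow the two browser behaviours described above; other browsers may behave differently.

```python
def reload_hint(headers: dict) -> str:
    """Guess from the cache headers whether the user forced a reload."""
    lowered = {name.lower(): value.lower() for name, value in headers.items()}
    directives = {d.strip() for d in lowered.get("cache-control", "").split(",")}
    pragma = lowered.get("pragma", "")

    if "no-cache" in directives or "no-cache" in pragma:
        return "hard reload"
    if "max-age=0" in directives:
        return "reload"
    return "no reload detected"


print(reload_hint({"Cache-Control": "max-age=0"}))
print(reload_hint({"Cache-Control": "no-cache"}))
print(reload_hint({"Host": "djce.org.uk"}))
```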

Authorisation

Username: not present

If you have "logged in" to a web site, your username appears here.

Note that this only applies to web sites which use proper HTTP authentication - typically, a "login" window pops up and you get three chances to enter your username and password, otherwise you see a page which says "Authentication Required" or similar.  It doesn't apply to web sites where the "login" is a separate page.

It's also possible to supply the username and password in the URL you tell your browser to visit - for example, http://user:password@www.example.com/.  In that case, the username would appear here too.
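As a sketch of how the username can be recovered, the Python below decodes an "Authorization" header.  It assumes the "Basic" scheme, in which the browser sends "username:password" encoded in base64; other schemes (such as Digest) work differently and are not handled.  The function name basic_username is invented for this example, and the credentials are the ones from the example URL above.

```python
import base64


def basic_username(authorization: str):
    """Extract the username from a Basic Authorization header value.

    Returns None if the header is not well-formed Basic authentication.
    """
    scheme, _, encoded = authorization.partition(" ")
    if scheme.lower() != "basic":
        return None
    try:
        decoded = base64.b64decode(encoded.strip(), validate=True).decode("utf-8")
    except ValueError:               # covers bad base64 and bad UTF-8
        return None
    username, separator, _password = decoded.partition(":")
    return username if separator else None


# Build the header a browser would send for user "user", password "password".
header_value = "Basic " + base64.b64encode(b"user:password").decode("ascii")
print(basic_username(header_value))
```

Base64 is an encoding, not encryption, so anyone who can read the request can read the password too.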

Summary

The most interesting pieces of information contained in the request are:

  • the IP address of you and/or your HTTP proxy
  • which document you requested
  • which version of which browser you're using
  • which page you came from to get here (if you followed a link)
  • your preferred language(s)
  • cookies

The "odd one out" in that list is "cookies".  That's because cookies only send to the web server information which it had previously sent to you (and your browser accepted).  However, the problem is in knowing what a cookie means.  The meaning of the cookie is only actually known to the web server.

If you can get your browser to show you your cookies, you might be able to make a good guess as to what it means - for example a cookie called "LastLoginName" with a value of "fred" probably means that when you last logged in on that site, you used the username "fred".  However, a cookie called "TGIDX" with a value of "wl4o6ulhw48lw845yh68hylohw45" is meaningless to everybody except the web server, so you really have no idea what information that cookie actually holds.

Reference

RFC 2616 - "Hypertext Transfer Protocol -- HTTP/1.1"
