The Dr. Bill FAQ

What do the different parts of a World Wide Web address mean?

Bill has often complained about there being redundant pieces in Web addresses (also called URLs, for Uniform Resource Locator), but each part has a distinct, important meaning. Let’s use the URL to this page as an example:

http://www.pushback.com/Wattenburg/FAQ/URL-parts.html

http
This is the type of address (the protocol). Common values are http (Hypertext Transfer Protocol, for web pages), ftp (File Transfer Protocol, for public file archive sites), telnet (used for plain text terminal connections when a telnet client is present and configured to work with your web browser), and news (for linking to USENET newsgroups, when a newsgroup reader is present and configured to work with your web browser). There are others, but they are largely esoteric. The value of this part of the address determines exactly how the web browser talks to the server it is connecting to. If it is http, the browser speaks the “http” language and talks on port 80 (unless a different port is specified). If the protocol is ftp, then the browser talks to the server (either the same machine or a different one) with the ftp language, using a different port number (21 by default). (Think of the port as a door: you could say that port 80 is the front door, and the ftp port is the delivery entrance.) The case of this part doesn’t matter at all, but it is traditionally all lower-case.
:
The colon separates the transfer protocol from the rest of the address.
//
The double slashes indicate that a machine name follows. In some forms of a URL (such as mailto:), they are omitted, along with the machine name.
www.pushback.com
This grouping is the complete domain name registered to me, consisting of the following parts. Again, the case of this doesn’t matter (the specifications demand that all software disregard the case of the domain name), but it is usually all lower-case. If you see a colon and a number after this, but before the slash, it is the port number described above.
www
This appears at the beginning of virtually every Web address, but isn’t strictly required. If it is the first of three or more parts of the machine name (e.g. www.pushback.com, www.leland.stanford.edu), then it is the name assigned to a physical or virtual machine on the net. In many cases, the physical machine will have a primary name, and then several additional aliases, such as ftp, news, mail, etc. For a Web site, the machine name can be nearly anything except the name of one of these other protocols, so you can have things like home.netscape.com, info.apple.com, etc. Read the section on simplification below for more on this, and the exceptions.
.
The lowly period separates the parts of a machine address from one another.
pushback
This is the core part of the domain. It is unique within the type of domain it belongs to (see next item). On its own, it doesn’t mean much, and needs to be combined with the domain suffix to gain identity.
com
The type of domain. In this case, it stands for commercial, and for gaining the privilege of using this popular group, I pay the price of $50 per year to the domain registry. Technically, it is only supposed to be for commercial enterprises, not individual persons. The registry won’t even sell you a .com domain unless you fill out the “organization type” line in the application. (How did I get one? This is an electronic publication, is it not?) The other common and significant classes of domains are: .gov (for government), .edu (for educational institutions), and .net (for network service providers—those that provide the infrastructure of the Internet).
Still another class of domain types is divided up using political divisions. For instance, I could have gotten pushback.ca.us as a domain if I didn’t want to use (or couldn’t use) .com, .edu, .net, or .gov.
There are controversial proposals to greatly expand the number of these types, but I think the proposed types are going to create more problems than they solve, and will serve no other purpose than to make Web addresses far more confusing than they are now, and line the pockets of whatever number of organizations the U.N. selects to manage the new names. They want to add groups such as .nom (for nominative, or personal sites), .store (for stores selling stuff on the web), and a few other ones to boot. Yes, it’s kind of like an area code, but since most of us remember the middle part of a domain name the most (e.g. pushback), can you imagine how confusing it will be when you have to distinguish a dozen domain suffixes? Let’s see, is it ibm.com because they are a commercial enterprise, ibm.store because they sell computers on-line, or ibm.web just because? The stated aim of the new proposal is to escape the shortage of names. But as you can see, all it will do is cause numerous fights over name confusion (this is good news for the lawyers, though; when they discover their shenanigans with the superfund have dried up, they’ll have millions of web content providers to attach themselves to).
/
The forward slash (as opposed to “ \ ”, the backward slash, or backslash) separates the domain name from the remainder of the address, and within the rest of the URL, separates each directory from another, and from the file name (if any). If the URL ends with only a domain or directory, it is a good idea to include a final, trailing slash to signal to the server that you are after a directory, not a file.
Warning! Anything beyond this point is case-sensitive on almost any Web server! (The exceptions are servers running on Microsoft Windows or Apple’s MacOS, and Unix servers running Apache with a special case-insensitive module installed.) The file system used on Unix servers treats readme.txt as a completely separate file from README.TXT, and allows both to exist side-by-side. If you ask for faq.html, and only FAQ.html exists on a Unix web server, you will get an error. Web servers running on Macintosh or Windows systems don’t distinguish, and treat both readme.txt and README.TXT as the same file. But since you can’t easily tell whether any particular Web server is running on Unix or not, you need to assume that all URL addresses are case-sensitive.
Wattenburg
This is a directory on the web server, just like any directory you would have on your computer.
FAQ
This is a subdirectory under “Wattenburg”.
URL-parts.html
This is a file name. The part after the period is the suffix, and is used by the server to determine what type of file it is serving on the web. Text files are served up differently than binary files, and each file needs to be labeled by the server so that the browser knows how to process it properly. For example, .txt files are displayed exactly as they are, but .htm and .html files are parsed before being displayed, so that the browser knows whether to change the font, embed a graphic, or turn some text into a link. One other note: some files on Web servers may not have an extension at all, and may at first appear to be a directory. Generally, such files are special Web server programs, and if you put a trailing slash after these, the URL won’t work. Isn’t this confusing?
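
To make the breakdown above concrete, here is a minimal sketch in Python that splits a URL into the same pieces using the standard library’s urllib.parse module. The function name describe_url and the DEFAULT_PORTS table are my own illustrations, not part of any standard; the port numbers in the table are the usual well-known assignments for each protocol. Note how the protocol and machine name come back in lower case, while the directory and file names keep their case.

```python
from urllib.parse import urlsplit

# Illustrative table of well-known default ports for the protocols
# mentioned in this FAQ (hypothetical helper, not a library feature).
DEFAULT_PORTS = {"http": 80, "ftp": 21, "telnet": 23, "news": 119}


def describe_url(url):
    """Split a URL into the parts discussed in this FAQ."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # urlsplit returns the host in lower case
    # Drop empty strings produced by leading or trailing slashes.
    segments = [s for s in parts.path.split("/") if s]
    return {
        "protocol": parts.scheme,  # also returned in lower case
        "host": host,
        "host_labels": host.split(".") if host else [],
        # Use the explicit port if one was given, else the usual default.
        "port": parts.port or DEFAULT_PORTS.get(parts.scheme),
        "path_segments": segments,  # case is preserved here
    }


info = describe_url("HTTP://WWW.Pushback.COM/Wattenburg/FAQ/URL-parts.html")
for name, value in info.items():
    print(f"{name}: {value}")
```

The last path segment may be a file or a directory; the URL alone doesn’t say which, which is why the trailing slash matters.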

(Bill—I’ll bet that is more than you ever wanted to know about Web addresses, but you did ask!)

Simplifying things

Now that you know how the whole thing works, you can understand why a simplification is possible. Most modern browsers (Netscape Navigator and Microsoft Internet Explorer, versions 2.0 and greater, and possibly some of the 1.x versions, in both cases) will allow you to leave out the protocol and machine separator portions of the URL, and make an educated guess as to the proper protocol and port numbers to use. If you just enter “www.pushback.com” into one of these browsers, you will get to my site just fine. Same thing if you enter “info.apple.com”. But if you enter something like “ftp.apple.com”, and expect to get an http connection, guess again. The browser sees the ftp, and assumes you really want “ftp://ftp.apple.com”.
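
The guessing described above can be sketched in a few lines of Python. This is only an illustration of the behavior, assuming the browser looks at nothing but the first part of the machine name; the function name expand_shorthand is hypothetical, and real browsers may use more elaborate rules.

```python
def expand_shorthand(address):
    """Guess a full URL from a bare machine name (simplified heuristic)."""
    if "://" in address:
        return address  # a protocol was already given; leave it alone
    # Look at the first dot-separated part of the machine name.
    first_label = address.split(".", 1)[0].lower()
    # Assumption: only "ftp" triggers a different protocol in this sketch.
    protocol = "ftp" if first_label == "ftp" else "http"
    return f"{protocol}://{address}"


for typed in ("www.pushback.com", "info.apple.com", "ftp.apple.com"):
    print(typed, "->", expand_shorthand(typed))
```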

Rarely, you will see a site that doesn’t have any machine name as part of it, as in “http://microsoft.com”. This is not browser-dependent, but relies on some very specific network configurations that aren’t set up for most web servers. So, yes, Bill, you really do need that “www.” at the front of URLs. But you can shorten “http://www.pushback.com/” to just “www.pushback.com”, and not run into any problems at all.

The same shorthand goes for URLs that don’t require a machine name. For instance, there are some newsgroups that are hosted by a publicly (read: free) accessible news server, and a fully-qualified path would look like “news://msnews.microsoft.com/microsoft.public.word”. But for other newsgroup links where the group is only available to paying customers of Internet service providers, the form would be “news:microsoft.public.word”. In this case, your browser uses the default newsgroup reader configured in its preferences.
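
The difference between the two forms is easy to see if you split them with Python’s urllib.parse module. This is a small sketch; the function name news_target is my own. When the double slashes are absent, there is no machine name at all, and it is up to the newsgroup reader to supply its configured server.

```python
from urllib.parse import urlsplit


def news_target(url):
    """Return (server, newsgroup) for a news URL; server may be None."""
    parts = urlsplit(url)
    server = parts.hostname  # None when the URL has no // and machine name
    group = parts.path.lstrip("/")  # drop the slash that follows the server
    return server, group


for url in ("news://msnews.microsoft.com/microsoft.public.word",
            "news:microsoft.public.word"):
    print(url, "->", news_target(url))
```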
