The Dr. Bill FAQ
What do the different parts of a World Wide Web address mean?
Bill has often complained about there being
redundant pieces in Web addresses (also called URLs, for Uniform Resource Locator), but
each part has a distinct, important meaning. Let’s use the URL to this page as an example:
http://www.pushback.com/Wattenburg/FAQ/URL-parts.html
- http
- This is the type of address (the protocol). Several values are possible:
http (Hyper Text Transfer Protocol, for web pages), ftp
(File Transfer Protocol, for public file archive sites), telnet (used for
plain text terminal connections when a telnet client is present and configured to work
with your web browser), news (for linking to USENET newsgroups, when a
newsgroup reader is present and configured to work with your web browser). There are
others, but they are largely esoteric. The value of this part of the address determines
exactly how the web browser talks to the server it is connecting to. If it is http, it
speaks the “http” language, and talks on port 80 (unless a different port is
specified). If the protocol is ftp, then the browser talks to the server (either the same
machine or a different one) with the ftp language, using a different port number (port 21
by default). Think of the port as a door: you could say that port 80 is the front door, and
the ftp port is the delivery entrance. The case of this part doesn’t matter at all,
but it is traditionally written in all lower-case.
- :
- The colon separates the transfer protocol from the rest of the address.
- //
- The double slashes indicate that a machine name follows. In some forms of a URL (such as mailto:), they are
omitted, along with the machine name.
- www.pushback.com
- This grouping is the complete domain name registered to me, consisting of the following
parts. Again, the case of this doesn’t matter (the specifications demand that all software
disregard the case of the domain name), but it is usually all lower-case. If you see a colon
and a number after this, but before the slash, it is the port number described above.
- www
- This appears at the beginning of virtually every Web address, but isn’t strictly
required. If it is the first of three or more parts of the machine name (e.g. www.pushback.com,
www.leland.stanford.edu), then it is the name assigned to a physical or virtual machine on the net.
In many cases, the physical machine will have a primary name, and then several additional aliases,
such as ftp, news, mail, etc. For a Web site, the machine name can be nearly anything
except the same as one of these other protocols, so you can have things like
home.netscape.com, info.apple.com, etc. Read the section on simplification below for more
on this, and the exceptions.
- .
- The lowly period separates the parts of a machine address from one another.
- pushback
- This is the core part of the domain. It is entirely unique within the type of domain it
is a member of (see next item). On its own, it doesn’t mean much, and needs to be combined
with the domain suffix to gain identity.
- com
- The type of domain. In this case, it stands for commercial, and for gaining the
privilege of using this popular group, I pay the price of $50 per year to the domain
registry. Technically, it is only supposed to be for commercial enterprises, not
individual persons. The registry won’t even sell you a .com domain unless you fill out the
“organization type” line in the application. (How did I get one? This is an
electronic publication, is it not?) The other common and significant classes of domains
are: .gov (for government), .edu (for educational
institutions), and .net (for network service providers—those that
provide the infrastructure of the Internet).
- Still another class of domain types is divided up using political divisions. For
instance, I could have gotten pushback.ca.us as a domain if I didn’t want
to use (or couldn’t use) .com, .edu, .net, or .gov.
- There are controversial proposals to greatly expand the number of these types, but I
think the proposed types are going to create more problems than they solve, and will serve
no other purpose than to make Web addresses far more confusing than they are now, and line
the pockets of whatever number of organizations the U.N. selects to manage the new names.
They want to add groups such as .nom (for nominative, or personal sites), .store (for
stores selling stuff on the web), and a few other ones to boot. Yes, it’s kind of like an
area code, but since most of us remember the middle part of a domain name the most (e.g.
pushback), can you imagine how confusing it will be when you have to distinguish a dozen
domain suffixes? Let’s see, is it ibm.com because they are a commercial
enterprise, ibm.store because they sell computers on-line, or ibm.web
just because? The stated aim of the new proposal is to escape the shortage of names. But
as you can see, all it will do is cause numerous fights over name confusion (this is good
news for the lawyers, though; when they discover their shenanigans with the superfund have
dried up, they’ll have millions of web content providers to attach themselves to).
- /
- The forward slash (as opposed to “ \ ”, the backward slash, or backslash)
separates the domain name from the remainder of the address, and within the rest of the
URL, separates each directory from another, and from the file name (if any). If the URL
ends with only a domain or directory, it is a good idea to include a final, trailing slash
to signal to the server that you are after a directory, not a file.
- Warning! Anything beyond this point is case-sensitive on almost any Web server! (The
exceptions are servers running on Microsoft Windows or Apple’s MacOS, and Unix servers
running Apache with a special case-insensitive module installed.) The file system
used on Unix servers treats readme.txt as a completely separate file from README.TXT,
and allows both to exist side-by-side. If you ask for faq.html, and only FAQ.html exists
on a Unix web server, you will get an error. Web servers running on Macintosh or Windows
systems don’t distinguish, and treat both readme.txt and README.TXT as the same file. But
since you can’t easily tell whether any particular Web server is running on Unix or not, you need to
assume that all URL addresses are case-sensitive.
- Wattenburg
- This is a directory on the web server, just like any directory you would have on your
computer.
- FAQ
- This is a subdirectory under “Wattenburg”.
- URL-parts.html
- This is a file name. The part after the period is the suffix, and is used by the server
to determine what type of file it is serving on the web. Text files are served up
differently than binary files, and each file needs to be labeled by the server so that the
browser knows how to process it properly (.txt files are displayed exactly, but .htm and
.html files are parsed before being displayed, so that the browser may know whether to
change the font, embed a graphic, or turn some text into a link). One other note: some
files on Web servers may not have an extension at all, and may at first appear to be a
directory. Generally, such files are special Web server programs, and if you put a
trailing slash after these, the URL won’t work. Isn’t this confusing?
(Bill—I’ll bet that is more than you ever wanted to know about Web addresses, but
you did ask!)
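If you would rather let a program do the dissecting, here is a minimal sketch in Python that pulls the example URL apart along the same lines as the list above. The describe function and the DEFAULT_PORTS table are names I made up for illustration; only urlsplit comes from Python's standard library. Note that it cannot tell a file from a directory when the trailing slash is missing, for exactly the reason given above.

```python
from urllib.parse import urlsplit

# Default ports for two of the protocols discussed above. This small table
# is an illustrative assumption, not something the standard library provides.
DEFAULT_PORTS = {"http": 80, "ftp": 21}


def describe(url):
    """Split a URL into the parts walked through in the list above."""
    parts = urlsplit(url)
    host = parts.hostname or ""          # hostname is lower-cased by urlsplit
    # Everything before the last slash is directories; what follows is the
    # file name (empty when the URL ends with a trailing slash).
    directory, _, filename = parts.path.rpartition("/")
    return {
        "protocol": parts.scheme,        # scheme is lower-cased as well
        "machine": host,
        "labels": host.split(".") if host else [],
        "port": parts.port or DEFAULT_PORTS.get(parts.scheme),
        "directories": [d for d in directory.split("/") if d],
        "file": filename,                # case is preserved in the path
    }


info = describe("HTTP://WWW.Pushback.COM/Wattenburg/FAQ/URL-parts.html")
for name, value in info.items():
    print(f"{name}: {value}")
```

The protocol and machine name come back in lower case no matter how they were typed, while the directories and file name keep their original case, which matches the warning about case-sensitivity.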
Simplifying things
Now that you know how the whole thing works, you can understand why a simplification is
possible. Most modern browsers (Netscape Navigator and Microsoft Internet Explorer,
versions 2.0 and greater, and possibly some of the 1.x versions, in both cases) will allow
you to leave out the protocol and machine separator portions of the URL, and will make an
educated guess as to the proper protocol and port numbers to use. If you just enter
“www.pushback.com” into one of these browsers, you will get to my site just
fine. Same thing if you enter “info.apple.com”. But if you enter something like
“ftp.apple.com”, and expect to get an http connection, guess again. The browser
sees the ftp, and assumes you really want “ftp://ftp.apple.com”.
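The guessing game can be sketched in a few lines of Python. This is only my approximation of the behavior described above, not the actual rule used by Netscape Navigator or Internet Explorer; the guess_url function and the PREFIX_TO_PROTOCOL table are hypothetical names.

```python
# Machine-name prefixes that hint at a protocol other than http. This table
# is an illustrative assumption, not taken from any real browser.
PREFIX_TO_PROTOCOL = {"ftp": "ftp"}


def guess_url(typed):
    """Fill in the protocol and '//' when the user leaves them out."""
    if "://" in typed:
        return typed                     # already fully qualified
    # Look at the first part of the machine name, ignoring case.
    first_label = typed.split("/", 1)[0].split(".", 1)[0].lower()
    protocol = PREFIX_TO_PROTOCOL.get(first_label, "http")
    return f"{protocol}://{typed}"


for typed in ("www.pushback.com", "info.apple.com", "ftp.apple.com"):
    print(typed, "->", guess_url(typed))
```

Anything that doesn't start with a recognized prefix is assumed to be a web page, which is why “info.apple.com” works but “ftp.apple.com” lands you on an ftp connection.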
Rarely, you will see a site that doesn’t have any machine name as part of it, as in
“http://microsoft.com”. This is not browser-dependent, but relies on some very
specific network configurations that aren’t made to most web servers. So, yes, Bill, you
really do need that “www.” at the front of URLs. But you can shorten
“http://www.pushback.com/” to just “www.pushback.com”, and not run
into any problems at all.
The same shorthand goes for URLs that don’t require a machine name. For instance, there
are some newsgroups that are hosted by a publicly (read: free) accessible news server, and
a fully-qualified path would look like
“news://msnews.microsoft.com/microsoft.public.word”. But for other newsgroup
links where the group is only available to paying customers of Internet service providers,
the form would be “news:microsoft.public.word”. In this case, your browser uses
the default newsgroup reader configured in its preferences.
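To see the difference between the two forms, here is a small Python sketch using the standard library's urlsplit. The news_target function is a hypothetical helper of mine; it simply reports which server (if any) the URL names and which newsgroup it points to.

```python
from urllib.parse import urlsplit


def news_target(url):
    """Return (server, newsgroup) for a news: URL; server is None if absent."""
    parts = urlsplit(url)
    # With '//' present the group follows a slash; without it, the group is
    # everything after the colon.
    group = parts.path.lstrip("/")
    return parts.hostname, group


print(news_target("news://msnews.microsoft.com/microsoft.public.word"))
print(news_target("news:microsoft.public.word"))
```

When no server comes back, a program would have to fall back on whatever news server is configured locally, just as the browser does.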