Lab: Search Engine Follies

CS 102

Mar. 8, 1999

The purpose of this lab is to study some of the techniques used to narrow the number of keyword "hits" a search engine returns. In this lab, we'll look at some fairly mechanical techniques.

Data Collection

Let's say we're interested in the topic of personal privacy. Using AltaVista, perform the following searches, recording the number of hits received from each search:

  1. personal privacy

  2. Personal Privacy (Note: both words are capitalized. Pay attention to capitalization in the searches that follow.)

  3. +personal +privacy

  4. +Personal +Privacy

  5. Click on the Advanced link (near the top on the right) and type the search personal NEAR privacy in the Boolean expression field.

  6. In Advanced search, Personal NEAR Privacy

  7. Near the top on the right, click the Search link to return to the basic search page. Perform this search: "personal privacy"

  8. "Personal Privacy"

  9. title:"personal privacy" (If that returns no hits, try +title:personal +title:privacy)
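
If you would like to keep your counts organized, the following Python sketch is one way to tabulate them. It is only a convenience, not part of the lab requirements: the function name tabulate is made up for this example, the counts shown are placeholders rather than real results, and nothing here contacts a search engine.

```python
# Sketch for tabulating hit counts that you recorded by hand.
# QUERIES lists the nine searches in the order given above.

QUERIES = [
    "personal privacy",
    "Personal Privacy",
    "+personal +privacy",
    "+Personal +Privacy",
    "personal NEAR privacy",
    "Personal NEAR Privacy",
    '"personal privacy"',
    '"Personal Privacy"',
    'title:"personal privacy"',
]


def tabulate(counts):
    """Pair each query with its hit count, largest count first."""
    if len(counts) != len(QUERIES):
        raise ValueError("expected one count per query")
    rows = sorted(zip(QUERIES, counts), key=lambda row: row[1], reverse=True)
    width = max(len(query) for query in QUERIES)
    return [f"{query:<{width}}  {hits:>12,}" for query, hits in rows]


if __name__ == "__main__":
    # Placeholder counts for illustration only; replace with your own.
    example_counts = [9, 8, 7, 6, 5, 4, 3, 2, 1]
    print("\n".join(tabulate(example_counts)))
```

Sorting by count puts the broadest search at the top, which may make it easier to see which changes to the query narrowed the results the most.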

Analysis of Data

  1. You now have a set of searches and the number of hits each one returned. Why do the counts differ? Follow the help link, study the help page, and explain the number of hits returned by each search. Discuss which search or searches are most likely to return the most useful results.

  2. Using the results of the final search, record the URLs of the first five Web pages returned and study those pages. On your own, establish five criteria for evaluating the pages, then apply the criteria to rank them.
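
One way to apply your criteria consistently is to score every page against each criterion and rank the pages by total score. The Python sketch below assumes equal weighting and a numeric score per criterion. Both are illustrative choices rather than requirements, and the function name rank_pages and the page labels are hypothetical.

```python
def rank_pages(scores):
    """Rank pages by total score.

    scores maps a page label (for example, its URL) to a list holding one
    numeric score per criterion. Returns (label, total) pairs, highest
    total first; pages with equal totals keep their original order.
    """
    totals = [(label, sum(values)) for label, values in scores.items()]
    return sorted(totals, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Placeholder labels and scores; substitute your own pages and criteria.
    example = {
        "page A": [3, 4, 2, 5, 1],
        "page B": [5, 5, 4, 4, 3],
    }
    for label, total in rank_pages(example):
        print(label, total)
```

If some criteria matter more to you than others, you could multiply each score by a weight before summing. Say so in your write-up so the ranking can be reproduced.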

Thomas P. Kelliher
Mon Mar 8 10:39:36 EST 1999