Lab: Search Engine Follies

CS 102

Sept. 19, 2001

The purpose of this lab is to study some of the techniques used to narrow the number of keyword "hits" a search engine returns. We'll look at some fairly mechanical techniques.

Data Collection

Let's say we're interested in the topic of personal privacy. Using AltaVista (www.altavista.com), perform the following searches, recording the number of hits returned by each search:

  1. personal privacy

  2. Personal Privacy (Note: the two words are capitalized. Watch the capitalization in the searches that follow.)

  3. +personal +privacy

  4. +Personal +Privacy

  5. Click on the Advanced Search link (bottom right-hand corner of the Search for box) and, in the Boolean expression field, type the search personal NEAR privacy.

  6. In Advanced Search, perform this search: Personal NEAR Privacy

  7. Near the top of the page, on the left, click the Home link to return to the basic search page. Perform this search: "personal privacy"

  8. "Personal Privacy"

  9. title:"personal privacy" (If that results in no hits, try
    +title:personal +title:privacy)
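
To get a feel for why the nine searches can return different counts, here is a toy sketch that runs the same nine queries against a tiny in-memory collection. It is not AltaVista's implementation. The four documents are made up, and the matching rules are assumptions that you should check against the help page: a lowercase term matches any capitalization, a capitalized term matches only that exact capitalization, a plain search matches any of the words, + makes a word required, NEAR means within 10 words, quotes demand adjacent words, and title: looks only at the page title.

```python
import re

# Hypothetical four-page collection; titles and bodies are invented for illustration.
DOCS = [
    {"title": "Personal Privacy Online", "body": "Personal Privacy matters to everyone."},
    {"title": "Home page", "body": "My personal notes on gardening."},
    {"title": "Privacy policy", "body": "We respect the privacy of every personal account holder."},
    {"title": "personal privacy tips", "body": "Guard your personal privacy."},
]

def tokens(text):
    """Split text into words, dropping punctuation."""
    return re.findall(r"[A-Za-z]+", text)

def full_text(doc):
    # Simplification: title and body are concatenated into one searchable string.
    return doc["title"] + " " + doc["body"]

def term_matches(term, word):
    # Assumed rule: an all-lowercase term ignores case; any capital forces an exact match.
    if term.islower():
        return term == word.lower()
    return term == word

def has_term(term, text):
    return any(term_matches(term, w) for w in tokens(text))

def any_terms(terms, text):
    """Plain search: at least one of the words appears."""
    return any(has_term(t, text) for t in terms)

def all_terms(terms, text):
    """+word search: every word is required."""
    return all(has_term(t, text) for t in terms)

def near(a, b, text, window=10):
    """NEAR search: the two words occur within `window` words of each other (assumed 10)."""
    words = tokens(text)
    pos_a = [i for i, w in enumerate(words) if term_matches(a, w)]
    pos_b = [i for i, w in enumerate(words) if term_matches(b, w)]
    return any(abs(i - j) <= window for i in pos_a for j in pos_b)

def phrase(terms, text):
    """Quoted search: the words appear next to each other, in order."""
    words = tokens(text)
    n = len(terms)
    return any(
        all(term_matches(t, w) for t, w in zip(terms, words[i:i + n]))
        for i in range(len(words) - n + 1)
    )

LOWER = ["personal", "privacy"]
UPPER = ["Personal", "Privacy"]

# One entry per search in the Data Collection list, in the same order.
QUERIES = [
    ("personal privacy", lambda d: any_terms(LOWER, full_text(d))),
    ("Personal Privacy", lambda d: any_terms(UPPER, full_text(d))),
    ("+personal +privacy", lambda d: all_terms(LOWER, full_text(d))),
    ("+Personal +Privacy", lambda d: all_terms(UPPER, full_text(d))),
    ("personal NEAR privacy", lambda d: near("personal", "privacy", full_text(d))),
    ("Personal NEAR Privacy", lambda d: near("Personal", "Privacy", full_text(d))),
    ('"personal privacy"', lambda d: phrase(LOWER, full_text(d))),
    ('"Personal Privacy"', lambda d: phrase(UPPER, full_text(d))),
    ('title:"personal privacy"', lambda d: phrase(LOWER, d["title"])),
]

def hit_counts():
    """Return {query label: number of matching documents}."""
    return {label: sum(1 for d in DOCS if matches(d)) for label, matches in QUERIES}

if __name__ == "__main__":
    for label, count in hit_counts().items():
        print(f"{count:2d}  {label}")
```

On this toy collection the counts shrink as the query gets stricter, which is the pattern to look for in your own data. Your real counts will differ, and any rule above that disagrees with what you observe is a clue for the analysis.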

Analysis of Data

  1. You now have a set of searches and the number of hits each search returned. Why do the counts differ? Follow the Help link, study the help page, and explain the number of hits returned by each search. Discuss which search or searches are most likely to return the most useful results.

  2. Using the results of the final search, record the URLs of the first five Web pages returned and study those pages. On your own, establish five criteria for evaluating the pages, then apply the criteria to rank the pages in terms of authority. Rank them again in terms of usefulness.
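
One possible way to organize the two rankings is to score each page on every criterion and weight the criteria differently for authority and for usefulness. The sketch below only illustrates that bookkeeping. The criteria names, weights, page labels, and scores are all hypothetical placeholders, and only three pages are shown to keep it short; substitute your own five criteria and the five pages you recorded.

```python
# Hypothetical criteria; replace with the five you establish yourself.
CRITERIA = [
    "author identified",
    "sources cited",
    "recently updated",
    "covers topic in depth",
    "easy to navigate",
]

# Hypothetical weights, one per criterion, in the same order as CRITERIA.
AUTHORITY_WEIGHTS = [3, 3, 1, 1, 0]
USEFULNESS_WEIGHTS = [0, 1, 2, 3, 3]

# Made-up scores from 1 (poor) to 5 (excellent), one per criterion.
PAGES = {
    "page-1": [5, 4, 2, 3, 2],
    "page-2": [2, 1, 5, 4, 5],
    "page-3": [3, 5, 3, 2, 3],
}

def weighted_score(scores, weights):
    """Sum of score * weight over all criteria."""
    return sum(s * w for s, w in zip(scores, weights))

def rank(pages, weights):
    """Return page labels ordered from highest to lowest weighted score."""
    return sorted(pages, key=lambda p: weighted_score(pages[p], weights), reverse=True)

if __name__ == "__main__":
    print("By authority: ", rank(PAGES, AUTHORITY_WEIGHTS))
    print("By usefulness:", rank(PAGES, USEFULNESS_WEIGHTS))
```

With these placeholder numbers the two rankings come out in different orders, which shows why the lab asks for them separately.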



Thomas P. Kelliher
Tue Sep 18 13:45:14 EDT 2001