CS250      Lab3 - Regular Expressions

Objectives:

  1. Regular expressions, as the name suggests, represent regular languages.  JFLAP allows us to enter a regular expression, build an equivalent NFA, and test the language represented by the expression.  Select the Regular Expression button on the initial menu, enter the regular expression, and then select Convert to NFA.  Select the Do All button and then the Export button.  You should now have an equivalent NFA which you can test.

     
    For each regular expression below, enter it in JFLAP (be sure not to include spaces in the regular expression), convert it to an NFA, and use the multiple run tester to list five input strings in the language (if at least five exist).  Finally, clearly explain the regular language that it represents:
    i) a*b* + c
    ii) (ab+λ)b  -- Hint:  λ is represented by ! in JFLAP
    iii) (cd+b)*

     


  2. Write a regular expression for each of the following languages and test it for correctness in JFLAP.
    i) The language with an even number of a's followed by an odd number of b's
    ii) The language with zero or more a's followed by three or fewer b's
    iii) The language in which every a must have a b adjacent to it on both sides.

     

  3. The name "egrep" is an acronym for "extended global regular expressions print". The program may be used to search a file for a string or a more complex pattern. Regular expressions provide a convenient, compact way of expressing patterns. The internal workings of egrep are based on finite automata.

    The command format is:
           
    egrep 'regexp' file
    where file is a file name and regexp is a regular expression whose format will be described shortly. Egrep returns all lines in the file which contain a match for the regular expression.

    The format of a regular expression in egrep is as follows:

    expression                  egrep notation
    r*                          r*
    r⁺                          r+
    r+λ                         r?
    r+s                         r|s
    rs                          rs
    (r)                         (r)
    char c                      c
    special char                \c
    any symbol                  .
    beginning-of-line           ^
    end-of-line                 $
    any character listed        […]
    any character not listed    [^…]
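
    As a quick illustration of the notation above, here is a minimal sketch that runs a few patterns against a made-up word list. The word list and the variable names are assumptions for illustration only and are not part of the lab files. The sketch uses grep -E, which accepts the same extended notation as egrep.

```shell
# Hypothetical sample input, one word per line (not one of the lab files).
words=$(printf '%s\n' color colour colr cats dog catalog a.b axb)

# r? : the u is optional
optional=$(printf '%s\n' "$words" | grep -E 'colou?r')

# r|s with ^ and $ : the whole line is cat or dog, with an optional s
anchored=$(printf '%s\n' "$words" | grep -E '^(cat|dog)s?$')

# \c : a literal dot rather than "any symbol"
literal=$(printf '%s\n' "$words" | grep -E 'a\.b')

# [^...] : lines whose first character is not c
negated=$(printf '%s\n' "$words" | grep -E '^[^c]')

printf '%s\n' "$optional" "$anchored" "$literal" "$negated"
```

    Note that catalog is rejected by the anchored pattern because ^ and $ force the match to cover the whole line.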

    Log in to phoenix and try the following:

    egrep 'depend' /usr/share/dict/words
    egrep '^y.*y$' /usr/share/dict/words
    egrep 'y.*y' /usr/share/dict/words
    egrep '^rec(ei|ie)ve$' /usr/share/dict/words
    egrep '^s..u.t..e$' /usr/share/dict/words
    egrep '(^| )the +the( |$)' ~jillz/cs250/lab3/testfile1
    egrep '[qQ][^u]' /usr/share/dict/words
     

    Clearly and succinctly describe the pattern that is being expressed by each of the regular expressions above.
     

     


  4. Write egrep commands for the following:
    • All lines that contain the letter a and the letter b (either lower or upper case).
    • All lines that contain the word "a" (either lower or upper case) followed by a word beginning with a vowel.

    Test your commands on ~jillz/cs250/lab3/testfile1 and any test file of your own.
    Use the script command to capture your tests to turn in to me.


     

  5. Try the regex [0-9]?[0-9]:[0-9][0-9](am|pm) on ~jillz/cs250/lab3/testfile2.

    Correct the regex above so that it does not match illegal times like 99:99pm.
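
    To see the problem before fixing it, the following sketch runs the uncorrected regex against a few made-up lines. The sample lines are an assumption for illustration; the contents of testfile2 may differ.

```shell
# Hypothetical sample lines; the real testfile2 may differ.
times=$(printf '%s\n' '9:30am' '12:05pm' '99:99pm' 'noon')

# The uncorrected regex from this exercise, run with grep -E (same notation as egrep).
matched=$(printf '%s\n' "$times" | grep -E '[0-9]?[0-9]:[0-9][0-9](am|pm)')

printf '%s\n' "$matched"
```

    The illegal time 99:99pm is printed along with the legal ones, because each [0-9] accepts any digit regardless of its position.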

     

     

  6. Regular expressions are part of the pattern matching facility of the language Perl.  Two Perl programs are available for you to try:
    ~jillz/cs250/lab3/convert  and ~jillz/cs250/lab3/mkreply.  Copy those files to your own directory.
    The first program takes user input which is a temperature in either Celsius or Fahrenheit and converts it to the other scale.
    The line containing $input =~ m/^([-+]?[0-9]+)([CF])$/  takes the line of input and matches it against the regular expression given.
    After the match, the special variables $1 and $2 will contain the parts of the input that matched the parenthesized groups in the expression.  So if the user entered -32C then $1 would contain -32 and $2 would contain C.
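
    The same grouping idea can be tried from the shell. This is a minimal sketch using sed -E, where \1 and \2 play the role that $1 and $2 play in Perl. It is an analogy for experimenting with the groups, not the code in convert, and the variable names are made up for illustration.

```shell
# Sample input from the text above.
input='-32C'

# Replace the whole line with the text captured by the first group (the number).
number=$(printf '%s\n' "$input" | sed -E 's/^([-+]?[0-9]+)([CF])$/\1/')

# Replace the whole line with the text captured by the second group (the scale).
scale=$(printf '%s\n' "$input" | sed -E 's/^([-+]?[0-9]+)([CF])$/\2/')

printf 'group 1: %s\ngroup 2: %s\n' "$number" "$scale"
```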

    Test out the program with the command:
    perl convert
     
    Modify the regular expression in convert so that the temperatures may be decimal values and the C and F may be either lower or upper case.  Also allow optional whitespace between the number and the C or F.


     

  7. The program mkreply takes an email as input and modifies the text to put it in the form of an email reply.  Copy the file mailTest from ~jillz/cs250/lab3/mailTest  and test this program with that input file using the command:

    perl mkreply < mailTest

    This program uses the match command =~ m which we have already seen, and also uses the match and substitute command
    =~ s/regex/subst/  which replaces the matched input with the subst text.

    Notice the line containing  =~ s/^/|> /
    This command matches ^ (which denotes the beginning of a line) and substitutes |> for it, which has the effect of inserting that text at the beginning of the line.
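
    The effect of substituting at ^ can be seen on its own with sed, which uses the same s/regex/subst/ form. This is a small sketch on made-up message text, not the code in mkreply.

```shell
# Hypothetical two-line message body for illustration.
body=$(printf '%s\n' 'Hello,' 'see you at noon.')

# ^ matches the empty position at the start of each line, so the
# substitution inserts the quote prefix without removing any text.
quoted=$(printf '%s\n' "$body" | sed 's/^/|> /')

printf '%s\n' "$quoted"
```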

     
    Modify mkreply so that $subject does not contain any leading Re:
    For example, "Re: Re: Re: This is the subject" would only store "This is the subject".