Monday, September 27, 2004

More practice writing shell scripts with a focus on problem solving.

Here are some additional hints for debugging shell scripts:

In previous exercises, I suggested that you add print statements of the
form 'echo $variable' to examine the value of key variables at strategic
points in your code. I also gave you tips about things to check when you
encounter errors, e.g., make sure that you DON'T use the dollar sign when
setting a variable but that you DO use it when you want to know the value
of a variable.

The 'csh' command takes a number of options but two of them are
particularly useful in debugging. The 'echo' option specified by '-x'
displays each line of the script after variable substitution but before
execution. The 'verbose' option specified by '-v' displays each line of
the script just as you typed it. You can use these options individually
or in combination.

You can apply these options in a number of ways. You can add them to the
directive that appears at the beginning of a shell script as in:

    #!/bin/csh -xv

You can also call 'csh' with these options and the name of the file
containing the script you're trying to debug:

    % csh -xv script

You can use 'set' and 'unset' to turn the corresponding shell variables
'echo' and 'verbose' on and off selectively, for example, in a particular
block of code where you think you have a problem:

    % cat script
    #!/bin/csh
    ...
    ...
    set verbose
    set echo
    ...
    ...    <<< suspected problem area
    ...
    unset verbose
    unset echo
    ...
    ...

Try these two options out so you'll be prepared when you run into a
really nasty bug.

Now let's take a quick review of the major flow-of-control constructs.

'foreach' - do something for each item in a list of items

    foreach variable ( list )
      something
      something
    end

Here we write a 'foreach' loop by interacting with the shell:

    % set r = 0
    % foreach n ( 1 2 3 4 5 6 7 8 9 )
    foreach? @ r += $n
    foreach? end
    % echo $r
    45

If this 'foreach' loop were in a script, it would look like:

    #!/bin/csh
    set r = 0
    foreach n ( 1 2 3 4 5 6 7 8 9 )
      @ r += $n
    end
    echo $r

If we wanted to get input from the user, it would look like:

    #!/bin/csh
    set r = 0
    foreach n ( $argv )
      @ r += $n
    end
    echo $r

Note carefully where variables are preceded by a dollar sign and where
they aren't. Is the usage in 'foreach' loops consistent with the usage
in 'set' and '@'?

Suppose you want to print out the 2nd, 3rd, ..., last argument divided by
the 1st argument. In particular, you don't want to print out the 1st
argument. Will a 'foreach' loop work in this case?

What does the following 'foreach' loop do?

    foreach file ( `ls` )
      echo $file
    end

'if' - do something on the condition that

    if ( condition ) something

or

    if ( condition ) then
      something
      something
    endif

or

    if ( condition ) then
      something
      something
    else if ( condition ) then
      something
      something
    endif

Note that the locations of the keywords 'if', 'then', 'else' and 'endif'
are critical. For example, every line but one in the following
conditional statement has a syntax error. Which line has the correct
syntax?

    if ( condition )                       <<< missing 'then' keyword
    then something something               <<< misplaced 'then' & two commands
    else if ( condition ) then something   <<< command following 'then' keyword
    if ( condition ) something             <<< this would be all right if alone
    endif

    % set x = 0
    % if ( $x == 0 ) echo "Big fat zero!"
    Big fat zero!
    % if ( ! ( $x > 0 ) && ! ( $x < 0 ) ) echo "Big fat zero!"
    Big fat zero!
    % set x = 1
    % if ( $x != 0 ) echo "Ain't zero!"
    Ain't zero!
    % if ( ! ( $x == 0 ) ) echo "Ain't zero!"
    Ain't zero!
    % if ( ! $x ) echo "Big fat zero!"
    % set x = 0
    % if ( ! $x ) echo "Big fat zero!"
    Big fat zero!

What's going on in the last two conditionals?

Almost invariably when you set out to write a script you'll discover
that you need a couple of new things.
The basic skeleton may be a simple 'foreach' loop, but there will be some
tricky little steps required to fetch the data, convert it into an
appropriate format, perform operations on the converted data, and then
spit out the results to various files or to the standard output.

Before you start writing any code think about how you would go about
solving the following problem. Just read the first paragraph and then
imagine that you are performing the various steps yourself. Make an
outline of your solution before you begin to write or even think about
writing any code.

1. Write a script 'sorthat' that sorts the words in a specified file
   into three new files: a file named 'nums' containing all the words
   corresponding to numbers or words beginning with numbers, a file
   named 'alphs' containing all the words that don't begin with a number
   and either begin with the letter 'L' (whether uppercase or lowercase)
   or some letter that appears earlier in the alphabet than 'L', and
   finally a file named 'bets' containing all the words that begin with
   the letter 'M' or some letter that appears later in the alphabet
   than 'M'.

   You can use the 'tr' (translate) command to convert the characters
   in the file to lowercase and eliminate any characters other than
   numbers or lowercase letters. You might want to go back and look at
   the ciphers and secrets exercise (09-20-04.txt) to recall how to use
   'tr' to preprocess the input file.

   Comparing alphanumeric strings turns out to be a little awkward (the
   pun is definitely intended) in the C-shell. The operators '<', '>',
   '>=' and '<=' only work on numbers, and, while '==' and '=~' work
   with strings, they don't allow us to establish an ordering among
   strings. The 'awk' utility comes to the rescue:

       % set u = aardvark
       % set v = albatross
       % echo $u $v | awk '{ print ( $1 < $2 ? $1 : $2 ) }'
       aardvark

   It looks like we're just using '<' which we had available in the
   C-shell, but the 'awk' '<' is more powerful than the 'csh' '<'.
   The above invocation is equivalent to the following which employs
   slightly more conventional syntax:

       % echo $u $v | awk '{ if ( $1 < $2 ) print $1 ; else print $2 }'
       aardvark

   We can actually simplify the 'awk' script since all we really care
   about is whether or not the first argument 'u' is less than the
   second argument 'v'. For this all we need is the following:

       % echo $u $v | awk '{ print ( $1 < $2 ) }'
       1
       % echo $v $u | awk '{ print ( $1 < $2 ) }'
       0

   The 'awk' command is pretty complicated (it has its own specialized
   scripting language) and I don't recommend that you spend a lot of
   time learning about its intricacies apart from picking up the few
   idioms that I've shown you in various scripts.

   You can save this one-line comparison as a shell script and give it
   a mnemonic name to make it a little more convenient to use. Note
   that strings (words) that begin with any digit are always
   lexicographically 'less' than strings that begin with any alphabetic
   character.

       % cat stringlessthan
       #!/bin/csh
       echo $1 $2 | awk '{ print ( $1 < $2 ) }'
       % stringlessthan aardvark albatross
       1
       % stringlessthan aardvark 17
       0

   Now armed with the 'stringlessthan' command this exercise ought to
   be relatively straightforward.

'while' - do something as long as some condition is met

    while ( condition )
      something
      something
    end

    % set n = 10
    % set i = 1
    % set r = 0
    % while ( $i < $n )
    while? @ r += $i
    while? @ i ++
    while? end
    % echo $r
    45

What roles do the variables 'n', 'i' and 'r' play in this script?

You can implement a 'foreach' loop using a 'while' loop. Rewrite the
following 'foreach' loop as a 'while' loop:

    #!/bin/csh
    set r = 0
    foreach n ( $argv )
      @ r += $n
    end
    echo $r

You can 'nest' loops; you can have 'foreach' loops inside of 'while'
loops or vice versa. The 'break' command allows you to terminate the
innermost loop in which the 'break' appears.
Here's a simple example illustrating the use of the 'break' statement in
checking a list of numbers for primes:

    % cat primes
    #!/bin/csh
    foreach n ( $argv )
      @ d = $n / 2
      while ( $d > 1 )
        @ r = $n % $d
        if ( $r == 0 ) break
        @ d --
      end
      if ( $d == 1 ) echo $n
    end

What if we didn't have the 'break' command? How else might we exit the
loop upon determining that the number 'n' is not a prime?

2. Write a command 'diagonal' that takes a single integer argument and
   outputs a diagonal matrix with ones along the diagonal and zeros off
   the diagonal.

   You'll need one new command to write this script. The command
   'printf' allows you to print formatted text. While 'echo 1' prints a
   '1' it also prints a carriage return (newline). The command
   'printf "1 "' prints a 1 followed by a space but no newline. If you
   want to print a new line you would include a newline character
   ('\n') in the string argument to 'printf' as in 'printf "0 \n"'. You
   can read more about 'printf' using 'info' but the above description
   will suffice for implementing the 'diagonal' script.

       % diagonal 3
       1 0 0
       0 1 0
       0 0 1
       % diagonal 5
       1 0 0 0 0
       0 1 0 0 0
       0 0 1 0 0
       0 0 0 1 0
       0 0 0 0 1

Command substitution

The backtick or backquote enables you to assign a string or variable the
output of a command.

    % echo "`date +%h` is the `date +%m`th month."
    Sep is the 09th month.

When assigned to a variable, the output of a command results in the
variable being assigned a list (or array) of words.

    % ls
    09-15-04.txt 09-20-04.txt 09-22-04.txt 09-24-04.txt 09-27-04.txt
    % set exercises = ( `ls` )
    % echo $exercises
    09-15-04.txt 09-20-04.txt 09-22-04.txt 09-24-04.txt 09-27-04.txt

As with any list you can determine how many items it contains or refer
to any particular item in the list by its index.

    % echo $#exercises
    5
    % echo $exercises[1]
    09-15-04.txt
    % echo $exercises[2]
    09-20-04.txt

You can also refer to a subsequence of the words in the list by
specifying a beginning and ending index.

    % echo $exercises[2-4]
    09-20-04.txt 09-22-04.txt 09-24-04.txt
    % echo $exercises[3-$#exercises]
    09-22-04.txt 09-24-04.txt 09-27-04.txt

The 'shift' command shifts all the items in a list to the left. The
first item is discarded and the list is shortened by one.

    % set lst = ( 1 2 3 4 )
    % echo $lst[1]
    1
    % echo $#lst
    4
    % shift lst
    % echo $lst[1]
    2
    % echo $#lst
    3

Note that I wrote 'shift lst' and not 'shift $lst'. Is this consistent
with how 'set', '@' and 'foreach' work?

Let's return to a problem that I posed earlier. Suppose you want to use
a 'foreach' loop to print out the 2nd, 3rd, ..., up to the last argument
on the command line divided by the 1st argument, and, in particular, you
don't want to print out the 1st argument. Do this now using the 'shift'
command.

Some advanced topics for the determined and intrepid student.

Using subroutines in shell scripts - advanced topic #1

You've already been using subroutines in shells; every time you invoke a
command inside of a script you're executing a subroutine. In some cases,
we execute a command for its 'side effects'. For example, you might
execute 'mkdir images' to create a directory. The creation of the new
directory is called a 'side effect'. In other cases, e.g., command
substitution (above), we execute a command for the output returned to
the shell.

Unless you take pains to avoid it, the output of commands is directed to
the standard output and you see the output scrolling across your screen.
It doesn't matter whether you execute a command directly in the shell or
indirectly by executing a script which then executes other commands:

    % cat testout
    #!/bin/csh
    echo "Starting 'testout'"
    subtest
    echo "Finishing 'testout'"
    % cat subtest
    #!/bin/csh
    echo "Executing 'subtest'"
    % testout
    Starting 'testout'
    Executing 'subtest'
    Finishing 'testout'

You can, of course, redirect output to files using '>' or '>>' and you
can pipe the output from one command into another command as in the
one-line script 'ls | wc -l' (which counts the number of files and
directories in the current working directory). You can also use the
backtick operator as just mentioned to set a variable to be a list as in
'set var = `ls`' or specify the list in a 'foreach' loop as in the
idiomatic usage 'foreach var ( `ls` ) ... end'.

In the following, I illustrate how the output generated by one command
can determine the flow of control of another command by using
conditional statements. First we define a command that checks to see if
its only argument is prime.

    % cat isprime
    #!/bin/csh
    set n = $argv
    set p = 1
    @ d = $n / 2
    while ( $d > 1 && $p )
      @ r = $n % $d
      if ( $r == 0 ) set p = 0
      @ d --
    end
    if ( $p ) then
      echo 1
    else
      echo 0
    endif

The echo commands are placed so that 'isprime' outputs a one (1) if its
sole argument is prime and a zero (0) otherwise. The 'p' variable is
used as a flag to exit the loop when a divisor is found (thus indicating
that the input is composite and therefore not a prime) and subsequently
to determine whether it should output a one or zero.

The 'isprime' command is now a general utility that we can use in any
number of contexts. We can use the 'isprime' command in a conditional
statement:

    % if ( `isprime 3` ) echo "Prime!"
    Prime!
    % if ( ! `isprime 4` ) echo "Composite!"
    Composite!
We can also use 'isprime' to write a more compact version of the
'primes' command:

    % cat primes.sub
    #!/bin/csh
    foreach n ( $argv )
      if ( `isprime $n` ) echo $n
    end

Exiting from a shell script using 'exit' - advanced topic #2

The 'exit' command allows you to exit from a script at any point.

3. Rewrite the 'isprime' command without the 'p' variable using 'exit'
   to terminate the loop when a divisor other than one is found.

Getting user input in a shell script - advanced topic #3

The pseudo variable '$<' substitutes a line read from the standard
input, with no further interpretation. It can be used to read from the
keyboard in a shell script. See if you can figure out what the following
script does:

    % cat guess
    #!/bin/csh
    set n = `date +"%M"`
    while ( 1 )
      set m = $<
      if ( $m > $n ) then
        echo "Too high!"
      else if ( $m < $n ) then
        echo "Too low!"
      else
        echo "You got it!"
        exit
      endif
    end

If you're ambitious you might write a shell script that plays a game of
tic-tac-toe with a user.

Using the exit status of a command - advanced topic #4

Every Unix command returns an exit status. By convention, if the command
is successful, it returns an exit status of zero. If the command fails,
then it returns an exit status greater than zero with the exact value
depending on how it failed. For example, the 'grep' command returns one
(1) if it can't find the pattern it's looking for and two (2) if it
can't find the file supplied as its second argument. In the C-shell the
'status' variable is set to the exit status of the last command
executed:

    % grep while isprime
    while ( $d > 1 )
    % echo $status
    0
    % grep foreach isprime
    % echo $status
    1

In writing a C-shell script you can use the keyword 'exit' with an
argument to make the script return a particular exit status based on
whatever you define as success or failure.

4. Rewrite the 'isprime' command yet again, this time writing it to have
   an exit status of zero (success) if its only argument is prime and
   one (failure) otherwise.
   Your solution should contain the invocations 'exit 0' and 'exit 1'
   and no 'echo' statements. Here is what we might expect of your new
   version of 'isprime':

       % isprime 3
       % echo $status
       0
       % isprime 4
       % echo $status
       1

There is a special syntax that allows you to use the exit status of a
command in an 'if' statement; you use curly braces instead of the usual
parentheses. The logic of using commands as conditionals is a little
twisted, however. Zero is false in a conditional while one is true.

    % if ( 0 ) echo "No!"
    % if ( ! 0 ) echo "No Way!"
    No Way!
    % if ( 1 ) echo "Way!"
    Way!

In evaluating a command, if the command is successful (that is to say it
returns an exit status of zero), then the curly braces tell the shell to
return a one. If the command fails (it returns an exit status greater
than zero), then the curly braces tell the shell to return a zero.

    % if { isprime 3 } echo "Prime!"
    Prime!

Now we can rewrite the 'primes' command as follows:

    % cat primes
    #!/bin/csh
    foreach n ( $argv )
      if { isprime $n } echo $n
    end
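
The zero-means-success convention described above can be seen without
writing any scripts. The following is only a minimal sketch and is not
part of the exercises. It assumes the standard 'true' and 'false'
utilities, which do nothing except exit with status zero and one
respectively. It also uses the '&&' and '||' operators, which these
notes have not covered: 'a && b' runs 'b' only if 'a' succeeds, and
'a || b' runs 'b' only if 'a' fails.

```shell
# true always exits with status 0 (success), so the echo after && runs.
true && echo "true succeeded"
# false always exits with status 1 (failure), so the echo after || runs.
false || echo "false failed"
# grep exits with status 1 when the pattern is missing from its input and
# prints nothing in that case, so only the message after || appears.
echo "while ( 1 )" | grep foreach || echo "no foreach here"
```

Saved in a file and run with 'csh', these lines should print the three
quoted messages in order. Note that the comment lines beginning with '#'
are only recognized inside a script file, not when typed at the
interactive 'csh' prompt.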