CS 141 Introduction to AI Greenwald
TBA
10:00 PM, Mar 5, 2009
|
Project 2: Word Due: |
2 What to Do
2.1 Part I: Word Search Puzzle Generation as a CSP
2.2 Part II: Word Search Puzzle Generation as Local Search
A word search is a common childhood puzzle in which a set of words is hidden within a group of letters placed on a grid. Each word appears in a straight line either horizontally, vertically, or diagonally, and either forwards or backwards. For example, TRUCK and SHIP are hidden within the back wheel of the train in the above figure.
Your goal in this project is not to write a word search solver, but rather a word search generator. You will be provided with a set of words and a m × n board, and you must create a word search containing the given words. First, you will approach this problem as one of constraint satisfaction, where you simply must find a feasible configuration of the words on the board. Second, you will approach this problem as one of optimization, where you will be asked to optimize some objective function (e.g., maximize the number of words or minimize the size of the board).
Formulate the word search puzzle generation problem as a CSP. As a first step, formalize the problem of generating a word search puzzle given W words and an m × n board as a CSP. What are the variables? What are their domains? What are the constraints? Consider at least two formulations of the problem and speculate on the relative strengths and weakness of each. Which do you expect to work better, and why?
Formulate word search puzzle generation as a local search problem. All CSPs can be formulated as optimization problems. Using the standard recipe, reformulate the word search puzzle generation problem as an optimization problem. To do so, you must describe the states in your search space and an objective function. The standard objective is to minimize the number of “conflicts.” What kind of conflicts arise in word search puzzle generation? Further, define a neighborhood relation so that word search puzzle generation can be viewed as a local search problem.
This project consists of two parts. The first is to solve word search puzzle generation formulated as a CSP using CSP heuristics. The second is to solve word search puzzle generation formulated as a local search problem using local search techniques.
Recall that any CSP can be posed as a search problem on a tree, where each state corresponds to a partial assignment, the initial state is the empty assignment, the goal states are the complete and consistent assignments, and the successor relation extends partial assignments in a consistent manner. With this in mind, you are ready to begin implementing a word search puzzle generator.
Warmup. This first step is a sanity check to make sure that you initialize each variable’s domain properly.
In one of the classes, create a collection called Collection<TYPE> domain; where TYPE is whatever type of values the variables in your CSP can be assigned. Where you place this collection depends on what you have chosen to be your variables in the CSP.
Initialize the domain for each variable. This is probably most easily done in the constructor for that variable.
As a test, within the solve() method in the WordSearchGenerator class, print out the domain for each variable. The solve() method can be accessed by executing the main() method from the WordSearchGenerator class. Verify that your domains are what you expect them to be.
Now that you have created a domain for each variable, you are ready to implement algorithms that assign values from these domains to the variables.
Implement backtracking. This algorithm is depth-first search on a tree, where at each node, one of the as yet unassigned variables is assigned a value. For more information about the backtracking algorithm, see the course notes or page 142 of Russell and Norvig.
Implement the backtracking algorithm. Note: In order to reduce the amount of memory required, each search node should not contain a copy of the entire board. Instead, you should make incremental changes to the board as you traverse the tree. When you backtrack, undo these board changes.
Create at least three test problems in the puzzles directory and note the performance of your algorithm on each test instance. Document both the time the algorithm required to terminate, as well as the number of nodes visited before a solution was found.
Implement the minimum-remaining values heuristic. The minimum-remaining-values (MRV) heuristic helps the backtracking algorithm determine which unassigned variable to assign a value to next. It chooses the variable with the smallest domain. For more information on the MRV heuristic, see the course notes or page 143 of Russell and Norvig.
Implement the MRV heuristic. Make the heuristic configurable, so that you have the option of reverting back to the original, more naive variable selection method as desired.
As above, run your program on the word search test instances you created in the puzzles directory and note the runtime of your algorithm and number of nodes visited for each instance. Do the problems that the heuristic performs particularly well or poorly on share any common characteristics?
Implement the least-constraining-value heuristic. This heuristic helps the backtracking algorithm determine which value to assign to a given variable. The algorithm chooses the value that eliminates the fewest number of possible values for the other unassigned variables. For more information on the least-constraining-value heuristic, see the course notes or page 144 of Russell and Norvig.
Implement the least-constraining-value heuristic. Make this heuristic configurable so that you can revert back to randomly selecting a value to assign to the variable.
As above, run your program on the word search test instances you created in the puzzles directory and note the runtime of your algorithm for each instance. Try using different combinations of MRV and LCV (both, neither, etc.). If the performance of LCV along some dimension (time or number of nodes visited) is worse with this heuristic, try to figure out why that might be? Do you have any ideas for a different heuristic that might perform better?
Implement forward checking. Modify your program further to implement forward checking. In this modification, whenever a variable is assigned a value, the domains of all other unassigned variables are checked for feasibility. Any value to an unassigned variable that is not feasible is removed from the domain. Forward checking should help reduce the number of nodes visited. For more information on forward checking, see the course notes or page 144 of Russell and Norvig. After implementing this modification, run your program on your test instances and document its performance. Provide any insights you might have about the performance of forward checking in the word search generation problem.
Extra Credit: Implement arc consistency. Modify your program so that it performs arc consistency checks. In this modification, each pair of unassigned variables is checked for consistency, as follows: consider assigning unassigned variable x a value v in its domain. If there is another unassigned variable whose domain becomes empty when x is assigned v, then v is not a feasible assignment for x, and should be removed from x’s domain. With arc consistency, even fewer nodes should be expanded. For more information on arc consistency, see the course notes or page 145 of Russell and Norvig. After implementing arc consistency, run your program on each of your test instances and record the results.
A local search method can be used to solve a CSP by moving from state to state with the goal of minimizing the number of conflicts. For example, to use simulated annealing to solve a CSP, you might allow many conflicts initially, but as your search proceeds, you should allow fewer and fewer conflicts until eventually there are none left at all (hopefully). To avoid a situation in which a solution with no conflicts is not found, it may be necessary to restart simulated annealing multiple times or tweak the scheduled rate of exploration.
Generate Word Search Puzzles using Simulated Annealing. Your next task is to develop a variant of simulated annealing and apply it to the word search generation CSP posed as an optimization problem. Roughly speaking, each iteration of your algorithm should choose a word (presumably one involved in many conflicts) and place it in another location with (hopefully) few conflicts. For example, your algorithm could choose and place words probabilistically, being more likely to choose words involved in many conflicts, and more likely to place words in locations that create few conflicts. The details of your algorithm (e.g., how it makes probabilistic choices) are up to you, but be sure to document your choices in your README file.
Optimize your Word Search Generator. At this point, you have implemented two programs that generates word search puzzles. You may notice, however, that your word search puzzles are not always very interesting. For example, if you are given 4 words of length less than or equal to 4 to insert in a 4x4 board, your program might place each one in its own row beginning in the leftmost column. One way you could imagine generating more interesting word search puzzles is by maximizing the number of intersections between words.
Modify your local search algorithm so that at each iteration it chooses a word (presumably one involved in many conflicts but few intersections) and place that word in another location with (hopefully) few conflicts and many intersections. As above, the details of your algorithm (e.g., how it makes probabilistic choices) are up to you, but be sure to document your choices in your README file.
Summarize your results. Specifically, provide examples of word search puzzles generated by your two local search algorithms. Is one set of examples more interesting than the other?
Please hand in the following:
All the code necessary to solve Parts I and II of the assignment, README files, and any extra credit.
A short report tabulating the results of running each successive modification of your CSP search algorithm on your test instances, and summarizing the output of your two local search algorithms. The file table.tex in the support code, part of which compiles as shown, is a convenient format for reporting statistics on your implementations of the CSP heuristics.
|
Note: All written work must be submitted electronically, as a pdf document.
For a quick refresher on how to hand in assignments, please refer back to the course missive.
To copy the support code into your /u/yourlogin/course/cs141/projects/wordsearch directory, run the script /course/cs141/bin/cs141install wordsearch. This will install all of the support code in your /course/cs141/projects/wordsearch directory, including some test puzzles. Each puzzle file contains a word bank, which is a collection of words that you can try to fit into some sized board. These files have the extension .wrd and contain a single word on each line. Feel free to manually create your own .wrd files.
To test your algorithm, one simple way is to start the GUI’s main method, located in the class WordGUI. From here you can change the board size and load in a word bank. Once you have implemented a solve() method in WordSearchGenerator, you can also use this GUI to display your solution.
We now present a list of all the primary classes and a brief description of their most important methods.
WordSearchGenerator: This class contains the solve() method. The solve method takes as input a WordSearchProblem, and outputs a WordSearchSolution. The problem is a given list of words that must be inserted into a m × n board, and the solution is a data structure that does precisely that. The solve() method also takes some boolean variables as input, which specify whether MRV, LCV, or forward checking will be used to solve the problem.
WordBank: This class contains the collection of words that can be added to the word search board. When performing MRV, it might be useful to add a method that determines which particular Word to place on the Board.
Word: This is a simple data structure that contains a single word of the word search problem. It has a member variable called location, which can be used to keep track of where in the board this Word appears. Depending on your implementation, you may have to add a method (or more) to this class to intelligently determine which Location to assign to the given Word.
Location: A location is used to determine where a word appears in the board. It consists of two primary member variables: a Point representing the starting position of the word, and a Direction stating whether the word is oriented north, east, south, west, northwest, etc.
Letter: This is a simple wrapper class for a character that goes on the word search Board. Because char values cannot be stored in java Collections, if needed, they can be put in here first.
Board: A word search board is an m × n grid that typically contains a letter at each square. Boards can either be rectangular or toroidal, but you only need to solve rectangular problems for this assignment, and only rectangular boards can be created with the GUI. This Board class contains a number of useful methods such as insertWord(), which can be used to insert a Word into the Board.
BoardChanges: When a word is inserted into or removed from the board, the board’s BoardChanges are returned. This represents the letters that were added or removed from the board upon this word insertion or deletion, as well as their respective board positions. You shouldn’t have to make changes to this class. One important means of using BoardChanges is through an undo() method in the Board class. This method takes in a set of just-made BoardChanges and reverts the board back to its state before the changes were made.
WordSearchProblem: This is a simple data structure that contains a Board and a WordBank. This is passed to the WordSearchGenerator.
WordSearchSolution: This is a simple data structure that is returned by the solve() method in WordSearchGenerator. It contains both a Board and a WordBank, either of which can be used to represent the assignment of words to the board. There is also an optional list of BoardChanges that can be held to show the incremental steps made by your solve() method. The GUI reads from this list of BoardChanges, so the GUI will not be functional if you don’t keep track of the changes. Note, however, that keeping track of individual board changes takes up memory, and may reduce the size of the problems you’re able to solve. If you’d like to use the GUI but don’t want to store everything, you can insert BoardChanges at certain intervals, or just insert the final solution into this list of BoardChanges.
Optimizer: This class provides structures which store the location of letters in a WordBank, and how many of each letter are at each position. Additionally, it provides methods which tell you the number of intersections and the number of conflicts for a word at a given location. You are free to use or modify this code, or start from scratch and write your own optimizer class.
For the optimization part of the assignment you will be continuously changing the Locations of Words in the WordBank. Because you need to temporarily allow multiple letters to reside at the same Location, the Board class won’t necessary be useful. You can use the HashMaps provided in the support code, or write your own data structure to keep track of the Locations of letters.