Lecture 09: Adversarial Search

TBA

Contents

1 Overview
2 Definition and Example
3 Minimax Search
  3.1 Recursive MiniMax
4 αβPruning
  4.1 Recursive αβPruning
  4.2 Complexity
5 Evaluation Functions
A Depth-First Search
  A.1 Minimax, Revisited
  A.2 αβPruning, Revisited

1 Overview

This lecture is concerned with search spaces that model parlor games such as go, chess, checkers, othello, and backgammon. At an abstract level, we are concerned with two-player, zero-sum games of perfect information in which the players' moves are sequential. Adversarial search algorithms are designed to return optimal paths, or winning strategies, through game trees, assuming the players are adversaries, rational and self-interested: i.e., they play to win.

2 Definition and Example

A game tree is a 7-tuple Γ = ⟨P, X, S, T, δ, l, v⟩, where

  • P is a set of n players
  • X is a finite set of states
  • S ⊆ X is a nonempty set of start states
  • T ⊆ X is a nonempty set of terminal states
  • δ : X → 2^X is a state transition function; δ(x) is the set of successor states of x
  • l : X → P labels state x with the player who moves at x
  • v : T → [−1,1]^n maps terminal states into real-valued vectors; v_i(x) ∈ [−1,1] is the payoff to player i at state x

Zero-sum games require that $\sum_{i=1}^{n} v_i(x) = 0$, for all x ∈ T. In two-player, zero-sum games the two players are viewed as adversaries; one player is the maximizer (Max); the other is the minimizer (Min). By definition, v_MAX(x) = −v_MIN(x). Thus, it suffices to use one value to represent the value of a state: if v(x) > 0, then Max is the winner; if v(x) < 0, then Min is the winner; if v(x) = 0, then the game is a draw.

Figure 1: Game tree for 2,2-Nim. Max nodes are squares; Min nodes are circles. The minimax value of this game is −1: there is a winning strategy for Min.

As an example, consider the zero-sum game of perfect information, m,p-NIM. Initially, there are p piles of m matches. To move, a player removes any number of matches from exactly one pile. The losing (or winning) player is s/he who removes the final match. The game tree representation of 2,2-NIM (for two players) is depicted in Figure 1, where it is modeled as follows:

  • P = {Max, Min}
  • X = Y × P: e.g., ((1,1),Max) ∈ X, where Y = {(2,2),(2,1),(2,0),(1,1),(1,0),(0,0)} is a canonical representation in which a ≥ b for all pairs (a,b)
  • S = {((2,2),Max), ((2,2),Min)}
  • T = {((0,0),Max), ((0,0),Min)}
  • δ((2,2),Max) = {((2,1),Min), ((2,0),Min)}, δ((2,1),Min) = {((1,1),Max), ((1,0),Max), ((2,0),Max)}, ...
  • l((2,2),Max) = Max, l((2,1),Min) = Min, l((2,0),Min) = Min, ...
  • v((0,0),Max) = (+1,−1), v((0,0),Min) = (−1,+1)
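The model above can be sketched in Python. This is a minimal illustration, not part of the original notes: the names `successors`, `is_terminal`, and `payoff` are hypothetical, states are encoded as `(piles, player)` pairs, and the payoff convention follows the values of v listed above (the player to move at (0,0) wins, because the opponent removed the final match).

```python
MAX, MIN = "Max", "Min"

def successors(state):
    """delta(x): states reached by removing >= 1 match from exactly one pile."""
    piles, player = state
    other = MIN if player == MAX else MAX
    result = set()
    for i, size in enumerate(piles):
        for remaining in range(size):  # leave 0 .. size-1 matches in pile i
            new_piles = piles[:i] + (remaining,) + piles[i + 1:]
            # canonical form: piles sorted so that a >= b
            result.add((tuple(sorted(new_piles, reverse=True)), other))
    return result

def is_terminal(state):
    """x is in T exactly when every pile is empty."""
    piles, _ = state
    return all(p == 0 for p in piles)

def payoff(state):
    """v(x) as the pair (v_MAX, v_MIN) at a terminal state."""
    _, player = state
    return (+1, -1) if player == MAX else (-1, +1)
```

For instance, `successors(((2, 2), MAX))` yields the two Min states listed for δ((2,2),Max) above.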

3 Minimax Search

The minimax value of a game tree is the value of the root node x, whenever Min moves first, computed as the minimum value of x's successors, which are in turn computed as the maximum value of x's successors' successors, and so on. If the minimax value of a game tree is +1, then there exists a winning strategy for Max; if the minimax value is −1, then there exists a winning strategy for Min; if the value of the root node is 0, then neither player has a winning strategy. The definition of the minimax value of a game tree suggests a breadth-first-search style computation. In practice, the minimax algorithm traverses nodes in depth-first-search (DFS) order to ensure that space is managed efficiently. We present two implementations of minimax search: the first is recursive (with backtracking) and the second is an explicit DFS implementation (see Appendix A.1).

3.1 Recursive MiniMax

This recursive minimax algorithm traverses nodes in depth-first-search order, with backtracking (see Table 1). As the algorithm is recursive, it searches smaller and smaller game trees. Hence, we introduce the following notation: given (game) tree Γ, we let Γ_z denote the sub(game) tree of Γ rooted at node z.

The initial call initializes α = −1 (lower bound) and β = +1 (upper bound). The algorithm updates α values at Max nodes and β values at Min nodes. In the base case, the terminal nodes' values are returned. In the inductive step, two cases arise. If the root node x is a Min (Max) node, then as its successors are evaluated, their values are "backed up", that is, "minned" ("maxed") with the value of x, until x's value is computed, at which point this updated value, β (α), is returned.

MiniMax updates α- and β-values as shown in Table 2 during execution on the 2,2-NIM game tree in Figure 1.

1 The definition of the maximin value of a game is analogous to that of minimax, whenever the root node is labeled Max.

MiniMax(Γ_x, α, β)

Inputs  game tree Γ_x rooted at x
        lower bound α
        upper bound β
Output  minimax value

1. if x ∈ T
   (a) if l(x) = Min, return v_MIN(x)
   (b) if l(x) = Max, return v_MAX(x)
2. if l(x) = Min
   (a) for all y ∈ δ(x)
       i. β = min{β, MiniMax(Γ_y, α, β)}
   (b) return β
3. if l(x) = Max
   (a) for all y ∈ δ(x)
       i. α = max{α, MiniMax(Γ_y, α, β)}
   (b) return α

Table 1: Recursive MiniMax with Backtracking.
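As an illustration, the recursion in Table 1 might be written in Python as follows and run on 2,2-NIM. This is a sketch under stated assumptions: the helper names and the `(piles, player)` state encoding are hypothetical, and terminal states return the single value v(x), read as Max's payoff (per Section 2), instead of the per-player lookup in step 1 of the table.

```python
MAX, MIN = "Max", "Min"

def successors(state):
    """delta(x) for Nim: remove at least one match from exactly one pile."""
    piles, player = state
    other = MIN if player == MAX else MAX
    result = set()
    for i, size in enumerate(piles):
        for remaining in range(size):
            new_piles = piles[:i] + (remaining,) + piles[i + 1:]
            result.add((tuple(sorted(new_piles, reverse=True)), other))
    return sorted(result)  # sorted only to make the visiting order deterministic

def is_terminal(state):
    return sum(state[0]) == 0

def value(state):
    """Max's payoff at a terminal state: the player to move at (0,0) wins."""
    return +1 if state[1] == MAX else -1

def minimax(state, alpha=-1, beta=+1):
    """Recursive minimax with alpha/beta carried along but no pruning."""
    if is_terminal(state):
        return value(state)
    if state[1] == MIN:
        for child in successors(state):
            beta = min(beta, minimax(child, alpha, beta))
        return beta
    for child in successors(state):
        alpha = max(alpha, minimax(child, alpha, beta))
    return alpha
```

Calling `minimax(((2, 2), MAX))` evaluates the root of the tree in Figure 1 with the initial bounds α = −1 and β = +1.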

4 αβPruning

Although the minimax solution always exists (Zermelo, 1912), it is intractable to compute the minimax solution in most interesting game trees. (Conversely, game trees like tic-tac-toe, for which computing the minimax solution is tractable, could be said to be uninteresting for precisely that reason!)

Sophisticated game-playing programs prune the search space. The αβPruning algorithm improves upon MiniMax: it is a technique that provably computes the MiniMax value of the game, but tends to visit only a fraction of the nodes of the MiniMax procedure because it prunes redundant subtrees.

A variation is a complete path, from root node to leaf node, through the game tree. The principal variation is the path through the game tree along which the minimax value is discovered. In particular, the value of the leaf node on the principal variation is the minimax value of the game.

At most, adversarial search algorithms need only consider those nodes that are candidates for the principal variation. The key idea underlying αβPruning is the following: any subgame tree whose root node is provably not on the principal variation can be pruned.

Example 1 Consider the chess game tree depicted in Figure 2. Once node B is determined to lead to checkmate for Max, there is no need to evaluate any of node A's other children, since node B (a win for Max) is at least as preferable to Max as node C. Therefore, node C can be pruned.

1. A : α = max{−1, MiniMax(2,2-Nim_B, −1, +1)}
   (a) B : β = min{+1, MiniMax(2,2-Nim_D, −1, +1)}
       i. D : α = max{−1, MiniMax(2,2-Nim_I, −1, +1)}
          A. I : β = min{+1, MiniMax(2,2-Nim_N, −1, +1)}
             I. N : return v = +1
          B. I : β = min{+1, +1} = +1
          C. I : return β = +1
       ii. D : α = max{−1, +1} = +1
       iii. D : α = max{+1, MiniMax(2,2-Nim_J, +1, +1)}
          A. J : return v = −1
       iv. D : α = max{+1, −1} = +1
       v. D : return α = +1
   (b) B : β = min{+1, +1} = +1
   (c) B : β = min{+1, MiniMax(2,2-Nim_E, −1, +1)}
       i. E : α = max{−1, MiniMax(2,2-Nim_K, −1, +1)}
          A. K : return v = −1
       ii. E : α = max{−1, −1} = −1
       iii. E : return α = −1
   (d) B : β = min{+1, −1} = −1
   (e) B : β = min{−1, MiniMax(2,2-Nim_F, −1, −1)}
       i. F : α = max{−1, MiniMax(2,2-Nim_L, −1, −1)}
          A. L : β = min{+1, MiniMax(2,2-Nim_O, −1, −1)}
             I. O : return v = +1
          B. L : β = min{+1, +1} = +1
          C. L : return β = +1
       ii. F : α = max{−1, +1} = +1
       iii. F : return α = +1
   (f) B : β = min{−1, +1} = −1
   (g) B : return β = −1
2. A : α = max{−1, −1} = −1
3. A : α = max{−1, MiniMax(2,2-Nim_C, −1, +1)}
   (a) C : β = min{+1, MiniMax(2,2-Nim_G, −1, +1)}
       i. G : α = max{−1, MiniMax(2,2-Nim_M, −1, +1)}
          A. M : return v = −1
       ii. G : α = max{−1, −1} = −1
       iii. G : return α = −1
   (b) C : β = min{+1, −1} = −1
   (c) C : β = min{−1, MiniMax(2,2-Nim_H, −1, −1)}
       i. H : return v = +1
   (d) C : β = min{−1, +1} = −1
   (e) C : return β = −1
4. A : α = max{−1, −1} = −1
5. A : return α = −1

Table 2: MiniMax on 2,2-NIM.

[Figure 2 shows root node A; its child B is labeled "Checkmate: +1".]

Figure 2: An example of αβPruning. Node C is not on the principal variation, since B is an optimal alternative for Max, so it can be pruned.

Technically, v(A) = max{v(B), v(C)} ≥ v(B) = +1 and v(A) ≤ +1; it follows that v(A) = +1, regardless of the value of C.

This example captures the following intuition: a node's value (e.g., C) cannot impact the minimax value of its parent (e.g., A) if its parent's minimax value is known. The subtrees of a node whose value is already known can be pruned, since these subtrees are not on the principal variation.

Figure 3: Another example of αβPruning. Node C is not on the principal variation, since B is a preferable alternative for Max, so node E can be pruned.

Example 2 Consider the game tree depicted in Figure 3. Once node D is determined to yield value 1/4, there is no further need to evaluate any of node C’s children, since node B, with value 1/2, is at least as preferable to Max as node C. Therefore, node E can be pruned.

Technically, since v(C) = min{v(D), v(E)} ≤ v(D) and v(D) ≤ v(B), it follows that v(C) ≤ v(B). Therefore, v(A) = max{v(B), v(C)} = v(B), regardless of the value of E.

This example captures the following intuition: a node (e.g., C) is guaranteed not to be on the principal variation if any of the node's ancestors (e.g., A) presents a better alternative (e.g., B) for either player. The subtrees of such nodes can be pruned, since these subtrees are not on the principal variation.

4.1 Recursive αβPruning

αβPruning associates an α-value and a β-value with every game:

  • The α-value at Max node x is the maximum of two sets of values: (i) the α-values of all (complete) variations rooted at any of x’s ancestors, and(ii) the α-values of all(complete) variations rooted at node x. Hence, an α-value is a lower bound on the minimax value of the game: Max can guarantee that he wins at least this value.
  • The β-value at Min node x is the minimum of two sets of values: (i) the β-values of all variations rooted at any of x's ancestors, and (ii) the β-values of all variations rooted at node x. Hence, a β-value is an upper bound on the minimax value of the game: Min can guarantee that he loses no more than this value.

For example, an α-value of 0.4 can be interpreted to mean that Max will get paid at least $0.40; while a β-value of 0.75 can be interpreted to mean that Max will get paid at most $0.75 (i.e., Min will pay at most $0.75).

It is an invariant of αβPruning that α<β; otherwise, if α = β, the game is solved. The α and β values are computed by updating α-values at Max nodes and β-values at Min nodes, as in MiniMax. Pruning happens as follows:

  • If after updating at a Max node x, α is greater than or equal to β, then all remaining subtrees rooted at x can be pruned, since there exists an alternative for Min elsewhere in the game that is at least as good for Min. In this case, β (the value Min can achieve elsewhere) is returned.
  • If after updating at a Min node y, β is less than or equal to α, then all remaining subtrees rooted at y can be pruned, since there exists an alternative for Max elsewhere in the game that is at least as good for Max. In this case, α (the value Max can achieve elsewhere) is returned.

The recursive version of αβPruning appears in Table 3. The initial call to this algorithm initializes α = −1 (lower bound) and β = +1 (upper bound). Let x denote the root node of the game tree. If x is a terminal node, the value of x is returned. Otherwise, if x is nonterminal, then αβPruning proceeds to recursively call itself on each of x's children in turn, updating the value of α or β depending on whether x is labeled Max or Min. With each recursive call, α is initialized as the maximum value that can be achieved at any of Max's choice points along the path to x, and β is initialized as the minimum value that can be achieved at any of Min's choice points along the path to x. These values are updated as variations rooted at x are completed. If after updating at a Max node, the value of α meets or exceeds the value of β, then Min can achieve β rather than α along an alternative path. No further variations in this subtree need be completed, so the algorithm breaks out of the loop and returns β. (Ordinarily, an updated value of α is returned at Max nodes.) The algorithm operates symmetrically on Min nodes.

αβPruning updates α- and β-values as follows during its execution on the 2,2-NIM game tree in Figure 1. This trace differs from the MiniMax trace in Table 2 in that five nodes are pruned. Pruning is evidenced by a Max (Min) node returning a β (α) value rather than an α (β) value.

αβPruning(Γ_x, α, β)

Inputs  game tree Γ_x rooted at x
        lower bound α
        upper bound β
Output  value in [α, β]

1. if x ∈ T
   (a) if l(x) = Min, return v_MIN(x)
   (b) if l(x) = Max, return v_MAX(x)
2. else if l(x) = Min
   (a) for all y ∈ δ(x)
       i. let β = min{β, αβPruning(Γ_y, α, β)}
       ii. if β ≤ α, return α
   (b) return β
3. else if l(x) = Max
   (a) for all y ∈ δ(x)
       i. let α = max{α, αβPruning(Γ_y, α, β)}
       ii. if α ≥ β, return β
   (b) return α

Table 3: Recursive αβPruning.
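To make the effect of the pruning test concrete, the following Python sketch implements the recursion of Table 3 on 2,2-NIM and counts the nodes it visits. Several things here are assumptions made for illustration: the function and helper names, the `(piles, player)` state encoding, the `stats` counter, the `prune` switch (which turns the cutoffs off so the same code behaves like Table 1), and the use of Max's payoff as the single terminal value. Children are visited in sorted order, which need not match the order used in the traces, so the set of pruned nodes may differ.

```python
MAX, MIN = "Max", "Min"

def successors(state):
    """delta(x) for Nim: remove at least one match from exactly one pile."""
    piles, player = state
    other = MIN if player == MAX else MAX
    result = set()
    for i, size in enumerate(piles):
        for remaining in range(size):
            new_piles = piles[:i] + (remaining,) + piles[i + 1:]
            result.add((tuple(sorted(new_piles, reverse=True)), other))
    return sorted(result)  # deterministic visiting order (an arbitrary choice)

def alphabeta(state, alpha, beta, stats, prune=True):
    """Recursive alpha-beta; stats["visited"] counts every node touched."""
    stats["visited"] += 1
    piles, player = state
    if sum(piles) == 0:                      # terminal: player to move wins
        return +1 if player == MAX else -1
    if player == MIN:
        for child in successors(state):
            beta = min(beta, alphabeta(child, alpha, beta, stats, prune))
            if prune and beta <= alpha:      # Max has an alternative elsewhere
                return alpha
        return beta
    for child in successors(state):
        alpha = max(alpha, alphabeta(child, alpha, beta, stats, prune))
        if prune and alpha >= beta:          # Min has an alternative elsewhere
            return beta
    return alpha

full, pruned = {"visited": 0}, {"visited": 0}
root = ((2, 2), MAX)
v_full = alphabeta(root, -1, +1, full, prune=False)
v_pruned = alphabeta(root, -1, +1, pruned, prune=True)
print(v_full, v_pruned, full["visited"], pruned["visited"])
```

Both calls return the same root value; only the number of visited nodes changes.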

4.2 Complexity

In the best case, αβPruning discovers the principal variation on its first traversal of the tree, and its effective branching factor (see footnote 2) is √b. In this case, its complexity is only O(b^(d/2)), a substantial savings over the minimax algorithm, which visits O(b^d) nodes. In the worst case, however, αβPruning prunes no nodes at all.
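A quick calculation shows how large the gap between the two bounds is. The numbers below are illustrative assumptions only (a branching factor of 35 and an even depth of 8 do not come from these notes), and the function returns just the dominant growth terms b^d and b^(d/2), not exact node counts.

```python
import math

def growth_terms(b, d):
    """Return (b**d, b**(d/2)) for an even depth d."""
    assert d % 2 == 0, "this sketch assumes an even depth"
    return b ** d, b ** (d // 2)

minimax_term, best_case_term = growth_terms(35, 8)
print(minimax_term)       # order of growth for plain minimax
print(best_case_term)     # order of growth for best-case alpha-beta
print(math.sqrt(35))      # best-case effective branching factor, sqrt(b)
```

Put differently, with perfect move ordering the same budget of node visits reaches roughly twice the depth.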

The order in which nodes are explored greatly impacts the savings that can be achieved via αβpruning. One reasonable ordering heuristic is to first perform iterative deepening search, and then to order successor nodes according to the backed-up values returned by the search at the previous depth.

5 Evaluation Functions

In spite of its effectiveness, αβPruning does not provide sufficient pruning to compute the minimax solution in chess, for example. In general, in large game trees, search proceeds to some limited depth or for some limited time, at which point expansion of the search tree is truncated, and heuristic estimates of the minimax value are computed via an evaluation function and backed up the tree.

2 An effective branching factor b* is that of a uniform tree of depth d if it were to contain N nodes. In other words, given that an algorithm expands N nodes, assuming depth d, b* is the solution to the following equation: N = 1 + b* + ... + (b*)^d.

1. A : α = max{−1, αβPruning(2,2-Nim_B, −1, +1)}
   (a) B : β = min{+1, αβPruning(2,2-Nim_D, −1, +1)}
       i. D : α = max{−1, αβPruning(2,2-Nim_I, −1, +1)}
          A. I : β = min{+1, αβPruning(2,2-Nim_N, −1, +1)}
             I. N : return v = +1
          B. I : β = min{+1, +1} = +1
          C. I : return β = +1
       ii. D : α = max{−1, +1} = +1
       iii. D : return β = +1
   (b) B : β = min{+1, +1} = +1
   (c) B : β = min{+1, αβPruning(2,2-Nim_E, −1, +1)}
       i. E : α = max{−1, αβPruning(2,2-Nim_K, −1, +1)}
          A. K : return v = −1
       ii. E : α = max{−1, −1} = −1
       iii. E : return α = −1
   (d) B : β = min{+1, −1} = −1
   (e) B : return α = −1
2. A : α = max{−1, −1} = −1
3. A : α = max{−1, αβPruning(2,2-Nim_C, −1, +1)}
   (a) C : β = min{+1, αβPruning(2,2-Nim_G, −1, +1)}
       i. G : α = max{−1, αβPruning(2,2-Nim_M, −1, +1)}
          A. M : return v = −1
       ii. G : α = max{−1, −1} = −1
       iii. G : return α = −1
   (b) C : β = min{+1, −1} = −1
   (c) C : return α = −1
4. A : α = max{−1, −1} = −1
5. A : return α = −1

Table 4: αβPruning on 2,2-NIM. In step 1(a)iii, node J is pruned; in step 1(e), nodes F, L, and O are pruned; and in step 3(c), node H is pruned.

Most evaluation functions for the game of chess take into account material values: e(pawn) = 1, e(bishop) = e(knight) = 3, e(rook) = 5, and e(queen) = 9. Let W be the set of white's pieces; let B be the set of black's pieces. Now $w(n) = \sum_{p \in W} e(p)$ and $b(n) = \sum_{p \in B} e(p)$. One simple heuristic is to evaluate nodes as follows, assuming Max is white:

$$e(n) = \frac{w(n) - b(n)}{w(n) + b(n)}$$

The value e(n) = 1 predicts a sure win for white; the value e(n) = −1 predicts a sure win for black; the value e(n) = 0 predicts a draw.
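The material heuristic can be sketched in a few lines of Python. The piece values are those given in the text; the list-of-piece-names board representation and the treatment of an empty board (kings are not counted, so w(n) + b(n) can be zero) are assumptions made for this example.

```python
PIECE_VALUE = {"pawn": 1, "knight": 3, "bishop": 3, "rook": 5, "queen": 9}

def material(pieces):
    """Sum of e(p) over one side's pieces (kings are not counted)."""
    return sum(PIECE_VALUE[p] for p in pieces)

def evaluate(white, black):
    """e(n) = (w(n) - b(n)) / (w(n) + b(n)), from white's (Max's) point of view."""
    w, b = material(white), material(black)
    if w + b == 0:
        return 0.0  # assumption: no material on either side is scored as a draw
    return (w - b) / (w + b)

print(evaluate(["queen", "pawn"], ["rook", "pawn"]))
```

In the printed example white has 10 points of material against black's 6, so the estimate is positive but well short of a predicted sure win.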

But evaluation functions return heuristic estimates, which are not perfect. Thus, determining the depth or time at which to truncate search in game trees is a delicate matter. If e(n) is changing rapidly, then single moves dramatically affect the (apparent) value of n. One popular heuristic is to allow search to proceed until quiescence. Another difficulty is that search algorithms cannot recognize drastic changes in the values of nodes if delay tactics push drastic moves beyond the horizon; this is called the horizon effect. One proposed solution to this problem is to engage in a secondary search beyond the seemingly best node (this technique is called singular extension). If it is determined that this path degrades, then secondary search is performed on the second-best node; but it is impractical to conduct secondary search on all nodes.
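The truncated search described in this section can be sketched as a depth-limited minimax in which an evaluation function stands in for the true value at the cutoff. Everything in the example is hypothetical: the tiny tree, its terminal utilities, and the heuristic estimates were made up to show how an estimate at a shallow cutoff can disagree with the value found by searching one level deeper.

```python
# Hypothetical game tree: A is a Max node, B and C are Min nodes, D..G are leaves.
TREE = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F", "G"]}
UTILITY = {"D": +1, "E": -1, "F": -1, "G": -1}   # true terminal values
HEURISTIC = {"A": 0.0, "B": 0.5, "C": -0.2}      # made-up estimates e(n)

def depth_limited_minimax(node, depth, is_max):
    """Minimax that stops expanding at depth 0 and backs up e(n) instead."""
    children = TREE.get(node, [])
    if not children:
        return UTILITY[node]          # terminal: exact value
    if depth == 0:
        return HEURISTIC[node]        # cutoff: heuristic estimate
    values = [depth_limited_minimax(c, depth - 1, not is_max) for c in children]
    return max(values) if is_max else min(values)

shallow = depth_limited_minimax("A", 1, True)   # relies on e(B) and e(C)
deep = depth_limited_minimax("A", 2, True)      # reaches the terminal nodes
print(shallow, deep)
```

At depth 1 the search trusts the optimistic estimate for B; at depth 2 it sees that Min can reply with E, which is the kind of change in value that a cutoff just short of a decisive move hides.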

A Depth-First Search

In this appendix, we present explicit (i.e., non-recursive) depth-first-search versions of the MiniMax and αβPruning adversarial search algorithms.

A.1 Minimax, Revisited

MiniMax-DFS (see Table 6) traverses the game tree in depth-first fashion, visiting all nonterminal nodes twice. The first time it visits a nonterminal node, its value is unknown; to evaluate the node, its children are pushed onto the stack. The second time it visits a nonterminal node, its value is known; thus, it is popped off the stack, and its value is backed up to its parent (see Table 7), unless it is the root node. When MiniMax-DFS visits the root node for the second time, it terminates. At this point, the value of the root node is the minimax value of the game, since all of its children's values have been backed up, all of its children's children's values have also been backed up, and so on. Otherwise, all of its children would not have been popped off the stack, and it could not have been visited for the second time; similarly, all of its children's children would not have been popped off the stack, and their children could not have been visited for the second time, and so on.

MiniMax maintains the stack depicted in Table 5 during its execution on the 2,2-NIM game tree in Figure 1. Nodes are subscripted with their values.

A.2 αβPruning, Revisited

Pseudocode that implements αβPruning via explicit depth-first search is shown in Table 8. The depth-first search algorithm maintains both α- and β-values at all nodes. When a node is popped off the stack, a pruning test is immediately performed. (This test is described in the following paragraph.) If the node cannot be pruned, then the algorithm proceeds as in MiniMax-DFS. If it is a terminal node, it is evaluated, and its value is stored at the corresponding α- and β-values.

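Stack contents at each step (top of stack at left):

A_{−∞}
B_{+∞} A_{−∞}
D_{−∞} B_{+∞} A_{−∞}
I_{+∞} D_{−∞} B_{+∞} A_{−∞}
N_{+1} I_{+∞} D_{−∞} B_{+∞} A_{−∞}
I_{+1} D_{−∞} B_{+∞} A_{−∞}
D_{+1} B_{+∞} A_{−∞}
J_{−1} D_{+1} B_{+∞} A_{−∞}
D_{+1} B_{+∞} A_{−∞}
B_{+1} A_{−∞}
E_{−∞} B_{+1} A_{−∞}
K_{−1} E_{−∞} B_{+1} A_{−∞}
E_{−1} B_{+1} A_{−∞}
B_{−1} A_{−∞}
F_{−∞} B_{−1} A_{−∞}
L_{+∞} F_{−∞} B_{−1} A_{−∞}
O_{+1} L_{+∞} F_{−∞} B_{−1} A_{−∞}
L_{+1} F_{−∞} B_{−1} A_{−∞}
F_{+1} B_{−1} A_{−∞}
B_{−1} A_{−∞}
A_{−1}
C_{+∞} A_{−1}
G_{−∞} C_{+∞} A_{−1}
M_{−1} G_{−∞} C_{+∞} A_{−1}
G_{−1} C_{+∞} A_{−1}
C_{−1} A_{−1}
H_{+1} C_{−1} A_{−1}
C_{−1} A_{−1}
A_{−1}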

Table 5: MiniMax-DFS on 2,2-NIM.

Otherwise, a test is performed to determine whether the node's α- and β-values are initialized. If so, this visit is the second; all of its children's values have been backed up, so its value is now backed up, and it is deleted from the stack. If the node is not yet initialized, its values are initialized to those of its parent, and its children are prepended to the stack.

The pruning test is performed at all nodes other than the root node. The test simply asks whether a node's α-value is greater than or equal to its β-value, in which case the node can be pruned. As in the pseudocode, let n denote the node in question and m its parent node. Before any of m's children are evaluated, α(m) < β(m); otherwise the subtree rooted at m would have been pruned already. If m is labeled Min, then as m's children are evaluated, the value of β potentially decreases. If ever β(m) ≤ α(m), then there exists an alternative for Max at least as profitable; or, if β(m) = α(m), then the minimax value of m is known. On the other hand, if m is labeled Max, then as m's children are evaluated, the value of α potentially increases. If ever α(m) ≥ β(m), then there exists an alternative for Min at least as desirable; or, if α(m) = β(m), then the minimax value of m is known. In all of these cases, the subtree rooted at m can be pruned.

MiniMax-DFS(Γ, x)

Inputs      game tree Γ, root node x
Output      minimax value of game tree
Initialize  O = {x}, v(n) = ⊥ for n ∉ T

while (O is not empty) do

  1. choose first node n ∈ O
  2. if n ∈ T
     (a) if l(n) = Min, let v(n) = v_MIN(n)
     (b) if l(n) = Max, let v(n) = v_MAX(n)
  3. if v(n) ≠ ⊥
     (a) if n ≠ x
         i. let m = δ^{−1}(n)
         ii. BackUp1(Γ, n, m)
     (b) delete n from O
  4. else if v(n) = ⊥
     (a) if l(n) = Min, v(n) = +∞
         if l(n) = Max, v(n) = −∞
     (b) prepend δ(n) to front of O

return v(x)

Table 6: MiniMax Depth-First Search.

BackUp1(Γ, n, m)

Inputs  game tree Γ, node n, parent m
Output  updated game tree values

1. if l(m) = Min, v(m) = min{v(m), v(n)}
   if l(m) = Max, v(m) = max{v(m), v(n)}

Table 7: BackUp1 Subroutine: "backs up" values 1-level, from child to parent.
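Tables 6 and 7 might translate into Python roughly as follows, run here on 2,2-NIM. This is a sketch under assumptions, not the notes' own code: the `Node` class, the `(piles, player)` state encoding, and the helper names are hypothetical; `None` plays the role of ⊥; the parent pointer stands in for δ^{−1}; the end of a Python list is used as the top of the stack; and terminal nodes take the single value v(x), read as Max's payoff.

```python
MAX, MIN = "Max", "Min"

def successors(state):
    """delta(x) for Nim: remove at least one match from exactly one pile."""
    piles, player = state
    other = MIN if player == MAX else MAX
    result = set()
    for i, size in enumerate(piles):
        for remaining in range(size):
            new_piles = piles[:i] + (remaining,) + piles[i + 1:]
            result.add((tuple(sorted(new_piles, reverse=True)), other))
    return sorted(result)

class Node:
    """Tree node: the same state reached along two paths gets two nodes."""
    def __init__(self, state, parent=None):
        self.state, self.parent, self.value = state, parent, None

def minimax_dfs(root_state):
    root = Node(root_state)
    stack = [root]                       # top of stack is the end of the list
    while stack:
        n = stack[-1]
        piles, player = n.state
        if sum(piles) == 0:              # terminal: player to move wins
            n.value = +1 if player == MAX else -1
        if n.value is not None:          # terminal or second visit: back up, pop
            m = n.parent
            if m is not None:            # BackUp1: min at Min parents, max at Max
                pick = min if m.state[1] == MIN else max
                m.value = pick(m.value, n.value)
            stack.pop()
        else:                            # first visit: initialize, push children
            n.value = float("inf") if player == MIN else float("-inf")
            stack.extend(Node(s, n) for s in successors(n.state))
    return root.value

print(minimax_dfs(((2, 2), MAX)))
```

A nonterminal node stays on the stack beneath its children, so by the time it is on top again every child has already been backed up into it.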

αβPruning-DFS(Γ, x)

Inputs      game tree Γ, root node x
Output      minimax value of game tree
Initialize  O = {x}, α(n) = β(n) = ⊥ for n ∉ T

while (O is not empty) do

  1. choose first node n ∈ O
  2. if n ≠ x
     (a) let m = δ^{−1}(n)
     (b) if α(m) ≥ β(m), delete n from O, continue
  3. if n ∈ T
     (a) if l(n) = Min, let β(n) = α(n) = v_MIN(n)
     (b) if l(n) = Max, let α(n) = β(n) = v_MAX(n)
  4. if α(n) ≠ ⊥ and β(n) ≠ ⊥
     (a) if n ≠ x
         i. let m = δ^{−1}(n)
         ii. αβBackUp1(Γ, n, m)
     (b) delete n from O
  5. else if α(n) = β(n) = ⊥
     (a) α(n) = −1 and β(n) = +1
     (b) if n ≠ x
         i. let m = δ^{−1}(n)
         ii. α(n) = max{α(n), α(m)}
             β(n) = min{β(n), β(m)}
     (c) prepend δ(n) to front of O

if l(x) = Min, return β(x)
if l(x) = Max, return α(x)

αβBackUp1(Γ, n, m)

Inputs  game tree Γ, node n, parent m
Output  updated α and β values

1. if l(m) = Min, β(m) = min{β(m), α(n)}
   if l(m) = Max, α(m) = max{α(m), β(n)}

Table 8: αβPruning Depth-First Search.
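A Python sketch of the explicit-stack procedure in Table 8, again on 2,2-NIM, might look like this. The assumptions are the same as for any illustration of this kind and are worth stating: the `Node` class, state encoding, and helper names are hypothetical; `None` stands for ⊥; the parent pointer stands for δ^{−1}; and the value backed up from a child is taken to be its α-value if the child is a Max node and its β-value if it is a Min node (the two coincide at terminal nodes), which is one reading of the back-up rule for strictly alternating games.

```python
MAX, MIN = "Max", "Min"

def successors(state):
    """delta(x) for Nim: remove at least one match from exactly one pile."""
    piles, player = state
    other = MIN if player == MAX else MAX
    result = set()
    for i, size in enumerate(piles):
        for remaining in range(size):
            new_piles = piles[:i] + (remaining,) + piles[i + 1:]
            result.add((tuple(sorted(new_piles, reverse=True)), other))
    return sorted(result)

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.alpha = self.beta = None    # None plays the role of "uninitialized"

def alphabeta_dfs(root_state):
    root = Node(root_state)
    stack = [root]                       # top of stack is the end of the list
    while stack:
        n, m = stack[-1], stack[-1].parent
        piles, player = n.state
        if m is not None and m.alpha >= m.beta:
            stack.pop()                  # pruning test on the parent's bounds
            continue
        if sum(piles) == 0:              # terminal: player to move wins
            n.alpha = n.beta = +1 if player == MAX else -1
        if n.alpha is not None:          # terminal or second visit: back up, pop
            if m is not None:
                child_value = n.alpha if player == MAX else n.beta
                if m.state[1] == MIN:
                    m.beta = min(m.beta, child_value)
                else:
                    m.alpha = max(m.alpha, child_value)
            stack.pop()
        else:                            # first visit: inherit bounds, push children
            n.alpha, n.beta = -1, +1
            if m is not None:
                n.alpha, n.beta = max(n.alpha, m.alpha), min(n.beta, m.beta)
            stack.extend(Node(s, n) for s in successors(n.state))
    return root.beta if root.state[1] == MIN else root.alpha

print(alphabeta_dfs(((2, 2), MAX)))
```

Pruned children are simply discarded when they reach the top of the stack, so no separate bookkeeping is needed to skip them.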
