1 Overview
2 Definition and Example
2.1 TSP
2.2 TSP as Search
2.2.1 Hamiltonian Paths
2.2.2 Hamiltonian Circuits
2.3 TSP as Combinatorial Optimization
3.1 Best-Improvement Search
3.2 First-Improvement Search
4 Simulated Annealing
5 Local Beam Search
6 Genetic Algorithms
The topic of this lecture is local search methods for combinatorial (i.e., discrete) optimization: the problem of finding a globally optimal state in very large search spaces. Such optimization problems pervade many fields, including engineering, the natural sciences, and management:
• electrical engineering: integrated circuit design
• biology and chemistry: gene sequencing, protein folding
• management science: inventory planning, procurement of raw materials, manufacturing and distribution of finished products
Informally, the following features characterize combinatorial optimization problems:
• combinatorial (i.e., very large) state space
• cost (value) function to be minimized (maximized)
When these additional features hold, local search is often the solution technique of choice:
In contrast to typical search problems, the solution to a combinatorial optimization problem does not usually include the path to a goal, only the goal itself. Instead, every state represents a complete solution to the problem, although not necessarily an optimal one. This observation suggests the use of iterative improvement algorithms to solve such optimization problems: initialize the algorithm at some (possibly random) state, and iteratively perturb the current solution, updating it if any of the tested perturbations yields an improvement.
Local search algorithms, the focus of this lecture, seek iterative improvements in the local neighborhood of the current solution. We study: (i) hill-climbing algorithms, which greedily improve upon the current solution; (ii) simulated annealing, which temporarily explores changes worse than the current solution, but ultimately only exploits improvements in the current solution; (iii) local beam search, which combines aspects of hill-climbing and simulated annealing; and (iv) genetic algorithms, which can be viewed as a form of parallel local search.
An optimization problem consists of a state space $X$ and an objective function $\mathrm{Obj} : X \to \mathbb{R}$, where a (goal) state $x^*$ is either a (global) minimum or a maximum (i.e., it is an optimum). Given an optimization problem, $x^* \in X$ is a minimum iff $\mathrm{Obj}(x^*) \le \mathrm{Obj}(x)$ for all $x \in X$; similarly, $x^* \in X$ is a maximum iff $\mathrm{Obj}(x^*) \ge \mathrm{Obj}(x)$ for all $x \in X$. This lecture is tailored toward minimization problems and algorithms; maximization problems are handled analogously.
A local search problem is an optimization problem together with a neighborhood structure. The function $N : X \to 2^X$ returns the neighborhood of state x, which typically consists of simple perturbations of x for which Obj(x) can be computed efficiently, perhaps incrementally. The state x ∈ X is a local minimum iff Obj(x) ≤ Obj(y), for all y ∈ N(x). Local search algorithms, which seek to optimize in neighborhoods (i.e., locally), return local optima.
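The local-minimum condition can be checked directly from the definition. The sketch below is a minimal illustration on a hypothetical toy problem: the list `VALUES`, the index-based states, and the "adjacent index" neighborhood are all assumptions made for the example, not part of the lecture.

```python
from typing import Callable, Iterable

def is_local_minimum(x: int,
                     obj: Callable[[int], float],
                     neighbors: Callable[[int], Iterable[int]]) -> bool:
    """True iff Obj(x) <= Obj(y) for every neighbor y in N(x)."""
    return all(obj(x) <= obj(y) for y in neighbors(x))

# Hypothetical toy problem: states are the indices of VALUES,
# and Obj(x) is the value stored at index x.
VALUES = [5, 3, 4, 1, 2, 6]

def obj(x: int) -> float:
    return VALUES[x]

def neighbors(x: int) -> list:
    # Assumed neighborhood: the adjacent indices that exist.
    return [y for y in (x - 1, x + 1) if 0 <= y < len(VALUES)]

local_minima = [x for x in range(len(VALUES))
                if is_local_minimum(x, obj, neighbors)]
global_minimum = min(range(len(VALUES)), key=obj)
```

In this toy landscape, states 1 and 3 are both local minima, but only state 3 is the global minimum, which is why a local search algorithm may return a suboptimal state.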
Figure 1 presents an instance of the traveling salesperson problem (TSP), a classic NP-hard¹ combinatorial optimization problem. Providence, Boston, Hartford, Washington D.C., and New York City and their respective distances (as the crow flies) are depicted. A path through the graph that visits all cities exactly once and returns to its origin is called a tour. The objective in TSP is to find a tour of minimal distance. The optimal tour in Figure 1 is PBHWNP of distance 795.
¹ An NP-hard problem is at least as hard as every problem in the complexity class NP, where NP stands for nondeterministic polynomial time. No (deterministic) polynomial-time algorithm is known for any NP-hard problem.
Figure 1: Traveling Salesperson Problem.
Before formulating TSP as a local search problem (states are tours), we formulate it as a typical search problem in which tours are constructed incrementally. Tours constructed via search are often used to initialize iterative improvement methods.
Given a set of cities Y of cardinality N, and given the corresponding distances between cities (d(x,y), for all x,y ∈ Y), we formulate TSP as a search problem in two ways. In the first formulation, states are Hamiltonian paths: paths that do not visit any city twice. In the second formulation, states are Hamiltonian circuits: cycles that do not visit any city twice.
Let the set of states X consist of all Hamiltonian paths, i.e., paths that do not visit any city twice. Formally,
$$X = \{(x_1,\dots,x_n) \mid n = 1,\dots,N+1,\ x_i \in Y \text{ for all } 1 \le i \le n,\ x_i \ne x_j \text{ for all } 1 \le i \ne j \le n, \text{ unless } i = 1,\ j = N+1\}$$
The set of goal states includes all states of length N + 1. The set of start states includes all states of length 1. The transition function between states is defined as follows: at state $(x_1,\dots,x_n)$,
$$\delta(x_1,\dots,x_n) = \begin{cases} \{(x_1,\dots,x_n,x_{n+1}) \mid x_{n+1} \in Y,\ x_{n+1} \ne x_i \text{ for all } 1 \le i \le n\} & \text{if } n < N \\ \{(x_1,\dots,x_n,x_1)\} & \text{if } n = N \end{cases}$$
Figure 2 depicts the first three levels of the search space in this formulation for the traveling salesperson problem depicted in Figure 1.
One simple heuristic to solve TSP in this formulation is “nearest-neighbor.” Beginning at city x, the salesperson visits city y s.t. d(x,y) is minimal; from city y, s/he visits city z s.t. d(y,z) is minimal, so long as doing so does not create a cycle of length less than N, in which case s/he visits a city of the next least distance; and so on. Applying this heuristic to the search problem in Figure 2 generates the suboptimal path PBHNWP of distance 796.
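The nearest-neighbor heuristic can be sketched in a few lines. Because the distances of Figure 1 are not reproduced here, the example below uses four hypothetical cities at the corners of a 3-by-4 rectangle; `COORDS`, `d`, and `tour_length` are illustrative names, and ties are broken alphabetically as an assumption. Restricting each step to unvisited cities is what prevents a premature cycle.

```python
import math

# Hypothetical instance: four cities at the corners of a 3 x 4 rectangle.
COORDS = {"A": (0, 0), "B": (3, 0), "C": (3, 4), "D": (0, 4)}

def d(x: str, y: str) -> float:
    """Straight-line distance between two cities."""
    return math.dist(COORDS[x], COORDS[y])

def tour_length(tour: list) -> float:
    """Total length of a tour given as a list that ends where it starts."""
    return sum(d(a, b) for a, b in zip(tour, tour[1:]))

def nearest_neighbor_tour(cities, start: str) -> list:
    """Repeatedly move to the closest unvisited city, then return home."""
    tour = [start]
    unvisited = set(cities) - {start}
    while unvisited:
        here = tour[-1]
        # sorted() makes tie-breaking deterministic (alphabetical).
        nxt = min(sorted(unvisited), key=lambda c: d(here, c))
        tour.append(nxt)
        unvisited.remove(nxt)
    tour.append(start)  # close the tour
    return tour

nn_tour = nearest_neighbor_tour(COORDS, "A")
```

On this small rectangle the heuristic happens to find the perimeter tour; as the lecture's own example shows, it can also return a suboptimal tour.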
Figure 2: Search in TSP: States are Hamiltonian Paths.
Let the set of states X consist of all Hamiltonian circuits, i.e., cycles that do not visit any city twice. Formally,
$$X = \{(x,y_1,\dots,y_n,x) \mid n = 1,\dots,N-1,\ x \ne y_i \ne y_j \in Y \text{ for all } 1 \le i \ne j \le n\}$$
In this case, the transition function between states is defined as follows: at state $(x,y_1,\dots,y_i,y_{i+1},\dots,y_n,x)$,
$$\delta(x,y_1,\dots,y_n,x) = \{(x,y_1,\dots,y_i,y,y_{i+1},\dots,y_n,x) \mid y \in Y,\ y \ne x,\ y \ne y_i \text{ for all } 1 \le i \le n\}$$
at cost $d(y_i,y) + d(y,y_{i+1}) - d(y_i,y_{i+1})$. The set of goal states includes all states of length N + 1. The set of start states includes all states of length 2.
Figure 3: Search in TSP: States are Hamiltonian Circuits.
A greedy heuristic for solving TSP in this formulation is simply: insert a city into the tour that causes the smallest increase in the length of the tour. This heuristic finds an optimal tour in our example. (See Figure 3.)
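A minimal sketch of this greedy insertion heuristic follows, using the insertion cost d(a,y) + d(y,b) − d(a,b) for placing city y between consecutive cities a and b. The four-city rectangle instance and the names `COORDS`, `d`, and `tour_length` are hypothetical, and ties are resolved in favor of the first candidate found, which is an assumption.

```python
import math

# Hypothetical instance: four cities at the corners of a 3 x 4 rectangle.
COORDS = {"A": (0, 0), "B": (3, 0), "C": (3, 4), "D": (0, 4)}

def d(x: str, y: str) -> float:
    return math.dist(COORDS[x], COORDS[y])

def tour_length(tour: list) -> float:
    return sum(d(a, b) for a, b in zip(tour, tour[1:]))

def cheapest_insertion_tour(cities, start: str) -> list:
    """Grow a circuit from (start, start) by cheapest insertion."""
    tour = [start, start]            # start state of length 2
    remaining = set(cities) - {start}
    while remaining:
        best = None                  # (cost, position, city)
        for pos in range(len(tour) - 1):
            a, b = tour[pos], tour[pos + 1]
            for c in sorted(remaining):
                cost = d(a, c) + d(c, b) - d(a, b)
                if best is None or cost < best[0]:
                    best = (cost, pos, c)
        _, pos, c = best
        tour.insert(pos + 1, c)      # place c between tour[pos] and tour[pos+1]
        remaining.remove(c)
    return tour

ci_tour = cheapest_insertion_tour(COORDS, "A")
```

Each pass scans every (edge, city) pair, so one insertion costs time proportional to the tour length times the number of remaining cities.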
TSP can also be formulated as a local search problem, by defining states as tours. There are many possible operations for computing local neighborhoods. One example is inversion: given a tour, consider swapping the order of visiting each consecutive pair of cities on the tour (except the origin). More generally, an operation called 2opt eliminates any two edges, say (x1,x2) and (y1,y2), and reconnects the nodes in the opposite way, as (x1,y2) and (y1,x2). Inversion and 2opt are useful in practice because their computation is cheap: just subtract the distances of the deleted edges and add the distances of the new edges.
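The incremental cost computation can be illustrated with a short sketch. It assumes the common list-based implementation of 2opt, in which removing the edges after positions i and j amounts to reversing the segment between them; the rectangle instance and all function names are hypothetical. Under this encoding, inversion is the special case j = i + 2.

```python
import math

# Hypothetical instance: four cities at the corners of a 3 x 4 rectangle.
COORDS = {"A": (0, 0), "B": (3, 0), "C": (3, 4), "D": (0, 4)}

def d(x: str, y: str) -> float:
    return math.dist(COORDS[x], COORDS[y])

def tour_length(tour: list) -> float:
    return sum(d(a, b) for a, b in zip(tour, tour[1:]))

def two_opt_delta(tour: list, i: int, j: int) -> float:
    """Change in length when edges (t[i],t[i+1]) and (t[j],t[j+1]) are
    replaced by (t[i],t[j]) and (t[i+1],t[j+1]); needs 0 <= i < j < len-1."""
    a, b = tour[i], tour[i + 1]
    c, e = tour[j], tour[j + 1]
    return d(a, c) + d(b, e) - d(a, b) - d(c, e)

def two_opt_move(tour: list, i: int, j: int) -> list:
    """Apply the move by reversing the segment t[i+1..j]."""
    return tour[:i + 1] + tour[i + 1:j + 1][::-1] + tour[j + 1:]

crossing = ["A", "C", "B", "D", "A"]       # uses both diagonals
delta = two_opt_delta(crossing, 0, 2)      # computed from four distances only
uncrossed = two_opt_move(crossing, 0, 2)
```

The delta is obtained from four distance lookups, independent of the number of cities, which is what makes evaluating a whole neighborhood affordable.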
How might a local search algorithm behave on our sample TSP? Suppose the initial state is PWBHNP of distance 1099. Applying inversion yields PBWHNP of distance 991, PWHBNP of distance 1097, and PWBNHP of distance 1106. A local search algorithm would accept PBWHNP, since 991 < 1099; but it could also reasonably accept PWHBNP, since 1097 < 1099. Assume PBWHNP is accepted. Applying inversion again yields PWBHNP (the original tour), PBHWNP of distance 795, and PBWNHP of distance 804. Now, if the tour PBHWNP is accepted, the algorithm would verify that it encountered a local (in fact, global) minimum and halt. (See Figure 4.)
Figure 4: Local Search in TSP.
Hill-climbing is a family of local search algorithms that consider changes in the local neighborhood of a given state and accept changes that improve upon the current solution. Best-improvement and first-improvement are examples of hill-climbing algorithms. Other variants include GSAT and WalkSAT, which are special-purpose algorithms for solving satisfiability (see Lecture #7).
The main idea behind best-improvement search is to consider all states in the local neighborhood of the current state, and to accept one that best improves the objective function. Best-improvement search terminates at a local (but not necessarily global) minimum. Thus, the performance of best-improvement search is rarely globally optimal in landscapes characterized by foothills.
Best(X, Obj, N, x)
Inputs: local search problem; random start state x
Output: best state visited x
Initialize x′ ≠ x
while (x′ ≠ x) do
    ...
    3. choose x in A
return x
Table 1: Best-Improvement Search.
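One possible reading of best-improvement search for minimization is sketched below. It is an illustration under stated assumptions rather than a transcription of Table 1: the toy landscape `VALUES`, the adjacent-index neighborhood, and the choice to stop as soon as no neighbor is strictly better are all assumptions.

```python
# Hypothetical toy problem: states are indices, Obj(x) = VALUES[x].
VALUES = [5, 3, 4, 1, 2, 6]

def obj(x: int) -> float:
    return VALUES[x]

def neighbors(x: int) -> list:
    # Assumed neighborhood: the adjacent indices that exist.
    return [y for y in (x - 1, x + 1) if 0 <= y < len(VALUES)]

def best_improvement(x: int) -> int:
    """Move to the best neighbor until no neighbor is strictly better."""
    while True:
        # Examine the whole neighborhood and keep its best member.
        best = min(neighbors(x), key=obj, default=x)
        if obj(best) >= obj(x):
            return x          # x is a local minimum
        x = best

from_left = best_improvement(0)    # climbs down into the minimum at index 1
from_right = best_improvement(5)   # climbs down into the minimum at index 3
```

Starting from the left edge the search stops at the local minimum with value 3, while starting from the right edge it reaches the global minimum with value 1, so the outcome depends on the start state.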
Unlike best-improvement search, first-improvement search does not consider all neighbors of the current solution. On the contrary, it considers its neighbors at random and accepts the first one that demonstrates an improvement. It halts if ever it loses its patience before finding an improvement. First-improvement search is also called stochastic local search and randomized hill-climbing.
First(X, Obj, N, p, x, ε)
Inputs: local search problem; patience time limit p; random start state x; rate of exploration ε
Output: best state visited x
Initialize: t = 0
while (t < p) do
    1. for some y ∈ N(x)
        ...
        (c) else increment t
return x
Table 2: First-Improvement Search.
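A minimal sketch of first-improvement search follows. Two points are assumptions, since the corresponding steps are not spelled out above: the patience counter is reset whenever an improvement is accepted, and the rate of exploration ε is left out entirely. The toy landscape and neighborhood are hypothetical.

```python
import random

# Hypothetical toy problem: states are indices, Obj(x) = VALUES[x].
VALUES = [5, 3, 4, 1, 2, 6]

def obj(x: int) -> float:
    return VALUES[x]

def neighbors(x: int) -> list:
    return [y for y in (x - 1, x + 1) if 0 <= y < len(VALUES)]

def first_improvement(x: int, patience: int, rng: random.Random) -> int:
    """Sample neighbors at random; accept the first strict improvement.

    Halts after `patience` consecutive samples without improvement.
    """
    t = 0
    while t < patience:
        y = rng.choice(neighbors(x))   # one random neighbor
        if obj(y) < obj(x):
            x = y                      # accept the improvement
            t = 0                      # assumption: patience is reset
        else:
            t += 1
    return x

result = first_improvement(2, patience=50, rng=random.Random(0))
```

From state 2 both neighbors are improvements, so the run ends at index 1 or index 3 depending on the random draw; either way the returned state is a local minimum.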
Given infinite patience, stochastic local search terminates at a local optimum with probability 1. But rather than run the algorithm with too much patience, it is usually run repeatedly with random restarts. Given infinite patience and an infinite number of random restarts, stochastic local search finds a global optimum with probability 1.
Random(X, Obj, N, p, n, ε)
Inputs: local search problem; patience time limit p; number of restarts n; rate of exploration ε
Output: best state visited x*
Initialize: Obj(x*) = ∞
for i = 1 to n
    1. choose start state x ∈ X
    2. z = First(X, Obj, N, p, x, ε)
    3. if Obj(z) < Obj(x*), x* = z
return x*
Table 3: Stochastic Local Search with Random-Restarts.
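The restart loop of Table 3 can be sketched as follows. The inner routine is a simplified first-improvement search (patience reset on improvement, no ε), and the toy landscape, uniform choice of start states, and function names are assumptions made for illustration.

```python
import math
import random

# Hypothetical toy problem: states are indices, Obj(x) = VALUES[x].
VALUES = [5, 3, 4, 1, 2, 6]

def obj(x: int) -> float:
    return VALUES[x]

def neighbors(x: int) -> list:
    return [y for y in (x - 1, x + 1) if 0 <= y < len(VALUES)]

def first_improvement(x: int, patience: int, rng: random.Random) -> int:
    """Simplified inner search (assumed variant without epsilon)."""
    t = 0
    while t < patience:
        y = rng.choice(neighbors(x))
        if obj(y) < obj(x):
            x, t = y, 0
        else:
            t += 1
    return x

def random_restarts(patience: int, n: int, rng: random.Random) -> int:
    """Run n independent searches and keep the best state found."""
    best, best_obj = None, math.inf        # Obj(x*) = infinity
    for _ in range(n):
        x = rng.randrange(len(VALUES))     # 1. choose a start state
        z = first_improvement(x, patience, rng)   # 2. local search
        if obj(z) < best_obj:              # 3. keep the better result
            best, best_obj = z, obj(z)
    return best

best_state = random_restarts(patience=20, n=50, rng=random.Random(0))
```

Because more than half of the start states lead to the global minimum in this landscape, a modest number of restarts finds it with very high probability.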
Simulated annealing generalizes stochastic local search. With some probability, say p, simulated annealing accepts state changes that adversely affect the value of the objective function. In other words, in accordance with p, it encourages exploration of harmful states; otherwise, it exploits successful states.
A probability p of exploration can be determined in several ways. One approach is simply to fix p at some small value ε > 0. Another option is to let p decrease with time: p ∼ 1/t. Yet another idea is to let p decrease as Δ(x,y) = Obj(y) − Obj(x) increases, where x is the current solution and y ∈ N(x).
Updating in the spirit of this third idea can be achieved as follows: if Δ(x,y) < 0 (i.e., y is an improvement), then let p = 1, i.e., exploit; if Δ(x,y) ≥ 0 (i.e., y is not an improvement), let $p = e^{-\Delta(x,y)}$. In this way, small increases in the value of the objective function imply large values of p (exploration is likely; exploitation is not), whereas large increases in the value of the objective function imply small values of p (exploitation is likely; exploration is not). (As presented, note that all ties are broken in favor of y, since $e^0 = 1$.)
Simulated annealing is a local search algorithm based on a combination of ideas: (i) exploration decreases with time and (ii) exploration is more likely if it makes things only slightly worse, and less likely if it makes things significantly worse. Specifically, simulated annealing relies on the following exploration probabilities: for T > 0,
$$p = \begin{cases} e^{-\Delta(x,y)/T} & \text{if } \Delta(x,y) \ge 0 \\ 1 & \text{otherwise} \end{cases}$$
Simulated annealing derives its name from the interpretation of T as a temperature, which through slow cooling strengthens metal during its production. Thus, exploration is more likely than exploitation initially, when T (and hence p) is high, but later in the search, when T (and hence p) is low, exploitation is more likely than exploration. (See Figure 5.)

SA(X, Obj, N, m, x, C)
Inputs: local search problem; number of iterations m; random start state x; cooling schedule C
Output: best state visited x*
Initialize: x* = x, T
for i = 1 to m
    1. for some y ∈ N(x)
        ...
        (d) if Obj(x) < Obj(x*), x* = x
    2. decay T according to schedule C
return x*
Table 4: Simulated Annealing.
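The following sketch shows one way the pieces of simulated annealing could fit together for minimization. It assumes a geometric cooling schedule of the form T = T0 · α^⌊i/l⌋, one random neighbor per iteration, and a hypothetical toy landscape; the parameter names `t0`, `alpha`, and `l` are illustrative.

```python
import math
import random

# Hypothetical toy problem: states are indices, Obj(x) = VALUES[x].
VALUES = [5, 3, 4, 1, 2, 6]

def obj(x: int) -> float:
    return VALUES[x]

def neighbors(x: int) -> list:
    return [y for y in (x - 1, x + 1) if 0 <= y < len(VALUES)]

def simulated_annealing(x: int, m: int, t0: float, alpha: float,
                        l: int, rng: random.Random) -> int:
    """Return the best state visited during m annealing steps."""
    best = x
    for i in range(1, m + 1):
        temp = t0 * alpha ** (i // l)      # assumed geometric cooling
        y = rng.choice(neighbors(x))
        delta = obj(y) - obj(x)
        # Improvements and ties are always accepted; a worse state is
        # accepted with probability exp(-delta / T).
        if delta <= 0 or (temp > 0 and
                          rng.random() < math.exp(-delta / temp)):
            x = y
        if obj(x) < obj(best):             # track the best state visited
            best = x
    return best

sa_best = simulated_annealing(0, m=2000, t0=10.0, alpha=0.99, l=10,
                              rng=random.Random(0))
```

Unlike the hill-climbing variants, a run started at index 0 is not trapped in the local minimum at index 1: while the temperature is high, uphill moves are accepted often enough to cross into the basin of the global minimum.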
The original cooling schedule proposed by Kirkpatrick is to let $T_{t+1} = T_0 \alpha^{\lfloor t/l \rfloor}$, for $T_0 \ge 0$, $\alpha \in (0,1)$, and $1 \le l \le m$, where m is the number of iterations. For $T_0 = 10$, α = 0.99, l = 10, and Δ(x,y) = 1, the cooling schedule and exploration probabilities are shown in Table 5.
| t | ⌊t/l⌋ | α^⌊t/l⌋ | T_t | p_t |
|---|---|---|---|---|
| 1 | 0 | 1 | 10 | 0.9 |
| 10 | 1 | 0.99 | 9.9 | 0.9 |
| 100 | 10 | 0.9 | 9 | 0.895 |
| 1000 | 100 | 0.367 | 3.67 | 0.76 |
| 10000 | 1000 | 4.32 × 10^−5 | 4.32 × 10^−4 | 0.0 |
| 100000 | 10000 | 0.0 | 0.0 | 0.0 |
Table 5: A Sample Cooling Schedule with Exploration Probabilities
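The entries of Table 5 can be recomputed with a few lines of code. The sketch assumes the table's convention that the temperature at step t is T0 · α^⌊t/l⌋; the function names are illustrative.

```python
import math

def temperature(t: int, t0: float = 10.0, alpha: float = 0.99,
                l: int = 10) -> float:
    """Temperature at step t under the geometric schedule of Table 5."""
    return t0 * alpha ** (t // l)

def exploration_probability(delta: float, temp: float) -> float:
    """Probability of accepting a move that changes Obj by delta."""
    if delta < 0:
        return 1.0                 # improvements are always accepted
    if temp == 0:
        return 1.0 if delta == 0 else 0.0
    return math.exp(-delta / temp)

# Recompute the rows of the table for delta = 1.
rows = [(t, temperature(t), exploration_probability(1.0, temperature(t)))
        for t in (1, 10, 100, 1000, 10000, 100000)]
```

The probability of accepting a unit increase stays near 0.9 for the first hundred steps and then collapses toward zero, which is the transition from exploration to exploitation described above.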
Note that, if $T_0 = \infty$, then simulated annealing behaves like a random walk. On the other hand, if $T_0 = 0$, then simulated annealing reduces to first-improvement search. Thus, simulated annealing terminates at a local optimum as $T_i \to 0$ and $m \to \infty$, with probability 1.
Figure 5: Boltzmann Distribution.
Local beam search is a local search algorithm that operates like best-improvement search, except that it stores a beam of w improvements at each iteration instead of just 1. More specifically, the beam is initialized to contain w random states; then, all the successors of each state on the beam are generated; next, the beam is set to contain the best w states among those on the beam and their successors. This generate-and-set process continues until no further improvements are found.
Exercise: Explain how local beam search differs from running w parallel best-improvement searches.
In practice, when a search space contains clumps of low-valued nodes concentrated in certain areas, local beam search can reduce to an even more expensive version of best-improvement search, searching essentially the same part of the space w times instead of just once. An alternative idea, known as stochastic local beam search, again maintains a beam of width w, but the nodes to be put on the beam are chosen stochastically with probabilities that depend on their values.
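A minimal sketch of (deterministic) local beam search for minimization is given below. The stopping rule, which halts when the objective values on the beam stop changing, is one assumed interpretation of "no further improvements"; the toy landscape and neighborhood are hypothetical.

```python
import heapq

# Hypothetical toy problem: states are indices, Obj(x) = VALUES[x].
VALUES = [5, 3, 4, 1, 2, 6]

def obj(x: int) -> float:
    return VALUES[x]

def neighbors(x: int) -> list:
    return [y for y in (x - 1, x + 1) if 0 <= y < len(VALUES)]

def local_beam_search(starts: list, w: int) -> int:
    """Keep the best w states among the beam and all its successors."""
    beam = heapq.nsmallest(w, set(starts), key=obj)
    while True:
        pool = set(beam)
        for x in beam:
            pool.update(neighbors(x))      # generate all successors
        new_beam = heapq.nsmallest(w, pool, key=obj)
        # Assumed stopping rule: the beam's objective values are unchanged.
        if [obj(s) for s in new_beam] == [obj(s) for s in beam]:
            return beam[0]                 # best state on the beam
        beam = new_beam

wide = local_beam_search([0, 5], w=2)
narrow = local_beam_search([0], w=1)
```

With w = 1 the procedure coincides with best-improvement search from a single start state, whereas a wider beam pools the successors of all its states before selecting survivors.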
Genetic algorithms (GAs) are a class of parallel local search algorithms based on the principles of natural selection and survival of the fittest. GAs maintain populations (or generations) of individuals (or chromosomes) made up of genes, in which each individual represents a solution. New populations are bred from old using three operations: selection (a form of exploitation), or asexual reproduction, in which individuals survive in proportion to their respective fitness; crossover (a form of exploration), which mimics sexual reproduction, in which the genetic material of two individuals is exchanged; and mutation, which alters genetic material, thereby preventing the population from stagnating.
A fitness function determines the probability with which an individual survives from one generation to the next. The following examples apply to maximization problems: let $x_0,\dots,x_N$ denote individuals sorted by quality (i.e., $\mathrm{Obj}(x_i)$) from greatest to least:
1. Standard method: $f_i = \dfrac{\mathrm{Obj}(x_i)}{\sum_{j=0}^{N} \mathrm{Obj}(x_j)}$
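The standard method is often called fitness-proportional (or roulette-wheel) selection. The sketch below computes the fitness values and draws one individual accordingly; it assumes all objective values are nonnegative with a positive sum, and the function names are illustrative.

```python
import random

def fitness(objs: list) -> list:
    """f_i = Obj(x_i) / sum_j Obj(x_j); assumes a positive total."""
    total = sum(objs)
    return [o / total for o in objs]

def select(population: list, objs: list, rng: random.Random):
    """Draw one individual with probability equal to its fitness."""
    return rng.choices(population, weights=fitness(objs), k=1)[0]

f = fitness([4, 3, 2, 1])          # individuals sorted from best to worst
pick = select(["w", "x", "y", "z"], [4, 3, 2, 1], random.Random(0))
```

Here the best individual is four times as likely to be selected as the worst one, but every individual with a positive objective value retains some chance of surviving.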
The generic form of a genetic algorithm is depicted in Table 6.
GA(X, Obj, N, n, c, m, f)
Inputs: local search problem; number of iterations n; crossover probability c; mutation probability m; fitness function f
Output: fit individual
Initialize: population G of N individuals
1. for i = 1 to n do
    ...
        i. select: generate two parents x and y in G according to f
        ii. crossover: with probability c, swap contiguous bits in x and y
        iii. mutation: with probability m, mutate bits in x and y
        iv. insert offspring of x and y into G′
    (c) let G = G′
2. return individual in G of maximum fitness, or return an individual in G generated at random, according to f
Table 6: Genetic Algorithm.
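A compact sketch of the generic loop is shown below for bit-string individuals. Several choices are assumptions rather than part of Table 6: fitness-proportional selection, one-point crossover as the way of swapping contiguous bits, mutation applied independently to each bit with probability m, and a hypothetical objective that counts ones (offset by 1 so that selection weights stay positive).

```python
import random

def obj(ind: tuple) -> int:
    """Hypothetical objective to maximize: 1 + number of ones."""
    return 1 + sum(ind)

def mutate(ind: tuple, m: float, rng: random.Random) -> tuple:
    # Assumption: each bit flips independently with probability m.
    return tuple(b ^ 1 if rng.random() < m else b for b in ind)

def genetic_algorithm(population: list, n: int, c: float, m: float,
                      rng: random.Random) -> tuple:
    """Breed n generations and return the fittest final individual."""
    for _ in range(n):
        weights = [obj(ind) for ind in population]
        offspring = []
        while len(offspring) < len(population):
            # i. select two parents in proportion to fitness
            x, y = rng.choices(population, weights=weights, k=2)
            # ii. crossover: swap the tails after a random cut point
            if rng.random() < c:
                cut = rng.randrange(1, len(x))
                x, y = x[:cut] + y[cut:], y[:cut] + x[cut:]
            # iii. mutation
            x, y = mutate(x, m, rng), mutate(y, m, rng)
            # iv. insert offspring into the next generation
            offspring += [x, y]
        population = offspring[:len(population)]
    return max(population, key=obj)

rng = random.Random(0)
start = [tuple(rng.randrange(2) for _ in range(8)) for _ in range(10)]
fittest = genetic_algorithm(start, n=30, c=0.7, m=0.05, rng=rng)
```

Setting c = 0 and m = 0 reduces the loop to pure selection, which can only reshuffle the individuals already present; crossover and mutation are what introduce new candidate solutions.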