Simulated Annealing

Unit 4: Optimization and Constraint Satisfaction — Section 4.2

A blacksmith heats a piece of steel to glowing red, then cools it slowly. If cooled too fast, atoms freeze into a disordered structure full of defects. Cooled slowly, they settle into a perfect crystalline lattice — the lowest-energy, strongest arrangement. Simulated annealing borrows this physical process to escape the local-maximum trap that defeats hill climbing.

See how simulated annealing uses temperature to explore the optimization landscape more freely than hill climbing.

Simulated Annealing

In Section 4.1, you saw that hill climbing gets stuck at local maxima because it never accepts a move that makes things worse. Simulated annealing fixes exactly this problem by sometimes accepting downhill moves — but with a probability that decreases as the algorithm runs.

The Metallurgy Inspiration

Physical annealing works in two phases:

Heating phase: Atoms absorb energy and move rapidly, breaking out of whatever structure they were locked into
Cooling phase: Atoms gradually lose energy and settle into lower-energy, more stable arrangements
Final state: If cooling is slow enough, atoms reach the global minimum-energy crystalline structure

The key insight: controlled randomness at high temperature allows the system to escape suboptimal configurations. As temperature drops, the system becomes increasingly conservative, refining whatever good structure it has found.

Simulated annealing maps this process to optimization:

Temperature controls the probability of accepting worse solutions
Energy corresponds to the objective function value
Cooling schedule gradually reduces temperature over time

The Simulated Annealing Algorithm

Simulated Annealing:

Start at an initial state with temperature T set to a high value
At each step, randomly select one neighbor of the current state
If the neighbor is better, move there (as in hill climbing)
If the neighbor is worse by an amount ΔE, move there anyway with probability e^(ΔE/T)
Decrease T according to the cooling schedule
Repeat until T reaches zero (or a minimum threshold)
Return the best state encountered

import math, random

def simulated_annealing(problem, schedule):
    current = problem.initial_state()
    best = current
    T = schedule.initial_temp

    for t in range(schedule.max_steps):
        T = schedule.temperature(t)
        if T == 0:
            return best
        neighbor = random.choice(problem.get_neighbors(current))
        delta_e = problem.objective(neighbor) - problem.objective(current)
        if delta_e > 0:
            current = neighbor          # better -- always accept
        elif random.random() < math.exp(delta_e / T):
            current = neighbor          # worse -- accept probabilistically
        if problem.objective(current) > problem.objective(best):
            best = current
    return best

Simulated Annealing: A probabilistic local search algorithm that accepts worse solutions with probability e^(ΔE/T), where ΔE is the change in objective value and T is a temperature parameter that decreases over time. Named by analogy to the metallurgical annealing process.

The Acceptance Probability

The mathematical heart of simulated annealing is the Metropolis criterion:

Acceptance Probability:

P(accept worse move) = e^(ΔE / T)

Where:

ΔE = objective(neighbor) − objective(current) < 0 (the neighbor is worse)
T = current temperature (positive)
e ≈ 2.718 (Euler’s number)

This formula has elegant behavior:

Matrix showing acceptance probabilities for different temperature and energy change combinations

Figure 4.2: The Metropolis criterion — the probability of accepting a worse state depends on the temperature and the magnitude of the energy change.

When T is high (early): e^(ΔE/T) ≈ 1 — almost any move is accepted, enabling broad exploration
When T is low (late): e^(ΔE/T) ≈ 0 — almost no worse moves are accepted; the algorithm behaves like hill climbing
When ΔE is small: More likely to accept (a slight degradation costs little)
When ΔE is large: Much less likely to accept (a large degradation is expensive)

Reading the acceptance probability:

Suppose the current solution has energy 100 and the neighbor has energy 103 (worse by ΔE = -3).

At temperature T = 10: P = e^(-3/10) = e^(-0.3) ≈ 0.74 — accept 74% of the time
At temperature T = 1: P = e^(-3/1) = e^(-3) ≈ 0.05 — accept only 5% of the time
At temperature T = 0.1: P = e^(-3/0.1) = e^(-30) ≈ 0.000000001 — essentially never

This graceful transition from exploration to exploitation is what makes SA effective.

Cooling Schedules

The cooling schedule determines how quickly T decreases and critically affects performance.

Cooling Schedule: A function that specifies the temperature at each step of simulated annealing. The schedule must be slow enough to escape local optima but fast enough to be practical.

Three common schedules:

Schedule	Formula	Characteristics	Typical Use
Linear	T(t) = T₀ − αt	Simple; reaches zero quickly	Small problems
Exponential	T(t) = T₀ × α^t	Most common; α typically 0.85–0.99	General purpose
Logarithmic	T(t) = T₀ / log(1 + t)	Very slow; guarantees convergence	Theoretical benchmark

Schedule

Formula

Characteristics

Typical Use

Linear

T(t) = T₀ − αt

Simple; reaches zero quickly

Small problems

Exponential

T(t) = T₀ × α^t

Most common; α typically 0.85–0.99

General purpose

Logarithmic

T(t) = T₀ / log(1 + t)

Very slow; guarantees convergence

Theoretical benchmark

In practice, exponential cooling is the standard choice. A cooling rate α between 0.95 and 0.99 provides enough time to explore while still converging in a reasonable number of steps.

If you set the cooling rate α = 0.999 (very slow cooling), simulated annealing will explore the landscape extensively but may take millions of steps. If you set α = 0.8 (fast cooling), it converges quickly but may miss better solutions.

How would you decide on a cooling rate for a new problem? What information would help you choose?

SA on the Traveling Salesman Problem

The Traveling Salesman Problem (TSP) is a canonical application for simulated annealing.

SA for TSP:

State: An ordered tour visiting all cities once
Neighbor: Reverse a random segment of the tour (2-opt swap)
Objective: Total tour distance (minimize — use negative distance for "energy")

Early phase (high temperature): SA accepts many bad swaps, exploring wildly across the solution space Middle phase (moderate temperature): SA occasionally accepts bad swaps, refining promising regions Late phase (low temperature): SA rarely accepts bad swaps, fine-tuning the best tour found

Result: SA routinely finds tours within 1–3% of optimal for hundreds of cities — solutions that would require astronomical computation to guarantee with systematic search.

Algorithm Properties

Property	Simulated Annealing
Complete?	Yes, with logarithmic cooling (theoretical guarantee)
Optimal?	Approaches optimal with sufficiently slow cooling
Time complexity	Depends on cooling schedule; often O(max_steps)
Space complexity	O(1) — only current state and best state stored
Best for	Large spaces with many local optima

Property

Simulated Annealing

Complete?

Yes, with logarithmic cooling (theoretical guarantee)

Optimal?

Approaches optimal with sufficiently slow cooling

Time complexity

Depends on cooling schedule; often O(max_steps)

Space complexity

O(1) — only current state and best state stored

Best for

Large spaces with many local optima

The theoretical guarantee of SA — that logarithmic cooling converges to the global optimum — is mostly of academic interest. In practice, logarithmic cooling is too slow to be useful. The real value of SA is empirical: on many hard optimization problems, exponential cooling finds near-optimal solutions far faster than any alternative, including random-restart hill climbing.

A Brief Overview of Genetic Algorithms

Simulated annealing is not the only strategy for escaping local optima. Genetic algorithms (GAs) take a fundamentally different approach, maintaining an entire population of candidate solutions and combining them to produce better offspring.

Genetic Algorithm: An optimization algorithm inspired by natural selection that maintains a population of candidate solutions and iteratively applies selection, crossover, and mutation operators to evolve better solutions.

The GA process each generation:

Selection: Choose the fittest individuals from the current population (they "reproduce")
Crossover: Combine two parent solutions to produce a child that inherits features from both
Mutation: Randomly alter one or more genes in a child solution (maintains diversity)
Replacement: New population of children replaces (or partially replaces) the old population

Genetic Algorithm for Binary Optimization:

If solutions are encoded as binary strings:

Parent 1: 1 1 0 0 1 1 0 0 Parent 2: 0 0 1 1 0 0 1 1

Single-point crossover at position 4: Child: 1 1 0 0 0 0 1 1

A mutation might flip one bit: Mutated: 1 1 0 0 0 1 1 1

GAs are powerful when solutions have a natural compositional structure — when good partial solutions can be meaningfully combined. They work well on problems with complex, multimodal landscapes where maintaining diversity across a population prevents premature convergence.

The trade-off: GAs require more configuration (population size, crossover rate, mutation rate) and more computational resources than SA, which only maintains a single solution at each step.

Comparing Local Search Techniques

Technique	Explores how	Escapes local optima	Memory
Hill Climbing	Best neighbor	No (random restart only)	O(1)
SA	Random neighbor + probabilistic acceptance	Yes (via temperature)	O(1)
Genetic Algorithm	Population evolution + crossover	Yes (via diversity)	O(population size)

Technique

Explores how

Escapes local optima

Memory

Hill Climbing

Best neighbor

No (random restart only)

O(1)

Random neighbor + probabilistic acceptance

Yes (via temperature)

O(1)

Genetic Algorithm

Population evolution + crossover

Yes (via diversity)

O(population size)

Simulated annealing provides the best balance of simplicity and power among local search algorithms. It requires no more memory than hill climbing but can escape local optima through its temperature mechanism. For most single-solution optimization tasks, SA is the first algorithm to try beyond hill climbing. Genetic algorithms become compelling when solutions have natural building blocks that combine well, or when the problem naturally fits a population-based framing.

Test your understanding of simulated annealing and cooling schedules.

Next: 4.3 Constraint Satisfaction Problems →

Based on the UC Berkeley CS 188 Online Textbook by Nikhil Sharma, Josh Hug, Jacky Liang, and Henry Zhu, licensed under CC BY-SA 4.0.

This work is licensed under CC BY-SA 4.0.