
AI - Search II

Review

Definition: search problem

  • \(S_{start}\): starting state
  • Actions(s): possible actions
  • Cost(s,a): action cost
  • Succ(s,a): successor
  • IsEnd(s): reached end state?

Objective: find the minimum cost path from \(S_{start}\) to some \(s\) satisfying IsEnd(s).

class TransportationProblem(object):
    def __init__(self, N):
        # N = number of blocks
        self.N = N

    def startState(self):
        return 1

    def isEnd(self, state):
        return state == self.N

    def succAndCost(self, state):
        # Return a list of (action, newState, cost) triples
        result = []
        if state + 1 <= self.N:
            result.append(('walk', state + 1, 1))
        if state * 2 <= self.N:
            result.append(('tram', state * 2, 2))
        return result
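As a quick sanity check, the successor function can be exercised directly. This is a hypothetical usage snippet (the class is re-defined here so it is self-contained):

```python
class TransportationProblem(object):
    def __init__(self, N):
        self.N = N  # number of blocks

    def startState(self):
        return 1

    def isEnd(self, state):
        return state == self.N

    def succAndCost(self, state):
        # (action, newState, cost) triples: walking +1 costs 1, the tram *2 costs 2
        result = []
        if state + 1 <= self.N:
            result.append(('walk', state + 1, 1))
        if state * 2 <= self.N:
            result.append(('tram', state * 2, 2))
        return result

problem = TransportationProblem(10)
print(problem.succAndCost(3))  # [('walk', 4, 1), ('tram', 6, 2)]
print(problem.succAndCost(8))  # [('walk', 9, 1)] -- tram would overshoot block 10
```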

Learning problem

Now suppose we don’t know what the costs are, but we observe someone getting from 1 to N via some sequence of walking and tram-taking. Can we figure out what the costs are? This is the goal of learning.

Learning Problem

Let’s cast the problem as predicting an output y given an input x. Here, the input x is the search problem (visualized as a search tree) without the costs provided. The output y is the desired solution path. The question is what the costs should be set to so that y is actually the minimum cost path of the resulting search problem.

Structured Perceptron
import sys
sys.setrecursionlimit(10000)

### Model (search problem)

class TransportationProblem(object):
    def __init__(self, N, weights):
        # N = number of blocks
        # weights = costs of the different actions
        self.N = N
        self.weights = weights

    def startState(self):
        return 1

    def isEnd(self, state):
        return state == self.N

    def succAndCost(self, state):
        # Return a list of (action, newState, cost) triples
        result = []
        if state + 1 <= self.N:
            result.append(('walk', state + 1, self.weights['walk']))
        if state * 2 <= self.N:
            result.append(('tram', state * 2, self.weights['tram']))
        return result

### Inference (dynamic programming)

def dynamicProgramming(problem):
    cache = {}  # state -> (futureCost(state), action, newState, cost)

    def futureCost(state):
        # Base case
        if problem.isEnd(state):
            return 0
        if state in cache:  # Exponential savings
            return cache[state][0]
        # Actually doing work
        result = min((cost + futureCost(newState), action, newState, cost)
                     for action, newState, cost in problem.succAndCost(state))
        cache[state] = result
        return result[0]

    state = problem.startState()
    totalCost = futureCost(state)

    # Recover the minimum cost path from the cache
    history = []
    while not problem.isEnd(state):
        _, action, newState, cost = cache[state]
        history.append((action, newState, cost))
        state = newState

    return (totalCost, history)

### Main

def predict(N, weights):
    # f(x)
    # Input (x): N (number of blocks)
    # Output (y): path (sequence of actions)
    problem = TransportationProblem(N, weights)
    totalCost, history = dynamicProgramming(problem)
    return [action for action, newState, cost in history]

def generateExamples():
    trueWeights = {'walk': 1, 'tram': 5}
    return [(N, predict(N, trueWeights)) for N in range(1, 30)]

def structuredPerceptron(examples):
    weights = {'walk': 0, 'tram': 0}
    numIterations = 100
    for t in range(numIterations):
        numMistakes = 0
        for N, trueActions in examples:
            # Make a prediction
            predActions = predict(N, weights)
            if predActions != trueActions:
                numMistakes += 1
                # Update weights: decrease the cost of the true actions,
                # increase the cost of the (wrongly) predicted actions
                for action in trueActions:
                    weights[action] -= 1
                for action in predActions:
                    weights[action] += 1
        print('Iteration {}, numMistakes = {}, weights = {}'.format(t, numMistakes, weights))
        if numMistakes == 0:
            break

examples = generateExamples()
print('Training dataset:')
for example in examples:
    print(' ', example)
structuredPerceptron(examples)
Exploring states

UCS: explore states in order of PastCost(s).
A*: explore states in order of PastCost(s) + h(s), where a heuristic h(s) is any estimate of FutureCost(s).

  • First, some terminology: PastCost(s) is the minimum cost from the start state to s, and FutureCost(s) is the minimum cost from s to an end state. Without loss of generality, we can just assume we have one end state. (If we have multiple ones, create a new official goal state which is the successor of all the original end states.)
  • Recall that UCS explores states in order of PastCost(s). It’d be nice if we could explore states in order of PastCost(s) + FutureCost(s), which would definitely take the end state into account, but computing FutureCost(s) would be as expensive as solving the original problem.
  • A* relies on a heuristic h(s), which is an estimate of FutureCost(s). For A* to work, h(s) must satisfy some conditions, but for now, just think of h(s) as an approximation. We will soon show that A* explores states in order of PastCost(s) + h(s). This is nice, because now states which are estimated (by h(s)) to be really far away from the end state will be explored later, even if their PastCost(s) is small.
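The reason the heuristic can simply be folded into the exploration order is a standard equivalence: A* is just UCS run on the same graph with modified edge costs

\[Cost'(s, a) = Cost(s, a) + h(\text{Succ}(s, a)) - h(s).\]

An action is penalized if it moves away from the end state (h goes up) and subsidized if it moves toward it. Along any path from \(s_{start}\) to \(s_{end}\), the h terms telescope, so every path’s modified cost equals its original cost minus the constant \(h(s_{start})\) (using \(h(s_{end}) = 0\)); minimizing one therefore minimizes the other.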

F = G + H

One important aspect of A* is f = g + h. The f, g, and h variables are in our Node class and get calculated every time we create a new node. I’ll quickly go over what these variables mean.

  • F is the total cost of the node.
  • G is the distance between the current node and the start node.
  • H is the heuristic — estimated distance from the current node to the end node.

Consistent heuristics

  • We need h(s) to be consistent, which means two things. First, the modified edge costs are non-negative (this is the main property). This is important for UCS to find the minimum cost path (remember that UCS only works when all the edge costs are non-negative).
  • Second, h(send) = 0, which is just saying: be reasonable. The minimum cost from the end state to the end state is trivially 0, so just use 0.
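Written as formulas (using the Succ notation from the review above), the two conditions for consistency are:

\[Cost(s, a) + h(\text{Succ}(s, a)) - h(s) \geq 0 \qquad \text{and} \qquad h(s_{end}) = 0.\]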

Correctness of A*

If h is consistent, A* returns the minimum cost path.

Admissibility

A heuristic h(s) is admissible if

\[h(s) \leq \text{FutureCost}(s)\]

Theorem: consistency implies admissibility

If a heuristic h(s) is consistent, then h(s) is admissible.
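A short proof sketch: let \(s = s_0 \to s_1 \to \cdots \to s_k = s_{end}\) be a minimum cost path starting at \(s\), with actions \(a_1, \ldots, a_k\). Consistency gives \(h(s_{i-1}) \leq Cost(s_{i-1}, a_i) + h(s_i)\) for each edge; chaining these inequalities and using \(h(s_{end}) = 0\), the intermediate h terms telescope away:

\[h(s) \leq \sum_{i=1}^{k} Cost(s_{i-1}, a_i) = \text{FutureCost}(s).\]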

Relaxation

With an arbitrary configuration of walls, we can’t compute FutureCost(s) except by doing search. However, if we just relaxed the original problem by removing the walls, then we can compute FutureCost(s) in closed form: it’s just the Manhattan distance between \(s\) and \(s_{end}\). Specifically, ManhattanDistance((r1, c1), (r2, c2)) = |r1 − r2| + |c1 − c2|.

  • More formally, we define a relaxed search problem as one where the relaxed edge costs are no larger than the original edge costs.
  • The relaxed heuristic is simply the future cost of the relaxed search problem, which by design should be efficiently computable.
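As a small illustration (a hypothetical helper, not part of the implementation below), the relaxed heuristic for the grid world is just the closed-form Manhattan distance:

```python
def manhattan_distance(s, s_end):
    """Relaxed heuristic: FutureCost of the grid problem with all walls removed."""
    (r1, c1), (r2, c2) = s, s_end
    return abs(r1 - r2) + abs(c1 - c2)

# From (0, 0) to (7, 6): |0 - 7| + |0 - 6| = 13
print(manhattan_distance((0, 0), (7, 6)))  # 13
```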
A* Search Algorithm
import heapq

# Data structure for supporting uniform cost search / A*.
class PriorityQueue:
    def __init__(self):
        self.DONE = -100000
        self.heap = []
        self.priorities = {}  # Map from state to priority

    # Insert |node|, or update its priority if it gives a cheaper path.
    # Return whether the heap was updated.
    def update(self, node):
        state, newPriority = node.position, node.f
        oldPriority = self.priorities.get(state)
        if oldPriority is None or newPriority < oldPriority:
            self.priorities[state] = newPriority
            heapq.heappush(self.heap, node)
            return True
        return False

    # Return the node with the minimum priority, or None if the queue is empty.
    def removeMin(self):
        while len(self.heap) > 0:
            node = heapq.heappop(self.heap)
            state = node.position
            if self.priorities[state] == self.DONE:
                continue  # Outdated priority, skip
            self.priorities[state] = self.DONE
            return node
        return None


class Node:
    """A node class for A* pathfinding"""

    def __init__(self, parent=None, position=None):
        self.parent = parent
        self.position = position
        self.g = 0  # distance from the start node
        self.h = 0  # heuristic: estimated distance to the end node
        self.f = 0  # total cost: f = g + h

    def __eq__(self, other):
        return self.position == other.position

    def __lt__(self, other):
        return self.f < other.f

    def __repr__(self):
        return "Node (position: {}, g: {}, h: {}, f: {})".format(self.position, self.g, self.h, self.f)


def astar(maze, start, end):
    """Returns a list of tuples as a path from the given start to the given end in the given maze"""

    # Create start and end node
    start_node = Node(None, start)
    end_node = Node(None, end)

    # Initialize the frontier with the start node
    frontier = PriorityQueue()
    frontier.update(start_node)

    # Loop until you find the end
    while True:

        # Get the current node
        current_node = frontier.removeMin()
        if current_node is None:
            return None  # Frontier exhausted: no path exists

        # Found the goal
        if current_node == end_node:
            path = []
            current = current_node
            while current is not None:
                path.append(current.position)
                current = current.parent
            return path[::-1]  # Return reversed path

        # Generate children
        for new_position in [(0, -1), (0, 1), (-1, 0), (1, 0),
                             (-1, -1), (-1, 1), (1, -1), (1, 1)]:  # Adjacent squares

            # Get node position
            node_position = (current_node.position[0] + new_position[0],
                             current_node.position[1] + new_position[1])

            # Make sure within range
            if (node_position[0] > len(maze) - 1 or node_position[0] < 0 or
                    node_position[1] > len(maze[0]) - 1 or node_position[1] < 0):
                continue

            # Make sure walkable terrain
            if maze[node_position[0]][node_position[1]] != 0:
                continue

            # Create the new node and its g, h, and f values.
            # Note: the squared distance used for h here is not admissible,
            # so the returned path is not guaranteed to be optimal.
            new_node = Node(current_node, node_position)
            new_node.g = current_node.g + 1
            new_node.h = ((new_node.position[0] - end_node.position[0]) ** 2 +
                          (new_node.position[1] - end_node.position[1]) ** 2)
            new_node.f = new_node.g + new_node.h

            frontier.update(new_node)


def main():
    maze = [[0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
            [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
            [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
            [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
            [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
            [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
            [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
            [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
            [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
            [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]

    start = (0, 0)
    end = (7, 6)

    path = astar(maze, start, end)
    print(path)


if __name__ == '__main__':
    main()

# [(0, 0), (1, 1), (2, 2), (3, 3), (4, 3), (5, 4), (6, 5), (7, 6)]
