AI - Search II | RUOCHI.AI

Review

Definition: search problem

\(S_{start}\): staring state
Actions(s): possible actions
Cons(s,a): action cost
Succ(s,a): successor
IsEnd(s): reached end state?

Objective: find the minimum cost path from \(S_start\) to an \(s\) that satisfying IsEnd(s).


class TransportationProblem(object):
    def __init__(self, N):
        # N = number of blocks
        self.N = N
    def startState(self):
        return 1
    def isEnd(self, state):
        return state == self.N

    def succAndCost(self, state):
        # return list of (action, newState, cost) triples
        result = []
        if state+1<=self.N:
            result.append(('walk', state+1, 1))
        if state*2<=self.N:
            result.append(('tram', state*2, 2))
        return result

Learning problem

Now suppose we don’t know what the costs are, but we observe someone getting from 1 to n via some sequence of walking and tram-taking. Can we figure out what the costs are? This is the goal of learning.

Learning Problem

Let’s cast the problem as predicting an output y given an input x. Here, the input x is the search problem (visualized as a search tree) without the costs provided. The output y is the desired solution path. The question is what the costs should be set to so that y is actually the minimum cost path of the resulting search problem.

Structured Perceptron

import sys
sys.setrecursionlimit(10000)

### Model (search problem)

class TransportationProblem(object):
    def __init__(self, N, weights):
        # N = number of blocks
        # weights = weights of different actions
        self.N = N
        self.weights = weights
    def startState(self):
        return 1
    def isEnd(self, state):
        return state == self.N
    def succAndCost(self, state):
        # return list of (action, newState, cost) triples
        result = []
        if state+1<=self.N:
            result.append(('walk', state+1, self.weights['walk']))
        if state*2<=self.N:
            result.append(('tram', state*2, self.weights['tram']))
        return result

def dynamicProgramming(problem):
    cache = {} # state -> futureCost(state), action, newState, cost
    def futureCost(state):
        # Base case
        if problem.isEnd(state):
            return 0
        if state in cache: # Exponential savings
            return cache[state][0]
        # Actually doing work
        result = min((cost+futureCost(newState), action, newState, cost) \
                for action, newState, cost in problem.succAndCost(state))
        cache[state] = result
        return result[0]

    state = problem.startState()
    totalCost = futureCost(state)

    # Recover history
    history = []
    while not problem.isEnd(state):
        _, action, newState, cost = cache[state]
        history.append((action, newState, cost))
        state = newState

    return (futureCost(problem.startState()), history)


### Main
def predict(N, weights):
    # f(x)
    # Input (x): N (number of blocks)
    # Output (y): path (sequence of actions)
    problem = TransportationProblem(N, weights)
    totalCost, history = dynamicProgramming(problem)
    return [action for action, newState, cost in history]

def generateExamples():
    trueWeights = {'walk': 1, 'tram': 5}
    return [(N, predict(N, trueWeights)) for N in range(1, 30)]

def structuredPerceptron(examples):
    weights = {'walk': 0, 'tram': 0}
    iteration = 100
    for t in range(iteration):
        numMistakes = 0
        for N, trueActions in examples:
            # Make a prediction
            predActions = predict(N, weights)
            if predActions != trueActions:
                numMistakes += 1
            # Update weights
            for action in trueActions:
                weights[action] -= 1
            for action in predActions:
                weights[action] += 1
        print('Iteration {}, numMistakes = {}, weights = {}'.format(t, numMistakes, weights))
        if numMistakes == 0:
            break

examples = generateExamples()
print('Training dataset:')
for example in examples:
    print('  ', example)
structuredPerceptron(examples)

A Star Search

Exploring states

UCS: explore states in order of PastCost(s)
A start: explore in order of PastCost(s) + h(s) - A heuristic h(s) is any estimate of FutureCost(s).

First, some terminology: PastCost(s) is the minimum cost from the start state to s, and FutureCost(s) is the minimum cost from s to an end state. Without loss of generality, we can just assume we have one end state. (If we have multiple ones, create a new official goal state which is the successor of all the original end states.)
Recall that UCS explores states in order of PastCost(s). It’d be nice if we could explore states in order of PastCost(s) + FutureCost(s), which would definitely take the end state into account, but computing FutureCost(s) would be as expensive as solving the original problem.
A star relies on a heuristic h(s), which is an estimate of FutureCost(s). For A star to work, h(s) must satisfy some conditions, but for now, just think of h(s) as an approximation. We will soon show that A star will explore states in order of PastCost(s) + h(s). This is nice, because now states which are estimated (by h(s)) to be really far away from the end state will be explored later, even if their PastCost(s) is small.

F = G + H

One important aspect of A star is f = g + h. The f, g, and h variables are in our Node class and get calculated every time we create a new node. Quickly I’ll go over what these variables mean.

F is the total cost of the node.
G is the distance between the current node and the start node.
H is the heuristic — estimated distance from the current node to the end node.

Consistent heuristics

We need h(s) to be consistent, which means two things. First, the modified edge costs are non-negative (this is the main property). This is important for UCS to find the minimum cost path (remember that UCS only works when all the edge costs are non-negative).
Second, h(send) = 0, which is just saying: be reasonable. The minimum cost from the end state to the end state is trivially 0, so just use 0.

Correctness of A*

If h is consistent, A star returns the minimum cost path.

Admissibility

A heuristic h(s) is admissible if

\[h(s) \leq FutureCost(s)\]

Theorem: consistency implies admissibility

If a heuristic h(s) is consistent, then h(s) is admissible.

Relaxation

With an arbitrary configuration of walls, we can’t compute FutureCost(s) except by doing search. However, if we just relaxed the original problem by removing the walls, then we can compute FutureCost(s) in closed form: it’s just the Manhattan distance between \(s\) and \(s_{end}\). Specifically, ManhattanDistance((r1, c1), (r2, c2)) = |r1 − r2| + |c1 − c2|.

More formally, we define a relaxed search problem as one where the relaxed edge costs are no larger than the original edge costs.
The relaxed heuristic is simply the future cost of the relaxed search problem, which by design should be efficiently computable.

A star Search Algorithm

import heapq, collections, re, sys, time, os, random

# Data structure for supporting uniform cost search.
class PriorityQueue:
    def  __init__(self):
        self.DONE = -100000
        self.heap = []
        self.priorities = {}  # Map from state to priority

    def update(self, node):
        state, newPriority = node.position, node.f
        oldPriority = self.priorities.get(state)
        if oldPriority == None or newPriority < oldPriority:
            self.priorities[state] = newPriority
            heapq.heappush(self.heap, node)
            return True
        return False

    def removeMin(self):
        while len(self.heap) > 0:
            node = heapq.heappop(self.heap)
            priority, state = node.f, node.position
            if self.priorities[state] == self.DONE: 
                continue  # Outdated priority, skip
            self.priorities[state] = self.DONE
            return node
        return None


class Node():
    """A node class for A* Pathfinding"""

    def __init__(self, parent=None, position=None):
        self.parent = parent
        self.position = position

        self.g = 0
        self.h = 0
        self.f = 0

    def __eq__(self, other):
        return self.position == other.position

    def __lt__(self, other):
        return self.f < other.f

    def __repr__(self):
        return "Node (position: {}, g: {}, h: {}, f: {})".format(self.position, self.g, self.h, self.f)


def astar(maze, start, end):
    """Returns a list of tuples as a path from the given start to the given end in the given maze"""

    # Create start and end node
    start_node = Node(None, start)
    start_node.g = start_node.h = start_node.f = 0
    end_node = Node(None, end)
    end_node.g = end_node.h = end_node.f = 0

    # Initialize both open and closed list
    frontier = PriorityQueue()
    frontier.update(start_node)

    # Loop until you find the end
    while True:

        # Get the current node
        current_node = frontier.removeMin()

        # Found the goal
        if current_node == end_node:
            path = []
            current = current_node
            while current is not None:
                path.append(current.position)
                current = current.parent
            return path[::-1] # Return reversed path


        # Generate children
        children = []
        for new_position in [(0, -1), (0, 1), (-1, 0), (1, 0), (-1, -1), (-1, 1), (1, -1), (1, 1)]: # Adjacent squares

            # Get node position
            node_position = (current_node.position[0] + new_position[0], current_node.position[1] + new_position[1])

            # Make sure within range
            if node_position[0] > (len(maze) - 1) or node_position[0] < 0 or node_position[1] > (len(maze[len(maze)-1]) -1) or node_position[1] < 0:
                continue

            # Make sure walkable terrain
            if maze[node_position[0]][node_position[1]] != 0:
                continue

            # Create new node
            new_node = Node(current_node, node_position)


            # Create the f, g, and h values
            new_node.g = current_node.g + 1
            new_node.h = ((new_node.position[0] - end_node.position[0]) ** 2) + ((new_node.position[1] - end_node.position[1]) ** 2)
            new_node.f = new_node.g + new_node.h

            frontier.update(new_node)
         


def main():

    maze = [[0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
            [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
            [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
            [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
            [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
            [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
            [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
            [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
            [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
            [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]

    start = (0, 0)
    end = (7, 6)

    path = astar(maze, start, end)
    print(path)


if __name__ == '__main__':
    main()  

    # [(0, 0), (1, 1), (2, 2), (3, 3), (4, 3), (5, 4), (6, 5), (7, 6)]

Reference

https://medium.com/@nicholas.w.swift/easy-a-star-pathfinding-7e6689c7f7b2
CS221 course note and ppt https://stanford-cs221.github.io/autumn2019/