Week one: Introduction to Computational Thinking and Data Science(1)
-
Optimization and the Knapsack Problem
-
0/1 Knapsack Problems
-
Optimization Model
- An objective function that is to be maximized or minimized:
- Minimize time spent traveling from NY to Boston
- A set of constraints (possibly empty) that must be honored:
- Can not spend more than $100
- Must be in Boston before 5:00 p.m.
- An objective function that is to be maximized or minimized:
-
Knapsack problem
-
variables:
I[<value,weight>]
-
L
: the set of available itmes -
V
: value(0,1) to indicate whether or not items are taken. -
Find a L that maximizes(both):
\(\sum_{i=0}^{n-1}V[i]*I[i].value\)
\[\sum_{i=0}^{n-1}V[i]*I[i].weight<W limit\]
-
-
-
Greedy Algorithms
-
put the best available item in knapsack
-
best mean:
- Most valuable?
- Least expensive?
- Highest value/units?
-
# -*- coding: utf-8 -*- """ Created on Sat Mar 30 11:31:52 2019 @author: Administrator """ class Food(object): def __init__(self, item, value, calories): self.item = item self.value = value self.calories = calories def getValue(self): return self.value def getCalories(self): return self.calories def density(self): return self.getValue()/self.getCalories() def __str__(self): return self.item +':<' + str(self.value)+','+ str(self.calories) + '>' def buildMenu(items, values, calories): menu=[] for i in range(len(values)): menu.append(Food(items[i],values[i],calories[i])) return menu def greedy(items, maxCost, keyFunction): itemsCopy = sorted(items, key= keyFunction, reverse = True) result = [] totalValue, totalCost = 0.0, 0.0 for i in range(len(itemsCopy)): if (totalCost + itemsCopy[i].getCalories()) <= maxCost: result.append(itemsCopy[i]) totalValue += itemsCopy[i].getValue() totalCost += itemsCopy[i].getCalories() return (result, totalValue) def testGreedy(items, constraint, keyFunction): taken, val = greedy(items, constraint, keyFunction) print('Total value of items taken =', val) for item in taken: print('', item) def testGreedys(foods, maxUnits): print('Use greedy by value to allocate', maxUnits, 'calories') testGreedy(foods, maxUnits, Food.getValue) print('\nUse greedy by cost to allocate', maxUnits, 'calories') testGreedy(foods, maxUnits, lambda x: 1/Food.getCalories(x)) print('\nUse greedy by density to allocate', maxUnits, 'calories') testGreedy(foods, maxUnits, Food.density) items = ['wine', 'beer', 'pizza', 'burger', 'fries', 'cola', 'apple', 'donut', 'cake'] values = [89,90,95,100,90,79,50,10] calories = [123,154,258,354,365,150,95,195] foods = buildMenu(items, values, calories) testGreedys(foods, 1000)
-
lambda
-
lambda<id1,id2,…idn>:
-
lambda x: x-1 Out[1]: <function __main__.<lambda>(x)> f1 = lambda x : x-1 f1(3) Out[3]: 2 f2 = lambda x,y: x+y f2(3,4) Out[5]: 7 f3 = lambda x,y: 'factor' if (x%y ==0) else 'not factor' f3(5,8) Out[7]: 'not factor' f3(3,9) Out[8]: 'not factor' f3(9,3) Out[9]: 'factor'
-
-
-
-
Decision Trees and Dynamic Programming
-
Brute Force Algorithms
-
Search Tree: run through all 0/1 possibility
-
0/1 Knapsack problem
def maxValue(listitems, avail): if listitems == [] or avail == 0: result= (0,()) elif listitems[0].getCost() > avail: #fisrt item exceeds the calorie limit result = maxValue(listitems[1:], avail) #set the constraints else: item = listitems[0] # take the first item out totalleftValue, lefttaken = maxValue(listitems[1:], avail - item.getCost()) # take the left branch totalleftValue += item.getValue() totalrightValue, righttaken = maxValue(listitems[1:], avail) # take the right branch recursively if totalleftValue > totalrightValue: # top-down to the deepest level and compare value result = (totalleftValue, lefttaken +(item,)) # return the best approach with the first item else: result = (totalrightValue, righttaken) # return the best aprroach without the first item return result def testmaxValue(foods, maxUnits): print('Use search tree to allocate', maxUnits, 'calories') val, taken = maxValue(foods, maxUnits) print('Total value of items taken =', val) for item in taken: print(' ', item)
-
# generate all combinations of N items def powerSet(items): N = len(items) # enumerate the 2**N possible combinations for i in range(2**N): combo = [] for j in range(N): # test bit jth of integer i if (i >> j) % 2 == 1: combo.append(items[j]) yield combo def yieldAllCombos(items): """ Generates all combinations of N items into two bags, whereby each item is in one or zero bags. Yields a tuple, (bag1, bag2), where each bag is represented as a list of which item(s) are in each bag. """ N = len(items) for i in range(3**N): bag1 = [] bag2 = [] for j in range(N): if (i // 3**j) % 3 == 0: bag1.append(items[j]) if (i // 3**j) % 3 == 1: bag2.append(items[j]) yield bag1, bag2
This is a hard exercise indeed. But things will be much easier when you know the following.
Let’s assume we have 8 digits, each digit can hold either 1 or 0, how many combinations can be made ? .. let’s draw a table :
d8 d7 d6 d5 d4 d3 d2 d1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 1 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 1 .. .. .. 1 1 1 1 1 1 1 1
The number or rows (which is the same as the number of different combinations) is equal to 2**8 (8 is the number of digits).
So, basically .. until now we are just talking about the number of combinations. However, this is well related to binary numbering system.
In binary system, a number is represented in 1’s and 0’s. The digits have weights which mean, the number under d8 in the previous example will be multiplied by 128 (so if under d8 there is 1, it will worth 128, otherwise it will worth 0) .. and the number under d7 will be multiplied by 64 .. the multiplicands are powers of 2 and are decreasing from left to right .. until d1 which is multiplied by 1 (20). So what we get if all the 8 digits are 1’s ? 1128 + 164 + 132 + 116 + 18 + 14 + 12 + 11 = 255. So, how many numbers can be represented in those 8 digits ? actually it’s the same questions as “how many combinations can be made using 8 digits” and the answer is 28.
If you understand what I have said so far, then the rest will be easy.
The code in powerSet make use of this similarity between binary numbers and the idea of the number of combinations. So, if we have N items, we could have 2**N different combinations, we can also represent each combination using a different binary number represented in N digits.
So if the items are pizza, cheese, apple, water, juice, meat, egg, rice we can write the table again but a little bit different
pizza cheese apple water juice meat egg rice 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 1 0 0 0 0 0 1 0 0 ... ... 1 1 1 1 1 1 1 1
1 means that we will take that item, 0 means that we won’t take it. At the same time, we can represent each different combination by it’s corresponding binary value. So taking “rice” only is equal to 1, taking rice and egg is equal to 3 .. and etc.
So what does the code says ? First, it iterates through all 2*N possible combinations (possible binary numbers) Second, for each combination of them (represented in its binary form) check for the 1’s inside the number and add the corresponding items to the *combo list. But how does it do it ? In a very smart way.
The shift operator » shifts all digits to the right X times, where X is the number on the right of the » operator. For example, 8 (which is equal to 1000) » 1, will shift the 4 digits 1 step to the right to be (0100) which is equal to 4. Even numbers in binary always have 0 as the first digit on the right whereas odd numbers have 1 as the first number on the right. So, 8 (1000) has 0 on the right .. when we shift it right by 1 .. 4(0100) has 0 on the right, 2(0010) has 0 on the right .. 1(0001) has 1 on the right. So, if we check if the number is odd or not, we can know if there is 1 on the right or not.
Using this idea, the code tries to shift each different combination by numbers from 0 to N, each time it checks if it’s odd or not by checking if there’s a remainder of 1 (if (i » j) % 2 == 1) .. if so, then the number of times we shifted the current combination (j in this case) is the original position of the 1 in the combination .. which we can use as an index for the corresponding item we want to take.
So if the current combination is 1000 (8) and by shifting it 3 times to the right we have 0001 which is odd, we know that 1 in 1000 is the 3rd bit (starting from 0 from the right to the left). if the list of items are [rice, meat, egg, juice] then items[3] is juice.
So, for each different combination, we use the binary representation of that combination and use the previous method to search for the 1’s in the number and add the corresponding items to the list of the items to be taken.
It’s quite long and complex especially for those who never got exposed to binary numbers before .. but this is the trick.
The solution to the exercise is quite tricky .. I’m not going to post anything about it here but the idea is so similar but is still tricky.
Good luck.
-
-
Recursive Fib
-
Recursive Fibonnaci
def fib(n): if n == 0 or n == 1: return 1 else: return fib(n-1) + fib(n-2)
-
Without a dictionary, it takes a long time:
def testmaxValue(foods, maxUnits): print('Use search tree to allocate', maxUnits, 'calories') val, taken = maxValue(foods, maxUnits) print('Total value of items taken =', val) for item in taken: print(' ', item) import random def buildLargeMenu(numItems, maxVal, maxCost): items = [] for i in range(numItems): items.append(Food(str(i),random.randint(1,maxVal), random.randint(1,maxCost))) return items for numItems in (5, 10, 15, 20, 25, 30, 35, 40, 45): print('Try a menu with', numItems, 'items') items = buildLargeMenu(numItems, 90, 250) testmaxValue(items, 750)
-
-
Dynamic Programming
-
Fibonnaci with dictionary
def fib_efficient(n,d): if n in d: return d[n] else: ans = fib_efficient(n-1,d) + fib_efficient(n-2,d) d[n] = ans return ans d = {1:1, 2:2}
-
memoization
-
Trade a time for space
-
Create a dictionary to record what we’ve done.
-
Prerequisite:
-
Optimal substructure: a globally optimal solution to local subproblems(recursive calls)
-
Overlapping subproblems: solving the same problem multiple times or the computation takes the same path.(available weight)
-
-
-
Knapsack 0/1 Problem Exercise
import random def buildLargeMenu(numItems, maxVal, maxCost): items = [] for i in range(numItems): items.append(Food(str(i),random.randint(1,maxVal), random.randint(1,maxCost))) return items def fastMaxVal(toConsider, avail, memo={}): if (len(toConsider), avail) in memo: #(items left to be considered, available weight) result = memo[(len(toConsider), avail)] elif toConsider == [] or avail == 0: result = ((),0) elif toConsider[0].getCost() > avail: result = fastMaxVal(toConsider[1:], avail, memo) else: item = toConsider[0] taken, values = fastMaxVal(toConsider[1:], avail-item.getCost(), memo) values += item.getValue() nottaken, notvalues = fastMaxVal(toConsider[1:], avail, memo) if values > notvalues: result = (taken +(item,), values) else: result = (nottaken, notvalues) memo[(len(toConsider), avail)] = result return result def testMaxVal(foods, maxUnits, algorithm): print('Menu contains', len(foods), 'items') print('Use search tree to allocate', maxUnits, 'calories') taken,val = algorithm(foods, maxUnits) print('Total value of items taken =', val) for item in taken: print(' ', item) for numItems in (5, 10, 15, 20, 25, 30, 35, 40, 45, 50,900): items = buildLargeMenu(numItems, 90, 250) testMaxVal(items, 750, fastMaxVal)
-
-
-
Graph Problems
-
Graph Model
- Set of nodes (vertices) ◦ Might have properties associated with them
- Set of edges (arcs) each consisting of a pair of nodes ◦ Undirected (graph) ◦ Directed (digraph) ◦ Source (parent) and destination (child) nodes ◦ Unweighted or weighted
-
Path Optimization
-
Adjacency matrix
- Rows: source nodes
- Columns: destination nodes
- Cell[s,d] = 1/values/lists[] if there is an edge from s to d, otherwise 0
-
Adjacency list
- Associate with each node a list of destination nodes
class Node(object): def __init__(self,name): self.name = name def getName(self): return self.name def __str__(self): return self.name class Edge(object): def __init__(self, src, dest): self.src = src self.dest = dest def getSource(self): return self.src def getDestination(self): return self.dest def __str__(self): return self.src.getName() + '->' + self.dest.getName() class Digraph(object): def __init__(self): self.edges = {} #a dictionary maps values of node to lists of edge def addNode(self, node): self.edges[node] = [] def addEdge(self, edge): src = edge.getSource() dest = edge.getDestination() self.edges[src].append(dest) def childrenOf(self, node): return self.edges[node] def getNode(self, name): for n in self.edges: if n.getName() == name: return n raise NameError(name) def __str__(self): result = '' for src in self.edges: for dest in self.edges[src]: result = result + src.getName() + '->'\ + dest.getName() + '\n' return result[:-1] #omit final newline class Graph(Digraph): def addEdge(self, edge): rev = Edge(edge.getDestination(), edge.getSource()) Digraph.addEdge(self, rev) def buildCityGraph(graphType): g = graphType() for name in ('Boston', 'Providence', 'New York', 'Chicago', 'Denver', 'Phoenix', 'Los Angeles'): #Create 7 nodes g.addNode(Node(name)) g.addEdge(Edge(g.getNode('Boston'), g.getNode('Providence'))) g.addEdge(Edge(g.getNode('Boston'), g.getNode('New York'))) g.addEdge(Edge(g.getNode('Providence'), g.getNode('Boston'))) g.addEdge(Edge(g.getNode('Providence'), g.getNode('New York'))) g.addEdge(Edge(g.getNode('New York'), g.getNode('Chicago'))) g.addEdge(Edge(g.getNode('Chicago'), g.getNode('Phoenix'))) g.addEdge(Edge(g.getNode('Chicago'), g.getNode('Denver'))) g.addEdge(Edge(g.getNode('Denver'), g.getNode('Phoenix'))) g.addEdge(Edge(g.getNode('Denver'), g.getNode('New York'))) g.addEdge(Edge(g.getNode('Los Angeles'), g.getNode('Boston'))) return g print(buildCityGraph(Graph))
-
Shortest Path
-
Depth-first Search(DFS)
- Similar to left-first depth-first method of tree search
- MAIN Difference: have cycles
class Node(object): def __init__(self,name): self.name = name def getName(self): return self.name def __str__(self): return self.name class Edge(object): def __init__(self, src, dest): self.src = src self.dest = dest def getSource(self): return self.src def getDestination(self): return self.dest def __str__(self): return self.src.getName() + '->' + self.dest.getName() class Digraph(object): def __init__(self): self.edges = {} #a dictionary maps values of node to lists of edge def addNode(self, node): self.edges[node] = [] def addEdge(self, edge): src = edge.getSource() dest = edge.getDestination() self.edges[src].append(dest) def childrenOf(self, node): return self.edges[node] def getNode(self, name): for n in self.edges: if n.getName() == name: return n raise NameError(name) def __str__(self): result = '' for src in self.edges: for dest in self.edges[src]: result = result + src.getName() + '->'\ + dest.getName() + '\n' return result[:-1] #omit final newline class Graph(Digraph): def addEdge(self, edge): rev = Edge(edge.getDestination(), edge.getSource()) Digraph.addEdge(self, rev) def buildCityGraph(graphType): g = graphType() for name in ('Boston', 'Providence', 'New York', 'Chicago', 'Denver', 'Phoenix', 'Los Angeles'): #Create 7 nodes g.addNode(Node(name)) g.addEdge(Edge(g.getNode('Boston'), g.getNode('Providence'))) g.addEdge(Edge(g.getNode('Boston'), g.getNode('New York'))) g.addEdge(Edge(g.getNode('Providence'), g.getNode('Boston'))) g.addEdge(Edge(g.getNode('Providence'), g.getNode('New York'))) g.addEdge(Edge(g.getNode('New York'), g.getNode('Chicago'))) g.addEdge(Edge(g.getNode('Chicago'), g.getNode('Phoenix'))) g.addEdge(Edge(g.getNode('Chicago'), g.getNode('Denver'))) g.addEdge(Edge(g.getNode('Denver'), g.getNode('Phoenix'))) g.addEdge(Edge(g.getNode('Denver'), g.getNode('New York'))) g.addEdge(Edge(g.getNode('Los Angeles'), g.getNode('Boston'))) return g def printPath(path): """Assumes path is a list of nodes""" result = '' for i in range(len(path)): result = result + str(path[i]) if i != len(path) - 1: result = result + '->' return result def DFS(graph, start, end, path, shortest, toPrint = False): path = path + [start] if toPrint: print('Current DFS path:', printPath(path)) if start == end: return path for node in graph.childrenOf(start): if node not in path: # avoid cycles if shortest == None or len(path) < len(shortest): newPath = DFS(graph, node, end, path, shortest, toPrint) if newPath != None: shortest = newPath elif toPrint: print('Already visited', node) return shortest def shortestPath(graph, start, end, toPrint = False): return DFS(graph, start, end, [], None, toPrint) def testSP(source, destination): g = buildCityGraph(Digraph) sp = shortestPath(g, g.getNode(source), g.getNode(destination), toPrint = True) if sp != None: print('Shortest path from', source, 'to', destination, 'is' , printPath(sp)) else: print('There is no path from', source, 'to', destination) testSP('Boston', 'Phoenix')
-
Breadth-first Search(BFS)
- store paths by using temppaths
- cannot minimize the sum of the weights of the edges
def BFS(graph, start, end, toPrint = False): initPath = [start] pathQueue = [initPath] # stored possible breadth pathes while len(pathQueue) != 0: tmpPath = pathQueue.pop(0) #for each path, add nextNode till lastNode == end print('Current BFS path:', printPath(tmpPath)) lastNode = tmpPath[-1] if lastNode == end: return tmpPath else: for nextNode in graph.childrenOf(lastNode): newPath = tmpPath + [nextNode] pathQueue.append(newPath) return None def shortestPath(graph, start, end, toPrint = False): """Assumes graph is a Digraph; start and end are nodes Returns a shortest path from start to end in graph""" return BFS(graph, start, end, toPrint) testSP('Boston', 'Phoenix')
-
-
-