base.html
{% extends "bootstrap/base.html" %}
index.html
{% extends "base.html" %}
docker run -p 8502:8501 --name=pets -v "/home/models/pets/:/models/pets/1" -e MODEL_NAME=pets tensorflow/serving
This mounts the model directory /home/models/pets/ on your desktop to the path /models/pets/1 inside the container. Port 8501 is the REST API port that TensorFlow Serving listens on inside the container; you can change 8502 to any free port you want to use on the host.
We use Flask to process HTTP requests and do model inference in Docker.
import os
{% extends "base.html" %}
To create a compound type, there are really only three essential building blocks. Any decent programming language provides these building blocks in some way:
datatype bindings
. In object-oriented languages with classes like Java, one-of types are achieved with subclassing, but that is a topic for much later in the course. Record types are “each-of” types where each component is a named field.
{foo : int, bar : int*bool, baz : bool*int}
In ML, we do not have to declare that we want a record type with particular field names and field types — we just write down a record expression and the typechecker gives it the right type.
Now that we know how to build record values, we need a way to access their pieces. For now, we will use #foo e, where foo is a field name.
In fact, this is how ML actually defines tuples: A tuple is a record. That is, all the syntax for tuples is just a convenient way to write down and use records. The REPL just always uses the tuple syntax where possible, so if you evaluate {2=1+2, 1=3+4} it will print the result as (7,3). Using the tuple syntax is better style, but we did not need to give tuples their own semantics: we can instead use the “another way of writing” rules above and then reuse the semantics for records.
This is the first of many examples we will see of syntactic sugar
. We say, Tuples are just syntactic sugar for records with fields named 1, 2, …, n.
val z = (3,7) : int * int
We now introduce datatype bindings, our third kind of binding after variable bindings and function bindings.
datatype mytype = TwoInts of int * int | Str of string | Pizza
Roughly, this defines a new type whose values contain an int * int, or a string, or nothing. Any value will also be tagged with information that lets us know which variant it is: these tags, which we will call constructors, are TwoInts, Str, and Pizza.
A constructor is two different things. First, it is either a function for creating values of the new type (if the variant has of t for some type t) or it is actually a value of the new type (otherwise). In our example, TwoInts is a function of type int * int -> mytype, Str is a function of type string -> mytype, and Pizza is a value of type mytype. Second, we use constructors in case-expressions as described further below.
datatype mytype = TwoInts of int * int | Str of string | Pizza
Directed graphical models (a.k.a. Bayesian networks) are a family of probability distributions that admit a compact parametrization that can be naturally described using a directed graph.
The general idea behind this parametrization is surprisingly simple. Recall that by the chain rule, we can write any probability $p$ as:
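The chain-rule factorization referred to here is:

```latex
p(x_1, x_2, \ldots, x_n) = p(x_1)\, p(x_2 \mid x_1) \cdots p(x_n \mid x_{n-1}, \ldots, x_1)
```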
A compact Bayesian network is a distribution in which each factor on the right hand side depends only on a small number of ancestor variables $x_{A_i}$:
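That is, each factor takes the form:

```latex
p(x_i \mid x_{i-1}, \ldots, x_1) = p(x_i \mid x_{A_i})
```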
For example, in a model with five variables, we may choose to approximate the factor $p(x_5 \mid x_4, x_3, x_2, x_1)$ with $p(x_5 \mid x_4, x_3)$. In this case, we write $x_{A_5} = \{x_4, x_3\}$.
As an example, consider a model of a student’s grade on an exam. This grade depends on the exam’s difficulty $d$ and the student’s intelligence $i$; it also affects the quality $l$ of the reference letter from the professor who taught the course. The student’s intelligence $i$ affects the SAT score $s$ as well. Each variable is binary, except for $g$, which takes 3 possible values.
The joint probability distribution over the 5 variables naturally factorizes as follows:
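Given the dependencies just described (the grade $g$ depends on $d$ and $i$, the letter $l$ depends only on $g$, and the SAT score $s$ depends only on $i$), the factorization is:

```latex
p(l, g, i, d, s) = p(l \mid g)\, p(g \mid i, d)\, p(i)\, p(d)\, p(s \mid i)
```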
The graphical representation of this distribution is a DAG that visually specifies how random variables depend on each other. The graph clearly indicates that the letter depends on the grade, which in turn depends on the student’s intelligence and the difficulty of the exam.
Another way to interpret directed graphs is in terms of stories for how the data was generated. In the above example, to determine the quality of the reference letter, we may first sample an intelligence level and an exam difficulty; then, a student’s grade is sampled given these parameters; finally, the recommendation letter is generated based on that grade.
Formally, a Bayesian network is a directed graph $G = (V,E)$ together with a random variable $x_i$ for each node $i \in V$ and one conditional probability distribution (CPD) $p(x_i \mid x_{A_i})$ per node, specifying the probability of $x_i$ conditioned on its parents’ values.
Thus, a Bayesian network defines a probability distribution $p$. Conversely, we say that a probability $p$ factorizes over a DAG $G$ if it can be decomposed into a product of factors, as specified by $G$.
It is not hard to see that a probability represented by a Bayesian network will be valid: clearly, it will be nonnegative, and one can show using an induction argument (and using the fact that the CPDs are valid probabilities) that the sum over all variable assignments will be one. Conversely, we can also show by counterexample that when $G$ contains cycles, its associated probability may not sum to one.
To summarize, Bayesian networks represent probability distributions that can be formed via products of smaller, local conditional probability distributions (one for each variable). By expressing a probability in this form, we are introducing into our model assumptions that certain variables are independent.
This raises the question: which independence assumptions are we exactly making by using a Bayesian network model with a given structure described by $G$? This question is important for two reasons: we should know precisely what model assumptions we are making (and whether they are correct); also, this information will help us design more efficient inference algorithms later on.
Let us use the notation $I(p)$ to denote the set of all independencies that hold for a joint distribution $p$. For example, if $p(x,y) = p(x) p(y)$, then we say that $x \perp y \in I(p)$.
It turns out that a Bayesian network $p$ very elegantly describes many independencies in $I(p)$; these independencies can be recovered from the graph by looking at three types of structures.
For simplicity, let’s start by looking at a Bayes net $G$ with three nodes: $A$, $B$, and $C$. In this case, $G$ essentially has only three possible structures, each of which leads to different independence assumptions. The interested reader can easily prove these results using a bit of algebra.
Common parent. If $G$ is of the form $A \leftarrow B \rightarrow C$, and $B$ is observed, then $A \perp C \mid B$. However, if $B$ is unobserved, then $A \not\perp C$. Intuitively this stems from the fact that $B$ contains all the information that determines the outcomes of $A$ and $C$; once it is observed, there is nothing else that affects these variables’ outcomes.
Cascade. If $G$ equals $A \rightarrow B \rightarrow C$, and $B$ is again observed, then, again, $A \perp C \mid B$. However, if $B$ is unobserved, then $A \not\perp C$. Here, the intuition is again that $B$ holds all the information that determines the outcome of $C$; thus, it does not matter what value $A$ takes.
V-structure (explaining away). If $G$ is $A \rightarrow C \leftarrow B$, then knowing a common effect couples its causes: $A \perp B$ when $C$ is unobserved, but $A \not\perp B \mid C$ when $C$ is observed.
The latter case requires additional explanation. Suppose that $C$ is a Boolean variable that indicates whether our lawn is wet one morning; $A$ and $B$ are two explanations for it being wet: either it rained (indicated by $A$), or the sprinkler turned on (indicated by $B$). If we know that the grass is wet ($C$ is true) and the sprinkler didn’t go on ($B$ is false), then the probability that $A$ is true must be one, because that is the only other possible explanation. Hence, $A$ and $B$ are not independent given $C$.
These structures clearly describe the independencies encoded by a three-variable Bayesian net. We can extend them to general networks by applying them recursively over any larger graph. This leads to a notion called $d$-separation (where $d$ stands for directed).
Let $Q$, $W$, and $O$ be three sets of nodes in a Bayesian network $G$. We say that $Q$ and $W$ are $d$-separated given $O$ (i.e. the variables $O$ are observed) if $Q$ and $W$ are not connected by an active path. An undirected path in $G$ is called active given observed variables $O$ if for every consecutive triple of variables $X, Y, Z$ on the path, one of the following holds:
In other words: a trail $X_1, \cdots, X_n$ is active given $Z$ if:
For example, in the graph below, $X_1$ and $X_6$ are $d$-separated given $X_2, X_3$. However, $X_2, X_3$ are not $d$-separated given $X_1, X_6$, because we can find an active path $(X_2, X_6, X_5, X_3)$: the V-structure at $X_6$ is activated when $X_6$ is observed.
The notion of $d$-separation is useful because it lets us describe a large fraction of the independencies that hold in our model. Let $I(G) = \{(X \perp Y \mid Z) : \text{$X,Y$ are $d$-separated in $G$ given $Z$}\}$ be the set of independencies implied by $d$-separation in $G$.
If $p$ factorizes over $G$, then $I(G) \subseteq I(p)$. In this case, we say that $G$ is an $I$-map (independence map) for $p$.
In other words, all the independencies encoded in $G$ are sound: variables that are $d$separated in $G$ are truly independent in $p$. However, the converse is not true: a distribution may factorize over $G$, yet have independencies that are not captured in $G$.
In a way this is almost a trivial statement. If $p(x,y) = p(x)p(y)$, then this distribution still factorizes over the graph $y \rightarrow x$, since we can always write it as $p(x,y) = p(x\mid y)p(y)$ with a CPD $p(x\mid y)$ in which the probability of $x$ does not actually vary with $y$. However, we can construct a graph that matches the structure of $p$ by simply removing that unnecessary edge.
Two equivalent views of graph structure:
From the graph,
Then, we can get
Therefore, the ratio of the two classes is:
Introduce the Bernoulli distribution (or other distributions) to calculate the probabilities.
import numpy as np
def uniform(x, a, b):
s = np.random.uniform(0,1,1000)
def bernoulli(p, k):
for k = 0, 1, 2, …, n, where
def const(n, r):
s = np.random.binomial(10, 0.8, 1000)
where $[x = i]$ evaluates to 1 if $x = i$, 0 otherwise. There are various advantages of this formulation
def categorical(p, k):
for nonnegative integers $x_1, \cdots, x_k$.
The probability mass function can be expressed using the gamma function as:
def factorial(n):
np.random.multinomial(20, [1/6.]*6, size=10)
array([[1, 1, 2, 3, 8, 5], [2, 4, 3, 3, 6, 2], [1, 6, 3, 2, 3, 5], [5, 3, 4, 4, 2, 2], [3, 8, 4, 2, 0, 3], [2, 4, 1, 5, 1, 7], [6, 3, 2, 4, 3, 2], [8, 2, 1, 1, 4, 4], [3, 6, 4, 1, 4, 2], [3, 2, 3, 3, 6, 3]])
def gamma_function(n):
s = np.random.beta(2, 5, size=1000)
The gamma distribution is related to the beta distribution: if $X \sim \mathrm{Gamma}(a,1)$ and $Y \sim \mathrm{Gamma}(b,1)$ are independent, then $\frac{X}{X+Y} \sim \mathrm{Beta}(a,b)$.
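This relationship can be checked empirically. The sketch below (sample size chosen arbitrarily) compares the empirical means of the gamma ratio and of direct Beta samples against the known Beta mean $a/(a+b)$:

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = 2.0, 5.0
n = 200_000

# Independent Gamma(a,1) and Gamma(b,1) samples.
x = rng.gamma(a, 1.0, size=n)
y = rng.gamma(b, 1.0, size=n)

ratio = x / (x + y)            # should follow Beta(a, b)
beta = rng.beta(a, b, size=n)  # direct Beta(a, b) samples

# Both empirical means should be close to a/(a+b).
print(ratio.mean(), beta.mean(), a / (a + b))
```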
The exponential distribution and chi-squared distribution are special cases of the gamma distribution.
A random variable X that is gamma-distributed with shape α and rate β is denoted:
The corresponding probability density function in the shaperate parametrization is:
def gamma_function(n):
a, b = 2., 2.
where $\{x_k\}_{k=1}^{k=K}$ belong to the standard $(K-1)$-simplex, or in other words:
The normalizing constant is the multivariate beta function, which can be expressed in terms of the gamma function
The Dirichlet distribution can be viewed as a distribution over distributions. To understand this, consider an example: suppose we have a six-sided die with faces {1,2,3,4,5,6}. We roll it 10000 times and find that the six faces come up {2000,2000,2000,2000,1000,1000} times. If we estimate each face’s probability by the ratio of its count to the total number of rolls, we get face probabilities {0.2,0.2,0.2,0.2,0.1,0.1}. Not satisfied with this, we now run 10000 experiments, rolling the die 10000 times in each, and ask: what is the probability that the six faces come up with probabilities {0.2,0.2,0.2,0.2,0.1,0.1}? (The next experiment’s estimated probabilities might well be {0.1,0.1,0.2,0.2,0.2,0.2}.) We are now thinking about a distribution over the distribution of the die’s faces, and that distribution is the Dirichlet distribution.
def normalization(x, s):
s = np.random.dirichlet((0.2, 0.3, 0.5), 20).transpose()
s.shape
(3, 20)
plt.barh(range(20), s[0])
def exponential(x, lamb):
s = np.random.exponential(scale = 0.5, size=1000)
def gaussian(x, n):
mu, sigma = 0, 0.1 # mean and standard deviation
s = np.random.poisson(5, 10000)
If $Z_1, \cdots, Z_k$ are independent, standard normal random variables, then the sum of their squares,
is distributed according to the chi-square distribution with $k$ degrees of freedom. This is usually denoted as
The chi-square distribution has one parameter: a positive integer $k$ that specifies the number of degrees of freedom (the number of $Z_i$’s).
The probability density function (pdf) of the chi-square distribution is
def gamma_function(n):
s = np.random.chisquare(4,10000)
Let $X_1, \cdots, X_n$ be independent and identically distributed as $N(\mu, \sigma^2)$, i.e. this is a sample of size $n$ from a normally distributed population with expected mean value $\mu$ and variance $\sigma^{2}$
Let
be the sample mean and let
be the (Bessel-corrected) sample variance.
Then the random variable
has a standard normal distribution.
Student’s t-distribution has the probability density function given by
def gamma_function(n):
## Suppose the daily energy intake for 11 women in kilojoules (kJ) is:
So the p-value is about 0.009: if the null hypothesis were true, we would see a difference at least this extreme only about 0.9% of the time, so we reject the null hypothesis.
np.sum(s&lt;t) / float(len(s))
0.0086
A Gaussian model is just a normal distribution, and a Gaussian mixture model is a superposition of several normal distributions, with each normal distribution representing one cluster. Like K-means, the Gaussian mixture model can therefore be used for unsupervised clustering.
For any concave function, we have
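The inequality referred to here is Jensen’s inequality: for a concave function $f$ and a random variable $X$,

```latex
f\big(\mathbb{E}[X]\big) \;\ge\; \mathbb{E}\big[f(X)\big]
```

In the EM derivation, $f = \log$, which is concave.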
Then, we have:
That is, we use the model to compute the expected value of the data, and update the parameters μ and σ to maximize that expectation. This process is iterated until the parameters produced by two successive iterations change very little. The procedure is similar to the K-means training process (K-means keeps updating the cluster centers to maximize its objective), except that for the Gaussian model here we must update two parameters at once: the mean and the standard deviation of each distribution.
K-means assigns each sample to the cluster with the nearest center, so the probability that a sample belongs to a given cluster is either 0 or 1. In contrast, with a Gaussian mixture the probability that a sample point belongs to a cluster is not 0-or-1: it belongs to different clusters with different probabilities. The Gaussian mixture model assumes that all sample points are generated from a mixture of K Gaussian distributions.
We provide a function to calculate the log likelihood for a mixture of Gaussians. The log likelihood quantifies the probability of observing a given set of data under a particular setting of the parameters in our model. We will use this to assess convergence of our EM algorithm; specifically, we will keep looping through EM update steps until the log likelihood ceases to increase at a certain rate.
def log_sum_exp(Z):
The first step in the EM algorithm is to compute cluster responsibilities. Let $r_{ik}$ denote the responsibility of cluster $k$ for data point $i$. Note that cluster responsibilities are fractional parts: Cluster responsibilities for a single data point $i$ should sum to 1.
To figure out how much a cluster is responsible for a given data point, we compute the likelihood of the data point under the particular cluster assignment, multiplied by the weight of the cluster. For data point $i$ and cluster $k$, this quantity is
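With $\pi_k$ denoting the weight of cluster $k$, this quantity can be written as:

```latex
r_{ik} \propto \pi_k\, N(x_i \mid \mu_k, \Sigma_k)
```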
where $N(x_i \mid \mu_k, \Sigma_k)$ is the Gaussian distribution for cluster $k$ (with mean $\mu_k$ and covariance $\Sigma_k$).
We used $\propto$ because the quantity $N(x_i \mid \mu_k, \Sigma_k)$ is not yet the responsibility we want. To ensure that all responsibilities over each data point add up to 1, we add the normalization constant in the denominator:
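This gives the normalized responsibilities:

```latex
r_{ik} = \frac{\pi_k\, N(x_i \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j\, N(x_i \mid \mu_j, \Sigma_j)}
```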
Complete the following function that computes $r_{ik}$ for all data points $i$ and clusters $k$.
def compute_responsibilities(data, weights, means, covariances):
Once the cluster responsibilities are computed, we update the parameters (weights, means, and covariances) associated with the clusters.
Computing soft counts. Before updating the parameters, we first compute what is known as “soft counts”. The soft count of a cluster is the sum of all cluster responsibilities for that cluster:
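In symbols, the soft count of cluster $k$ is:

```latex
N^{\text{soft}}_{k} = \sum_{i=1}^{N} r_{ik}
```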
where we loop over data points. Note that, unlike k-means, we must loop over every single data point in the dataset. This is because all clusters are represented in all data points, to a varying degree.
We provide the function for computing the soft counts:
def compute_soft_counts(resp):
Updating weights. The cluster weights show us how much each cluster is represented over all data points. The weight of cluster $k$ is given by the ratio of the soft count $N^{\text{soft}}_{k}$ to the total number of data points $N$:
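In symbols, the updated weight is:

```latex
\hat{\pi}_k = \frac{N^{\text{soft}}_{k}}{N}
```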
Notice that $N$ is equal to the sum over the soft counts $N^{\text{soft}}_{k}$ of all clusters.
Complete the following function:
def compute_weights(counts):
Updating means. The mean of each cluster is set to the weighted average of all data points, weighted by the cluster responsibilities:
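In symbols, the updated mean is:

```latex
\hat{\mu}_k = \frac{1}{N^{\text{soft}}_{k}} \sum_{i=1}^{N} r_{ik}\, x_i
```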
Complete the following function:
def compute_means(data, resp, counts):
Updating covariances. The covariance of each cluster is set to the weighted average of all outer products, weighted by the cluster responsibilities:
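In symbols, the updated covariance is:

```latex
\hat{\Sigma}_k = \frac{1}{N^{\text{soft}}_{k}} \sum_{i=1}^{N} r_{ik}\, (x_i - \hat{\mu}_k)(x_i - \hat{\mu}_k)^T
```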
The “outer product” in this context refers to the matrix product $(x_i - \hat{\mu}_k)(x_i - \hat{\mu}_k)^T$. Letting $(x_i - \hat{\mu}_k)$ be a $d \times 1$ column vector, this product is a $d \times d$ matrix. Taking the weighted average of all outer products gives us the covariance matrix, which is also $d \times d$.
Complete the following function:
def compute_covariances(data, resp, counts, means):
# SOLUTION
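As a hedged sketch of the M-step updates above (not the course’s solution code; `data` is assumed to be an (N, d) array and `resp` an (N, K) array of responsibilities):

```python
import numpy as np

def m_step(data, resp):
    """One M-step of EM for a Gaussian mixture.

    data: (N, d) array of data points.
    resp: (N, K) array of responsibilities; each row sums to 1.
    Returns updated (weights, means, covariances).
    """
    n, d = data.shape
    counts = resp.sum(axis=0)                  # soft counts N_k, shape (K,)
    weights = counts / n                       # cluster weights
    means = (resp.T @ data) / counts[:, None]  # weighted means, shape (K, d)
    covs = []
    for k in range(resp.shape[1]):
        diff = data - means[k]                 # (N, d)
        # Weighted average of outer products (x_i - mu_k)(x_i - mu_k)^T
        covs.append((resp[:, k, None] * diff).T @ diff / counts[k])
    return weights, means, np.array(covs)
```

With hard (0/1) responsibilities, these updates reduce to per-cluster sample proportions, means, and covariances, which is a quick sanity check.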
Reference from https://zhuanlan.zhihu.com/p/29538307
Reference from https://zhuanlan.zhihu.com/p/31103654
Reference from coursera course Machine Learning Foundation from University of Washington
When a network learns a new task, it tends to forget the skills it has already learned. This phenomenon is called catastrophic forgetting.
Catastrophic forgetting is not caused by the model’s size, i.e. by insufficient capacity to learn the new skills. To prove this, we can train the model on the corresponding multi-task problem and obtain good results.
The ideas to solve knowledge retention
The idea of EWC (Elastic Weight Consolidation) is to learn new parameters that do not move far from the parameters that were important for the previous tasks.
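A minimal sketch of this idea: a quadratic penalty weighted per parameter by its importance for the old task (a Fisher-information estimate in EWC). All numbers below are made up for illustration:

```python
import numpy as np

def ewc_penalty(theta, theta_old, fisher, lam=1.0):
    """EWC regularizer: penalize moving parameters that were
    important for the previous task (large Fisher weight) away
    from their old values."""
    return 0.5 * lam * np.sum(fisher * (theta - theta_old) ** 2)

# Total loss on the new task = new-task loss + ewc_penalty(...).
theta_old = np.array([1.0, -2.0])
fisher = np.array([10.0, 0.1])   # first parameter matters for the old task
theta = np.array([1.1, 0.0])
print(ewc_penalty(theta, theta_old, fisher))
```

Note how a small move of the important first parameter costs as much as a large move of the unimportant second one.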
Conduct multi-task learning by generating pseudo-data with a generative model.
We know that multi-task learning is a good way to solve lifelong learning tasks (sometimes it is the upper bound). When a new task comes, we can regard it together with the previous tasks as one multi-task problem and build a model to solve it. But the premise of doing this is that we have the data of the previous tasks, and in reality we cannot store all the datasets of previous tasks. Therefore, we can use a generative model to generate the previous datasets.
The difference in knowledge transfer between lifelong learning and transfer learning is that transfer learning concentrates only on the new task, while lifelong learning must also guard against catastrophic forgetting.
The idea of GEM (Gradient Episodic Memory) is: when we update the parameters by gradient descent, we find an update direction that benefits both the previous tasks and the new task. The disadvantage of GEM is that we need to store a little bit of data from the previous tasks.
An example of model expansion is the Progressive Neural Networks
proposed in 2016. We fix the parameters after learning some tasks, and then train on the new task: we build a new model and use the outputs of the previous tasks’ models as inputs for the new task. However, there is a disadvantage: you cannot train too many new tasks, because the model keeps growing and causes a lot of load.
Expert Gate’s method: we still keep one model per task. For example, if we have three tasks and the fourth task is similar to the first (a gate determines which old task the new task most resembles), then we use the first task’s model to initialize the fourth task’s model, which yields some transfer effect. However, this method still requires one model per task, which imposes a heavy storage load.
The order in which tasks are learned, like the order of chapters in our textbooks, has a great impact on the final results. (The question here is finding an optimal order for the learning tasks.)
The steps of the R-CNN algorithm:
Kaiming He was the first to improve on this, proposing SPPnet (Spatial Pyramid Pooling), whose biggest improvement is that the original image needs to be fed through the network only once to obtain the features of every candidate region.
In R-CNN, candidate regions must be warped and scaled to fit the CNN input; by modifying the network structure, images of any size can be fed into the CNN. Kaiming He proposed the SPP structure in the paper to accommodate inputs of any size. SPPnet’s biggest improvement over R-CNN is in the feature-extraction step; the other modules remain the same as in R-CNN. Feature extraction no longer requires passing every candidate region through the CNN; instead the whole image is fed into the CNN once, and ROI features are taken directly from the feature map. Compared with R-CNN, this is 24–102× faster.
The steps of the SPPnet algorithm:
The steps of the Fast R-CNN algorithm:
Because Fast R-CNN still extracts region proposals with Selective Search, whose computation cannot be performed on a GPU (it cannot exploit the GPU’s massively parallel computation), it is extremely inefficient; moreover, selecting 2000 candidate regions burdens the downstream deep network. Faster R-CNN = RPN (Region Proposal Network) + Fast R-CNN; replacing Selective Search in Fast R-CNN with an RPN is the core idea of Faster R-CNN.
The steps of the Faster R-CNN algorithm:
The so-called anchors are simply a set of rectangles generated by rpn/generate_anchors.py. Running generate_anchors.py from the author’s demo directly produces the following output:
[[ 84. 40. 99. 55.]
The 4 values $(x_1, y_1, x_2, y_2)$ in each row are the coordinates of the rectangle’s top-left and bottom-right corners. The 9 rectangles come in 3 shapes, with aspect ratios of roughly $\frac{width}{height} \in \{1:1, 1:2, 2:1\}$, as shown in the figure. In effect, anchors introduce the multi-scale approach commonly used in detection.
We traverse the feature maps computed by the conv layers and assign these 9 anchors to every position as initial detection boxes. The boxes obtained this way are quite inaccurate, but don’t worry: two subsequent rounds of bounding-box regression will correct their positions.
Suppose each position of the conv5 feature map has k anchors (default k=9), and each anchor is classified as positive or negative, so each position’s 256-d feature is converted to cls = 2k scores; and each anchor has 4 offsets (x, y, w, h), so reg = 4k coordinates. Note that using all anchors for training would be too many, so the training program randomly samples 128 positive anchors + 128 negative anchors from the suitable ones.
A window is generally represented by a 4-dimensional vector (x, y, w, h), giving the coordinates of its center and its width and height. In the figure below, the red box A is the original positive anchor and the green box G is the ground truth (GT); our goal is to find a mapping such that the input anchor A is mapped to a regressed window G’ that is closer to the real window G, i.e.
So what transformation F maps anchor A in Figure 10 to G’? A simple approach is:
What we need to learn are the four transformations $d_x(A)$, $d_y(A)$, $d_w(A)$, $d_h(A)$. When the input anchor A is close to the GT, this transformation can be considered linear, so linear regression can be used to model the fine-tuning of the window.
In the original Faster R-CNN paper, the translation $(t_x, t_y)$ and scale factors $(t_w, t_h)$ between a positive anchor and the ground truth are given as follows:
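With $(x, y, w, h)$ the ground-truth box center and size and $(x_a, y_a, w_a, h_a)$ those of the anchor, the targets are:

```latex
t_x = (x - x_a)/w_a, \quad t_y = (y - y_a)/h_a, \quad
t_w = \log(w/w_a), \quad t_h = \log(h/h_a)
```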
For training the bounding-box regression branch, the input is X (CNN features), the label is the transform factors above, and the training objective is that, given the input feature X, the regression branch outputs each anchor’s translation and scale $(t_x, t_y, t_w, t_h)$, which can then be used to correct the anchor’s position.
Faster R-CNN is trained by continuing from an already-trained model (e.g. VGG_CNN_M_1024, VGG, or ZF). In practice the training process has 6 steps:
The core idea of YOLO is to use the whole image as the network input and directly regress, at the output layer, the positions of the bounding boxes and the classes they belong to.
The steps of the YOLO algorithm:
It speeds up model convergence and reduces overfitting;
First fine-tune the network on 448×448 ImageNet data so it adapts to high-resolution input; then fine-tune that network on the object-detection task;
YOLOv2 removes YOLO’s fully connected layers and uses fixed boxes (anchor boxes) to predict bounding boxes. Borrowing from Faster R-CNN, YOLOv2 also adopts prior boxes (anchors): in each grid cell a set of boxes with different sizes and aspect ratios is predefined to cover different positions and scales of the image. These priors serve as predefined candidate regions in which the network detects whether an object is present and fine-tunes the box position.
The anchor boxes are learned with k-means on the training set, and the author defines a new distance formula; using k-means to obtain the anchor boxes used to predict bounding boxes makes the model easier to learn. The most important choice in clustering is how to compute the “distance” between two boxes: with the common Euclidean distance, large boxes produce larger errors, but what we care about is the boxes’ IOU. So YOLOv2 uses the following formula for the “distance” between two boxes during clustering.
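The IOU-based clustering distance is:

```latex
d(\text{box}, \text{centroid}) = 1 - IOU(\text{box}, \text{centroid})
```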
Borrowing the prior-box approach from Faster R-CNN, the position prediction tends to be unstable in the early stages of training. The position prediction formula is:
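In the Faster R-CNN-style parametrization this reads:

```latex
x = (t_x \times w_a) + x_a, \qquad y = (t_y \times h_a) + y_a
```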
where (x, y) is the center of the predicted box, $x_a$, $y_a$ are the center coordinates of the prior box (anchor), $(w_a, h_a)$ are the anchor’s width and height, and $t_x$, $t_y$ are the transform factors to be learned.
Since $t_x$, $t_y$ are unconstrained, the predicted box center can appear anywhere, so the early stage of training is not easily stabilized. YOLOv2 adjusts the prediction formula to constrain the predicted box center within a particular grid cell.
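The adjusted prediction formulas are:

```latex
b_x = \sigma(t_x) + c_x,\quad b_y = \sigma(t_y) + c_y,\quad
b_w = p_w e^{t_w},\quad b_h = p_h e^{t_h},\quad
Pr(\text{object}) \cdot IOU(b, \text{object}) = \sigma(t_o)
```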
where $b_x, b_y, b_w, b_h$ are the center and size of the predicted box; $Pr(\text{object}) \cdot IOU(b, \text{object}) = \sigma(t_o)$ is the confidence of the predicted box (YOLOv1 predicted the confidence directly, whereas here the predicted parameter $t_o$ is passed through a $\sigma$ transform to give the confidence); $c_x$, $c_y$ are the distances from the current grid cell’s top-left corner to the image’s top-left corner, with the grid size normalized so that one cell has width 1 and height 1; and $p_w, p_h$ are the prior box’s width and height.
See the explanation at https://www.jianshu.com/p/86b8208f634f
YOLOv2 adds a pass-through layer to fuse in the feature map of an earlier convolutional block. One problem in object detection is that objects come in different sizes; as the input image passes through many layers of feature extraction, small objects may no longer show distinct features in the final feature map, or may be dropped entirely (in YOLOv2, for example, a 416×416 input is downsampled by the convolutional network to a 13×13 output). To better detect small objects, the final feature map needs to retain more fine-grained information.
The YOLOv2 network uses only convolutional and pooling layers, so the input image size can be adjusted dynamically. The author wanted YOLOv2 to be robust when detecting on images of different sizes, so it was trained accordingly. This strategy forces the network to predict well across different input sizes, meaning the same network can handle detection at different resolutions; once the network is trained, you only need to change the network’s input image size (the width and height values) as required.
Definition: search problem
Objective: find the minimum-cost path from $s_{start}$ to an $s$ satisfying IsEnd($s$).
Now suppose we don’t know what the costs are, but we observe someone getting from 1 to n via some sequence of walking and tram-taking. Can we figure out what the costs are? This is the goal of learning.
Let’s cast the problem as predicting an output y given an input x. Here, the input x is the search problem (visualized as a search tree) without the costs provided. The output y is the desired solution path. The question is what the costs should be set to so that y is actually the minimum cost path of the resulting search problem.
import sys
UCS: explore states in order of PastCost(s)
A*: explore in order of PastCost(s) + h(s)
 A heuristic h(s) is any estimate of FutureCost(s).
One important aspect of A* is f = g + h. The f, g, and h variables are in our Node class and get calculated every time we create a new node. Quickly I’ll go over what these variables mean.
If h is consistent, A* returns the minimum-cost path.
A heuristic h(s) is admissible if $0 \le h(s) \le \text{FutureCost}(s)$.
Theorem: consistency implies admissibility
If a heuristic h(s) is consistent, then h(s) is admissible.
With an arbitrary configuration of walls, we can’t compute FutureCost(s) except by doing search. However, if we just relaxed the original problem by removing the walls, then we can compute FutureCost(s) in closed form: it’s just the Manhattan distance between $s$ and $s_{end}$. Specifically, ManhattanDistance((r1, c1), (r2, c2)) = |r1 − r2| + |c1 − c2|.
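As an illustrative sketch (a hypothetical grid world with unit step costs, not the assignment’s code), A* with this Manhattan-distance heuristic can be written as:

```python
import heapq

def astar(walls, start, goal, rows, cols):
    """A* on a grid: f = g + h, with h = Manhattan distance to the goal."""
    def h(s):
        return abs(s[0] - goal[0]) + abs(s[1] - goal[1])
    frontier = [(h(start), 0, start)]      # entries are (f, g, state)
    best_g = {start: 0}
    while frontier:
        f, g, s = heapq.heappop(frontier)
        if s == goal:
            return g                        # minimum cost to reach the goal
        if g > best_g.get(s, float("inf")):
            continue                        # stale heap entry, skip it
        r, c = s
        for nr, nc in [(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)]:
            t = (nr, nc)
            if 0 <= nr < rows and 0 <= nc < cols and t not in walls:
                if g + 1 < best_g.get(t, float("inf")):
                    best_g[t] = g + 1
                    heapq.heappush(frontier, (g + 1 + h(t), g + 1, t))
    return None                             # goal unreachable

# With no walls, the minimum cost equals the Manhattan distance itself.
print(astar(set(), (0, 0), (2, 3), rows=4, cols=5))
```

Since the relaxed-problem heuristic is consistent, A* pops the goal with its true minimum cost.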
import heapq, collections, re, sys, time, os, random
An ML program is a sequence of bindings. Each binding gets typechecked and then (assuming it typechecks) evaluated. What type (if any) a binding has depends on a static environment, which is roughly the types of the preceding bindings in the file. How a binding is evaluated depends on a dynamic environment, which is roughly the values of the preceding bindings in the file. When we just say environment, we usually mean dynamic environment. Sometimes context is used as a synonym for static environment.
There are several kinds of bindings, but for now let’s consider only a variable binding, which in ML has this syntax:
Here, val is a keyword, x can be any variable, and e can be any expression. We now know a variable binding’s syntax (how to write it), but we still need to know its semantics (how it typechecks and evaluates). Mostly this depends on the expression e. To typecheck a variable binding, we use the current static environment (the types of preceding bindings) to typecheck e (which will depend on what kind of expression it is) and produce a new static environment
that is the current static environment except with x having type t where t is the type of e. Evaluation is analogous: To evaluate a variable binding, we use the “current dynamic environment” (the values of preceding bindings) to evaluate e (which will depend on what kind of expression it is) and produce a new dynamic environment
that is the current environment except with x having the value v where v is the result of evaluating e.
Whenever you learn a new construct in a programming language, you should ask these three questions:
for example:
fun pow (x:int, y:int) = (* correct only for y >= 0 *)
To typecheck a function binding, we typecheck the body e in a static environment that (in addition to all the earlier bindings) maps x1 to t1, …, xn to tn and x0 to t1 * … * tn -> t. Because x0 is in the environment, we can make recursive function calls, i.e., a function definition can use itself. The syntax of a function type is “argument types -> result type”,
where the argument types are separated by * (which just happens to be the same character used in expressions for multiplication). For the function binding to typecheck, the body e must have the type t, i.e., the result type of x0. That makes sense given the evaluation rules below because the result of a function call is the result of evaluating e.
But what, exactly, is t? We never wrote it down. It can be any type, and it is up to the typechecker (part of the language implementation) to figure out what t should be such that using it for the result type of x0 makes everything work out.
The evaluation rule for a function binding is trivial: A function is a value — we simply add x0 to the environment as a function that can be called later. As expected for recursion, x0 is in the dynamic environment in the function body and for subsequent bindings (but not, unlike in say Java, for preceding bindings, so the order you define functions is very important).
Tuples: fixed number of pieces that may have different types
(e1,e2)
#1 e
and #2 e
Lists can have any number of elements, but all list elements have the same type.
Until we learn pattern-matching, we will use three standard-library functions: null, hd, and tl. If e evaluates to a non-empty list [v1,…,vn], then hd e evaluates to v1 and tl e evaluates to [v2,…,vn]; both raise an exception if e evaluates to [].
Let-expressions are an absolutely crucial feature that allows for local variables in a very simple, general, and flexible way. Let-expressions are crucial for style and for efficiency. A let-expression lets us have local variables. In fact, it lets us have local bindings of any sort, including function bindings. Because it is a kind of expression, it can appear anywhere an expression can.
fun good_max (xs : int list) = if null xs
The previous example does not properly handle the empty list: it returns 0. This is bad style because 0 is really not the maximum value of 0 numbers. There is no good answer, but we should deal with this case reasonably. One possibility is to raise an exception; you can learn about ML exceptions on your own if you are interested before we discuss them later in the course. Instead, let’s change the return type to either return the maximum number or indicate the input list was empty so there is no maximum. Given the constructs we have, we could code this up by returning an int list, using [] if the input was the empty list and a list with one integer (the maximum) if the input list was not empty.
The ML library has options, which are a precise description: an option value carries either 0 or 1 things. NONE is an option value carrying nothing, whereas SOME e evaluates e to a value v and becomes the option carrying the one value v. The type of NONE is ’a option and the type of SOME e is t option if e has type t.
NONE has type ’a option (much like [] has type ’a list)
SOME e has type t option if e has type t (much like e::[])
isSome has type ’a option -> bool
valOf has type ’a option -> ’a (exception if given NONE)
fun better_max (xs : int list) = if null xs
e1 andalso e2
e1 orelse e2
not e1
= <> > < >= <=
> < >= <= can be used with real, but not with 1 int and 1 real
In ML, there is no way to change the contents of a binding, a tuple, or a list. If x maps to some value like the list of pairs [(3,4),(7,9)] in some environment, then x will forever map to that list in that environment. There is no assignment statement that changes x to map to a different list. (You can introduce a new binding that shadows x, but that will not affect any code that looks up the original
x in an environment.)
For a final example, the following Java is the key idea behind an actual security hole in an important (and subsequently fixed) Java library. Suppose we are maintaining permissions for who is allowed to access something like a file on the disk. It is fine to let everyone see who has permission, but clearly only those that do have permission can actually use the resource. Consider this wrong code (some parts omitted if not relevant):
class ProtectedResource {
Can you find the problem? Here it is:
getAllowedUsers
returns an alias to the allowedUsers array, so any user can gain access by doing getAllowedUsers()[0] = currentUser().
Now that we have learned enough ML to write some simple functions and programs with it, we can list the essential “pieces” necessary for defining and learning any programming language:
In this week’s exercise you will train a convolutional neural network to classify images of the Fashion MNIST dataset, and you will use TensorBoard to explore how its confusion matrix evolves over time.
# Load the TensorBoard notebook extension.
import io
TensorFlow version: 2.0.0
We are going to use a CNN to classify images in the Fashion-MNIST dataset. This dataset consists of 70,000 grayscale images of fashion products from 10 categories, with 7,000 images per category. The images have a size of $28\times28$ pixels.
First, we load the data. Even though these are really images, we will load them as NumPy arrays and not as binary image objects. The data is already divided into training and testing sets.
1  # Load the data. 
train_images is a NumPy array with shape (60000, 28, 28) and test_images is a NumPy array with shape (10000, 28, 28). However, our model expects arrays with shape (batch_size, height, width, channels). Therefore, we must reshape our NumPy arrays to also include the number of color channels. Since the images are grayscale, we will set channels to 1. We will also normalize the values of our NumPy arrays to be in the range [0, 1].
1  # Preprocess images 
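Since the preprocessing cell is elided above, here is a minimal NumPy sketch of the reshape-and-normalize step just described (the random stand-in array and its batch size of 4 are illustrative, not the real dataset):

```python
import numpy as np

# Stand-in for raw grayscale images: integer pixel values in [0, 255].
train_images = np.random.randint(0, 256, size=(4, 28, 28)).astype("float64")

# Add the channels dimension (grayscale => 1 channel), giving the
# (batch_size, height, width, channels) shape the model expects.
train_images = train_images.reshape(-1, 28, 28, 1)

# Normalize pixel values to the range [0, 1].
train_images = train_images / 255.0

print(train_images.shape)  # (4, 28, 28, 1)
```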
We will build a simple CNN and compile it.
1  # Build the model 
When training a classifier, it’s often useful to see the confusion matrix. The confusion matrix gives you detailed knowledge of how your classifier is performing on test data.
In the cell below, we will define a function that returns a Matplotlib figure containing the plotted confusion matrix.
1  def plot_confusion_matrix(cm, class_names): 
We are now ready to train the CNN and regularly log the confusion matrix during the process. In the cell below, you will create a Keras TensorBoard callback to log basic metrics.
1  # Clear logs prior to logging data. 
rm: cannot remove 'logs/image/20200222182126/cm': Directory not empty
Unfortunately, a Matplotlib figure cannot be logged directly as an image, but the PNG file format can be. So, you will create a helper function that takes a Matplotlib figure and converts it to PNG format so it can be written.
1  def plot_to_image(figure): 
In the cell below, you will define a function that calculates the confusion matrix.
1  def log_confusion_matrix(epoch, logs): 
The next step will be to run the code shown below to render the TensorBoard. Unfortunately, TensorBoard cannot be rendered within the Coursera environment. Therefore, we won’t run the code below.
1  # Start TensorBoard. 
However, you are welcome to download the notebook and run the above code locally on your machine or in Google’s Colab to see TensorBoard in action. Below are some example screenshots that you should see when executing the code:
In this exercise, we will learn how to create models for TensorFlow Hub. You will perform the following tasks:
1  import numpy as np 
We will start by creating a class called MNIST. This class will load the MNIST dataset, preprocess the images from the dataset, and build a CNN-based classifier. This class will also have some methods to train, test, and save our model.
In the cell below, fill in the missing code and create the following Keras Sequential model:
1  Model: "sequential" 
Notice that we are using a tf.keras.layers.Lambda layer at the beginning of our model. Lambda layers are used to wrap arbitrary expressions as a Layer object:
1  tf.keras.layers.Lambda(expression) 
The Lambda layer exists so that arbitrary TensorFlow functions can be used when constructing Sequential and Functional API models. Lambda layers are best suited for simple operations.
1  class MNIST: 
We will now use the MNIST class we created above to create an mnist object. When creating our mnist object we will use a dictionary to pass our training parameters. We will then call the train and export_model methods to train and save our model, respectively. Finally, we will call the test method to evaluate our model after training.
NOTE: It will take about 12 minutes to train the model for 5 epochs.
1  # Define the training parameters. 
WARNING:absl:Found a different version 3.0.0 of dataset mnist in data_dir /tf/week2/../tmp2. Using currently defined version 1.0.0.
Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
lambda_1 (Lambda)            (None, 28, 28, 1)         0
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 28, 28, 8)         80
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 14, 14, 8)         0
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 14, 14, 16)        1168
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 7, 7, 16)          0
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 7, 7, 32)          4640
_________________________________________________________________
flatten_1 (Flatten)          (None, 1568)              0
_________________________________________________________________
dense_1 (Dense)              (None, 128)               200832
_________________________________________________________________
dense_2 (Dense)              (None, 10)                1290
=================================================================
Total params: 208,010
Trainable params: 208,010
Non-trainable params: 0
_________________________________________________________________
None
Epoch 1/5
1875/1875 [==============================] - 135s 72ms/step - loss: 0.1548 - accuracy: 0.9532
Epoch 2/5
 563/1875 [========>.....................] - ETA: 1:36 - loss: 0.0868 - accuracy: 0.9733
The export_model method saved our model in the TensorFlow SavedModel format in the ./saved_model directory. The SavedModel format saves our model and its weights in various files and directories. This makes it difficult to distribute our model. Therefore, it is convenient to create a single compressed file that contains all the files and folders of our model. To do this, we will use the tar archiving program to create a tarball (similar to a Zip file) that contains our SavedModel.
1  # Create a tarball from the SavedModel. 
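As an aside, the same packaging step can also be done from Python with the standard library's tarfile module instead of the tar command; a sketch using a toy directory (the file layout below is an illustrative stand-in for a real SavedModel):

```python
import os
import tarfile
import tempfile

# Build a toy directory standing in for the SavedModel layout.
model_dir = tempfile.mkdtemp()
os.makedirs(os.path.join(model_dir, "variables"))
open(os.path.join(model_dir, "saved_model.pb"), "wb").close()

# Create a gzip-compressed tarball of the whole directory.
tarball_path = os.path.join(tempfile.mkdtemp(), "model.tar.gz")
with tarfile.open(tarball_path, "w:gz") as tar:
    tar.add(model_dir, arcname=".")

# List the archive members to verify the contents.
with tarfile.open(tarball_path, "r:gz") as tar:
    names = tar.getnames()
print(sorted(names))
```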
We can uncompress our tarball to make sure it has all the files and folders from our SavedModel.
1  # Inspect the tarball. 
./
./variables/
./variables/variables.data-00001-of-00002
./variables/variables.data-00000-of-00002
./variables/variables.index
./saved_model.pb
./assets/
Once we have verified our tarball, we can now simulate server conditions. In a normal scenario, we will fetch our TF Hub module from a remote server using the module’s handle. However, since this notebook cannot host the server, we will instead point the module handle to the directory where our SavedModel is stored.
1  !rm -rf ./module 
./
./variables/
./variables/variables.data-00001-of-00002
./variables/variables.data-00000-of-00002
./variables/variables.index
./saved_model.pb
./assets/
1  # Define the module handle. 
1  # EXERCISE: Load the TF Hub module using the hub.load API. 
We will now test our TF Hub module with images from the test split of the MNIST dataset.
1  filePath = f"{getcwd()}/../tmp2" 
WARNING:absl:Found a different version 3.0.0 of dataset mnist in data_dir /tf/week2/../tmp2. Using currently defined version 1.0.0.
1  # Test the TF Hub module for a single batch of data 
Predicted Labels: [6 2 3 7 2 2 3 4 7 6 6 9 2 0 9 6 2 0 6 5 1 4 8 1 9 8 4 0 0 5 8 4]
True Labels:      [6 2 3 7 2 2 3 4 7 6 6 9 2 0 9 6 8 0 6 5 1 4 8 1 9 8 4 0 0 5 2 4]
We can see that the model correctly predicts the labels for most images in the batch.
In the cell below, you will integrate the TensorFlow Hub module into the high-level Keras API.
1  # EXERCISE: Integrate the TensorFlow Hub module into a Keras 
1  # Evaluate the model on the test_dataset. 
313/313 [==============================] - 27s 88ms/step - loss: 0.0605 - accuracy: 0.9824
1  # Print the metric values on which the model is being evaluated on. 
loss: 0.061
accuracy: 0.982
In this notebook, you will train a neural network to classify images of handwritten digits from the MNIST dataset. You will then save the trained model, and serve it using TensorFlow Serving.
1  try: 
1  import os 
Using TensorFlow Version: 2.2.0-dev20200217
The MNIST dataset contains 70,000 grayscale images of the digits 0 through 9. The images show individual digits at a low resolution (28 by 28 pixels).
Even though these are really images, we will load them as NumPy arrays and not as binary image objects.
1  mnist = tf.keras.datasets.mnist 
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11493376/11490434 [==============================] - 1s 0us/step
1  # EXERCISE: Scale the values of the arrays below to be between 0.0 and 1.0. 
1  train_images.shape, test_images.shape 
((60000, 28, 28), (10000, 28, 28))
In the cell below use the .reshape method to resize the arrays to the following sizes:
1  train_images.shape: (60000, 28, 28, 1) 
1  # EXERCISE: Reshape the arrays below. 
1  print('\ntrain_images.shape: {}, of {}'.format(train_images.shape, train_images.dtype)) 
train_images.shape: (60000, 28, 28, 1), of float64
test_images.shape: (10000, 28, 28, 1), of float64
1  idx = 42 
In the cell below build a tf.keras.Sequential model that can be used to classify the images of the MNIST dataset. Feel free to use the simplest possible CNN. Make sure your model has the correct input_shape and the correct number of output units.
1  # EXERCISE: Create a model. 
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
Conv1 (Conv2D)               (None, 13, 13, 8)         80
_________________________________________________________________
flatten (Flatten)            (None, 1352)              0
_________________________________________________________________
Softmax (Dense)              (None, 10)                13530
=================================================================
Total params: 13,610
Trainable params: 13,610
Non-trainable params: 0
_________________________________________________________________
In the cell below configure your model for training using the adam optimizer, sparse_categorical_crossentropy as the loss, and accuracy for your metrics. Then train the model for the given number of epochs, using the train_images array.
1  # EXERCISE: Configure the model for training. 
Train on 60000 samples, validate on 10000 samples
Epoch 1/5
60000/60000 [==============================] - 8s 127us/sample - loss: 0.3098 - accuracy: 0.9120 - val_loss: 0.1723 - val_accuracy: 0.9489
Epoch 2/5
60000/60000 [==============================] - 7s 121us/sample - loss: 0.1511 - accuracy: 0.9569 - val_loss: 0.1145 - val_accuracy: 0.9667
Epoch 3/5
60000/60000 [==============================] - 7s 122us/sample - loss: 0.1103 - accuracy: 0.9680 - val_loss: 0.0939 - val_accuracy: 0.9720
Epoch 4/5
60000/60000 [==============================] - 7s 121us/sample - loss: 0.0901 - accuracy: 0.9737 - val_loss: 0.0895 - val_accuracy: 0.9739
Epoch 5/5
60000/60000 [==============================] - 7s 121us/sample - loss: 0.0780 - accuracy: 0.9763 - val_loss: 0.0787 - val_accuracy: 0.9758
1  # EXERCISE: Evaluate the model on the test images. 
10000/10000 [==============================] - 0s 39us/sample - loss: 0.0787 - accuracy: 0.9758
loss: 0.0787
accuracy: 0.976
1  MODEL_DIR = "digits_model" 
1  !saved_model_cli show dir {export_path} all 
MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['__saved_model_init_op']:
  The given SavedModel SignatureDef contains the following input(s):
  The given SavedModel SignatureDef contains the following output(s):
    outputs['__saved_model_init_op'] tensor_info:
        dtype: DT_INVALID
        shape: unknown_rank
        name: NoOp
  Method name is:

signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['Conv1_input'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 28, 28, 1)
        name: serving_default_Conv1_input:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['Softmax'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 10)
        name: StatefulPartitionedCall:0
  Method name is: tensorflow/serving/predict

WARNING:tensorflow:From /Users/ZRC/miniconda3/envs/tryit/lib/python3.6/site-packages/tensorflow/python/ops/resource_variable_ops.py:1809: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.

Defined Functions:
  Function Name: '__call__'
    Option #1
      Callable with:
        Argument #1
          inputs: TensorSpec(shape=(None, 28, 28, 1), dtype=tf.float32, name='inputs')
        Argument #2
          DType: bool
          Value: False
        Argument #3
          DType: NoneType
          Value: None
    Option #2
      Callable with:
        Argument #1
          Conv1_input: TensorSpec(shape=(None, 28, 28, 1), dtype=tf.float32, name='Conv1_input')
        Argument #2
          DType: bool
          Value: False
        Argument #3
          DType: NoneType
          Value: None
    Option #3
      Callable with:
        Argument #1
          Conv1_input: TensorSpec(shape=(None, 28, 28, 1), dtype=tf.float32, name='Conv1_input')
        Argument #2
          DType: bool
          Value: True
        Argument #3
          DType: NoneType
          Value: None
    Option #4
      Callable with:
        Argument #1
          inputs: TensorSpec(shape=(None, 28, 28, 1), dtype=tf.float32, name='inputs')
        Argument #2
          DType: bool
          Value: True
        Argument #3
          DType: NoneType
          Value: None

  Function Name: '_default_save_signature'
    Option #1
      Callable with:
        Argument #1
          Conv1_input: TensorSpec(shape=(None, 28, 28, 1), dtype=tf.float32, name='Conv1_input')

  Function Name: 'call_and_return_all_conditional_losses'
    Option #1
      Callable with:
        Argument #1
          Conv1_input: TensorSpec(shape=(None, 28, 28, 1), dtype=tf.float32, name='Conv1_input')
        Argument #2
          DType: bool
          Value: False
        Argument #3
          DType: NoneType
          Value: None
    Option #2
      Callable with:
        Argument #1
          Conv1_input: TensorSpec(shape=(None, 28, 28, 1), dtype=tf.float32, name='Conv1_input')
        Argument #2
          DType: bool
          Value: True
        Argument #3
          DType: NoneType
          Value: None
    Option #3
      Callable with:
        Argument #1
          inputs: TensorSpec(shape=(None, 28, 28, 1), dtype=tf.float32, name='inputs')
        Argument #2
          DType: bool
          Value: True
        Argument #3
          DType: NoneType
          Value: None
    Option #4
      Callable with:
        Argument #1
          inputs: TensorSpec(shape=(None, 28, 28, 1), dtype=tf.float32, name='inputs')
        Argument #2
          DType: bool
          Value: False
        Argument #3
          DType: NoneType
          Value: None
1  # This is the same as you would do from your command line, but without the [arch=amd64], and no sudo 
tee: /etc/apt/sources.list.d/tensorflow-serving.list: No such file or directory
deb http://storage.googleapis.com/tensorflow-serving-apt stable tensorflow-model-server tensorflow-model-server-universal
Unable to locate an executable at "/Library/Java/JavaVirtualMachines/jdk1.8.0_73.jdk/Contents/Home/bin/apt" (-1)
1  !apt-get install tensorflow-model-server 
/bin/sh: apt-get: command not found
You will now launch the TensorFlow model server with a bash script. In the cell below use the following parameters when running the TensorFlow model server:
rest_api_port: Use port 8501 for your requests.
model_name: Use digits_model as your model name.
model_base_path: Use the environment variable MODEL_DIR defined below as the base path to the saved model.
1  os.environ["MODEL_DIR"] = MODEL_DIR 
1  MODEL_DIR 
'digits_model'
1  %%bash 
docker: Error response from daemon: driver failed programming external connectivity on endpoint wonderful_ganguly (32049ac8fc320931031b817d6269004dcac5878b1e8c8addceb79fb1cd5dca24): Bind for 0.0.0.0:8501 failed: port is already allocated.
1  # # EXERCISE: Fill in the missing code below. 
1  !tail server.log 
docker: Error response from daemon: driver failed programming external connectivity on endpoint determined_antonelli (6a4d67f3751bc7f9fed7821f6df430cbf368992e00c710738cb6f1d9f1e647d2): Bind for 0.0.0.0:8501 failed: port is already allocated.
In the cell below construct a JSON object and use the first three images of the testing set (test_images) as your data.
1  # EXERCISE: Create JSON Object 
In the cell below, send a predict request as a POST to the server’s REST endpoint, and pass it your test data. You should ask the server to give you the latest version of your model.
1  # EXERCISE: Fill in the code below 
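The request body for TensorFlow Serving's REST predict endpoint is a JSON object with a signature name and an instances list. A sketch of building it with the standard library (the all-zero images below are stand-ins for test_images[0:3], and the POST itself is left as a comment since no server is running here):

```python
import json

# Three fake 28x28x1 images standing in for test_images[0:3].
fake_images = [[[[0.0]] * 28] * 28] * 3

# TensorFlow Serving expects {"signature_name": ..., "instances": [...]}.
data = json.dumps({
    "signature_name": "serving_default",
    "instances": fake_images,
})

# Posting it would look like this (needs the requests package and a
# running server; omitting a version in the URL serves the latest one):
# response = requests.post(
#     "http://localhost:8501/v1/models/digits_model:predict",
#     data=data, headers={"content-type": "application/json"})

parsed = json.loads(data)
print(len(parsed["instances"]))  # 3
```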
1  predictions 
[[1.68959902e-09, 3.5654768e-10, 4.89267848e-07, 6.09665512e-05, 5.58686e-10, 2.27727126e-09, 8.49646383e-15, 0.999936819, 1.21679037e-07, 1.63355e-06],
 [1.37625943e-06, 6.47248962e-05, 0.99961108, 4.23558049e-06, 5.8764843e-10, 7.63888067e-07, 0.000306190021, 2.43645703e-15, 1.15736648e-05, 8.24755108e-12],
 [2.70507389e-05, 0.996317267, 0.00121238839, 1.1719947e-05, 0.00065574795, 2.86702857e-06, 0.000181563752, 0.000955980679, 0.000630841067, 4.60029833e-06]]
1  plt.figure(figsize=(10,15)) 
In this week’s exercise, we’ll go back to the classic cats versus dogs example, but instead of just naively loading the data to train a model, you will be parallelizing various stages of the Extract, Transform and Load (ETL) process. In particular, you will be performing the following tasks:
1  import multiprocessing 
1  def create_model(): 
Just for comparison, let’s start by using the naive approach to Extract, Transform, and Load the data to train the model defined above. By naive approach we mean that we won’t apply any of the new concepts of parallelization that we learned about in this module.
1  dataset_name = 'cats_vs_dogs' 
1  print(info.version) 
2.0.1
1  def preprocess(features): 
1  train_dataset = dataset.map(preprocess).batch(32) 
The next step will be to train the model using the following code:
1  model = create_model() 
Since we want to focus on the parallelization techniques, we won’t go through the training process here, as this can take some time.
The following exercises are about parallelizing various stages of the Extract, Transform and Load processes. In particular, you will be tasked with the following:
We start by creating a dataset of strings corresponding to the file_pattern of the TFRecords of the cats_vs_dogs dataset.
1  file_pattern = f'{getcwd()}/../tmp2/{dataset_name}/{info.version}/{dataset_name}-train.tfrecord*' 
Let’s recall that the TFRecord format is a simple format for storing a sequence of binary records. This is very useful because serializing the data and storing it in a set of files (100-200 MB each) that can each be read linearly greatly increases the efficiency of reading the data.
Since we will use it later, we should also recall that a tf.Example message (or protobuf) is a flexible message type that represents a {"string": tf.train.Feature} mapping.
In the cell below you will use the interleave operation with certain arguments to parallelize the extraction of the stored TFRecords of the cats_vs_dogs dataset.
Recall that tf.data.experimental.AUTOTUNE will delegate the decision about what level of parallelism to use to the tf.data runtime.
1  # EXERCISE: Parallelize the extraction of the stored TFRecords of 
At this point the train_dataset contains serialized tf.train.Example messages. When iterated over, it returns these as scalar string tensors. The sample output for one record is given below:
1  <tf.Tensor: id=189, shape=(), dtype=string, numpy=b'\n\x8f\xc4\x01\n\x0e\n\x05label\x12\x05\x1a\x03\n\x01\x00\n,\n\x0eimage/filename\x12\x1a\n\x18\n\x16PetImages/Cat/4159.jpg\n\xcd\xc3\x01\n\x05image\x12...\xff\xd9'> 
In order to be able to use these tensors to train our model, we must first parse and decode them. We can parse and decode these string tensors by using a function. In the cell below you will create a read_tfrecord function that will read the serialized tf.train.Example messages and decode them. The function will also normalize and resize the images after they have been decoded.
In order to parse the tf.train.Example messages we need to create a feature_description dictionary. We need the feature_description dictionary because TFDS uses graph execution and therefore needs this description to build the shape and type signature of the features. The basic structure of the feature_description dictionary looks like this:
1  feature_description = {'feature': tf.io.FixedLenFeature([], tf.Dtype, default_value)} 
The number of features in your feature_description dictionary will vary depending on your dataset. In our particular case, the features are 'image' and 'label' and can be seen in the sample output of the string tensor above. Therefore, our feature_description dictionary will look like this:
1  feature_description = { 
where we have given the default values of "" and 1 to the 'image' and 'label' features, respectively.
The next step will be to parse the serialized tf.train.Example message using the feature_description dictionary given above. This can be done with the following code:
1  example = tf.io.parse_single_example(serialized_example, feature_description) 
Finally, we can decode the image by using:
1  image = tf.io.decode_jpeg(example['image'], channels=3) 
Use the code given above to complete the exercise below.
1  # EXERCISE: Fill in the missing code below. 
You can now apply the read_tfrecord function to each item in the train_dataset by using the map method. You can parallelize the transformation of the train_dataset by using the map method with num_parallel_calls set to the number of CPU cores.
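The effect of num_parallel_calls, spreading the per-element transformation across workers, can be sketched with the standard library; here a thread pool stands in for the tf.data runtime and a toy function stands in for read_tfrecord:

```python
import multiprocessing
from concurrent.futures import ThreadPoolExecutor

def transform(record):
    # Toy stand-in for a decode/normalize step.
    return record * 2

records = list(range(10))
num_parallel_calls = multiprocessing.cpu_count()

# Apply the transformation across worker threads, preserving order.
with ThreadPoolExecutor(max_workers=num_parallel_calls) as pool:
    results = list(pool.map(transform, records))

print(results)  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```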
1  # EXERCISE: Fill in the missing code below. 
8
1  # EXERCISE: Cache the train_dataset in-memory. 
1  # EXERCISE: Fill in the missing code below. 
The next step will be to train your model using the following code:
1  model = create_model() 
We won’t go through the training process here, as this can take some time. However, due to the parallelization of the various stages of the ETL process, you should see a decrease in training time as compared to the naive approach shown at the beginning of the notebook.
1  import pandas as pd 
Pandas is a Python library with many helpful utilities for loading and working with structured data. We will use Pandas to download the dataset and load it into a dataframe.
1  filePath = f"{getcwd()}/../tmp2/heart.csv" 
age  sex  cp  trestbps  chol  fbs  restecg  thalach  exang  oldpeak  slope  ca  thal  target  

0  63  1  1  145  233  1  2  150  0  2.3  3  0  fixed  0 
1  67  1  4  160  286  0  2  108  1  1.5  2  3  normal  1 
2  67  1  4  120  229  0  2  129  1  2.6  2  2  reversible  0 
3  37  1  3  130  250  0  0  187  0  3.5  3  0  normal  0 
4  41  0  2  130  204  0  2  172  0  1.4  1  0  normal  0 
The dataset we downloaded was a single CSV file. We will split this into train, validation, and test sets.
1  train, test = train_test_split(dataframe, test_size=0.2) 
193 train examples
49 validation examples
61 test examples
tf.data
Next, we will wrap the dataframes with tf.data. This will enable us to use feature columns as a bridge to map from the columns in the Pandas dataframe to features used to train the model. If we were working with a very large CSV file (so large that it does not fit into memory), we would use tf.data to read it from disk directly.
1  # EXERCISE: A utility method to create a tf.data dataset from a Pandas Dataframe. 
1  batch_size = 5 # A small batch size is used for demonstration purposes 
Now that we have created the input pipeline, let’s call it to see the format of the data it returns. We have used a small batch size to keep the output readable.
1  for feature_batch, label_batch in train_ds.take(1): 
Every feature: ['age', 'sex', 'cp', 'trestbps', 'chol', 'fbs', 'restecg', 'thalach', 'exang', 'oldpeak', 'slope', 'ca', 'thal']
A batch of ages: tf.Tensor([51 63 64 58 57], shape=(5,), dtype=int32)
A batch of targets: tf.Tensor([0 1 0 0 0], shape=(5,), dtype=int64)
We can see that the dataset returns a dictionary of column names (from the dataframe) that map to column values from rows in the dataframe.
TensorFlow provides many types of feature columns. In this section, we will create several types of feature columns, and demonstrate how they transform a column from the dataframe.
1  # Try to demonstrate several types of feature columns by getting an example. 
1  # A utility method to create a feature column and to transform a batch of data. 
The output of a feature column becomes the input to the model (using the demo function defined above, we will be able to see exactly how each column from the dataframe is transformed). A numeric column is the simplest type of column. It is used to represent real-valued features.
1  # EXERCISE: Create a numeric feature column out of 'age' and demo it. 
[[51.]
 [58.]
 [63.]
 [64.]
 [60.]]
In the heart disease dataset, most columns from the dataframe are numeric.
Often, you don’t want to feed a number directly into the model, but instead split its value into different categories based on numerical ranges. Consider raw data that represents a person’s age. Instead of representing age as a numeric column, we could split the age into several buckets using a bucketized column.
1  # EXERCISE: Create a bucketized feature column out of 'age' with 
[[0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]]
Notice that the one-hot values above describe which age range each row matches.
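The bucketization just shown can be sketched in plain Python; the boundaries below are illustrative stand-ins for whatever boundaries the exercise uses, but the one-hot output has the same shape as the demo output above:

```python
# Illustrative age boundaries; a value gets the bucket equal to the
# number of boundaries it is greater than or equal to.
boundaries = [18, 25, 30, 35, 40, 45, 50, 55, 60, 65]

def bucketize(value, boundaries):
    index = sum(value >= b for b in boundaries)
    one_hot = [0.0] * (len(boundaries) + 1)  # n boundaries => n+1 buckets
    one_hot[index] = 1.0
    return one_hot

print(bucketize(51, boundaries))  # the 1.0 lands in bucket index 7
```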
In this dataset, thal is represented as a string (e.g. ‘fixed’, ‘normal’, or ‘reversible’). We cannot feed strings directly to a model. Instead, we must first map them to numeric values. The categorical vocabulary columns provide a way to represent strings as a one-hot vector (much like you have seen above with age buckets).
Note: You will probably see some warning messages when running some of the code cells below. These warnings have to do with software updates and should not cause any errors or prevent your code from running.
1  # EXERCISE: Create a categorical vocabulary column out of the 
[[0. 1. 0.]
 [0. 1. 0.]
 [0. 0. 1.]
 [0. 0. 1.]
 [0. 1. 0.]]
The vocabulary can be passed as a list using categorical_column_with_vocabulary_list, or loaded from a file using categorical_column_with_vocabulary_file.
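The mapping from a vocabulary string to a one-hot vector can be sketched directly in plain Python (this illustrates the encoding, not the tf.feature_column implementation):

```python
vocabulary = ["fixed", "normal", "reversible"]

def one_hot(value, vocabulary):
    vec = [0.0] * len(vocabulary)
    if value in vocabulary:          # out-of-vocabulary values stay all zero
        vec[vocabulary.index(value)] = 1.0
    return vec

print(one_hot("normal", vocabulary))      # [0.0, 1.0, 0.0]
print(one_hot("unexpected", vocabulary))  # [0.0, 0.0, 0.0]
```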
Suppose instead of having just a few possible strings, we have thousands (or more) values per category. For a number of reasons, as the number of categories grows large, it becomes infeasible to train a neural network using one-hot encodings. We can use an embedding column to overcome this limitation. Instead of representing the data as a one-hot vector of many dimensions, an embedding column represents that data as a lower-dimensional, dense vector in which each cell can contain any number, not just 0 or 1. You can tune the size of the embedding with the dimension parameter.
1  # EXERCISE: Create an embedding column out of the categorical 
[[1.4254066e-01 1.0374661e-01 3.4352791e-01 3.3996427e-01 3.2193713e-02 1.8381193e-01 1.8051244e-01 3.2638407e-01]
 [1.4254066e-01 1.0374661e-01 3.4352791e-01 3.3996427e-01 3.2193713e-02 1.8381193e-01 1.8051244e-01 3.2638407e-01]
 [6.5549983e-05 2.7680036e-01 4.1849682e-01 5.3418136e-01 1.6281548e-01 2.5406811e-01 8.8969752e-02 1.8004593e-01]
 [6.5549983e-05 2.7680036e-01 4.1849682e-01 5.3418136e-01 1.6281548e-01 2.5406811e-01 8.8969752e-02 1.8004593e-01]
 [1.4254066e-01 1.0374661e-01 3.4352791e-01 3.3996427e-01 3.2193713e-02 1.8381193e-01 1.8051244e-01 3.2638407e-01]]
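At bottom, an embedding column is a lookup into a trainable table with one dense vector per category; a plain-Python sketch (the random vectors stand in for learned weights, and dimension 8 matches the demo above):

```python
import random

vocabulary = ["fixed", "normal", "reversible"]
dimension = 8

# One dense vector per category; random stand-ins for trained values.
random.seed(0)
embedding_table = {
    v: [random.uniform(-1.0, 1.0) for _ in range(dimension)]
    for v in vocabulary
}

def embed(value):
    return embedding_table[value]

# Every category maps to a vector of the same fixed size,
# no matter how large the vocabulary is.
print(len(embed("normal")))  # 8
```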
Another way to represent a categorical column with a large number of values is to use a categorical_column_with_hash_bucket. This feature column calculates a hash value of the input, then selects one of the hash_bucket_size buckets to encode a string. When using this column, you do not need to provide the vocabulary, and you can choose to make the number of hash buckets significantly smaller than the number of actual categories to save space.
1  # EXERCISE: Create a hashed feature column with 'thal' as the key and 
[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]
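The hashing trick needs no vocabulary at all: hash the string and take the result modulo hash_bucket_size. A deterministic sketch using zlib.crc32 (TensorFlow uses its own fingerprint function, so the actual bucket assignments will differ):

```python
import zlib

hash_bucket_size = 100

def hashed_one_hot(value, num_buckets):
    # Deterministically hash the string into one of num_buckets buckets.
    index = zlib.crc32(value.encode("utf-8")) % num_buckets
    vec = [0.0] * num_buckets
    vec[index] = 1.0
    return vec

vec = hashed_one_hot("reversible", hash_bucket_size)
print(len(vec), sum(vec))  # 100 buckets, exactly one of them set
```

With far fewer buckets than categories, different strings can collide in the same bucket; that is the space-for-accuracy trade-off the text describes.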
Combining features into a single feature, better known as a feature cross, enables a model to learn separate weights for each combination of features. Here, we will create a new feature that is the cross of age and thal. Note that crossed_column does not build the full table of all possible combinations (which could be very large). Instead, it is backed by a hashed_column, so you can choose how large the table is.
1  # EXERCISE: Create a crossed column using the bucketized column (age_buckets), 
[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]
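A feature cross can be sketched the same way: form a key from the two component values and hash it into a fixed number of buckets, so each (age bucket, thal) combination can receive its own weight (the key format and crc32 hash here are illustrative stand-ins for TensorFlow's hashing):

```python
import zlib

def crossed_bucket(age_bucket_index, thal, num_buckets=1000):
    # Combine the component features into one key, then hash the key.
    key = f"{age_bucket_index}_x_{thal}"
    return zlib.crc32(key.encode("utf-8")) % num_buckets

bucket = crossed_bucket(7, "normal")
print(0 <= bucket < 1000)  # the cross always lands in a valid bucket
```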
We have seen how to use several types of feature columns. Now we will use them to train a model. The goal of this exercise is to show you the complete code needed to work with feature columns. Below, we have arbitrarily selected a few columns to train our model.
If your aim is to build an accurate model, try a larger dataset of your own, and think carefully about which features are the most meaningful to include, and how they should be represented.
1  dataframe.dtypes 
age           int64
sex           int64
cp            int64
trestbps      int64
chol          int64
fbs           int64
restecg       int64
thalach       int64
exang         int64
oldpeak     float64
slope         int64
ca            int64
thal         object
target        int64
dtype: object
You can use the above list of column datatypes to map the appropriate feature column to every column in the dataframe.
1  # EXERCISE: Fill in the missing code below 
Now that we have defined our feature columns, we will use a DenseFeatures layer to input them to our Keras model.
1  # EXERCISE: Create a Keras DenseFeatures layer and pass the feature_columns you just created. 
Earlier, we used a small batch size to demonstrate how feature columns worked. We create a new input pipeline with a larger batch size.
1  batch_size = 32 
1  model = tf.keras.Sequential([ 
Epoch 1/100
7/7 [==============================] - 4s 609ms/step - loss: 1.5455 - accuracy: 0.6321 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 2/100
7/7 [==============================] - 0s 45ms/step - loss: 1.6424 - accuracy: 0.5803 - val_loss: 1.7392 - val_accuracy: 0.7143
Epoch 3/100
7/7 [==============================] - 0s 44ms/step - loss: 1.2255 - accuracy: 0.6995 - val_loss: 0.7653 - val_accuracy: 0.5714
Epoch 4/100
7/7 [==============================] - 0s 44ms/step - loss: 0.7326 - accuracy: 0.6891 - val_loss: 0.5689 - val_accuracy: 0.6939
Epoch 5/100
7/7 [==============================] - 0s 43ms/step - loss: 0.5230 - accuracy: 0.7358 - val_loss: 0.5406 - val_accuracy: 0.7143
Epoch 6/100
7/7 [==============================] - 0s 44ms/step - loss: 0.4348 - accuracy: 0.8083 - val_loss: 0.5609 - val_accuracy: 0.7143
Epoch 7/100
7/7 [==============================] - 0s 56ms/step - loss: 0.4592 - accuracy: 0.7824 - val_loss: 0.5710 - val_accuracy: 0.7347
Epoch 8/100
7/7 [==============================] - 0s 45ms/step - loss: 0.4996 - accuracy: 0.7461 - val_loss: 0.5585 - val_accuracy: 0.7143
Epoch 9/100
7/7 [==============================] - 0s 44ms/step - loss: 0.4389 - accuracy: 0.7927 - val_loss: 0.5297 - val_accuracy: 0.6735
Epoch 10/100
7/7 [==============================] - 0s 55ms/step - loss: 0.3914 - accuracy: 0.8446 - val_loss: 0.5216 - val_accuracy: 0.6531
Epoch 11/100
7/7 [==============================] - 0s 45ms/step - loss: 0.4022 - accuracy: 0.7979 - val_loss: 0.5331 - val_accuracy: 0.7347
Epoch 12/100
7/7 [==============================] - 0s 54ms/step - loss: 0.3811 - accuracy: 0.8238 - val_loss: 0.6522 - val_accuracy: 0.6735
Epoch 13/100
7/7 [==============================] - 0s 44ms/step - loss: 0.4173 - accuracy: 0.7927 - val_loss: 0.5219 - val_accuracy: 0.7347
Epoch 14/100
7/7 [==============================] - 0s 44ms/step - loss: 0.4235 - accuracy: 0.7513 - val_loss: 0.5027 - val_accuracy: 0.6531
Epoch 15/100
7/7 [==============================] - 0s 44ms/step - loss: 0.3789 - accuracy: 0.7979 - val_loss: 0.7249 - val_accuracy: 0.6531
Epoch 16/100
7/7 [==============================] - 0s 45ms/step - loss: 0.3972 - accuracy: 0.8342 - val_loss: 0.4830 - val_accuracy: 0.6939
Epoch 17/100
7/7 [==============================] - 0s 55ms/step - loss: 0.3339 - accuracy: 0.8601 - val_loss: 0.4912 - val_accuracy: 0.6531
Epoch 18/100
7/7 [==============================] - 0s 44ms/step - loss: 0.3555 - accuracy: 0.7927 - val_loss: 0.6399 - val_accuracy: 0.6939
Epoch 19/100
7/7 [==============================] - 0s 43ms/step - loss: 0.3531 - accuracy: 0.8601 - val_loss: 0.5526 - val_accuracy: 0.6735
Epoch 20/100
7/7 [==============================] - 0s 44ms/step - loss: 0.3810 - accuracy: 0.7876 - val_loss: 0.5751 - val_accuracy: 0.7143
Epoch 21/100
7/7 [==============================] - 0s 56ms/step - loss: 0.3409 - accuracy: 0.8549 - val_loss: 0.5524 - val_accuracy: 0.7551
Epoch 22/100
7/7 [==============================] - 0s 44ms/step - loss: 0.3167 - accuracy: 0.8756 - val_loss: 0.6607 - val_accuracy: 0.7143
Epoch 23/100
7/7 [==============================] - 0s 45ms/step - loss: 0.3732 - accuracy: 0.8601 - val_loss: 0.5993 - val_accuracy: 0.6939
Epoch 24/100
7/7 [==============================] - 0s 55ms/step - loss: 0.3918 - accuracy: 0.7979 - val_loss: 0.5646 - val_accuracy: 0.6735
Epoch 25/100
7/7 [==============================] - 0s 44ms/step - loss: 0.3624 - accuracy: 0.8187 - val_loss: 0.7324 - val_accuracy: 0.6735
Epoch 26/100
7/7 [==============================] - 0s 56ms/step - loss: 0.3531 - accuracy: 0.8446 - val_loss: 0.4501 - val_accuracy: 0.6939
Epoch 27/100
7/7 [==============================] - 0s 45ms/step - loss: 0.3164 - accuracy: 0.8653 - val_loss: 0.4770 - val_accuracy: 0.6735
Epoch 28/100
7/7 [==============================] - 0s 55ms/step - loss: 0.3557 - accuracy: 0.8290 - val_loss: 0.5188 - val_accuracy: 0.7551
Epoch 29/100
7/7 [==============================] - 0s 45ms/step - loss: 0.3193 - accuracy: 0.8446 - val_loss: 0.5949 - val_accuracy: 0.7347
Epoch 30/100
7/7 [==============================] - 0s 55ms/step - loss: 0.3049 - accuracy: 0.8601 - val_loss: 0.5904 - val_accuracy: 0.7347
Epoch 31/100
7/7 [==============================] - 0s 44ms/step - loss: 0.3150 - accuracy: 0.8705 - val_loss: 0.4901 - val_accuracy: 0.6531
Epoch 32/100
7/7 [==============================] - 0s 55ms/step - loss: 0.3223 - accuracy: 0.8446 - val_loss: 0.5034 - val_accuracy: 0.6939
Epoch 33/100
7/7 [==============================] - 0s 43ms/step - loss: 0.3178 - accuracy: 0.8394 - val_loss: 0.6359 - val_accuracy: 0.7347
Epoch 34/100
7/7 [==============================] - 0s 43ms/step - loss: 0.3041 - accuracy: 0.8549 - val_loss: 0.5558 - val_accuracy: 0.7551
Epoch 35/100
7/7 [==============================] - 0s 44ms/step - loss: 0.2853 - accuracy: 0.8808 - val_loss: 0.5089 - val_accuracy: 0.6939
Epoch 36/100
7/7 [==============================] - 0s 55ms/step - loss: 0.2905 - accuracy: 0.8653 - val_loss: 0.5989 - val_accuracy: 0.7347
Epoch 37/100
7/7 [==============================] - 0s 45ms/step - loss: 0.2885 - accuracy: 0.8705 - val_loss: 0.5644 - val_accuracy: 0.7551
Epoch 38/100
7/7 [==============================] - 0s 55ms/step - loss: 0.2890 - accuracy: 0.8601 - val_loss: 0.5590 - val_accuracy: 0.7755
Epoch 39/100
7/7 [==============================] - 0s 45ms/step - loss: 0.2792 - accuracy: 0.8808 - val_loss: 0.4820 - val_accuracy: 0.7551
Epoch 40/100
7/7 [==============================] - 0s 55ms/step - loss: 0.2781 - accuracy: 0.8653 - val_loss: 0.4974 - val_accuracy: 0.7551
Epoch 41/100
7/7 [==============================] - 0s 45ms/step - loss: 0.2873 - accuracy: 0.8705 - val_loss: 0.5550 - val_accuracy: 0.7551
Epoch 42/100
7/7 [==============================] - 0s 55ms/step - loss: 0.2737 - accuracy: 0.8808 - val_loss: 0.5356 - val_accuracy: 0.7347
Epoch 43/100
7/7 [==============================] - 0s 44ms/step - loss: 0.2677 - accuracy: 0.8860 - val_loss: 0.5071 - val_accuracy: 0.7551
Epoch 44/100
7/7 [==============================] - 0s 44ms/step - loss: 0.2794 - accuracy: 0.8756 - val_loss: 0.5320 - val_accuracy: 0.6939
Epoch 45/100
7/7 [==============================] - 0s 55ms/step - loss: 0.2932 - accuracy: 0.8394 - val_loss: 0.5533 - val_accuracy: 0.7755
Epoch 46/100
7/7 [==============================] - 0s 45ms/step - loss: 0.2750 - accuracy: 0.8705 - val_loss: 0.5723 - val_accuracy: 0.7347
Epoch 47/100
7/7 [==============================] - 0s 55ms/step - loss: 0.2694 - accuracy: 0.8808 - val_loss: 0.5347 - val_accuracy: 0.7551
Epoch 48/100
7/7 [==============================] - 0s 44ms/step - loss: 0.2632 - accuracy: 0.8912 - val_loss: 0.5369 - val_accuracy: 0.7755
Epoch 49/100
7/7 [==============================] - 0s 44ms/step - loss: 0.2677 - accuracy: 0.8860 - val_loss: 0.5837 - val_accuracy: 0.7143
Epoch 50/100
7/7 [==============================] - 0s 55ms/step - loss: 0.2635 - accuracy: 0.8808 - val_loss: 0.5337 - val_accuracy: 0.7755
Epoch 51/100
7/7 [==============================] - 0s 45ms/step - loss: 0.2592 - accuracy: 0.8912 - val_loss: 0.5533 - val_accuracy: 0.7755
Epoch 52/100
7/7 [==============================] - 0s 55ms/step - loss: 0.2536 - accuracy: 0.8912 - val_loss: 0.5743 - val_accuracy: 0.7347
Epoch 53/100
7/7 [==============================] - 0s 45ms/step - loss: 0.2511 - accuracy: 0.9016 - val_loss: 0.5451 - val_accuracy: 0.7551
Epoch 54/100
7/7 [==============================] - 0s 55ms/step - loss: 0.2650 - accuracy: 0.8860 - val_loss: 0.5864 - val_accuracy: 0.6531
Epoch 55/100
7/7 [==============================] - 0s 45ms/step - loss: 0.3354 - accuracy: 0.8290 - val_loss: 0.5772 - val_accuracy: 0.7347
Epoch 56/100
7/7 [==============================] - 0s 54ms/step - loss: 0.2759 - accuracy: 0.8653 - val_loss: 0.5857 - val_accuracy: 0.7347
Epoch 57/100
7/7 [==============================] - 0s 44ms/step - loss: 0.2522 - accuracy: 0.8860 - val_loss: 0.5930 - val_accuracy: 0.7347
Epoch 58/100
7/7 [==============================] - 0s 44ms/step - loss: 0.2488 - accuracy: 0.8808 - val_loss: 0.5814 - val_accuracy: 0.7551
Epoch 59/100
7/7 [==============================] - 0s 55ms/step - loss: 0.2428 - accuracy: 0.8964 - val_loss: 0.5805 - val_accuracy: 0.7551
Epoch 60/100
7/7 [==============================] - 0s 44ms/step -
loss: 0.2555  accuracy: 0.8912  val_loss: 0.5903  val_accuracy: 0.7347Epoch 61/1007/7 [==============================]  0s 44ms/step  loss: 0.2391  accuracy: 0.9016  val_loss: 0.5721  val_accuracy: 0.7755Epoch 62/1007/7 [==============================]  0s 44ms/step  loss: 0.2423  accuracy: 0.8860  val_loss: 0.5911  val_accuracy: 0.7551Epoch 63/1007/7 [==============================]  0s 44ms/step  loss: 0.2450  accuracy: 0.8756  val_loss: 0.5845  val_accuracy: 0.7755Epoch 64/1007/7 [==============================]  0s 55ms/step  loss: 0.2447  accuracy: 0.8912  val_loss: 0.5883  val_accuracy: 0.7551Epoch 65/1007/7 [==============================]  0s 44ms/step  loss: 0.2386  accuracy: 0.8964  val_loss: 0.6093  val_accuracy: 0.7551Epoch 66/1007/7 [==============================]  0s 56ms/step  loss: 0.2278  accuracy: 0.9067  val_loss: 0.6654  val_accuracy: 0.7347Epoch 67/1007/7 [==============================]  0s 44ms/step  loss: 0.2474  accuracy: 0.8912  val_loss: 0.6545  val_accuracy: 0.7143Epoch 68/1007/7 [==============================]  0s 45ms/step  loss: 0.2509  accuracy: 0.8808  val_loss: 0.6298  val_accuracy: 0.6735Epoch 69/1007/7 [==============================]  0s 44ms/step  loss: 0.2931  accuracy: 0.8549  val_loss: 0.6237  val_accuracy: 0.7347Epoch 70/1007/7 [==============================]  0s 44ms/step  loss: 0.2653  accuracy: 0.8808  val_loss: 0.6296  val_accuracy: 0.7143Epoch 71/1007/7 [==============================]  0s 43ms/step  loss: 0.2649  accuracy: 0.8549  val_loss: 0.5915  val_accuracy: 0.6531Epoch 72/1007/7 [==============================]  0s 44ms/step  loss: 0.3141  accuracy: 0.8394  val_loss: 0.6017  val_accuracy: 0.7755Epoch 73/1007/7 [==============================]  0s 45ms/step  loss: 0.2557  accuracy: 0.8756  val_loss: 0.6444  val_accuracy: 0.7347Epoch 74/1007/7 [==============================]  0s 55ms/step  loss: 0.2220  accuracy: 0.9067  val_loss: 0.6380  val_accuracy: 0.7347Epoch 75/1007/7 [==============================]  0s 
44ms/step  loss: 0.2209  accuracy: 0.9016  val_loss: 0.6977  val_accuracy: 0.7347Epoch 76/1007/7 [==============================]  0s 55ms/step  loss: 0.2318  accuracy: 0.8964  val_loss: 0.6422  val_accuracy: 0.7347Epoch 77/1007/7 [==============================]  0s 44ms/step  loss: 0.2183  accuracy: 0.9067  val_loss: 0.6183  val_accuracy: 0.7143Epoch 78/1007/7 [==============================]  0s 55ms/step  loss: 0.2304  accuracy: 0.8860  val_loss: 0.6522  val_accuracy: 0.7143Epoch 79/1007/7 [==============================]  0s 44ms/step  loss: 0.2338  accuracy: 0.8756  val_loss: 0.5959  val_accuracy: 0.7551Epoch 80/1007/7 [==============================]  0s 44ms/step  loss: 0.2250  accuracy: 0.8964  val_loss: 0.6232  val_accuracy: 0.7551Epoch 81/1007/7 [==============================]  0s 55ms/step  loss: 0.2275  accuracy: 0.8912  val_loss: 0.6500  val_accuracy: 0.7551Epoch 82/1007/7 [==============================]  0s 44ms/step  loss: 0.2053  accuracy: 0.9016  val_loss: 0.6249  val_accuracy: 0.7347Epoch 83/1007/7 [==============================]  0s 45ms/step  loss: 0.2250  accuracy: 0.8964  val_loss: 0.6744  val_accuracy: 0.7347Epoch 84/1007/7 [==============================]  0s 44ms/step  loss: 0.2109  accuracy: 0.9067  val_loss: 0.7039  val_accuracy: 0.7347Epoch 85/1007/7 [==============================]  0s 45ms/step  loss: 0.2171  accuracy: 0.9016  val_loss: 0.6693  val_accuracy: 0.7347Epoch 86/1007/7 [==============================]  0s 44ms/step  loss: 0.2187  accuracy: 0.9067  val_loss: 0.6765  val_accuracy: 0.7143Epoch 87/1007/7 [==============================]  0s 45ms/step  loss: 0.2225  accuracy: 0.9067  val_loss: 0.6637  val_accuracy: 0.6939Epoch 88/1007/7 [==============================]  0s 44ms/step  loss: 0.2193  accuracy: 0.8808  val_loss: 0.7029  val_accuracy: 0.6735Epoch 89/1007/7 [==============================]  0s 45ms/step  loss: 0.2644  accuracy: 0.8653  val_loss: 0.6829  val_accuracy: 0.6939Epoch 90/1007/7 
[==============================]  0s 55ms/step  loss: 0.2625  accuracy: 0.8601  val_loss: 0.6617  val_accuracy: 0.7347Epoch 91/1007/7 [==============================]  0s 45ms/step  loss: 0.2206  accuracy: 0.8860  val_loss: 0.6889  val_accuracy: 0.7551Epoch 92/1007/7 [==============================]  0s 55ms/step  loss: 0.2090  accuracy: 0.8964  val_loss: 0.7322  val_accuracy: 0.7347Epoch 93/1007/7 [==============================]  0s 45ms/step  loss: 0.2016  accuracy: 0.9119  val_loss: 0.7244  val_accuracy: 0.7347Epoch 94/1007/7 [==============================]  0s 55ms/step  loss: 0.1933  accuracy: 0.9067  val_loss: 0.6788  val_accuracy: 0.7347Epoch 95/1007/7 [==============================]  0s 45ms/step  loss: 0.2002  accuracy: 0.9171  val_loss: 0.6849  val_accuracy: 0.7143Epoch 96/1007/7 [==============================]  0s 55ms/step  loss: 0.2138  accuracy: 0.8964  val_loss: 0.7610  val_accuracy: 0.6939Epoch 97/1007/7 [==============================]  0s 45ms/step  loss: 0.2225  accuracy: 0.8912  val_loss: 0.6998  val_accuracy: 0.6939Epoch 98/1007/7 [==============================]  0s 55ms/step  loss: 0.2089  accuracy: 0.9067  val_loss: 0.6846  val_accuracy: 0.7143Epoch 99/1007/7 [==============================]  0s 44ms/step  loss: 0.2043  accuracy: 0.8964  val_loss: 0.7292  val_accuracy: 0.7347Epoch 100/1007/7 [==============================]  0s 55ms/step  loss: 0.2008  accuracy: 0.9016  val_loss: 0.7064  val_accuracy: 0.7143<tensorflow.python.keras.callbacks.History at 0x7f33184937b8>
1  loss, accuracy = model.evaluate(test_ds) 
2/2 [==============================] - 1s 329ms/step - loss: 0.5511 - accuracy: 0.8197
Accuracy 0.8196721
First, to perform the Extract process we use tfds.load, which handles everything from downloading the raw data to parsing and splitting it, giving us a dataset. Next, we perform the Transform process; in this simple example, the transform just consists of shuffling the dataset. Finally, we Load one record by using the take(1) method. In this case, each record consists of an image and its corresponding label. After loading the record we plot the image and print its label.
1  # EXTRACT 
1  import tensorflow as tf 
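The Extract → Transform → Load shape described above can be sketched with plain Python lists standing in for tfds.load, Dataset.shuffle, and Dataset.take. This is only an illustration of the pattern, not the TensorFlow implementation; the record contents below are made-up assumptions.

```python
import random

def extract():
    # Stand-in for tfds.load: produce (image, label) records.
    return [([[0] * 28] * 28, i % 10) for i in range(100)]

def transform(records, seed=0):
    # Stand-in for dataset.shuffle: reorder the records.
    records = list(records)
    random.Random(seed).shuffle(records)
    return records

def load(records, n=1):
    # Stand-in for dataset.take(n): keep only the first n records.
    return records[:n]

batch = load(transform(extract()))
print(len(batch))  # 1
```

Each stage consumes the previous stage's output, which is exactly how the tf.data pipeline chains its methods.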
Before using the new S3 API, we must first find out whether the MNIST dataset implements the new S3 API. In the cell below we indicate that we want to use version 3.*.*
of the MNIST dataset.
1 

We can see that the code above printed True
, which means that version 3.*.*
of the MNIST dataset supports the new S3 API.
Now, let’s see how we can use the S3 API to download the MNIST dataset and specify the splits we want to use. In the code below we download the train
and test
splits of the MNIST dataset and then we print their size. We will see that there are 60,000 records in the training set and 10,000 in the test set.
1  train_ds, test_ds = tfds.load('mnist:3.*.*', split=['train', 'test']) 
In the S3 API we can use strings to specify the slicing instructions. For example, in the cell below we will merge the training and test sets by passing the string 'train+test'
to the split
argument.
1  combined = tfds.load('mnist:3.*.*', split='train+test') 
We can also use Python-style list slicing to specify the data we want. For example, we can specify that we want to take the first 10,000 records of the train
split with the string 'train[:10000]'
, as shown below:
1  first10k = tfds.load('mnist:3.*.*', split='train[:10000]') 
The S3 API also allows us to specify the percentage of the data we want to use. For example, we can select the first 20% of the training set with the string 'train[:20%]'
, as shown below:
1  first20p = tfds.load('mnist:3.*.*', split='train[:20%]') 
We can see that first20p
contains 12,000 records, which is indeed 20% of the total number of records in the training set. Recall that the training set contains 60,000 records.
Because the slices are string-based, we can use loops like the ones shown below to slice up the dataset and make some fairly complex splits. For example, the loops below create 10 complementary validation and training sets (each loop returns a list with 5 datasets).
1  val_ds = tfds.load('mnist:3.*.*', split=['train[{}%:{}%]'.format(k, k+20) for k in range(0, 100, 20)]) 
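The split strings these loops generate can be checked with plain Python, since they are ordinary formatted strings. The complementary-training format below is one plausible way to write "everything outside each 20% window"; it is an assumption for illustration, not taken verbatim from the notebook.

```python
# The five 20% validation windows used above.
val_splits = ['train[{}%:{}%]'.format(k, k + 20) for k in range(0, 100, 20)]

# One plausible complementary training split: everything before and after
# each window (assumed format, for illustration only).
train_splits = ['train[:{}%]+train[{}%:]'.format(k, k + 20) for k in range(0, 100, 20)]

print(val_splits[0])    # train[0%:20%]
print(train_splits[2])  # train[:40%]+train[60%:]
```

Passing such a list to the split argument of tfds.load returns one dataset per string, which is why each loop yields five datasets.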
The S3 API also allows us to compose new datasets by using pieces from different splits. For example, we can create a new dataset from the first 10% of the test set and the last 80% of the training set, as shown below.
1  composed_ds = tfds.load('mnist:3.*.*', split='test[:10%]+train[-80%:]') 
1  import pandas as pd 
Pandas is a Python library with many helpful utilities for loading and working with structured data. We will use Pandas to download the dataset and load it into a dataframe.
1  filePath = f"{getcwd()}/../tmp2/heart.csv" 
   age  sex  cp  trestbps  chol  fbs  restecg  thalach  exang  oldpeak  slope  ca  thal        target
0   63    1   1       145   233    1        2      150      0      2.3      3   0  fixed            0
1   67    1   4       160   286    0        2      108      1      1.5      2   3  normal           1
2   67    1   4       120   229    0        2      129      1      2.6      2   2  reversible       0
3   37    1   3       130   250    0        0      187      0      3.5      3   0  normal           0
4   41    0   2       130   204    0        2      172      0      1.4      1   0  normal           0
The dataset we downloaded was a single CSV file. We will split this into train, validation, and test sets.
1  train, test = train_test_split(dataframe, test_size=0.2) 
193 train examples
49 validation examples
61 test examples
tf.data
Next, we will wrap the dataframes with tf.data. This will enable us to use feature columns as a bridge to map from the columns in the Pandas dataframe to features used to train the model. If we were working with a very large CSV file (so large that it does not fit into memory), we would use tf.data to read it from disk directly.
1  # EXERCISE: A utility method to create a tf.data dataset from a Pandas Dataframe. 
1  batch_size = 5 # A small batch size is used for demonstration purposes 
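A plain-Python sketch of what such a df_to_dataset utility typically does may help: turn a table (a dict of columns) into shuffled batches of (feature-dict, labels). The real notebook code would instead build on tf.data.Dataset.from_tensor_slices((dict(dataframe), labels)); the helper below is an illustrative assumption, not the actual solution.

```python
import random

def df_to_batches(columns, label_name, batch_size, seed=0):
    columns = dict(columns)              # avoid mutating the caller's table
    labels = columns.pop(label_name)     # separate the label column
    order = list(range(len(labels)))
    random.Random(seed).shuffle(order)   # shuffle the row order
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        # Each batch is a dict of column-name -> values, plus the labels.
        yield ({name: [col[i] for i in idx] for name, col in columns.items()},
               [labels[i] for i in idx])

cols = {'age': [63, 67, 67, 37, 41], 'target': [0, 1, 0, 0, 0]}
features, labels = next(df_to_batches(cols, 'target', batch_size=5))
print(sorted(features), sorted(labels))  # ['age'] [0, 0, 0, 0, 1]
```

This mirrors the dictionary-of-columns structure that the real tf.data pipeline returns in the next cell.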
Now that we have created the input pipeline, let’s call it to see the format of the data it returns. We have used a small batch size to keep the output readable.
1  for feature_batch, label_batch in train_ds.take(1): 
Every feature: ['age', 'sex', 'cp', 'trestbps', 'chol', 'fbs', 'restecg', 'thalach', 'exang', 'oldpeak', 'slope', 'ca', 'thal']
A batch of ages: tf.Tensor([51 63 64 58 57], shape=(5,), dtype=int32)
A batch of targets: tf.Tensor([0 1 0 0 0], shape=(5,), dtype=int64)
We can see that the dataset returns a dictionary of column names (from the dataframe) that map to column values from rows in the dataframe.
TensorFlow provides many types of feature columns. In this section, we will create several types of feature columns, and demonstrate how they transform a column from the dataframe.
1  # Try to demonstrate several types of feature columns by getting an example. 
1  # A utility method to create a feature column and to transform a batch of data. 
The output of a feature column becomes the input to the model (using the demo function defined above, we will be able to see exactly how each column from the dataframe is transformed). A numeric column is the simplest type of column. It is used to represent real-valued features.
1  # EXERCISE: Create a numeric feature column out of 'age' and demo it. 
[[51.]
 [58.]
 [63.]
 [64.]
 [60.]]
In the heart disease dataset, most columns from the dataframe are numeric.
Often, you don’t want to feed a number directly into the model, but instead split its value into different categories based on numerical ranges. Consider raw data that represents a person’s age. Instead of representing age as a numeric column, we could split the age into several buckets using a bucketized column.
1  # EXERCISE: Create a bucketized feature column out of 'age' with 
[[0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]]
Notice how the one-hot values above describe which age range each row matches.
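The mechanics behind a bucketized column can be sketched in plain Python: with n boundaries there are n+1 buckets, and each value is one-hot encoded by the bucket it falls into. This is an illustration, not the TF implementation, and the boundary values below are assumptions chosen to be consistent with the printed output.

```python
import bisect

def bucketize_one_hot(value, boundaries):
    idx = bisect.bisect_right(boundaries, value)   # bucket index for value
    vec = [0.0] * (len(boundaries) + 1)            # n boundaries -> n+1 buckets
    vec[idx] = 1.0
    return vec

# Assumed boundaries (one bucket per age range).
boundaries = [18, 25, 30, 35, 40, 45, 50, 55, 60, 65]
print(bucketize_one_hot(51, boundaries))
```

An age of 51 falls between the boundaries 50 and 55, so the 1.0 lands in bucket index 7, matching the first row of the output above.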
In this dataset, thal is represented as a string (e.g. ‘fixed’, ‘normal’, or ‘reversible’). We cannot feed strings directly to a model. Instead, we must first map them to numeric values. The categorical vocabulary columns provide a way to represent strings as a one-hot vector (much like you have seen above with the age buckets).
Note: You will probably see some warning messages when running some of the code cells below. These warnings have to do with software updates and should not cause any errors or prevent your code from running.
1  # EXERCISE: Create a categorical vocabulary column out of the 
[[0. 1. 0.]
 [0. 1. 0.]
 [0. 0. 1.]
 [0. 0. 1.]
 [0. 1. 0.]]
The vocabulary can be passed as a list using categorical_column_with_vocabulary_list, or loaded from a file using categorical_column_with_vocabulary_file.
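The one-hot encoding a vocabulary column performs can be sketched in plain Python. This only illustrates the mechanics, not the actual TF code; real vocabulary columns also support default values for out-of-vocabulary strings.

```python
def vocab_one_hot(value, vocabulary):
    vec = [0.0] * len(vocabulary)
    if value in vocabulary:          # out-of-vocabulary values stay all zeros here
        vec[vocabulary.index(value)] = 1.0
    return vec

vocab = ['fixed', 'normal', 'reversible']
print(vocab_one_hot('normal', vocab))  # [0.0, 1.0, 0.0]
```

With the vocabulary in that order, 'normal' maps to index 1, matching the rows of the output above.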
Suppose instead of having just a few possible strings, we have thousands (or more) values per category. For a number of reasons, as the number of categories grows large, it becomes infeasible to train a neural network using one-hot encodings. We can use an embedding column to overcome this limitation. Instead of representing the data as a one-hot vector of many dimensions, an embedding column represents that data as a lower-dimensional, dense vector in which each cell can contain any number, not just 0 or 1. You can tune the size of the embedding with the dimension
parameter.
1  # EXERCISE: Create an embedding column out of the categorical 
[[1.4254066e-01 1.0374661e-01 3.4352791e-01 3.3996427e-01 3.2193713e-02 1.8381193e-01 1.8051244e-01 3.2638407e-01]
 [1.4254066e-01 1.0374661e-01 3.4352791e-01 3.3996427e-01 3.2193713e-02 1.8381193e-01 1.8051244e-01 3.2638407e-01]
 [6.5549983e-05 2.7680036e-01 4.1849682e-01 5.3418136e-01 1.6281548e-01 2.5406811e-01 8.8969752e-02 1.8004593e-01]
 [6.5549983e-05 2.7680036e-01 4.1849682e-01 5.3418136e-01 1.6281548e-01 2.5406811e-01 8.8969752e-02 1.8004593e-01]
 [1.4254066e-01 1.0374661e-01 3.4352791e-01 3.3996427e-01 3.2193713e-02 1.8381193e-01 1.8051244e-01 3.2638407e-01]]
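Conceptually, an embedding column is a lookup table with one dense vector per vocabulary entry, as the identical rows in the output above show (equal inputs map to equal vectors). The sketch below illustrates this in plain Python; in TF the vectors are trainable weights learned during training, while here they are just fixed random values seeded for repeatability.

```python
import random

def make_embedding(vocabulary, dimension, seed=0):
    # One dense vector of length `dimension` per vocabulary entry.
    rng = random.Random(seed)
    return {v: [rng.uniform(-1.0, 1.0) for _ in range(dimension)]
            for v in vocabulary}

table = make_embedding(['fixed', 'normal', 'reversible'], dimension=8)
print(len(table['normal']))  # 8
```

Looking up 'normal' always returns the same 8-dimensional vector, regardless of how many vocabulary entries exist, which is what keeps embeddings cheap for large vocabularies.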
Another way to represent a categorical column with a large number of values is to use a categorical_column_with_hash_bucket. This feature column calculates a hash value of the input, then selects one of the hash_bucket_size
buckets to encode a string. When using this column, you do not need to provide the vocabulary, and you can choose to make the number of hash buckets significantly smaller than the number of actual categories to save space.
1  # EXERCISE: Create a hashed feature column with 'thal' as the key and 
[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]
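The idea behind a hashed column can be illustrated in plain Python: hash the string, then take it modulo hash_bucket_size. TF uses a different fingerprint function internally; the hash below is an assumption chosen only to show the mechanism.

```python
import hashlib

def hash_bucket(value, hash_bucket_size):
    # Stable hash of the string, reduced to one of the buckets.
    digest = hashlib.md5(value.encode('utf-8')).hexdigest()
    return int(digest, 16) % hash_bucket_size

bucket = hash_bucket('normal', 100)
print(0 <= bucket < 100)  # True
```

Note the trade-off: unrelated strings can land in the same bucket (a hash collision), which is the price paid for not needing a vocabulary.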
Combining features into a single feature, better known as feature crosses, enables a model to learn separate weights for each combination of features. Here, we will create a new feature that is the cross of age and thal. Note that crossed_column
does not build the full table of all possible combinations (which could be very large). Instead, it is backed by a hashed_column
, so you can choose how large the table is.
1  # EXERCISE: Create a crossed column using the bucketized column (age_buckets), 
[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]
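A feature cross can likewise be sketched as hashing the combination of values into a fixed-size table, which is why crossed_column never needs to build the full table of all combinations. The joining convention and hash function below are assumptions for illustration, not TF's internals.

```python
import hashlib

def cross_bucket(values, hash_bucket_size):
    # Combine the values into one key, then hash into a bucket, so each
    # (value1, value2) combination can get its own learned weight.
    key = '_X_'.join(str(v) for v in values)
    digest = hashlib.md5(key.encode('utf-8')).hexdigest()
    return int(digest, 16) % hash_bucket_size

# e.g. the (age-bucket, thal) combination used above.
bucket = cross_bucket([7, 'normal'], 1000)
print(0 <= bucket < 1000)  # True
```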
We have seen how to use several types of feature columns. Now we will use them to train a model. The goal of this exercise is to show you the complete code needed to work with feature columns. We have arbitrarily selected a few columns to train our model below.
If your aim is to build an accurate model, try a larger dataset of your own, and think carefully about which features are the most meaningful to include, and how they should be represented.
1  dataframe.dtypes 
age           int64
sex           int64
cp            int64
trestbps      int64
chol          int64
fbs           int64
restecg       int64
thalach       int64
exang         int64
oldpeak     float64
slope         int64
ca            int64
thal         object
target        int64
dtype: object
You can use the above list of column datatypes to map the appropriate feature column to every column in the dataframe.
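One hypothetical helper for that mapping, sketched in plain Python (not part of the notebook): numeric dtypes become numeric columns, while object (string) dtypes become categorical vocabulary columns.

```python
def column_kind(dtype_name):
    # int64/float64 -> numeric column; object (string) -> categorical column.
    return 'numeric' if dtype_name in ('int64', 'float64') else 'categorical'

print(column_kind('float64'), column_kind('object'))  # numeric categorical
```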
1  # EXERCISE: Fill in the missing code below 
Now that we have defined our feature columns, we will use a DenseFeatures layer to input them to our Keras model.
1  # EXERCISE: Create a Keras DenseFeatures layer and pass the feature_columns you just created. 
Earlier, we used a small batch size to demonstrate how feature columns worked. We create a new input pipeline with a larger batch size.
1  batch_size = 32 
1  model = tf.keras.Sequential([ 
...
Epoch 100/100
7/7 [==============================] - 0s 55ms/step - loss: 0.2008 - accuracy: 0.9016 - val_loss: 0.7064 - val_accuracy: 0.7143
<tensorflow.python.keras.callbacks.History at 0x7f33184937b8>
1  loss, accuracy = model.evaluate(test_ds) 
2/2 [==============================] - 1s 329ms/step - loss: 0.5511 - accuracy: 0.8197
Accuracy 0.8196721
We recommend cross-compiling the TensorFlow Raspbian package. Cross-compilation means building the package on a different platform than the one it will be deployed to. Instead of using the Raspberry Pi’s limited RAM and comparatively slow processor, it’s easier to build TensorFlow on a more powerful host machine running Linux, macOS, or Windows. You can see detailed instructions here.
To quickly start executing TensorFlow Lite models with Python, you can install just the TensorFlow Lite interpreter, instead of all TensorFlow packages.
This interpreter-only package is a fraction of the size of the full TensorFlow package and includes the bare minimum code required to run inference with TensorFlow Lite; it includes only the tf.lite.Interpreter
Python class. This small package is ideal when all you want to do is execute .tflite models and avoid wasting disk space with the large TensorFlow library.
To install just the interpreter, download the appropriate Python wheel for your system from the following link, and then install it with the pip install
command.
For example, if you’re setting up a Raspberry Pi Model B (using Raspbian Stretch, which has Python 3.5), install the Python wheel as follows (after you click to download the .whl
file in the provided link):
1  pip install tflite_runtime-1.14.0-cp35-cp35m-linux_armv7l.whl 
So instead of importing Interpreter from the tensorflow module, you need to import it from tflite_runtime.
1  from tflite_runtime.interpreter import Interpreter 
In case you have built TensorFlow from source, you need to import the Interpreter as follows:
1  from tensorflow.lite.python.interpreter import Interpreter 
To get started, download the pretrained model along with its label file.
1  wget https://storage.googleapis.com/download.tensorflow.org/models/tflite/mobilenet_v1_1.0_224_quant_and_labels.zip 
To install the Python dependencies, run:
1  pip install numpy 
Next, to run the code on Raspberry Pi, use classify.py
as follows:
1  python3 classify.py --filename dog.jpg --model_path mobilenet_v1_1.0_224_quant.tflite --label_path labels_mobilenet_quant_v1_224.txt 
classify.py
1 

1  # ATTENTION: Please do not alter any of the provided code in the exercise. Only add your own code where indicated 
This notebook uses the Fashion MNIST dataset which contains 70,000 grayscale images in 10 categories. The images show individual articles of clothing at low resolution (28 by 28 pixels), as seen here:
Figure 1. Fashion-MNIST samples (by Zalando, MIT License). 
Fashion MNIST is intended as a drop-in replacement for the classic MNIST dataset, which is often used as the “Hello, World” of machine learning programs for computer vision. The MNIST dataset contains images of handwritten digits (0, 1, 2, etc.) in a format identical to that of the articles of clothing we’ll use here.
This notebook uses Fashion MNIST for variety, and because it’s a slightly more challenging problem than regular MNIST. Both datasets are relatively small and are used to verify that an algorithm works as expected. They’re good starting points to test and debug code.
We will use 60,000 images to train the network and 10,000 images to evaluate how accurately the network learned to classify images. You can access Fashion MNIST directly from TensorFlow, importing and loading the data as follows:
1  # TensorFlow 
• Using TensorFlow Version: 2.0.0
• GPU Device Found.
We will use TensorFlow Datasets to load the Fashion MNIST dataset.
1  splits = tfds.Split.ALL.subsplit(weighted=(80, 10, 10)) 
1  sample = next(iter(train_examples)) 
1  sample[0].shape 
TensorShape([28, 28, 1])
1  sample[1] 
<tf.Tensor: id=490, shape=(), dtype=int64, numpy=6>
The class names are not included with the dataset, so we will specify them here.
1  class_names = ['T-shirt_top', 'Trouser', 'Pullover', 'Dress', 'Coat', 
1  # Create a labels.txt file with the class names 
1  # The images in the dataset are 28 by 28 pixels. 
1  # EXERCISE: Write a function to normalize the images. 
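A minimal sketch of the normalization the exercise asks for: scale pixel values from [0, 255] to [0, 1]. The real notebook version would operate on tensors (e.g. casting with tf.cast to tf.float32 before dividing by 255.0); the plain-Python version below only shows the math.

```python
def normalize_pixels(pixels):
    # Cast to float and scale each pixel from [0, 255] to [0, 1].
    return [float(p) / 255.0 for p in pixels]

print(normalize_pixels([0, 51, 255]))  # [0.0, 0.2, 1.0]
```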
1  # Specify the batch size 
1  # Create Datasets 
1  batch_sample = next(iter(train_batches)) 
1  tf.squeeze(batch_sample[0][0]).numpy() 
array([[0.        , 0.        , 0.        , ..., 0.        , 0.        , 0.        ],
       ...,
       [0.        , 0.        , 0.        , ..., 0.        , 0.        , 0.        ]], dtype=float32)
1  batch_sample[1][0].numpy().argmax() 
3
1  def show_batch(x,y,shape = None): 
1  show_batch(batch_sample[0],batch_sample[1], (4,4)) 
1  Model: "sequential" 
1  # EXERCISE: Build and compile the model shown in the previous cell. 
1  model.summary() 
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_2 (Conv2D)            (None, 26, 26, 16)        160       
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 13, 13, 16)        0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 11, 11, 32)        4640      
_________________________________________________________________
flatten_1 (Flatten)          (None, 3872)              0         
_________________________________________________________________
dense_2 (Dense)              (None, 64)                247872    
_________________________________________________________________
dense_3 (Dense)              (None, 10)                650       
=================================================================
Total params: 253,322
Trainable params: 253,322
Non-trainable params: 0
_________________________________________________________________
history = model.fit(train_batches, epochs=10, validation_data=validation_batches)
Epoch 1/10
219/219 [==============================] - 148s 675ms/step - loss: 0.5912 - accuracy: 0.7919 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 2/10
219/219 [==============================] - 4s 20ms/step - loss: 0.3837 - accuracy: 0.8648 - val_loss: 0.3390 - val_accuracy: 0.8796
Epoch 3/10
219/219 [==============================] - 4s 20ms/step - loss: 0.3319 - accuracy: 0.8819 - val_loss: 0.3046 - val_accuracy: 0.8914
Epoch 4/10
219/219 [==============================] - 4s 20ms/step - loss: 0.3014 - accuracy: 0.8925 - val_loss: 0.2903 - val_accuracy: 0.8957
Epoch 5/10
219/219 [==============================] - 4s 20ms/step - loss: 0.2805 - accuracy: 0.8993 - val_loss: 0.2841 - val_accuracy: 0.9011
Epoch 6/10
219/219 [==============================] - 4s 20ms/step - loss: 0.2602 - accuracy: 0.9054 - val_loss: 0.2777 - val_accuracy: 0.9009
Epoch 7/10
219/219 [==============================] - 4s 20ms/step - loss: 0.2477 - accuracy: 0.9101 - val_loss: 0.2548 - val_accuracy: 0.9091
Epoch 8/10
219/219 [==============================] - 4s 20ms/step - loss: 0.2351 - accuracy: 0.9144 - val_loss: 0.2703 - val_accuracy: 0.9000
Epoch 9/10
219/219 [==============================] - 4s 20ms/step - loss: 0.2209 - accuracy: 0.9198 - val_loss: 0.2462 - val_accuracy: 0.9126
Epoch 10/10
219/219 [==============================] - 4s 20ms/step - loss: 0.2108 - accuracy: 0.9243 - val_loss: 0.2566 - val_accuracy: 0.9089
You will now save the model to TFLite. Note that you will probably see some warning messages when running the code below. These warnings have to do with software updates and should not cause any errors or prevent your code from running.
# EXERCISE: Use the tf.saved_model API to save your model in the SavedModel format.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/resource_variable_ops.py:1781: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
INFO:tensorflow:Assets written to: saved_model/1/assets
# Select mode of optimization
# EXERCISE: Use the TFLiteConverter SavedModel API to initialize the converter
tflite_model_file = pathlib.Path('./model.tflite')
258704
# Load TFLite model and allocate tensors.
# Gather results for the randomly sampled test images
class_names
['Tshirt_top', 'Trouser', 'Pullover', 'Dress', 'Coat', 'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
# Utility functions for plotting
# Visualize the outputs
The simplest form of post-training quantization quantizes weights from floating point to 8 bits of precision. This technique is enabled as an option in the TensorFlow Lite converter. At inference, weights are converted from 8 bits of precision to floating point and computed using floating-point kernels. This conversion is done once and cached to reduce latency.
To further improve latency, hybrid operators dynamically quantize activations to 8 bits and perform computations with 8-bit weights and activations. This optimization provides latencies close to fully fixed-point inference. However, the outputs are still stored using floating point, so the speedup with hybrid ops is less than a full fixed-point computation.
converter.optimizations = [tf.lite.Optimize.DEFAULT]
We can get further latency improvements, reductions in peak memory usage, and access to integer-only hardware accelerators by making sure all model math is quantized. To do this, we need to measure the dynamic range of activations and inputs with a representative dataset. You can simply create an input data generator and provide it to the converter.
def representative_data_gen():
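A sketch of what such a generator might look like, assuming 28×28 grayscale inputs as elsewhere in this notebook; `calibration_images` is a hypothetical stand-in for real samples drawn from the training set:

```python
import numpy as np

# Hypothetical stand-in for real training samples; in practice you would
# draw a few hundred representative images from the training set.
calibration_images = np.random.rand(100, 28, 28, 1).astype(np.float32)

def representative_data_gen():
    # Yield batches of size 1 so the converter can measure the dynamic
    # range of the inputs and activations.
    for image in calibration_images[:10]:
        yield [image[np.newaxis, ...]]

# The generator is then handed to the converter, e.g.:
# converter.representative_dataset = representative_data_gen
```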
The resulting model will be fully quantized but still take float input and output for convenience.
Ops that do not have quantized implementations will automatically be left in floating point. This allows conversion to occur smoothly but may restrict deployment to accelerators that support float.
To require the converter to only output integer operations, one can specify:
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
let p = new Promise((resolve, reject) => {

Calling resolve will change p's state to fulfilled, or calling reject will change p's state to rejected.
Registering state callbacks

First, a promise instance has three states: pending, fulfilled, and rejected. fulfilled and rejected can be read as "succeeded" and "failed"; both count as settled states.

resolve and reject

Calling resolve or reject changes the promise instance's state to fulfilled or rejected, respectively. Only once the state becomes settled (i.e., one of fulfilled or rejected) are the registered state callbacks triggered.

.then

.then registers callback functions on a promise object's state. It returns a promise object, so calls can be chained: one .then can be followed by another .then. In a registered state callback, a return statement can set the state of the promise returned by .then and pass data to the state callbacks registered by later .then calls; without a return statement, the returned promise is simply resolved by default.

.catch

.catch registers a callback for the rejected state. This callback is also the error handler: if the code earlier in the chain throws, execution enters this callback. Like .then, it returns a new promise object. Calling Promise.resolve returns a promise object in the fulfilled state, and its argument is passed as data to the subsequent state callbacks; Promise.reject works the same way, except that the returned promise object's state is rejected.
Promise.resolve(1);
Problem: the red light comes on once every three seconds, the green light once every second, and the yellow light once every two seconds. How do you make the three lights keep lighting up in alternation, over and over?
function red(){
The meaning of the async keyword is simple: the function returns a promise.
async function f() {
An async function returns a promise object; if the function body returns a plain value, async wraps it with Promise.resolve() before returning it.
[return_value] = await expression;
The keyword await means "wait". So what is it waiting for? It waits for an expression, which can be a constant, a variable, a promise, a function call, and so on.
The await operator waits for a result to come back. In the synchronous case, it simply returns the value directly. In the asynchronous case, await blocks the whole flow until the result is returned, and only then does execution continue with the code that follows.
function getProvinces () {
The code above first defines a getProvinces function for fetching province data, using setTimeout to simulate the asynchrony of a data request. Putting the async keyword in front of the asyncFn function indicates that the function contains asynchronous operations. When the await keyword is reached, execution waits for the asynchronous operation to complete before continuing with the rest of the code. So the result is that 'hello async' is printed to the console only after a 1000 millisecond wait.
An operating system provides applications with access to the underlying hardware by exporting a number of services. These services are directly linked to some of the components of the hardware.
Which of the following are likely components of an operating system?
A:
file system (hides hardware complexity), device driver (makes decisions), scheduler (distributes processes).
For the following options, indicate if they are examples of abstraction or arbitration.
A:
On a 64-bit Linux-based OS, which system call is used to:
A:
The answers are, respectively:
kill
setgid
mount
sysctl
We will build what we will call a search tree. The root of the tree is the start state, and the leaves are the end states (IsEnd(s) is true). Each edge leaving a node s corresponds to a possible action that could be performed in state s. The edge is labeled with the action and its cost, written a : Cost(s, a). The action leads deterministically to the successor state Succ(s, a), represented by the child node.
In summary, each root-to-leaf path represents a possible action sequence, and the sum of the costs of the edges is the cost of that path. The goal is to find the root-to-leaf path that ends in a valid end state with minimum cost.
Note that in code, we usually do not build the search tree as a concrete data structure. The search tree is used merely to visualize the computation of the search algorithms and study the structure of the search problem.
Street with blocks numbered 1 to n.
class TransportationProblem(object):
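One way the class might be filled in, as a sketch, assuming the walk/tram costs usually attached to this street example: from block s you can walk to s + 1 (cost 1) or take the tram to 2s (cost 2), starting at block 1 and ending at block N. The method names startState/isEnd/succAndCost are one plausible interface, not prescribed by the text:

```python
class TransportationProblem(object):
    # Sketch: walk s -> s + 1 costs 1; tram s -> 2 * s costs 2.
    def __init__(self, N):
        self.N = N  # number of blocks

    def startState(self):
        return 1

    def isEnd(self, state):
        return state == self.N

    def succAndCost(self, state):
        # Return (action, newState, cost) triples for the legal moves.
        results = []
        if state + 1 <= self.N:
            results.append(('walk', state + 1, 1))
        if 2 * state <= self.N:
            results.append(('tram', 2 * state, 2))
        return results

print(TransportationProblem(10).succAndCost(3))
# [('walk', 4, 1), ('tram', 6, 2)]
```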
Now let’s put modeling aside and suppose we are handed a search problem. How do we construct an algorithm for finding a minimum cost path (not necessarily unique)?
Suppose there are b actions per state and the maximum depth is D actions.
We will start with backtracking search, the simplest algorithm, which just tries all paths. The algorithm is called recursively on the current state s and the path leading up to that state. If we have reached a goal, then we can update the minimum cost path with the current path. Otherwise, we consider all possible actions a from state s, and recursively search each of the possibilities.
Graphically, backtracking search performs a depth-first traversal of the search tree. What is the time and memory complexity of this algorithm? To get a simple characterization, assume that the search tree has maximum depth D (each path consists of D actions/edges) and that there are b available actions per state (the branching factor is b). It is easy to see that backtracking search only requires O(D) memory (to maintain the stack for the recurrence), which is as good as it gets.
However, the running time is proportional to the number of nodes in the tree, since the algorithm needs to check each of them. The number of nodes is 1 + b + b^2 + ... + b^D = O(b^D).
Note that the total number of nodes in the search tree is on the same order as the number of leaves, so the cost is always dominated by the last level.
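As a quick sanity check of this claim, the geometric series can be summed directly for a small example with b = 2 and D = 10:

```python
# Level sizes are 1, b, b^2, ..., b^D; check that the total is dominated
# by the last level (the leaves) for b = 2, D = 10.
b, D = 2, 10
total_nodes = sum(b ** d for d in range(D + 1))
leaves = b ** D
print(total_nodes, leaves)  # 2047 1024: same order of magnitude
```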
Backtracking search will always work (i.e., find a minimum cost path), but there are cases where we can do it faster. But in order to do that, we need some additional assumptions — there is no free lunch. Suppose we make the assumption that all the action costs are zero. In other words, all we care about is finding a valid action sequence that reaches the goal. Any such sequence will have the minimum cost: zero.
In this case, we can just modify backtracking search to not keep track of costs and then stop searching as soon as we reach a goal. The resulting algorithm is depth-first search (DFS), which should be familiar to you. The worst-case time and space complexities are of the same order as for backtracking search. In particular, if there is no path to an end state, then we have to search the entire tree.
Breadth-first search (BFS), which should also be familiar, makes a less stringent assumption: that all the action costs are the same non-negative number. This effectively means that all the paths of a given length have the same cost. BFS maintains a queue of states to be explored. It pops a state off the queue, then pushes its successors back on the queue.
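A minimal BFS sketch under this equal-cost assumption. WalkTramProblem is a hypothetical stand-in (walk s → s+1, tram s → 2s) exposing a startState/isEnd/succAndCost interface; neither the class nor the interface names come from the text:

```python
from collections import deque

class WalkTramProblem:
    # Hypothetical stand-in problem: walk s -> s+1 (cost 1), tram s -> 2s (cost 2).
    def __init__(self, N):
        self.N = N
    def startState(self):
        return 1
    def isEnd(self, state):
        return state == self.N
    def succAndCost(self, state):
        results = []
        if state + 1 <= self.N:
            results.append(('walk', state + 1, 1))
        if 2 * state <= self.N:
            results.append(('tram', 2 * state, 2))
        return results

def breadthFirstSearch(problem):
    # Explore states level by level; with equal action costs, the first
    # time an end state is popped, its depth is the minimum number of actions.
    start = problem.startState()
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        state, depth = queue.popleft()
        if problem.isEnd(state):
            return depth
        for action, newState, cost in problem.succAndCost(state):
            if newState not in seen:
                seen.add(newState)
                queue.append((newState, depth + 1))
    return None

print(breadthFirstSearch(WalkTramProblem(10)))  # 4 (e.g. walk, tram, walk, tram)
```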
def backtrackingSearch(problem):
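One way the function might be completed, as a sketch. WalkTramProblem is a hypothetical stand-in problem (walk s → s+1 cost 1, tram s → 2s cost 2) with an assumed startState/isEnd/succAndCost interface:

```python
class WalkTramProblem:
    # Hypothetical stand-in problem: walk s -> s+1 (cost 1), tram s -> 2s (cost 2).
    def __init__(self, N):
        self.N = N
    def startState(self):
        return 1
    def isEnd(self, state):
        return state == self.N
    def succAndCost(self, state):
        results = []
        if state + 1 <= self.N:
            results.append(('walk', state + 1, 1))
        if 2 * state <= self.N:
            results.append(('tram', 2 * state, 2))
        return results

def backtrackingSearch(problem):
    # Try every root-to-leaf path, remembering the cheapest complete one.
    best = {'cost': float('inf'), 'history': None}

    def recurse(state, history, totalCost):
        if problem.isEnd(state):
            if totalCost < best['cost']:
                best['cost'], best['history'] = totalCost, history
            return
        for action, newState, cost in problem.succAndCost(state):
            recurse(newState, history + [(action, newState, cost)],
                    totalCost + cost)

    recurse(problem.startState(), [], 0)
    return best['cost'], best['history']

cost, history = backtrackingSearch(WalkTramProblem(10))
print(cost)  # 6
```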
backtracking search with memoization — potentially exponential savings
def dynamicProgramming(problem):
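A sketch of the memoized version, caching the minimum future cost per state. WalkTramProblem is a hypothetical stand-in problem with an assumed startState/isEnd/succAndCost interface; it also assumes every non-end state has at least one successor:

```python
class WalkTramProblem:
    # Hypothetical stand-in problem: walk s -> s+1 (cost 1), tram s -> 2s (cost 2).
    def __init__(self, N):
        self.N = N
    def startState(self):
        return 1
    def isEnd(self, state):
        return state == self.N
    def succAndCost(self, state):
        results = []
        if state + 1 <= self.N:
            results.append(('walk', state + 1, 1))
        if 2 * state <= self.N:
            results.append(('tram', 2 * state, 2))
        return results

def dynamicProgramming(problem):
    cache = {}  # state -> minimum future cost from that state to an end state

    def futureCost(state):
        if problem.isEnd(state):
            return 0
        if state in cache:  # memoization: each state is solved only once
            return cache[state]
        result = min(cost + futureCost(newState)
                     for action, newState, cost in problem.succAndCost(state))
        cache[state] = result
        return result

    return futureCost(problem.startState())

print(dynamicProgramming(WalkTramProblem(10)))  # 6
```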
The general strategy of UCS is to maintain three sets of nodes: explored, frontier, and unexplored. Throughout the course of the algorithm, we will move states from unexplored to frontier, and from frontier to explored. The key invariant is that we have computed the minimum cost paths to all the nodes in the explored set. So when the end state moves into the explored set, then we are done.
def uniformCostSearch(problem):
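A sketch of UCS with a binary-heap frontier, following the explored/frontier invariant described above. WalkTramProblem is a hypothetical stand-in problem with an assumed startState/isEnd/succAndCost interface:

```python
import heapq

class WalkTramProblem:
    # Hypothetical stand-in problem: walk s -> s+1 (cost 1), tram s -> 2s (cost 2).
    def __init__(self, N):
        self.N = N
    def startState(self):
        return 1
    def isEnd(self, state):
        return state == self.N
    def succAndCost(self, state):
        results = []
        if state + 1 <= self.N:
            results.append(('walk', state + 1, 1))
        if 2 * state <= self.N:
            results.append(('tram', 2 * state, 2))
        return results

def uniformCostSearch(problem):
    # Pop states in order of increasing past cost; when a state first moves
    # into the explored set, its minimum cost path has been found.
    frontier = [(0, problem.startState())]
    explored = set()
    while frontier:
        pastCost, state = heapq.heappop(frontier)
        if state in explored:
            continue
        explored.add(state)
        if problem.isEnd(state):
            return pastCost
        for action, newState, cost in problem.succAndCost(state):
            if newState not in explored:
                heapq.heappush(frontier, (pastCost + cost, newState))
    return float('inf')

print(uniformCostSearch(WalkTramProblem(10)))  # 6
```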