pip install tensorflow --user
1 Tensorflow Softmax
J(\boldsymbol{W}) = CE(\boldsymbol{y}, \mathrm{softmax}(\boldsymbol{xW}))
a softmax
def softmax(x):
    """
    Compute the softmax function in tensorflow.

    You might find the tensorflow functions tf.exp, tf.reduce_max,
    tf.reduce_sum, tf.expand_dims useful. (Many solutions are possible, so you may
    not need to use all of these functions). Recall also that many common
    tensorflow operations are sugared (e.g. x * y does a tensor multiplication
    if x and y are both tensors). Make sure to implement the numerical stability
    fixes as in the previous homework!

    Args:
        x:   tf.Tensor with shape (n_samples, n_features). Note feature vectors are
             represented by row-vectors. (For simplicity, no need to handle 1-d
             input as in the previous homework)
    Returns:
        out: tf.Tensor with shape (n_samples, n_features). You need to construct this
             tensor in this problem.
    """

    ### YOUR CODE HERE
    # Note: this uses the pre-1.0 TensorFlow API (tf.sub became tf.subtract in 1.0).
    x_max = tf.reduce_max(x, 1, keep_dims=True)        # find row-wise maximums
    x_sub = tf.sub(x, x_max)                           # subtract maximums
    x_exp = tf.exp(x_sub)                              # exponentiation
    sum_exp = tf.reduce_sum(x_exp, 1, keep_dims=True)  # row-wise sums
    out = tf.div(x_exp, sum_exp)                       # divide
    ### END YOUR CODE

    return out
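To see why the row-wise maximum is subtracted before exponentiating, here is a minimal sketch in plain Python (standard library only, no TensorFlow). The helper name softmax_rows and the sample inputs are made up for illustration; they are not part of the assignment code.

```python
import math

def softmax_rows(rows):
    """Row-wise softmax over a list of lists, shifting each row by its maximum."""
    out = []
    for row in rows:
        row_max = max(row)                           # row-wise maximum
        exps = [math.exp(v - row_max) for v in row]  # largest exponent is now 0
        total = sum(exps)                            # row-wise sum
        out.append([e / total for e in exps])
    return out

# Without the shift, math.exp(1002.0) raises OverflowError.
# With the shift, both rows reduce to the same exponents (-1 and 0).
probs = softmax_rows([[1001.0, 1002.0], [3.0, 4.0]])
```

Softmax is unchanged when a constant is added to every entry of a row, so the shift only affects numerical range and leaves the result the same.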
b Cross-Entropy
def cross_entropy_loss(y, yhat):
    """
    Compute the cross entropy loss in tensorflow.
    The loss should be summed over the current minibatch.

    y is a one-hot tensor of shape (n_samples, n_classes) and yhat is a tensor
    of shape (n_samples, n_classes). y should be of dtype tf.int32, and yhat should
    be of dtype tf.float32.

    The functions tf.to_float, tf.reduce_sum, and tf.log might prove useful. (Many
    solutions are possible, so you may not need to use all of these functions).

    Note: You are NOT allowed to use the tensorflow built-in cross-entropy
          functions.

    Args:
        y:    tf.Tensor with shape (n_samples, n_classes). One-hot encoded.
        yhat: tf.Tensor with shape (n_samples, n_classes). Each row encodes a
              probability distribution and should sum to 1.
    Returns:
        out:  tf.Tensor with shape (1,) (Scalar output). You need to construct this
              tensor in the problem.
    """

    ### YOUR CODE HERE
    # Note: pre-1.0 TensorFlow API (tf.mul and tf.neg became tf.multiply and
    # tf.negative in 1.0).
    l_yhat = tf.log(yhat)                        # log yhat
    product = tf.mul(tf.to_float(y), l_yhat)     # multiply element-wise
    out = tf.neg(tf.reduce_sum(product))         # negative summation to scalar
    ### END YOUR CODE

    return out
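The same computation can be checked by hand on a tiny batch. The sketch below is plain Python rather than TensorFlow; the function name cross_entropy and the example values are assumptions chosen for illustration.

```python
import math

def cross_entropy(y, yhat):
    """Summed cross-entropy: -sum_i sum_k y[i][k] * log(yhat[i][k])."""
    total = 0.0
    for y_row, p_row in zip(y, yhat):
        for label, prob in zip(y_row, p_row):
            if label:  # only the target class contributes for one-hot labels
                total -= label * math.log(prob)
    return total

y = [[0, 1], [1, 0]]               # one-hot labels for two samples
yhat = [[0.5, 0.5], [0.5, 0.5]]    # uniform predictions
loss = cross_entropy(y, yhat)      # each sample contributes log(2)
```

One difference from the TensorFlow version above: this sketch skips entries where the label is 0, whereas the element-wise product evaluates every entry, so a predicted probability of exactly 0 on a non-target class would give 0 * -inf = nan there.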
c Placeholders & Feed Dictionaries
d Softmax & CE Loss
def add_prediction_op(self):
    """Adds the core transformation for this model which transforms a batch of input
    data into a batch of predictions. In this case, the transformation is a linear layer plus a
    softmax transformation:

    y = softmax(Wx + b)

    Hint: Make sure to create tf.Variables as needed.
    Hint: For this simple use-case, it's sufficient to initialize both weights W
          and biases b with zeros.

    Args:
        input_data: A tensor of shape (batch_size, n_features).
    Returns:
        pred: A tensor of shape (batch_size, n_classes)
    """
    ### YOUR CODE HERE
    with tf.variable_scope("transformation"):
        bias = tf.Variable(tf.random_uniform([self.config.n_classes]))
        W = tf.Variable(tf.random_uniform([self.config.n_features, self.config.n_classes]))
        z = tf.matmul(self.input_placeholder, W) + bias
    pred = softmax(z)
    ### END YOUR CODE
    return pred

def add_loss_op(self, pred):
    """Adds cross_entropy_loss ops to the computational graph.

    Hint: Use the cross_entropy_loss function we defined. This should be a very
          short function.
    Args:
        pred: A tensor of shape (batch_size, n_classes)
    Returns:
        loss: A 0-d tensor (scalar)
    """
    ### YOUR CODE HERE
    loss = cross_entropy_loss(self.labels_placeholder, pred)
    ### END YOUR CODE
    return loss
e Training Optimizer
def add_training_op(self, loss):
    """Sets up the training Ops.

    Creates an optimizer and applies the gradients to all trainable variables.
    The Op returned by this function is what must be passed to the
    `sess.run()` call to cause the model to train. See

    https://www.tensorflow.org/versions/r0.7/api_docs/python/train.html#Optimizer

    for more information.

    Hint: Use tf.train.GradientDescentOptimizer to get an optimizer object.
          Calling optimizer.minimize() will return a train_op object.

    Args:
        loss: Loss tensor, from cross_entropy_loss.
    Returns:
        train_op: The Op for training.
    """
    ### YOUR CODE HERE
    train_op = tf.train.GradientDescentOptimizer(self.config.lr).minimize(loss)
    ### END YOUR CODE
    return train_op
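As a rough picture of what one run of the training op amounts to for this model, the sketch below performs plain gradient descent on the summed cross-entropy of softmax(xW + b) by hand. It is plain Python, not TensorFlow's implementation; the function names, the toy data, the zero initialization, and the learning rate of 0.5 are all assumptions made for illustration.

```python
import math

def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def step(X, Y, W, b, lr):
    """One gradient-descent update; returns the loss measured before the update."""
    n_features, n_classes = len(W), len(b)
    grad_W = [[0.0] * n_classes for _ in range(n_features)]
    grad_b = [0.0] * n_classes
    loss = 0.0
    for x, y in zip(X, Y):
        z = [sum(x[i] * W[i][k] for i in range(n_features)) + b[k]
             for k in range(n_classes)]
        p = softmax(z)
        loss -= sum(yk * math.log(pk) for yk, pk in zip(y, p))
        for k in range(n_classes):
            d = p[k] - y[k]              # dLoss/dz_k for softmax + cross-entropy
            grad_b[k] += d
            for i in range(n_features):
                grad_W[i][k] += x[i] * d
    for i in range(n_features):          # W <- W - lr * grad_W
        for k in range(n_classes):
            W[i][k] -= lr * grad_W[i][k]
    for k in range(n_classes):           # b <- b - lr * grad_b
        b[k] -= lr * grad_b[k]
    return loss

X = [[1.0, 0.0], [0.0, 1.0]]             # two toy samples
Y = [[1, 0], [0, 1]]                     # one-hot labels
W = [[0.0, 0.0], [0.0, 0.0]]
b = [0.0, 0.0]
losses = [step(X, Y, W, b, lr=0.5) for _ in range(50)]
```

On this toy problem the recorded loss starts at 2 * log(2) and shrinks with every step, which is the same qualitative behaviour as a per-epoch loss that keeps falling.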
Epoch 47: loss = 0.45 (0.007 sec)
Epoch 48: loss = 0.44 (0.007 sec)
Epoch 49: loss = 0.43 (0.007 sec)
Basic (non-exhaustive) classifier tests pass
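On 2g(i) (the following is from the Solution published on the official Stanford site): Adam uses momentum, which keeps the gradient updates from changing too quickly. First, this keeps the update nonzero when the optimizer is stuck in a local optimum, so it can still escape. Second, it brings each gradient estimate closer to the gradient over the whole dataset.
I'd like to ask: why is the perplexity in the lecture notes, perplexity = 2^J, different from the perplexity defined in the assignment?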
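One possible reading of the discrepancy, sketched numerically: if the assignment's perplexity is taken to be the inverse probability of the correct word (an assumption here, not quoted from the handout), then it equals e^J when J is measured with the natural log and 2^J when J is measured with log base 2. The probability 0.125 below is an arbitrary example value.

```python
import math

p_correct = 0.125                    # hypothetical probability of the correct word

J_nats = -math.log(p_correct)        # cross-entropy with natural log
J_bits = -math.log2(p_correct)       # cross-entropy with log base 2

pp_direct = 1.0 / p_correct          # inverse probability
pp_from_nats = math.exp(J_nats)      # e^J
pp_from_bits = 2.0 ** J_bits         # 2^J
```

All three values come out to 8, so under this reading the two definitions differ only in the base of the logarithm used for J.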
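2.b: each word goes onto the stack and then comes off again, so it should be 2n steps. You can count it: the 5 words above take 10 steps.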
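The 2n count can be checked with a small simulation. The sketch below does not produce a real parse: it shifts every word and then removes each one with a RIGHT-ARC, which is just one valid transition sequence picked for counting. The function name and the placeholder words are assumptions for illustration.

```python
def count_transitions(sentence):
    """Count transitions until the buffer is empty and only ROOT is on the stack."""
    stack, buffer, steps = ["ROOT"], list(sentence), 0
    while buffer or len(stack) > 1:
        if buffer:
            stack.append(buffer.pop(0))  # SHIFT: move the next word onto the stack
        else:
            stack.pop()                  # RIGHT-ARC: remove the top item of the stack
        steps += 1
    return steps

words = ["w1", "w2", "w3", "w4", "w5"]   # placeholder 5-word sentence
steps = count_transitions(words)
```

Any valid sequence gives the same total, because each word is shifted exactly once and removed by an arc exactly once.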
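I end up having to refer to your code every time = =~~ Thanks a lot!
I used to follow a summary written by another user = = and after reading it, the results I got were wrong, and I even thought the correct solution contradicted the theorem.
I don't quite understand the computational complexity of backpropagation in 3.d. Is it the sum of the complexities of dJ/dL, dJ/dI, and dJ/dH? Since the softmax is only applied once at the end, when propagating back through t steps, shouldn't the first two terms be multiplied by t while the last term is computed only once?
1. Yes.
2. There are two interpretations; see the addendum in the post above.
In part d, Minibatch Parsing,
line 27 of the code: minibatch = [parse for parse in partial_parses if len(parse.stack) > 1 or len(parse.buffer) > 0]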