Learning In RNN
Recurrent Neural Network (RNN)
The Chinese name is 循环神经网络.
¶The introduction of RNN by Dr. Mofan Zhou
If you want a more vivid explanation, please check here: What is a Recurrent Neural Network?
PS: If you don’t have SSR, you can find the video on BiliBili.
For the input x(t), the RNN computes the output y(t), and we call the internal value that the RNN produces s(t).
Next, the RNN processes x(t+1) and computes s(t+1); the output y(t+1) is then obtained from s(t) together with s(t+1).
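As a rough illustration of this recurrence, here is a minimal NumPy sketch; the weight names U, W, V, the tanh activation, and all the sizes are my own assumptions, not from the tutorial.

```python
import numpy as np

def rnn_step(x_t, s_prev, U, W, V):
    """One recurrent step (assumed simple-RNN form): s(t) mixes the current
    input with the previous state, and y(t) is read out from s(t)."""
    s_t = np.tanh(U @ x_t + W @ s_prev)   # new state / "memory"
    y_t = V @ s_t                         # output at time t
    return s_t, y_t

# Toy usage: 3-dim inputs, 4-dim state, 2-dim outputs (arbitrary sizes).
rng = np.random.default_rng(0)
U, W, V = rng.normal(size=(4, 3)), rng.normal(size=(4, 4)), rng.normal(size=(2, 4))
s = np.zeros(4)                           # initial memory, default 0
for x in [rng.normal(size=3) for _ in range(3)]:
    s, y = rnn_step(x, s, U, W, V)
```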
In general, an RNN looks like this:
RNNs come in many structures and can handle many tasks, such as classification, regression, and so on.
¶The introduction by Dr. Hongyi Li
The original video: check here
¶Example Application
¶Slot Filling (填槽)
Slot filling shows up in many everyday applications, such as intelligent customer service, ticket-booking systems, and so on.
In this picture, we want the ticket-booking system to automatically fill the words of the text into the suitable slots.
¶Solving slot filling with a feedforward network?
Net structure:
input: a word (each word is represented as a vector)
output: probability distribution (概率分布) of the input word belonging to each slot.
If we can give our neural network ==memory==, the network will be able to tell whether a word is a destination or a departure.
==Ways to represent a word as a vector==
1-of-N encoding
If we have N words, we can create a vector of length N; each word gets the value 1 at a different index (and 0 everywhere else).
Beyond 1-of-N encoding
If we have a word that is not in our dictionary, we classify it as “other”, so we must add a dimension for “other”.
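As a tiny illustration of 1-of-N encoding with the extra “other” dimension, here is a hedged sketch; the vocabulary is made up for illustration.

```python
# Made-up vocabulary; a real system would use a much larger dictionary.
vocab = ["apple", "bag", "cat", "dog", "elephant"]

def one_of_n(word):
    vec = [0] * (len(vocab) + 1)                     # last slot is "other"
    idx = vocab.index(word) if word in vocab else len(vocab)
    vec[idx] = 1
    return vec

one_of_n("bag")      # [0, 1, 0, 0, 0, 0]
one_of_n("Gandalf")  # [0, 0, 0, 0, 0, 1] -> falls into "other"
```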
Word hashing
We can also use word hashing to represent the word: we build a vector based on the letters it contains.
¶Recurrent Neural Network
The outputs of the hidden layer are stored in the memory.
¶Example
Assumptions:
All the weights are 1, and there is no bias.
All the activation functions are linear.
Input sequence: $ \left[ \begin{matrix}1\\1 \end{matrix} \right] \left[ \begin{matrix}1\\ 1 \end{matrix} \right] \left[ \begin{matrix}2 \\ 2 \end{matrix} \right] \dots \dots$
First, we must give the memory (a) its initial values; by default they are 0.
Step1:
Input 1 and 1; combining them with the memory 0 and 0 gives 2 and 2, and combining 2 and 2 gives 4 and 4. The first output is 4 and 4, and the memory becomes 2 and 2.
Step2:
Input 1 and 1; combining them with the memory 2 and 2 gives 6 and 6, and combining 6 and 6 gives 12 and 12. The second output is 12 and 12, and the memory becomes 6 and 6.
Step3:
Input 2 and 2; combining them with the memory 6 and 6 gives 16 and 16, and combining 16 and 16 gives 32 and 32. The third output is 32 and 32, and the memory becomes 16 and 16.
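To double-check the arithmetic above, here is a small NumPy sketch of this all-ones, linear, no-bias network with two hidden neurons, two memory cells, and two outputs, as in the example.

```python
import numpy as np

W_in  = np.ones((2, 2))   # input  -> hidden, all weights 1
W_mem = np.ones((2, 2))   # memory -> hidden, all weights 1
W_out = np.ones((2, 2))   # hidden -> output, all weights 1

memory = np.zeros(2)      # initial memory, default 0
for x in [np.array([1, 1]), np.array([1, 1]), np.array([2, 2])]:
    hidden = W_in @ x + W_mem @ memory   # linear activation
    y = W_out @ hidden
    memory = hidden                      # store hidden outputs in the memory
    print(hidden, y)
# prints: [2. 2.] [4. 4.]
#         [6. 6.] [12. 12.]
#         [16. 16.] [32. 32.]
```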
¶RNN
¶Applying RNN to the ticket-booking system
Step1:
In the figure, “arrive” is input as x1; the RNN computes a1 and stores it.
Step2:
In this figure, “Taipei” is input as x2; the RNN uses x2 and a1 to compute a2 and stores it.
Step3:
In this figure, “on” is input as x3; the RNN uses x3 and a2 to compute a3 and stores it.
And so on.
==Attention:== we are not limited to a single hidden layer; there can be many.
¶Elman Network & Jordan Network
The Elman Network is the one we discussed above, and the Jordan Network is the new one we want to introduce.
Actually, the difference between them is that the Jordan Network feeds the previous output, rather than the hidden-layer value, into the next step’s computation.
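In formula form (my own notation, sketching the usual definitions): the Elman Network feeds the hidden value back, $ h^{t}=\sigma(W_{x}x^{t}+W_{h}h^{t-1}) $, while the Jordan Network feeds the previous output back, $ h^{t}=\sigma(W_{x}x^{t}+W_{y}y^{t-1}) $; in both cases $ y^{t}=\sigma(W_{o}h^{t}) $.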
¶Bidirectional (双向的) RNN
As shown in the figure above, we can train a forward RNN and a reverse RNN at the same time, and then feed both of their results into a new hidden layer to compute $ y^{t}\:\:\:y^{t+1}\:\:\:y^{t+2} \dots $
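A minimal Keras sketch of this idea; the feature size 10, the 16 units, and the 5 output classes are arbitrary choices for illustration, not from the lecture.

```python
from tensorflow import keras
from tensorflow.keras import layers

# One forward and one backward SimpleRNN read the sequence; a Dense layer
# then combines both directions at every time step.
model = keras.Sequential([
    keras.Input(shape=(None, 10)),                             # (time, features)
    layers.Bidirectional(layers.SimpleRNN(16, return_sequences=True)),
    layers.TimeDistributed(layers.Dense(5, activation="softmax")),
])
model.summary()
```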
¶Long Short-term Memory (LSTM)
The LSTM has three gates: the input gate, the forget gate, and the output gate.
Each gate is controlled by signals from the rest of the network, and the network learns by itself how and when to operate each gate.
If another part of the network wants to write a value into the memory cell, that value must pass through the input gate. The input gate is itself controlled by the rest of the network, and this control is learned: the network decides on its own when to open the input gate.
The output gate and the forget gate have a similar structure.
Special neuron (特殊神经元):
- 4 inputs
- 1 output

The 4 inputs:
- the value from the other part of the network, entering through the input gate
- the signal controlling the input gate
- the signal controlling the output gate
- the signal controlling the forget gate

The 1 output:
- the value sent to the other part of the network, through the output gate
The LSTM can keep the values in its memory cells for a long time.
¶Another expression
==The activation function f is usually a sigmoid function.==
The sigmoid outputs values between 0 and 1: 0 means the gate is closed and 1 means the gate is open, mimicking the opening and closing of a gate.
$ g(z)f(z_{i}) $ : the transformed input $ g(z) $ multiplied by the input-gate value $ f(z_{i}) $
$ cf(z_{f}) $ : the old memory $ c $ multiplied by the forget-gate value $ f(z_{f}) $
$ c^{'}=g(z)f(z_{i})+cf(z_{f}) $
$ a = h(c^{'})f(z_{o}) $
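Putting these formulas together, here is a minimal NumPy sketch of one cell update; taking g and h to be tanh is an assumption, since only f is fixed as the sigmoid above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(z, z_i, z_f, z_o, c):
    """One memory-cell update following the formulas above."""
    c_new = np.tanh(z) * sigmoid(z_i) + c * sigmoid(z_f)   # c' = g(z)f(z_i) + c f(z_f)
    a = np.tanh(c_new) * sigmoid(z_o)                      # a  = h(c') f(z_o)
    return a, c_new
```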
¶LSTM - Example
¶Init:
- one LSTM memory cell
- 3-dimensional input
- 1-dimensional output
¶Condition:
- When $ x_{2}=1 $, add the number in $ x_{1} $ into the memory.
- When $ x_{2}=-1 $, reset the memory.
- When $ x_{3}=1 $, output the number in the memory.
¶Process:
step1:
cell memory = 0, $ x_{1}=1\:\:x_{2}=0\:\:x_{3}=0 $, no action.
step2:
cell memory = 0, $ x_{1}=3\:\:x_{2}=1\:\:x_{3}=0 $; because $ x_{2}=1 $, add the number in $ x_{1} $ into the cell memory (0+3=3).
step3:
cell memory = 3, $ x_{1}=2\:\:x_{2}=0\:\:x_{3}=0 $, no action.
step4:
cell memory = 3, $ x_{1}=4\:\:x_{2}=1\:\:x_{3}=0 $; add the number $ x_{1}=4 $ into the memory (3+4=7).
step5:
cell memory = 7, $ x_{1}=2\:\:x_{2}=0\:\:x_{3}=0 $, no action.
step6:
cell memory = 7, $ x_{1}=1\:\:x_{2}=0\:\:x_{3}=1 $; because $ x_{3}=1 $, output the number in the memory, y=7.
step7:
cell memory = 7, $ x_{1}=3\:\:x_{2}=-1\:\:x_{3}=0 $; because $ x_{2}=-1 $, reset the memory, making the cell memory equal 0.
step8:
cell memory = 0, $ x_{1}=6\:\:x_{2}=1\:\:x_{3}=0 $; because $ x_{2}=1 $, add the number in $ x_{1} $ into the memory (0+6=6).
step9:
cell memory = 6, $ x_{1}=1\:\:x_{2}=0\:\:x_{3}=1 $; because $ x_{3}=1 $, output the number in the memory, y=6.
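The behaviour above can be mimicked with a few lines of plain Python; this is a hand-coded simulation of the rules, not a trained LSTM.

```python
def run(sequence):
    memory, outputs = 0, []
    for x1, x2, x3 in sequence:
        if x2 == 1:                      # "input gate": add x1 into the memory
            memory += x1
        elif x2 == -1:                   # "forget gate": reset the memory
            memory = 0
        outputs.append(memory if x3 == 1 else 0)   # "output gate"
    return outputs

steps = [(1, 0, 0), (3, 1, 0), (2, 0, 0), (4, 1, 0), (2, 0, 0),
         (1, 0, 1), (3, -1, 0), (6, 1, 0), (1, 0, 1)]
run(steps)   # outputs 7 at step 6 and 6 at step 9, 0 elsewhere
```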
¶Another Example
¶Init
- four inputs
- one output
- cell memory = 0
- each number on a line represents a weight
¶Process
step1:
For $ x_{1}=3,x_{2}=1,x_{3}=0 $ (the value after “≈” is the result of applying the sigmoid to the gate input):
Input = 3×1+1×0+0×0+1×0 = 3
Input Gate = 3×0+1×100+0×0+1×(-10) = 90 ≈ 1
Forget Gate = 3×0+1×100+0×0+1×10 = 110 ≈ 1
Output Gate = 3×0+1×0+0×100+1×(-10) = -10 ≈ 0
Because the output gate ≈ 0, the output gate is closed, so y = 0. The input gate ≈ 1 lets the input 3 in, and the forget gate ≈ 1 keeps the old memory, so cell memory = 0+3 = 3.
step2:
For $ x_{1}=4,x_{2}=1,x_{3}=0 $:
Input = 4×1+1×0+0×0+1×0 = 4
Input Gate = 4×0+1×100+0×0+1×(-10) = 90 ≈ 1
Forget Gate = 4×0+1×100+0×0+1×10 = 110 ≈ 1
Output Gate = 4×0+1×0+0×100+1×(-10) = -10 ≈ 0
Because the output gate ≈ 0, the output gate is closed, so y = 0. The input gate ≈ 1 lets the input 4 in, and the forget gate ≈ 1 keeps the old memory, so cell memory = 3+4 = 7.
step3:
For $ x_{1}=2,x_{2}=0,x_{3}=0 $:
Input = 2×1+0×0+0×0+1×0 = 2
Input Gate = 2×0+0×100+0×0+1×(-10) = -10 ≈ 0
Forget Gate = 2×0+0×100+0×0+1×10 = 10 ≈ 1
Output Gate = 2×0+0×0+0×100+1×(-10) = -10 ≈ 0
Because the input gate ≈ 0, the input gate is closed and nothing is added; the output gate ≈ 0, so y = 0; the forget gate ≈ 1 keeps the memory, so cell memory stays 7.
step4:
For $ x_{1}=1,x_{2}=0,x_{3}=1 $:
Input = 1×1+0×0+1×0+1×0 = 1
Input Gate = 1×0+0×100+1×0+1×(-10) = -10 ≈ 0
Forget Gate = 1×0+0×100+1×0+1×10 = 10 ≈ 1
Output Gate = 1×0+0×0+1×100+1×(-10) = 90 ≈ 1
Because the input gate ≈ 0, nothing is added, and the forget gate ≈ 1 keeps the memory, so cell memory stays 7; the output gate ≈ 1 opens the output gate, so y = 7.
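The four steps can also be checked numerically. The sketch below reads the weights off the example (the bias is modeled as a constant input of 1) and takes g and h as identity functions, which is an assumption that matches the numbers above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Weights taken from the example: (w_x1, w_x2, w_x3) plus a bias term.
w_in, b_in = np.array([1, 0, 0]), 0       # the value to store
w_i,  b_i  = np.array([0, 100, 0]), -10   # input gate
w_f,  b_f  = np.array([0, 100, 0]), 10    # forget gate
w_o,  b_o  = np.array([0, 0, 100]), -10   # output gate

c = 0.0
for x in [np.array([3, 1, 0]), np.array([4, 1, 0]),
          np.array([2, 0, 0]), np.array([1, 0, 1])]:
    g = w_in @ x + b_in
    c = g * sigmoid(w_i @ x + b_i) + c * sigmoid(w_f @ x + b_f)
    y = c * sigmoid(w_o @ x + b_o)
    print(round(c, 2), round(y, 2))
# approximately: c = 3, 7, 7, 7 and y = 0, 0, 0, 7
```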
¶Original Network
¶Simply replace the neurons with LSTM
Replace the neurons:
each group of inputs x is fed into the corresponding gate,
so the LSTM needs 4 times the parameters of an ordinary network.
¶Comprehend cell memory deeply
Multiply $ x^{t} $ by a weight matrix to get $ z^{f}\:\:z^{i}\:\:z\:\:z^{o} $.
Each of them ($ z^{f}\:\:z^{i}\:\:z\:\:z^{o} $) is then used as the input vector to the corresponding part of the LSTM cells.
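A small sketch of that transformation (the dimensions are arbitrary); in practice the four transforms can be stacked into one big matrix, which is also why the parameter count is 4 times that of a plain layer.

```python
import numpy as np

d_in, d_cell = 6, 4                       # arbitrary sizes for illustration
W = np.random.randn(4 * d_cell, d_in)     # four stacked transformations

x_t = np.random.randn(d_in)
z_f, z_i, z, z_o = np.split(W @ x_t, 4)   # one vector per gate / input
```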
¶Simplify Structure
¶Multiple-layer LSTM
This is quite standard now, and don’t worry if you cannot fully understand it: Keras can handle it. (Keras supports “LSTM”, “GRU”, and “SimpleRNN” layers.)
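For example, a stacked (multiple-layer) LSTM can be written in Keras roughly like this; the layer sizes, the 10-dimensional input, and the 5 output classes are illustrative assumptions.

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(None, 10)),                  # (time steps, features)
    layers.LSTM(32, return_sequences=True),         # first LSTM layer
    layers.LSTM(32, return_sequences=True),         # second LSTM layer stacked on top
    layers.TimeDistributed(layers.Dense(5, activation="softmax")),
])
model.compile(optimizer="adam", loss="categorical_crossentropy")
```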