Learning In RNN Part II
Learning Target: minimize the loss evaluated by the cost function.
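A compact way to state that target (assuming the cross-entropy cost summed over time steps, as in Part I):

$$
\theta^{*} = \arg\min_{\theta} L(\theta), \qquad
L(\theta) = \sum_{t} C\left(y^{t}, \hat{y}^{t}\right)
$$

where $y^{t}$ is the network output at time step $t$, $\hat{y}^{t}$ is the reference label, and $L$ is minimized with gradient descent (backpropagation through time).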
¶Unfortunately
- RNN-based networks are not always easy to train.
- The error surface is rough: very flat in some regions and very steep in others.
¶Helpful Techniques
¶Long Short-term Memory (LSTM)
¶Why replace RNN with LSTM?
LSTM can deal with gradient vanishing (gradients shrinking toward zero), but not with gradient explosion.
It changes the shape of the error surface: the flat regions are removed, which solves the gradient vanishing problem, while the steep regions remain, so gradient explosion is not solved.
¶How it works:
The key operational difference between an RNN and an LSTM is that an RNN overwrites the value in memory with a new value after every computation, whereas an LSTM adds the new input to the previous value in the cell memory at every step (how much of the old value survives depends on the forget gate).
So in an LSTM, once a weight influences the value in memory, that influence never disappears unless the forget gate is closed.
As long as the forget gate stays open, there is no gradient vanishing.
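A minimal numpy sketch of that difference (the packed weight layout and gate names are illustrative assumptions, not the lecture's exact formulation):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rnn_step(x, h, Wx, Wh, b):
    # Vanilla RNN: the old state is squashed and overwritten at every step.
    return np.tanh(Wx @ x + Wh @ h + b)

def lstm_step(x, h, c, W):
    # LSTM: the gated new input is ADDED to the gated previous cell value.
    # W packs the input-, forget-, output-gate and candidate weights for brevity.
    z = W @ np.concatenate([x, h])
    i, f, o, g = np.split(z, 4)
    i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
    c_new = f * c + i * g           # old memory persists while f ≈ 1 (forget gate open)
    h_new = o * np.tanh(c_new)
    return h_new, c_new
```

Because `c_new` depends additively on the old cell value, the gradient flowing through the memory is scaled only by the forget gate `f`; while `f` stays close to 1 (gate open), the influence does not decay, which is why gradient vanishing is avoided.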
¶Summary
- Can deal with gradient vanishing (not gradient explosion)
- Memory and input are added
- The influence never disappears unless the forget gate is closed
- No gradient vanishing (if the forget gate is opened)
Gated Recurrent Unit (GRU): simpler than LSTM
Other helpful techniques:
¶More Applications
¶Many to one
- Input is a vector sequence, but output is only one vector.
Sentiment Analysis:
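A hedged PyTorch sketch of the many-to-one setup (the vocabulary size, dimensions, and binary positive/negative head are illustrative assumptions):

```python
import torch
import torch.nn as nn

class SentimentRNN(nn.Module):
    """Many-to-one: a whole word sequence goes in, a single sentiment score comes out."""
    def __init__(self, vocab_size=10000, emb_dim=64, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)

    def forward(self, word_ids):                       # word_ids: (batch, seq_len)
        _, (h_last, _) = self.lstm(self.embed(word_ids))
        return torch.sigmoid(self.out(h_last[-1]))     # use only the final hidden state

scores = SentimentRNN()(torch.randint(0, 10000, (2, 12)))   # two 12-word "sentences"
```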
¶Many to Many (Output is shorter)
- Both the input and the output are sequences, but the output is shorter.
Speech Recognition:
¶How to differentiate?
- Connectionist Temporal Classification (CTC)
==Add an extra symbol “Φ” representing “null”.==
Use this method to solve problems such as differentiating “好棒” from “好棒棒”.
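A minimal sketch of the CTC collapsing rule (merge consecutive duplicates, then drop Φ); the token lists below just replay the lecture's example:

```python
def ctc_collapse(tokens, blank="Φ"):
    """Merge repeated consecutive tokens, then remove the blank symbol."""
    out, prev = [], None
    for t in tokens:
        if t != prev:               # repeats separated by Φ are kept distinct
            out.append(t)
        prev = t
    return [t for t in out if t != blank]

print("".join(ctc_collapse(["好", "Φ", "棒", "Φ", "Φ"])))        # -> 好棒
print("".join(ctc_collapse(["好", "Φ", "棒", "Φ", "棒", "Φ"])))  # -> 好棒棒
```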
¶CTC Training
Acoustic Features:
All possible alignments are considered correct because we do not know which alignment is the right one, so we enumerate all of them during training.
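Summing over every valid alignment is exactly what off-the-shelf CTC losses compute with dynamic programming; a hedged PyTorch sketch (the sizes and target indices are illustrative):

```python
import torch
import torch.nn as nn

T, N, C = 50, 1, 20         # frames, batch size, output classes (index 0 = blank Φ)
log_probs = torch.randn(T, N, C, requires_grad=True).log_softmax(dim=2)
targets = torch.tensor([[3, 7]])          # e.g. the two tokens of "好棒"
input_lengths = torch.tensor([T])
target_lengths = torch.tensor([2])

ctc = nn.CTCLoss(blank=0)   # marginalizes over all alignments consistent with the target
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()
```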
¶CTC: example
¶Many to Many (No Limitation)
- Both the input and the output are sequences, but with different lengths. ➡ Sequence-to-sequence learning
Machine Translation (e.g., “Machine Learning” ➡ “机器学习”)
bag-of-word:
The model above cannot stop: it keeps producing output until it is interrupted.
¶How to make the network stop
- Add an extra symbol “===” meaning “stop” to the output targets.
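A hedged sketch of how the stop symbol is used at test time; `decoder_step` and the `<BOS>` token are hypothetical placeholders for whatever trained decoder is plugged in:

```python
EOS = "==="                           # the extra "stop" symbol added to the output vocabulary

def greedy_decode(decoder_step, state, max_len=50):
    """Generate tokens one by one until the network emits the stop symbol."""
    output, token = [], "<BOS>"       # hypothetical begin-of-sequence token
    for _ in range(max_len):
        token, state = decoder_step(token, state)   # next token from previous token + state
        if token == EOS:              # the network decided to stop on its own
            break
        output.append(token)
    return output
```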
¶Beyond Sequence
- Syntactic parsing
¶Transform a Tree Structure into a Sequence
Conversion principle:
Using this principle, we can transform a sentence's parse tree into a sequence and train a sequence-to-sequence model to produce that sequence directly from the sentence.
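A minimal sketch of such a conversion, linearizing a parse tree into a bracketed token sequence (the nested-tuple tree format here is an assumption for illustration):

```python
def tree_to_sequence(node):
    """Linearize a parse tree, e.g.
    ("S", [("NP", ["John"]), ("VP", ["has", ("NP", ["a", "dog"])])])
    -> "(S (NP John ) (VP has (NP a dog ) ) )"
    """
    if isinstance(node, str):
        return node                                   # leaf word
    label, children = node
    inner = " ".join(tree_to_sequence(c) for c in children)
    return f"({label} {inner} )"

tree = ("S", [("NP", ["John"]), ("VP", ["has", ("NP", ["a", "dog"])])])
print(tree_to_sequence(tree))   # a seq2seq model is then trained to map the sentence to this string
```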
¶Sequence-to-sequence
¶Auto-encoder-Text
- To understand the meaning of a word sequence, the order of the words cannot be ignored.
¶Auto-encoder-Speech
- Dimension reduction for a sequence with variable length
Audio segments (word-level) ➡ fixed-length vectors
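A hedged PyTorch sketch of the encoder half (the 39-dimensional acoustic features and GRU size are illustrative; the decoder that reconstructs the input, trained jointly with the encoder, is omitted):

```python
import torch
import torch.nn as nn

# Encoder half of the sequence auto-encoder: a variable-length audio segment
# (a sequence of acoustic feature frames) in, one fixed-length vector out.
encoder = nn.GRU(input_size=39, hidden_size=128, batch_first=True)

segment_a = torch.randn(1, 50, 39)   # 50 frames
segment_b = torch.randn(1, 80, 39)   # 80 frames: a different length is fine

_, vec_a = encoder(segment_a)        # final hidden state, shape (1, 1, 128)
_, vec_b = encoder(segment_b)
similarity = torch.cosine_similarity(vec_a.flatten(), vec_b.flatten(), dim=0)  # for audio search
```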
¶Audio Search Principle:
¶How to transform an audio segment into a vector
¶Visualizing embedding vectors of the words
¶Sequence-to-sequence Learning Demo: Chat-bot
Learning Principle:
Data Set:
40,000 sentences from movie scripts and discussions of the American presidential election.
¶Attention-based Model
Structure Version 1:
Structure Version 2:
==Neural Turing Machine==
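A hedged sketch of the attention read step shared by both structure versions (the dot-product match function and dimensions are assumptions): the query produced by the network is matched against every memory slot, and the softmax-weighted sum is read back in.

```python
import torch

def attention_read(query, memory):
    """query: (dim,)   memory: (num_slots, dim)  ->  weighted sum of memory slots."""
    scores = memory @ query                  # match score for every slot (dot product)
    weights = torch.softmax(scores, dim=0)   # normalized attention weights
    return weights @ memory                  # read head output, same size as one slot

context = attention_read(torch.randn(128), torch.randn(10, 128))
```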
¶Reading Comprehension
¶Visual Question Answering
Principle:
==A vector for each region==
¶Speech Question Answering
- TOEFL Listening Comprehension Test By Machine
Example:
- Audio Story: the original story is 5 min long
- Question: “What is a possible origin of Venus’ clouds?”
- Choices:
- gases released as a result of volcanic activity
- chemical reactions caused by high surface temperatures
- bursts of radio energy from the planet’s surface
- strong winds that blow dust into the atmosphere
¶Model Architecture
Everything is learned from training examples.
¶Deep & Structured
¶Integrated Together
- Speech Recognition: CNN/LSTM/DNN + HMM
Bayes' theorem
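In the hybrid setup, the network outputs $P(\text{state} \mid x)$, and Bayes' theorem converts it into the likelihood the HMM needs (a standard formulation, sketched here rather than taken from the notes):

$$
P(x \mid \text{state}) = \frac{P(\text{state} \mid x)\,P(x)}{P(\text{state})}
\;\propto\; \frac{P(\text{state} \mid x)}{P(\text{state})}
$$

since $P(x)$ is the same for every state during decoding.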
- Semantic Tagging: Bi-directional LSTM + CRF/Structured SVM
Testing: