Learning In RNN Part II
Learning Target: minimize the loss evaluated by the cost function.
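A compact way to state that target (assuming the cross-entropy cost summed over time steps, as in Part I):

$$
\theta^{*} = \arg\min_{\theta} L(\theta), \qquad
L(\theta) = \sum_{t} C\left(y^{t}, \hat{y}^{t}\right)
$$

where $y^{t}$ is the network output at time step $t$, $\hat{y}^{t}$ is the reference label, and $L$ is minimized with gradient descent (backpropagation through time).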
¶Unfortunately
- RNN-based networks are not always easy to train.
- The error surface is rough: very flat in some regions and very steep in others.
¶Helpful Techniques
¶Long Short-term Memory (LSTM)
¶Why replace RNN with LSTM?
LSTM can deal with gradient vanishing (gradients shrinking toward zero), but not with gradient explosion.
It changes the shape of the error surface: the flat regions are removed, which solves the gradient vanishing problem, while the steep regions remain, so gradient explosion is not solved.
¶How it works:
The key operational difference between an RNN and an LSTM is that an RNN overwrites the value in memory with a new value after every computation, whereas an LSTM adds the new input to the previous value in the cell memory at every step (how much of the old value survives depends on the forget gate).
So in an LSTM, once a weight influences the value in memory, that influence never disappears unless the forget gate is closed.
As long as the forget gate stays open, there is no gradient vanishing.
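A minimal numpy sketch of that difference (the packed weight layout and gate names are illustrative assumptions, not the lecture's exact formulation):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rnn_step(x, h, Wx, Wh, b):
    # Vanilla RNN: the old state is squashed and overwritten at every step.
    return np.tanh(Wx @ x + Wh @ h + b)

def lstm_step(x, h, c, W):
    # LSTM: the gated new input is ADDED to the gated previous cell value.
    # W packs the input-, forget-, output-gate and candidate weights for brevity.
    z = W @ np.concatenate([x, h])
    i, f, o, g = np.split(z, 4)
    i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
    c_new = f * c + i * g           # old memory persists while f ≈ 1 (forget gate open)
    h_new = o * np.tanh(c_new)
    return h_new, c_new
```

Because `c_new` depends additively on the old cell value, the gradient flowing through the memory is scaled only by the forget gate `f`; while `f` stays close to 1 (gate open), the influence does not decay, which is why gradient vanishing is avoided.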
¶Summary
- Can deal with gradient vanishing (not gradient explosion)
- Memory and input are added
- The influence never disappears unless the forget gate is closed
- No gradient vanishing (if the forget gate is opened)
Gated Recurrent Unit (GRU): simpler than LSTM
Other helpful techniques:
¶More Applications
¶Many to one
- Input is a vector sequence, but output is only one vector.
Sentiment Analysis:
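A hedged PyTorch sketch of the many-to-one setup (the vocabulary size, dimensions, and binary positive/negative head are illustrative assumptions):

```python
import torch
import torch.nn as nn

class SentimentRNN(nn.Module):
    """Many-to-one: a whole word sequence goes in, a single sentiment score comes out."""
    def __init__(self, vocab_size=10000, emb_dim=64, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)

    def forward(self, word_ids):                       # word_ids: (batch, seq_len)
        _, (h_last, _) = self.lstm(self.embed(word_ids))
        return torch.sigmoid(self.out(h_last[-1]))     # use only the final hidden state

scores = SentimentRNN()(torch.randint(0, 10000, (2, 12)))   # two 12-word "sentences"
```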
¶Many to Many (Output is shorter)
- Both the input and the output are sequences, but the output is shorter.
Speech Recognition:
¶How to differentiate?
- Connectionist Temporal Classification (CTC)
==Add an extra symbol “Φ” representing “null”.==
Use this method to solve problems such as differentiating “好棒” from “好棒棒”.
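A minimal sketch of the CTC collapsing rule (merge consecutive duplicates, then drop Φ); the token lists below just replay the lecture's example:

```python
def ctc_collapse(tokens, blank="Φ"):
    """Merge repeated consecutive tokens, then remove the blank symbol."""
    out, prev = [], None
    for t in tokens:
        if t != prev:               # repeats separated by Φ are kept distinct
            out.append(t)
        prev = t
    return [t for t in out if t != blank]

print("".join(ctc_collapse(["好", "Φ", "棒", "Φ", "Φ"])))        # -> 好棒
print("".join(ctc_collapse(["好", "Φ", "棒", "Φ", "棒", "Φ"])))  # -> 好棒棒
```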
¶CTC Training
Acoustic Features:
All possible alignments are considered correct because we do not know which alignment is the right one, so we enumerate all of them during training.
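Summing over every valid alignment is exactly what off-the-shelf CTC losses compute with dynamic programming; a hedged PyTorch sketch (the sizes and target indices are illustrative):

```python
import torch
import torch.nn as nn

T, N, C = 50, 1, 20         # frames, batch size, output classes (index 0 = blank Φ)
log_probs = torch.randn(T, N, C, requires_grad=True).log_softmax(dim=2)
targets = torch.tensor([[3, 7]])          # e.g. the two tokens of "好棒"
input_lengths = torch.tensor([T])
target_lengths = torch.tensor([2])

ctc = nn.CTCLoss(blank=0)   # marginalizes over all alignments consistent with the target
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()
```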
¶CTC: example
¶Many to Many (No Limitation)
- Both the input and the output are sequences, but with different lengths. ➡ Sequence-to-sequence learning
Machine Translation (e.g., “Machine Learning” ➡ “机器学习”)
bag-of-word:
The model above cannot stop: it keeps producing output until it is interrupted.
¶How to make the network stop
- Add an extra symbol “===” meaning “stop” to the output targets.
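A hedged sketch of how the stop symbol is used at test time; `decoder_step` and the `<BOS>` token are hypothetical placeholders for whatever trained decoder is plugged in:

```python
EOS = "==="                           # the extra "stop" symbol added to the output vocabulary

def greedy_decode(decoder_step, state, max_len=50):
    """Generate tokens one by one until the network emits the stop symbol."""
    output, token = [], "<BOS>"       # hypothetical begin-of-sequence token
    for _ in range(max_len):
        token, state = decoder_step(token, state)   # next token from previous token + state
        if token == EOS:              # the network decided to stop on its own
            break
        output.append(token)
    return output
```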
¶Beyond Sequence
- Syntactic parsing
¶Transform a Tree Structure into a Sequence
Conversion principle:
Using this principle, we can transform a sentence's parse tree into a sequence and train a sequence-to-sequence model to produce that sequence directly from the sentence.
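A minimal sketch of such a conversion, linearizing a parse tree into a bracketed token sequence (the nested-tuple tree format here is an assumption for illustration):

```python
def tree_to_sequence(node):
    """Linearize a parse tree, e.g.
    ("S", [("NP", ["John"]), ("VP", ["has", ("NP", ["a", "dog"])])])
    -> "(S (NP John ) (VP has (NP a dog ) ) )"
    """
    if isinstance(node, str):
        return node                                   # leaf word
    label, children = node
    inner = " ".join(tree_to_sequence(c) for c in children)
    return f"({label} {inner} )"

tree = ("S", [("NP", ["John"]), ("VP", ["has", ("NP", ["a", "dog"])])])
print(tree_to_sequence(tree))   # a seq2seq model is then trained to map the sentence to this string
```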
¶Sequence-to-sequence
¶Auto-encoder-Text
- To understand the meaning of a word sequence, the order of the words cannot be ignored.
¶Auto-encoder-Speech
- Dimension reduction for a sequence with variable length
Audio segments (word-level) ➡ fixed-length vectors
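A hedged PyTorch sketch of the encoder half (the 39-dimensional acoustic features and GRU size are illustrative; the decoder that reconstructs the input, trained jointly with the encoder, is omitted):

```python
import torch
import torch.nn as nn

# Encoder half of the sequence auto-encoder: a variable-length audio segment
# (a sequence of acoustic feature frames) in, one fixed-length vector out.
encoder = nn.GRU(input_size=39, hidden_size=128, batch_first=True)

segment_a = torch.randn(1, 50, 39)   # 50 frames
segment_b = torch.randn(1, 80, 39)   # 80 frames: a different length is fine

_, vec_a = encoder(segment_a)        # final hidden state, shape (1, 1, 128)
_, vec_b = encoder(segment_b)
similarity = torch.cosine_similarity(vec_a.flatten(), vec_b.flatten(), dim=0)  # for audio search
```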
¶Audio Search Principle:
¶How to transform an audio segment into a vector
¶Visualizing embedding vectors of the words
¶Sequence-to-sequence Learning Demo: Chat-bot
Learning Principle:
Data Set:
40,000 sentences from movie scripts and discussions of the American presidential election.
¶Attention-based Model
Structure Version 1:
Structure Version 2:
==Neural Turing Machine==
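A hedged sketch of the attention read step shared by both structure versions (the dot-product match function and dimensions are assumptions): the query produced by the network is matched against every memory slot, and the softmax-weighted sum is read back in.

```python
import torch

def attention_read(query, memory):
    """query: (dim,)   memory: (num_slots, dim)  ->  weighted sum of memory slots."""
    scores = memory @ query                  # match score for every slot (dot product)
    weights = torch.softmax(scores, dim=0)   # normalized attention weights
    return weights @ memory                  # read head output, same size as one slot

context = attention_read(torch.randn(128), torch.randn(10, 128))
```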
¶Reading Comprehension
¶Visual Question Answering
Principle:
==A vector for each region==
¶Speech Question Answering
- TOEFL Listening Comprehension Test By Machine
Example:
- Audio Story: the original story is 5 min long
- Question: “What is a possible origin of Venus’ clouds?”
- Choices:
- gases released as a result of volcanic activity
- chemical reactions caused by high surface temperatures
- bursts of radio energy from the planet’s surface
- strong winds that blow dust into the atmosphere
¶Model Architecture
Everything is learned from training examples.
¶Deep & Structured
¶Integrated Together
- Speech Recognition: CNN/LSTM/DNN + HMM
Bayes' theorem
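In the hybrid setup, the network outputs $P(\text{state} \mid x)$, and Bayes' theorem converts it into the likelihood the HMM needs (a standard formulation, sketched here rather than taken from the notes):

$$
P(x \mid \text{state}) = \frac{P(\text{state} \mid x)\,P(x)}{P(\text{state})}
\;\propto\; \frac{P(\text{state} \mid x)}{P(\text{state})}
$$

since $P(x)$ is the same for every state during decoding.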
- Semantic Tagging: Bi-directional LSTM + CRF/Structured SVM
Testing: