
Sequence Modeling: Recurrent & Recursive Nets, an introduction

24_bean 2023. 1. 15. 19:45

Keywords

  • Parameter sharing
  • Sequence
  • Back-propagation through time (BPTT)

 

* This post is based on "Deep Learning" by Ian Goodfellow, with my own opinions added.


 

Intro

A recurrent neural network is a neural network that is specialized for processing a sequence of values x(1), x(2), ..., x(i).

 

Parameter sharing makes it possible to extend and apply the model to examples of different forms (for example, different lengths) and to generalize across them. If we had separate parameters for each value of the time index, we could not generalize to sequence lengths not seen during training, nor share statistical strength across different sequence lengths and across different positions in time.

 

Such sharing is particularly important when a specific piece of information can occur at multiple positions within the sequence.

 

A traditional fully connected feedforward network would have separate parameters for each input feature, so it would need to learn all of the rules of the language separately at each position in the sentence. By comparison, a recurrent neural network shares the same weights across several time steps.
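As a rough illustration (my own sketch, not from the book), here is a minimal NumPy comparison between the two ideas: a feedforward-style model that allocates a separate weight matrix for every position, and a recurrent update that reuses one set of parameters at every time step. All names and sizes here are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d_in, d_h = 5, 3, 4            # sequence length, input size, hidden size
x = rng.normal(size=(T, d_in))    # one toy input sequence

# Feedforward-style: a separate weight matrix for every position in the sequence.
W_per_pos = [rng.normal(size=(d_h, d_in)) for _ in range(T)]   # T distinct matrices
h_ff = [np.tanh(W_per_pos[t] @ x[t]) for t in range(T)]        # position-specific rules

# Recurrent-style: one shared set of parameters applied at every time step.
W = rng.normal(size=(d_h, d_h))   # state-to-state weights
U = rng.normal(size=(d_h, d_in))  # input-to-state weights
b = np.zeros(d_h)
h = np.zeros(d_h)
for t in range(T):                # the same W, U, b are reused for every t
    h = np.tanh(W @ h + U @ x[t] + b)
```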

 

The convolution operation also allows a network to share parameters across time, but that sharing is shallow: the parameter sharing manifests in the application of the same convolution kernel at each time step, so each output depends only on a small neighborhood of the input.

This recurrent formulation results in the sharing of parameters through a very deep computational graph.
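Here is a small sketch of that difference (my own example, not the book's): a 1-D convolution reuses one kernel across positions but each output only sees a short window, while a recurrent update threads the state through every step, so the final state depends on the entire history.

```python
import numpy as np

x = np.arange(10, dtype=float)            # a toy 1-D sequence
kernel = np.array([0.25, 0.5, 0.25])      # one kernel shared across all positions

# Shallow sharing: each y[t] is a function of only three neighboring inputs.
y = np.convolve(x, kernel, mode="valid")

# Deep sharing: s depends on every earlier input through the repeated update.
w, u = 0.9, 0.5                           # toy scalar parameters, reused each step
s = 0.0
for x_t in x:
    s = np.tanh(w * s + u * x_t)
```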

In practice, recurrent networks usually operate on minibatches of such sequences, with a different sequence length for each member of the minibatch.
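One common way to handle this (a sketch under my own assumptions, not something prescribed by the book) is to pad every sequence in the minibatch to the length of the longest one and keep each example's true length, so that anything computed past the end of a sequence can be masked out.

```python
import numpy as np

# Three sequences of different lengths, each with 2 features per step.
seqs = [np.ones((3, 2)), np.ones((5, 2)), np.ones((2, 2))]
lengths = np.array([len(s) for s in seqs])            # true lengths: [3, 5, 2]
T_max, d_in = lengths.max(), 2

# Pad with zeros up to T_max -> one array of shape (batch, T_max, d_in).
batch = np.zeros((len(seqs), T_max, d_in))
for i, s in enumerate(seqs):
    batch[i, : len(s)] = s

# The mask marks which (example, time step) entries hold real data.
mask = np.arange(T_max)[None, :] < lengths[:, None]   # boolean, shape (batch, T_max)
```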

 


 

Unfolding Computational Graphs

 

Unfolding turns a recursive or recurrent computation into a computational graph that has a repetitive structure, typically corresponding to a chain of events.

So basically, "unfolding" literally means unrolling the computational graph. To see that the state corresponds to the hidden units of the network, let's look at the equations below.

 

Below is the basic formal equation for a dynamical system, where s(t) is the state of the system at time t and θ is a fixed set of parameters:

s(t) = f(s(t-1); θ)   (the classical form of a dynamical system)
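To make the recurrence concrete, here is a tiny sketch (mine, with an arbitrary choice of f and θ) that iterates the dynamical system for a few steps; each state is computed only from the previous state and the fixed parameter.

```python
import numpy as np

theta = 0.5                 # a fixed parameter (arbitrary value for this example)

def f(s, theta):
    """One step of the dynamical system: s(t) = f(s(t-1); theta)."""
    return np.tanh(theta * s + 1.0)

s = 0.0                     # initial state s(0)
for t in range(1, 4):       # unroll three steps: s(1), s(2), s(3)
    s = f(s, theta)
    print(f"s({t}) = {s:.4f}")
```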

 

This would be the typical RNN formulation, in which the state is driven by an external input x(t) as well as by the previous state:

h(t) = f(h(t-1), x(t); θ)   (the typical RNN state update)
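A minimal sketch of that update (the tanh nonlinearity and the matrix shapes are my own choices for illustration): the parameters θ = (W, U, b) stay fixed, and each new state depends on the previous state and the current input.

```python
import numpy as np

rng = np.random.default_rng(1)
d_h, d_in = 4, 3
W = rng.normal(size=(d_h, d_h))   # recurrent (state-to-state) weights
U = rng.normal(size=(d_h, d_in))  # input-to-state weights
b = np.zeros(d_h)

def rnn_step(h_prev, x_t):
    """h(t) = f(h(t-1), x(t); theta) with theta = (W, U, b)."""
    return np.tanh(W @ h_prev + U @ x_t + b)

h = np.zeros(d_h)                 # h(0)
x_t = rng.normal(size=d_in)       # one input vector x(1)
h = rnn_step(h, x_t)              # h(1)
```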

 

In addition to this basic recurrence, typical RNNs add extra architectural features, such as output layers that read information out of the state h to make predictions.

When the recurrent network is trained to perform a task that requires predicting the future from the past, the network typically learns to use h(t) as a kind of lossy summary of the task-relevant aspects of the past sequence of inputs up to t.

* h(t) is a fixed-length vector, which is why this summary of an arbitrarily long past is necessarily lossy.

 

Consider a recurrent network with no outputs: it just processes information from the input x by incorporating it into the state h that is passed forward through time.

 

What we call unfolding is the operation that maps such a circuit (a compact graph containing a cycle) to a computational graph with repeated pieces, one per time step. The unfolded graph now has a size that depends on the sequence length.
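A small sketch of unfolding (my own code, not from the book): the compact description is just one transition function applied in a loop, and unrolling it over a concrete sequence produces one copy of that computation per time step, so the number of repeated pieces equals the sequence length.

```python
import numpy as np

rng = np.random.default_rng(2)
d_h, d_in = 4, 3
W, U, b = rng.normal(size=(d_h, d_h)), rng.normal(size=(d_h, d_in)), np.zeros(d_h)

def unfold(x_seq):
    """Unroll the recurrence over x_seq, keeping one hidden state per time step."""
    h = np.zeros(d_h)
    states = []
    for x_t in x_seq:                     # one repeated "piece" per element of x_seq
        h = np.tanh(W @ h + U @ x_t + b)
        states.append(h)
    return states

x_seq = rng.normal(size=(6, d_in))        # a length-6 sequence
states = unfold(x_seq)
print(len(states))                        # 6 -- the unfolded graph grows with the sequence
```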

 

So now we can represent the unfolded recurrence after t steps with a function g(t):

h(t) = g(t)(x(t), x(t-1), ..., x(2), x(1)) = f(h(t-1), x(t); θ)

 

The function g(t) takes the whole past sequence (x(t), x(t-1), ..., x(2), x(1)) as input and produces the current state, but the unfolded recurrent structure allows us to factorize g(t) into repeated application of a function f. The unfolding process thus introduces two major advantages:

 

  1. Regardless of the sequence length, the learned model always has the same input size, because it is specified in terms of a transition from one state to another, rather than in terms of a variable-length history of states.
  2. It is possible to use the same transition function f with the same parameters at every time step.

 

These two factors make it possible to learn a single model f that operates on all time steps and all sequence lengths, rather than needing to learn a separate model g(t) for every possible time step. The shared model also allows generalization to sequence lengths that did not appear in the training set, and allows the model to be estimated with far fewer training examples than would be required without parameter sharing.
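To illustrate the point (again a sketch with made-up shapes and values, not code from the book): the very same transition function and parameters can be applied to sequences of any length, including lengths never seen during training.

```python
import numpy as np

rng = np.random.default_rng(3)
d_h, d_in = 4, 3
W, U, b = rng.normal(size=(d_h, d_h)), rng.normal(size=(d_h, d_in)), np.zeros(d_h)

def final_state(x_seq):
    """Run the shared transition f over an input sequence of any length."""
    h = np.zeros(d_h)
    for x_t in x_seq:
        h = np.tanh(W @ h + U @ x_t + b)
    return h

# The same parameters handle a length-3 and a length-7 sequence alike.
h_short = final_state(rng.normal(size=(3, d_in)))
h_long = final_state(rng.normal(size=(7, d_in)))
print(h_short.shape, h_long.shape)        # both (4,) -- same model, different lengths
```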

 


 

* More content will be coming soon...