• 검색 결과가 없습니다.

Large Scale Data Analysis Using Deep Learning

N/A
N/A
Protected

Academic year: 2024

Share "Large Scale Data Analysis Using Deep Learning"

Copied!
17
0
0

로드 중.... (전체 텍스트 보기)

전체 글

(1)

Large Scale Data Analysis Using Deep Learning

Autoencoder U Kang

Seoul National University

(2)

U Kang 2

In This Lecture

Autoencoder

Motivation

Undercomplete and overcomplet autoencoders

Regularization

(3)

Autoencoder

A neural network that is trained to attempt to copy its input to output

Has a hidden layer h that describes a code used to represent the input

Consists of two parts: an encoder function h = f(x), and a decoder function r = g(h) that reconstructs the original data

The most simplest function would be an identity function for g and h; however, they are not useful to find important

features of x

Autoencoders are restricted to copy only approximately, and to copy only input that resembles the training data

This often leads to learn useful properties of data

Can be thought of as a dimensionality reduction

(4)

U Kang 4

Autoencoder

(5)

Structure of an Autoencoder

(6)

U Kang 6

Stochastic Autoencoders

(7)

Undercomplete Autoencoders

Copying the input to the output seems useless

We are not typically interested in the output of the decoder; we hope that training the autoencoder to perform the copying task will result in h taking on useful properties

Undercomplete autoencoder

h has smaller dimension than x; this allows to learn the most salient features of the data distribution

Learning process: minimizing a loss function L(x, g(f(x))

When the decoder is linear and L is the mean square error, an

undercomplete autoencoder learns to span the same subspace as PCA

Autoencoders with nonlinear encoder and decoder functions learn a more powerful nonlinear generalization of PCA

Undercomplete autoencoders fail to learn anything useful if the encoder and decoder are given too much capacity: it can learn to perform the

copying task without extracting useful information about the distribution of the data

(8)

U Kang 8

Regularized Autoencoders

Undercomplete autoencoders fail to learn anything useful if the encoder and decoder are given too much capacity

A similar problem occurs if the hidden code is allowed to have dimension equal to the input

Overcomplete case: hidden code has dimension greater than the input

In these cases, authoencoder can learn to copy input to output, without learning anything useful

Regularized autoencoder: rather than limiting the model capacity (shallow encoder/decoder, and small code size), use a loss

function that encourages the model to learn useful features

Sparse autoencoders

Denoising autoencoders

Contractive autoencoders

Autoencoders with dropout on the hidden layer

(9)

Sparse Autoencoders

Limit capacity of autoencoder by adding a term to the cost function penalizing the code for being larger

𝐿𝐿 𝑥𝑥,𝑔𝑔 𝑓𝑓 𝑥𝑥 + Ω(ℎ) where Ω ℎ = 𝜆𝜆 ∑𝑖𝑖 |ℎ𝑖𝑖|

By limiting the code h, autoencoders learn unique and important features

(10)

U Kang 10

Denoising Autoencoder

Rather than adding a penalty Ω to the cost function, we can

obtain an autoencoder that learns something useful by changing the reconstruction error term

Typical autoencoders minimize 𝐿𝐿 𝑥𝑥, 𝑔𝑔 𝑓𝑓 𝑥𝑥

Denoising autoencoder (DAE) minimizes 𝐿𝐿 𝑥𝑥, 𝑔𝑔 𝑓𝑓 �𝑥𝑥 where �𝑥𝑥 is a copy of 𝑥𝑥 with some noise or corruption

Denoising autoencoders must therefore undo this corruption rather than simply copying the input

(11)

Denoising Autoencoder

DAE training procedure

Sample a training example x from the training data

Sample a corrupted version �𝑥𝑥 from C(�𝑥𝑥|𝑥𝑥) where C is a conditional distribution of corrupted samples �𝑥𝑥 given a data sample 𝑥𝑥

Use (𝑥𝑥, �𝑥𝑥) as a training example for estimating the autoencoder

reconstruction distribution 𝑝𝑝𝑑𝑑𝑑𝑑𝑑𝑑𝑑𝑑𝑑𝑑𝑑𝑑𝑑𝑑(𝑥𝑥|ℎ) where h is the output of the encoder 𝑓𝑓(�𝑥𝑥)

(12)

U Kang 12

Denoising Autoencoders Learn a Manifold

DAE maps each data point to its nearest point on the manifold

(13)

Vector Field Learned by a

Denoising Autoencoder

(14)

U Kang 14

Contractive Autoencoder

As in sparse autoencoder, use a penalty term Ω, but with a different form

𝐿𝐿 𝑥𝑥,𝑔𝑔 𝑓𝑓 𝑥𝑥 + Ω(ℎ,𝑥𝑥)

where Ω ℎ,𝑥𝑥 = 𝜆𝜆 ∑𝑖𝑖||∇𝑥𝑥𝑖𝑖||2

This forces the model to learn a function that does not change much when 𝑥𝑥 changes slightly

For an “identity” encoder, the penalty would be large

Connection between DAE and contractive autoencoder

For a small Gaussian input noise, the denoising reconstruction error is equivalent to a contractive penalty

(15)

Representational Power, Layer Size and Depth

Autoencoders are often trained with only a single layer encoder and a single layer decoder

However, deep encoders and decoders offer many advantages

Because autoencoders are feedforward networks

Depth can exponentially reduce the computational cost of representing some functions

Depth can also exponentially decrease the amount of training data needed to learn some functions

A common strategy for training a deep autoencoder is to greedily pretrain the deep architecture by training a stack of shallow

autoencoders

Thus, we often encounter shallow autoencoders even in the case of a deep autoencoder

(16)

U Kang 16

What you need to know

Autoencoder

Motivation

Learn low dimensional embedding of data points, by learning to reconstruct output given input

Undercomplete and overcomplete autoencoders

Undercomplete autoencoders avoid learning trivial function, but with low capacity

Overcomplte autoencoders can avoid learning trivial function via regularization

Regularization

Sparse, denoising, contractive autoencoders

(17)

Questions?

참조

관련 문서

Internal environment measurement data collected from IoT sensors, device control data collected from device control actuators, and user judgment data are learned

Liang-Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, Hartwig Adam, “Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation”, ECCV,

지역적인 세부 특징을 효과적으로 추출하기 위해 비교적 적은 수의 계층을 사용하였으며, 최상단 계 층 외에는 모두 컨볼루션 계층으로만 구성된 CNN을 사용

ABSTRACT: In this paper, we construct a prototype model for city data prediction by using time series data of floating population, and use machine learning to analyze urban data

In this paper, we introduced a positioning technology using deep learning based on LTE Channel State Information-Reference Signal (CSI-RS) data, and confirmed the possibility

본 논문에서는 영어로 된 리뷰 상품 데이터를 기반으로 3 개의 딥러닝 알고리즘 모델(Word Count, TFIDF(Term Frequency Inverse Document Frequency),

A sentiment analysis model for small-scale unstructured policy data using transfer learning.. Table 2.1 Transfer learning for sentiment

지역적인 세부 특징을 효과적으로 추출하기 위해 비교적 적은 수의 계층을 사용하였으며, 최상단 계 층 외에는 모두 컨볼루션 계층으로만 구성된 CNN을