DQN 2013 vs. DQN 2015: How Each Works

DQN 2013

1. Build the network and initialize its weights

2. Initialize the environment

3. loop {

3-1. a = choose action (e-greedy)

3-2. buffer.push(transition returned by env.step(a))

3-3. if buffer.size is large enough {

3-3-1. sample a random minibatch from buffer

3-3-2. train the model on the minibatch & update its weights

}

}
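The loop above can be sketched in a few lines of Python. This is a minimal illustration, not the original implementation: a Q-table stands in for the neural network, and `ChainEnv`, `epsilon_greedy`, `train_2013_style`, and all hyperparameter values are hypothetical names and choices made up for this example.

```python
import random
from collections import deque


class ChainEnv:
    """Hypothetical toy environment: states 0..4, reward 1.0 on reaching state 4."""
    n_states, n_actions = 5, 2

    def reset(self):
        self.s = 0
        return self.s

    def step(self, a):
        # Action 1 moves right, action 0 moves left (clamped at state 0).
        self.s = min(self.s + 1, 4) if a == 1 else max(self.s - 1, 0)
        done = self.s == 4
        return self.s, (1.0 if done else 0.0), done


def epsilon_greedy(q_row, epsilon, rng):
    """With probability epsilon pick a random action, otherwise a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    best = max(q_row)
    return rng.choice([a for a, q in enumerate(q_row) if q == best])


def train_2013_style(episodes=300, gamma=0.9, lr=0.5, epsilon=0.2,
                     batch_size=8, capacity=500, seed=0):
    rng = random.Random(seed)
    env = ChainEnv()
    # Step 1: the "network" (assumption: a Q-table replaces the neural net).
    q = [[0.0] * env.n_actions for _ in range(env.n_states)]
    buffer = deque(maxlen=capacity)
    for _episode in range(episodes):
        s, done, steps = env.reset(), False, 0        # step 2
        while not done and steps < 50:                 # step 3
            a = epsilon_greedy(q[s], epsilon, rng)     # step 3-1
            s2, r, done = env.step(a)                  # step 3-2
            buffer.append((s, a, r, s2, done))
            s, steps = s2, steps + 1
            if len(buffer) >= batch_size:              # step 3-3
                batch = rng.sample(list(buffer), batch_size)   # step 3-3-1
                for bs, ba, br, bs2, bdone in batch:           # step 3-3-2
                    # The target comes from the same table that is being updated.
                    y = br if bdone else br + gamma * max(q[bs2])
                    q[bs][ba] += lr * (y - q[bs][ba])
    return q
```

Calling `train_2013_style()` returns the learned table; on this toy chain the "move right" action should end up with the higher value in every non-terminal state.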

Problems that arise:

correlation between samples

non-stationary targets

Solutions?

Go deep

Capture and replay -> addresses correlation between samples

Separate network -> addresses non-stationary targets
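To make the non-stationary target problem concrete, the two training targets can be written side by side. The notation is an assumption added here and does not appear in the notes above: \theta denotes the main network's weights, \theta^{-} the separate target network's weights, \gamma the discount factor, and D the replay buffer.

```latex
% Single network: the target moves every time \theta is updated
y_{\text{single}} = r + \gamma \max_{a'} Q(s', a'; \theta)

% Separate network: the target stays fixed until \theta^{-} is refreshed
y_{\text{separate}} = r + \gamma \max_{a'} Q(s', a'; \theta^{-})

% In both cases the main network minimizes the squared error on replayed samples
L(\theta) = \mathbb{E}_{(s, a, r, s') \sim U(D)} \left[ \left( y - Q(s, a; \theta) \right)^{2} \right]
```

With a single network, every gradient step on \theta also shifts the target y, so the network chases a moving value. Freezing \theta^{-} between copies keeps y fixed during a block of updates.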

DQN 2015

1. Build two networks and initialize them (targetNet = mainNet)

2. Initialize the environment

3. loop {

3-1. a = choose action (e-greedy)

3-2. buffer.push(transition returned by env.step(a)) // env.step(a) => state, reward, done ...

3-3. if buffer.size is large enough {

3-3-1. loop {

3-3-1-1. sample a random minibatch from buffer

3-3-1-2. Y = compute targets with the target network

3-3-1-3. train the main network toward Y & update its weights

}

3-3-2. target = main (copy the main network's weights into the target network)

}

}
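The two-network loop can be sketched the same way. Again this is only an illustration under assumptions: Q-tables stand in for the main and target networks, and `env_step`, `epsilon_greedy`, `train_2015_style`, `inner_updates`, and the hyperparameter values are hypothetical. The sync schedule follows the outline above (a block of updates, then a copy) and is not meant to reproduce any particular paper's schedule.

```python
import copy
import random
from collections import deque

N_STATES, N_ACTIONS, GOAL = 5, 2, 4


def env_step(s, a):
    """Hypothetical chain environment: action 1 moves right, action 0 moves left."""
    s2 = min(s + 1, GOAL) if a == 1 else max(s - 1, 0)
    done = s2 == GOAL
    return s2, (1.0 if done else 0.0), done


def epsilon_greedy(q_row, epsilon, rng):
    """With probability epsilon pick a random action, otherwise a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    best = max(q_row)
    return rng.choice([a for a, q in enumerate(q_row) if q == best])


def train_2015_style(episodes=300, gamma=0.9, lr=0.5, epsilon=0.2,
                     batch_size=8, inner_updates=4, seed=0):
    rng = random.Random(seed)
    # Step 1: two "networks" (assumption: Q-tables), target starts equal to main.
    main_q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    target_q = copy.deepcopy(main_q)
    buffer = deque(maxlen=500)
    syncs = 0
    for _episode in range(episodes):
        s, done, steps = 0, False, 0                        # step 2
        while not done and steps < 50:                       # step 3
            a = epsilon_greedy(main_q[s], epsilon, rng)      # step 3-1
            s2, r, done = env_step(s, a)                     # step 3-2
            buffer.append((s, a, r, s2, done))
            s, steps = s2, steps + 1
            if len(buffer) >= batch_size:                    # step 3-3
                for _update in range(inner_updates):         # step 3-3-1
                    batch = rng.sample(list(buffer), batch_size)
                    for bs, ba, br, bs2, bdone in batch:
                        # Y is computed from the frozen target table.
                        y = br if bdone else br + gamma * max(target_q[bs2])
                        # Only the main table is updated toward Y.
                        main_q[bs][ba] += lr * (y - main_q[bs][ba])
                target_q = copy.deepcopy(main_q)             # step 3-3-2
                syncs += 1
    return main_q, target_q, syncs
```

The only differences from a single-network loop are the `target_q` lookup when computing `y` and the explicit copy after each block of updates. How often to copy is a tuning choice; copying less often keeps the targets stable for longer at the cost of slower value propagation.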
