Curriculum learning

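Curriculum learning is the field concerned with designing a training process (for example, the order or weighting of training samples) under which a model learns better.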

Paper Link: Curriculum Learning, Yoshua Bengio et al. (2009)

Curriculum	Intuition	Model Name
No curriculum	Assign uniform weight to every sample.	`baseline_mentornet`
Self-paced (Kumar et al., 2010)	Favor samples of smaller loss.	`self_paced_mentornet`
SPCL linear (Jiang et al. 2015)	Discount the weight by loss linearly.	`spcl_linear_mentornet`
Hard example mining (Felzenszwalb et al., 2008)	Favor samples of greater loss.	`hard_example_mining_mentornet`
Focal loss (Lin et al., 2017)	Increase the weight by loss by the exponential CDF.	`focal_loss_mentornet`
Predefined Mixture	Mixture of SPL and SPCL changing by epoch.	`mentornet_pd`
MentorNet Data-driven	Learned on a small subset of the CIFAR data.	`mentornet_dd`
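To make the "Intuition" column concrete, here is a minimal sketch of how the simpler curricula could turn per-sample losses into sample weights. It is an illustration only, not the MentorNet implementation: the function names and the `threshold` parameter are assumptions made for this example, and the binary cut-off used for hard example mining is just one simple way to "favor samples of greater loss".

```python
import numpy as np

def uniform_weights(loss):
    """No curriculum: every sample gets the same weight."""
    return np.ones_like(loss, dtype=float)

def self_paced_weights(loss, threshold):
    """Self-paced: keep only samples whose loss is below the threshold."""
    return (loss < threshold).astype(float)

def spcl_linear_weights(loss, threshold):
    """SPCL linear: weight falls linearly from 1 (loss 0) to 0 (loss >= threshold)."""
    return np.clip(1.0 - loss / threshold, 0.0, 1.0)

def hard_example_weights(loss, threshold):
    """Hard example mining (assumed binary form): keep samples with loss >= threshold."""
    return (loss >= threshold).astype(float)

# Hypothetical per-sample losses for one mini-batch.
losses = np.array([0.1, 0.5, 1.0, 2.0])

for name, weights in [
    ("uniform", uniform_weights(losses)),
    ("self-paced", self_paced_weights(losses, threshold=1.0)),
    ("SPCL linear", spcl_linear_weights(losses, threshold=1.0)),
    ("hard mining", hard_example_weights(losses, threshold=1.0)),
]:
    # The weighted mean is what a curriculum-weighted training loss would look like.
    print(f"{name:12s} weights={weights} weighted loss={np.mean(weights * losses):.3f}")
```

In each case the weights multiply the per-sample losses before averaging, so the curriculum decides which samples dominate the gradient.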

Paper Link: MentorNet: Learning Data-Driven Curriculum for Very Deep Neural Networks on Corrupted Labels (2017)

Paper Link: Focal Loss for Dense Object Detection (2017)

In short, focal loss assigns a small loss to examples the model already classifies well, so they barely contribute to the update, and a large loss to examples it classifies poorly, so they drive most of the update.

The parameter that controls this difference is gamma.

The formula is simple: multiply the ordinary cross entropy by (1 - Pt) ** gamma, where Pt is the probability the model assigns to the correct class.
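The formula can be written down in a few lines. The following is a minimal sketch for a single example, assuming `p_t` is the predicted probability of the correct class and using the natural logarithm. The function names are chosen for this illustration, and the alpha-balancing term from the paper is left out.

```python
import math

def cross_entropy(p_t):
    """Ordinary cross entropy for one example: -log(p_t)."""
    return -math.log(p_t)

def focal_loss(p_t, gamma):
    """Cross entropy scaled by the modulating factor (1 - p_t) ** gamma."""
    return (1.0 - p_t) ** gamma * cross_entropy(p_t)

# With gamma = 0 the modulating factor is 1, so focal loss equals cross entropy.
print(focal_loss(0.9, gamma=0), cross_entropy(0.9))

# With gamma > 0 a confident, correct prediction is down-weighted.
print(focal_loss(0.9, gamma=2))
```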

When an example is classified well (probability 0.9 for the correct class), the focal loss falls off very quickly as gamma grows: the higher gamma is, the closer the loss gets to zero.

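When an example is classified poorly (probability 0.1 for the correct class), the focal loss also depends on gamma.
The loss still decreases as gamma increases, but only slightly for poorly classified examples, whereas it decreases drastically for well-classified ones.
With gamma = 0 the focal loss equals the ordinary cross entropy, and the loss for the poorly classified example is about 22 times that of the well-classified one,
but with gamma = 3 the gap grows to roughly 16,000 times.
In effect, the loss for well-classified examples is pushed close to zero while the loss for poorly classified examples is reduced only slightly,
so the well-classified examples contribute almost nothing to the update.
This is how focal loss makes training concentrate on the examples the model still gets wrong.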
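The comparison can be reproduced numerically. The sketch below assumes the same setting as the text (probability 0.9 for the well-classified example and 0.1 for the poorly classified one). The helper names `focal_loss` and `loss_ratio` are chosen for this illustration.

```python
import math

def focal_loss(p_t, gamma):
    """Focal loss for one example with correct-class probability p_t."""
    return -((1.0 - p_t) ** gamma) * math.log(p_t)

def loss_ratio(gamma, p_easy=0.9, p_hard=0.1):
    """How many times larger the hard example's loss is than the easy example's."""
    return focal_loss(p_hard, gamma) / focal_loss(p_easy, gamma)

for gamma in (0, 1, 2, 3):
    easy = focal_loss(0.9, gamma)
    hard = focal_loss(0.1, gamma)
    print(f"gamma={gamma}: easy={easy:.6f} hard={hard:.4f} ratio={loss_ratio(gamma):.1f}")
```

The ratio is roughly 22 at gamma = 0 and roughly 16,000 at gamma = 3, because each increment of gamma multiplies it by (0.9 / 0.1) = 9.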
