Overfitting problem #5

pengshiqi opened this issue Nov 19, 2018 · 12 comments

@pengshiqi

Hello,

I cloned your repo and downloaded your dataset, but I could not get the same result as yours.

I trained the model on 4 GeForce GTX 1080 Ti GPUs and kept the other arguments the same, but the model overfits.

After 100 epochs, I got prec@1 at 99% and prec@5 at 100% on the train set, but only prec@1 at 48% and prec@5 at 73% on the test set.

Here is part of the log:

DFL-CNN <==> Train <==> Epoch: [113][103/107]
Loss 0.5459 (0.5661)	Loss1 0.0756 (0.0739)	Loss2 0.0004 (0.0289)	Loss3 4.6985 (4.6324)
Prec@1 (99.880)	Prec@5 (100.000)
DFL-CNN <==> Train <==> Epoch: [113][104/107]
Loss 0.5188 (0.5656)	Loss1 0.0508 (0.0737)	Loss2 0.0076 (0.0287)	Loss3 4.6041 (4.6321)
Prec@1 (99.881)	Prec@5 (100.000)
DFL-CNN <==> Train <==> Epoch: [113][105/107]
Loss 0.5771 (0.5657)	Loss1 0.0774 (0.0737)	Loss2 0.0189 (0.0286)	Loss3 4.8082 (4.6338)
Prec@1 (99.882)	Prec@5 (100.000)
DFL-CNN <==> Train <==> Epoch: [113][106/107]
Loss 0.8110 (0.5672)	Loss1 0.3277 (0.0753)	Loss2 0.0100 (0.0285)	Loss3 4.7324 (4.6344)
Prec@1 (99.883)	Prec@5 (100.000)
DFL-CNN <==> Test <==> Epoch: [ 106] Top1:48.050% Top5:73.093%
DFL-CNN <==> Test <==> Epoch: [ 108] Top1:48.913% Top5:73.404% 
DFL-CNN <==> Test <==> Epoch: [ 110] Top1:47.515% Top5:72.575%
DFL-CNN <==> Test <==> Epoch: [ 112] Top1:48.205% Top5:72.765%

During training, overfitting seems inevitable: there are only about 6,000 training images, but VGG16 has far too many parameters for a dataset of that size.

Have you encountered this overfitting problem? If so, how did you resolve it?

Looking forward to your reply!

Thank you very much!

@fxle commented Nov 20, 2018

Hello, I get the same problem. I haven't seen the "Test Epoch" output appear yet; it is still training now.
I set the learning rate default to 0.001. How about you? Do you have any better suggestions?
DFL-CNN <==> Train Epoch: [201][1159/1494]
Loss 1.6247 (1.4218) Loss1 1.6245 (1.4152) Loss2 0.0000 (0.0000) Loss3 0.0023 (0.0649)
Top1 100.000 (100.000) Top5 100.000 (100.000)
DFL-CNN <==> Train Epoch: [201][1160/1494]
Loss 1.5835 (1.4219) Loss1 1.5780 (1.4154) Loss2 0.0000 (0.0000) Loss3 0.0543 (0.0649)
Top1 100.000 (100.000) Top5 100.000 (100.000)

@pengshiqi (Author)

@fxle The test log is saved in DFL_CNN/log/log_text.txt.

@fxle commented Nov 20, 2018

@pengshiqi Oh, thank you very much! I found it. It seems to be improving, but the Loss2 value looks a little strange. In addition, do you think the 'filter bank' idea in this paper can also improve rotation invariance?
DFL-CNN <==> Test <==> Epoch: [ 198] Top1:76.338% Top5:91.284%
DFL-CNN <==> Test <==> Epoch: [ 200] Top1:75.854% Top5:91.042%
DFL-CNN <==> Test <==> Epoch: [ 202] Top1:75.837% Top5:91.111%

@pengshiqi (Author)

@fxle

DFL-CNN <==> Train Epoch: [323][2/125]
Loss 3.3836 (3.4424)	Loss1 3.1195 (3.1667)	Loss2 0.0326 (0.0284)	Loss3 2.3149 (2.4724)
Top1 100.000 (100.000)	Top5 100.000 (100.000)
DFL-CNN <==> Train Epoch: [323][3/125]
Loss 3.4611 (3.4471)	Loss1 3.2075 (3.1769)	Loss2 0.0126 (0.0245)	Loss3 2.4100 (2.4568)
Top1 100.000 (100.000)	Top5 100.000 (100.000)
DFL-CNN <==> Test <==> Epoch: [ 318] Top1:72.679% Top5:90.991%
DFL-CNN <==> Test <==> Epoch: [ 320] Top1:72.592% Top5:91.094%

Sure, Loss2 is much lower than the other losses, but I don't think Loss2 is the strange one. It is Loss1 and Loss3 that look strange: after hundreds of epochs of training they are still very large, which seems abnormal.

I think a potential cause of this problem is that the 1x1 convolutional layer is supposed to be initialized non-randomly, as described in Section 3.3 of the paper, but that initialization has not been implemented in this code.
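
For reference, a minimal sketch of the kind of non-random initialization Section 3.3 describes: seed each 1x1 filter with an L2-normalized, high-energy feature vector taken from training images of the corresponding class. The function and argument names (`init_filter_bank`, `filter_bank_conv`, `filters_per_class`) are hypothetical, and the patch-selection rule here is simplified; the paper's exact procedure may differ.

```python
import torch
import torch.nn.functional as F

def init_filter_bank(filter_bank_conv, feature_maps, labels, filters_per_class=10):
    """Seed each 1x1 filter with a normalized high-energy feature vector
    from training images of the corresponding class (sketch only)."""
    with torch.no_grad():
        weight = filter_bank_conv.weight                     # (num_classes * k, C, 1, 1)
        num_filters, channels = weight.shape[0], weight.shape[1]
        num_classes = num_filters // filters_per_class

        for cls in range(num_classes):
            cls_feats = feature_maps[labels == cls]          # (n, C, H, W)
            if cls_feats.numel() == 0:
                continue                                     # no sample of this class available
            # every spatial position is a candidate 1x1 "patch"
            patches = cls_feats.permute(0, 2, 3, 1).reshape(-1, channels)
            # rank candidate patches by their activation energy (L2 norm)
            energy = patches.norm(dim=1)
            k = min(filters_per_class, patches.shape[0])
            top = energy.topk(k).indices
            init = F.normalize(patches[top], dim=1)          # (k, C)
            weight[cls * filters_per_class : cls * filters_per_class + k] = \
                init.view(k, channels, 1, 1)
```

Presumably this would be called once before training, passing backbone features (e.g. conv4_3) and the corresponding labels.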

@fxle commented Nov 21, 2018

@pengshiqi Yes, you are right, it's not initialized. Do you know how to implement it? I have some ideas I'd like to discuss with you. I suggest we connect on QQ; my QQ number is 260730636.

@techzhou

@pengshiqi @fxle I changed the model by adding a dropout layer. During training, Loss2 decreases, but Loss1 and Loss3 are basically not reduced. Did you get the same situation?
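
For anyone who wants to try the same thing, a minimal sketch of adding dropout in front of a classification head is below; the attribute names (`classifier_g`, `classifier_p`) are hypothetical stand-ins for whatever the repo's heads are actually called.

```python
import torch.nn as nn

def add_dropout_before(classifier, p=0.5):
    """Wrap an existing classification layer so a Dropout layer runs first."""
    return nn.Sequential(nn.Dropout(p=p), classifier)

# Example usage, assuming the model exposes such heads (names are guesses):
# model.classifier_g = add_dropout_before(model.classifier_g, p=0.5)
# model.classifier_p = add_dropout_before(model.classifier_p, p=0.5)
```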

@fxle commented Nov 25, 2018

@techzhou No, I didn't use a dropout layer. Maybe you can try regularization or the Section 3.3 layer initialization to make it perform better.
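
If anyone wants to try plain regularization first, the simplest option is L2 weight decay on the optimizer; a minimal sketch follows, where the model is a placeholder and the lr/weight_decay values are illustrative, not the repository's defaults.

```python
import torch
import torch.nn as nn

model = nn.Linear(512, 200)  # placeholder for the actual DFL-CNN model
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3,
                            momentum=0.9, weight_decay=5e-4)
```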

@Ien001 commented Dec 10, 2018

@techzhou
Hi, how is the accuracy after you implemented the dropout layer? Did it increase a bit?
I just think dropout may help.
Thanks!

@chaerlo commented Apr 15, 2019

@pengshiqi Hi, how did you solve the overfitting problem?

@wsqat commented Sep 4, 2019

@pengshiqi Hi, how did you solve the overfitting problem? @XieLeo @fxle @techzhou @Ien001

@pengshiqi (Author)

@wsqat The default hyper-parameters are imperfect. You can adjust the learning rate and the loss weights to obtain better results.
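
For concreteness, a minimal sketch of what adjusting those knobs could look like; the weight values, schedule, and names (w1, w2, w3) are illustrative guesses, not the repo's actual flags or recommended settings.

```python
import torch
import torch.nn as nn

model = nn.Linear(512, 200)                    # stand-in for the DFL-CNN model
criterion = nn.CrossEntropyLoss()

# lower learning rate plus a step schedule; values are only examples
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

# re-weight the three loss terms (Loss1 / Loss2 / Loss3 in the training log)
w1, w2, w3 = 1.0, 0.5, 0.1

def total_loss(out1, out2, out3, target):
    return (w1 * criterion(out1, target)
            + w2 * criterion(out2, target)
            + w3 * criterion(out3, target))
```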

@aparnaambarapu

@pengshiqi
Hi, I trained by adjusting the learning rate but could only reach 52% accuracy. Can you share the weights of the model where you got 72% accuracy? Or can you mention any hyperparameters or code changes that could help me get better accuracy?
