- large batch size
- Adam or SGD
- learning rate
- data auto augmentation
- ResNeSt > ResNet
- circle loss
- weight decay:
    WEIGHT_DECAY: 0.0005
    WEIGHT_DECAY_BIAS: 0.
...
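The two weight-decay keys mean that bias parameters are trained without weight decay, while every other weight gets 0.0005. One common way to apply a setting like this is to split the model's parameters into optimizer parameter groups. Below is a minimal sketch under stated assumptions. The function name `build_param_groups` is hypothetical, and so is the rule that a bias is any parameter whose name ends in `"bias"`. Plain strings stand in for tensors so the sketch runs without any framework installed.

```python
# Minimal sketch (hypothetical helper): give bias parameters WEIGHT_DECAY_BIAS
# and all other parameters WEIGHT_DECAY. The "name ends with 'bias'" rule is
# an assumption made for illustration.

WEIGHT_DECAY = 0.0005
WEIGHT_DECAY_BIAS = 0.0


def build_param_groups(named_params, weight_decay=WEIGHT_DECAY,
                       weight_decay_bias=WEIGHT_DECAY_BIAS):
    """Split (name, param) pairs into two groups with different weight decay.

    Each returned dict has the {"params": [...], "weight_decay": float}
    shape that PyTorch optimizers accept as per-parameter-group options.
    """
    decay, no_decay = [], []
    for name, param in named_params:
        if name.endswith("bias"):
            no_decay.append(param)  # biases: no weight decay
        else:
            decay.append(param)     # weights: regular weight decay
    return [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": weight_decay_bias},
    ]


if __name__ == "__main__":
    # Stand-in for model.named_parameters(); strings replace real tensors.
    fake_params = [
        ("backbone.conv1.weight", "w1"),
        ("backbone.bn1.bias", "b1"),
        ("heads.classifier.weight", "w2"),
    ]
    for group in build_param_groups(fake_params):
        print(group["weight_decay"], group["params"])
```

With PyTorch, the returned list could be passed straight to an optimizer, e.g. `torch.optim.SGD(build_param_groups(model.named_parameters()), lr=base_lr)`, where `model` and `base_lr` are placeholders. A per-group `weight_decay` overrides the optimizer-wide default for that group.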