
SigmoidFocalLoss [Bug] #2906

Open
2 tasks done
HKEa opened this issue Aug 21, 2023 · 3 comments

HKEa commented Aug 21, 2023

Prerequisite

Environment

Hello,

Thank you for a great library, but there appears to be a bug in SigmoidFocalLoss:

OrderedDict([('sys.platform', 'linux'), ('Python', '3.10.12 (main, Jul 5 2023, 18:54:27) [GCC 11.2.0]'), ('CUDA available', True), ('numpy_random_seed', 2147483648), ('GPU 0,1,2,3,4,5,6,7', 'NVIDIA A40'), ('CUDA_HOME', '/usr/local/cuda'), ('NVCC', 'Cuda compilation tools, release 12.2, V12.2.91'), ('GCC', 'gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0'), ('PyTorch', '2.0.1'), ('PyTorch compiling details', 'PyTorch built with:\n - GCC 9.3\n - C++ Version: 201703\n - Intel(R) oneAPI Math Kernel Library Version 2023.1-Product Build 20230303 for Intel(R) 64 architecture applications\n - Intel(R) MKL-DNN v2.7.3 (Git Hash 6dbeffbae1f23cbbeae17adb7b5b13f1f37c080e)\n - OpenMP 201511 (a.k.a. OpenMP 4.5)\n - LAPACK is enabled (usually provided by MKL)\n - NNPACK is enabled\n - CPU capability usage: AVX2\n - CUDA Runtime 11.8\n - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90;-gencode;arch=compute_37,code=compute_37\n - CuDNN 8.7\n - Magma 2.6.1\n - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.8, CUDNN_VERSION=8.7.0, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.0.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, \n'), ('TorchVision', '0.15.2'), ('OpenCV', '4.8.0'), ('MMEngine', '0.8.2'), ('MMCV', '2.0.1'), ('MMCV Compiler', 'GCC 9.3'), ('MMCV CUDA Compiler', '11.8')])

I've checked https://github.com/open-mmlab/mmcv/blob/main/mmcv/ops/csrc/common/cuda/sigmoid_focal_loss_cuda_kernel.cuh.
For num_classes=1 the class index c can only be 0, so for a positive target (t=1) flag_p and flag_n are effectively swapped:
T flag_p = (t == c);
T flag_n = (t != c);
For a sigmoid-style (binary) target, num_classes=1 comes from https://github.com/open-mmlab/mmcv/blob/main/mmcv/ops/csrc/pytorch/cuda/focal_loss_cuda.cu
(int num_classes = input.size(1);)
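
For illustration, here is a minimal NumPy sketch of the per-element computation as I read the kernel above (a hypothetical re-implementation, not the kernel itself; the FLT_MIN clamp and the logit clipping are my assumptions). It roughly reproduces the two values from the code sample below:

# Minimal NumPy sketch of the per-element sigmoid focal loss, as I read the
# linked kernel (illustrative re-implementation, not the actual kernel code).
import numpy as np

FLT_MIN = np.finfo(np.float32).tiny  # the kernel clamps probabilities with FLT_MIN

def sigmoid_focal_loss_elementwise(logits, targets, gamma=2.0, alpha=-1.0):
    # logits: (N, num_classes) float array, targets: (N,) integer class indices
    n, num_classes = logits.shape
    out = np.zeros_like(logits)
    for i in range(n):
        for c in range(num_classes):
            t = targets[i]
            flag_p = float(t == c)  # "this element belongs to class c"
            flag_n = float(t != c)  # "this element does not belong to class c"
            x = np.clip(logits[i, c], -100.0, 100.0)  # avoid overflow in exp
            p = 1.0 / (1.0 + np.exp(-x))
            term_p = (1.0 - p) ** gamma * np.log(max(p, FLT_MIN))
            term_n = p ** gamma * np.log(max(1.0 - p, FLT_MIN))
            out[i, c] = -flag_p * alpha * term_p - flag_n * (1.0 - alpha) * term_n
    return out

# With num_classes=1 the channel index c is always 0, so a target of 1 never
# satisfies (t == c): the example is scored as a negative for channel 0.
print(sigmoid_focal_loss_elementwise(np.array([[-1000.0]]), np.array([1])))  # ~0
print(sigmoid_focal_loss_elementwise(np.array([[1000.0]]), np.array([1])))   # ~174.67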

BR

Reproduces the problem - code sample

import torch
from mmcv.ops import SigmoidFocalLoss

loss = SigmoidFocalLoss(alpha=-1, gamma=2)
loss(torch.tensor([[-1000.]]).cuda(), torch.tensor([1]).cuda())
# tensor(0., device='cuda:0')
loss(torch.tensor([[1000.]]).cuda(), torch.tensor([1]).cuda())
# tensor(174.6731, device='cuda:0')

Reproduces the problem - command or script

none

Reproduces the problem - error message

none

Additional information

No response


qingpeng9802 commented Aug 30, 2023

I am currently working on fixing the softmax focal loss algorithm in #2893 as an outside contributor.
As far as I understand, the current sigmoid focal loss implementation is correct. It is very similar to the implementation in
https://github.com/pytorch/pytorch/blob/main/modules/detectron/sigmoid_focal_loss_op.cu

If I understand your issue correctly, you are conflating this with the binary classification task, which is actually a very tricky part. The PyTorch implementation mentioned above drops the background class (you can see a somewhat odd d+1), whereas mmcv's implementation does not drop it (so you may need to drop it manually if needed). See pytorch/vision#3250 for more discussion.

In short, num_classes=1 is invalid here, and you should have a background class and a foreground class. Feel free to add any comments.

HKEa (author) commented Aug 31, 2023

[screenshot: torchvision vs. mmcv SigmoidFocalLoss outputs]
As you can see, the results of the torchvision implementation and this one differ.
Here num_classes=1 because the size of the logits' last dimension is 1 for binary (sigmoid) focal classification.
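
For example, a hedged sketch of the kind of single-channel comparison in the screenshot, assuming torchvision.ops.sigmoid_focal_loss as the reference and reusing the logit from the code sample above:

# Hypothetical single-channel comparison: torchvision treats the channel as the
# foreground score, while mmcv interprets the target as an index into the
# (here single) channel dimension.
import torch
from mmcv.ops import SigmoidFocalLoss
from torchvision.ops import sigmoid_focal_loss

logit = torch.tensor([[1000.]], device='cuda')

mmcv_loss = SigmoidFocalLoss(gamma=2, alpha=-1)
print(mmcv_loss(logit, torch.tensor([1], device='cuda')))
# large value: with num_classes=1 the target 1 never matches channel 0

print(sigmoid_focal_loss(logit, torch.tensor([[1.]], device='cuda'),
                         alpha=-1, gamma=2, reduction='mean'))
# ~0: the confident positive prediction is treated as correct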


qingpeng9802 commented Aug 31, 2023

as you can see the results between torchvision implementation and this are different the num_classes=1 because size of the logits last dim=1 for binary/focal classification

The torchvision implementation is correct for both num_classes>1 and num_classes=1, but mmcv's implementation is only correct for num_classes>1.
This is a natural limitation of mmcv's implementation. In object detection tasks, we are usually assigning classes to anchors. For example, the classes might be [cat, dog]. For your case, the classes should be [cat, notCat(background)], that is, you have to give 2 classes.

You can imagine that a label with values 0 and 1 is forced to expand into a 2-channel (one-hot) tensor, so you have to give a 2-channel input to match the 2-channel label tensor.

I would strongly recommend using a 2-channel input for your case if you are working on an object detection task, since that is the common solution there. However, it would also be possible to add a specialized kernel function for num_classes=1, or a d+1 version like detectron's, to mmcv.
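
For illustration, a minimal sketch of the 2-channel workaround described above (the class layout and logit values are made up; note that mmcv also scores the background channel, so the numbers will not exactly match a single-channel torchvision run):

import torch
from mmcv.ops import SigmoidFocalLoss

loss = SigmoidFocalLoss(gamma=2, alpha=0.25)

# Two channels: index 0 = notCat (background), index 1 = cat (foreground).
logits = torch.tensor([[-3.0, 3.0],    # predicted cat
                       [ 2.5, -2.5]],  # predicted notCat
                      device='cuda')
targets = torch.tensor([1, 0], device='cuda')  # integer class indices

print(loss(logits, targets))

The targets remain integer class indices; per channel the kernel compares them against the channel index, which is equivalent to the one-hot expansion described above.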
