
object label? #17

Open
lumiaomiao opened this issue Oct 14, 2021 · 8 comments

Comments

@lumiaomiao

Hi, could you explain the * in Table 3 of ATL?
You described it as "* means we only use the boxes of the detection results", but how do you use the category of the detection results in the training phase and the inference phase?

@zhihou7
Owner

zhihou7 commented Oct 14, 2021

Sorry for confusing you.

The object detection results provide both object category information and bounding boxes. Here, we only use the bounding boxes when inferring the HOI category. The training phase is the same as in the previous setting. In fact, * means we use the same model as ATL but do not use the object category information during inference.

Feel free to contact me if you have further questions.

Regards,
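
A minimal sketch of what the "*" setting means at inference time (the names below are hypothetical illustrations, not the actual ATL code): with object category information, the detected class masks out inconsistent compositions; with "*", only the boxes feed the model and all compositions are ranked.

```python
import numpy as np

# Hypothetical illustration of the "*" setting (not the actual ATL code).
# hoi_scores[i]: the model's score for the i-th verb-object composition,
#                computed from the human and object box features.
# hoi_to_object[i]: the object category of the i-th composition.
def predict_hoi(hoi_scores, hoi_to_object, detected_object=None):
    scores = hoi_scores.copy()
    if detected_object is not None:
        # Default setting: use the detector's object category to suppress
        # compositions whose object does not match the detection.
        scores[hoi_to_object != detected_object] = -np.inf
    # "*" setting: detected_object is None, so only the boxes were used
    # and the model ranks all compositions on its own.
    return int(np.argmax(scores))
```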

@lumiaomiao
Author

Thank you for your reply.

@lumiaomiao
Author

lumiaomiao commented Oct 20, 2021

@zhihou7 Hi, I have another question about the code. The function get_new_Trainval_N in lib/ult/ult.py is defined as:
[screenshot of the get_new_Trainval_N definition]

Why use "Trainval_N[4]" and not "Trainval_N[k]"?

@zhihou7
Owner

zhihou7 commented Oct 20, 2021

Thanks for your comment. It should be Trainval_N[k]. It is a bug inherited from the VCL code; I forgot to update it. After fixing this bug, the performance improves a bit. This bug also does not add seen classes in the zero-shot setting, so it only affects the performance a bit.

I have updated the code.

Thanks.
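
A minimal sketch of the fix described above, assuming Trainval_N maps each HOI category index to its training sample count (the function body here is a hypothetical reconstruction, not the repo's exact code):

```python
# Hypothetical reconstruction of the indexing bug discussed above.
def get_new_Trainval_N(Trainval_N, is_zero_shot, unseen_idx):
    if is_zero_shot:
        for k in unseen_idx:
            # Buggy line carried over from VCL, which always touched index 4:
            #   Trainval_N[4] = 0
            # Fixed line, zeroing the count of each unseen category k:
            Trainval_N[k] = 0
    return Trainval_N
```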

@lumiaomiao
Author

Thank you for your quick reply.

@lumiaomiao
Author

lumiaomiao commented Oct 21, 2021

@zhihou7 In the following code, if an image contains two pairs <h1, v1, o1> and <h1, v2, o1>, and the first one is in the unseen composition list, then you delete both pairs from the training data. Why not delete only the first one? In my view, deleting only the first one is closer to the description in your paper.
[screenshot of the training-data filtering code]

@zhihou7
Owner

zhihou7 commented Oct 21, 2021

Here, GT[1] is the HOI label list of an HOI sample, e.g., [eat apple, hold apple]. If "eat apple" is an unseen category, I think it is fair to remove the whole HOI sample rather than only the annotation [eat apple]. Otherwise, the "eat apple" sample would still exist but be unlabeled, which I think differs from the zero-shot setting.
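
A minimal sketch of that sample-level filtering, assuming each ground-truth entry stores its HOI labels at index 1 (hypothetical names, not the repo's exact code):

```python
# Hypothetical sketch of the filtering discussed above.
def filter_unseen_samples(gt_list, unseen_hois):
    kept = []
    for GT in gt_list:
        hoi_labels = GT[1]  # e.g. ["eat apple", "hold apple"]
        # Drop the whole sample if ANY of its labels is an unseen
        # composition; keeping it with a partial label would leave an
        # unlabeled "eat apple" instance in the training images.
        if any(label in unseen_hois for label in hoi_labels):
            continue
        kept.append(GT)
    return kept
```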

@lumiaomiao
Author

I get it, thank you.
