Annotating my own dataset - doubts #776

anon747 · 2021-10-04T11:57:24Z

It's me again, the guy with the hypothetical rap music dataset. :D
"I am once again asking for your support." - Bernie Sanders

I have a few doubts about annotating my own dataset. I am interested in speaker diarization, not speaker tracking. Consider the following scenarios:

1. Say I have two audio clips in my dataset: (i) Jay Z ft. J Cole and (ii) Wiz Khalifa ft. KSI. When I annotate these two clips for diarization, can I allocate speaker IDs serially within each clip?
   That is, if Jay Z raps first in clip (i) and J Cole raps second, can I assign them identifiers 'A' and 'B'? Then, when I start annotating the second clip, can I again assign 'A' to the first rapper and 'B' to the next, or do these have to be 'C' and 'D'?
   Basically, do my identifiers need to be unique at the clip level, or unique across clips (at the dataset level)?
2. Can I group speakers under class labels? For example, can I annotate all British rappers with 'A' in all clips, all American rappers with 'B' in all clips, and rappers from other countries with a third label, say 'C'? (This, of course, assumes that rappers from the same country rap in a similar style.)
   I know this assumption is not correct, but if I want to work at a speaker-class level, would annotations like these make sense?

hbredin (Maintainer) · 2021-10-04T14:00:15Z

For training speaker segmentation models (like voice activity detection or overlapped speech detection), using labels unique at clip level is fine. However, for training speaker embedding models, you should make sure those labels are unique at dataset level.
As for grouping speakers under class labels: as long as you do not plan to later distinguish speakers within a class, it should be OK.
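
If you annotate with clip-level labels first and later need dataset-level ones, the relabeling could be sketched roughly as below. This is only an illustration, not part of any library: the `Segment` tuple layout, the `globalize_labels` helper, the clip names, and the timings are all made up. Prefixing a label with its clip ID makes it unique across the dataset, but it also treats the same rapper in two clips as two different speakers, so for speaker embedding training you would probably want to supply an explicit mapping to real identities (or to class labels, if you work at class level).

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: (clip_id, start_seconds, end_seconds, clip_level_label)
Segment = Tuple[str, float, float, str]


def globalize_labels(
    segments: List[Segment],
    identity: Optional[Dict[Tuple[str, str], str]] = None,
) -> List[Segment]:
    """Return copies of the segments with dataset-level labels.

    `identity` optionally maps (clip_id, clip_level_label) to a dataset-level
    name, e.g. a real speaker identity or a class label. Pairs missing from
    the mapping fall back to "<clip_id>_<label>".
    """
    identity = identity or {}
    relabeled = []
    for clip_id, start, end, label in segments:
        global_label = identity.get((clip_id, label), f"{clip_id}_{label}")
        relabeled.append((clip_id, start, end, global_label))
    return relabeled


# Made-up annotations: 'A' and 'B' are reused in both clips.
segments = [
    ("clip1", 0.0, 12.5, "A"),
    ("clip1", 12.5, 30.0, "B"),
    ("clip2", 0.0, 8.0, "A"),
    ("clip2", 8.0, 21.0, "B"),
]

# Without a mapping, every (clip, label) pair becomes its own speaker.
relabeled = globalize_labels(segments)

# With a mapping, known speakers keep one identity across the dataset.
named = globalize_labels(segments, {("clip1", "A"): "jay_z"})
```

Here `relabeled` carries the labels `clip1_A`, `clip1_B`, `clip2_A`, and `clip2_B`, while `named` uses `jay_z` for the first segment and the prefixed fallback for the rest.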
