wip: support for joint diarization and embedding #1409

Conversation

clement-pages (Collaborator)

No description provided.

chai3 and others added 15 commits June 8, 2023 08:42
BREAKING(model): get rid of (flaky) `Model.introspection`
- fixes the dimension mismatch between the file IDs and probabilities arrays
- changes how chunks for the embedding task are sampled
- creates two functions to draw chunks, one for each subtask

Tests are required to ensure that there are no bugs
For now this is a copy-paste of methods from the segmentation task.
as computing this loss probably does not make sense in powerset
mode because the first class (empty set of labels) does exactly this
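
The powerset remark above can be made concrete. A minimal sketch, assuming a softmax output whose first class is the empty set (the batch size, frame count, and class count below are illustrative):

    import torch

    # With powerset multi-class encoding, the first class is the empty set of
    # labels ("no active speaker"), so voice activity can be read directly
    # from the model output; a separate VAD loss would be redundant.
    powerset_probs = torch.softmax(torch.randn(2, 100, 7), dim=-1)  # (batch, frames, classes)
    speech_prob = 1.0 - powerset_probs[..., 0]  # p(speech) = 1 - p(empty set)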
Comment on lines 209 to 243
# keep track of list of classes for regular segmentation protocols
# Different files may be annotated using a different set of classes
# (e.g. one database for speech/music/noise, and another one for male/female/child)
if isinstance(self.protocol, SegmentationProtocol):

if "classes" in file:
local_classes = file["classes"]
else:
local_classes = file["annotation"].labels()

# if task was not initialized with a fixed list of classes,
# we build it as the union of all classes found in files
if self.classes is None:
for klass in local_classes:
if klass not in classes:
classes.append(klass)
annotated_classes.append(
[classes.index(klass) for klass in local_classes]
)

# if task was initialized with a fixed list of classes,
# we make sure that all files use a subset of these classes
# if they don't, we issue a warning and ignore the extra classes
else:
extra_classes = set(local_classes) - set(self.classes)
if extra_classes:
warnings.warn(
f"Ignoring extra classes ({', '.join(extra_classes)}) found for file {file['uri']} ({file['database']}). "
)
annotated_classes.append(
[
self.classes.index(klass)
for klass in set(local_classes) & set(self.classes)
]
)
Member:
Can probably be removed. Double check, though.

Collaborator (author):
Indeed, this can be removed. Same for lines 419-423, I think. I can also remove `self.classes` and `self.annotated_classes`: these two attributes are not used anywhere in the code aside from in `setup` for `SegmentationProtocol`.

Collaborator (author):
Done in 3d295dd

Comment on lines 158 to 159
elif isinstance(self.protocol, SegmentationProtocol):
classes = getattr(self, "classes", list())
Member:

Can probably be removed. Double check, though.

Member:

Second thoughts: it would be better for this class to inherit from `SegmentationMixin`, because there is a lot of duplicated code in `setup`.

Collaborator (author):

> Can probably be removed. Double check, though.

Done in 3d295dd

metadata = self.metadata[file_id]
sample["meta"] = {key: metadata[key] for key in metadata.dtype.names}
sample["meta"]["file"] = file_id
sample["meta"]["subtask"] = subtask
Member:

At this point, `sample["meta"]` already contains a "scope" key.
Therefore, I think you can safely remove this additional "subtask" key, which is basically equivalent to checking whether "scope" is global or not.

Bonus: you could then inherit `prepare_chunk` from the regular `SpeakerDiarization` task.
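
A minimal sketch of the suggested simplification; the helper name is hypothetical, and the "global" scope value is taken from the comment above:

    # Hypothetical helper: a chunk whose "scope" is "global" comes from the
    # speaker-embedding subtask, so a separate "subtask" key is redundant.
    def is_embedding_chunk(meta: dict) -> bool:
        return meta["scope"] == "global"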

segment = np.random.choice(class_segments, p=prob_segments)

# sample chunk start time in order to intersect it with the sampled segment
start_time = np.random.uniform(segment["start"] - duration / 2, segment["start"])
@hbredin (Member) commented Jun 20, 2023:

I would actually sample between

  • segment["start"] - duration, because the choice of 50% of duration is a bit arbitrary and might lead to a bias in the distribution of speech within a chunk,

and

  • segment["end"], because using segment["start"] would definitely lead to a similar bias as well (see the sketch below).

Member:

That being said, I do understand that there are a lot of short audio files in VoxCeleb, and choosing segment["end"] as the endpoint could lead to a lot of chunks that must be zero-padded. Let's discuss this in our next meeting.

clement-pages and others added 18 commits June 28, 2023 16:26
as these loops could break gradient flow, and to optimize
the code
for now, do the trick only for the diarization subtask
There was an issue when the number of speakers in a chunk was
greater than the maximum number per chunk set for the task.
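
One common way to handle this, as a sketch only (not necessarily the fix applied in this commit), is to keep the most talkative speakers when a chunk exceeds the limit:

    import numpy as np

    def keep_most_talkative(y: np.ndarray, max_speakers_per_chunk: int) -> np.ndarray:
        # y: (num_frames, num_speakers) binary speaker activity matrix (illustrative).
        num_active_frames = np.sum(y, axis=0)  # frames of activity per speaker
        keep = np.argsort(-num_active_frames)[:max_speakers_per_chunk]
        return y[:, keep]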
these two methods were identical to the methods inherited from the
`SegmentationTaskMixin` class
and fix an issue with the loss type during training
this version replaces `StatsPool` with a concatenation of the last
outputs of the TDNN (for the embedding part) and the LSTM (for the
diarization part), followed by an LSTM layer
Now, this LSTM is bidirectional and has a hidden size of 1500, so
the output shape of this encoder is (b, s, 1500*2). This will allow
comparing with the `StatsPool` version of the SPEED model
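
A minimal sketch of the encoder described above; the class name and the per-branch feature sizes are assumptions, and only the bidirectional LSTM with hidden size 1500 comes from the commit message:

    import torch
    import torch.nn as nn

    class ConcatEncoder(nn.Module):
        # Concatenate the last TDNN output (embedding branch) and the last
        # LSTM output (diarization branch), then apply a bidirectional LSTM
        # with hidden size 1500, so the output shape is (b, s, 1500 * 2).
        def __init__(self, tdnn_features: int = 512, lstm_features: int = 256):
            super().__init__()
            self.encoder = nn.LSTM(
                input_size=tdnn_features + lstm_features,
                hidden_size=1500,
                bidirectional=True,
                batch_first=True,
            )

        def forward(self, tdnn_out: torch.Tensor, lstm_out: torch.Tensor) -> torch.Tensor:
            concatenated = torch.cat([tdnn_out, lstm_out], dim=-1)  # (b, s, tdnn + lstm)
            output, _ = self.encoder(concatenated)
            return output  # (b, s, 3000)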
@hbredin (Member) commented Jan 23, 2024:

Closing as (I think, correct me if I am wrong) this is superseded by #1583.

@hbredin closed this on Jan 23, 2024.