Merge branch 'develop' into feat/joint-diarization-and-embedding-with-prepared-data
clement-pages committed Jun 21, 2024
2 parents ad9e435 + 2e04ec7 commit aeb147f
Showing 18 changed files with 2,364 additions and 15 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/test.yml
@@ -13,7 +13,7 @@ jobs:
strategy:
matrix:
os: [ubuntu-latest]
- python-version: [3.8, 3.9, "3.10"]
+ python-version: ["3.9", "3.10", "3.11"]
steps:
- uses: actions/checkout@v2
- name: Set up Python ${{ matrix.python-version }}
27 changes: 26 additions & 1 deletion CHANGELOG.md
@@ -1,9 +1,33 @@
# Changelog

## develop

## Version 3.3.1 (2024-06-19)

### Breaking changes

- setup: drop support for Python 3.8

### Fixes

- fix: fix support for `numpy==2.x` ([@ibevers](https://github.com/ibevers/))
- fix: fix support for `speechbrain==1.x` ([@Adel-Moumen](https://github.com/Adel-Moumen/))
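
The `numpy==2.x` fix lines up with the `np.NaN` to `np.nan` change in `pyannote/audio/core/inference.py` further down in this commit: NumPy 2.0 removed the capitalized `np.NaN` alias, while lowercase `np.nan` exists in both 1.x and 2.x. A minimal check, written for illustration and not taken from the repository:

```python
import numpy as np

# Lowercase `np.nan` is available on NumPy 1.x and 2.x alike.
missing = np.nan
assert np.isnan(missing)

# The capitalized alias only exists on NumPy 1.x; NumPy 2.0 removed it,
# so attribute access raises AttributeError and hasattr() returns False.
major = int(np.__version__.split(".")[0])
has_alias = hasattr(np, "NaN")
print(f"NumPy {np.__version__}: np.NaN available = {has_alias}")
```

Code that only ever spells the constant as `np.nan` therefore runs unchanged on either major version.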


## Version 3.3.0 (2024-06-14)

### TL;DR

`pyannote.audio` does [speech separation](https://hf.co/pyannote/speech-separation-ami-1.0): multi-speaker audio in, one audio channel per speaker out!

```bash
pip install pyannote.audio[separation]==3.3.0
```

### New features

- feat(task): add `PixIT` joint speaker diarization and speech separation task (with [@joonaskalda](https://github.com/joonaskalda/))
- feat(model): add `ToTaToNet` joint speaker diarization and speech separation model (with [@joonaskalda](https://github.com/joonaskalda/))
- feat(pipeline): add `SpeechSeparation` pipeline (with [@joonaskalda](https://github.com/joonaskalda/))
- feat(io): add option to select torchaudio `backend`

### Fixes
@@ -15,6 +39,7 @@

- improve(io): when available, default to using `soundfile` backend
- improve(pipeline): do not extract embeddings when `max_speakers` is set to 1
- improve(pipeline): optimize memory usage of most pipelines ([#1713](https://github.com/pyannote/pyannote-audio/pull/1713) by [@benniekiss](https://github.com/benniekiss/))

## Version 3.2.0 (2024-05-08)

11 changes: 5 additions & 6 deletions pyannote/audio/core/inference.py
@@ -526,7 +526,7 @@ def aggregate(
warm_up: Tuple[float, float] = (0.0, 0.0),
epsilon: float = 1e-12,
hamming: bool = False,
- missing: float = np.NaN,
+ missing: float = np.nan,
skip_average: bool = False,
) -> SlidingWindowFeature:
"""Aggregation
@@ -559,9 +559,6 @@ def aggregate(
step=frames.step,
)

- masks = 1 - np.isnan(scores)
- scores.data = np.nan_to_num(scores.data, copy=True, nan=0.0)

# Hamming window used for overlap-add aggregation
hamming_window = (
np.hamming(num_frames_per_chunk).reshape(-1, 1)
@@ -613,11 +610,13 @@ def aggregate(
)

# loop on the scores of sliding chunks
- for (chunk, score), (_, mask) in zip(scores, masks):
+ for chunk, score in scores:
# chunk ~ Segment
# score ~ (num_frames_per_chunk, num_classes)-shaped np.ndarray
# mask ~ (num_frames_per_chunk, num_classes)-shaped np.ndarray

+ mask = 1 - np.isnan(score)
+ np.nan_to_num(score, copy=False, nan=0.0)

start_frame = frames.closest_frame(chunk.start + 0.5 * frames.duration)

aggregated_output[start_frame : start_frame + num_frames_per_chunk] += (
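
The hunks above move NaN handling from one up-front pass over all scores into the per-chunk loop. The sketch below illustrates that pattern on plain NumPy arrays. It is a simplified stand-in, not the library's `aggregate` implementation: `aggregate_chunks`, its parameters, and the integer `starts` offsets are hypothetical names introduced here, and windowing, warm-up, and `SlidingWindowFeature` handling are left out.

```python
import numpy as np


def aggregate_chunks(chunks, starts, num_frames, missing=np.nan, epsilon=1e-12):
    """Average overlapping (num_frames_per_chunk, num_classes) chunks, ignoring NaNs.

    Hypothetical helper for illustration only.
    """
    num_classes = chunks[0].shape[1]
    total = np.zeros((num_frames, num_classes))
    count = np.zeros((num_frames, num_classes))

    for start, score in zip(starts, chunks):
        # Work on a private float copy so the caller's array stays untouched.
        score = score.astype(float)

        # Per-chunk mask: 1 where the score is valid, 0 where it is NaN.
        mask = 1 - np.isnan(score)
        # Zero out NaNs in place so they do not poison the running sum.
        np.nan_to_num(score, copy=False, nan=0.0)

        stop = start + len(score)
        total[start:stop] += score * mask
        count[start:stop] += mask

    average = total / np.maximum(count, epsilon)
    # Frames that never received a valid score get the `missing` value.
    average[count == 0.0] = missing
    return average


a = np.array([[1.0], [np.nan], [3.0]])
b = np.array([[5.0], [7.0], [np.nan]])
out = aggregate_chunks([a, b], starts=[0, 1], num_frames=5)
```

Here frame 1 is covered by a NaN from `a` and a valid 5.0 from `b`, so only `b` contributes to its average, while frames 3 and 4 receive no valid score and fall back to `missing`. Building the mask one chunk at a time means no full-size `masks` array has to be held alongside the scores, which may be related to the memory-usage entry in the changelog.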