Releases: huggingface/tgi-gaudi
v2.0.1: SynapseAI v1.16.0
The codebase is validated with SynapseAI 1.16.0 and optimum-habana 1.12.0.
Tested configurations
- Llama 2 7B BF16 / FP8 on 1xGaudi2
- Llama 2 70B BF16 / FP8 on 8xGaudi2
- Falcon 180B BF16 / FP8 on 8xGaudi2
- Mistral 7B BF16 / FP8 on 1xGaudi2
- Mixtral 8x7B BF16 / FP8 on 1xGaudi2
Highlights
- Add support for the grammar feature (grammar-constrained generation)
- Add support for Habana Flash Attention
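Grammar-constrained generation lets clients force the model's output to match a schema. As a minimal sketch, this is roughly what a request body to TGI's `/generate` endpoint could look like with a JSON-schema grammar; the exact parameter layout may differ between versions, and the schema here is purely illustrative:

```python
import json

# Illustrative JSON schema constraining the output to a simple object.
schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name", "age"],
}

# Sketch of a /generate request body with grammar-constrained decoding.
# Field names follow TGI's grammar support; check your server version.
payload = {
    "inputs": "Give me a person as JSON:",
    "parameters": {
        "max_new_tokens": 64,
        "grammar": {"type": "json", "value": schema},
    },
}

body = json.dumps(payload)
```

The serialized `body` would then be POSTed to a running TGI endpoint.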
Full Changelog: v2.0.0...v2.0.1
v2.0.0: SynapseAI v1.15.0
The codebase is validated with SynapseAI 1.15.0 and optimum-habana 1.11.1.
Tested configurations
- Llama 2 70B BF16 / FP8 on 8xGaudi2
Highlights
- Add support for FP8 precision
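The core idea behind FP8 inference is rescaling tensors so their dynamic range fits the narrow FP8 format. The sketch below illustrates per-tensor scaling for the E4M3 variant (largest finite value 448); it is a conceptual illustration only, as the real flow on Gaudi2 goes through Habana's quantization toolkit:

```python
# E4M3's largest finite value; values are rescaled to fit this range
# before casting down to FP8.
E4M3_MAX = 448.0

def fp8_scale(tensor_abs_max: float) -> float:
    """Per-tensor scale mapping the observed dynamic range into E4M3's range."""
    return tensor_abs_max / E4M3_MAX

def quantize(x: float, scale: float) -> float:
    """Divide by the scale and clamp into the representable range
    (rounding to the FP8 grid is omitted for brevity)."""
    y = x / scale
    return max(-E4M3_MAX, min(E4M3_MAX, y))

scale = fp8_scale(1792.0)        # observed amax of 1792 -> scale of 4.0
top = quantize(1792.0, scale)    # lands at 448.0, the top of the FP8 range
```

At inference time the matching dequantization multiplies results back by the scale, which is why calibrating good per-tensor scales matters for accuracy.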
Full Changelog: v1.2.1...v2.0.0
v1.2.1: SynapseAI v1.14.0
The codebase is validated with SynapseAI 1.14.0 and optimum-habana 1.10.4.
Tested configuration
- Llama 2 70B BF16 on 8xGaudi2
Highlights
- Add support for continuous batching on Intel Gaudi
- Add batch size bucketing
- Add sequence bucketing for prefill operation
- Optimize concatenate operation
- Add speculative scheduling
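Batch-size and sequence bucketing both serve the same goal: padding dynamic shapes up to a small fixed set of buckets so the Gaudi graph compiler sees few distinct shapes and recompiles rarely. A minimal sketch of the idea, with illustrative bucket choices (not TGI's actual ones):

```python
# Round a dynamic dimension up to the nearest bucket so the accelerator
# compiles one graph per bucket instead of one per shape.
def next_bucket(value: int, buckets: list[int]) -> int:
    for b in buckets:
        if value <= b:
            return b
    raise ValueError(f"{value} exceeds largest bucket {buckets[-1]}")

# Hypothetical bucket sets for sequence length and batch size.
SEQ_BUCKETS = [128, 256, 512, 1024, 2048]
BATCH_BUCKETS = [1, 2, 4, 8, 16, 32]

padded_seq = next_bucket(300, SEQ_BUCKETS)     # a 300-token prefill pads to 512
padded_batch = next_bucket(5, BATCH_BUCKETS)   # a batch of 5 pads to 8
```

The wasted compute on padding is traded against avoiding recompilation, which on graph-compiled hardware like Gaudi2 is usually a large win.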
Full Changelog: v1.2.0...v1.2.1