mathletema/retformer

This repository contains the code for our 6.8610 (NLP) final project, "RetFormers: Hybrid Attention-Retention Mechanisms for Faster Inference".

Large-scale experiments

We can run DDP training with the following command (or a variant thereof). The `--binaryvector` argument specifies the type of each layer: `0` for an attention layer, `1` for a retention layer.

```
torchrun --nproc_per_node=4 train_mixed_retnet_transformer.py --dffn=3072 --chunksize=128 --batchsize=24 --lr1=0.001 --lr2=0.0001 --numepochs=40 --printevery=10000 --isdistributed=1 --savenamebest=typeAbest --savenamefinal=typeAfinal --project=mixedtransformer2 --binaryvector=000000000001
```
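For illustration, here is a minimal, hypothetical sketch of how a binary vector such as `000000000001` could be parsed into a hybrid layer stack. The `AttentionBlock` and `RetentionBlock` classes below are placeholders, not the classes defined in `train_mixed_retnet_transformer.py`; only the 0 = attention / 1 = retention convention is taken from this README.

```python
# Hypothetical sketch: turning a --binaryvector string into a per-layer module list.
# AttentionBlock / RetentionBlock are placeholders for the real layer classes.
import torch.nn as nn


class AttentionBlock(nn.Module):
    """Placeholder for a standard self-attention block."""
    def forward(self, x):
        return x


class RetentionBlock(nn.Module):
    """Placeholder for a retention block."""
    def forward(self, x):
        return x


def build_layers(binary_vector: str) -> nn.ModuleList:
    """Map each character to a layer type: '0' -> attention, '1' -> retention."""
    return nn.ModuleList(
        RetentionBlock() if bit == "1" else AttentionBlock()
        for bit in binary_vector
    )


# "000000000001": eleven attention layers followed by a single retention layer.
layers = build_layers("000000000001")
```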

Small-scale experiments and FLOP measurement

Run the hybrid_retformer.ipynb notebook. To measure the inference latency of the different variants, change the forward function in the Retformer module to forward_efficient_typeA, forward_efficient_typeB, or forward_efficient_typeC.
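As a rough guide, a timing loop along the following lines could be used to compare the three variants. This is only a sketch: it assumes a Retformer instance named `model` and an input tensor `x` shaped the way the notebook expects, and the method names are simply the ones listed above.

```python
# Illustrative timing harness for the efficient forward variants.
# `model` and `x` are assumed to come from the hybrid_retformer.ipynb notebook.
import time

import torch


def time_variant(model, variant_name: str, x: torch.Tensor, n_iters: int = 100) -> float:
    """Return the average per-call latency (seconds) of one forward variant."""
    forward_fn = getattr(model, variant_name)
    with torch.no_grad():
        forward_fn(x)  # warm-up call
        start = time.perf_counter()
        for _ in range(n_iters):
            forward_fn(x)
    return (time.perf_counter() - start) / n_iters


# Example usage once `model` and `x` are defined in the notebook:
# for name in ("forward_efficient_typeA", "forward_efficient_typeB", "forward_efficient_typeC"):
#     print(name, time_variant(model, name, x))
```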
