[shardformer] sync tests modification to sequence parallel branch #4434

flybird11111 · 2023-08-14T10:09:30Z

📌 Checklist before creating the PR

I have created an issue for this PR for traceability
The title follows the standard format: [doc/gemini/tensor/...]: A concise description
I have added relevant tags if possible for us to better distinguish different PRs

🚨 Issue number

Link this PR to your issue with words like fixed to automatically close the linked issue upon merge

e.g. fixed #1234, closed #1234, resolved #1234

📝 What does this PR do?

Sync tests modification to sequence parallel branch.

💥 Checklist before requesting a review

I have linked my PR to an issue (instruction)
My issue clearly describes the problem/feature/proposal, with diagrams/charts/table/code if possible
I have performed a self-review of my code
I have added thorough tests.
I have added docstrings for all the functions/methods I implemented

⭐️ Do you enjoy contributing to Colossal-AI?

🌝 Yes, I do.
🌚 No, I don't.

Tell us more if you don't enjoy contributing to Colossal-AI.

…pcaitech#4241) * added softmax kernel * added qkv_kernel * added ops * adding tests * upload tets * fix tests * debugging * debugging tests * debugging * added * fixed errors * added softmax kernel * clean codes * added tests * update tests * update tests * added attention * add * fixed pytest checking * add cuda check * fix cuda version * fix typo

* revise shardformer readme (hpcaitech#4246) * [example] add llama pretraining (hpcaitech#4257) * [NFC] polish colossalai/communication/p2p.py code style --------- Co-authored-by: Jianghai <[email protected]> Co-authored-by: binmakeswell <[email protected]> Co-authored-by: Qianran Ma <[email protected]>

* style: rename replay buffer Experience replay is typically for off policy algorithms. Use this name in PPO maybe misleading. * fix: fix wrong zero2 default arg * test: update experience tests * style: rename zero_pad fn * fix: defer init in CycledDataLoader * test: add benchmark test * style: rename internal fn of generation * style: rename internal fn of lora * fix: remove unused loss fn * fix: remove unused utils fn * refactor: remove generate_with_actor fn * fix: fix type annotation * test: add models tests * fix: skip llama due to long execution time * style: modify dataset * style: apply formatter * perf: update reward dataset * fix: fix wrong IGNORE_INDEX in sft dataset * fix: remove DataCollatorForSupervisedDataset * test: add dataset tests * style: apply formatter * style: rename test_ci to test_train * feat: add llama in inference * test: add inference tests * test: change test scripts directory * fix: update ci * fix: fix typo * fix: skip llama due to oom * fix: fix file mod * style: apply formatter * refactor: remove duplicated llama_gptq * style: apply formatter * to: update rm test * feat: add tokenizer arg * feat: add download model script * test: update train tests * fix: modify gemini load and save pretrained * test: update checkpoint io test * to: modify nproc_per_node * fix: do not remove existing dir * fix: modify save path * test: add random choice * fix: fix sft path * fix: enlarge nproc_per_node to avoid oom * fix: add num_retry * fix: make lora config of rm and critic consistent * fix: add warning about lora weights * fix: skip some gpt2 tests * fix: remove grad ckpt in rm and critic due to errors * refactor: directly use Actor in train_sft * test: add more arguments * fix: disable grad ckpt when using lora * fix: fix save_pretrained and related tests * test: enable zero2 tests * revert: remove useless fn * style: polish code * test: modify test args

…ech#4362) * [shardformer] supported flash attention test dependency (hpcaitech#4158) * [shardformer] fix flash attention utils test (hpcaitech#4180) * [shardformer] opt support flash attention (hpcaitech#4163) * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] move to modeling * [shardformer] move to modeling * [shardformer] add performance benchmark of shardformer (hpcaitech#4175) * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] benchmark fix * [shardformer] benchmark fix * [shardformer] llama support flash attention (hpcaitech#4185) * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] move to modeling * [shardformer] move to modeling * [shardformer] llama support flash attention * [shardformer] llama support flash attention * [shardformer] Move the import statement for xformer outside the forward function. * [shardformer] gpt2 support flash attention. 
(hpcaitech#4191) * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] move to modeling * [shardformer] move to modeling * [shardformer] gpt2 support flash attention * [shardformer] gpt2 support flash attention * [shardformer] gpt2 support flash attention * [shardformer] bloom support flash attention (hpcaitech#4188) * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] move to modeling * [shardformer] move to modeling * [shardformer] bloom suport flash attention * [shardformer] add assert to sequence length * [shardformer] fix * [shardformer] fix * [shardformer] fix * [shardformer] bert support flash attention. (hpcaitech#4206) * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] move to modeling * [shardformer] move to modeling * [shardformer] bert support flash attention * [shardformer] t5 support flash attention. (hpcaitech#4216) * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] move to modeling * [shardformer] move to modeling * [shardformer] t5 support flash attention * [shardformer] t5 support flash attention * fix typo * fix typo * fix typo * fix typo * fix typo * fix typo * [shardformer] support 'paddedcausal' type of attention mask in Coloattention. 
(hpcaitech#4215) * added padded causal attn mask type for ColoAttention * [shardformer]t5 flash attention fix (hpcaitech#4239) * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] move to modeling * [shardformer] move to modeling * [shardformer] t5 flash attention fix * [shardformer] update gpt2 to use coloattention. (hpcaitech#4234) * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] move to modeling * [shardformer] move to modeling * [shardformer] update gpt2 to use coloattention * [shardformer] update gpt2 to use coloattention * [shardformer] update gpt2 to use coloattention * [shardformer] update gpt2 to use coloattention * [shardformer] update gpt2 * [shardformer] update opt and llama to use coloattention. (hpcaitech#4226) * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] move to modeling * [shardformer] move to modeling * update opt to use coloattention * [shardformer]update opt to use coloattention * [shardformer]update opt to use coloattention * [shardformer]update opt to use coloattention * [shardformer]update opt to use coloattention * [shardformer]update opt to use coloattention * [shardformer]update opt to use coloattention * [shardformer]update opt * [shardformer] shardformer support jit fused operator. 
(hpcaitech#4236) * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] move to modeling * [shardformer] move to modeling * [shardformer] bloom support jit fused operator * [shardformer] bloom support jit fused operator * [shardformer] bloom support jit fused operator * [shardformer] t5 support jit fused operator * [shardformer] t5 support jit fused operator * [shardformer] t5 support jit fused operator * [shardformer] add roadmap of flash attention * [shardformer] add roadmap of flash attention * [shardformer] add roadmap of flash attention * [shardformer] add type hint to 'self' param of forward * [shardformer] merge feature/shardformer-models branch to feature/flash-attention-shardformer branch. (hpcaitech#4290) * Feature/vit support (hpcaitech#4182) * [shardformer] added tests * [shardformer] vit test finish and support * fix attention dropout * [shardformer] support SAM (hpcaitech#4231) * 1.support sam 2.add fused qkv for nn.Linear * update utils support set element in list * overtwrite SamVisionAttention foward to use DropoutForParallelInput * remove unused code * [shardformer] support whisper (hpcaitech#4212) * support whisper * fix bug in vocabembedding * support downstream model of whisper * update readme * Feature/chatglm (hpcaitech#4240) * [shardformer] added tests * [shardformer] vit test finish and support * [shardformer] chatglm ready * import chatglm * [shardformer] add test kit in model zoo for chatglm * [sharformer] add first version of policy of chatglm * [shardformer] polish chatglm code * [shardformer] polish code * [shardformer] support chatglm without layernorm * [shardformer] chatglm shard without mlp sharding * [shardformer] delete some file * [shardformer] ChatGLM support layernorm sharding * [shardformer] register without auto policy * [shardformer] pre-commit check files * [shardformer] fix chatglm 
configuration with pre-commit --------- Co-authored-by: Kun Lin <[email protected]> Co-authored-by: FoolPlayer <[email protected]> * [shardformer] whisper support flash attention (hpcaitech#4301) * Feature/vit support (hpcaitech#4182) * [shardformer] added tests * [shardformer] vit test finish and support * fix attention dropout * [shardformer] support SAM (hpcaitech#4231) * 1.support sam 2.add fused qkv for nn.Linear * update utils support set element in list * overtwrite SamVisionAttention foward to use DropoutForParallelInput * remove unused code * [shardformer] support whisper (hpcaitech#4212) * support whisper * fix bug in vocabembedding * support downstream model of whisper * update readme * Feature/chatglm (hpcaitech#4240) * [shardformer] added tests * [shardformer] vit test finish and support * [shardformer] chatglm ready * import chatglm * [shardformer] add test kit in model zoo for chatglm * [sharformer] add first version of policy of chatglm * [shardformer] polish chatglm code * [shardformer] polish code * [shardformer] support chatglm without layernorm * [shardformer] chatglm shard without mlp sharding * [shardformer] delete some file * [shardformer] ChatGLM support layernorm sharding * [shardformer] register without auto policy * [shardformer] pre-commit check files * [shardformer] fix chatglm configuration with pre-commit * [shardformer] whisper support flash attention * [shardformer] whisper support flash attention * [shardformer]whisper support jit operator --------- Co-authored-by: Kun Lin <[email protected]> Co-authored-by: FoolPlayer <[email protected]> * [shardformer] sam support flash attention (hpcaitech#4316) * Feature/vit support (hpcaitech#4182) * [shardformer] added tests * [shardformer] vit test finish and support * fix attention dropout * [shardformer] support SAM (hpcaitech#4231) * 1.support sam 2.add fused qkv for nn.Linear * update utils support set element in list * overtwrite SamVisionAttention foward to use DropoutForParallelInput 
* remove unused code * [shardformer] support whisper (hpcaitech#4212) * support whisper * fix bug in vocabembedding * support downstream model of whisper * update readme * Feature/chatglm (hpcaitech#4240) * [shardformer] added tests * [shardformer] vit test finish and support * [shardformer] chatglm ready * import chatglm * [shardformer] add test kit in model zoo for chatglm * [sharformer] add first version of policy of chatglm * [shardformer] polish chatglm code * [shardformer] polish code * [shardformer] support chatglm without layernorm * [shardformer] chatglm shard without mlp sharding * [shardformer] delete some file * [shardformer] ChatGLM support layernorm sharding * [shardformer] register without auto policy * [shardformer] pre-commit check files * [shardformer] fix chatglm configuration with pre-commit * [shardformer] sam support flash attention --------- Co-authored-by: Kun Lin <[email protected]> Co-authored-by: FoolPlayer <[email protected]> * [shardformer] merge blip2/chatglm (hpcaitech#4321) * Feature/vit support (hpcaitech#4182) * [shardformer] added tests * [shardformer] vit test finish and support * fix attention dropout * [shardformer] support SAM (hpcaitech#4231) * 1.support sam 2.add fused qkv for nn.Linear * update utils support set element in list * overtwrite SamVisionAttention foward to use DropoutForParallelInput * remove unused code * [shardformer] support whisper (hpcaitech#4212) * support whisper * fix bug in vocabembedding * support downstream model of whisper * update readme * Feature/chatglm (hpcaitech#4240) * [shardformer] added tests * [shardformer] vit test finish and support * [shardformer] chatglm ready * import chatglm * [shardformer] add test kit in model zoo for chatglm * [sharformer] add first version of policy of chatglm * [shardformer] polish chatglm code * [shardformer] polish code * [shardformer] support chatglm without layernorm * [shardformer] chatglm shard without mlp sharding * [shardformer] delete some file * 
[shardformer] ChatGLM support layernorm sharding * [shardformer] register without auto policy * [shardformer] pre-commit check files * [shardformer] fix chatglm configuration with pre-commit * [shardformer] added tests * [shardformer] vit test finish and support * import chatglm * [shardformer] add test kit in model zoo for chatglm * [sharformer] add first version of policy of chatglm * [shardformer] polish chatglm code * [shardformer] polish code * [shardformer] support chatglm without layernorm * [shardformer] delete some file * [shardformer] ChatGLM support layernorm sharding * [shardformer] register without auto policy * [shardformer] pre-commit check files * [shardformer] support ChatGLMForConditionalGeneration & add fusedlayernorm for vit * [shardformer] support Blip2 (hpcaitech#4243) * support base blip2 * add support for downstream blip2 model * update readme * add forward injection * skip not compatible models test * fix test for gemini and low_level_zero_pugin --------- Co-authored-by: Kun Lin <[email protected]> Co-authored-by: FoolPlayer <[email protected]> Co-authored-by: klhhhhh <[email protected]> * [shardformer] blip2 support flash attention and jit operator (hpcaitech#4325) * Feature/vit support (hpcaitech#4182) * [shardformer] added tests * [shardformer] vit test finish and support * fix attention dropout * [shardformer] support SAM (hpcaitech#4231) * 1.support sam 2.add fused qkv for nn.Linear * update utils support set element in list * overtwrite SamVisionAttention foward to use DropoutForParallelInput * remove unused code * [shardformer] support whisper (hpcaitech#4212) * support whisper * fix bug in vocabembedding * support downstream model of whisper * update readme * Feature/chatglm (hpcaitech#4240) * [shardformer] added tests * [shardformer] vit test finish and support * [shardformer] chatglm ready * import chatglm * [shardformer] add test kit in model zoo for chatglm * [sharformer] add first version of policy of chatglm * [shardformer] 
polish chatglm code * [shardformer] polish code * [shardformer] support chatglm without layernorm * [shardformer] chatglm shard without mlp sharding * [shardformer] delete some file * [shardformer] ChatGLM support layernorm sharding * [shardformer] register without auto policy * [shardformer] pre-commit check files * [shardformer] fix chatglm configuration with pre-commit * [shardformer] added tests * [shardformer] vit test finish and support * import chatglm * [shardformer] add test kit in model zoo for chatglm * [sharformer] add first version of policy of chatglm * [shardformer] polish chatglm code * [shardformer] polish code * [shardformer] support chatglm without layernorm * [shardformer] delete some file * [shardformer] ChatGLM support layernorm sharding * [shardformer] register without auto policy * [shardformer] pre-commit check files * [shardformer] support ChatGLMForConditionalGeneration & add fusedlayernorm for vit * [shardformer] support Blip2 (hpcaitech#4243) * support base blip2 * add support for downstream blip2 model * update readme * add forward injection * skip not compatible models test * fix test for gemini and low_level_zero_pugin * [shardformer] blip2 support flash attention and jit operator * [shardformer] blip2 support flash attention and jit operator * [shardformer] blip2 support flash attention and jit operator --------- Co-authored-by: Kun Lin <[email protected]> Co-authored-by: FoolPlayer <[email protected]> Co-authored-by: klhhhhh <[email protected]> * [shardformer] chatglm support flash attention and jit operator (hpcaitech#4330) * Feature/vit support (hpcaitech#4182) * [shardformer] added tests * [shardformer] vit test finish and support * fix attention dropout * [shardformer] support SAM (hpcaitech#4231) * 1.support sam 2.add fused qkv for nn.Linear * update utils support set element in list * overtwrite SamVisionAttention foward to use DropoutForParallelInput * remove unused code * [shardformer] support whisper (hpcaitech#4212) * 
support whisper * fix bug in vocabembedding * support downstream model of whisper * update readme * Feature/chatglm (hpcaitech#4240) * [shardformer] added tests * [shardformer] vit test finish and support * [shardformer] chatglm ready * import chatglm * [shardformer] add test kit in model zoo for chatglm * [sharformer] add first version of policy of chatglm * [shardformer] polish chatglm code * [shardformer] polish code * [shardformer] support chatglm without layernorm * [shardformer] chatglm shard without mlp sharding * [shardformer] delete some file * [shardformer] ChatGLM support layernorm sharding * [shardformer] register without auto policy * [shardformer] pre-commit check files * [shardformer] fix chatglm configuration with pre-commit * [shardformer] added tests * [shardformer] vit test finish and support * import chatglm * [shardformer] add test kit in model zoo for chatglm * [sharformer] add first version of policy of chatglm * [shardformer] polish chatglm code * [shardformer] polish code * [shardformer] support chatglm without layernorm * [shardformer] delete some file * [shardformer] ChatGLM support layernorm sharding * [shardformer] register without auto policy * [shardformer] pre-commit check files * [shardformer] support ChatGLMForConditionalGeneration & add fusedlayernorm for vit * [shardformer] support Blip2 (hpcaitech#4243) * support base blip2 * add support for downstream blip2 model * update readme * add forward injection * skip not compatible models test * fix test for gemini and low_level_zero_pugin * [shardformer] chatglm support flash attention and jit operator * [shardformer] chatglm support flash attention and jit operator * [shardformer] chatglm support flash attention and jit operator * [shardformer] chatglm support flash attention and jit operator --------- Co-authored-by: Kun Lin <[email protected]> Co-authored-by: FoolPlayer <[email protected]> Co-authored-by: klhhhhh <[email protected]> * [shardformer] vit support flash attention and 
jit operator (hpcaitech#4334) * Feature/vit support (hpcaitech#4182) * [shardformer] added tests * [shardformer] vit test finish and support * fix attention dropout * [shardformer] support SAM (hpcaitech#4231) * 1.support sam 2.add fused qkv for nn.Linear * update utils support set element in list * overtwrite SamVisionAttention foward to use DropoutForParallelInput * remove unused code * [shardformer] support whisper (hpcaitech#4212) * support whisper * fix bug in vocabembedding * support downstream model of whisper * update readme * Feature/chatglm (hpcaitech#4240) * [shardformer] added tests * [shardformer] vit test finish and support * [shardformer] chatglm ready * import chatglm * [shardformer] add test kit in model zoo for chatglm * [sharformer] add first version of policy of chatglm * [shardformer] polish chatglm code * [shardformer] polish code * [shardformer] support chatglm without layernorm * [shardformer] chatglm shard without mlp sharding * [shardformer] delete some file * [shardformer] ChatGLM support layernorm sharding * [shardformer] register without auto policy * [shardformer] pre-commit check files * [shardformer] fix chatglm configuration with pre-commit * [shardformer] added tests * [shardformer] vit test finish and support * import chatglm * [shardformer] add test kit in model zoo for chatglm * [sharformer] add first version of policy of chatglm * [shardformer] polish chatglm code * [shardformer] polish code * [shardformer] support chatglm without layernorm * [shardformer] delete some file * [shardformer] ChatGLM support layernorm sharding * [shardformer] register without auto policy * [shardformer] pre-commit check files * [shardformer] support ChatGLMForConditionalGeneration & add fusedlayernorm for vit * [shardformer] support Blip2 (hpcaitech#4243) * support base blip2 * add support for downstream blip2 model * update readme * add forward injection * skip not compatible models test * fix test for gemini and low_level_zero_pugin * 
[shardformer] vit support flash attention and jit operator * [shardformer] vit support flash attention and jit operator --------- Co-authored-by: Kun Lin <[email protected]> Co-authored-by: FoolPlayer <[email protected]> Co-authored-by: klhhhhh <[email protected]> * [pipeline] merge flash attention branch * [pipeline] merge flash attention branch * [pipeline] merge flash attention branch * [pipeline] fix conflict * [pipeline] fix conflict * Merge branch 'feature/pipeline' into feature/pipeline * Merge branch 'feature/pipeline' into feature/pipeline * Merge branch 'feature/pipeline' into feature/pipeline * activate checks * activate checks * activate checks * activate checks * activate checks * activate checks * activate checks * activate checks * fix flash attention tests * gemini ignore whisper * fix vit * fix xformers import handle --------- Co-authored-by: Frank Lee <[email protected]> Co-authored-by: Kun Lin <[email protected]> Co-authored-by: FoolPlayer <[email protected]> Co-authored-by: klhhhhh <[email protected]>

…peline (hpcaitech#4388) * fix remaining t5 bugs/rewrite t5 tests * fix multi-tensor communication in pipeline * rearrange test_config * fix keyerror in sync_shared_params * fix get_held_layers & Randomnizer, complete t5 tests * erase printing * fix get_held_layers through modifying _release_unheld_layers * fix _get_recursive_held_layers bug

…4392) * cherry-pick flash attention 2 cherry-pick flash attention 2 * [shardformer] update shardformer to use flash attention 2 [shardformer] update shardformer to use flash attention 2, fix [shardformer] update shardformer to use flash attention 2, fix [shardformer] update shardformer to use flash attention 2, fix

* add pipeline policy and bert forward to be done * add bertmodel pipeline forward and make tests * add Bert_Policy and test for policy * update formatting * update formatting * update the code * fix bugs * fix name confilt * add bloom model and policy ,revise the base class of policy * revise * revision * add bert_for_pretraining * add bert_for_pretraining forward and policy * fix typos * cancel warning * change the imediate output to default dict * change the default output of get_shared_params * rewrite bert test * rewrite bert test * fix some bugs * del pipeline tests * del pipeline tests * del useless print * del useless print * rewrite data repeats

* [shardformer] gpt2 tests fix [shardformer] test all optimizations (hpcaitech#4399) [shardformer] test all optimizations [shardformer] test all optimizations [shardformer] test all optimizations [shardformer] gpt2 tests fix * [shardformer] gpt2 tests fix

…4407) * [shardformer] gpt2 tests fix [shardformer] test all optimizations (hpcaitech#4399) [shardformer] test all optimizations [shardformer] test all optimizations [shardformer] test all optimizations [shardformer] gpt2 tests fix * [shardformer]update t5 to use all optimizations

[shardformer] update bloom/llama/vit/chatglm tests [shardformer] update opt tests [shardformer] update opt tests [shardformer] update bloom/llama/vit/chatglm tests [shardformer] update bloom/llama/vit/chatglm tests [shardformer] update bloom/llama/vit/chatglm tests

FrankLeeeee and others added 30 commits July 4, 2023 18:11

[workflow] show test duration (hpcaitech#4159)

cc3cbe9

[dtensor] fixed readme file name and removed deprecated file (hpcaitech#4162)

190a6ea

[docker] added ssh and rdma support for docker (hpcaitech#4192)

fee32a3

Next commit [checkpointio] Unsharded Optimizer Checkpoint for Gemini Plugin (hpcaitech#4141)

5891344

* [checkpointio] unsharded optimizer checkpoint for Gemini plugin * [checkpointio] unsharded optimizer checkpoint for Gemini using all_gather

[docker] fixed ninja build command (hpcaitech#4203)

c1cf752

* [docker] fixed ninja build command * polish code

Automated submodule synchronization (hpcaitech#4217)

4e9b09c

Co-authored-by: github-actions <[email protected]>

revise shardformer readme (hpcaitech#4246)

9a4842c

[example] add llama pretraining (hpcaitech#4257)

7ff11b5

[lazy] support init on cuda (hpcaitech#4269)

fc5cef2

* [lazy] support init on cuda * [test] update lazy init test * [test] fix transformer version

[checkpointio] Sharded Optimizer Checkpoint for Gemini Plugin (hpcaitech#4302)

c6f6005

* sharded optimizer checkpoint for gemini plugin * modify test to reduce testing time * update doc * fix bug when keep_gatherd is true under GeminiPlugin

[ci] support testmon core pkg change detection (hpcaitech#4305)

02192a6

[NFC] Fix format for mixed precision (hpcaitech#4253)

b366f1d

* [NFC] polish colossalai/booster/mixed_precision/mixed_precision_base.py code style

[NFC] polish applications/Chat/inference/requirements.txt code style (hpcaitech#4265)

915ed8b

[NFC] polish applications/Chat/coati/models/base/actor.py code style (hpcaitech#4248)

77c469e

[NFC] policy applications/Chat/examples/ray/mmmt_prompt.py code style (hpcaitech#4250)

dee1c96

[NFC] polish colossalai/cli/benchmark/utils.py code style (hpcaitech#4254)

85774f0

[NFC] polish colossalai/auto_parallel/offload/amp_optimizer.py code style (hpcaitech#4255)

c614a99

[NFC] polish colossalai/booster/plugin/low_level_zero_plugin.py code style (hpcaitech#4256)

abe4f97

Co-authored-by: supercooledith <[email protected]>

[NFC] polish applications/Chat/coati/dataset/sft_dataset.py code style (hpcaitech#4259)

b2debdc

[NFC] polish applications/Chat/coati/trainer/base.py code style (hpcaitech#4260)

798cb72

[NFC] polish unary_elementwise_generator.py code style (hpcaitech#4267)

3883db4

Co-authored-by: aye42 <[email protected]>

[NFC] polish runtime_preparation_pass style (hpcaitech#4266)

fee5532

[NFC] fix: format (hpcaitech#4270)

a50d39a

* [NFC] polish colossalai/fx/profiler/experimental/profiler_module/embedding.py code style * [NFC] polish colossalai/communication/utils.py code style --------- Co-authored-by: Minghao Huang <[email protected]>

[NFC] polish applications/Chat/examples/train_reward_model.py code style (hpcaitech#4271)

1ce997d

[NFC] fix format of application/Chat/coati/trainer/utils.py (hpcaitech#4273)

caa4433

[NFC] polish applications/Chat/inference/server.py code style (hpcaitech#4274)

dc1b612

Co-authored-by: Yuanchen Xu <[email protected]>

[NFC] polish applications/Chat/coati/models/generation.py code style (hpcaitech#4275)

709e121

applications/Chat/.gitignore (hpcaitech#4279)

c972d65

Co-authored-by: henryqin1997 <[email protected]>

ver217 and others added 25 commits August 1, 2023 15:01

[release] update version (hpcaitech#4332)

8064771

* [release] update version * [devops] hotfix cuda extension building * [devops] pytest ignore useless folders

[hotfix] update gradio 3.11 to 3.34.0 (hpcaitech#4329)

16c0acc

[test] remove useless tests (hpcaitech#4359)

16bf4c0

* [test] remove legacy zero test * [test] remove lazy distribute test * [test] remove outdated checkpoint io

[fix] coloattention support flash attention 2 (hpcaitech#4347)

25c57b9

Improved ColoAttention interface to support flash attention 2. Solved hpcaitech#4322

[coloattention] fix import error (hpcaitech#4380)

38b792a

fixed an import error

[doc] Fix gradient accumulation doc. (hpcaitech#4349)

f40b718

* [doc] fix gradient accumulation doc * [doc] fix gradient accumulation doc

[doc] add Series A Funding and NeurIPS news (hpcaitech#4377)

089c365

* [doc] add Series A Funding and NeurIPS news * [kernel] fix mha kernel * [CI] skip moe * [CI] fix requirements

[kernel] updated unittests for coloattention (hpcaitech#4389)

458ae33

Updated coloattention tests of checking outputs and gradients

[shardformer] test all optimizations (hpcaitech#4399)

ed2c229

[shardformer] test all optimizations [shardformer] test all optimizations [shardformer] test all optimizations

[gemini] fix tensor storage cleaning in state dict collection (hpcaitech#4396)

6ccecc0

[hotfix] fix unsafe async comm in zero (hpcaitech#4404)

d86ddd9

* improve stability of zero * fix wrong index * add record stream

[shardformer] rewrite tests for opt/bloom/llama/vit/chatglm (hpcaitech#4395)

1e518ae

* rewrite opt tests * rewrite llama tests * rewrite bloom & vit tests * rewrite chatglm tests * fix LinearCol for classifiers * add judge for other tp layers, fix lazy init in util

[shardformer] update tests for all optimization (hpcaitech#4413)

d4a3a10

[shardformer] update tests for all optimization

Merge branch 'main' into feature/pipeline

6990477

Merge pull request hpcaitech#4424 from ver217/sync/pipeline

60db2cc

[sync] update pipeline branch with main

[misc] resolve code factor issues (hpcaitech#4433)

9d1a6d2

[sync] update tests modification to sequence parallel branch

2dd1b39

FoolPlayer approved these changes Aug 16, 2023

flybird11111 closed this Aug 16, 2023

flybird11111 deleted the feature/seq-parallel branch April 11, 2024 03:09

flybird11111 restored the feature/seq-parallel branch April 11, 2024 03:09

flybird11111 deleted the feature/seq-parallel branch April 11, 2024 03:09
