{"payload":{"feedbackUrl":"https://github.com/orgs/community/discussions/53140","repo":{"id":756234502,"defaultBranch":"main","name":"optimum-tpu","ownerLogin":"huggingface","currentUserCanPush":false,"isFork":false,"isEmpty":false,"createdAt":"2024-02-12T08:54:38.000Z","ownerAvatar":"https://avatars.githubusercontent.com/u/25720743?v=4","public":true,"private":false,"isOrgOwned":true},"refInfo":{"name":"","listCacheKey":"v0:1720785872.0","currentOid":""},"activityList":{"items":[{"before":"7fc8110c027df3a6fb42c7df8b64f67f343a662e","after":null,"ref":"refs/heads/update-tgi-version","pushedAt":"2024-07-12T12:04:32.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"tengomucho","name":"Alvaro Moran","path":"/tengomucho","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6949769?s=80&v=4"}},{"before":"50ed7bd08ccd29e20cc3889ac89c3d474efe8ceb","after":"da2d1ad89d3d8a0ffb85eb5d6d6b9919e646e741","ref":"refs/heads/main","pushedAt":"2024-07-12T12:04:27.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"tengomucho","name":"Alvaro Moran","path":"/tengomucho","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6949769?s=80&v=4"},"commit":{"message":"chore(tgi): update TGI base image (#75)\n\nUse a proper tag instead of a SHA1","shortMessageHtmlLink":"chore(tgi): update TGI base image (#75)"}},{"before":null,"after":"7fc8110c027df3a6fb42c7df8b64f67f343a662e","ref":"refs/heads/update-tgi-version","pushedAt":"2024-07-12T11:46:16.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"tengomucho","name":"Alvaro Moran","path":"/tengomucho","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6949769?s=80&v=4"},"commit":{"message":"chore(tgi): update TGI base image\n\nUse a proper tag instead of a SHA1","shortMessageHtmlLink":"chore(tgi): update TGI base image"}},{"before":"1c4efa83c25e34b86c47f0dee299030b9e53cc4d","after":null,"ref":"refs/heads/handle-selector-exception","pushedAt":"2024-07-12T07:27:17.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"tengomucho","name":"Alvaro Moran","path":"/tengomucho","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6949769?s=80&v=4"}},{"before":"eb1d7c9c91f8dd8d360ca647285fb0c58674dc3a","after":"50ed7bd08ccd29e20cc3889ac89c3d474efe8ceb","ref":"refs/heads/main","pushedAt":"2024-07-12T07:27:12.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"tengomucho","name":"Alvaro Moran","path":"/tengomucho","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6949769?s=80&v=4"},"commit":{"message":"Handle selector exception (#73)\n\n* fix(tgi): handle invalid generation config error and return to server\r\n\r\nIf there is an invalid generation config, the selector raises an error.\r\nThis is caught by the prefill method, that skips the slot generation,\r\nso the error is handled by the router.\r\nI had not been able to reproduce the problem with a simple HTTP request\r\nto TGI, but it seems it's possible to do it with the HTML form\r\ninterface, so it's better to handle this, even if it's unlikely to\r\nhappen.\r\n\r\n* fix(tgi): handle another exception in prefill\r\n\r\nReturning an empty batch is better than crashing.","shortMessageHtmlLink":"Handle selector exception (#73)"}},{"before":null,"after":"1c4efa83c25e34b86c47f0dee299030b9e53cc4d","ref":"refs/heads/handle-selector-exception","pushedAt":"2024-07-11T15:35:02.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"tengomucho","name":"Alvaro Moran","path":"/tengomucho","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6949769?s=80&v=4"},"commit":{"message":"fix(tgi): handle another exception in prefill\n\nReturning an empty batch is better than crashing.","shortMessageHtmlLink":"fix(tgi): handle another exception in prefill"}},{"before":"0da9cb95932c263624aaac1eb1d0ae91cb67d7ed","after":null,"ref":"refs/heads/fix-secret-leak-workflow","pushedAt":"2024-07-09T13:18:05.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"tengomucho","name":"Alvaro Moran","path":"/tengomucho","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6949769?s=80&v=4"}},{"before":"e09a66b6b54f1eafba0ba478b00cea37619b1e24","after":"eb1d7c9c91f8dd8d360ca647285fb0c58674dc3a","ref":"refs/heads/main","pushedAt":"2024-07-09T13:18:01.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"tengomucho","name":"Alvaro Moran","path":"/tengomucho","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6949769?s=80&v=4"},"commit":{"message":"chore(ci): try to fix secret leak workflow (#72)\n\nMaybe there was an indentation problem 🤷","shortMessageHtmlLink":"chore(ci): try to fix secret leak workflow (#72)"}},{"before":"f8c96ed8a0c43d4da8fa3735f38c89bd368bf431","after":"0da9cb95932c263624aaac1eb1d0ae91cb67d7ed","ref":"refs/heads/fix-secret-leak-workflow","pushedAt":"2024-07-09T13:14:42.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"tengomucho","name":"Alvaro Moran","path":"/tengomucho","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6949769?s=80&v=4"},"commit":{"message":"chore(ci): try to fix secret leak workflow\n\nMaybe there was an indentation problem 🤷","shortMessageHtmlLink":"chore(ci): try to fix secret leak workflow"}},{"before":null,"after":"f8c96ed8a0c43d4da8fa3735f38c89bd368bf431","ref":"refs/heads/fix-secret-leak-workflow","pushedAt":"2024-07-09T13:13:06.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"tengomucho","name":"Alvaro Moran","path":"/tengomucho","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6949769?s=80&v=4"},"commit":{"message":"chore(ci): try to fix secret leak workflow","shortMessageHtmlLink":"chore(ci): try to fix secret leak workflow"}},{"before":"a489076acc0aef746e9d0846a9f1e55c0e0b22ea","after":null,"ref":"refs/heads/debug-tgi-ie-pt4","pushedAt":"2024-07-09T10:29:13.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"tengomucho","name":"Alvaro Moran","path":"/tengomucho","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6949769?s=80&v=4"}},{"before":"77bebf814feb3ec6c3e48e1e1a68f65b88222d9f","after":"e09a66b6b54f1eafba0ba478b00cea37619b1e24","ref":"refs/heads/main","pushedAt":"2024-07-09T10:29:08.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"tengomucho","name":"Alvaro Moran","path":"/tengomucho","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6949769?s=80&v=4"},"commit":{"message":"Lower TGI IE batch size (#71)\n\n* feat(TGI): entrypoint's default batch size set to 2\r\n\r\nTo avoid memory issues, default batch size is now lowered.\r\n\r\n* chore: version bumped to 0.1.3","shortMessageHtmlLink":"Lower TGI IE batch size (#71)"}},{"before":null,"after":"a489076acc0aef746e9d0846a9f1e55c0e0b22ea","ref":"refs/heads/debug-tgi-ie-pt4","pushedAt":"2024-07-09T10:06:05.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"tengomucho","name":"Alvaro Moran","path":"/tengomucho","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6949769?s=80&v=4"},"commit":{"message":"chore: version bumped to 0.1.3","shortMessageHtmlLink":"chore: version bumped to 0.1.3"}},{"before":"4ffdb173c1d08d320daff80f91321e3f4642f422","after":null,"ref":"refs/heads/lower-memory-static-cache","pushedAt":"2024-07-09T09:32:10.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"tengomucho","name":"Alvaro Moran","path":"/tengomucho","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6949769?s=80&v=4"}},{"before":"7cce24ce3059e74e449c965797ee62e2a5436922","after":"77bebf814feb3ec6c3e48e1e1a68f65b88222d9f","ref":"refs/heads/main","pushedAt":"2024-07-09T09:32:06.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"tengomucho","name":"Alvaro Moran","path":"/tengomucho","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6949769?s=80&v=4"},"commit":{"message":"feat(cache): use optimized StaticCache class for XLA (#70)\n\nThis is actually a ripoff of the work originally done as a contribution\r\nto transformers:\r\n\r\nhttps://github.com/huggingface/transformers/pull/31129/\r\n\r\nThe original contribution has not been merged yet, but it shows lower\r\nmemory usage and better performance on XLA. So I think it's worth adding\r\nit here, to be integrated on optimum-tpu.","shortMessageHtmlLink":"feat(cache): use optimized StaticCache class for XLA (#70)"}},{"before":"9215b0d0549e14dd244af8efd62517477fab3cf2","after":"4ffdb173c1d08d320daff80f91321e3f4642f422","ref":"refs/heads/lower-memory-static-cache","pushedAt":"2024-07-08T12:57:25.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"tengomucho","name":"Alvaro Moran","path":"/tengomucho","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6949769?s=80&v=4"},"commit":{"message":"feat(cache): use optimized StaticCache class for XLA\n\nThis is actually a ripoff of the work originally done as a contribution\nto transformers:\n\nhttps://github.com/huggingface/transformers/pull/31129/\n\nThe original contribution has not been merged yet, but it shows lower\nmemory usage and better performance on XLA. So I think it's worth adding\nit here, to be integrated on optimum-tpu.","shortMessageHtmlLink":"feat(cache): use optimized StaticCache class for XLA"}},{"before":"02a855600ef8bdbf5785c4548568c7bb594900b8","after":"9215b0d0549e14dd244af8efd62517477fab3cf2","ref":"refs/heads/lower-memory-static-cache","pushedAt":"2024-07-08T12:46:58.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"tengomucho","name":"Alvaro Moran","path":"/tengomucho","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6949769?s=80&v=4"},"commit":{"message":"feat(cache): use optimized StaticCache class for XLA\n\nThis is actually a ripoff of the work originally done as a contribution\nto transformers:\n\nhttps://github.com/huggingface/transformers/pull/31129/\n\nThe original contribution has not been merged yet, but it shows lower\nmemory usage and better performance on XLA. So I think it's worth adding\nit here, to be integrated on optimum-tpu.","shortMessageHtmlLink":"feat(cache): use optimized StaticCache class for XLA"}},{"before":null,"after":"02a855600ef8bdbf5785c4548568c7bb594900b8","ref":"refs/heads/lower-memory-static-cache","pushedAt":"2024-07-08T12:44:34.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"tengomucho","name":"Alvaro Moran","path":"/tengomucho","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6949769?s=80&v=4"},"commit":{"message":"feat(cache): use optimized StaticCache class for XLA\n\nThis is actually a ripoff of the work originally done as a contribution\nto transformers:\n\nhttps://github.com/huggingface/transformers/pull/31129/\n\nThe original contribution has not been merged yet, but it shows lower\nmemory usage and better performance on XLA. So I think it's worth adding\nit here.","shortMessageHtmlLink":"feat(cache): use optimized StaticCache class for XLA"}},{"before":"3b073027a3480e0e0c86673cf261b43fec2ca204","after":null,"ref":"refs/heads/debug-tgi-ie-pt3","pushedAt":"2024-07-08T08:30:06.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"tengomucho","name":"Alvaro Moran","path":"/tengomucho","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6949769?s=80&v=4"}},{"before":"fd295912922515f3110fffda3cdb9c5ad600a905","after":"7cce24ce3059e74e449c965797ee62e2a5436922","ref":"refs/heads/main","pushedAt":"2024-07-08T08:30:02.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"tengomucho","name":"Alvaro Moran","path":"/tengomucho","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6949769?s=80&v=4"},"commit":{"message":"Few more Inference Endpoints fixes (#69)\n\n* fix(TGI): correct clear request with a give batch id\r\n\r\n* ci(tgi): create images when pushing on current branch\r\n\r\n* fix(generator): raise error if prefill receives too many requests\r\n\r\n* feat(tgi): add more prefill lenghts\r\n\r\nSince bucketing does not work for now, we add more (small) prefill\r\nlengths. This will increase the warmup time, but it will also allow to\r\nspeed up generation.\r\n\r\n* Revert \"ci(tgi): create images when pushing on current branch\"\r\n\r\nThis reverts commit 26e119330f52e46a0b450c7757c77f73fd34cb9b.\r\n\r\n* fix(test): multiple decode test require max_batch_size to be > 1\r\n\r\n* fix(test): expected result is different when model is compiled\r\n\r\nCompiled model results are not always very good. While this should be\r\nbetter investigated later on, current solution is just to use the\r\nnon-compiled version. This results in some tests generating different\r\nresults, so expectations has been updated accordingly.\r\n\r\n* chore: bump to version v0.1.2","shortMessageHtmlLink":"Few more Inference Endpoints fixes (#69)"}},{"before":"88a1c609dce0d34eea986e31e57ef0124561d06f","after":"3b073027a3480e0e0c86673cf261b43fec2ca204","ref":"refs/heads/debug-tgi-ie-pt3","pushedAt":"2024-07-08T08:17:34.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"tengomucho","name":"Alvaro Moran","path":"/tengomucho","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6949769?s=80&v=4"},"commit":{"message":"chore: bump to version v0.1.2","shortMessageHtmlLink":"chore: bump to version v0.1.2"}},{"before":"4528bcb52be064cf78ed38eacfaa1dd40291e9cc","after":"88a1c609dce0d34eea986e31e57ef0124561d06f","ref":"refs/heads/debug-tgi-ie-pt3","pushedAt":"2024-07-06T19:21:53.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"tengomucho","name":"Alvaro Moran","path":"/tengomucho","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6949769?s=80&v=4"},"commit":{"message":"fix(test): multiple decode test require max_batch_size to be > 1","shortMessageHtmlLink":"fix(test): multiple decode test require max_batch_size to be > 1"}},{"before":"a9ce7fcd108b641d4b2aa2ab973ba14e95dc2617","after":"4528bcb52be064cf78ed38eacfaa1dd40291e9cc","ref":"refs/heads/debug-tgi-ie-pt3","pushedAt":"2024-07-06T11:20:50.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"tengomucho","name":"Alvaro Moran","path":"/tengomucho","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6949769?s=80&v=4"},"commit":{"message":"chore: bump to version v0.1.2","shortMessageHtmlLink":"chore: bump to version v0.1.2"}},{"before":"30184a1f5a3f63c104fa92bb0e50e2f56bb2aa4d","after":"a9ce7fcd108b641d4b2aa2ab973ba14e95dc2617","ref":"refs/heads/debug-tgi-ie-pt3","pushedAt":"2024-07-06T11:17:44.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"tengomucho","name":"Alvaro Moran","path":"/tengomucho","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6949769?s=80&v=4"},"commit":{"message":"Revert \"ci(tgi): create images when pushing on current branch\"\n\nThis reverts commit 26e119330f52e46a0b450c7757c77f73fd34cb9b.","shortMessageHtmlLink":"Revert \"ci(tgi): create images when pushing on current branch\""}},{"before":"cf8ef7c549d0817fffd51a296a1e2e171a0bdcd5","after":"30184a1f5a3f63c104fa92bb0e50e2f56bb2aa4d","ref":"refs/heads/debug-tgi-ie-pt3","pushedAt":"2024-07-06T11:13:55.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"tengomucho","name":"Alvaro Moran","path":"/tengomucho","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6949769?s=80&v=4"},"commit":{"message":"feat(tgi): add more prefill lenghts\n\nSince bucketing does not work for now, we add more (small) prefill\nlengths. This will increase the warmup time, but it will also allow to\nspeed up generation.","shortMessageHtmlLink":"feat(tgi): add more prefill lenghts"}},{"before":"5d852b1dad526847c144a678ec68b15db3c6d7f7","after":null,"ref":"refs/heads/debug-tgi-ie-pt2","pushedAt":"2024-07-06T11:08:29.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"tengomucho","name":"Alvaro Moran","path":"/tengomucho","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6949769?s=80&v=4"}},{"before":"246fb24bf548e9fea0c8fbed07dd0ed7420d6fa0","after":"fd295912922515f3110fffda3cdb9c5ad600a905","ref":"refs/heads/main","pushedAt":"2024-07-06T11:08:25.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"tengomucho","name":"Alvaro Moran","path":"/tengomucho","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6949769?s=80&v=4"},"commit":{"message":"More Inference Endpoints features and fixes (#68)\n\n* feat(generator): better handle exceptions on multiprocessing\r\n\r\nThis will raise an error, signaling there was a problem. Before the\r\nroot thread was getting stuck waiting for the agent that was dead. This\r\nway it should exit.\r\n\r\n* feat(tgi): add more debug on server\r\n\r\n* chore(docker): entrypoint json output is set by default\r\n\r\nIt is possible to disable it by setting JSON_OUTPUT_DISABLE.\r\nIt is now possible also to play with more batch sizes.\r\n\r\n* feat(generator): add bucketing functions to use in prefill\r\n\r\n* feat(generator): store position_id in current slot\r\n\r\nThis will further simplify the implementation of prefill bucketing.\r\n\r\n* fix(generator): correct input_ids and attention_mask padding\r\n\r\n* fix(TGI): fix input truncation\r\n\r\nTruncation was sub-optimal, and it was done on the wrong side.\r\n\r\n* feat(generator): enable logs on children processes\r\n\r\n* feat(tgi): warmup runs prefill/decode on all supported combinations\r\n\r\nThis will prevent XLA compilation at inference time. Note that I had to\r\ndisable dynamo compilation though, otherwise the model was not\r\ngenerating correct results. This leads to slower generation, but at\r\nleast generation seems stable now.\r\n\r\n* ci(tgi): create images when pushing on current branch\r\n\r\nThis will allow for testing IE before release.\r\n\r\n* feat(tgi): reversed loop order in warmup to test memory limits earlier\r\n\r\n* chore(ci): remove image generation for this branch","shortMessageHtmlLink":"More Inference Endpoints features and fixes (#68)"}},{"before":"020458689c33f01278dca91c0ff528d4c72d1e73","after":"5d852b1dad526847c144a678ec68b15db3c6d7f7","ref":"refs/heads/debug-tgi-ie-pt2","pushedAt":"2024-07-05T15:48:43.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"tengomucho","name":"Alvaro Moran","path":"/tengomucho","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6949769?s=80&v=4"},"commit":{"message":"chore(ci): remove image generation for this branch","shortMessageHtmlLink":"chore(ci): remove image generation for this branch"}},{"before":"3816655df02220eaf43dc4bf34b18afbb719a6fc","after":"cf8ef7c549d0817fffd51a296a1e2e171a0bdcd5","ref":"refs/heads/debug-tgi-ie-pt3","pushedAt":"2024-07-05T13:43:55.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"tengomucho","name":"Alvaro Moran","path":"/tengomucho","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6949769?s=80&v=4"},"commit":{"message":"feat(tgi): add more prefill lenghts\n\nSince bucketing does not work for now, we add more (small) prefill\nlengths. This will increase the warmup time, but it will also allow to\nspeed up generation.","shortMessageHtmlLink":"feat(tgi): add more prefill lenghts"}},{"before":null,"after":"3816655df02220eaf43dc4bf34b18afbb719a6fc","ref":"refs/heads/debug-tgi-ie-pt3","pushedAt":"2024-07-05T13:42:54.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"tengomucho","name":"Alvaro Moran","path":"/tengomucho","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6949769?s=80&v=4"},"commit":{"message":"feat(tgi): enable xla cache if possible","shortMessageHtmlLink":"feat(tgi): enable xla cache if possible"}}],"hasNextPage":true,"hasPreviousPage":false,"activityType":"all","actor":null,"timePeriod":"all","sort":"DESC","perPage":30,"cursor":"djE6ks8AAAAEfc-vigA","startCursor":null,"endCursor":null}},"title":"Activity · huggingface/optimum-tpu"}