I'm trying to optimize how efficiently BED files are processed through our tokenizers and models in real production environments (such as bedbase).

One bottleneck I've run into is creating tensors from lists of integers. I explain this in more detail in a PR over in geniml, but, briefly, the current tokenizers can only return lists of integers for tokenized BED files. It could be more efficient to emit a `Tensor` directly. I think this is possible using some combination of the following Rust crates:

- `tch`
- `tch-ext`
- `pyo3-tch`
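To make the bottleneck concrete, here is a minimal Python sketch, assuming NumPy is available. The helper names are hypothetical, and `array.array` is only a stand-in for a Rust-owned integer buffer exposed through the Python buffer protocol; none of this is geniml code. The point is that a list of Python ints forces an element-by-element copy, while a contiguous buffer can be wrapped as-is.

```python
import array

import numpy as np


def ids_as_list(n: int) -> list[int]:
    # Roughly what the tokenizers return today: one Python int object per token.
    return list(range(n))


def ids_as_buffer(n: int) -> array.array:
    # Hypothetical stand-in for a Rust-side Vec<i64> handed to Python as a
    # single contiguous buffer ("q" = signed 64-bit integer).
    return array.array("q", range(n))


def list_to_ndarray(ids: list[int]) -> np.ndarray:
    # Copies: NumPy has to inspect and unbox every Python int in the list.
    return np.array(ids, dtype=np.int64)


def buffer_to_ndarray(buf: array.array) -> np.ndarray:
    # No copy: NumPy wraps the existing memory via the buffer protocol.
    return np.frombuffer(buf, dtype=np.int64)
```

From there, `torch.from_numpy(arr)` also shares memory with the array instead of copying it, so in principle the only full pass over the token IDs would happen on the Rust side.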
With this, the tokenizers could hand users a `torch.Tensor` object directly, with no need to convert between types, potentially saving time. Additionally, we could offer options for returning `np.array` objects with `rust-numpy`.
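The return-type option could be exposed as a single switch on the Python side. Below is a minimal sketch. `format_ids`, `return_type`, and the `"list"`/`"np"`/`"pt"` values are hypothetical names, not an existing geniml or tokenizer API. `ids` is assumed to already be a contiguous int64 NumPy array handed back by the Rust extension (for example via `rust-numpy`).

```python
from typing import Literal

import numpy as np

# Hypothetical set of output formats a tokenizer binding might accept.
ReturnType = Literal["list", "np", "pt"]


def format_ids(ids: np.ndarray, return_type: ReturnType = "list"):
    """Convert token IDs to the requested output type (hypothetical API)."""
    if return_type == "list":
        # Current behaviour: a list of Python ints (one full copy).
        return ids.tolist()
    if return_type == "np":
        # Hand the array back untouched; no conversion at all.
        return ids
    if return_type == "pt":
        # Imported lazily so NumPy-only users don't need torch installed.
        import torch

        # torch.from_numpy shares memory with `ids` rather than copying it.
        return torch.from_numpy(ids)
    raise ValueError(f"unknown return_type: {return_type!r}")
```

Keeping `"list"` as the default would preserve today's behaviour for existing callers, while `"np"` and `"pt"` skip the per-element conversion entirely.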