KBTs: option for a book-based filter for the KBTs added to the training data #442
Labels
enhancement: New feature or request
pipeline 3: preprocess: Issue related to preprocessing
research: Research topics
*** This enhancement request is for research purposes. ***
When KBTs are added to the training data during preprocessing, all populated KBTs are included. KBTs from completed books may add little, since the completed verse text is already available and this verse-level training data is likely more suitable for model fine-tuning. Also, for projects with extensively populated KBTs, including all of them in the training data may swamp the verse-level training data and skew the model results.
The primary intended benefit of including KBTs in the training data is better translation of proper names in new books, so better new-book drafts may be possible if only KBTs from new books are added to the training data. An optional book-based filter limiting which KBTs are added to the training data would allow this strategy to be evaluated.
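A minimal sketch of what such a filter could look like, to make the request concrete. Everything here is an assumption for illustration: the `KeyTerm` record, the `filter_terms_by_book` function, and the "BOOK chapter:verse" reference format are hypothetical and do not reflect the project's actual preprocessing code or data layout.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class KeyTerm:
    """Hypothetical KBT record: a gloss, its rendering, and the verses it occurs in."""
    gloss: str
    rendering: str
    references: Tuple[str, ...]  # assumed format: "GEN 12:1"


def book_of(reference: str) -> str:
    """Return the book code from an assumed 'BOOK chapter:verse' reference."""
    return reference.split()[0].upper()


def filter_terms_by_book(
    terms: Iterable[KeyTerm], books: Optional[Iterable[str]] = None
) -> List[KeyTerm]:
    """Keep only terms that occur in at least one of the given books.

    With books=None the filter is disabled and every term is kept, which
    matches the current behavior described above.
    """
    if books is None:
        return list(terms)
    wanted = {b.upper() for b in books}
    return [t for t in terms if any(book_of(r) in wanted for r in t.references)]


terms = [
    KeyTerm("Abraham", "Abarahamu", ("GEN 12:1", "MAT 1:2")),
    KeyTerm("Boaz", "Boazi", ("RUT 2:1",)),
    KeyTerm("Paul", "Paulo", ("ROM 1:1",)),
]

# Only add KBTs that appear in the new books being drafted (here RUT and ROM).
new_book_terms = filter_terms_by_book(terms, books=["RUT", "ROM"])
```

One open design question this sketch leaves aside is how to treat a term that occurs in both completed and new books: here it is kept whenever any of its references falls in a selected book.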