KBTs: option for a book-based filter on the KBTs added to the training data #442

Open
mmartin9684-sil opened this issue Jul 5, 2024 · 0 comments
Labels
enhancement (New feature or request)
pipeline 3: preprocess (Issue related to preprocessing)
research (Research topics)

Comments

@mmartin9684-sil
Collaborator

*** This enhancement request is for research purposes. ***

When KBTs are added to the training data during preprocessing, all populated KBTs are included. KBTs from completed books may be of limited benefit, since the completed verse text is already available, and this verse-level training data is likely more suitable for model fine-tuning. Also, for projects with extensively populated KBTs, including all of them in the training data may swamp the verse-level training data and skew the model results.

The primary intended benefit of including KBTs in the training data is improved translation of proper names in new books, so adding only the KBTs from new books to the training data may produce better drafts of those books. An optional book-based filter limiting the KBTs added to the training data would allow this strategy to be evaluated.
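A minimal sketch of what such an optional filter could look like, in Python. Everything here is hypothetical and not taken from any existing pipeline code: the `KeyTerm` record, its `renderings` and `references` fields, the "BOOK C:V" reference format, and the `filter_kbts_by_book` function are assumptions for illustration. Passing `books=None` keeps the current behavior (all populated KBTs are included); passing a set of book codes keeps only populated terms that occur in at least one of those books.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Set


@dataclass
class KeyTerm:
    """Hypothetical in-memory form of one key biblical term (KBT)."""
    term_id: str
    renderings: List[str]  # empty list means the term is not populated
    # Hypothetical verse references in "BOOK C:V" form, e.g. "MAT 1:1"
    references: List[str] = field(default_factory=list)


def book_of(ref: str) -> str:
    """Return the upper-cased book code from a "BOOK C:V" reference."""
    return ref.split()[0].upper()


def filter_kbts_by_book(
    terms: Iterable[KeyTerm], books: Optional[Set[str]] = None
) -> List[KeyTerm]:
    """Select the KBTs to add to the training data.

    books=None mirrors the current behavior: every populated KBT is kept.
    Otherwise, keep a populated term only if at least one of its
    references falls in one of the given books (e.g. the new books).
    """
    populated = [t for t in terms if t.renderings]
    if books is None:
        return populated
    wanted = {b.upper() for b in books}
    return [t for t in populated if any(book_of(r) in wanted for r in t.references)]


if __name__ == "__main__":
    terms = [
        KeyTerm("Abraham", ["Abraham"], ["GEN 12:1", "MAT 1:1"]),
        KeyTerm("Lydia", ["Lidia"], ["ACT 16:14"]),
        KeyTerm("Jordan", [], ["JOS 3:1"]),  # unpopulated: never included
    ]
    # Only the populated term attested in Acts survives the filter.
    print([t.term_id for t in filter_kbts_by_book(terms, {"ACT"})])  # → ['Lydia']
```

This sketch selects a term if it occurs in any selected book, so a proper name shared between a completed book and a new book is still included; whether that is the right granularity is part of what an evaluation of this strategy would need to decide.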

@mmartin9684-sil added the enhancement, pipeline 3: preprocess, and research labels on Jul 5, 2024
Projects
Status: 🆕 New
Development

No branches or pull requests

1 participant