Removing Italian Stop Words #1860

Generally, you should avoid removing stopwords from your raw documents. The only preprocessing they need is removing artifacts that are unrelated to the content of the document, such as leftover HTML code if you scraped the data. The embedding model relies on the full context of each sentence, so you generally do not want to remove stopwords before clustering your documents.
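As a rough illustration of that kind of artifact cleanup, here is a minimal sketch that strips HTML tags with Python's standard library while leaving every word, stopwords included, untouched. The helper name `strip_html` and the sample document are made up for this example, and a real scraper may need something more robust (for instance, this sketch does not skip the contents of `<script>` or `<style>` tags).

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Called for every piece of text between tags.
        self.parts.append(data)


def strip_html(raw: str) -> str:
    """Hypothetical helper: drop tags, keep the text, normalize whitespace."""
    parser = _TextExtractor()
    parser.feed(raw)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


# Made-up scraped document; note that the stopwords stay in place.
raw_doc = "<p>Il gatto dorme <b>nella</b> scatola.</p>"
clean_doc = strip_html(raw_doc)
print(clean_doc)
```

The cleaned documents, with their stopwords intact, are what you would then pass to the topic model.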

However, removing stopwords after clustering, when the topic representations are built, can be helpful. You can find more about removing stop words directly here, or you can use KeyBERTInspired or MMR (MaximalMarginalRelevance), which do this indirectly.

Also, following the best practices guide often helps get a sense of these things.

Answer selected by AlessandroZanotta98