Removing Italian Stop Words #1860
-
Hi everyone! Thank you in advance and have a nice day.
-
Generally, you want to avoid removing stopwords from your raw documents. The only preprocessing they need is removing artifacts that are not related to the content of the document, such as leftover HTML code if you scraped the data. Thus, you generally do not want to remove stopwords before clustering your documents.
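As a rough illustration of that kind of light cleaning, here is a minimal sketch using only the Python standard library. The helper name `strip_html_artifacts` and the sample sentence are made up for this example, and a regex-based tag stripper is an assumption that only suits simple scraped snippets; a real HTML parser is safer for messy pages.

```python
import html
import re

# Matches anything that looks like an HTML tag, e.g. <p>, </div>, <br/>.
TAG_RE = re.compile(r"<[^>]+>")
# Matches runs of whitespace so they can be collapsed to a single space.
WHITESPACE_RE = re.compile(r"\s+")


def strip_html_artifacts(text: str) -> str:
    """Remove tags and decode entities, leaving the wording (stopwords included) intact.

    Hypothetical helper for illustration only.
    """
    without_tags = TAG_RE.sub(" ", text)
    unescaped = html.unescape(without_tags)  # e.g. &egrave; becomes è
    return WHITESPACE_RE.sub(" ", unescaped).strip()


raw = "<p>Il gatto &egrave; sul tavolo.</p><br/>"
print(strip_html_artifacts(raw))  # prints: Il gatto è sul tavolo.
```

Note that the stopwords ("Il", "sul") are left in place; only the markup is removed before the documents are embedded and clustered.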
However, removing stopwords after clustering your documents, when the topic representations are created, might be helpful. You can find more about removing stop words directly here, or you can use KeyBERTInspired or MMR (MaximalMarginalRelevance), which do this indirectly.
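To make the "after clustering" part concrete, here is a minimal sketch of how this could look for Italian. The stop word list below is a short, hypothetical sample written for this example (scikit-learn only ships a built-in list for English, so you would normally bring a fuller Italian list from elsewhere), and `build_topic_model` is an illustrative helper, not part of BERTopic.

```python
# Hypothetical, deliberately short sample; replace with a complete Italian list.
ITALIAN_STOP_WORDS = [
    "il", "lo", "la", "i", "gli", "le", "un", "una", "di", "a", "da",
    "in", "con", "su", "per", "tra", "fra", "e", "che", "non", "è",
]


def build_topic_model(stop_words=ITALIAN_STOP_WORDS, use_keybert=False):
    """Build a BERTopic model that drops stop words only in the topic representations."""
    # Imported here so the heavy dependencies load only when a model is built.
    from bertopic import BERTopic
    from bertopic.representation import KeyBERTInspired
    from sklearn.feature_extraction.text import CountVectorizer

    # The vectorizer runs after clustering, so the embeddings still see full sentences.
    vectorizer_model = CountVectorizer(stop_words=list(stop_words))

    # Optional: let KeyBERTInspired pick the topic words, which tends to
    # push stop words out of the representations indirectly.
    representation_model = KeyBERTInspired() if use_keybert else None

    return BERTopic(
        language="multilingual",
        vectorizer_model=vectorizer_model,
        representation_model=representation_model,
    )
```

You would then call `build_topic_model().fit_transform(docs)` on the raw (only lightly cleaned) documents; the stop words are filtered when the topic words are extracted, not before embedding.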
Also, following the best practices guide often helps get a sense of these things.