In this project, I have explored and compared text preprocessing and feature selection methods among word count, character count, bag of words, removing stop words, stemming, and Lemmatization. Logistic regression classifier is used to detect ham or spam SMS messages. The dataset is a collection of 5,574 text messages in English, taggled according being ham (legitimate) or spam. The originate dataset can be found at https://archive.ics.uci.edu/ml/datasets/SMS+Spam+Collection.
-
Notifications
You must be signed in to change notification settings - Fork 1
haojing9058/SMS-Spam-Classification
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
About
No description or website provided.
Topics
Resources
Stars
Watchers
Forks
Releases
No releases published