Analyze tweets related to diseases using data mining techniques to derive insights and patterns.

Mariaorabi/Data-Mining-Disease-Tweets-Analysis

Disease Tweets Analysis

Introduction

In this project, I aim to analyze tweets related to four diseases: AIDS/HIV, cancer, Corona (COVID-19), and diabetes. The analysis involves preprocessing the tweets, extracting key information, and deriving insights through natural language processing (NLP) techniques.

Data Files

  • Four files of tweets, one per disease, with the tweets categorized by disease-specific keywords.

Preprocessing

  • Remove user mentions and the word "LINK@" from the tweets.
  • Preserve social media symbols as individual tokens so that tokenization does not split them into meaningless signs.
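
The two preprocessing steps above can be sketched with regular expressions. This is a minimal sketch, not the notebook's actual code: it assumes mentions look like `@name`, that the link placeholder is the literal string `LINK@`, and that "social media symbols" refers to hashtags such as `#diabetes`. The names `clean_tweet` and `tokenize` are hypothetical.

```python
import re

# Assumption: the link placeholder is the literal "LINK@" and mentions look like "@name".
LINK_RE = re.compile(r"LINK@")
MENTION_RE = re.compile(r"@\w+")
# Assumption: hashtags are the symbols to keep whole; other punctuation becomes its own token.
TOKEN_RE = re.compile(r"#\w+|\w+(?:'\w+)?|[^\w\s]")


def clean_tweet(text: str) -> str:
    """Remove the link placeholder and user mentions, then collapse whitespace."""
    # Remove "LINK@" first so its "@" is never mistaken for the start of a mention.
    text = LINK_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    return " ".join(text.split())


def tokenize(text: str) -> list:
    """Split cleaned text into tokens, keeping each hashtag as a single token."""
    return TOKEN_RE.findall(text)


cleaned = clean_tweet("@maria Living with #diabetes LINK@ is hard")
tokens = tokenize(cleaned)
```

Here `cleaned` is `"Living with #diabetes is hard"` and `#diabetes` stays one token instead of being split into `#` and `diabetes`.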

Parts of Speech Analysis

a. Grammatical Analysis using spaCy

  • Use the spaCy package for grammatical analysis.
  • Ignore stop words, perform lemmatization, and filter words based on the English dictionary.
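
The filtering step might look like the sketch below. It relies only on the `is_stop` and `lemma_` attributes that spaCy tokens expose, so in practice the tokens would come from a spaCy `Doc` (for example `nlp(text)` after `nlp = spacy.load("en_core_web_sm")`, where the model name is an assumption). The function `content_lemmas`, the `vocabulary` set, and the stand-in tokens are hypothetical and only there so the example runs without a model installed.

```python
from types import SimpleNamespace
from typing import Iterable, List, Set


def content_lemmas(tokens: Iterable, vocabulary: Set[str]) -> List[str]:
    """Return lowercase lemmas of tokens that are not stop words and appear in `vocabulary`."""
    lemmas = []
    for tok in tokens:
        if tok.is_stop:
            continue  # ignore stop words
        lemma = tok.lemma_.lower()  # lemmatization result provided by the tagger
        if lemma in vocabulary:  # keep only words found in the English word list
            lemmas.append(lemma)
    return lemmas


def fake_token(text: str, lemma: str, is_stop: bool = False) -> SimpleNamespace:
    """Stand-in exposing the same attribute names as a spaCy token (illustration only)."""
    return SimpleNamespace(text=text, lemma_=lemma, is_stop=is_stop)


demo_tokens = [
    fake_token("Patients", "patient"),
    fake_token("are", "be", is_stop=True),
    fake_token("fighting", "fight"),
    fake_token("xqzt", "xqzt"),  # not an English word, so it is filtered out
]
english_vocab = {"patient", "fight", "be"}  # assumed word list; any English dictionary would do
demo_lemmas = content_lemmas(demo_tokens, english_vocab)
```

With these stand-ins, `demo_lemmas` is `["patient", "fight"]`.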

b. Most Common Words

  • Report the 20 most common words for each disease.
  • Discuss the relevance and value of the results.

c. Most Common Adjective Words

  • Identify the 20 most common adjective words for each disease.
  • Evaluate the logical and meaningful aspects of the results.

d. Most Common Verbs

  • Extract the 20 most common verbs for each disease.
  • Provide insights into the logical and meaningful implications of the results.

e. Most Common Noun Words

  • Analyze the 20 most common noun words for each disease.
  • Discuss the sense and meaningfulness of the results.
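
Parts b through e all reduce to counting lemmas, optionally restricted to one part of speech. A minimal sketch, assuming each tweet has already been reduced to `(lemma, pos)` pairs using spaCy's coarse tags (`"ADJ"`, `"VERB"`, `"NOUN"`); the function name `most_common_lemmas` is hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def most_common_lemmas(
    tagged: Iterable[Tuple[str, str]],
    pos: Optional[str] = None,
    n: int = 20,
) -> List[Tuple[str, int]]:
    """Return the n most frequent lemmas, restricted to one coarse POS tag if `pos` is given."""
    counts = Counter(lemma for lemma, tag in tagged if pos is None or tag == pos)
    return counts.most_common(n)


# Tiny illustrative sample; real input would hold the lemmas of one disease's tweets.
sample = [
    ("patient", "NOUN"), ("fight", "VERB"), ("patient", "NOUN"),
    ("new", "ADJ"), ("fight", "VERB"), ("patient", "NOUN"),
]
top_all = most_common_lemmas(sample, n=2)
top_nouns = most_common_lemmas(sample, pos="NOUN")
top_adjectives = most_common_lemmas(sample, pos="ADJ")
```

Calling the function once per disease and per tag (`None`, `"ADJ"`, `"VERB"`, `"NOUN"`) would give the four top-20 lists that the sections above ask for.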

Parsing

a. Dependency Parsing Function

  • Implement a function for checking words directly related to the disease names in tweets.
  • Convert verbs to their base forms (lemmas) and report the 10 most common verbs for each disease, along with their relative frequencies.
  • Discuss differences between diseases and compare with the results of part d (Most Common Verbs) of the Parts of Speech Analysis.
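
One way to read "directly related" is that the verb is the syntactic head of the disease-name token in the dependency parse. The sketch below takes that reading as an assumption. It uses only the `text`, `lemma_`, `pos_` and `head` attributes that spaCy tokens provide; the function `related_verbs` and the stand-in tokens are hypothetical and exist only to make the example runnable without a model.

```python
from collections import Counter
from types import SimpleNamespace


def related_verbs(tokens, disease_names, n=10):
    """Top-n verb lemmas heading a disease-name token, as (lemma, relative frequency) pairs."""
    names = {name.lower() for name in disease_names}
    counts = Counter(
        tok.head.lemma_.lower()  # base form of the governing verb
        for tok in tokens
        if tok.text.lower() in names and tok.head.pos_ == "VERB"
    )
    total = sum(counts.values())
    # Relative frequency = share of this verb among all verbs heading a disease name.
    return [(verb, count / total) for verb, count in counts.most_common(n)]


def make_token(text, pos, head=None):
    """Stand-in with spaCy-like attributes; a root token is its own head, as in spaCy."""
    tok = SimpleNamespace(text=text, lemma_=text.lower(), pos_=pos)
    tok.head = head if head is not None else tok
    return tok


fight = make_token("fight", "VERB")
beat = make_token("beat", "VERB")
study = make_token("study", "NOUN")
demo_tokens = [
    make_token("cancer", "NOUN", fight),
    make_token("Cancer", "NOUN", fight),
    make_token("cancer", "NOUN", beat),
    make_token("cancer", "NOUN", study),  # head is a noun, so it is not counted
    fight, beat, study,
]
demo_verbs = related_verbs(demo_tokens, ["cancer"])
```

In this sample `fight` accounts for two of the three governing verbs and `beat` for one. A broader definition (for instance, also counting verbs that are children of the disease name) would only change the generator expression.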

b. Most Common Adjective Heads for Disease Names

  • Find the adjectives for which the disease name is the most common head.
  • Discuss differences between diseases and compare with the results of part c (Most Common Adjective Words) of the Parts of Speech Analysis.
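
The adjective search can be sketched by tallying, for every adjective, which words it attaches to, and then keeping the adjectives whose most frequent head is a disease name. As before, this is a hedged sketch over spaCy-style token attributes (`pos_`, `lemma_`, `head`); the function `adjectives_headed_by_disease` and the stand-in tokens are hypothetical.

```python
from collections import Counter, defaultdict
from types import SimpleNamespace


def adjectives_headed_by_disease(tokens, disease_names):
    """Map each adjective whose most common head is a disease name to that head's count."""
    names = {name.lower() for name in disease_names}
    heads_by_adjective = defaultdict(Counter)
    for tok in tokens:
        if tok.pos_ == "ADJ":
            heads_by_adjective[tok.lemma_.lower()][tok.head.text.lower()] += 1
    result = {}
    for adjective, heads in heads_by_adjective.items():
        top_head, count = heads.most_common(1)[0]
        if top_head in names:
            result[adjective] = count
    return result


def make_token(text, pos, head=None):
    """Stand-in with spaCy-like attributes (illustration only)."""
    tok = SimpleNamespace(text=text, lemma_=text.lower(), pos_=pos)
    tok.head = head if head is not None else tok
    return tok


cancer = make_token("cancer", "NOUN")
study = make_token("study", "NOUN")
demo_tokens = [
    make_token("terminal", "ADJ", cancer),
    make_token("terminal", "ADJ", cancer),
    make_token("terminal", "ADJ", study),
    make_token("new", "ADJ", study),
    make_token("new", "ADJ", study),
    make_token("new", "ADJ", cancer),
]
demo_adjectives = adjectives_headed_by_disease(demo_tokens, ["cancer"])
```

Here only `terminal` qualifies, because `new` modifies `study` more often than `cancer`.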

Conclusion

  • Summarize key findings from the analysis.
  • Reflect on any variations between diseases and the reasons behind them.

How to Submit

  • The code must be submitted in a Jupyter notebook file.
