Song and Dance Man

author : Jatan Pandya  

A corpus-based analysis of the works of Bob Dylan from 1960 to 2020


This project analyzes the work of Bob Dylan in depth, covering all of his songs (album releases, singles/EPs, unreleased tracks, outtakes, and demos), speeches, and more.

To that end, the following ideas are explored :

Surface Analysis :

1. Text cleaning techniques (lemmatization, normalization, stop-word and punctuation removal, etc.)

2. Visualizing Dylan's Corpus (Word Cloud)

3. Total number of words in corpus (tokens) and Dylan's vocabulary (types)

4. Dylan's Lexical diversity

5. The longest word in Dylan's corpus

6. Average Word Length

7. Calculating and visualizing word occurrences (Dispersion Plots)

8. Word Frequency

9. Hapax legomena (Hapaxes in Dylan's corpus)

10. Visualizing Zipf's distribution

11. Keywords in Dylan's Corpus (TF-IDF)
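
Several of the surface measures listed above (tokens, types, lexical diversity, longest word, average word length, word frequency, hapaxes) can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the project's actual code. The function name `surface_stats`, the regex tokenizer, and the sample sentence are assumptions made for this example.

```python
import re
from collections import Counter


def surface_stats(text):
    """Compute simple corpus statistics for a piece of text (illustrative helper)."""
    # Lowercase, then keep alphabetic runs (with internal apostrophes) as tokens.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    if not tokens:
        raise ValueError("text contains no word tokens")

    freq = Counter(tokens)  # word type -> number of occurrences
    return {
        "tokens": len(tokens),                         # total running words
        "types": len(freq),                            # distinct words (vocabulary)
        "lexical_diversity": len(freq) / len(tokens),  # type/token ratio
        "longest_word": max(freq, key=len),
        "average_word_length": sum(map(len, tokens)) / len(tokens),
        "top_words": freq.most_common(3),
        # Hapax legomena: words that occur exactly once.
        "hapaxes": sorted(w for w, c in freq.items() if c == 1),
    }


# Placeholder sentence standing in for the real corpus.
sample = "The answer my friend is blowing in the wind the answer is blowing in the wind"
stats = surface_stats(sample)
print(stats)
```

The type/token ratio depends heavily on text length, so comparisons between albums or years are more meaningful when the samples are of similar size.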

In-depth Analysis :

1. Common Collocations by Dylan (Bigrams/Trigrams/4,5-grams)

2. Which year saw the "lengthiest" album by Dylan? (Total number of words by year)

3. Has Dylan stopped asking questions? ("?" occurrences through the years)

4. I, Me, Myself : Changing points of view through the years (Conditional FreqDist on the shift between first-person and second/third-person pronouns)

5. Recurring motifs and ideas in Dylan's Songs (Parts-of-speech Tagging and Synsets)

6. Paul Revere in Dylan's Corpus? Who else? Finding People/locations/events etc. mentioned in Dylan's songs (Entity recognition)

7. Are Dylan's songs losing complexity as the years go by? (Sentence segmentation, Automated Readability Index (ARI))

8. Which years were sad for Dylan, and which were happy? (Sentiment Analysis)
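
Three of the per-year questions above (total words, "?" occurrences, first-person pronoun use) and the Automated Readability Index can be sketched as follows. This is a rough illustration under stated assumptions, not the project's implementation: the function names, the `FIRST_PERSON` set, and the toy `lyrics_by_year` data are all hypothetical, and the text is placeholder prose rather than real lyrics. The ARI formula used is the standard one: 4.71 × (characters / words) + 0.5 × (words / sentences) − 21.43.

```python
import re

# Assumed set of first-person pronouns for this sketch.
FIRST_PERSON = {"i", "me", "my", "mine", "myself"}


def automated_readability_index(text):
    """ARI = 4.71 * (chars / words) + 0.5 * (words / sentences) - 21.43."""
    # Naive sentence segmentation on terminal punctuation.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z0-9]+(?:'[A-Za-z0-9]+)*", text)
    if not sentences or not words:
        raise ValueError("text needs at least one sentence and one word")
    # Count letters and digits only; apostrophes are not counted as characters.
    chars = sum(len(w.replace("'", "")) for w in words)
    return 4.71 * chars / len(words) + 0.5 * len(words) / len(sentences) - 21.43


def yearly_counts(lyrics_by_year):
    """Per-year word totals, question marks, and first-person pronoun counts."""
    result = {}
    for year, text in sorted(lyrics_by_year.items()):
        words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
        result[year] = {
            "words": len(words),
            "questions": text.count("?"),
            "first_person": sum(w in FIRST_PERSON for w in words),
        }
    return result


# Hypothetical placeholder data keyed by year (not actual lyrics).
lyrics_by_year = {
    1963: "Where did I go? Where did you go? I know.",
    1997: "My road is long. The night is dark.",
}
counts = yearly_counts(lyrics_by_year)
ari = automated_readability_index("Hi there. Go now.")
print(counts)
print(round(ari, 2))
```

Lyrics are often only lightly punctuated, so the sentence count, and with it the ARI, is sensitive to how sentences are segmented.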

Bonus : 2:14 at