This is a course project from Udacity: a stream processing pipeline that performs real-time analysis of the top hashtags on Twitter, using Apache Storm.
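To make the goal concrete, here is a minimal sketch in plain Python of the core computation: pull hashtags out of tweet text, keep running counts, and report the top N. This is not the project's Storm code. The function names (`extract_hashtags`, `top_hashtags`), the hashtag pattern, and the sample tweets are illustrative assumptions, and a real topology would spread this work across spouts and bolts.

```python
import re
from collections import Counter

# Assumed pattern for illustration: a '#' followed by word characters.
HASHTAG_RE = re.compile(r"#(\w+)")


def extract_hashtags(text):
    """Return the lowercase hashtags found in one tweet's text."""
    return [tag.lower() for tag in HASHTAG_RE.findall(text)]


def top_hashtags(tweets, n):
    """Count hashtags across an iterable of tweet texts; return the n most common."""
    counts = Counter()
    for text in tweets:
        counts.update(extract_hashtags(text))
    return counts.most_common(n)


# Made-up sample tweets, not real data.
sample = [
    "Learning #Storm with #Udacity",
    "Stream processing is fun #storm",
    "Another #Udacity project",
    "No hashtags here",
    "#storm #bigdata",
]
print(top_hashtags(sample, 2))  # [('storm', 3), ('udacity', 2)]
```

In a streaming setting the counts would be updated as each tweet arrives rather than over a fixed list, but the ranking step is the same idea.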
- Vagrant - virtual environment manager.
- Oracle VM VirtualBox - general-purpose virtualizer.
- SSH client, such as PuTTY
- Java 8 or newer. Older versions fail with an "Unsupported major.minor version" error.
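The "Unsupported major.minor version" error comes from the version number stored in every compiled class file: a JVM refuses classes whose major version is newer than it supports, and Java 8 corresponds to major version 52. As a rough sketch, the Python helper below reads that header so you can see which Java release a class needs. `class_file_java_version` is a hypothetical helper name, and the demo bytes are a hand-built header, not a file from this project.

```python
import struct


def class_file_java_version(header):
    """Return (major, minor, java_release) from the first 8 bytes of a .class file."""
    # Layout: 4-byte magic 0xCAFEBABE, 2-byte minor, 2-byte major, all big-endian.
    magic, minor, major = struct.unpack(">IHH", header[:8])
    if magic != 0xCAFEBABE:
        raise ValueError("not a Java class file")
    # Major 45 is JDK 1.1 and each release adds one, so 52 maps to Java 8.
    return major, minor, major - 44


# Hand-built header for a class compiled for Java 8 (major 52, minor 0).
demo_header = struct.pack(">IHH", 0xCAFEBABE, 0, 52)
print(class_file_java_version(demo_header))  # (52, 0, 8)
```

To inspect a real file, open it in binary mode and pass the first 8 bytes, for example `class_file_java_version(open(path, "rb").read(8))`, where `path` points at any `.class` file.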
- Spin up the VM:
vagrant up
- Using an SSH client, connect to the VM:
vagrant ssh
2.1. cd ..
- Run the visualization web server
3.1. Inside the VM: cd /vagrant/viz
3.2. python app.py
- Package the topology
4.1. Inside the VM (open a new SSH session): cd /vagrant
4.2. mvn clean
4.3. mvn package (may take a while the first time)
- Execute the packaged topology
5.1. Inside the VM: cd /vagrant
5.2. storm jar target/storm-twitter-top-hashtags-0.0.1-SNAPSHOT-jar-with-dependencies.jar storm.TopNTweetTopology
- View the live results at http://127.0.0.1:5000.
- Shut down the VM:
vagrant halt