Richa Vinian edited this page Jul 7, 2024 · 47 revisions

Development Setup

Running the code

Terminologies

Fawkes is primarily designed around user reviews and their analysis, so its terminology revolves around them. That said, Fawkes can be used on any text-based data set that requires sentiment analysis, categorization, or summarization.

Review

Any piece of feedback that a user leaves behind is what we call a review. The most basic form of a review is a JSON object with a message and a time_stamp.

Below is an example of what a user review might look like. Here the field content is the message and updated is the time_stamp.

{
    "updated": "2020-03-15 14:13:17",
    "rating": 5,
    "version": "7.1.0",
    "content": "I just heard about this budgeting app. So I gave it a try. I am impressed thus far. However I still can\u00e2\u20ac\u2122t add all of my financial institutions so my budget is kind of skewed. But other that I can say I\u00e2\u20ac\u2122m more aware of my spending"
}

A review can have any number of additional fields; Fawkes does not depend on them.

NOTE: The additional fields are however preserved throughout the life cycle of a review so that the data is not lost.

Channel

A channel is any source from which user reviews can be obtained. Most channels integrated into Fawkes expose an API endpoint through which data is fetched. The currently supported channels in Fawkes are:

Sentiment

The sentiment of a review tells us the user emotion embedded in the text. We use VADER, as provided by the popular Natural Language Processing library NLTK, to do the sentiment analysis.

The sentiment output looks like this:

{
    "neg": 0.0,
    "neu": 0.928,
    "pos": 0.072,
    "compound": 0.4767
}

The compound value tells us the overall sentiment.

  • Compound > 0 means Positive Review 😊
  • Compound < 0 means Negative Review 🙁
  • Compound = 0 means Neutral Review 😐
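
The rule above can be sketched as a small helper. This is only an illustration: the function name `sentiment_label` is hypothetical and not part of Fawkes, and the input is assumed to be a score dictionary of the shape shown earlier (in NLTK, this is what `SentimentIntensityAnalyzer.polarity_scores` returns).

```python
def sentiment_label(scores):
    """Map a VADER-style score dict to a label using its compound value."""
    compound = scores["compound"]
    if compound > 0:
        return "positive"
    if compound < 0:
        return "negative"
    return "neutral"


# The sample output from this page
example_scores = {"neg": 0.0, "neu": 0.928, "pos": 0.072, "compound": 0.4767}
print(sentiment_label(example_scores))  # prints: positive
```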

Category

A review can be classified/bucketed/grouped into different categories based on what the user is talking about. This is one of the core elements of analyzing user reviews as it helps to get insights on what pain points the users have and the current trend of the application itself.

Parsed Review

Fawkes starts with a review in its raw format and puts it through a series of transformations. The first step is to parse the user review and convert it to a single class object of type Review.

Processed Review

After a review is parsed and converted to a Review, it is ready to be run through different algorithms like:

  • Sentiment analysis
  • Categorization
  • Summarization

All the algorithms run only on the parsed data.

Fawkes Config

Fawkes requires a configuration file that describes the settings that can be tuned. Below is the list of all configuration items.

For a complete example, see the Sample Mint Config File.

Fawkes Pipeline

Fetching Data

python fawkes/cli/cli.py fetch

Parsing Data

python fawkes/cli/cli.py parse

Parsing converts the raw data to a single format consumed by all further steps. We call it Review. See the diff below to understand what happens in the parsing step:

--- a/data/raw_data/sample-mint/appstore-raw-feedback.json
+++ b/data/parsed_data/sample-mint/parsed-user-feedback.json
@@ -1,8 +1,16 @@
 [
     {
-        "updated": "2020-03-15 14:13:17",
+        "message": "I just heard about this budgeting app. So I gave it a try. I am impressed thus far. However I still cant add all of my financial institutions so my budget is kind of skewed. But other that I can say Im more aware of my spending",
+        "timestamp": "2020/03/15 14:13:17",
         "rating": 5,
-        "version": "7.1.0",
-        "content": "I just heard about this budgeting app. So I gave it a try. I am impressed thus far. However I still can\u00e2\u20ac\u2122t add all of my financial institutions so my budget is kind of skewed. But other that I can say I\u00e2\u20ac\u2122m more aware of my spending"
+        "app_name": "sample-mint",
+        "channel_name": "appstore",
+        "channel_type": "ios",
+        "hash_id": "de848685d11742dbea77e1e5ad7b892088ada9c9",
+        "derived_insight": {
+            "sentiment": null,
+            "category": "uncategorized",
+            "extra_properties": {}
+        }
     }
 ]

Things to note:

  • message and timestamp have been added. This is the most important thing.
  • Since the review came from the App Store, the channel_type key has been added.
  • A unique hash of (message + timestamp) has been added as hash_id.
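
The parsing step shown in the diff can be approximated with the sketch below. It is not the Fawkes implementation: the function `parse_review` and its `message_key` / `timestamp_key` parameters are hypothetical, dropping non-ASCII characters is an assumption based on how the mis-encoded apostrophes vanish in the diff, and SHA-1 is assumed only because hash_id is 40 hexadecimal characters long.

```python
import hashlib
from datetime import datetime


def parse_review(raw, app_name, channel_name, channel_type,
                 message_key="content", timestamp_key="updated"):
    """Convert one raw review dict into the parsed shape shown in the diff."""
    # Assumption: non-ASCII characters are simply dropped from the message.
    message = raw[message_key].encode("ascii", "ignore").decode("ascii")
    # Normalize the timestamp to the YYYY/MM/DD HH:MM:SS form used in the diff.
    parsed_time = datetime.strptime(raw[timestamp_key], "%Y-%m-%d %H:%M:%S")
    timestamp = parsed_time.strftime("%Y/%m/%d %H:%M:%S")
    # Assumption: the hash is SHA-1 over message + timestamp.
    hash_id = hashlib.sha1((message + timestamp).encode("utf-8")).hexdigest()
    return {
        "message": message,
        "timestamp": timestamp,
        "rating": raw.get("rating"),
        "app_name": app_name,
        "channel_name": channel_name,
        "channel_type": channel_type,
        "hash_id": hash_id,
        "derived_insight": {
            "sentiment": None,
            "category": "uncategorized",
            "extra_properties": {},
        },
    }


raw_review = {
    "updated": "2020-03-15 14:13:17",
    "rating": 5,
    "content": "I still can\u00e2\u20ac\u2122t add all of my financial institutions",
}
parsed = parse_review(raw_review, "sample-mint", "appstore", "ios")
print(parsed["message"])
print(parsed["timestamp"])
```

Because the hash input here is a guess, the hash_id this sketch produces will not necessarily match the values shown in the diffs.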

Run algorithms

python fawkes/cli/cli.py run.algo

After parsing, one can run a number of algorithms in Fawkes. The two that run by default are:

  • Sentiment Analysis
  • Categorization

See the diff below to understand what happens in the algorithms step:

--- a/data/parsed_data/sample-mint/parsed-user-feedback.json
+++ b/data/processed_data/sample-mint/processed-user-feedback.json
@@ -6,11 +6,25 @@
         "app_name": "sample-mint",
         "channel_name": "appstore",
         "channel_type": "ios",
-        "hash_id": "de848685d11742dbea77e1e5ad7b892088ada9c9",
+        "hash_id": "6dde3aa82726c0a9e3777623854d839184767571",
         "derived_insight": {
-            "sentiment": null,
-            "category": "uncategorized",
-            "extra_properties": {}
+            "sentiment": {
+                "neg": 0.0,
+                "neu": 0.928,
+                "pos": 0.072,
+                "compound": 0.4767
+            },
+            "category": "Application",
+            "extra_properties": {
+                "category_scores": {
+                    "User Experience": 0,
+                    "sign-in/sign-up": 0,
+                    "Notification": 0,
+                    "Application": 1,
+                    "ads": 0
+                },
+                "bug_feature": "feature"
+            }
         }
     }
 ]

Things to note:

  • sentiment has been added
  • category has been added
    • The score of the review against each category is also present
  • The review has been classified as a bug/feature/user-experience (see the bug_feature key)
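
As a rough illustration of how the derived_insight block in the diff could be filled in, the sketch below merges algorithm outputs into a parsed review. The helper `apply_insights` is hypothetical, and choosing the highest-scoring category (falling back to "uncategorized" when nothing scores above zero) is an assumed rule, not confirmed Fawkes behavior.

```python
import copy


def apply_insights(parsed_review, sentiment, category_scores, bug_feature):
    """Return a copy of a parsed review with derived_insight filled in."""
    review = copy.deepcopy(parsed_review)
    # Assumed rule: the category with the highest score wins; if no
    # category scored above zero, the review stays "uncategorized".
    best = max(category_scores, key=category_scores.get, default=None)
    if best is None or category_scores[best] <= 0:
        best = "uncategorized"
    review["derived_insight"] = {
        "sentiment": sentiment,
        "category": best,
        "extra_properties": {
            "category_scores": category_scores,
            "bug_feature": bug_feature,
        },
    }
    return review


parsed_example = {
    "message": "I am impressed thus far.",
    "derived_insight": {
        "sentiment": None,
        "category": "uncategorized",
        "extra_properties": {},
    },
}
processed_example = apply_insights(
    parsed_example,
    sentiment={"neg": 0.0, "neu": 0.928, "pos": 0.072, "compound": 0.4767},
    category_scores={"User Experience": 0, "Notification": 0, "Application": 1, "ads": 0},
    bug_feature="feature",
)
print(processed_example["derived_insight"]["category"])  # prints: Application
```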

Configuring Categorization

Fawkes provides categorization in two variants.

Text Match

Text Match uses keywords to determine which category a review falls into. See category-keywords.json for an example of what a keywords file looks like for a generic application.

python fawkes/cli/cli.py generate.text_match.keywords

Now you are ready to sort your reviews into your custom categories.
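
To make the idea of keyword-based categorization concrete, here is a minimal sketch that counts whole-word keyword hits per category. The function `score_categories`, the keyword lists, and the assumption that a keywords file maps each category name to a list of keywords are all illustrative; the actual layout of category-keywords.json may differ.

```python
import re


def score_categories(message, category_keywords):
    """Count how many keywords of each category appear in the message."""
    text = message.lower()
    scores = {}
    for category, keywords in category_keywords.items():
        # Whole-word, case-insensitive match; each keyword counts at most once.
        scores[category] = sum(
            1
            for keyword in keywords
            if re.search(r"\b" + re.escape(keyword.lower()) + r"\b", text)
        )
    return scores


# Hypothetical keywords, invented for this example
sample_keywords = {
    "Application": ["app", "budget"],
    "Notification": ["notification", "alert"],
    "ads": ["ads", "advert"],
}
sample_scores = score_categories(
    "I just heard about this budgeting app.", sample_keywords
)
print(sample_scores)
```

Whole-word matching keeps "budget" from matching "budgeting"; whether Fawkes matches on stems or substrings instead is not stated here.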

Deep Learning based Classification

The problem with user reviews is that it's incredibly difficult to get labelled data. Text Match is an easy way to generate labelled data. Once we have enough labelled data, we can use fawkes/algorithms/categorisation/lstm/trainer.py for training.

The module lstm_classifier can be used to train on the data using LSTMs. Use multi-class-text-classification-with-lstm-using-tensorflow as a reference.

To use the trained models from the above step, modify the algorithm_config.categorization_algorithm in the config file.

Storing Data

python fawkes/cli/cli.py push.elasticsearch

After parsing and running algorithms, all the data is pushed to Elasticsearch for advanced searching and indexing capabilities.
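
For readers unfamiliar with how documents are sent to Elasticsearch in batches, the sketch below builds a request body in the newline-delimited JSON format that the Elasticsearch Bulk API expects. It does not show what the Fawkes command does internally: the helper `build_bulk_payload` and the index name are hypothetical, and using hash_id as the document `_id` is an assumption (it would make repeated pushes overwrite rather than duplicate a review).

```python
import json


def build_bulk_payload(reviews, index_name):
    """Build an NDJSON body: one action line, then one document line, per review."""
    lines = []
    for review in reviews:
        # Action line: index this document under the given _id.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": review["hash_id"]}}))
        # Source line: the review document itself.
        lines.append(json.dumps(review))
    # The Bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


bulk_body = build_bulk_payload(
    [{"hash_id": "abc123", "message": "I am impressed thus far."}],
    "sample-mint-reviews",  # hypothetical index name
)
print(bulk_body)
```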

Data Viz

For visualizing and running queries on the data we use Kibana.

CircleCI Integration

Glossary