
Implemented the SON Algorithm using the Apache Spark Framework to find frequent itemsets. Used the A-Priori Algorithm to process each chunk of the data.


laidasani/Finding-Frequent-Itemset

Finding-Frequent-Itemset

Overview

In this project, I implemented the SON algorithm using the Apache Spark framework to find frequent itemsets.

One of the major tasks is to find all frequent itemsets in a given input file, using the A-Priori algorithm to process each chunk of the data. The project runs the SON algorithm on two datasets: one simulated and one real-world.

Apart from the input file, two separate inputs are provided:

  • Filter threshold: an integer used to determine which users qualify
  • Support: an integer that defines the minimum count for an itemset to qualify as frequent
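As a rough illustration of the filter threshold, here is a minimal sketch in plain Python. It assumes the input can be read as (user, item) pairs and that a user qualifies when their basket holds more than `filter_threshold` distinct items. Both the input shape and the "more than" comparison are assumptions, and `build_baskets` is a hypothetical helper name, not taken from the project code.

```python
from collections import defaultdict

def build_baskets(pairs, filter_threshold):
    """Group (user, item) pairs into per-user baskets of distinct items.

    Assumption: a user qualifies only when the basket has MORE than
    `filter_threshold` distinct items.
    """
    baskets = defaultdict(set)
    for user, item in pairs:
        baskets[user].add(item)  # duplicates collapse inside the set
    return {
        user: items
        for user, items in baskets.items()
        if len(items) > filter_threshold
    }

# Hypothetical sample data: u2 bought only one distinct item.
pairs = [
    ("u1", "a"), ("u1", "b"), ("u1", "c"),
    ("u2", "a"), ("u2", "a"),
    ("u3", "b"), ("u3", "c"),
]
qualified = build_baskets(pairs, filter_threshold=1)
```

With a threshold of 1, only `u1` and `u3` remain, and their baskets are what the frequent-itemset search would then run on.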

The steps for finding frequent itemsets are:

  1. Find the candidate itemsets (singletons, pairs, triples, etc.) that may qualify as frequent given a support threshold (that is, those that map to a frequent bucket).

  2. Count the candidate itemsets (singletons, pairs, triples, etc.) over the whole dataset and keep those that are actually frequent given the support threshold.
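The two steps above can be sketched in plain Python, without Spark, to show the shape of the algorithm. This is a simplified illustration under stated assumptions, not the project's implementation: each chunk is a list of baskets, the per-chunk threshold is the global support scaled by the chunk's share of the data, and the names `apriori` and `son` are hypothetical. In Spark, the per-chunk work would typically be handed to something like `RDD.mapPartitions`.

```python
from collections import Counter
from itertools import combinations

def apriori(baskets, support):
    """Return itemsets (sorted tuples) found in at least `support` baskets."""
    baskets = [frozenset(b) for b in baskets]
    counts = Counter(item for b in baskets for item in b)
    current = {(item,) for item, n in counts.items() if n >= support}
    frequent = set(current)
    k = 2
    while current:
        items = sorted({i for itemset in current for i in itemset})
        # Keep a k-set only if every (k-1)-subset is already frequent.
        candidates = [
            c for c in combinations(items, k)
            if all(sub in current for sub in combinations(c, k - 1))
        ]
        counts = Counter(
            c for b in baskets for c in candidates if b.issuperset(c)
        )
        current = {c for c, n in counts.items() if n >= support}
        frequent |= current
        k += 1
    return frequent

def son(chunks, support):
    """Two-pass SON over a list of chunks (each a list of baskets)."""
    total = sum(len(chunk) for chunk in chunks)
    # Step 1: collect locally frequent itemsets as global candidates.
    candidates = set()
    for chunk in chunks:
        # Scaled threshold, rounded up with integer arithmetic.
        local = max(1, (support * len(chunk) + total - 1) // total)
        candidates |= apriori(chunk, local)
    # Step 2: count every candidate over all baskets, keep the true ones.
    counts = Counter(
        c
        for chunk in chunks
        for b in chunk
        for c in candidates
        if set(b).issuperset(c)
    )
    return {c for c, n in counts.items() if n >= support}

# Hypothetical data: two chunks of three baskets each.
chunks = [
    [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}],
    [{"b", "c"}, {"a", "b", "c"}, {"b"}],
]
result = son(chunks, support=3)
```

Here every singleton and every pair reaches the support of 3, while the triple `("a", "b", "c")` appears in only two baskets and is dropped. Scaling the threshold per chunk means an itemset that is frequent overall must be locally frequent in at least one chunk, so step 1 produces no false negatives and step 2 removes the false positives.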

The code is optimized to run in under 500 seconds for support 50 and filter threshold 20. The printed itemsets are sorted in lexicographical order.
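One possible way to produce that ordering is sketched below. It assumes itemsets are grouped by size first (singletons, then pairs, and so on) and sorted lexicographically within each size. The grouping by size is an assumption about the output layout, and `sort_itemsets` is a hypothetical helper name.

```python
def sort_itemsets(itemsets):
    """Sort items inside each itemset, then order itemsets by size and
    lexicographically within the same size (assumed output layout)."""
    normalized = [tuple(sorted(s)) for s in itemsets]
    return sorted(normalized, key=lambda t: (len(t), t))

# Hypothetical unsorted output from the frequent-itemset search.
ordered = sort_itemsets([("b", "a"), ("c",), ("a",), ("a", "c")])
```

Note that the items are compared as strings here, so numeric identifiers stored as strings would sort as `"10"` before `"9"`.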
