We will implement, step-by-step, a shared memory implementation to count the number of triangles adjacent with each node in an undirected, unweighted graph G(V,E), where V is the set of n nodes and E is the set of m edges among the nodes.
In graph theory, the triangle graph is the complete graph K3, consisting of three vertices and three edges. Triangle counting has gained increased popularity in the fields of network and graph analysis. The application of triangle detection, location, and counting are multi-fold: detection of minimal cycles, graph-theoretic clustering techniques, recognition of median, claw-free, and line graphs, and test of automorphism.
(V1). Assume that a graph is represented by its adjacency matrix A[i][j] where the rows and columns correspond to the vertices and A[i][j] == 1 when there exists an edge between nodes i and j and 0 otherwise. In our studies we deal with undirected graphs so A[i][j] == A[j][i].
Implement and run your first attempt to count triangles incident with each node as a triple loop that checks whether A[i][j] == A[j][k] == A[k][i] == 1 and if so, increments the counts on all three nodes c3[i]++; c3[j]++; c3[k]++;
To test your code, generate random graphs, using any way you can come up. It is a bit tricky to generate random graphs with given properties but this is not the purpose of this project, any random graphs will do. See Erdős-Rényi random graphs for instance to get a better idea.
Implement and test your code. Persuade yourselves it is correct, otherwise we cannot go any further!
(V2). There is an obvious limitation in this first approach. We will find the same triangle 6 different times (all permutations of three vertices), so we can make our code 6 times faster by enumerating and checking only i<j<k triplets. Make this correction.
Measure the time it takes to run the two versions above. Use clock_gettime
as described here.
(V3). So far so good, but now the data structure is really limiting our ability to process interesting graphs with millions of nodes and edges (think Facebook mutual friends, for instance).
To remedy this we will use a sparse matrix representation. For this assignment, we will use the most common sparse matrix storage scheme, the Compressed Sparse Columns (CSC) format. When the matrix is symmetric (as in our case), it is equivalent to the Compressed Sparse Row (CSR) format. Both MATLAB and Julia use the CSC format to store sparse matrices.
Now we can use graph datasets from the web. Make sure to download undirected, unweighted graphs. To load the sparse adjacency matrix use the Matrix Market (MM) format. Examples for parsing such files are found at https://math.nist.gov/MatrixMarket/mmio-c.html, e.g., example_read.c
which relies on mmio.h
and mmio.c
for reading any MM matrix.
Rewrite your code as V3 to use the CSC storage scheme and only follow edges. Adjust your input to be able to use large graphs from the web (n≈1,000,000 and m<30,000,000).
Parallelize your V3 code with OpenCilk and OpenMP, run it in parallel and compare sequential and parallel agreement in results and run times for different large datasets (we will propose which ones soon) and produce diagrams to study the scalability of your code.
We are ready to jump a bit further using the mathematical properties of the adjacency matrix A. The i,j element of the matrix-matrix product A×A is equal to the number of possible walks from node i to node j of length exactly 2, i.e. how many k exist that I can go from i to k and then from k to j. If there is also an edge A[i][j], then that is the number of triangles we are after!
Formally, the number of triangles c3 incident with each node is: C3 = (A ⊙ (A×A)) e/2
where ⊙ denotes the Hadamard (or element-wise) product and e is a vector of length n where every element equals 1. We divide the result by 2 because we count both ikj
and ijk
Check the validity of the formula by implementing it in a high-level language, such as MATLAB, Octave, Python, Julia
or whatever else you feel comfortable with.
Using the CSC storage scheme, implement in C/C++
the sparse matrix-vector product C=A×v, where v is a dense vector. Validate your implementation by comparing your results against multiple matrices A and vectors v.
(V4) Implement a routine to compute the masked sparse matrix-matrix product C=A⊙(A×A), where A is a symmetric, sparse matrix. Note: Due to the Hadamard product with A the result will have the same sparsity as the input matrix A. You do not need to compute the values C(i,j), where A(i,j)=0. The number of triangles c3 at each vertex node is C3 = C × e/2.
Caution: A×A will become dense, if there is even one node connected to most of the others. Make sure you use the masking to calculate only the elements of A×A at the nonzero positions of A.
Parallelize your masked sparse matrix-matrix product code (V4) with OpenCilk, OpenMP, and Pthreads run it in parallel and compare sequential and parallel agreement in results and run times for different large datasets (we will propose which ones soon) and produce diagrams to study the scalability of your code. Compare the triangle counting using the masked sparse matrix-matrix product and the V3 code, implemented in the preliminary.
The main project implementation is located in src/c
. Especially for the cilk version, the compiler that is used was g++ 7.5.0
. For that purpose use a gcc version under 8 to compile the program along with the cilkrts
runtime. You probably need to modify the Makefile to adjust the compiler settings. In Arch Linux, I have used the AUR package gcc7 to be compatible with cilk requirements.
All the versions (Pthread, OpenMP, Cilk) have a common main and the separation is achieved using macros. In the main.c
change the MODE
to the corresponding version you want to test. For example MODE == 1
will run all the cilk implementations including v3 and v4. If you want to test a specific function, you may need to comment some of the versions in the main.c
(not very convenient though...).
Each time you are using a new MODE
, you need to use the appropriate target for the make
. For example for MODE == 1
, then you need to run make cilk
, for MODE == 2
, you need to run make openmp
. Then the name of the executable file will be either main
, main_cilk
, main_openmp
, main_pthread
.
To adjust the number of threads for each version, there is a corresponding macro (e.g. NUM_THREADS) that you can edit manually in the source code to the desired number of threads. In the current implementation, we don't use command line arguments to configure the number of threads.
During the program execution, there are print statements for the number of threads for verification.
You need to pass one command line argument: This should be the .mtx
(Matrix Market format) file for the sparse matrix you want to test. For example under the src/c
you could create a new folder matrices
and store the <matrix file name>.mtx
. Then for the cilk version under src/c
you can compile and run the program doing the following:
make cilk
./main_cilk matrices/com-Youtube.mtx
The documentation of the functions is located in the header files, under src/c/include/
.