EnosLib TPDS Codes

EnosLib is a Python library that helps researchers develop their experimental artifacts and execute them over different infrastructures.

This repository contains programs from the EnosLib paper for TPDS 2020.

Initial Setup

Programs in this repository assume Python 3.8 and rely on pipenv to install EnosLib and its dependencies and to execute the code snippets. They also require Vagrant (>= 2.2.5) with VirtualBox to set up an infrastructure.

After installing pipenv and Vagrant, do the following to set up the environment:

$ git clone --single-branch --branch tpds-2020 --depth 1 https://github.com/BeyondTheClouds/enoslib-paper.git
$ cd enoslib-paper/
$ pipenv install

Test Programs

Pay attention to the logs while you execute the programs to get insight into their execution.

Fig. 3 (link)

pipenv run python fig3.py

Fig. 3 uses the declarative resources description in 5rdbms2sys-vagrant.yaml to start the infrastructure. The description tells the provider to provision, with Debian 10, 3 database machines, 2 database/client machines, 1 database network and 1 monitor network.
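As an illustration of the information such a description carries, here is a hedged sketch written as a Python dictionary; the key names and values are assumptions, and 5rdbms2sys-vagrant.yaml remains the authoritative version.

# Hedged sketch of a declarative resources description as a Python dict;
# key names and values are assumptions, 5rdbms2sys-vagrant.yaml is authoritative.
description = {
    "backend": "virtualbox",
    "box": "debian/buster64",  # Debian 10
    "resources": {
        "machines": [
            {"roles": ["database"], "number": 3},            # 3 database machines
            {"roles": ["database", "client"], "number": 2},  # 2 database/client machines
        ],
        "networks": [
            {"roles": ["database"], "cidr": "192.168.42.0/24"},  # database network
            {"roles": ["monitor"], "cidr": "192.168.43.0/24"},   # monitor network
        ],
    },
}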

Infra requirements: 3 CPUs, 2.5 GB RAM

Fig. 4 (link)

pipenv run python fig4.py

Fig. 4 defines a setup_galera function that relies on Roles to contextualize machines differently. It installs a MariaDB Galera Cluster on one list of Hosts, and Sysbench on another one.

A Host is an abstract unit of computation that can be bound to a bare-metal machine, a virtual machine or a container. A Role is a label that identifies Hosts and Networks that share the same behavior. Thanks to roles, an experimenter can define code that will only be executed on specific resources.
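As a hedged sketch of how Roles drive such a function, the snippet below runs different shell commands on the database and client roles. It assumes EnosLib's run_command helper and illustrative package names; the actual setup_galera in fig4.py is the reference.

# Hedged sketch of a role-based setup function; the real setup_galera lives in
# fig4.py, and the package names below are illustrative assumptions.
from enoslib.api import run_command

def setup_galera(roles):
    # Install the MariaDB Galera cluster only on hosts holding the "database" role.
    run_command("apt-get install -y mariadb-server",
                pattern_hosts="database", roles=roles)
    # Install Sysbench only on hosts holding the "client" role.
    run_command("apt-get install -y sysbench",
                pattern_hosts="client", roles=roles)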

Executing fig4.py provisions 2 database machines and 2 database/client machines. It then executes the setup_galera function, which installs MariaDB and Galera on the 4 database machines and Sysbench on the 2 client machines.

Infra requirements: 4 CPUs, 2 GB RAM

Fig. 5 (link)

pipenv run sudo python fig5.py

Fig. 5 defines an analyze_galera function that analyzes the Galera protocol traffic of a Network.

A Network is an abstraction for a network that can be bound to a flat network, a virtual (extensible) LAN, …

Executing fig5.py first provisions 2 machines and 1 private network with Vagrant. Then, it installs and sets up Galera with an Ansible file; in particular, that file configures Galera to use the previously provisioned private network. Finally, the Python script executes the analyze_galera function. This function builds a specific pcap filter using the CIDR of the Network given in argument. The analysis requires libpcap and tcpdump under the hood, and so needs to be run with sudo, unless your machine is configured properly. If libpcap is not linked properly, you may run the script as follows.

pipenv run sudo LD_LIBRARY_PATH=path/to/libpcap/lib python fig5.py
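For intuition about how a capture filter can be derived from a Network, here is a minimal standalone sketch; the example CIDR is an assumption, and analyze_galera in fig5.py remains the reference.

# Minimal standalone sketch: derive a pcap/BPF filter from a Network's CIDR.
# analyze_galera in fig5.py is the reference; the example CIDR is an assumption.
import ipaddress

def galera_pcap_filter(cidr):
    net = ipaddress.ip_network(cidr)  # validates the CIDR, e.g. "192.168.42.0/24"
    # Galera replicates over TCP port 4567, the port used by the tcpdump below.
    return f"tcp and port 4567 and net {net}"

print(galera_pcap_filter("192.168.42.0/24"))
# tcp and port 4567 and net 192.168.42.0/24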

Vagrant implements a private Network in VirtualBox with the host-only adapter. In this mode, the traffic between VMs never leaves VirtualBox, so sniffing from the host machine is impossible. To bypass this limitation, this example uses the Libvirt backend. Alternatively, you can run sudo vagrant ssh enos-0-1 and execute the following tcpdump there.

tcpdump -i eth1 'tcp and port 4567'

Infra requirements: 2 CPUs, 1 GB RAM

Fig. 6 (link)

pipenv run python fig6.py

Fig. 6 exploits Ansible’s modules to efficiently define a reusable monitoring function.

So far, the programs have used the EnosLib run function to contextualize resources. This function is easy to use and runs shell commands on machines, but it lacks expressivity when it comes to developing complex artifacts. This code example instead relies on Ansible modules, which are reusable scripts, to define a generic function called monitor.

The monitor function installs a TIG stack that:

  1. Collects metrics on monitored machines with Telegraf.
  2. Stores and shows metrics on aggregator machines with InfluxDB and Grafana.
  3. Uses the monitor network to transport metrics from monitored to aggregator.
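As a hedged sketch of such a module-based function (the real monitor function in fig6.py is the reference), the snippet below uses EnosLib's play_on context manager to call Ansible's apt and service modules per role; the exact API and the package names are assumptions and may differ from the version used by the paper.

# Hedged sketch of a reusable monitoring function built on Ansible modules.
# The monitor function in fig6.py is the reference; play_on, the module
# arguments and the package names below are assumptions.
from enoslib.api import play_on

def monitor(roles):
    # Telegraf collects metrics on every monitored machine.
    with play_on(pattern_hosts="monitored", roles=roles) as p:
        p.apt(name="telegraf", state="present", update_cache=True)
        p.service(name="telegraf", state="started")
    # InfluxDB stores the metrics and Grafana displays them on the aggregator.
    with play_on(pattern_hosts="aggregator", roles=roles) as p:
        p.apt(name=["influxdb", "grafana"], state="present", update_cache=True)
        p.service(name="influxdb", state="started")
        p.service(name="grafana-server", state="started")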

Executing fig6.py provisions 2 database/monitored machines, 2 database/client/monitored machines, 1 aggregator machine, 1 database network and 1 aggregator network. It then executes the monitor function, which installs a TIG stack. The log outputs a URL where the collected metrics can be viewed in Grafana.

This example also shows that Roles break the code apart into units that are each responsible for one behavior. This makes these units easy to share and reuse. In this regard, the monitor function is an excerpt of the monitoring service from EnosLib.

Infra requirements: 5 CPUs, 2.5 GB RAM

Fig. 2 (link)

pipenv run python fig2.py

Fig. 2 defines an experiment function that:

  1. acquires resources on a testbed
  2. installs the distributed RDBMS and third-party software, and sets the latency between RDBMS nodes
  3. benchmarks the distributed RDBMS
  4. collects and backs up all logs

Executing fig2.py calls the experiment function with the VirtualBox provider and code to install Galera. It provisions 4 machines for the distributed RDBMS and 2 private networks: one for RDBMS communication, and one for monitoring.
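A hedged skeleton of such an experiment function could look as follows; provider.init() follows EnosLib's usual provider interface, while install, bench and backup are hypothetical callables standing in for the code used by the real experiment function in fig2.py.

# Hedged skeleton of an experiment function in the spirit of fig2.py;
# install, bench and backup are hypothetical placeholders.
def experiment(provider, install, bench, backup):
    # 1. Acquire resources (machines and networks) on the testbed.
    roles, networks = provider.init()
    # 2. Install the distributed RDBMS and third-party software,
    #    and set the latency between RDBMS nodes.
    install(roles, networks)
    # 3. Benchmark the distributed RDBMS.
    bench(roles)
    # 4. Collect and back up all logs.
    backup(roles)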

Infra requirements: 4 CPUs, 2 GB RAM

Fig. 10 (link)

pipenv run python fig10.py

Fig. 10 is an example of an artifact with many parameters to test. It divides the artifact workflow into 4 phases, deploy, bench, backup and destroy, to iterate over the set of parameters. It relies on a sweeper that persists whether the current iteration succeeded (done) or needs to be retried (skip).
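As a hedged sketch of this pattern (fig10.py is the reference), the loop below assumes execo_engine's ParamSweeper for persistence; the parameter values and the deploy/bench/backup/destroy callables are illustrative placeholders for the four phases.

# Hedged sketch of a persistent parameter sweep in the spirit of fig10.py.
# It assumes execo_engine's ParamSweeper; the parameters and the four phase
# callables (deploy, bench, backup, destroy) are placeholders.
from execo_engine import ParamSweeper, sweep

def run_campaign(deploy, bench, backup, destroy):
    parameters = {"nb_clients": [1, 2, 4], "latency": ["0ms", "10ms", "50ms"]}
    # The sweeper persists its state on disk so an interrupted campaign
    # resumes where it stopped.
    sweeper = ParamSweeper(persistence_dir="sweeps", sweeps=sweep(parameters))
    current = sweeper.get_next()
    while current:
        try:
            deploy(current)        # phase 1: deploy
            bench(current)         # phase 2: bench
            backup(current)        # phase 3: backup
            sweeper.done(current)  # this combination succeeded
        except Exception:
            sweeper.skip(current)  # retry this combination later
        finally:
            destroy(current)       # phase 4: destroy
        current = sweeper.get_next()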