
A precise profiler for Python, optimized for data processing tasks in high-performance computing. Capable of sampling with metadata, using minimal instrumentation.


TraceQ

TraceQ is a specialized tool designed to provide accurate metric measurements for Python-based data processing applications. It integrates with the Linux /proc filesystem to deliver granular, detailed memory profiling, which is essential for optimizing resource allocation and improving the efficiency of large-scale computational tasks. Developed as part of a comprehensive study on memory management in Python, TraceQ is particularly effective in high-performance computing settings where precise memory profiling is critical.

Features

  • High-accuracy memory profiling using direct measurements from the Linux /proc filesystem.
  • Support for multiple memory profiling backends, including psutil and tracemalloc.
  • Granular and detailed memory usage analysis.
  • Optimized for data processing tasks.
  • Useful in high-performance computing environments for optimizing resource allocation.
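To illustrate the kind of direct /proc measurement mentioned above, here is a minimal, Linux-only sketch that reads a process's resident set size (VmRSS) from /proc/<pid>/status. This is not TraceQ's code; parse_status and rss_kib are hypothetical helpers written for this example.

```python
def parse_status(text):
    """Parse the 'Key:   value' lines of /proc/<pid>/status into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip any line without a colon
            fields[key] = value.strip()
    return fields


def rss_kib(pid="self"):
    """Resident set size in KiB, read directly from /proc (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        fields = parse_status(f.read())
    # VmRSS is reported as e.g. "10240 kB"; the kernel's "kB" means KiB.
    return int(fields["VmRSS"].split()[0])


if __name__ == "__main__":
    print(f"Current RSS: {rss_kib()} KiB")
```

For comparison, the psutil backend exposes a similar figure in bytes through psutil.Process().memory_info().rss.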

Installation

To install TraceQ, you can use pip:

pip install traceq

Alternatively, you can clone the repository and install it manually:

git clone https://github.com/discovery-unicamp/traceq.git
cd traceq
pip install .

Usage

TraceQ is designed to be easy to integrate into your existing Python projects. Below are some basic usage examples:

Profiling a Python Function

To profile memory usage of a specific function, you can use the profile decorator provided by TraceQ.

from traceq import profile

@profile
def task(data):
    # Your function goes here
    pass
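To show the general idea behind decorator-based memory profiling with one of the backends TraceQ supports, here is a minimal stand-in built on the standard library's tracemalloc module. It is not how TraceQ implements profile; peak_memory and the last_peak_bytes attribute are hypothetical names used only for this sketch.

```python
import functools
import tracemalloc


def peak_memory(func):
    """Hypothetical decorator: record the peak traced memory of each call."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        already_tracing = tracemalloc.is_tracing()
        if not already_tracing:
            tracemalloc.start()
        tracemalloc.reset_peak()  # measure only this call (Python 3.9+)
        try:
            return func(*args, **kwargs)
        finally:
            _, peak = tracemalloc.get_traced_memory()
            if not already_tracing:
                tracemalloc.stop()  # leave tracing as we found it
            wrapper.last_peak_bytes = peak
    wrapper.last_peak_bytes = None
    return wrapper


@peak_memory
def build(n):
    return [0] * n


build(100_000)
print(f"peak: {build.last_peak_bytes} bytes")
```

tracemalloc only sees allocations made through Python's allocators, which is one reason a tool may combine it with process-level measurements such as psutil or /proc.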

Configuration

All of TraceQ's behavior is controlled by a global configuration, which you can set and customize in several ways:

Configuration File

TraceQ uses a configuration file named traceq.toml, which should be placed in the root of your project directory. This file lets you specify the settings that control TraceQ's behavior; all available options are listed in the traceq.toml file in this repository.

Example Customization

Here’s an example of how you can customize some fields in the traceq.toml file:

output_dir = "./traceq_reports"

[logger]
enabled_transports = "console,file"
level = "debug"

[profiler]
enabled_metrics = "memory_usage"
sign_traces = "true"
precision = "3"

[profiler.memory_usage]
enabled_backends = "psutil,tracemalloc"

In this example, the output directory for reports is changed, logging is sent to both the console and a file at debug level, only the memory usage metric is enabled, trace signing is turned on, and the profiling precision is raised to 3. Finally, the memory usage backends are limited to psutil and tracemalloc.

Runtime Configuration

Alternatively, you can load configuration settings at runtime with the load_config function provided by TraceQ, passing a dictionary with the same structure as traceq.toml. This lets you inject settings dynamically while your application is running.

from traceq import load_config

load_config({
    "output_dir": "./traceq_reports",
    "logger": {
        "enabled_transports": "console,file",
        "level": "debug"
    },
    "profiler": {
        "enabled_metrics": "memory_usage",
        "sign_traces": "true",
        "precision": "3",
        "memory_usage": {
            "enabled_backends": "psutil,tracemalloc"
        }
    }
})

Environment Variables

You can also set configuration options using environment variables. All environment variables should be prefixed with TRACEQ_. This method is useful for dynamically setting configurations without modifying the code or configuration files.

Example Environment Variables

export TRACEQ_OUTPUT_DIR="./traceq_reports"
export TRACEQ_LOGGER_ENABLED_TRANSPORTS="console,file"
export TRACEQ_LOGGER_LEVEL="debug"
export TRACEQ_PROFILER_ENABLED_METRICS="memory_usage"
export TRACEQ_PROFILER_SIGN_TRACES="true"
export TRACEQ_PROFILER_PRECISION="3"
export TRACEQ_PROFILER_MEMORY_USAGE_ENABLED_BACKENDS="psutil,tracemalloc"
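The variable names above appear to follow a simple rule: the TRACEQ_ prefix, then the uppercased path to the option, with each level joined by an underscore. The sketch below assumes that rule (it is inferred from these examples, not from TraceQ's documentation) and derives the names from the same nested dictionary used with load_config; to_env_vars is a hypothetical helper.

```python
def to_env_vars(config, prefix="TRACEQ"):
    """Flatten a nested config dict into TRACEQ_* variable names (assumed rule)."""
    env = {}
    for key, value in config.items():
        name = f"{prefix}_{key.upper()}"
        if isinstance(value, dict):
            env.update(to_env_vars(value, name))  # recurse into sub-sections
        else:
            env[name] = str(value)
    return env


config = {
    "output_dir": "./traceq_reports",
    "logger": {"enabled_transports": "console,file", "level": "debug"},
    "profiler": {
        "enabled_metrics": "memory_usage",
        "sign_traces": "true",
        "precision": "3",
        "memory_usage": {"enabled_backends": "psutil,tracemalloc"},
    },
}

env = to_env_vars(config)
for name, value in env.items():
    print(f'export {name}="{value}"')
```

Running this prints the seven export lines shown above. The reverse mapping (variable name back to a nested key) is ambiguous, because option names such as memory_usage and enabled_backends themselves contain underscores, so it requires knowing the configuration schema.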

This flexibility allows you to tailor TraceQ's behavior to fit the specific requirements of your seismic data processing tasks, ensuring optimal performance and resource utilization.

Report

After your Python script finishes, TraceQ generates a report containing all the metrics collected during execution. The report is a .prof file encoded as gzip-compressed MessagePack.

TraceQ is still under development, and a tool to visualize the reports it generates is in progress.
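Because the report format is stated above (gzip-compressed MessagePack), its raw contents can already be decoded with the third-party msgpack package. This is a minimal sketch: read_report is a hypothetical helper, and since the report's internal schema is not documented here, it only returns the decoded objects without interpreting them.

```python
import gzip

import msgpack  # third-party: pip install msgpack


def read_report(path):
    """Decode every MessagePack object in a gzip-compressed .prof file."""
    with gzip.open(path, "rb") as f:
        # Unpacker iterates over consecutive objects, so this works whether
        # the file holds a single top-level object or a stream of them.
        unpacker = msgpack.Unpacker(f, raw=False, strict_map_key=False)
        return list(unpacker)
```

strict_map_key=False is set so that maps with non-string keys can be decoded, in case the report uses them.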

Contributing

We welcome contributions to TraceQ! If you have any ideas, suggestions, or bug reports, please open an issue on the GitHub repository. If you would like to contribute code, please fork the repository and submit a pull request.

License

TraceQ is licensed under the MIT License. See the LICENSE file for more details.

Acknowledgments

This tool was developed as part of a comprehensive study on memory management in Python-based seismic data processing applications, conducted by Daniel L. Fonseca and Edson Borin at the Institute of Computing, Unicamp, Brazil. Special thanks to Petrobras for their support and collaboration.
