TraceQ is a specialized tool for accurately measuring metrics of Python-based data processing applications.
It integrates with the Linux /proc filesystem to deliver granular memory profiling, which is essential for optimizing resource allocation and improving the efficiency of large-scale computational tasks.
Developed as part of a comprehensive study on memory management in Python, TraceQ is particularly effective in high-performance computing settings where precise memory profiling is critical.
- High-accuracy memory profiling using direct measurements from the Linux /proc filesystem.
- Support for multiple memory profiling backends, including psutil and tracemalloc.
- Granular and detailed memory usage analysis.
- Optimized for data processing tasks.
- Useful in high-performance computing environments for optimizing resource allocation.
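To make the first feature concrete, here is a minimal sketch of how a resident-memory reading can be taken directly from the Linux /proc filesystem. It is an illustration only, not TraceQ's actual implementation: the function names parse_vm_rss_kib and current_rss_kib are hypothetical, and the sketch relies solely on the standard VmRSS field of /proc/<pid>/status.

```python
import re
from pathlib import Path


def parse_vm_rss_kib(status_text):
    """Extract the VmRSS value (in KiB) from the text of /proc/<pid>/status.

    Returns None when the field is absent (for example, for kernel threads).
    """
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB$", status_text, flags=re.MULTILINE)
    return int(match.group(1)) if match else None


def current_rss_kib(pid="self"):
    """Read the resident set size of a process from /proc (Linux only)."""
    return parse_vm_rss_kib(Path(f"/proc/{pid}/status").read_text())


if __name__ == "__main__":
    # Only attempt the live reading where /proc is actually available.
    if Path("/proc/self/status").exists():
        print(f"Current RSS: {current_rss_kib()} KiB")
```

Unlike tracemalloc, which only sees allocations made through the Python allocator, a /proc reading reflects the whole process, including memory held by native extensions.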
To install TraceQ, you can use pip:
pip install traceq
Alternatively, you can clone the repository and install it manually:
git clone https://github.com/discovery-unicamp/traceq.git
cd traceq
pip install .
TraceQ is designed to be easy to integrate into your existing Python projects. Below are some basic usage examples:
To profile the memory usage of a specific function, you can use the profile decorator provided by TraceQ.
from traceq import profile

@profile
def task(data):
    # Your function goes here
    pass
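For readers curious about what a memory-profiling decorator does in principle, the following is a minimal sketch built on the standard-library tracemalloc module, one of the backends TraceQ supports. It is not TraceQ's profile decorator: the name peak_memory and the last_peak_bytes attribute are hypothetical, and the sketch assumes tracemalloc is not already tracing and that decorated calls are not nested.

```python
import functools
import tracemalloc


def peak_memory(func):
    """Hypothetical decorator: record the peak Python-level allocation of a call."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        tracemalloc.start()
        try:
            return func(*args, **kwargs)
        finally:
            # get_traced_memory() returns (current, peak) in bytes.
            _, peak = tracemalloc.get_traced_memory()
            tracemalloc.stop()
            wrapper.last_peak_bytes = peak

    wrapper.last_peak_bytes = None
    return wrapper


@peak_memory
def task(n):
    # Stand-in workload: allocate a list with n elements.
    return [0] * n


if __name__ == "__main__":
    task(100_000)
    print(f"Peak traced allocation: {task.last_peak_bytes} bytes")
```

A real profiler would typically sample memory during the call and write the samples to a report rather than keeping a single number on the function object.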
All of TraceQ's behavior is controlled by a global configuration, which you can set and customize in several ways:
TraceQ uses a configuration file named traceq.toml, which should be placed in the root of your project directory.
This file allows you to specify various settings to control the behavior of TraceQ.
You can check all the available options in the traceq.toml file in this repository.
Below is an example of how you can customize some fields in the traceq.toml file:
output_dir = "./traceq_reports"
[logger]
enabled_transports = "console,file"
level = "debug"
[profiler]
enabled_metrics = "memory_usage"
sign_traces = "true"
precision = "3"
[profiler.memory_usage]
enabled_backends = "psutil,tracemalloc"
In this example, the output directory for reports is changed, logging is sent to both the console and a file at the debug level, only the memory usage metric is enabled, trace signing is turned on, and the profiling precision is set to 3. Finally, the memory usage backends are limited to psutil and tracemalloc.
Alternatively, you can load the configuration at runtime using the load_config function provided by TraceQ.
This allows you to inject configuration settings as a dictionary while your application is running.
from traceq import load_config

load_config({
    "output_dir": "./traceq_reports",
    "logger": {
        "enabled_transports": "console,file",
        "level": "debug"
    },
    "profiler": {
        "enabled_metrics": "memory_usage",
        "sign_traces": "true",
        "precision": "3",
        "memory_usage": {
            "enabled_backends": "psutil,tracemalloc"
        }
    }
})
You can also set configuration options using environment variables.
All environment variables should be prefixed with TRACEQ_. This method is useful for dynamically setting configurations without modifying the code or configuration files.
export TRACEQ_OUTPUT_DIR="./traceq_reports"
export TRACEQ_LOGGER_ENABLED_TRANSPORTS="console,file"
export TRACEQ_LOGGER_LEVEL="debug"
export TRACEQ_PROFILER_ENABLED_METRICS="memory_usage"
export TRACEQ_PROFILER_SIGN_TRACES="true"
export TRACEQ_PROFILER_PRECISION="3"
export TRACEQ_PROFILER_MEMORY_USAGE_ENABLED_BACKENDS="psutil,tracemalloc"
This flexibility allows you to tailor TraceQ's behavior to fit the specific requirements of your seismic data processing tasks, ensuring optimal performance and resource utilization.
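The environment variable names above appear to follow the nested configuration keys, upper-cased and joined with underscores. The sketch below shows one way such a mapping could be resolved. It is an assumption about the naming scheme, not TraceQ's actual loader: the SCHEMA dictionary and the config_from_env function are hypothetical. Walking a known schema sidesteps the ambiguity of splitting names such as MEMORY_USAGE on underscores.

```python
import os

# Hypothetical schema mirroring the keys of the example configuration.
# None marks a leaf setting; a dict marks a nested section.
SCHEMA = {
    "output_dir": None,
    "logger": {"enabled_transports": None, "level": None},
    "profiler": {
        "enabled_metrics": None,
        "sign_traces": None,
        "precision": None,
        "memory_usage": {"enabled_backends": None},
    },
}


def config_from_env(environ, schema=SCHEMA, prefix="TRACEQ"):
    """Build a nested dict from environment variables named after schema paths."""
    config = {}
    for key, sub in schema.items():
        name = f"{prefix}_{key.upper()}"
        if isinstance(sub, dict):
            # Recurse into the section, extending the variable-name prefix.
            nested = config_from_env(environ, sub, name)
            if nested:
                config[key] = nested
        elif name in environ:
            config[key] = environ[name]
    return config


if __name__ == "__main__":
    # Prints only the settings actually present in the current environment.
    print(config_from_env(os.environ))
```

The resulting dictionary has the same shape as the one passed to load_config in the earlier example; unset variables are simply omitted.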
After your Python script finishes executing, TraceQ generates a report containing all the metrics collected during the run.
The report is a .prof file, encoded as gzip-compressed MessagePack.
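Until a dedicated viewer exists, a report can in principle be inspected by hand. The sketch below assumes the whole .prof file is a single gzip stream wrapping one MessagePack document, as the description suggests; the internal layout of the decoded data is not documented here, so the sketch makes no assumption about it. The function names are hypothetical, and decoding requires the third-party msgpack package (pip install msgpack).

```python
import gzip


def read_report_bytes(path):
    """Return the decompressed payload of a gzip-compressed report file."""
    with gzip.open(path, "rb") as fh:
        return fh.read()


def load_report(path):
    """Decode a report into plain Python objects (dicts, lists, numbers, strings)."""
    # Imported lazily so that read_report_bytes works without msgpack installed.
    import msgpack

    return msgpack.unpackb(read_report_bytes(path), raw=False)


# Example usage (the file name is a placeholder):
# report = load_report("traceq_reports/example.prof")
# print(type(report))
```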
TraceQ is still under development, and we are working on a tool to visualize the reports it generates.
We welcome contributions to TraceQ! If you have any ideas, suggestions, or bug reports, please open an issue on the GitHub repository. If you would like to contribute code, please fork the repository and submit a pull request.
TraceQ is licensed under the MIT License. See the LICENSE file for more details.
This tool was developed as part of a comprehensive study on memory management in Python-based seismic data processing applications, conducted by Daniel L. Fonseca and Edson Borin at the Institute of Computing, Unicamp, Brazil. Special thanks to Petrobras for their support and collaboration.