Gioel Gianni, "Dynamic analysis for Python", Diploma Work, School of Electrical and Computer Engineering, Technical University of Crete, Chania, Greece, 2025
https://doi.org/10.26233/heallink.tuc.105001
Dynamic program analysis is a critical technique for understanding, monitoring, and controlling the behavior of applications at runtime. This thesis presents Cerbex, a new dynamic analysis framework that targets the Python programming language. The tool leverages Python’s import mechanism to interpose monitoring code during library loading, inserting hooks and wrappers around functions without modifying the source code. By combining instrumentation at the module import level with the sys.setprofile mechanism for C extensions, Cerbex covers both Python code and low-level libraries such as math, numpy, and pandas.
The core of Cerbex supports two distinct modes of operation. In learning mode, it records execution events and module dependencies in files such as events.json and dependencies.json, which are then combined to produce allowlist.json. In enforcement mode, this consolidated file is used to block unauthorized imports and function calls in real time. In parallel, the analyzers generate additional logs, including perf.log for performance and types.log for return types. The tool’s architecture is modular and based on pluggable analyzers, so it can easily be extended with new forms of control. For instance, the PerfAnalyzer logs function execution times, while the TypeExtractor collects return types, which demonstrates the flexibility of the approach.
The evaluation of Cerbex shows that the tool introduces a noticeable relative overhead in small-scale or short-lived executions, where the fixed cost of the instrumentation mechanism dominates. In contrast, in more complex applications, large frameworks, and computationally intensive scenarios, the relative overhead decreases significantly. In such environments, Cerbex combines visibility, policy enforcement, and in-depth understanding of execution with satisfactory performance, which makes it well suited to medium and heavy workloads.
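The mechanisms named in the abstract can be illustrated with a minimal sketch. It is not Cerbex's implementation: the names `Monitor`, `WrappingLoader`, and `HookFinder` are hypothetical, and the allowlist is assumed to be a plain set of dotted function names, not the actual schema of allowlist.json. The sketch wraps the functions of selected modules at import time through a meta path finder, records calls when no allowlist is given (learning), rejects calls outside the allowlist otherwise (enforcement), and records C-level calls through `sys.setprofile`. Blocking of imports and enforcement of C calls are not shown.

```python
import functools
import importlib.abc
import sys
import types


class Monitor:
    """Hypothetical recorder/enforcer; not Cerbex's real API."""

    def __init__(self, allowlist=None):
        self.allowlist = allowlist  # None means learning mode
        self.events = []

    def check(self, name):
        if self.allowlist is None:
            self.events.append(name)  # learning: record the call
        elif name not in self.allowlist:
            raise PermissionError(f"call not in allowlist: {name}")

    def wrap(self, func):
        name = f"{func.__module__}.{func.__qualname__}"

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            self.check(name)  # runs before every call to func
            return func(*args, **kwargs)

        return wrapper

    def profile(self, frame, event, arg):
        # For "c_call" events, arg is the C function about to run.
        if event == "c_call" and getattr(arg, "__module__", None):
            self.events.append(f"{arg.__module__}.{arg.__name__}")

    def run_profiled(self, fn):
        # Record C-level calls made while fn() executes.
        sys.setprofile(self.profile)
        try:
            return fn()
        finally:
            sys.setprofile(None)


class WrappingLoader(importlib.abc.Loader):
    """Delegates loading, then wraps the module's own functions."""

    def __init__(self, inner, monitor):
        self.inner = inner
        self.monitor = monitor

    def create_module(self, spec):
        return self.inner.create_module(spec)

    def exec_module(self, module):
        self.inner.exec_module(module)
        for attr, obj in list(vars(module).items()):
            defined_here = getattr(obj, "__module__", None) == module.__name__
            if isinstance(obj, types.FunctionType) and defined_here:
                setattr(module, attr, self.monitor.wrap(obj))


class HookFinder(importlib.abc.MetaPathFinder):
    """Intercepts imports of the target modules only."""

    def __init__(self, monitor, targets):
        self.monitor = monitor
        self.targets = set(targets)

    def find_spec(self, fullname, path, target=None):
        if fullname not in self.targets:
            return None  # let the regular finders handle it
        for finder in sys.meta_path:
            if finder is self or not hasattr(finder, "find_spec"):
                continue
            spec = finder.find_spec(fullname, path, target)
            if spec is not None and spec.loader is not None:
                spec.loader = WrappingLoader(spec.loader, self.monitor)
                return spec
        return None
```

Under these assumptions, placing `HookFinder(monitor, {"colorsys"})` at the front of `sys.meta_path` before the first `import colorsys` routes that module's functions through `Monitor.check`, while `monitor.run_profiled(...)` collects C-level calls such as `math.sqrt`. A module that is already present in `sys.modules` is not re-imported, so the finder has to be installed before the target module is first loaded.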