Hello everyone,
I am developing an object detection project on a Raspberry Pi 5.
The camera side (Picamera2 / libcamera) works perfectly fine: when running the camera alone, RAM usage is completely stable over time.
However, when I move to the inference stage using OpenVINO, I encounter a serious issue.
Problem summary
Even when:
- no camera frames are used
- no image preprocessing is performed
- no NumPy arrays are allocated inside the loop
- no OpenCV operations are executed
simply calling infer() repeatedly causes RSS memory to grow continuously, at a rate of tens of MB per second.
This strongly suggests a memory leak or unreleased memory inside OpenVINO on ARM, rather than an issue in user code.
Environment
Device: Raspberry Pi 5 (8GB)
OS: Raspberry Pi OS (64-bit)
Architecture: ARM64 (aarch64)
Python: 3.13
OpenVINO: 2025.4.1
Model: YOLO (OpenVINO IR, static shape 1×3×640×640)
Inference device: CPU
The same logic runs without visible problems on a desktop PC, but with far more RAM available the growth may simply take much longer to become noticeable.
What I already ruled out
Camera buffers (Picamera2 / libcamera tested separately)
Image preprocessing (disabled completely)
NumPy allocations (all buffers pre-allocated)
OpenCV operations (not used in isolation test)
Python GC (manual gc.collect() does not help)
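One additional check that can separate Python-level growth from native growth (a standalone sketch, not tied to OpenVINO): tracemalloc only counts allocations made through Python's allocator, so if RSS keeps climbing while the tracemalloc figure stays flat, the leak is in native (C/C++) code rather than in Python objects.

```python
import tracemalloc

tracemalloc.start()

# Simulated workload: allocations made through Python's allocator
# ARE visible to tracemalloc (unlike native allocations inside a
# C++ runtime such as OpenVINO's CPU plugin).
buffers = [bytes(1024) for _ in range(1000)]  # roughly 1 MiB of Python objects

current, peak = tracemalloc.get_traced_memory()
print(f"Python-level allocations: {current / 1024:.0f} KiB")
tracemalloc.stop()
```

In the real loop, sampling get_traced_memory() alongside RSS every few hundred infer() calls should show the Python-level figure staying flat while RSS grows, which would support the native-leak hypothesis.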
Test methodology
To isolate the problem, I wrote a minimal reproducible example that:
loads the model once
forces a static input shape
creates a single InferRequest
reuses input/output tensors
repeatedly calls infer() in a tight loop
monitors RSS memory via psutil
Even in this pure inference-only loop, RAM usage increases continuously.
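To cross-check psutil's numbers (and rule out psutil itself as a contributor), RSS can also be read directly from /proc on Raspberry Pi OS. A minimal Linux-only sketch that samples RSS around a batch of calls and estimates the growth rate:

```python
import time

def rss_kib():
    """Read the current resident set size (in KiB) from /proc (Linux-only)."""
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])  # second field is the value in KiB
    raise RuntimeError("VmRSS not found in /proc/self/status")

# Sample before/after a workload and estimate the growth rate in MB/s.
t0, r0 = time.time(), rss_kib()
# ... run a batch of infer() calls here ...
t1, r1 = time.time(), rss_kib()
rate_mb_s = (r1 - r0) / 1024 / max(t1 - t0, 1e-9)
print(f"RSS growth: {rate_mb_s:.1f} MB/s")
```

If this independent reading matches psutil's, the measurement itself can be trusted and the growth rate can be compared across OpenVINO versions or build options.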
Code:

import os
import glob
import time
import gc
import psutil
import numpy as np
import openvino.runtime as ov

MODEL_DIR = "yolo11n_openvino_model"
INPUT_W, INPUT_H = 640, 640

def get_rss_mb():
    process = psutil.Process(os.getpid())
    return process.memory_info().rss / 1024 / 1024

class YoloInferenceTest:
    def __init__(self, model_dir):
        self.core = ov.Core()
        xml_files = glob.glob(os.path.join(model_dir, "*.xml"))
        if not xml_files:
            raise FileNotFoundError("Model XML not found")
        model = self.core.read_model(xml_files[0])
        model.reshape([1, 3, INPUT_H, INPUT_W])
        self.compiled_model = self.core.compile_model(model, "CPU")
        self.infer_request = self.compiled_model.create_infer_request()
        self.input_tensor = self.infer_request.get_input_tensor()
        self.input_data = self.input_tensor.data
        self.input_data[:] = 0.0

    def infer_only(self):
        self.infer_request.infer()
        _ = self.infer_request.get_output_tensor().data[0, 0, 0]

def main():
    yolo = YoloInferenceTest(MODEL_DIR)
    frames = 0
    start = time.time()
    while True:
        try:
            yolo.infer_only()
            frames += 1
            if frames % 30 == 0:
                rss = get_rss_mb()
                elapsed = time.time() - start
                fps = frames / elapsed
                print(f"T={elapsed:.0f}s | FPS={fps:.1f} | RSS={rss:.1f} MB")
            if frames % 100 == 0:
                gc.collect()
        except KeyboardInterrupt:
            break

if __name__ == "__main__":
    main()

Statistics: Posted by kwostra — Fri Jan 16, 2026 9:54 pm