Channel: Raspberry Pi Forums

Other • Raspberry Pi 5 (ARM64) + OpenVINO: Severe RAM growth when calling infer() in a loop (even without preprocessing)

Hello everyone,

I am developing an object detection project on a Raspberry Pi 5.
The camera side (Picamera2 / libcamera) works perfectly fine: when running the camera alone, RAM usage is completely stable over time.

However, when I move to the inference stage using OpenVINO, I encounter a serious issue.

Problem summary

Even when:

no camera frames are used

no image preprocessing is performed

no NumPy arrays are allocated inside the loop

no OpenCV operations are executed

Simply calling infer() repeatedly causes RSS memory to increase continuously, at a rate of tens of MB per second.

This strongly suggests a memory leak or unreleased memory inside OpenVINO on ARM, rather than an issue in user code.
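One way to support this conclusion is to compare the Python-level heap with the process RSS: if RSS climbs while the Python allocator's own accounting stays flat, the growth must come from native (C/C++) allocations such as the OpenVINO runtime. A minimal, stdlib-only sketch (Linux-specific, since it reads /proc; the sampling placement in the loop is up to you):

```python
import os
import tracemalloc

def python_heap_mb() -> float:
    """MB currently tracked by the Python allocator (tracemalloc)."""
    current, _peak = tracemalloc.get_traced_memory()
    return current / 1024 / 1024

def rss_mb() -> float:
    """Resident set size read from /proc/self/statm (Linux only), in MB."""
    with open("/proc/self/statm") as f:
        resident_pages = int(f.read().split()[1])  # field 2 = resident pages
    return resident_pages * os.sysconf("SC_PAGE_SIZE") / 1024 / 1024

tracemalloc.start()
# ... run the infer() loop here, sampling both counters every N frames.
# If rss_mb() rises while python_heap_mb() stays flat, the growth is in
# native allocations, not in Python objects.
print(f"python-heap={python_heap_mb():.1f} MB | rss={rss_mb():.1f} MB")
```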

Environment

Device: Raspberry Pi 5 (8GB)

OS: Raspberry Pi OS (64-bit)

Architecture: ARM64 (aarch64)

Python: 3.13

OpenVINO: 2025.4.1

Model: YOLO (OpenVINO IR, static shape 1×3×640×640)

Inference device: CPU

The same code runs without visible problems on a desktop PC, though on a machine with much more RAM the same growth could simply go unnoticed rather than being absent.

What I already ruled out

Camera buffers (Picamera2 / libcamera tested separately)

Image preprocessing (disabled completely)

NumPy allocations (all buffers pre-allocated)

OpenCV operations (not used in isolation test)

Python GC (manual gc.collect() does not help)

Test methodology

To isolate the problem, I wrote a minimal reproducible example that:

loads the model once

forces a static input shape

creates a single InferRequest

reuses input/output tensors

repeatedly calls infer() in a tight loop

monitors RSS memory via psutil

Even in this pure inference-only loop, RAM usage increases continuously.

Code:

import os
import glob
import time
import gc

import psutil
import numpy as np
import openvino.runtime as ov  # note: deprecated alias of "openvino" in 2025.x

MODEL_DIR = "yolo11n_openvino_model"
INPUT_W, INPUT_H = 640, 640


def get_rss_mb():
    """Current resident set size of this process, in MB."""
    process = psutil.Process(os.getpid())
    return process.memory_info().rss / 1024 / 1024


class YoloInferenceTest:
    def __init__(self, model_dir):
        self.core = ov.Core()
        xml_files = glob.glob(os.path.join(model_dir, "*.xml"))
        if not xml_files:
            raise FileNotFoundError("Model XML not found")
        model = self.core.read_model(xml_files[0])
        model.reshape([1, 3, INPUT_H, INPUT_W])  # force static input shape
        self.compiled_model = self.core.compile_model(model, "CPU")
        # Single InferRequest, created once and reused for every frame
        self.infer_request = self.compiled_model.create_infer_request()
        self.input_tensor = self.infer_request.get_input_tensor()
        self.input_data = self.input_tensor.data
        self.input_data[:] = 0.0  # constant dummy input, no per-frame allocation

    def infer_only(self):
        self.infer_request.infer()
        # Touch the output so the read cannot be optimized away
        _ = self.infer_request.get_output_tensor().data[0, 0, 0]


def main():
    yolo = YoloInferenceTest(MODEL_DIR)
    frames = 0
    start = time.time()
    while True:
        try:
            yolo.infer_only()
            frames += 1
            if frames % 30 == 0:
                rss = get_rss_mb()
                elapsed = time.time() - start
                fps = frames / elapsed
                print(f"T={elapsed:.0f}s | FPS={fps:.1f} | RSS={rss:.1f} MB")
            if frames % 100 == 0:
                gc.collect()  # manual GC; does not stop the RSS growth
        except KeyboardInterrupt:
            break


if __name__ == "__main__":
    main()
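A related allocator knob that may be worth testing on a reply basis: OpenVINO's CPU plugin runs worker threads, and glibc gives each thread its own malloc arena, which can inflate RSS on multi-core ARM. Capping the arena count (equivalent to launching with MALLOC_ARENA_MAX=1 in the environment) is a common mitigation. A sketch using mallopt, assuming glibc's M_ARENA_MAX constant (-8 in <malloc.h>); this must run before any threads are spawned, i.e. before importing OpenVINO:

```python
import ctypes
import ctypes.util

M_ARENA_MAX = -8  # mallopt parameter constant from glibc <malloc.h>

def cap_malloc_arenas(n: int = 1) -> bool:
    """Limit glibc to n malloc arenas; call at the very top of the script.

    Returns True if mallopt reported success, False on failure or on
    non-glibc systems.
    """
    libc_name = ctypes.util.find_library("c")
    if libc_name is None:
        return False
    try:
        libc = ctypes.CDLL(libc_name)
        return libc.mallopt(M_ARENA_MAX, n) == 1
    except (OSError, AttributeError):
        return False

print("arenas capped:", cap_malloc_arenas(1))
```

The shell-level equivalent, which avoids any timing concerns, is to launch the test as MALLOC_ARENA_MAX=1 python3 test.py.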

Statistics: Posted by kwostra — Fri Jan 16, 2026 9:54 pm


