Some tips:
The flash is external, running on a 4-bit QSPI bus, which is much slower than the internal RAM.
I think an XIP cache miss can take more than 32 clk_sys cycles.
A miss usually needs the address + M field + dummy cycles (12 QSPI clocks) plus the 64-bit cache line itself (16 QSPI clocks).
With the default QMI clock divider of 3 (set by the bootrom), that gives (12+16)*3 = 84 clk_sys cycles per miss.
If an interrupt arrives during such a fetch, its handling is stalled too until the fetch finishes.
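The arithmetic above can be written down as a small cost model. This is a sketch under the same assumptions as the text (12 QSPI clocks of address/M/dummy overhead, a 64-bit cache line, 4 data lines) and it deliberately ignores any extra chip-select or bus-arbitration overhead; the function names are hypothetical.

```c
#include <stdint.h>

// Assumed per-miss overhead: address + M (mode) field + dummy cycles.
#define XIP_OVERHEAD_QSPI_CLKS 12u
// Assumed cache line size and QSPI bus width, as in the text.
#define XIP_LINE_BITS          64u
#define QSPI_DATA_LINES        4u

// clk_sys cycles for one XIP cache miss; each QSPI clock lasts qmi_clkdiv clk_sys cycles.
uint32_t xip_miss_sys_cycles(uint32_t qmi_clkdiv)
{
    uint32_t data_clks = XIP_LINE_BITS / QSPI_DATA_LINES; // 16 QSPI clocks
    return (XIP_OVERHEAD_QSPI_CLKS + data_clks) * qmi_clkdiv;
}

// The same miss expressed in nanoseconds for a given clk_sys frequency.
double xip_miss_ns(uint32_t qmi_clkdiv, double clk_sys_hz)
{
    return xip_miss_sys_cycles(qmi_clkdiv) * 1e9 / clk_sys_hz;
}
```

With divider 3 this gives 84 cycles, i.e. about 0.56 us at a 150 MHz clk_sys, which is also roughly the extra delay an interrupt can see if it lands on one in-flight miss. Divider 2 would bring it down to 56 cycles.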
You can speed up the flash (worthwhile even for the non-critical core) by tuning the QMI clock divider in boot stage 2; the setting can go in the board definition.
For that stage to run at all, you must add PICO_EMBED_XIP_SETUP=1 in CMakeLists.txt.
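A minimal sketch of such a board definition, assuming the pico-sdk boot stage 2 takes its QMI divider from PICO_FLASH_SPI_CLKDIV (check the boot2 sources of your SDK version). The file name and include guard are hypothetical.

```c
// my_board.h: hypothetical custom board header (file and guard names are illustrative).
// Assumption: boot stage 2 reads PICO_FLASH_SPI_CLKDIV to program the QMI clock divider.
#ifndef _BOARDS_MY_BOARD_H
#define _BOARDS_MY_BOARD_H

// QSPI SCK = clk_sys / PICO_FLASH_SPI_CLKDIV.
// 2 gives 75 MHz at a 150 MHz clk_sys; verify the flash part is rated for it.
#ifndef PICO_FLASH_SPI_CLKDIV
#define PICO_FLASH_SPI_CLKDIV 2
#endif

#endif // _BOARDS_MY_BOARD_H
```

The header is selected from CMake via PICO_BOARD (plus PICO_BOARD_HEADER_DIRS if it lives outside the SDK). At higher SCK rates the read-sample delay may also need adjusting, so test with the actual flash chip.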
To catch unwanted accesses from the SWD debugger (I see you use it), you can use DWT watchpoints, as described here:
https://interrupt.memfault.com/blog/cor ... atchpoints
For cycle counting, DWT_CYCCNT can come in handy: https://mcuoneclipse.com/2017/01/30/cyc ... -with-dwt/
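A minimal register-level sketch of DWT_CYCCNT use, assuming the Cortex-M33 cores implement the cycle counter (DWT_CTRL.NOCYCCNT reads 0) and that counting is permitted in the current security state. The addresses are the architectural Armv7-M/Armv8-M debug registers; the helper names are hypothetical.

```c
#include <stdint.h>

// Architectural debug registers (only meaningful when running on the target).
#define DEMCR               (*(volatile uint32_t *)0xE000EDFCu)
#define DWT_CTRL            (*(volatile uint32_t *)0xE0001000u)
#define DWT_CYCCNT          (*(volatile uint32_t *)0xE0001004u)
#define DEMCR_TRCENA        (1u << 24) // enables the DWT/ITM blocks
#define DWT_CTRL_CYCCNTENA  (1u << 0)  // starts the cycle counter

// Enable and reset the free-running 32-bit cycle counter (call once on target).
void cyccnt_enable(void)
{
    DEMCR |= DEMCR_TRCENA;
    DWT_CYCCNT = 0;
    DWT_CTRL |= DWT_CTRL_CYCCNTENA;
}

// Current counter value.
static inline uint32_t cyccnt_read(void)
{
    return DWT_CYCCNT;
}

// Cycles between two samples; unsigned subtraction absorbs one counter wraparound.
uint32_t cyccnt_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}
```

On target: call cyccnt_enable() once, take t0 = cyccnt_read() before the code under test and cyccnt_elapsed(t0, cyccnt_read()) after it. Comparing a first (cold) call with a repeated (warm) call of a flash-resident function shows the XIP miss cost directly.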
Statistics: Posted by gmx — Mon Jan 12, 2026 9:01 pm