Channel: Raspberry Pi Forums

General • Re: RP2350 PIO DMA performance question

It would have been good to have more instructions like JMP PIN, which do not just wait but let the SM run another part of the code while waiting. I guess the waiting instructions like IRQ WAIT and WAIT respond faster than a JMP, but one can build a similar wait from a JMP PIN that jumps to itself and falls through once the condition is no longer met. So if JMP could be optimized (if it isn't already) to do the check and the jump in one cycle, I don't see how the WAIT-type instructions are much better. Though it may just be an instruction encoding limitation, as JMP needs a target address while the WAIT-type instructions don't.
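To make the comparison concrete, here is a minimal sketch of the two waiting styles in PIO assembly. The program names and labels are made up for illustration, and it assumes that the JMP pin (EXECCTRL_JMP_PIN) and IN pin 0 are mapped to the same GPIO. As far as I understand the datasheet, a JMP already executes in a single cycle, so the self-jump below polls the pin on every cycle.

```
; Variant A: stall on WAIT until the pin reads low
.program wait_style          ; hypothetical name
    wait 0 pin 0             ; stalls here while IN pin 0 is high
    in pins, 1               ; continues once the pin is low

; Variant B: the same wait built from JMP PIN
.program jmp_style           ; hypothetical name
busy:
    jmp pin busy             ; branch is taken while the JMP pin is high
    in pins, 1               ; falls through once the pin reads low
```

Waiting for a high level with JMP PIN needs a second instruction (a `jmp pin` to the exit label followed by a plain `jmp` back), because JMP PIN only branches on a high pin. That two-instruction loop is also the place where other work could be slotted in while waiting.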

Another thing that comes to mind: it would have been good if one state machine (SM) could be controlled by another SM, without the former having to run special instructions. For example, the first SM could then contain only IN or OUT instructions, achieving single-cycle IO, while a second SM checks branch conditions and controls the first. If one SM could control two, this would easily allow one SM to do the IN instructions, another the OUT instructions, and a third to control both of them. (Of course, whether the DMAC can do 2x32-bit word DMA in the same cycle is something I don't know, and it is unlikely that both a read and a write in the same cycle are ever necessary.)
Actually, doing an OUT and an IN in the same cycle is sometimes necessary: because of the input delay, when running in a two-cycle loop, that may be the only way to capture the IN data on the other cycle of the loop, the one where the data change caused by the OUT instruction arrives. For example, for a fast SPI running at 125 MHz / 2 = 62.5 MHz, or a bit more with a 133 MHz clock. And yes, I know the RP2040 and the new chip can be overclocked a lot, but can anybody guarantee that they will all work correctly, and for long enough, far beyond the specified maximum frequency and voltage?
Using separate SMs for the two cycles of input or output would fragment the data, and I don't think the DMAC supports skipping some amount of data after every byte/halfword/word transfer.
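For reference, the two-cycle loop in question looks roughly like the sketch below. The program name is hypothetical, and it assumes autopull and autopush are enabled from the C side, the clock divider is 1, and one side-set bit is mapped to SCK. With a 125 MHz system clock, each bit takes two cycles, which gives the 62.5 MHz SCK mentioned above.

```
.program spi_2cycle          ; hypothetical name
.side_set 1                  ; one side-set bit drives SCK

.wrap_target
    out pins, 1  side 0      ; cycle 0: drive MOSI, SCK low
    in  pins, 1  side 1      ; cycle 1: sample MISO, SCK high
.wrap
```

The IN in cycle 1 sees the pin state only after the input delay, so the bit the slave sends in response to a clock edge can arrive on the cycle where this loop is executing OUT rather than IN. That is the case where a simultaneous IN and OUT would help.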

A (perhaps simpler) alternative to the above would have been to let two or more SMs have their PCs locked together (with a certain offset). (In fact, splitting an instruction word into separate fields for branches (a single such field), for in and out operations, and for arithmetic operations (multiple such fields) usually gives much higher performance, since those can then be done in parallel.) The point is that having multiple SMs sync with each other through IRQ instructions works well in most cases, but in some cases, if one SM could simply execute the same addresses as another SM at an offset, only one SM would have to check conditions and branch, while one or more other SMs could do the actual IO and other operations. Something like this is already done with side-set, but of course that is limited.
The current alternative is to make the SM wait on a condition to start, then enter a tight loop, and then be stopped by CPU code. But that is not great, because the CPU code to reset an SM, set its starting address again and so on still takes time.
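A rough sketch of that current approach, with one SM checking the start condition and releasing a second SM through an IRQ flag, could look like this. Both program names and the choice of flag 4 are assumptions for illustration, and both SMs are assumed to run on the same PIO block.

```
.program gatekeeper          ; hypothetical name: checks the start condition
    wait 1 pin 0             ; wait for the start line to go high
    irq set 4                ; raise flag 4 to release the worker
halt:
    jmp halt                 ; nothing else to do in this sketch

.program worker              ; hypothetical name: only does IO once released
    wait 1 irq 4             ; stall until flag 4 is set, then clear it
.wrap_target
    in pins, 1               ; tight single-instruction loop
.wrap
```

The worker never leaves its loop on its own, so the CPU has to stop it (for example with `pio_sm_set_enabled(pio, sm, false)` in the Pico SDK) and set it up again before the next run, which is the overhead described above.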

Sadly, because WAITs are not branches, it is not possible to use them to exit a .wrap loop. If they instead had an option to skip one instruction (jump one instruction further) instead of continuing with the next one, that would be enough, for example.
JMP !OSRE is another very powerful instruction, because the amount of data left in the OSR gives a clear distinction between the word cycle and the bit cycle. Sadly there is no such option for the ISR, so the only way to exit an ISR loop is to have a counter count the bits. Both could be simplified if the IN and OUT instructions had an option to jump one instruction further (two instructions ahead instead of the usual one, overriding a .wrap) when the OSR is empty or the ISR is full.
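The asymmetry between the two directions can be sketched as follows. Program names and labels are hypothetical, autopull and autopush are assumed to be disabled, and the byte width of the input loop is an arbitrary choice.

```
.program out_words           ; hypothetical name, autopull disabled
.wrap_target
    pull block               ; word cycle: fetch the next word
out_bit:
    out pins, 1              ; bit cycle: shift one bit out
    jmp !osre out_bit        ; loop until the pull threshold is reached
.wrap

.program in_bytes            ; hypothetical name, autopush disabled
.wrap_target
    set x, 7                 ; counter for 8 bits (X counts 7 down to 0)
in_bit:
    in pins, 1               ; shift one bit in
    jmp x-- in_bit           ; no "ISR full" test exists, so count in X
    push block               ; word cycle: hand the byte to the RX FIFO
.wrap
```

The output side gets its exit condition for free from the shift counter, while the input side spends a scratch register and an extra `set` per word on the same job.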
This again shows how much more powerful an IO CPU becomes when the branching and IO/arithmetic operations are separated and enabled to be done in parallel.
Of course, whether anything like this is possible, I have no idea. But posting anyway.
I just find it funny that every time I start writing a PIO program I end up writing at least 5-7 variants of it, and none of them is quite good, and yet they are all so close.

Also, the reaction to external signals is a bit limited. JMP PIN is perhaps the most powerful instruction, and after it comes MOV PC, PINS (or IN PINS -> MOV ... -> MOV PC, ..., but these usually need additional instructions between them to move/format/multiply (shift)/offset the value, so their reaction speed is much lower, and they affect ISR and/or OSR). But that is about it. If a PIO program needs to check two external lines, there is only one JMP PIN (and that may require a GPIO invert, since the instruction has no inverting option), so the other line needs some more complicated method. It becomes easier to sample all pins and react afterwards, but then there is no instruction to jump on the ISR contents, so additional instructions are needed to move it to X/Y, masking becomes difficult, and the ISR is already written, so another IN may be necessary, which in turn might mean not using autopush. Having the branching and control-signal checking done on one SM and the IN/OUT on the other(s) could simplify this too.
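As an illustration of the "sample all pins and react afterwards" route, here is a minimal sketch for two lines. The program name, labels and the IRQ flags used as placeholder actions are all hypothetical, and it assumes the ISR is configured to shift left with autopush disabled, so that the two sampled bits land in the low bits of the ISR.

```
.program two_line_check      ; hypothetical name
.wrap_target
    mov isr, null            ; clear ISR (also resets its shift count)
    in pins, 2               ; sample IN pins 0 and 1 together
    mov x, isr               ; copy the sample to where JMP can test it
    jmp !x both_low          ; X == 0: neither line is high
    set y, 3
    jmp x!=y one_high        ; X is 1 or 2: exactly one line is high
    irq set 2                ; both lines high (placeholder action)
    jmp next_sample
one_high:
    irq set 1                ; placeholder action
    jmp next_sample
both_low:
    irq set 0                ; placeholder action
next_sample:
    nop
.wrap
```

That is five or six instructions between the sample and the reaction, it uses up ISR, X and Y, and telling the two lines apart in the `one_high` case still needs another compare against 1.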

Statistics: Posted by wisi — Sat Oct 05, 2024 4:16 am


