Some interesting ideas here.
I got some PIO code talking to two PSRAMs over QSPI and the performance looks fine for my case. I'm getting about 8 MB/s for random 32-bit reads, 18 MB/s when reading 16 bytes at a time. That's with an SPI clock of 75 MHz. The PIO code controls both chip selects using the SET instruction.
The problem with writing assembly is it's quite fun, and I'm tempted to try and optimise it more, because it spends quite a few cycles setting everything up at the beginning before clocking anything out, which is hurting the performance a bit. If there was a way to write a register value to the SET pins rather than an immediate that would be great, but I don't think there is.
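To put a rough number on that setup cost, here is a small sketch that tallies PIO cycles per read transaction by counting instructions in the qspi_psram program from this post. The function names are made up for illustration, and the tally assumes every instruction takes exactly one cycle with no FIFO stalls, so treat the result as an upper bound rather than a measurement.

```c
#include <stdint.h>

/* PIO cycles for one read transaction, tallied by hand from the listing:
 *   6  setup (CS high, pick chip, CS low, load both counters)
 *   2  per nibble written (out pins + jmp x--)
 *  15  turnaround (jmp !y, set x, 5 x 2 wait loop, set x, mov pindirs, jmp)
 *   2  per nibble read, plus 1 for the final jmp y-- that falls through
 *   3  restore pin directions and jump back to begin
 */
static uint32_t pio_cycles_per_read(uint32_t nibbles_out, uint32_t nibbles_in)
{
    return 6u + 2u * nibbles_out + 15u + (2u * nibbles_in + 1u) + 3u;
}

/* Upper bound on read throughput in bytes per second for a given PIO clock.
 * Ignores FIFO stalls and any time the CPU spends between transactions. */
static double max_read_bytes_per_sec(double pio_hz, uint32_t nibbles_out,
                                     uint32_t nibbles_in)
{
    double seconds = pio_cycles_per_read(nibbles_out, nibbles_in) / pio_hz;
    return (nibbles_in / 2.0) / seconds;
}
```

Assuming the PIO runs at 150 MHz (two PIO cycles per SCK period at 75 MHz, which is an inference from the side-set pattern and not stated above), a 32-bit read with 8 nibbles out and 8 in comes to 57 cycles, or about 10.5 MB/s, and a 16-byte read comes to 105 cycles, or about 22.9 MB/s. Both sit a little above the measured figures, and 6 of those cycles are the setup before the first nibble goes out.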
I've tried just controlling the chip selects directly from the main code, but the chips really want CS to go back high immediately after the transaction, so that needs to be done by the PIO.
Oh, I just realised - the best thing to do would be to have one state machine per chip. If they can share SCK and the four data pins but have individual CS pins then that would work.
I tried this to begin with, with the PIO code just controlling one CS line, which I switched between the two chips by reconfiguring the state machine with sm_config_set_set_pin_base() and pio_sm_set_config(). But it's slower that way, and it feels a bit hacky.
There may be some trick you could play in moving CS1\ between different GPIO pins dynamically.
Code:
; based on https://github.com/polpo/rp2040-psram
.program qspi_psram
.side_set 1 ; SCK

; we are pulling 32 bits at a time from the TX FIFO, and they are shifted left
; (MSB first) out of the OSR
; send a setup word containing chip select and number of nibbles to read/write
; then send 32-bit data (first word will be command + 24-bit address)

begin:
    set pins, 0b11      side 0 ; CS0,CS1 high
    out x, 1            side 0 ; most significant bit of setup word selects which chip
    jmp !x, use_cs0     side 0
    set pins, 0b01      side 0 ; CS1 low
.wrap_target
    out x, 15           side 0 ; next 15 bits of setup word = number of nibbles to output MINUS ONE
    out y, 16           side 0 ; lowest 16 bits of setup word = number of nibbles to input
loop_write:
    out pins, 4         side 0 ; Write value on pins, lower clock
    jmp x--, loop_write side 1 ; Raise clock: this is when PSRAM reads the value. Loop if we have more to write
    jmp !y, begin       side 0 ; If this is a write-only operation, jump back to beginning
    set x, 4            side 1 ; wait clocks (6)
loop_wait:
    nop                 side 0
    jmp x--, loop_wait  side 1
    set x, 0            side 0
    mov pindirs, x      side 1 ; clk 14 rising edge
    jmp readloop_mid    side 0
loop_read:
    in pins, 4          side 0
readloop_mid:
    jmp y--, loop_read  side 1
    set x, 0b1111       side 0
    mov pindirs, x      side 0
    jmp begin           side 0
use_cs0:
    set pins, 0b10      side 0 ; CS0 low
.wrap ; save a cycle by using .wrap to jump back
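
On the C side, the words pushed to the TX FIFO have to match the layout described in the program's comments. The helpers below are a minimal sketch of that packing; the function names are hypothetical and not taken from the original code, and they assume the state machine shifts left with 32-bit autopull, as the comments say.

```c
#include <stdint.h>

/* Pack the 32-bit setup word described in the PIO program's comments:
 *   bit 31       chip select (0 = CS0, 1 = CS1)
 *   bits 30..16  number of nibbles to write, MINUS ONE
 *   bits 15..0   number of nibbles to read (0 = write-only)
 * nibbles_out must be at least 1. */
static inline uint32_t psram_setup_word(unsigned chip, unsigned nibbles_out,
                                        unsigned nibbles_in)
{
    return ((uint32_t)(chip & 1u) << 31)
         | ((uint32_t)((nibbles_out - 1u) & 0x7FFFu) << 16)
         | ((uint32_t)nibbles_in & 0xFFFFu);
}

/* First data word: 8-bit command followed by a 24-bit address, MSB first. */
static inline uint32_t psram_cmd_addr_word(uint8_t cmd, uint32_t addr)
{
    return ((uint32_t)cmd << 24) | (addr & 0x00FFFFFFu);
}
```

For a random 32-bit read from the second chip this would be psram_setup_word(1, 8, 8) followed by the command and address word: 8 nibbles go out and 8 nibbles come back.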
Statistics: Posted by kaimac — Sat Jun 07, 2025 11:41 pm