Channel: Raspberry Pi Forums

Other projects • Re: Deepseek cluster?

However, it is not clear how to use multiple CCDs to run one instance faster.
This makes me wonder whether this thread is somehow back on topic and distributed llama

https://github.com/b4rtaz/distributed-llama

could also be used to obtain faster speeds for the Epyc.
I haven't had any luck with distributed llama yet; however, with the interleave setting in numactl I obtained

Code:

numactl                              threads   rate
-C 0 -i 0                                  1   6.49
-C 0-1 -i 0                                2  12.15
-C 0-1,8-9 -i 0-1                          3  16.06
-C 0,8,16,24 -i 0-3                        3  15.17
-C 0,8,16,24 -i 0-3                        4  16.47
-C 0-1,8-9,16-17,24-25 -i 0-3              7  28.08
-C 0-3,8-11 -i 0-1                         8  26.21
-C 0-3,8-11,16-19,24-27 -i 0-3            15  37.66
-C 0-3,8-11,16-19,24-27 -i 0-3            16  37.20
-C 0-31 -i 0-3                            31  41.38
-C 0-31 -i 0-3                            32  38.56
-C 0-31,64-95 -i 0-3                      63  39.13
-C 0-31,64-95 -i 0-3                      64  38.03
-C 0-127 -i 0-7                          127  20.23
-C 0-127 -i 0-7                          128  17.93
which is better than before.
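
For anyone wanting to repeat this kind of sweep, here is a rough Python sketch of how the numactl command lines could be assembled and the table searched for its peak. The rows are copied from the table above; the benchmark name ./bench and its -t thread flag are placeholders I made up, not the program actually used, and the helper names are only illustrative. The -C (--physcpubind) and -i (--interleave) options are the standard numactl ones.

```python
import shlex

# Rows copied from the table above:
# (CPU list for -C, node list for -i, threads, rate)
RESULTS = [
    ("0", "0", 1, 6.49),
    ("0-1", "0", 2, 12.15),
    ("0-1,8-9", "0-1", 3, 16.06),
    ("0,8,16,24", "0-3", 3, 15.17),
    ("0,8,16,24", "0-3", 4, 16.47),
    ("0-1,8-9,16-17,24-25", "0-3", 7, 28.08),
    ("0-3,8-11", "0-1", 8, 26.21),
    ("0-3,8-11,16-19,24-27", "0-3", 15, 37.66),
    ("0-3,8-11,16-19,24-27", "0-3", 16, 37.20),
    ("0-31", "0-3", 31, 41.38),
    ("0-31", "0-3", 32, 38.56),
    ("0-31,64-95", "0-3", 63, 39.13),
    ("0-31,64-95", "0-3", 64, 38.03),
    ("0-127", "0-7", 127, 20.23),
    ("0-127", "0-7", 128, 17.93),
]


def numactl_argv(cpus, nodes, command):
    """Prefix a command with numactl CPU binding (-C) and memory interleave (-i)."""
    return ["numactl", "-C", cpus, "-i", nodes] + list(command)


def best_row(rows):
    """Return the row with the highest measured rate."""
    return max(rows, key=lambda row: row[3])


if __name__ == "__main__":
    cpus, nodes, threads, rate = best_row(RESULTS)
    # "./bench" and "-t" are hypothetical stand-ins for the real benchmark.
    argv = numactl_argv(cpus, nodes, ["./bench", "-t", str(threads)])
    print(shlex.join(argv))
    print(f"{rate / RESULTS[0][3]:.1f}x the single-thread rate")
```

To actually launch a run, the argv list could be handed to subprocess.run; I left that out so the sketch does not depend on numactl being installed.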

I think the results

Image

may encourage Purr and Scratchy to pay more attention to NUMA.

Statistics: Posted by ejolson — Tue Apr 15, 2025 6:45 pm


