"However, it is not clear how to use multiple CCDs to run one instance faster."
This makes me wonder whether this thread is somehow back on topic and distributed llama
https://github.com/b4rtaz/distributed-llama
could also be used to obtain faster speeds for the Epyc.
I haven't had any luck with distributed llama yet; however, the interleave setting in numactl obtained the following rates:
Code:
numactl                         threads   rate
-C 0 -i 0                             1   6.49
-C 0-1 -i 0                           2  12.15
-C 0-1,8-9 -i 0-1                     3  16.06
-C 0,8,16,24 -i 0-3                   3  15.17
-C 0,8,16,24 -i 0-3                   4  16.47
-C 0-1,8-9,16-17,24-25 -i 0-3         7  28.08
-C 0-3,8-11 -i 0-1                    8  26.21
-C 0-3,8-11,16-19,24-27 -i 0-3       15  37.66
-C 0-3,8-11,16-19,24-27 -i 0-3       16  37.20
-C 0-31 -i 0-3                       31  41.38
-C 0-31 -i 0-3                       32  38.56
-C 0-31,64-95 -i 0-3                 63  39.13
-C 0-31,64-95 -i 0-3                 64  38.03
-C 0-127 -i 0-7                     127  20.23
-C 0-127 -i 0-7                     128  17.93

I think the results may encourage Purr and Scratchy to pay more attention to NUMA.
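
For anyone who wants to poke at these numbers, here is a rough Python sketch that copies the numactl table into a list, works out the rate per thread and rebuilds the numactl command line for the fastest row. The program name ./benchmark and its --threads flag are placeholders of mine, since the post does not show the actual command that was run, and the rate is kept in whatever unit the table uses.

```python
import shlex

# (numactl options, threads, rate) copied from the table in the post.
RESULTS = [
    ("-C 0 -i 0", 1, 6.49),
    ("-C 0-1 -i 0", 2, 12.15),
    ("-C 0-1,8-9 -i 0-1", 3, 16.06),
    ("-C 0,8,16,24 -i 0-3", 3, 15.17),
    ("-C 0,8,16,24 -i 0-3", 4, 16.47),
    ("-C 0-1,8-9,16-17,24-25 -i 0-3", 7, 28.08),
    ("-C 0-3,8-11 -i 0-1", 8, 26.21),
    ("-C 0-3,8-11,16-19,24-27 -i 0-3", 15, 37.66),
    ("-C 0-3,8-11,16-19,24-27 -i 0-3", 16, 37.20),
    ("-C 0-31 -i 0-3", 31, 41.38),
    ("-C 0-31 -i 0-3", 32, 38.56),
    ("-C 0-31,64-95 -i 0-3", 63, 39.13),
    ("-C 0-31,64-95 -i 0-3", 64, 38.03),
    ("-C 0-127 -i 0-7", 127, 20.23),
    ("-C 0-127 -i 0-7", 128, 17.93),
]


def numactl_command(options, threads, program=("./benchmark",)):
    """Build an argv list for one row.

    `program` and the `--threads` flag are hypothetical placeholders;
    only the numactl options (-C cpu binding, -i interleave) come from the post.
    """
    return ["numactl", *shlex.split(options), *program, "--threads", str(threads)]


def per_thread(results):
    """Return (options, threads, rate, rate per thread) for each row."""
    return [(opts, n, rate, rate / n) for opts, n, rate in results]


# Row with the highest overall rate.
best = max(RESULTS, key=lambda row: row[2])

if __name__ == "__main__":
    for opts, n, rate, each in per_thread(RESULTS):
        print(f"{opts:<32}{n:>7}{rate:>7.2f}{each:>7.2f}")
    print("fastest:", shlex.join(numactl_command(best[0], best[1])))
```

The per-thread column makes it easy to see how quickly the gain from each extra thread falls off as more cores and nodes are added.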
Statistics: Posted by ejolson — Tue Apr 15, 2025 6:45 pm