Teaching and learning resources • Re: Advent of Code 2024

If I understood the internal workings of Chapel, maybe that limitation would make more sense.

Though maybe not the most efficient way to see what Chapel is doing, I decided to make a literal translation of my day24 solution into the Go programming language for comparison.

The result was

Code:

$ ./go24 # Pi 4B at 1500 MHz
Advent of Code 2024 Day 24 Crossed Wires (GOMAXPROCS=4)
Part 1 The z wires output 61495910098126
Part 2 Swap wires css,cwt,gdd,jmv,pqt,z05,z09,z37
Total execution time 46.839487374 seconds.

Since

62.579 / 46.839 = 1.336

the Go solution is about 33 percent faster than the Chapel solution on the Pi.

On the other hand, when running on the 12-core Xeon server, I obtained

Code:

$ ./go24 # Xeon E5-2620 12c/24t
Advent of Code 2024 Day 24 Crossed Wires (GOMAXPROCS=24)
Part 1 The z wires output 61495910098126
Part 2 Swap wires css,cwt,gdd,jmv,pqt,z05,z09,z37
Total execution time 22.600797331 seconds.

Since

6.65257 / 22.601 = 0.2943

the Go solution runs at about 29 percent of the speed of the Chapel solution on the Xeon, that is, about 70 percent slower, taking roughly 3.4 times as long.

The kittens looked confused. Does this mean that blue gopher is easier or more difficult to catch?

My suspicion is that the slowdown is related to having multiple NUMA zones when running the code on both sockets. This theory is supported by the speed more than doubling (22.601 / 9.587 = 2.357) when running on only one socket with 6 threads.

Code:

$ numactl -C 0-5 ./go24 # Xeon E5-2620 6c/6t
Advent of Code 2024 Day 24 Crossed Wires (GOMAXPROCS=6)
Part 1 The z wires output 61495910098126
Part 2 Swap wires css,cwt,gdd,jmv,pqt,z05,z09,z37
Total execution time 9.587104874 seconds.

Note that one socket with 6c/12t was slower, as were two sockets with 12c/12t.

I decided to let the kittens have a go to see if they could make that gopher run faster. If I were a gopher, I'd certainly run faster with Scratchy, Shy and Purr headed my way.

According to Purr, the rungates routine spends an unnecessary amount of time allocating memory from the heap and releasing it. When running on both sockets the heap is slower because the memory allocator is actually NUMA-aware and spends additional time placing the memory in a suitable zone. I'm skeptical that heap allocations are NUMA-aware, but I think it's a reasonable idea to allocate the slices ahead of time and pass them in, so that no allocations happen in the inner loop.
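To make the idea concrete, here is a small sketch of the two calling styles. It is not the actual rungates routine: the gate type, the wire numbering and both function names are made up for illustration, and the circuit is just a half adder. The point is only that the second version receives its scratch slices from the caller, so a loop that calls it many times allocates them once.

```go
package main

import "fmt"

// gate is a hypothetical stand-in for one puzzle gate: out = a OP b,
// where a, b and out are wire indices.
type gate struct {
	op        byte // '&', '|' or '^'
	a, b, out int
}

// runGatesAlloc allocates fresh working slices on every call.
func runGatesAlloc(gates []gate, inputs []bool, nwires int) []bool {
	val := make([]bool, nwires)
	known := make([]bool, nwires)
	return runGatesReuse(gates, inputs, val, known)
}

// runGatesReuse works in caller-supplied slices and allocates nothing.
// The first len(inputs) wires are treated as the circuit inputs.
func runGatesReuse(gates []gate, inputs, val, known []bool) []bool {
	for i := range known {
		known[i] = false // reset state left over from the previous call
	}
	for i, v := range inputs {
		val[i], known[i] = v, true
	}
	// Sweep over the gates until no further output can be computed.
	for progress := true; progress; {
		progress = false
		for _, g := range gates {
			if known[g.out] || !known[g.a] || !known[g.b] {
				continue
			}
			switch g.op {
			case '&':
				val[g.out] = val[g.a] && val[g.b]
			case '|':
				val[g.out] = val[g.a] || val[g.b]
			case '^':
				val[g.out] = val[g.a] != val[g.b]
			}
			known[g.out] = true
			progress = true
		}
	}
	return val
}

func main() {
	// Half adder: wire 2 = w0 XOR w1 (sum), wire 3 = w0 AND w1 (carry).
	gates := []gate{{'^', 0, 1, 2}, {'&', 0, 1, 3}}
	// Scratch space allocated once, outside the loop.
	in := make([]bool, 2)
	val := make([]bool, 4)
	known := make([]bool, 4)
	for n := 0; n < 4; n++ {
		in[0], in[1] = n&1 != 0, n&2 != 0
		out := runGatesReuse(gates, in, val, known)
		fmt.Printf("%v %v -> sum %v carry %v\n", in[0], in[1], out[2], out[3])
	}
}
```

When several goroutines run in parallel, each one would need its own set of scratch slices, since sharing them would be a data race. Whether this change actually removes the two-socket slowdown is something only a measurement can tell.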

It would be interesting to know whether Chapel benefits from a similar workaround or not.

Statistics: Posted by ejolson — Mon Apr 28, 2025 8:42 pm
