Wayback Machine
33 captures
27 Oct 2024 - 27 Apr 2026
Sep OCT Nov
27
2023 2024 2025
success
fail
About this capture
COLLECTED BY
Organization: Archive Team
Formed in 2009, the Archive Team (not to be confused with the archive.org Archive-It Team) is a rogue archivist collective dedicated to saving copies of rapidly dying or deleted websites for the sake of history and digital heritage. The group is 100% composed of volunteers and interested parties, and has expanded into a large amount of related projects for saving online and digital history.

History is littered with hundreds of conflicts over the future of a community, group, location or business that were "resolved" when one of the parties stepped ahead and destroyed what was there. With the original point of contention destroyed, the debates would fall to the wayside. Archive Team believes that by duplicated condemned data, the conversation and debate can continue, as well as the richness and insight gained by keeping the materials. Our projects have ranged in size from a single volunteer downloading the data to a small-but-critical site, to over 100 volunteers stepping forward to acquire terabytes of user-created data to save for future generations.

The main site for Archive Team is at archiveteam.org and contains up to the date information on various projects, manifestos, plans and walkthroughs.

This collection contains the output of many Archive Team projects, both ongoing and completed. Thanks to the generous providing of disk space by the Internet Archive, multi-terabyte datasets can be made available, as well as in use by the Wayback Machine, providing a path back to lost websites and work.

Our collection has grown to the point of having sub-collections for the type of data we acquire. If you are seeking to browse the contents of these collections, the Wayback Machine is the best first stop. Otherwise, you are free to dig into the stacks to see what you may find.

The Archive Team Panic Downloads are full pulldowns of currently extant websites, meant to serve as emergency backups for needed sites that are in danger of closing, or which will be missed dearly if suddenly lost due to hard drive crashes or server failures.

Collection: Archive Team: URLs
TIMESTAMPS
loading
The Wayback Machine - https://web.archive.org/web/20241027004813/https://chipsandcheese.com/p/amds-turin-5th-gen-epyc-launched

Chips and Cheese

Share this post

AMD's Turin: 5th Gen EPYC Launched

chipsandcheese.com

AMD's Turin: 5th Gen EPYC Launched

5 Gigahertz Server CPUs

George Cozma
Oct 11, 2024
19
Share this post

AMD's Turin: 5th Gen EPYC Launched

chipsandcheese.com
4
1
Share

Hello you fine Internet folks, for today we have a video and an article for y’all.

Unlike our prior Granite Rapids coverage where we just had a video, we have had hands-on with Turin, specifically the AMD EPYC 9575F, thanks to Jordan from StorageReview.

This article is going to be a little different from our usual. It’s going to be shorter than usual because we have already covered the Zen 5 core both in mobile and in desktop and the differences between them, so this article will be focused on the memory subsystem changes that Turin has.

Serve the Home has an excellent article that has the slides that AMD has put out for the launch of Turin. But because we have our own data, I thought that our data would be more interesting to dive into.

Memory Bandwidth

First, looking at the 1T results, we see that the 9575F can pull around 52 GB/s of memory read bandwidth, 48 GB/s of memory write bandwidth, and 95 GB/s of memory add (Read-Modify-Write) bandwidth.

And looking at the results for how much memory bandwidth a single CCD can get, we can see that a single core can use just under half the total CCD memory read bandwidth, about 55% the total CCD memory write bandwidth, and over two-thirds the total CCD memory add bandwidth.

Looking a bit closer at these results, you’ll notice that the 9575F has significantly higher bandwidth to a CCD compared to the desktop Zen 5 parts. And the reason for this is the 9575F has GMI3-W which means that it has 2 GMI links to the IO die instead of the single GMI link that the 9950X gets.

And this is not only the only change to the GMI links on server. The GMI write link is now 32B per GMI link instead of the 16B per GMI link that you’d see on desktop Zen 5.

Before moving to the full socket memory performance for the 9575F, let’s clear up the memory speeds that Turin supports. Turin has 12 channels of memory that can run up to DDR5-6400MT/s, however 6400MT/s is only supported on specific validated systems and only for 1 DIMM per channel.

The system we had access to was running 6000MT/s for its memory, and DDR5-6000 MT/s is what most systems will support in a 1 DIMM per channel configuration. Should you want to run 2 DIMMs per channel, then your memory speeds drop to 4400 MT/s; and if you run 1 DIMM per channel in a motherboard with 2 DIMMs per channel then expect 5200 MT/s for your memory speed.

Now, actually diving into the memory of the full 9575F and we see that we can get nearly 99% of the theoretical 576 GB/s of memory bandwidth using reads. Writes and adds are still an impressive 435 GB/s and 453 GB/s respectively.

We also tested the socket to socket bandwidth on AMD’s Volcano Platform which only has 3 GMI links between the two 9575Fs.

These results are very similar to our Bergamo results, which isn’t surprising because that system also had the same 3 GMI link setup.

Memory Latency

Moving to memory latency, we see that Turin’s unloaded memory latency is very similar to Genoa’s unloaded memory latency.

At Hot Chips 2024, Ampere Computing showed a graph demonstrating the loaded memory latency of an AmpereOne chip and AMD’s Genoa CPU. Now Chester wanted to make a test similar to this, so he made a loaded latency test.

The way that this test works is that it runs our memory bandwidth benchmark on either 7 cores on a CCD or 7 CCDs on the 9575F. This ensures that the IOD to CCD link or the whole memory system is fully loaded. Then, with the 8th core or 8th CCD, we run the memory latency test to see what the latency of the fully loaded system is.

When a single CCD is loaded up on the 9575F, we see about a 39 nanosecond increase between the unloaded and loaded latency.

When the whole system is loaded, we see about a 31 nanosecond increase between unloaded and loaded system latency.

Looking at the single CCD results compared to the fully loaded system results, the 9575F has very similar memory latency behavior regardless if a single CCD is loaded or if the whole system is loaded.

And lastly, everyone’s favorite graph, the core to core latency graph.

For convenience, the list below are the numbers from the chart because a chart this big can be hard to read.

  • Intra-CCD latency: ~45ns

  • Inter-CCD latency: ~150ns

  • Socket to Socket latency: ~260ns

This is a latency increase compared to Genoa, especially within a CCD.

Clock Speed

Now, a note about the clock speeds we saw with the 9575F. All 64 cores could hit up to 5GHz in single threaded test. This is quite impressive, but we were able to get all 8 cores in a CCD to run at 5GHz in our memory bandwidth testing.

And with all 128 threads chugging away on Cinebench 2024, we saw the 9575F sticking around the 4.3GHz range. Wendell from Level1Techs saw about 4.9GHz all core on a web server/TLS transaction workload, which is a less vectorized workload.

Conclusion

Realistically, AMD’s Turin is the generational update you’d normally expect. Not only does AMD have high core count SKUs (9755, 9965), which the hyperscalers will be picking up, they now also have lower core count, very high frequency SKUs (9575F) which the traditional enterprise market will appreciate. Apparently we now think 64 cores is ‘lower core count’. What a world we live in.

Turin isn’t the step-function revolution that Naples to Rome was; it’s more akin to the evolution we saw with Milan to Genoa, which was a memory bandwidth increase, a core increase, and a core update. Nonetheless, this generation is set to excite a lot of people, as there’s lots of value here in a very competitive ecosystem.

If you like our articles and journalism, and you want to support us in our endeavors, then consider heading over to our Patreon or our PayPal if you want to toss a few bucks our way. If you would like to talk with the Chips and Cheese staff and the people behind the scenes, then consider joining our Discord.

19
Share this post

AMD's Turin: 5th Gen EPYC Launched

chipsandcheese.com
4
1
Share

Discussion about this post

Luke
Oct 12

What about avx-512 performance? (Yeah, also for LLM inference!)

Expand full comment
Reply
Share
1 reply
ABC8
Oct 12

Could you benchmark llama.cpp/olllama?

Expand full comment
Reply
Share
2 more comments...

No posts

Ready for more?

© 2024 Chips and Cheese
Privacy ∙ Terms ∙ Collection notice
Start WritingGet the app
Substack is the home for great culture
Share