AMD Beema/Mullins Architecture & Performance Preview

Author: Anand Lal Shimpi

by Anand Lal Shimpi on April 29, 2014 12:00 AM EST


New Turbo Boost

With power in perspective, let’s talk about performance and the lineup. It never made much sense that, despite a very competitive microarchitecture, Jaguar both consumed more power and performed worse than Intel’s Silvermont. It turns out that’s more a function of the limited time AMD’s Jaguar team had to bring the design to market. As the basis not only for AMD’s own entry level APUs but also for the semi-custom SoC bids for consoles from Microsoft and Sony, Jaguar had to be done quickly. With Puma+ and its associated SoC designs, AMD could focus more on driving power down and introducing new features, one of which happens to be a very intelligent clock boosting scheme analogous to Intel’s Turbo Boost.

While the bulk of Kabini and Temash silicon ran up to a set maximum frequency, Beema and Mullins SoCs can take advantage of available thermal headroom to increase their maximum frequency for a limited period of time. If we look at the tables below we’ll see this in action:

Mullins vs. Temash - Frequency Gains
	TDP	Max CPU Frequency	Temash Equivalent	Temash Equivalent (TDP)	Temash Max CPU Frequency	Max Frequency Increase from Mullins
A10 Micro-6700T	4.5W	2.2GHz	A6-1450	8W	1.4GHz	57%
A4 Micro-6400T	4.5W	1.6GHz	A4-1250	9W	1.0GHz	60%
E1 Micro-6200T	3.95W	1.4GHz	A4-1200	3.9W	1.0GHz	40%

AMD no longer reports max non-turbo frequencies, unfortunately following in Intel’s footsteps (as well as those of the rest of the mobile players), but you can assume that they are mostly unchanged from Kabini/Temash. Beema and Mullins can now turbo up to much higher frequencies. In the case of Mullins in particular, since it’s so thermally constrained, the potential upside for frequency scaling is huge.
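The percentage column in the Mullins vs. Temash table can be reproduced from the two max frequency columns. The sketch below is only an illustration of that arithmetic; the dictionary and function names are made up here, and the frequencies are copied from the table above.

```python
# Max CPU frequencies in GHz (Mullins part, Temash equivalent),
# copied from the Mullins vs. Temash table above.
PAIRS = {
    "A10 Micro-6700T vs. A6-1450": (2.2, 1.4),
    "A4 Micro-6400T vs. A4-1250": (1.6, 1.0),
    "E1 Micro-6200T vs. A4-1200": (1.4, 1.0),
}


def pct_increase(new_ghz, old_ghz):
    """Relative gain of the new part over the old one, in percent."""
    return (new_ghz / old_ghz - 1.0) * 100.0


# Round to whole percentages, as the table does.
GAINS = {name: round(pct_increase(new, old)) for name, (new, old) in PAIRS.items()}

for name, gain in GAINS.items():
    print(f"{name}: {gain}%")
```

The same function applies to the Beema vs. Kabini and GPU tables by swapping in their frequency pairs.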

Beema vs. Kabini - Frequency Gains
	TDP	Max CPU Frequency	Kabini Equivalent	Kabini Equivalent (TDP)	Kabini Max CPU Frequency	Max Frequency Increase from Beema
A6-6310	15W	2.4GHz	A6-5200	25W	2.0GHz	20%
A4-6210	15W	1.8GHz	A4-5000	15W	1.5GHz	20%
E2-6110	15W	1.5GHz	E2-3000/E1-2500	15W	1.65GHz/1.4GHz	-9%/7%
E1-6010	10W	1.35GHz	E1-2100	9W	1.0GHz	35%

The frequency gains aren't limited to the CPU; the 128 GCN cores can also run at higher speeds with Beema and Mullins:

Mullins vs. Temash - GPU Frequency Gains
	TDP	Max GPU Frequency	Temash Equivalent	Temash Equivalent (TDP)	Temash Max GPU Frequency	Max GPU Frequency Increase from Mullins
A10 Micro-6700T	4.5W	500MHz	A6-1450	8W	400MHz	25%
A4 Micro-6400T	4.5W	350MHz	A4-1250	9W	300MHz	17%
E1 Micro-6200T	3.95W	300MHz	A4-1200	3.9W	225MHz	33%

Beema vs. Kabini - GPU Frequency Gains
	TDP	Max GPU Frequency	Kabini Equivalent	Kabini Equivalent (TDP)	Kabini Max GPU Frequency	Max GPU Frequency Increase from Beema
A6-6310	15W	800MHz	A6-5200	25W	600MHz	33%
A4-6210	15W	600MHz	A4-5000	15W	500MHz	20%
E2-6110	15W	500MHz	E2-3000/E1-2500	15W	450MHz/400MHz	11%/25%
E1-6010	10W	350MHz	E1-2100	9W	300MHz	17%

How can AMD hit significantly higher frequencies without a substantial architecture change or new process node? By raising the max thermal operating point of the silicon. Similar to what Intel discovered in architecting its Bay Trail silicon, AMD realized that in ultra portable form factors it would run into a chassis temperature limit before it ever reached the maximum operating temperature of its silicon.

Previously, once the silicon temperature hit 60C, AMD would cap max CPU/GPU frequency. However, what really matters isn’t whether the silicon is running warm but whether the chassis is running too warm. With Beema and Mullins, AMD raises the silicon temperature limit to around 100C (still within physical limits) and instead relies on the surface temperature of the device to determine when to throttle back the CPU/GPU. In AMD’s own words, this allows the SoC to run at a much higher frequency for up to several minutes before having to scale back down. As long as the physical limits of the die aren’t exceeded, the design remains just as safe as before, but you get better performance.
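To make the difference between the two throttling policies concrete, here is a minimal sketch. It is not AMD's implementation: the 60C and 100C figures come from the text above, while the 40C skin limit and both function names are assumptions chosen purely for illustration.

```python
OLD_DIE_CAP_C = 60.0     # from the article: old point where frequency was capped
NEW_DIE_LIMIT_C = 100.0  # from the article: approximate new silicon limit
SKIN_LIMIT_C = 40.0      # hypothetical chassis comfort limit, not an AMD figure


def old_policy_allows_boost(die_c):
    """Old behavior: cap frequency once the die reaches 60C."""
    return die_c < OLD_DIE_CAP_C


def new_policy_allows_boost(die_c, skin_c):
    """New behavior: boost while the chassis is cool and the die is within its limit."""
    return die_c < NEW_DIE_LIMIT_C and skin_c < SKIN_LIMIT_C


# A warm die inside a still-cool chassis: only the new policy keeps boosting.
print(old_policy_allows_boost(75.0))        # False
print(new_policy_allows_boost(75.0, 33.0))  # True
```

Because a chassis heats up far more slowly than the die, the second condition can stay true for a while after the die has passed 60C, which is where the extra boost time comes from.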

The real trick is that AMD is able to enable this new chassis temperature governed boost (called Skin Temperature Aware Power Management - STAPM) without requiring any additional sensors or hardware from the OEM. What AMD does instead is give the OEM tools to properly map SoC temperature to chassis skin temperature. My guess is the OEM runs a set workload, measuring external chassis temperature while correlating that data with SoC temperature. This mapping will vary on a device by device basis, and obviously won’t be as accurate as having a thermal sensor on the chassis itself, but it’s good enough to get the job done.
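One way such a mapping could be built is a simple least-squares line through calibration samples. This is only a sketch of the guess described above, not AMD's tooling: the sample values are invented for illustration, the names are hypothetical, and a real chassis would also need its thermal lag over time modeled, which a steady-state line ignores.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented calibration samples for one hypothetical device:
# SoC temperature (C) and the skin temperature (C) measured at the same moment.
SOC_C = [40.0, 50.0, 60.0, 70.0, 80.0]
SKIN_C = [28.0, 31.0, 34.0, 37.0, 40.0]

SLOPE, INTERCEPT = fit_line(SOC_C, SKIN_C)


def estimate_skin_c(soc_c):
    """Estimate chassis skin temperature from the on-die sensor reading."""
    return SLOPE * soc_c + INTERCEPT


print(f"skin = {SLOPE:.2f} * soc + {INTERCEPT:.1f}")
print(f"estimated skin temperature at 75C SoC: {estimate_skin_c(75.0):.1f}C")
```

Each device would need its own fit, since chassis materials and cooling differ from one design to the next.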

AMD claims it’s intelligent about when to boost. The updated power management unit looks at the response to frequency scaling of a given workload and will only boost when the workload will actually benefit from being boosted. This evaluation happens at the hardware instruction level and not at the OS/software layer.
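The idea of boosting only when a workload responds to frequency can be illustrated in software, even though AMD says the real evaluation happens in hardware. The sketch below is an assumption-laden illustration: the sensitivity metric, the 0.5 threshold, and the sample throughput numbers are all hypothetical.

```python
BOOST_THRESHOLD = 0.5  # hypothetical cutoff, not an AMD figure


def frequency_sensitivity(perf_low, perf_high, freq_low, freq_high):
    """Fraction of a frequency increase that shows up as extra throughput.

    Close to 1.0 means compute bound (scales with clock); close to 0.0 means
    the workload is waiting on something else, such as memory.
    """
    perf_gain = perf_high / perf_low - 1.0
    freq_gain = freq_high / freq_low - 1.0
    return perf_gain / freq_gain


def should_boost(perf_low, perf_high, freq_low, freq_high):
    """Boost only when the workload actually benefits from the higher clock."""
    sensitivity = frequency_sensitivity(perf_low, perf_high, freq_low, freq_high)
    return sensitivity >= BOOST_THRESHOLD


# Invented throughput samples taken at 1.0GHz and 1.4GHz.
print(should_boost(100.0, 138.0, 1.0, 1.4))  # compute bound: True
print(should_boost(100.0, 104.0, 1.0, 1.4))  # memory bound: False
```

Skipping the boost for the second kind of workload saves power and thermal headroom without giving up meaningful performance.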

The Lineup

With the exception of compressing the Kabini family into four parts instead of five, AMD kept the same number of SKUs as last year, with updated specs for Beema and Mullins:

AMD Mullins vs. Temash APUs
Model	Radeon Brand	SDP	TDP	CPU Cores	CPU Clock Speed (Max)	L2 Cache	Radeon Cores	GPU Clock Speed (Max)	DDR3 Speed (Max)
A10 Micro-6700T	R6	2.8W	4.5W	4	2.2GHz	2MB	128	500MHz	1333
A4 Micro-6400T	R3	2.8W	4.5W	4	1.6GHz	2MB	128	350MHz	1333
E1 Micro-6200T	R2	2.8W	3.95W	2	1.4GHz	1MB	128	300MHz	1066
A6-1450	HD 8250		8W	4	1.4GHz	2MB	128	400MHz	1066
A4-1250	HD 8210		9W	2	1.0GHz	1MB	128	300MHz	1333
A4-1200	HD 8180		3.9W	2	1.0GHz	1MB	128	225MHz	1066

The Mullins parts get a Micro prefix in front of their model number, implying the SoC's tablet-friendliness. AMD also supplies both TDP and Scenario Design Power (SDP) values for Mullins SoCs, similar to what Intel does with Bay Trail. The latter uses more tablet-like workloads (read: lighter weight) while determining SoC power.

With the exception of the entry level E1 Micro-6200T, TDPs go down substantially with Mullins vs. Temash. Cache sizes and GPU core count remain unchanged, but CPU frequencies and max supported DRAM frequency go up in many cases.

AMD Beema vs. Kabini APUs
Model	Radeon Brand	SDP	TDP	CPU Cores	CPU Clock Speed (Max)	L2 Cache	Radeon Cores	GPU Clock Speed (Max)	DDR3 Speed (Max)
A6-6310	R4		15W	4	2.4GHz	2MB	128	800MHz	1866
A4-6210	R3		15W	4	1.8GHz	2MB	128	600MHz	1600
E2-6110	R2		15W	4	1.5GHz	2MB	128	500MHz	1600
E1-6010	R2		10W	2	1.35GHz	1MB	128	350MHz	1333
A6-5200	HD 8400		25W	4	2.0GHz	2MB	128	600MHz	1600
A4-5000	HD 8330		15W	4	1.5GHz	2MB	128	500MHz	1600
E2-3000	HD 8280		15W	2	1.65GHz	1MB	128	450MHz	1600
E1-2500	HD 8240		15W	2	1.4GHz	1MB	128	400MHz	1333
E1-2100	HD 8210		9W	2	1.0GHz	1MB	128	300MHz	1333

Beema sees the end of the lone 25W TDP for Kabini; everything is now at 15W or less. The lowest end Beema carries a slightly higher TDP than the entry level Kabini, but otherwise there's more performance at the same TDP across the board. Beema parts don't come with an SDP rating as they're designed for use in more traditional ultrathin notebook PC form factors (presumably running more traditional, read: heavier, workloads).
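A rough way to see the shift is to divide max CPU clock by TDP for a Beema part and its Kabini equivalent. This is a crude proxy only: max turbo clock is not sustained performance, and the helper name is made up for this sketch. The figures are taken from the Beema vs. Kabini table above.

```python
# (max CPU frequency in GHz, TDP in watts) from the Beema vs. Kabini table above.
PARTS = {
    "A6-6310": (2.4, 15.0),  # Beema
    "A6-5200": (2.0, 25.0),  # Kabini equivalent
    "A4-6210": (1.8, 15.0),  # Beema
    "A4-5000": (1.5, 15.0),  # Kabini equivalent
}


def ghz_per_watt(part):
    """Max CPU clock divided by TDP; a rough proxy, not a performance metric."""
    ghz, tdp_w = PARTS[part]
    return ghz / tdp_w


for beema, kabini in [("A6-6310", "A6-5200"), ("A4-6210", "A4-5000")]:
    print(f"{beema}: {ghz_per_watt(beema):.3f} GHz/W vs. "
          f"{kabini}: {ghz_per_watt(kabini):.3f} GHz/W")
```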

TrustZone

In 2012 AMD announced that it had signed a license agreement with ARM. Although we’ve since seen AMD announce ARM based Opteron silicon, back then the only official commitment was to ship an x86 SoC in 2013 with an integrated ARM Cortex A5 for TrustZone execution. AMD needed a hardware security platform on its SoCs to remain competitive, and it didn’t have one of its own (Intel’s TXT is proprietary and not a part of what’s licensed to AMD) so ARM’s TrustZone technology was an easy target. To support TrustZone you need an ARM core, and thus AMD committed to integrating a Cortex A5 as a dedicated security processor on some of its 2013 APUs.

Indeed, both Kabini and Temash had a Cortex A5 on die; it was simply never enabled due to time constraints. With Beema and Mullins the core is fully functional in what AMD is calling its Platform Security Processor (PSP). AMD will likely publish guidelines on how developers can access and use the PSP, and I’d also expect to see it make its way into other AMD APUs moving forward.


82 Comments

vlad42 - Tuesday, April 29, 2014 - link
Hey Anand,
Do you know if connected standby was enabled on the Mullins tablet you tested? I’ve heard that AMD has not yet developed the connected standby drivers. If this is the case, then shouldn’t that be noted in the power consumption test? Given the improvements Intel sees when connected standby is enabled, it definitely looks like the Mullins tablet was not using connected standby.
Anand Lal Shimpi - Tuesday, April 29, 2014 - link
No CS, although I believe Mullins could theoretically support it. For this particular test there should be no advantage to having CS, we're just looking at short/active idle power usage.
Anand Lal Shimpi - Tuesday, April 29, 2014 - link
Correction - there are no plans to support CS, AMD doesn't see value in it.
tomsworkshop - Tuesday, April 29, 2014 - link
this low power but high performance tiny little chip can be fit on something like the raspberry pi, we can build a really small micro pc with such hardware.
mfoley93 - Wednesday, April 30, 2014 - link
They aren't low power enough yet, so in the meantime, I suspect nVIDIA's Jetson board based on the Tegra K1 might fit the spot you are thinking of. The board is significantly larger, but it's powerful enough to justify that. It's also more in the Beagleboard/Pandaboard segment, being aimed at embedded development and not education.

http://www.nvidia.com/object/jetson-tk1-embedded-d...
tomsworkshop - Thursday, May 1, 2014 - link
The Tegra K1 is rated at 5W TDP and Mullins at 3.95W - 4.5W TDP, so I think they should be low power enough for a single board computer. I saw the price for the Jetson TK1, and it was 3 times higher than the Raspberry Pi. I hope AMD will come out with something in between the Raspberry Pi and the Jetson TK1, with a price lower than the Jetson TK1 and performance better than the Raspberry Pi.
nemi2 - Tuesday, April 29, 2014 - link
It's good to see AMD catching up but the power consumption may still be a deciding factor when choosing between this and Baytrail - I look forward to a full review of release hardware. The other area of concern is that the Baytrail successor Airmont (with shrink to 14nm and ~30% power savings) will also be out in 2014 so AMD may only have 0-3 months at parity/competitiveness with intel.
H2323 - Tuesday, April 29, 2014 - link
Yes, these are short term wins, but 14nm Airmont is late and will miss back to school and be just in time for Christmas. AMD will net some revenue on this and still retain the GPU lead over Baytrail.
R3MF - Tuesday, April 29, 2014 - link
a four-core mullins in an 8" tablet chassis with 4GB of memory and bios support for a 64bit ubuntu would be an awesome thing.
MrSpadge - Tuesday, April 29, 2014 - link
I'd rather have a Thinkpad 10 + Win 8.1 with this, at least as an option instead of Bay Trail. If the actual product is as impressive as the preview.
