In February last year ARM announced its latest and greatest premium CPU core design, the Cortex-A72 – a refinement and revision of the Cortex-A57. Zoom forward about a year and we find the Cortex-A72 at the heart of SoCs like the Kirin 950 and 955, which are used in phones like the Huawei Mate 8 and the Huawei P9. Now ARM has announced another new premium 64-bit ARMv8 processor, the Cortex-A73. We knew that ARM was working on a new CPU core, code-named Artemis, and now it is official. So what does the Cortex-A73 bring to the table? Is it faster? Sure… but more importantly it has made great strides in the area of power efficiency during periods of sustained usage.
Power efficiency and heat dissipation are everything when it comes to mobile CPUs and they are also factors that influence the performance of a mobile CPU. On the desktop these aren’t an issue as PCs are connected to the mains power and have big cooling fans, but the world of mobile is quite different. To keep things efficient mobile CPU designers have a few tricks they can use. One is to throttle the CPU when it becomes too warm, meaning to run it at a lower clock frequency; another is to use a heterogeneous multi-processing (HMP) setup like big.LITTLE, and switch to the more power efficient CPU cores for a while; and a third is to use a thermal framework like ARM’s Intelligent Power Allocation, which can dynamically manage the thermal budget of a System-on-a-Chip, reallocating the thermal budget from the CPU to the GPU (and vice versa) when necessary.
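To make the idea of sharing a thermal budget concrete, here is a minimal sketch. It is not ARM's Intelligent Power Allocation algorithm; it simply assumes a fixed budget in milliwatts that is split between components in proportion to what each one asks for. The function name `allocate`, the component names, and all the numbers are hypothetical.

```python
def allocate(budget_mw, requests_mw):
    """Split a power budget between components.

    Illustrative assumption: if the requests fit inside the budget every
    component gets what it asked for, otherwise each gets a share
    proportional to its request.
    """
    total = sum(requests_mw.values())
    if total <= budget_mw:
        return dict(requests_mw)
    return {name: budget_mw * req / total for name, req in requests_mw.items()}


# Light load: both requests fit, so nothing is scaled back.
light = allocate(3000, {"cpu": 1000, "gpu": 1000})

# Heavy game: the requests add up to twice the budget, so both are halved.
heavy = allocate(3000, {"cpu": 4000, "gpu": 2000})
```

In the heavy case the CPU ends up with less power than it asked for even though the GPU is responsible for part of the demand, which is the kind of trade-off a real thermal framework has to manage.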
When a smartphone isn’t very busy the CPU is free to spike to its highest performance levels for short bursts. Actions like opening an app, rendering a web page, or starting a movie all make the CPU performance spike momentarily. However once the app is open the CPU usage drops, and once the web page is displayed the CPU just sits idle while you read the text, and so on.
However if you start an activity that forces the CPU performance high, like playing a complex game, then after a while the heat produced by the CPU (and the GPU) will force Android to take action and re-arrange things so that the heat can be dissipated correctly. As I mentioned before, that may very well include throttling the CPU so that it runs at a lower frequency (and hence produces less heat).
What this means is that the CPU has a peak performance level that produces more heat than its thermal budget allows, which is OK – even good, for short bursts. However, when used over a sustained period the CPU usage needs to be modified so that it stays within its nominal power budget, and that comes at the expense of performance…
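The difference between a short burst and a sustained load can be sketched with a toy simulation. Everything here is an assumption made for illustration: the frequency steps, the trip temperature, the cubic power curve and the heating and cooling constants are invented, and real thermal governors are considerably more sophisticated.

```python
# Toy model: every constant below is illustrative, not a measurement.
FREQS_MHZ = [1000, 1500, 2000, 2500]   # hypothetical frequency steps
AMBIENT_C = 30.0
THROTTLE_C = 70.0                      # hypothetical trip temperature


def power_watts(freq_mhz):
    """Crude assumption: power grows with the cube of the frequency."""
    return 3.0 * (freq_mhz / 2500) ** 3


def simulate(seconds):
    """Run a constant heavy load; return the frequency used each second."""
    temp = AMBIENT_C
    level = len(FREQS_MHZ) - 1         # start at the top frequency
    history = []
    for _ in range(seconds):
        history.append(FREQS_MHZ[level])
        # Heat added by the core minus heat lost towards ambient.
        temp += 2.0 * power_watts(FREQS_MHZ[level]) - 0.05 * (temp - AMBIENT_C)
        if temp > THROTTLE_C and level > 0:
            level -= 1                 # too hot: step the clock down
        elif temp < THROTTLE_C - 5 and level < len(FREQS_MHZ) - 1:
            level += 1                 # cooled off: step back up
    return history


burst = simulate(5)        # a short spike, e.g. opening an app
sustained = simulate(60)   # a long session, e.g. playing a game
print(sum(burst) / len(burst), sum(sustained) / len(sustained))
```

In this model the short burst runs entirely at the top frequency, while the sustained run is forced onto lower steps once the trip temperature is crossed, so its average frequency is lower.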
But what if ARM could produce a CPU core design that produces roughly the same amount of heat when the CPU performance spikes for short bursts, and when being used for sustained periods? Or to put it another way, what if ARM could design a CPU that can sustain its peak performance within its normal per-core power budget? Well, that is the goal of the Cortex-A73.
Before we dive deeper into the design of the Cortex-A73, I need to clarify a few things. First, there are several different components on a SoC that can produce heat including the GPU, the image processors, the video processor, the display processor and so on. If the overall heat level of the SoC increases due to activity by the GPU, the CPU can still be throttled even though it isn’t the part producing the heat. Secondly, how any given SoC maker implements the Cortex-A73 in silicon including which process node is used will affect the overall performance/efficiency results.
The Cortex-A73 has been specifically designed for mobile workloads and as such the internal optimizations (including branch prediction, pre-fetching, and caching) have been made with mobile in mind. There are several important architectural changes in the Cortex-A73 when compared to the Cortex-A72.
- Dual decode pipeline, compared to the 3-wide decode on the A72.
- The use of a 64K 4-way instruction cache, rather than a 48K 3-way instruction cache.
- New branch predictor with a large Branch Target Address Cache (BTAC), along with a Micro-BTAC to accelerate branch prediction.
- Out-of-order execution engine optimized for high memory throughput with four full out-of-order load/store units (two load, and two store), compared to just one load and one store unit on the A72.
- New enhanced L1 and L2 cache fetching algorithms which use complex pattern detection.
The result is that the Cortex-A73’s micro-architecture is tuned for sustained peak performance without exceeding its power budget and without forcing the use of throttling.
Hexa-core rather than octa-core
The use of octa-core processors has been very successful for cheaper mid-range phones. SoCs like the Qualcomm Snapdragon 615/616 or the MediaTek P10 have proven that there is a market for devices using eight 64-bit Cortex-A53 cores. The Cortex-A53 has been very successful here due to its cost/performance ratio, as well as its high levels of power efficiency. However, what is interesting is that a hexa-core Cortex-A73 SoC, with two A73 cores and four A53 cores, occupies roughly the same silicon area as an octa-core Cortex-A53 processor. The silicon footprint is everything when it comes to the cost of making a SoC and even a fraction of a square millimeter can make the difference between a profitable SoC and one that loses money for the manufacturer. The Cortex-A73 occupies less than 0.65mm² per core.
In the case of a hexa-core A73 setup, the silicon costs should be about the same as for an octa-core Cortex-A53 SoC, but the single core performance will jump by over 90%, while the multi-core performance should increase by over 30%. This is an intriguing idea and one that I hope companies like Qualcomm and MediaTek explore, as a hexa-core Cortex-A73 SoC is going to offer users a much better overall experience than the current octa-core Cortex-A53 SoCs.
Some of the important points to remember here are that the Cortex-A73 offers a 10% general performance improvement over the Cortex-A72 when using the same process node (e.g. 16nm), a 5% increase for SIMD multimedia operations, and a 15% increase in memory throughput. What that basically means is that the A73 is better for mobile than the A72 because of its design, not just because of improvements in the manufacturing process.
Amazingly, these performance improvements come with lower power use, not higher: on the same process node the A73 offers a 20% power saving compared to the A72. It is also 25% smaller than the Cortex-A72. When built using a newer process node (i.e. 10nm), the Cortex-A73 offers a 30% power saving, while yielding 30% more performance and reducing the footprint by 46%.
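As a rough back-of-the-envelope check, the quoted figures can be combined into a performance-per-watt ratio. This assumes the performance gain and the power saving apply to the same workload at the same time, and that the 10nm figures use the same Cortex-A72 baseline, which the figures above do not explicitly confirm. The helper name `perf_per_watt_gain` is made up for this sketch.

```python
def perf_per_watt_gain(perf_increase, power_saving):
    """Relative performance per watt versus the baseline.

    Both arguments are fractions, e.g. 0.10 for 10%.
    Assumes both figures hold simultaneously for the same workload.
    """
    return (1 + perf_increase) / (1 - power_saving)


same_node = perf_per_watt_gain(0.10, 0.20)   # 1.10 / 0.80
newer_node = perf_per_watt_gain(0.30, 0.30)  # 1.30 / 0.70

print(f"same node: {same_node:.2f}x, 10nm: {newer_node:.2f}x")
```

Under those assumptions the same-node numbers work out to 1.10 / 0.80, or roughly 1.4 times the performance per watt, and the 10nm numbers to 1.30 / 0.70, or roughly 1.9 times.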
So… faster, more efficient and smaller, all good stuff. But the killer feature is that the Cortex-A73 has almost the same heat output for short bursts of high load and for a sustained load. If used right that could dramatically change the way phone makers design handsets and open up new areas of design which don’t need to worry quite so much about long term heat dissipation.
So when will we see smartphones with Cortex-A73 cores? The new design has been widely licensed to ARM’s mobile and consumer device partners (including HiSilicon, Marvell and MediaTek), and ARM has been working with those partners in the background, long before this announcement. This means that as you read this the Cortex-A73 core design is being readied for inclusion in upcoming SoCs. Exactly when that will be is unknown, but we will likely see SoCs with the Cortex-A73 towards the end of this year, and devices in early 2017.