EM•Mark active power – TI CC2340R5
Having shown how small(er) EM programs can in fact execute fast(er) – especially when fetching instructions from zero wait-state SRAM – EM•Mark now reveals the (even greater) impact of program size on the power consumption of resource-constrained MCUs.
Program size matters
Every MCU datasheet that characterizes active power consumption invariably does so by executing either (1) an empty while(1) loop or (2) the legacy CoreMark program.
Expressed in units such as mA or μA / MHz, case (2) will typically measure higher due to the extra power consumed by reading a block of instructions from Flash when (not if) a CPU cache miss occurs – a reasonable assumption given the size of legacy CoreMark.
We often focus on how the cache improves program execution time – by automatically loading instructions from slow(er) Flash into fast(er) SRAM when needed. But since Flash draws more current than SRAM when active, the cache also helps lower total power consumption.
As we reported previously, placing the entire EM•Mark program in SRAM can improve execution time; and as you'll now learn, this same sort of SRAM-only configuration (which the EM language strongly encourages) can have an even greater impact on active power consumption.
The envelope, please ...
Using the setup described in EM•Mark Results, the following summarizes active power consumption when executing CoreMark versus EM•Mark with different memory configurations:
Program | Memory configuration | Average power | Energy (ten iterations) |
CoreMark | text + const [ Flash ] | 7.77 mW | 1.37 mJ |
EM•Mark | text + const [ Flash ] | 6.84 mW | 1.03 mJ |
EM•Mark | text + const [ SRAM ] | 5.09 mW | 0.63 mJ |
Armed with a Joulescope JS220 energy analyzer, we've captured traces of power consumption over time – enabling us to visualize overall energy utilization [ E = ∫ P dt ] when executing programs on our MCU.
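As a rough illustration of how E = ∫ P dt falls out of a sampled power trace, here is a minimal Python sketch using the trapezoidal rule. The function names and the sample trace are hypothetical; this is not the Joulescope software's API nor the tooling behind the reported numbers.

```python
def energy_mj(times_s, power_mw):
    """Approximate E = integral of P dt with the trapezoidal rule (mW x s = mJ)."""
    if len(times_s) != len(power_mw):
        raise ValueError("need one power sample per timestamp")
    total = 0.0
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        total += 0.5 * (power_mw[i] + power_mw[i - 1]) * dt
    return total


def average_power_mw(times_s, power_mw):
    """Average power over the capture window: energy divided by duration."""
    duration_s = times_s[-1] - times_s[0]
    return energy_mj(times_s, power_mw) / duration_s


# Hypothetical trace: a constant 5.0 mW sampled every 10 ms for 100 ms.
times = [i * 0.010 for i in range(11)]
power = [5.0] * 11
print(energy_mj(times, power))         # about 0.5 mJ
print(average_power_mw(times, power))  # about 5.0 mW
```

The same trace therefore yields both figures of interest: total energy in mJ, and the average power in mW that datasheets tend to quote.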
To align with MCU datasheet conventions, our earlier summary reports the average amount of power in mW consumed during our ten benchmark iterations [ 7.77, 6.84, 5.09 ]; our CoreMark result in fact matches the 2.6 mA [ @ 3 V, or about 7.8 mW ] found in the TI CC2340R5 datasheet.
At the same time, we feel that energy measured in mJ [ 1.37, 1.03, 0.63 ] better reflects the dynamics of a "live" system executing an application program; in somewhat simple terms, hardware contributes raw power while software adds the critical dimension of time.
Case in point, consider some factors contributing to the (significantly lower) 0.63 mJ of energy consumed when executing ten iterations of EM•Mark from zero wait-state SRAM:
fetching instructions from SRAM requires less power than Flash
programs will generally execute faster when placed in SRAM
we can actually power-down the Flash and its cache
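To make the "dimension of time" concrete, the sketch below takes the power and energy figures from the summary table and derives the window each pair implies [ t = E / P ], along with savings relative to CoreMark. It assumes that each mJ figure and its mW average cover the same measurement window, which is an inference rather than something stated in the results; the helper names are hypothetical.

```python
# Figures copied from the summary table: (average power in mW, energy in mJ).
RESULTS = {
    "CoreMark [Flash]": (7.77, 1.37),
    "EM•Mark  [Flash]": (6.84, 1.03),
    "EM•Mark  [SRAM]": (5.09, 0.63),
}


def implied_window_ms(power_mw, energy_mj):
    """t = E / P. mJ divided by mW gives seconds; scaled to milliseconds."""
    return 1000.0 * energy_mj / power_mw


def reduction(value, baseline):
    """Fractional saving relative to the baseline (0.25 means 25% lower)."""
    return 1.0 - value / baseline


base_power, base_energy = RESULTS["CoreMark [Flash]"]
for name, (power_mw, energy_mj) in RESULTS.items():
    print(
        f"{name}: window ~{implied_window_ms(power_mw, energy_mj):.0f} ms, "
        f"power -{reduction(power_mw, base_power):.0%}, "
        f"energy -{reduction(energy_mj, base_energy):.0%}"
    )

# Datasheet cross-check from the text: 2.6 mA at 3 V.
print(f"datasheet: {2.6 * 3.0:.2f} mW")
```

By this arithmetic the SRAM configuration draws roughly 34% less average power than CoreMark yet uses roughly 54% less energy; the gap between those two percentages is the contribution of shorter execution time.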
Faster clock, more power, but less energy ???
Some MCUs (but not the TI CC2340R5) allow software to change the frequency of the master clock, selecting among a set of discrete chip-specific rates ranging from (say) 4 MHz to ≥ 100 MHz. The MCU datasheet will then specify a corresponding range of active power values in mA or μA / MHz.
At first glance, the (linear) relationship between clock frequency and current draw suggests that faster execution requires more power – a reasonable trade-off. But the MCU datasheet only presents an "instantaneous" perspective that lacks the dimension of time, masking opportunities to reduce overall energy consumption.
Looking deeper at MCU power consumption, we can distinguish between leakage (static) and switching (dynamic) current – with the latter proportionally tracking clock rate. But once we factor in the fixed amount of static current consumed by on-chip memory / peripherals, a faster clock can in fact lower energy consumption: the program finishes sooner, so the static current flows for less time.
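A toy model may help here. The Python sketch below assumes current of the form I = I_static + k · f and a fixed number of CPU cycles of work, so that execution time is cycles / f. Every constant is hypothetical and chosen only to show the trend; none of them describes the TI CC2340R5 or any other real part.

```python
def active_power_mw(f_mhz, volts, static_ma, dynamic_ma_per_mhz):
    """P = V * (I_static + k * f): the 'instantaneous' datasheet view."""
    return volts * (static_ma + dynamic_ma_per_mhz * f_mhz)


def active_energy_mj(f_mhz, cycles, volts, static_ma, dynamic_ma_per_mhz):
    """E = P * t, where a fixed cycle count takes t = cycles / f seconds."""
    time_s = cycles / (f_mhz * 1e6)
    return active_power_mw(f_mhz, volts, static_ma, dynamic_ma_per_mhz) * time_s


# All values below are hypothetical, chosen only to show the trend.
VOLTS = 3.0
STATIC_MA = 0.5            # leakage plus always-on memory / peripherals
DYNAMIC_MA_PER_MHZ = 0.04  # switching current per MHz of clock
CYCLES = 4_800_000         # fixed amount of work

for f_mhz in (4, 12, 48, 96):
    p = active_power_mw(f_mhz, VOLTS, STATIC_MA, DYNAMIC_MA_PER_MHZ)
    e = active_energy_mj(f_mhz, CYCLES, VOLTS, STATIC_MA, DYNAMIC_MA_PER_MHZ)
    print(f"{f_mhz:3d} MHz: {p:6.2f} mW, {e:5.3f} mJ")
```

In this model power climbs with every step up in clock rate while energy falls, because the dynamic term costs the same per cycle at any frequency and only the static term is amortized over a shorter run. The sketch ignores Flash wait-states at higher clocks and any voltage scaling, and it assumes the MCU drops into a low-power state as soon as the work completes.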
As silicon processes advance towards smaller transistors in higher densities, leakage current becomes an even larger factor in our energy calculus – especially for embedded applications with low active duty-cycles that spend most of their time in "deep-sleep" modes.
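To see why leakage looms larger at low duty-cycles, consider a periodic application that wakes, does a burst of work, and returns to deep-sleep. The sketch below splits the energy of one period into active and sleep portions; the burst energy, burst duration, period, and sleep-power values are all hypothetical.

```python
def sleep_energy_mj(active_s, sleep_mw, period_s):
    """Energy spent asleep during one period (mW x s = mJ)."""
    return sleep_mw * (period_s - active_s)


def sleep_share(active_mj, active_s, sleep_mw, period_s):
    """Fraction of one period's total energy that is spent asleep."""
    asleep_mj = sleep_energy_mj(active_s, sleep_mw, period_s)
    return asleep_mj / (active_mj + asleep_mj)


def duty_cycle_power_mw(active_mj, active_s, sleep_mw, period_s):
    """Average power across the whole period, active burst plus sleep."""
    asleep_mj = sleep_energy_mj(active_s, sleep_mw, period_s)
    return (active_mj + asleep_mj) / period_s


# Hypothetical workload: a 1.0 mJ burst lasting 0.1 s, once every 60 s.
ACTIVE_MJ, ACTIVE_S, PERIOD_S = 1.0, 0.1, 60.0

for sleep_mw in (0.003, 0.015):  # hypothetical deep-sleep power levels
    share = sleep_share(ACTIVE_MJ, ACTIVE_S, sleep_mw, PERIOD_S)
    avg = duty_cycle_power_mw(ACTIVE_MJ, ACTIVE_S, sleep_mw, PERIOD_S)
    print(f"sleep {sleep_mw:.3f} mW: {share:.0%} of energy asleep, avg {avg:.4f} mW")
```

With these made-up numbers, a fivefold rise in deep-sleep power moves the sleep portion from roughly 15% to roughly 47% of the energy budget, even though the active burst is unchanged.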
Finally, a special tip-of-the-hat to Joe Circello for our multiple rounds of discussions in the summer of 2022 on silicon process as well as hardware architecture, and how EM might influence future MCU designs.
These results further confirm some hypotheses put forth in Tiny code → Tiny chips, and suggest a very different SRAM-centric MCU architecture with a faster clock, low-cost Flash, and no cache – yielding a smaller chip with fewer gates which consumes less power.
All possible because EM can dramatically reduce program size :: ↓ KB ⇒ ↓ mJ
How can you get involved
review this earlier post, which talks about MCU power specs in the context of total energy usage
dive into the CoreMark Reimagined and EM•Mark Results articles for more technical details
share your experiences in measuring power / energy, as well as your thoughts on ULP MCU design
Happy coding !!!