Flash prefetching function of MCU
Flash Prefetching in MCUs:
Flash prefetching is a feature implemented in many microcontrollers (MCUs), particularly those with ARM Cortex-M cores, to improve the performance of flash memory access. It increases the effective speed of the MCU by reading instructions from flash memory into a small prefetch buffer before the processor actually requests them.
This feature matters in systems where the CPU fetches code from flash memory, which is generally slower than SRAM. By fetching ahead and holding the result in a buffer, the MCU reduces the time the CPU spends waiting for the instructions it is about to execute.
1. How Flash Prefetching Works:
When the MCU needs to execute code or fetch data from flash memory, it typically needs to wait for the data to be fetched from the flash, which may take several clock cycles. Flash prefetching minimizes this delay by anticipating the code or data that will be needed next and fetching it ahead of time.
Here's a high-level overview of how it works:
- The flash memory is organized into pages or sectors for erasing, but it is read in lines that are typically wider than one instruction (for example 64 or 128 bits), so one flash read delivers several instructions.
- The prefetch buffer stores the lines fetched from flash memory.
- The prefetch unit assumes that execution continues sequentially from the current fetch address and reads the following line into the buffer in advance.
- When the CPU requests the next instructions, they are already available in the prefetch buffer, so it does not have to wait for a full flash read.
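The steps above can be approximated with a toy cycle model that runs on a host PC. This is a sketch under simplifying assumptions, not a description of any specific MCU: one flash read delivers `LINE_WORDS` instructions, each read stalls the CPU for `wait_states` cycles, the CPU executes one instruction per cycle, and with prefetch enabled the read of the next line overlaps with execution of the current one. The function name and the line width are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define LINE_WORDS 4u /* assumed: instructions delivered per flash read */

/* Cycles needed to execute n_instr strictly sequential instructions.
 * wait_states: stall cycles per flash line read.
 * prefetch:    0 = prefetch buffer off, nonzero = on. */
uint32_t sequential_cycles(uint32_t n_instr, uint32_t wait_states, int prefetch)
{
    assert(n_instr > 0u);
    uint32_t lines = (n_instr + LINE_WORDS - 1u) / LINE_WORDS;
    uint32_t cycles = n_instr; /* one cycle per instruction */

    if (!prefetch) {
        /* Every line read is fully exposed to the CPU. */
        return cycles + lines * wait_states;
    }

    /* The first read is always exposed. Later reads overlap with the
     * LINE_WORDS cycles spent executing the previous line, so only the
     * wait states beyond that overlap stall the CPU. */
    uint32_t exposed = (wait_states > LINE_WORDS) ? wait_states - LINE_WORDS : 0u;
    return cycles + wait_states + (lines - 1u) * exposed;
}
```

In this model, 16 sequential instructions with 2 wait states take 24 cycles without prefetch and 18 cycles with it; with 0 wait states both cases take 16 cycles, which is why prefetching only pays off once wait states are in use.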
2. Flash Prefetching in ARM Cortex-M Microcontrollers:
Many microcontrollers built around ARM Cortex-M cores (such as the Cortex-M3, Cortex-M4, and Cortex-M7) provide flash prefetching as part of their memory subsystem. The prefetch logic normally sits in the vendor's flash interface rather than in the core itself, so the details vary between device families. The general behavior is:
- Prefetch Buffer: The flash interface holds a small amount of flash memory content in anticipation of the CPU's next fetch. The buffer is small, typically one or a few flash read lines; the exact size is device-specific.
- Automatic Operation: Once enabled, prefetching needs no software involvement; the hardware fetches the next line into the prefetch buffer while the CPU executes code.
- Improved Performance: When enabled, the prefetch unit speeds up code execution whenever flash wait states are in use, especially for applications with large code or long runs of sequential instructions.
3. Enabling and Configuring Flash Prefetching on STM32:
In STM32 microcontrollers (which often use ARM Cortex-M cores), flash prefetching can typically be controlled via the Flash Access Control register in the Flash memory interface. Here's how you can configure or enable it:
a. STM32 Flash Prefetching:
For STM32 devices (such as STM32F4, STM32F7, etc.), flash prefetching is enabled or disabled by modifying the FLASH_ACR (Flash Access Control Register). Two fields are relevant:
- PRFTEN (Prefetch Enable): This bit enables or disables the flash prefetch buffer.
- LATENCY (Flash Latency): This field sets the number of wait states inserted on each flash access. The required value depends on the system clock and supply voltage. Enabling prefetch does not lower the required number of wait states; it hides part of their cost during sequential execution.
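To make the two fields concrete, the following host-side sketch applies them to a plain variable standing in for FLASH_ACR. The bit positions (a 3-bit LATENCY field in bits [2:0] and PRFTEN at bit 8) are assumed from the STM32F405/407 layout; other families differ, so check the reference manual for your device. The `ACR_*` macros and both functions are hypothetical names, not part of any ST header.

```c
#include <assert.h>
#include <stdint.h>

#define ACR_LATENCY_MASK 0x7u      /* assumed: LATENCY occupies bits [2:0] */
#define ACR_PRFTEN       (1u << 8) /* assumed: PRFTEN is bit 8 */

/* Return a new register value with the requested wait states and prefetch
 * setting, leaving every other bit untouched. */
uint32_t acr_configure(uint32_t acr, uint32_t wait_states, int prefetch_on)
{
    assert(wait_states <= ACR_LATENCY_MASK);

    acr &= ~ACR_LATENCY_MASK; /* clear the old latency before writing */
    acr |= wait_states;

    if (prefetch_on) {
        acr |= ACR_PRFTEN;
    } else {
        acr &= ~ACR_PRFTEN;
    }
    return acr;
}

/* Extract the wait-state count from a register value. */
uint32_t acr_latency(uint32_t acr)
{
    return acr & ACR_LATENCY_MASK;
}
```

Clearing the field before writing matters: OR-ing a latency of 1 into a register that already holds 3 would leave 3 in place, whereas `acr_latency(acr_configure(0x3u, 1u, 1))` yields 1.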
b. Example of Enabling Prefetch in STM32:
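Here is an example of how you would enable flash prefetching on an STM32F4. It includes the HAL header but writes the register directly, using the CMSIS register definitions that the header pulls in:
```c
#include "stm32f4xx_hal.h"

// Enable the flash prefetch buffer and select one wait state.
void Enable_Flash_Prefetch(void) {
    // Enable the prefetch buffer by setting the PRFTEN bit in FLASH_ACR.
    FLASH->ACR |= FLASH_ACR_PRFTEN;

    // Set the flash latency (the right value depends on the system clock
    // and supply voltage). The LATENCY field is cleared and written in one
    // assignment so that bits from an earlier setting are not left behind.
    FLASH->ACR = (FLASH->ACR & ~FLASH_ACR_LATENCY) | FLASH_ACR_LATENCY_1WS;
}
```
In the example:
- FLASH->ACR |= FLASH_ACR_PRFTEN; enables the prefetch buffer.
- FLASH_ACR_LATENCY_1WS selects one wait state for the flash. Depending on your system clock and supply voltage, you might need a different latency setting. The macro name is the one used in the STM32F4 CMSIS device headers; other STM32 families name the latency values differently.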
c. Disabling Prefetch:
To disable prefetching, you can simply clear the PRFTEN bit:
```c
// Disable the flash prefetch buffer.
void Disable_Flash_Prefetch(void) {
    FLASH->ACR &= ~FLASH_ACR_PRFTEN; // Clear PRFTEN to disable the prefetch buffer
}
```
4. Performance Impact of Flash Prefetching:
- Improved Execution Speed: With the prefetch buffer enabled, the CPU spends fewer cycles stalled on instruction fetches, so code running from flash executes in fewer cycles.
- Memory Latency Reduction: Prefetching does not make the flash itself faster. It hides part of the access latency by reading the next instructions while the current ones execute, so the CPU can continue with minimal delay.
- Optimized for Sequential Code: Flash prefetching works best for sequential code execution, where instructions are fetched in a predictable pattern (e.g., linear code or loops).
However, for non-sequential access patterns (such as accessing data in random locations in flash), prefetching might not be as beneficial and could potentially lead to inefficiencies, as the prefetch buffer would be fetching data that may not be used soon.
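The difference between sequential and non-sequential access can be sketched with a small host-side model that counts stall cycles over a trace of flash line indices. It assumes a prefetcher that always has exactly the next sequential line ready and fully hides its read; real hardware is more nuanced. The function name and the trace format are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count stall cycles for a trace of flash line indices.
 * A fetch from the line currently held costs nothing. With prefetch on,
 * a fetch from the next sequential line is also free. Any other fetch
 * exposes a full flash read of wait_states cycles. */
uint32_t stall_cycles(const uint32_t *line_trace, size_t n,
                      uint32_t wait_states, int prefetch)
{
    assert(line_trace != NULL || n == 0u);

    uint32_t stalls = 0u;
    uint32_t current = 0u;
    int valid = 0; /* nothing has been fetched yet */

    for (size_t i = 0; i < n; ++i) {
        uint32_t line = line_trace[i];
        int hit_current = valid && line == current;
        int hit_prefetched = valid && prefetch && line == current + 1u;

        if (!hit_current && !hit_prefetched) {
            stalls += wait_states;
        }
        current = line;
        valid = 1;
    }
    return stalls;
}
```

For the sequential trace `{0, 1, 2, 3, 4, 5, 6, 7}` with 3 wait states, the model reports 24 stall cycles without prefetch and 3 with it. For the scattered trace `{0, 5, 2, 9, 4, 7, 1, 8}` it reports 24 in both cases, because every prefetched line is discarded before it is used.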
5. Flash Prefetching and Wait States:
When configuring flash prefetching, you must also consider the flash wait states. Flash memory is slower than SRAM, so when the system clock is running at high frequencies, you might need to introduce wait states to ensure stable access to the flash. Wait states are delays inserted into the memory access cycle to accommodate the flash's slower read speed.
On STM32, the FLASH_ACR register also contains a LATENCY field that configures the number of wait states required to access the flash:
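- No Latency (0 wait states): For low clock speeds (for example, up to 24 MHz on STM32F1, or up to 30 MHz on STM32F405/407 at 2.7 V to 3.6 V).
- 1 Wait State: For the next frequency range (for example, above 24 MHz up to 48 MHz on STM32F1, or above 30 MHz up to 60 MHz on STM32F405/407 at 2.7 V to 3.6 V).
- 2 or More Wait States: Required for even higher system frequencies. The exact limits depend on the device and supply voltage, so take them from the reference manual for your part.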
The combination of wait states and prefetching helps optimize the balance between flash memory speed and system performance.
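On families where each additional wait state covers an equally sized frequency range, the required value can be computed rather than looked up. The helper below is a minimal sketch under that assumption; `flash_wait_states` and `step_hz` are hypothetical names, and the step size (for example 30 MHz for STM32F405/407 at 2.7 V to 3.6 V, or 24 MHz for STM32F1) must be taken from the wait-state table in the device's reference manual.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest wait-state count ws such that hclk_hz <= (ws + 1) * step_hz.
 * step_hz is the frequency range covered per wait state (an assumption
 * to verify against the reference manual for the voltage in use). */
uint32_t flash_wait_states(uint32_t hclk_hz, uint32_t step_hz)
{
    assert(hclk_hz > 0u && step_hz > 0u);
    return (hclk_hz - 1u) / step_hz; /* equals ceil(hclk / step) - 1 */
}
```

With a 30 MHz step this gives 0 wait states at exactly 30 MHz, 1 at 60 MHz, and 5 at 168 MHz; with a 24 MHz step it gives 2 at 72 MHz. The result still has to fit in the device's LATENCY field.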
6. Example of Flash Prefetching with Wait States on STM32:
```c
#include "stm32f4xx_hal.h"

// Set up flash prefetching with wait states that match the system clock.
// Call this before switching to a faster clock.
void Configure_Flash(void) {
    // Enable the prefetch buffer.
    FLASH->ACR |= FLASH_ACR_PRFTEN;

    // Select one wait state. The LATENCY field is cleared and written in a
    // single assignment so the flash never runs with a stale or zero setting.
    FLASH->ACR = (FLASH->ACR & ~FLASH_ACR_LATENCY) | FLASH_ACR_LATENCY_1WS;
}
```
In this example, the prefetch buffer is enabled and the latency is set to one wait state. Whether one wait state is sufficient depends on the device, the system clock, and the supply voltage; on STM32F405/407 at 2.7 V to 3.6 V it covers clocks above 30 MHz and up to 60 MHz.
7. Conclusion:
Flash prefetching is a valuable feature for improving the performance of microcontrollers, especially when accessing large codebases or data stored in flash memory. By preloading instructions and data into a buffer, the MCU reduces the wait time when accessing flash, leading to faster execution of code.
In STM32 microcontrollers, this feature is controlled through the FLASH_ACR register, and enabling it typically requires:
- Setting the PRFTEN bit to enable the prefetch buffer.
- Adjusting flash latency settings based on the system clock.
While flash prefetching is automatic in many MCUs, it is often beneficial to explicitly enable and configure it to optimize performance, particularly for systems running at higher clock frequencies or with demanding memory access patterns.