Energy-efficient design of multimedia embedded systems demands optimizations in both hardware and software. However, software optimizations related to dynamically allocated data have only started receiving attention recently. While typical desktop applications select a set of Dynamic Data Types (DDT) to achieve the best performance in terms of execution time, embedded systems optimization require combinations of power consumption, memory usage and performance. This problem is NP-complete and cannot be fully explored in case of a large number of variables are present in the application under study, as occurs in latest multimedia embedded applications.
The following figure shows an example of DDTs implementation exploration. The initial code contains two containers, c1 and c2, instantiated as a vector and a list, respectively. After the exploration process, we can obtain, for example, a candidate solution that recommends c1 to be instantiated as Single Linked List (SLL) and c2 as Double Linked List of Arrays (DLLAR).
Formally, we can state that the application to optimize contains a set of containers {C}, which are candidates to be instantiated as a certain DDT from the set of possible implementation of a DDTs library {D}. This library includes the initial basic DDT and multi-layer implementations relevant for embedded multimedia applications. As a consequence, the goal of our multi-objective optimization flow is to obtain a set of pairs (container, DDT), such that minimizes memory accesses, memory usage and power consumption for the target embedded system.
In order to evaluate the different metrics we need to obtain the real execution information from the application. Unfortunately, the execution of the whole application for each DDT is not a viable solution. An alternative solution is to obtain a profiling report of the application where the following information is logged: number and location of the accesses of an element, addition of an element, removal of an element, the clearing of the container, iterator operations such as pre-increment or dereference, constructor, destructor, copy constructor and swap operation. To this end, we need to replace all the candidate variables in the application by our vector DDT implementation, which logs the access pattern information needed to evaluate them.
The profiling tool and DDT library are not available through this web site. More information can be found at:
An example of profiling report can be found here [16 MB].
In this phase, we extract all information needed from the profiling report. The purpose is to measure the quality of a solution (ci; dj) in the DDT exploration, using the following initial parameters: the number of candidate variables, number of elements stored in the container in the worst case (Ne), average of the number of elements stored (Nve), size of the elements in bytes (Te), size of the pointers in bytes (Tref ), number of read accesses (Nr), number of write accesses (Nw) and cache misses (Npa). All these parameters can be automatically extracted from the profiling report. To this end, we have developed our own tool, called Profile Analyzer. Cache misses are also obtained by means of simulation, generating memory traces from the profiling report and the DDT library, using them as input for the Dinero cache simulator, which models the particular memory configuration of the target embedded system. This phase is the most-time consuming part of the exploration (in the order of few hours), although it is done only once for each target architecture and tested application.
The Profile Analyzer tool, including a profiling report, is available here [win32-binary, 16 MB]
The last phase of our DDT design flow is the optimization process itself. It takes as input the parameters obtained in the previous phase and minimizes three objectives: memory accesses, memory usage and energy.
We have developed several optimization algorithms. All of them are included at our Java Evolutionary COmputation library (JECO). You can find this software here.
We have also developed an automatic manner to parallelize the design flow in the optimization of DDTs. It is performed using DEVS/SOA, that can also be found here. Note that DEVS/SOA is still in progress (alpha version) and requires many manual pre-configurations at the server side. Please read carefully the quick start guide.
These references are sorted in the same order they should be read: