Elsevier

Applied Soft Computing

Volume 62, January 2018, Pages 1088-1101

Evolutionary design of the memory subsystem

https://doi.org/10.1016/j.asoc.2017.09.047

Highlights

  • An optimization framework is presented that can be applied to the design of the complete memory subsystem of a computer.

  • A multi-objective optimization approach is considered for the register file and cache memory optimization, where both the energy and the performance are considered as objectives.

  • Grammatical evolution is applied in the dynamic memory optimization, where a weighted sum approach is taken into account.

  • Several simulators are integrated in the optimization loop, which makes this framework more robust, given that those simulators are commonly accepted in the research community.

Abstract

The memory hierarchy has a high impact on the performance and power consumption of the system. Moreover, current embedded systems, included in mobile devices, are specifically designed to run multimedia applications, which are memory intensive. This increases the pressure on the memory subsystem and affects both performance and energy consumption. In this regard, thermal problems, performance degradation and high energy consumption can cause irreversible damage to the devices.

We address the optimization of the whole memory subsystem with three approaches integrated as a single methodology. Firstly, the thermal impact of the register file is analyzed and optimized. Secondly, the cache memory is addressed by optimizing the cache configuration according to the running applications, improving both performance and power consumption. Finally, we simplify the design and evaluation process of general-purpose and customized dynamic memory managers in the main memory. To this aim, we apply different evolutionary algorithms in combination with memory simulators and profiling tools. This way, we are able to evaluate the quality of each candidate solution and take advantage of the exploration of solutions given by the optimization algorithm. We also provide an experimental evaluation where our proposal is assessed using well-known benchmark applications.

Introduction

The memory hierarchy has a significant impact on the performance and energy consumption of the system. This impact is estimated at about 50% of the total energy consumption of the chip [1]. This makes the memory subsystem one of the most important targets for improving both performance and energy consumption. Concerns such as thermal issues or high energy consumption can cause significant performance degradation, as well as irreversible damage to the devices, thereby increasing the energy cost. Previous works have shown that saving energy in the memory subsystem can effectively control transistor aging effects and can significantly extend the lifetime of the internal structures [2].

Technological changes combined with the development of communications have led to the great expansion of mobile devices such as smartphones, tablets, etc. Mobile devices have evolved rapidly to adapt to the new requirements, giving support to multimedia applications. These devices are supplied with embedded systems, which are mainly battery-powered and usually have fewer computing resources than desktop systems.

Additionally, multimedia applications are usually memory intensive, so they have high performance requirements, which imply high energy consumption. These features increase the pressure on the whole memory subsystem.

Processor registers, smaller in size, work at the same speed as the processor and consume less energy than other levels of the memory subsystem. However, energy consumption and access time rise as the register file grows, due to the higher number of registers and ports.

Regarding the cache memory, it has been identified as a relatively cold area of the chip, although the peripheral circuits and the cache size are the factors that most influence the temperature increase caused by cache accesses from specific applications [3]. However, the cache memory affects both performance and energy consumption. In fact, the energy consumption of the on-chip cache memory is considered responsible for 20–30% of the total consumption of the chip [4]. A suitable cache configuration will improve both metrics.

In terms of performance, the main memory is the slowest component compared with the cache memory and processor registers. Running programs request the allocation and deallocation of memory blocks, and the dynamic memory manager (DMM) is in charge of this task. Current multimedia applications have highly dynamic memory requirements, so optimizing the memory allocator is a crucial task. Serving a memory allocation request is complex, and the allocation algorithm must minimize both internal and external fragmentation. Therefore, efficient tools must be provided to DMM designers to evaluate the cost and efficiency of DMMs, facilitating the design of customized DMMs.

In this paper we present a methodology based on evolutionary algorithms (EA), which is divided into three layers tackling different components of the memory hierarchy and performing the optimization process of each layer according to the running applications. The first layer is the register file, the second is the cache memory and the last one is the DMM, which works on the main memory. Fig. 1 shows the three optimization layers surrounded with different dashed lines, and the tools involved within each optimization process, which will be explained in depth in the rest of the paper.

In a previous work [5], we presented an approach based on grammatical evolution (GE) with a wide design space, where the complete set of defined parameters was considered and a specific cache memory configuration was chosen as a baseline. The GE approach obtained good results, although no other results were available for comparison. The problem is clearly multi-objective, and thus the GE approach considered a weighted objective function. Hence, the optimization problem was later addressed through a multi-objective approach with NSGA-II [6]. On the one hand, this approach was customized with a fixed cache size for both the instruction and data caches. On the other hand, a different cache memory configuration was used as the baseline. Thus, the GE and NSGA-II approaches use different sets of parameters. As a consequence, results could not be directly compared in order to reach a decision.
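
As an illustration of the weighted-sum formulation mentioned above, the sketch below shows one way the two objectives could be collapsed into a single fitness value. The weights and the baseline figures are illustrative assumptions, not the values used in our experiments.

```java
// Hedged sketch: a weighted-sum fitness combining execution time and energy.
// Weights and baseline values are placeholders for illustration only.
public final class WeightedFitness {

    private final double wTime;      // weight for execution time
    private final double wEnergy;    // weight for energy consumption
    private final double baseTime;   // execution time of the baseline configuration
    private final double baseEnergy; // energy consumption of the baseline configuration

    public WeightedFitness(double wTime, double wEnergy,
                           double baseTime, double baseEnergy) {
        this.wTime = wTime;
        this.wEnergy = wEnergy;
        this.baseTime = baseTime;
        this.baseEnergy = baseEnergy;
    }

    /** Lower is better: both objectives are normalized against the baseline. */
    public double evaluate(double execTime, double energy) {
        return wTime * (execTime / baseTime) + wEnergy * (energy / baseEnergy);
    }

    public static void main(String[] args) {
        WeightedFitness f = new WeightedFitness(0.5, 0.5, 1.0e-2, 3.0e-3);
        System.out.println(f.evaluate(8.5e-3, 2.4e-3)); // candidate vs. baseline
    }
}
```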

In this paper we provide several new contributions regarding the cache design. Firstly, we perform the experiments using the NSGA-II algorithm under the same conditions as the GE proposal, i.e., with the same design space and baseline. This configuration allows a direct comparison between both algorithms. Additionally, two baseline caches included in general-purpose devices are added to the analysis, since the original baseline belonged to a special-purpose device. Finally, we have added a statistical test to verify the significance of the results. Therefore, this work completes the set of tests previously made and provides us with enough information to decide which algorithm to apply in the cache design optimization.

In addition to the cache design, we propose in this paper to apply evolutionary techniques to the register file configuration and the DMM which, considered in conjunction with the cache, comprise the whole memory subsystem of a computer. For both the register file and the DMM we propose the algorithms, perform the experiments and analyze the results on both objectives of our fitness function: execution time and energy consumption. Besides, we have incorporated statistical tests to verify the significance of our results in both the register file and the DMM optimizations. To the best of our knowledge, a complete three-layer approach such as the one we propose has not been previously reported in the literature.

We have also focused our experiments on the ARM architecture, which is present in many of the current embedded multimedia systems. The selected applications have been adapted to better fit each of the memory layers that we optimize. As we will show later in this work, the cache memory policies and the DMMs are the most sensitive to improvement.

All the algorithms are coded in Java using the JECO library [7]. The experimentation has been conducted on a computer equipped with an Intel i5 660 processor running at 3.3 GHz, 8 GB of RAM and the Ubuntu Desktop 14.04 operating system.

The rest of the paper is organized as follows. The next section summarizes the related work on the topic. Section 3 describes the thermal, performance and energy models applied. Section 4 addresses the thermal impact on the processor registers. Section 5 presents the optimization process aimed at automatically designing cache configurations in order to improve performance and reduce energy consumption. Section 6 describes the optimization process to automatically evaluate and design customized DMMs, which improve performance and reduce the memory fragmentation problem. In Section 7, we present our conclusions and describe future work.

Section snippets

Related work

Many works can be found in the literature regarding memory optimization. Next, we review the literature closest to our work, separating the papers into the three memory layers we have studied.

Concerns about thermal problems, performance degradation and high energy consumption are neither new nor insignificant in the memory subsystem. The register file has been identified as a component with high energy consumption, between 15% and 36% in embedded processors [8]. Multimedia applications increase

Thermal, energy and performance model

The proposed framework is based on the simulation of performance and energy consumption models. These models are used as the input for the optimization algorithms, which find an optimized design. To address these tasks, we apply thermal, energy and performance models, which are described next.
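
As a minimal illustration of how simulator statistics can be turned into the two objective values, the sketch below computes energy and time from access and miss counts. The per-access constants are placeholders for illustration; the actual figures come from the thermal, energy and performance models applied in this work.

```java
// Hedged sketch: deriving the two objective values from simulator statistics.
// All constants below are illustrative assumptions, not the paper's figures.
public final class CacheCostModel {

    static final double E_HIT_NJ  = 0.5;   // energy per cache hit (nJ)
    static final double E_MISS_NJ = 5.0;   // extra energy per miss (nJ)
    static final double T_HIT_NS  = 1.0;   // hit latency (ns)
    static final double T_MISS_NS = 50.0;  // miss penalty (ns)

    /** Energy in nanojoules for a given access/miss count. */
    static double energy(long accesses, long misses) {
        return accesses * E_HIT_NJ + misses * E_MISS_NJ;
    }

    /** Time in nanoseconds spent in the memory subsystem. */
    static double time(long accesses, long misses) {
        return accesses * T_HIT_NS + misses * T_MISS_NS;
    }

    public static void main(String[] args) {
        long accesses = 1_000_000, misses = 40_000;
        System.out.printf("E = %.1f nJ, T = %.1f ns%n",
                energy(accesses, misses), time(accesses, misses));
    }
}
```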

Register file optimization

The first layer of our methodology is the register file optimization. We present a methodology that takes into account the temperature increase due to the accesses that occur while a multimedia application is running. Then it evaluates the thermal impact of different spatial distributions of the logical registers. It applies a multi-objective evolutionary algorithm (MOEA) to obtain the optimized solutions, finally proposing the spatial distributions that best reduce the thermal impact. This
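
A minimal sketch of how a candidate spatial distribution could be encoded and scored is shown below, assuming a permutation-based encoding and a simple hotspot proxy that penalizes placing frequently accessed registers next to each other. Both are illustrative simplifications, not the thermal model actually applied in this layer.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hedged sketch: a candidate is a permutation placing each logical register at a
// physical slot of the register file; the hotspot proxy is an illustrative assumption.
public final class RegisterPlacement {

    /** Penalizes adjacent slots (same row) holding frequently accessed registers. */
    static double hotspotProxy(int[] placement, long[] accesses, int rowWidth) {
        double penalty = 0.0;
        for (int slot = 0; slot + 1 < placement.length; slot++) {
            int right = slot + 1;
            if (right % rowWidth != 0) { // skip pairs that cross a row boundary
                penalty += accesses[placement[slot]] * (double) accesses[placement[right]];
            }
        }
        return penalty;
    }

    public static void main(String[] args) {
        long[] accesses = {900, 10, 850, 20, 700, 5, 650, 30}; // e.g. from static profiling
        List<Integer> perm = new ArrayList<>();
        for (int i = 0; i < accesses.length; i++) perm.add(i);
        Collections.shuffle(perm);                              // a random candidate
        int[] placement = perm.stream().mapToInt(Integer::intValue).toArray();
        System.out.println(hotspotProxy(placement, accesses, 4));
    }
}
```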

Cache memory optimization

The second layer of our methodology is the cache memory, as previously shown in Fig. 1. We propose an optimization approach that is able to determine cache configurations for multimedia embedded systems that require less execution time and energy consumption.
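
The sketch below illustrates what a cache-configuration candidate in such a design space could look like. The parameter values are illustrative assumptions; in the real optimization loop, each candidate is evaluated with a cache simulator to obtain its execution time and energy consumption.

```java
// Hedged sketch of a cache-configuration candidate; the parameter ranges are
// illustrative assumptions, not the design space defined in the paper.
public final class CacheConfig {
    final int sizeKB;         // total instruction- or data-cache size
    final int lineBytes;      // cache line (block) size
    final int associativity;  // number of ways
    final String replacement; // e.g. "LRU", "FIFO", "RANDOM"

    CacheConfig(int sizeKB, int lineBytes, int associativity, String replacement) {
        this.sizeKB = sizeKB;
        this.lineBytes = lineBytes;
        this.associativity = associativity;
        this.replacement = replacement;
    }

    /** Number of sets, derived from the other parameters. */
    int sets() {
        return (sizeKB * 1024) / (lineBytes * associativity);
    }

    @Override public String toString() {
        return sizeKB + "KB/" + lineBytes + "B/" + associativity + "-way/" + replacement;
    }

    public static void main(String[] args) {
        CacheConfig candidate = new CacheConfig(16, 32, 4, "LRU");
        System.out.println(candidate + " -> " + candidate.sets() + " sets");
        // In the optimization loop this candidate would be simulated and scored
        // on execution time and energy before selection and variation.
    }
}
```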

As seen in Fig. 5, this layer is divided into two off-line phases (labeled as 1 and 2) and a third phase devoted to optimization (labeled as 3). Firstly, the off-line phases are executed just once before the optimization. Next, the

Dynamic memory management optimization

The third layer of our methodology consists of an optimization framework based on GE and static profiling of applications to improve the dynamic memory manager (DMM) for multimedia applications, which depend heavily on dynamic memory. This is a non-intrusive method that allows complex DMM implementations to be evaluated automatically.
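
A minimal sketch of the kind of evaluation this enables is shown below: a profiled trace of allocation and deallocation events is replayed against a candidate allocator. The segregated-free-list allocator, its size classes and the toy trace are illustrative assumptions, not the DMM designs generated by the evolved grammar.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hedged sketch: replaying a profiled allocation trace against a candidate DMM.
// The size classes and the trace below are placeholders for illustration only.
public final class DmmReplay {

    /** One profiled event: positive size = allocate block with this id, negative = free it. */
    record Event(int id, int size) {}

    public static void main(String[] args) {
        int[] sizeClasses = {16, 64, 256, 1024};       // a candidate's size classes
        Event[] trace = {                               // tiny illustrative trace
            new Event(1, 24), new Event(2, 200), new Event(1, -1),
            new Event(3, 24), new Event(2, -1), new Event(3, -1)
        };

        Map<Integer, Integer> live = new HashMap<>();   // id -> rounded block size
        Map<Integer, Deque<Integer>> freeLists = new HashMap<>();
        long requested = 0, reserved = 0;

        for (Event e : trace) {
            if (e.size() > 0) {
                int cls = sizeClass(e.size(), sizeClasses);
                Deque<Integer> list = freeLists.computeIfAbsent(cls, k -> new ArrayDeque<>());
                if (list.isEmpty()) reserved += cls;    // new block taken from the heap
                else list.pop();                        // reuse a previously freed block
                live.put(e.id(), cls);
                requested += e.size();
            } else {
                int cls = live.remove(e.id());
                freeLists.get(cls).push(cls);           // return block to its free list
            }
        }
        // Coarse proxy: heap memory reserved by the candidate vs. total bytes requested.
        System.out.printf("requested=%d reserved=%d%n", requested, reserved);
    }

    static int sizeClass(int size, int[] classes) {
        for (int c : classes) if (size <= c) return c;
        return size; // larger requests handled exactly in this toy model
    }
}
```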

In order to evaluate our proposal, we have selected six memory intensive applications: hmmer, dealII, soplex, calculix, gcc and perl. In addition, we have

Conclusions and future work

We have presented a method to optimize the memory subsystem of a computer addressing three different levels: the register file, the cache memory and the dynamic memory management in the main memory. At all these levels we propose an evolutionary algorithm as the optimization engine, which is helped by other applications, either in a closed loop or in off-line phases.

The optimization of the register file is based on a first step where a static profiling of the target applications is performed. Then, a

References (41)

  • J.L. Risco-Martín et al.

    Simulation of high-performance memory allocators

    Microprocess. Microsyst.

    (2011)
  • M. Zapater et al.

    Runtime data center temperature prediction using grammatical evolution techniques

    Appl. Soft Comput.

    (2016)
  • M.T. Kandemir

    Reducing energy consumption of multiprocessor SoC architectures by exploiting memory bank locality

    ACM Trans. Des. Autom. Electron. Syst.

    (2006)
  • H. Mahmood et al.

    Cache aging reduction with improved performance using dynamically re-sizable cache

    Proceedings of the Conference on Design, Automation & Test in Europe

    (2014)
  • M. Meterelliyoz et al.

    Analysis of SRAM and eDRAM cache memories under spatial temperature variations

    IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst.

    (2010)
  • A. Varma et al.

    Instruction-level power dissipation in the Intel XScale embedded microprocessor

SPIE's 17th Annual Symposium on Electronic Imaging Science & Technology

    (2005)
  • J. Díaz Álvarez et al.

    Optimizing L1 cache for embedded systems through grammatical evolution

    Soft Comput.

    (2015)
  • J.D. Álvarez et al.

    Multi-objective optimization of energy consumption and execution time in a single level cache memory for embedded systems

    J. Syst. Softw.

    (2016)
  • ABSys Group

    JECO (Java Evolutionary COmputation) Library

    (2017)
  • H. Tabkhi et al.

    Application-guided power gating reducing register file static power

    IEEE Trans. Very Large Scale Integr. Syst.

    (2014)
  • J. Abella et al.

    On reducing register pressure and energy in multiple-banked register files

    21st International Conference on Computer Design, 2003. Proceedings

    (2003)
  • D. Atienza et al.

    Compiler-driven leakage energy reduction in banked register files

  • X. Zhou et al.

    Temperature-aware register reallocation for register file power-density minimization

    ACM Trans. Des. Autom. Electron. Syst.

    (2009)
  • M.M. Sabry et al.

    Thermal-aware compilation for system-on-chip processing architectures

    Proceedings of the 20th Symposium on Great Lakes Symposium on VLSI, GLSVLSI’10

    (2010)
  • R. Wang et al.

    Futility scaling: high-associativity cache partitioning

    Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO-47

    (2014)
  • T. Adegbija et al.

    Energy-efficient phase-based cache tuning for multimedia applications in embedded systems

    2014 IEEE 11th Consumer Communications and Networking Conference (CCNC)

    (2014)
  • W. Wang et al.

    Dynamic cache reconfiguration for soft real-time systems

    ACM Trans. Embed. Comput. Syst.

    (2012)
  • W. Zang et al.

    A survey on cache tuning from a power/energy perspective

    ACM Comput. Surv.

    (2013)
• ARM946E-S

    Technical Reference Manual

    (2014)
  • M. Feng et al.

    Dynamic access distance driven cache replacement

    ACM Trans. Archit. Code Optim.

    (2011)
  • This research has been partially supported by the Ministerio de Economía y Competitividad of Spain (Grant Refs. TIN2015-65460-C2, TIN2014-54806-R and TIN2017-85727-C4-4-P) and also by the European Regional Development Fund FEDER (Grants Refs EphemeCH TIN2014-56494-C4-1,2,3-P and IB16035) and Junta de Extremadura FEDER, project GR15068.
