# A Low Power Highly Applicable Approach for Caches Based on STT-RAM Technology

#### Neethu Anna Sabu<sup>1</sup>, Sreeja K. A.<sup>2</sup>

<sup>1</sup>M.Tech Student, Department of Electronics and Communication Engineering SCMS School of Engineering and Technology, Karukutty, Cochin, Kerala, India

<sup>2</sup>Assistant Professor, Department of Electronics and Communication Engineering SCMS School of Engineering and Technology, Karukutty, Cochin, Kerala, India

**Abstract:** This paper reduces the static power dissipation of the peripheral circuits of STT-RAM instruction caches. The main goal is to detect the idle time of the caches in advance and thereby reduce power consumption. The architecture is further modified to reduce power consumption and to avoid data loss by introducing a performance enhancement guaranteed (PEG) cache. The modified architecture was applied in an ATM design and compared with one using conventional RAM. The designs were implemented and evaluated in XILINX ISE 8.1i; the proposed design consumed less power in both comparisons (146 mW against 159 mW for LASIC, and 89 mW against 97 mW for the ATM design).

Keywords: STT-RAM technology, caches, PEG cache

#### 1. Introduction

As power consumption is becoming one of the most important constraints in the VLSI field, it must be reduced to increase the efficiency of a circuit. A proper balance between power, area and speed is necessary. STT-RAM is emerging as one of the best memory technologies because of characteristics such as non-volatility, excellent scalability and endurance, low power, and fast reads and writes. Spin-transfer torque (STT) writing is a technology in which an electric current is polarized by aligning the spin direction of the electrons flowing through a magnetic tunnel junction (MTJ) element. Data is written by using the spin-polarized current to change the magnetic orientation of the information storage layer in the MTJ element. The technology is of considerable interest to embedded engineers, and various papers have been published on applications of STT-RAM, including ATMs, mobile devices and other embedded applications. Its major disadvantage is the power dissipated in the peripheral circuits of STT-RAM instruction caches. In [1], the authors introduced an approach that uses a loop cache to reduce the static power dissipation in the peripheral circuits of STT-RAM instruction caches. Although it reduces power consumption, data loss can occur, which stops the operation of the processor and thus reduces the efficiency of the architecture.

In [1], a loop cache made of static RAM is placed between the processor and the L1 instruction cache, which is made of STT-RAM. The loop cache is always half the size of the instruction cache. Whenever the loop cache holds an entire loop, the L1 instruction cache is put to sleep to reduce power consumption. In this paper, data loss is avoided and power is reduced further. We implement the architecture using a performance enhancement guaranteed (PEG) cache. For STT-RAM, only the required address is activated at a particular time.

We add a predefined memory to it. The PEG cache mainly consists of two counters: a hit counter and a miss counter, which count the number of hits and misses respectively. A comparator compares the hit and miss counts, and further action is driven by the throttling signal it provides. An ATM architecture built using the PEG cache with sleepy instruction caches based on STT-RAM technology is compared with an ATM using conventional RAM in terms of power. Both time and power are reduced.

#### 2. Performance Enhancement Guaranteed Cache

Fig. 1 shows the architecture of the PEG cache. The loop aware sleep controller in [1] has three states: standby, fill and active. The PEG cache has four states: preload, active, shadow and end. Preload is the same as standby, in which data is stored. Once execution of the program starts, the cache enters the active state.

Whenever the comparator provides a throttling signal, the cache moves to the shadow state, in which data is loaded directly from main memory. When execution of the program finishes, it moves to the end state. Throttling in this way can improve performance. The loop aware sleepy instruction cache (LASIC) is modified with the PEG cache.
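The four states described above can be sketched as a small state machine. This is only an illustrative model, not the authors' hardware description: the signal names `program_started`, `throttle` and `program_done` are hypothetical, and because the text does not say whether the cache ever leaves the shadow state before the program ends, the sketch assumes it stays there.

```python
from enum import Enum, auto


class PegState(Enum):
    PRELOAD = auto()  # data is stored before execution starts
    ACTIVE = auto()   # the program is executing
    SHADOW = auto()   # throttled: data is loaded directly from main memory
    END = auto()      # program execution has finished


def next_state(state, program_started=False, throttle=False, program_done=False):
    """Return the next PEG-cache state for one step (signal names are hypothetical)."""
    if state is PegState.PRELOAD:
        return PegState.ACTIVE if program_started else PegState.PRELOAD
    if state is PegState.ACTIVE:
        if program_done:
            return PegState.END
        return PegState.SHADOW if throttle else PegState.ACTIVE
    if state is PegState.SHADOW:
        # Assumption: the cache stays in SHADOW until the program finishes.
        return PegState.END if program_done else PegState.SHADOW
    return PegState.END  # END is terminal
```

Stepping the function with `program_started=True`, then `throttle=True`, then `program_done=True` walks through preload, active, shadow and end in order.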

#### 3. Loop Aware Sleepy Instruction Caches with PEG Cache

LASIC, based on the architecture of the loop aware sleep controller, has three states. Consider an L1 instruction cache with 8-bit addresses. In the standby state, data is stored in the L1 instruction cache, i.e. an L1 write operation. In the fill state, the same data is also stored in the loop cache, i.e. an LC write operation. The controller then checks the address. When the address is less than three, the loop cache is activated and feeds data to the processor while the L1 instruction cache is put in sleep mode, i.e. LC active. When the address exceeds three, L1 is active and feeds data to the processor. Including the loop cache thus saves a considerable amount of power. Although it reduces power, data loss may occur. Therefore, we add a PEG cache to it. The main components are the processor, a mux, the L1 block, and a predefined main memory along with the PEG cache.
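The address check described above can be modelled in a few lines. The sketch below is a behavioural illustration under stated assumptions, not the authors' design: the function names and the dictionary-based memories are hypothetical, and since the text leaves the case of an address exactly equal to three unspecified, the sketch assumes it is served by L1.

```python
LOOP_LIMIT = 3  # from the text: addresses below three are served by the loop cache


def fill_loop_cache(l1, loop_limit=LOOP_LIMIT):
    """Fill state (LC write): copy the low addresses of L1 into the loop cache."""
    return {addr: word for addr, word in l1.items() if addr < loop_limit}


def select_source(address, loop_limit=LOOP_LIMIT):
    """Return (source, l1_asleep) for one instruction fetch."""
    if not 0 <= address <= 0xFF:
        raise ValueError("address must fit in 8 bits")
    if address < loop_limit:
        return ("loop_cache", True)   # LC active, L1 sleeps
    # Assumption: address == loop_limit is treated like larger addresses.
    return ("l1_cache", False)        # L1 active


def fetch(address, l1, loop_cache):
    """Feed the processor from whichever memory the controller selects."""
    source, _ = select_source(address)
    return loop_cache[address] if source == "loop_cache" else l1[address]
```

For example, after `loop_cache = fill_loop_cache(l1)`, a fetch at address 1 is served by the loop cache with L1 asleep, while a fetch at address 5 wakes L1.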



Figure 1: Architecture of PEG cache

The hit counter and miss counter count the number of hits and misses. If the data is present in the sub memory, the hit counter value increases; if not, the miss counter value increases. If the comparator finds that the miss counter value is greater than the hit counter value, it provides a throttling signal to activate the main memory, where the data is already stored. In case of data loss, main memory provides data to the processor instead of the L1 block, so the output is not corrupted. If data loss occurs in LASIC, the output is in a don't care state because the processor stops functioning. The output is simulated in ModelSim and the power comparison is made in Xilinx. The block diagram of LASIC modified with the PEG cache is shown in Fig. 2. With the PEG cache, power is reduced further compared with LASIC, and data loss is avoided.
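The counter and comparator behaviour can be illustrated with a short behavioural model. This is a sketch under assumptions rather than the authors' circuit: the class and function names are hypothetical, memories are plain dictionaries, and falling back to main memory whenever an address is missing from the sub memory is this sketch's reading of "in case of data loss".

```python
class PegMonitor:
    """Hit and miss counters with a comparator (illustrative names)."""

    def __init__(self):
        self.hits = 0
        self.misses = 0

    def record(self, address, sub_memory):
        """Count one access: a hit if the address is present in sub_memory."""
        if address in sub_memory:
            self.hits += 1
        else:
            self.misses += 1

    @property
    def throttle(self):
        """Comparator output: asserted when misses outnumber hits."""
        return self.misses > self.hits


def read(address, sub_memory, main_memory, monitor):
    """Return data for the processor, using main memory when throttled."""
    monitor.record(address, sub_memory)
    # Assumption: a missing entry is also served from main memory.
    if monitor.throttle or address not in sub_memory:
        return main_memory[address]
    return sub_memory[address]
```

Because main memory already holds the data, the processor always receives a valid word in this model, which mirrors the claim that the output is not corrupted.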



Figure 2: LASIC modified with PEG cache

#### 4. Simulation Results

The circuit was simulated in MODELSIM XE 6.3f. The inputs are a and b, along with the data, address and en pins. Data out is the output. The processor functions according to the address and data.



Figure 3: Simulation of proposed LASIC with PEG cache



Figure 4: Simulation of LASIC

#### 5. Performance Comparison

The table below compares the power of LASIC and LASIC with the PEG cache. Both the existing method and the proposed one are evaluated in XILINX ISE 8.1i.

| STRUCTURE                     | POWER (milliwatts) |
|-------------------------------|--------------------|
| LASIC                         | 159                |
| LASIC MODIFIED WITH PEG CACHE | 146                |

Table 1 shows the comparison of power in LASIC and LASIC with PEG cache.

#### 6. An ATM Architecture Based on PEG Cache

The PEG architecture is implemented in an Asynchronous Transfer Mode (ATM) architecture and its power is analyzed. Experimental results show that, by preloading a small number of the initial instructions, the proposed PEG cache can achieve the same performance as a regular cache while guaranteeing performance enhancement for the worst case. The PEG cache can improve performance through efficient reuse of cache space. PEG along with content-addressable memory (CAM) provides faster searching of data. The ATM using conventional RAM is compared with the ATM using the PEG cache in XILINX ISE 8.1i.

#### International Journal of Science and Research (IJSR) ISSN (Online): 2319-7064 Index Copernicus Value (2013): 6.14 | Impact Factor (2013): 4.438

| Power summary                     | I (mA) | P (mW) |
|-----------------------------------|--------|--------|
| Total estimated power consumption |        | 89     |
| Vccint 1.80V                      | 46     | 82     |
| Vcco33 3.30V                      | 2      | 7      |
| Clocks                            | 0      | 0      |
| Inputs                            | 7      | 13     |
| Logic                             | 20     | 36     |
| Outputs: Vcco33                   | 0      | 0      |
| Signals                           | 3      | 6      |
| Quiescent Vccint 1.80V            | 15     | 27     |
| Quiescent Vcco33 3.30V            | 2      | 7      |

Figure 5: Power summary of ATM with PEG cache
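As a sanity check on the power summary, the reported figures can be cross-checked with P = V × I and simple sums. The sketch below only re-derives numbers already given in the summary; grouping the clock, input, logic, signal and quiescent contributions under the Vccint rail is an assumption based on the fact that they add up to the Vccint figure.

```python
# Figures transcribed from the power summary: (volts, mA, mW) per supply rail.
RAILS = {
    "Vccint": (1.80, 46, 82),
    "Vcco33": (3.30, 2, 7),
}
# Contributions assumed to be drawn from Vccint, in mW (assumption of this sketch).
VCCINT_PARTS_MW = {"Clocks": 0, "Inputs": 13, "Logic": 36, "Signals": 6, "Quiescent": 27}
TOTAL_MW = 89


def rail_power_mw(volts, milliamps):
    """P = V * I; volts times milliamps gives milliwatts."""
    return volts * milliamps


def check_summary(tolerance_mw=1.0):
    """Return True if the reported figures agree within integer rounding."""
    rails_ok = all(
        abs(rail_power_mw(v, i) - p) <= tolerance_mw for v, i, p in RAILS.values()
    )
    total_ok = sum(p for _, _, p in RAILS.values()) == TOTAL_MW
    parts_ok = sum(VCCINT_PARTS_MW.values()) == RAILS["Vccint"][2]
    return rails_ok and total_ok and parts_ok
```

Under these assumptions the two rail powers (82 mW and 7 mW) add up to the 89 mW total, and each rail power matches its voltage times its current to within the rounding of the reported integers.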

| STRUCTURE          | POWER (milliwatts) |
|--------------------|--------------------|
| ATM WITH RAM       | 97                 |
| ATM WITH PEG cache | 89                 |

Table 2 shows the power comparison of ATM with conventional RAM and ATM with PEG cache.
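To put the two comparisons on a common scale, the relative power reduction can be computed from the figures in Table 1 and Table 2. The helper name `reduction_percent` is illustrative; the input values are the ones reported in the tables.

```python
def reduction_percent(baseline_mw, proposed_mw):
    """Relative power reduction of the proposed design, in percent."""
    return 100.0 * (baseline_mw - proposed_mw) / baseline_mw


lasic_saving = reduction_percent(159, 146)  # Table 1: LASIC vs LASIC with PEG cache
atm_saving = reduction_percent(97, 89)      # Table 2: ATM with RAM vs ATM with PEG cache

print(f"LASIC with PEG cache: {lasic_saving:.1f}% lower power")
print(f"ATM with PEG cache:   {atm_saving:.1f}% lower power")
```

Both comparisons work out to a reduction of roughly 8 percent relative to the respective baseline.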

#### 7. Conclusion

This paper presented an approach to reduce the static power dissipation in the peripheral circuits of STT-RAM instruction caches. Data loss is prevented by the introduction of the PEG cache, and the proposed design reduces power further. LASIC with spin logic can be used in high-speed portable devices with extended battery life. It is also used in CAM, which is ideally suited for many applications, including Ethernet address look-up, data compression, pattern recognition, cache tags, fast routing table look-up, high-bandwidth address filtering, user privileges, and security and encryption information. This is a promising technology for the near future, as it combines the power-down strategy with an improved version of loop caches and the non-volatility of STT-RAM. Simulation results showed a reduction in power from 159 mW to 146 mW for LASIC and from 97 mW to 89 mW for the ATM design, thus improving the performance of the circuit.

## References

- [1] Junwhan Ahn and Kiyoung Choi, "LASIC based on STT RAM technology," IEEE Transactions on VLSI, vol. 22, May 2014.
- [2] X. Dong, X. Wu, G. Sun, Y. Xie, H. H. Li, and Y. Chen, "Circuit and microarchitecture evaluation of 3D stacking magnetic RAM (MRAM) as a universal memory replacement," in *Proc. 45th Design Autom. Conf.*, Jun. 2008, pp. 554–559.
- [3] E. Chen, D. Apalkov, Z. Diao, A. Driskill-Smith, D. Druist, D. Lottis, V. Nikitin, X. Tang, S. Watts, S. Wang, S. A. Wolf, A. W. Ghosh, J. W. Lu, S. J. Poon, M. Stan, W. H. Butler, S. Gupta, C. K. A. Mewes, T. Mewes, and P. B. Visscher, "Advances and future prospects of spin transfer torque random access memory," *IEEE Trans. Magn.*, vol. 46, no. 6, pp. 1873–1878, Jun. 2010.

- [4] X. Guo, E. Ipek, and T. Soyata, "Resistive computation: Avoiding the power wall with low-leakage, STT-MRAM based computing," in *Proc. 37th Int. Symp. Comput. Archit.*, Jun. 2010, pp. 371–382.
- [5] A. Driskill-Smith, D. Apalkov, V. Nikitin, X. Tang, S. Watts, D. Lottis, K. Moon, A. Khvalkovskiy, R. Kawakami, X. Luo, A. Ong, E. Chen, and M. Krounbi, "Latest advances and roadmap for in-plane and perpendicular STT-RAM," in *Proc. 3rd IEEE Int. Memory Workshop*, May 2011, pp. 1–3.
- [6] W. Xu, H. Sun, X. Wang, Y. Chen, and T. Zhang, "Design of last-level on-chip cache using spin-torque transfer RAM (STT RAM)," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 19, no. 3, pp. 483–493, Mar. 2011.
- [7] K. Flautner, N. S. Kim, S. Martin, D. Blaauw, and T. Mudge, "Drowsy caches: Simple techniques for reducing leakage power," in *Proc. 29th Ann. Int. Symp. Comput. Archit.*, May 2002.
- [8] S. Kaxiras, Z. Hu, and M. Martonosi, "Cache decay: Exploiting generational behavior to reduce cache leakage power," in *Proc. 28th Ann. Int. Symp. Comput. Archit.*, May 2001, pp. 240–251.
- [9] A. Jog, A. K. Mishra, C. Xu, Y. Xie, V. Narayanan, R. Iyer, and C. R. Das, "Cache revive: Architecting volatile STT-RAM caches for enhanced performance in CMPs," in *Proc. 49th Design Autom. Conf.*, Jun. 2012, pp. 243–252.
- [10] N. Muralimanohar, R. Balasubramonian, and N. P. Jouppi, "CACTI 6.0: A tool to model large caches," HP Labs., Palo Alto, CA, USA, Tech. Rep. HPL-2009-85, 2009.
- [11] J. L. Henning, "SPEC CPU2000: Measuring CPU performance in the new millennium," *IEEE Comput.*, vol. 33, no. 7, pp. 28–35, Jul. 2000.
- [12] A. KleinOsowski and D. J. Lilja, "MinneSPEC: A new SPEC benchmark workload for simulation-based computer architecture research," *IEEE Comput. Archit. Lett.*, vol. 1, no. 1, pp. 1–7, Jan. 2002.

### **Author Profile**



**Neethu Anna Sabu** received the B.Tech degree in Electronics and Communication Engineering from Kerala University, at Shahul Hameed College of Engineering and Technology, in 2012. She is now pursuing her M.Tech degree in VLSI and Embedded Systems under Mahatma Gandhi University at SCMS School of Engineering and Technology, Cochin.