Code compression

Academic year: 2022

Chapter 2-2: CPUs

Soo-Ik Chae

High Performance Embedded Computing 1

Topics

- Memory systems.
  - Memory component models.
  - Caches and alternatives.
- Code compression.

Generic memory block

[Figure: generic memory block.]

Simple memory model

- Core array is n rows × m columns.
- Total area A = A_r + A_x + A_p + A_c.
- Row decoder area A_r = a_r · n.
- Core area A_x = a_x · m · n.
- Precharge circuit area A_p = a_p · m.
- Column decoder area A_c = a_c · m.
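The area model above can be turned into a tiny sketch. The coefficients a_r, a_x, a_p, a_c below are made-up illustrative values, not technology data.

```python
# Sketch of the simple memory area model: A = A_r + A_x + A_p + A_c
# for an n-row x m-column core array. Coefficients are hypothetical.

def memory_area(n, m, a_r=2.0, a_x=1.0, a_p=1.5, a_c=1.5):
    """Return (total_area, components) for the simple memory area model."""
    A_r = a_r * n        # row decoder grows with the number of rows
    A_x = a_x * m * n    # core array grows with the number of cells
    A_p = a_p * m        # precharge circuits, one per column
    A_c = a_c * m        # column decoder grows with the number of columns
    return A_r + A_x + A_p + A_c, (A_r, A_x, A_p, A_c)

total, parts = memory_area(n=256, m=64)
```

Note how the core term a_x · m · n dominates for any realistically sized array.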

Simple energy and delay models

- Total delay Δ = Δ_setup + Δ_r + Δ_x + Δ_bit + Δ_c.
  - Setup delay is for the precharge circuitry.
- Total energy E = E_D + E_S.
  - Static energy component E_S is a technology parameter.
  - Dynamic energy E_D = E_r + E_x + E_p + E_c.
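A minimal sketch of the delay and energy sums above; all component values are arbitrary illustrative numbers, not technology data.

```python
# Sketch of the slide's delay and energy models.

def total_delay(d_setup, d_r, d_x, d_bit, d_c):
    # Delta = Delta_setup + Delta_r + Delta_x + Delta_bit + Delta_c
    return d_setup + d_r + d_x + d_bit + d_c

def total_energy(e_r, e_x, e_p, e_c, e_static):
    # E = E_D + E_S, with dynamic energy E_D = E_r + E_x + E_p + E_c
    e_dynamic = e_r + e_x + e_p + e_c
    return e_dynamic + e_static

delay = total_delay(0.2, 0.5, 1.0, 0.8, 0.5)    # arbitrary ns values
energy = total_energy(1.0, 4.0, 0.5, 0.5, 2.0)  # arbitrary pJ values
```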

Multiport memories

[Figure: multiport memory structure; delay vs. memory size and number of ports.]

Kamble and Ghose cache power model

- Capacity D = S · L · m [bytes].
- Total tag storage = S · T · m [bits].

[Figures: Kamble and Ghose cache power model equations.]
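Reading S as the number of sets, L as the line size in bytes, m as the associativity, and T as the tag bits per line (the slide does not spell these out, so this reading is an assumption), the capacity formulas can be sketched as:

```python
# Cache capacity sketch: D = S * L * m bytes, tag storage = S * T * m bits.
# S = sets, L = line size in bytes, m = associativity, T = tag bits per
# line (assumed interpretations, parameter values are examples only).

def cache_capacity_bytes(S, L, m):
    return S * L * m          # D = S * L * m [bytes]

def total_tag_bits(S, T, m):
    return S * T * m          # total tag storage [bits]

# Example: 128 sets, 32-byte lines, 4-way set-associative.
D = cache_capacity_bytes(128, 32, 4)
tags = total_tag_bits(128, 18, 4)  # assuming 18 tag bits per line
```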

Shiue and Chakrabarti cache energy model

- add_bs: number of transitions on the address bus per instruction.
- data_bs: number of transitions on the data bus per instruction.
- word_line_size: number of memory cells on a word line.
- bit_line_size: number of memory cells on a bit line.
- Em: energy consumption of a main memory access.
- α, β, γ: technology parameters.

Shiue/Chakrabarti, cont’d.

[Figure: E_Cell energy equations.]

Register files

- First stage in the memory hierarchy.
- When too many values are live, some values must be spilled to main memory and read back later.
  - Spills cost time and energy.
- Register file parameters:
  - Number of words.
  - Number of ports.

Performance and energy vs. register file size. [Weh01] © 2001 IEEE

Caches

- Cache design has received a lot of attention in general-purpose computer design.
- Most of the lessons apply to embedded computers as well.
- Caches have a sweet spot: neither too small nor too large.

Cache size vs. energy. [Li98] © 1998 IEEE

Cache parameters

- Cache size: larger caches hold more data, burn more energy, and take area away from other functions.
- Number of sets: more independent references, more locations mapped onto each line.
- Cache line length: longer lines give more prefetching bandwidth but higher energy consumption.

Wolfe/Lam classification of program behavior in caches

- Self-temporal reuse: the same array element is accessed in different loop iterations.
- Self-spatial reuse: the same cache line is accessed in different loop iterations.
- Group-temporal reuse: different parts of the program access the same array element.
- Group-spatial reuse: different parts of the program access the same cache line.
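A toy sketch of the four reuse categories as access traces, assuming (hypothetically) four array elements per cache line:

```python
# Toy illustration of the four reuse categories. Assume, hypothetically,
# 4 array elements per cache line; line(i) maps an element index to the
# index of the cache line it lives on.

N = 8
line = lambda i: i // 4

# Self-temporal reuse: the same element a[0] is read in every iteration.
self_temporal = [0 for _ in range(N)]

# Self-spatial reuse: consecutive elements a[i] share cache lines.
self_spatial = [line(i) for i in range(N)]

# Group-temporal reuse: two program points both touch element a[3].
group_temporal = (3, 3)

# Group-spatial reuse: two program points touch different elements
# (a[0] and a[1]) that live on the same cache line.
group_spatial = (line(0), line(1))
```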

Multilevel cache optimization

- Gordon-Ross et al. developed a method to optimize multilevel cache hierarchies, adjusting the cache parameters in order:
  - Cache size.
  - Line size.
  - Associativity.
- Choose the cache size for each level, then the line size for each level, and finally the associativity for each level.
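The ordered search can be sketched for a single level. The cost function below is a made-up stand-in; a real flow would simulate the workload to obtain miss rates and energy.

```python
# Sketch of the ordered parameter search for one cache level: fix size
# first, then line size, then associativity. toy_cost is hypothetical.

def toy_cost(size, line, assoc):
    # Made-up trade-off: larger/more associative caches miss less
    # (smaller penalty term) but spend more energy.
    miss_penalty = 1000.0 / (size * assoc)
    energy = 0.001 * size + 0.1 * line + 0.5 * assoc
    return miss_penalty + energy

def tune_cache():
    size = min([1024, 2048, 4096], key=lambda s: toy_cost(s, 32, 1))
    line = min([16, 32, 64], key=lambda l: toy_cost(size, l, 1))
    assoc = min([1, 2, 4], key=lambda a: toy_cost(size, line, a))
    return size, line, assoc
```

Each parameter is fixed greedily before the next is explored, which is what makes the ordering matter.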

Scratch pad memory

- Scratch pad is managed by software, not hardware.
  - Provides predictable access time.
  - Requires values to be allocated.
- It is a fixed part of the processor’s memory space.
- Use standard read/write instructions to access the scratch pad.
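Because the scratch pad is software-managed, the compiler (or programmer) must decide which values live there. One common heuristic, shown here as a sketch rather than any specific published algorithm, greedily ranks candidates by accesses per byte:

```python
# Hypothetical greedy scratch-pad allocator: rank candidate values by
# access density (accesses per byte) and fill the fixed capacity greedily.

def allocate_scratchpad(candidates, capacity):
    """candidates: list of (name, size_bytes, access_count) tuples."""
    ranked = sorted(candidates, key=lambda c: c[2] / c[1], reverse=True)
    chosen, used = [], 0
    for name, size, _ in ranked:
        if used + size <= capacity:   # skip anything that no longer fits
            chosen.append(name)
            used += size
    return chosen

# Example candidates (names and numbers are invented for illustration).
chosen = allocate_scratchpad(
    [("coeffs", 256, 10_000), ("frame", 4096, 3_000), ("lut", 512, 8_000)],
    capacity=1024)
```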


Code compression

- Extreme version of instruction encoding:
  - Use variable-bit-length instructions.
  - Generate encodings using compression algorithms.
- Generally takes longer to decode.
- Can result in performance, energy, and code size improvements.
- IBM CodePack (PowerPC) used Huffman encoding.

Terms

- Compression ratio:
  - Compressed code size / uncompressed code size × 100%.
  - Must take into account all overheads.
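The definition reads directly as code; the sizes passed in must already include all overheads (branch tables, padding between blocks):

```python
# Compression ratio as defined above: lower is better, and a ratio
# above 100% means the "compressed" image actually grew.

def compression_ratio(compressed_size, uncompressed_size):
    """Compressed/uncompressed code size, as a percentage."""
    return 100.0 * compressed_size / uncompressed_size

ratio = compression_ratio(6_000, 10_000)  # 6 KB image from 10 KB of code
```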

Wolfe/Chanin approach / pp

Obj t d i f d t

„ Object code is fed to lossless compression algorithm

Source code

algorithm.

‰ Wolfe/Chanin used Huffman’s algorithm.

compiler

„ Compressed object

code becomes program image

Object code

image.

„ Code is decompressed on the fly during

compressor

on-the-fly during

execution. Compressedobject code

(15)

Wolfe/Chanin execution

- Instructions are decompressed when read from main memory.
  - Data is not compressed or decompressed.
- Cache holds uncompressed instructions.
  - Longer latency for instruction fetch.
- CPU does not require significant modifications.

[Figure: main memory → decompressor → cache → CPU.]

Huffman coding

- Input stream is a sequence of symbols.
- Each symbol’s probability of occurrence is known.
- Construct a binary tree of probabilities from the bottom up.
  - Path from root to symbol gives the code for that symbol.
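A minimal Huffman coder along the lines described above: a priority queue merges the two least-probable trees bottom-up, and each code is read off the root-to-leaf path.

```python
# Minimal Huffman-code construction sketch.
import heapq

def huffman_codes(probs):
    """probs: dict symbol -> probability. Returns dict symbol -> bit string."""
    # Heap entries are (probability, tie_breaker, tree); a tree is either
    # a bare symbol or a (left, right) pair. The tie_breaker keeps tuple
    # comparison away from the trees themselves.
    heap = [(p, i, sym) for i, (sym, p) in enumerate(sorted(probs.items()))]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        p1, _, t1 = heapq.heappop(heap)   # two least-probable subtrees
        p2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, tie, (t1, t2)))
        tie += 1
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")   # left edge emits a 0 bit
            walk(tree[1], prefix + "1")   # right edge emits a 1 bit
        else:
            codes[tree] = prefix or "0"   # single-symbol corner case
    walk(heap[0][2], "")
    return codes

codes = huffman_codes({"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125})
```

With these probabilities the most frequent symbol gets a 1-bit code and the rarest get 3-bit codes, and no code is a prefix of another.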

Compressed vs. uncompressed code

- Code must be uncompressed from many different starting points during branches.
- Code compression algorithms are designed to decode from the start of a stream.
- Compressed code is organized into blocks.
  - Uncompress at the start of a block.
- Unused bits between blocks constitute overhead (due to branches).

[Figure: uncompressed vs. compressed layout of add r1, r2, r3; mov r1, a; bne r1, foo.]

Block structure and compression

- Trade-off:
  - Compression algorithms work best on long blocks.
  - Program branching works best with short blocks.
- Labels in the program move during compression.
- Two approaches:
  - Wolfe and Chanin used a branch table to translate branches during execution (adds code size).
  - Lefurgy et al. patched the compressed code to refer branches to compressed locations (branch patching).

Compression ratio vs. block size. [Lek99b] © 1999 IEEE

Pre-cache compression

- Decompress as instructions come out of the cache.
- One instruction must be decompressed many times.
- Program has a smaller cache footprint.

Encoding algorithms

- Data compression has developed a large number of compression algorithms.
- These algorithms were designed for different constraints:
  - Large text files.
  - No real-time or power constraints.
- Evaluate existing algorithms under the requirements of code compression; develop new algorithms.

Energy savings evaluation

- Yoshida et al. used dictionary-based encoding.
- Power reduction ratio parameters:
  - N: number of instructions in the original program.
  - m: bit width of those instructions.
  - n: number of compressed instructions.
  - k: ratio of on-chip/off-chip memory power dissipation.

Arithmetic codingg

H ff di

„ Huffman coding maps symbols onto the

integer number line integer number line.

„ Arithmetic coding maps symbols onto the real symbols onto the real number line.

‰ Can handle arbitrarily fi di ti ti i b l fine distinctions in symbol probabilities.

„ Table-based method

?

„ Table based method allows fixed-point arithmetic to be used.

High Performance Embedded Computing

37
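The interval-narrowing idea can be sketched with floats; a real coder, such as the table-based variant mentioned above, would use fixed-point arithmetic instead.

```python
# Arithmetic-coding sketch: each symbol narrows the current [low, high)
# interval in proportion to its probability. Floats are for clarity only.

def arith_encode(symbols, probs):
    """Return the final [low, high) interval for the symbol sequence."""
    # Cumulative probability ranges per symbol, in fixed (sorted) order.
    cum, start = {}, 0.0
    for sym in sorted(probs):
        cum[sym] = (start, start + probs[sym])
        start += probs[sym]
    low, high = 0.0, 1.0
    for sym in symbols:
        width = high - low
        s_lo, s_hi = cum[sym]
        low, high = low + width * s_lo, low + width * s_hi
    return low, high

low, high = arith_encode("aab", {"a": 0.5, "b": 0.5})
```

Any number inside the final interval identifies the whole sequence; more probable sequences end with wider intervals and therefore need fewer bits.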

Code and data compression

- Unlike (non-modifiable) code, data must be compressed and decompressed dynamically.
- Can substantially reduce cache footprints.
- Requires different trade-offs.

Lempel-Ziv algorithm

- Dictionary-based method.
- Decoder builds the dictionary during the decompression process.
- LZW variant uses a fixed-size buffer.

[Figure: source text → coder + dictionary → compressed text → coder + dictionary → uncompressed source.]
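A minimal LZW sketch matching the diagram: the decoder rebuilds the same dictionary the encoder built, so only integer codes travel as the compressed text. (This toy version lets the dictionary grow without bound; the fixed-size-buffer variant mentioned above would cap or recycle it.)

```python
# Minimal LZW compress/decompress sketch with an unbounded dictionary.

def lzw_compress(text):
    dictionary = {chr(i): i for i in range(256)}  # seed with single bytes
    w, out = "", []
    for ch in text:
        if w + ch in dictionary:
            w += ch                      # extend the current match
        else:
            out.append(dictionary[w])    # emit code for longest match
            dictionary[w + ch] = len(dictionary)  # learn a new phrase
            w = ch
    if w:
        out.append(dictionary[w])
    return out

def lzw_decompress(codes):
    dictionary = {i: chr(i) for i in range(256)}
    w = dictionary[codes[0]]
    out = [w]
    for code in codes[1:]:
        entry = dictionary.get(code, w + w[0])    # handles the KwKwK case
        out.append(entry)
        dictionary[len(dictionary)] = w + entry[0]  # mirror the encoder
        w = entry
    return "".join(out)

codes = lzw_compress("abababab")
```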
