
CAP Laboratory, SNU

Schedule

1. Introduction

2. System Modeling Language: SystemC *

3. HW/SW Cosimulation *

4. C-based Design *

5. Data-flow Model and SW Synthesis

6. HW and Interface Synthesis (Midterm)

7. Models of Computation

8. Model based Design of Embedded SW

9. Design Space Exploration (Final Exam)

(Term Project)

Reference

PeaCE Approach

• Hyunuk Jung, Kangnyoung Lee, and Soonhoi Ha, “Efficient Hardware Controller Synthesis for Synchronous Data Flow in System Level Design,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 10, pp. 423-428, August 2002.

• Hyunuk Jung, Hoeseok Yang, and Soonhoi Ha, “Optimized RTL Code Generation from Coarse-Grain Dataflow Specification for Fast HW/SW Cosynthesis,” Journal of VLSI Signal Processing (published online 10 June 2007).

Hardware Synthesis Problem

Automatic Hardware Synthesis from Coarse-Grained Dataflow Specification in System Level Design
• A node represents a coarse-grain computation block such as an FIR filter or a DCT.
• A node has complex properties such as data sample rates, I/O timings, data types, and internal states.
• A central controller should be generated automatically to control these complex coarse-grain HW library blocks and registers.

[Figure: a DFG with nodes B, C, D and the block libraries are combined into hardware in which a generated controller drives blocks B, C, D, with clk and rst inputs.]

Design Size & Abstraction Level

1970s, several hundred transistors
• Transistor and gate level design

1980s
• Register Transfer Level (RTL) design
• Hardware Description Language (HDL)

1990s
• High-level synthesis (or behavioral synthesis)
• Behavioral HDL, imperative programming languages (C, C++)

Today, several million gates
• Exponential increase in transistor density
• HW/SW Co-design (System Level Design)
• We need higher-level tools for system design and hardware description

Conventional High-level Synthesis

Hardware synthesis
• Conventional architecture and logic synthesis techniques are used

[Figure: design flow with the stages Functional Spec, HDL Coding, Simulation, Architecture Synthesis, Logic Synthesis, Layout Synthesis, Back Annotation, and Layout, annotated with the abstraction levels Behavioral-Level, Register-Transfer-Level, and Gate-Level. One stage is marked "Focus!".]

Architecture Synthesis Problem

Tasks: scheduling, resource binding, data path synthesis, control logic design, …

[Figure: Architecture Synthesis takes a behavioral model (CDFG), constraints (timing, area, performance, resource binding), and resources (library + module generator) and produces an RTL description. Labels: "primitives: area, delay given" and "area, delay estimated".]

New Trend: C-based design

From C/C++ to Hardware
• Mentor Graphics (www.mentor.com): “Catapult C Synthesis”
• Forte Design Systems (www.forteds.com): “Cynthesizer”
• Synfora (www.synfora.com): “PICO Express”
• Y Explorations Inc. (www.yxi.com): “eXCite”
• Celoxica (www.celoxica.com): “Agility Compiler”
  − RTL level compiler

System-level HW Synthesis

Higher Level Hardware Synthesis
• Increasing need for a design methodology at a higher abstraction level
• Growing complexity, fast design turn-around time
• Easy to modify and maintain

Automatic code generation from data flow graph
• SDF semantics should be preserved: “refinement”
• The kernel code of a block is already optimized in the library.
• Determine the schedule and resource allocation.
• The controller is generated according to the scheduled sequence and the resource mapping.

Fundamental Question
• Can we generate HDL code whose synthesized area and performance are similar to those of manually optimized code?

Hardware Synthesis Strategies

[Figure: synthesis paths starting from a partitioned graph: behavioral-level HDL followed by behavioral-level synthesis (high-level synthesis); RT-level HDL followed by logic synthesis; synthesizable C through C-to-HDL/HW; cycle-based C through C-to-HW. The paths are marked "Current PeaCE" and "Future".]

System Design Flow in PeaCE

[Figure: PeaCE design flow. The Dataflow Specification and the Architecture Specification, together with the Node-PE Performance DB, feed Partitioning/Scheduling. This step yields a SW subgraph with its SW schedule and a HW subgraph with HW schedule info. SW C code generation and HW VHDL code generation then produce C code and VHDL code for Cosimulation/Cosynthesis.]

HW/SW Interface

Hardware Synthesis for HW/SW Cosynthesis
• After HW/SW partitioning of an initial dataflow specification, a partitioned subgraph mapped to hardware is automatically generated.
• A partitioned subgraph has interfacing blocks such as SND (send) and RCV (receive) blocks for communication.
  − These interfacing blocks have internal buffers and shared memory access logic.

[Figure: an initial dataflow graph with nodes A, B, C, D, E and multi-rate arcs (rates 1 and 4). The part B, C is mapped to hardware and becomes the hardware subgraph RCV1, B, C, SND1.]

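HW/SW Cosynthesis

[Figure: target architecture. A uProcessor running SW tasks on top of an OS/device driver and a Memory are connected to the AHB bus; the synthesized HW (VHDL domain) is attached to the bus through a wrapper. Blocks labeled S and R sit on the HW/SW and SW/SW boundaries.]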

GRAPE Approach

GRAPE (Graphical Rapid Prototyping Environment) is a HW/SW codesign environment for the functional emulation of DSP systems.

Using cyclo-static dataflow specification
• Sample rates change periodically

Distributed control
• Handshaking protocol between blocks
• FIFO buffers
• No central controller

The generated hardware implementation has a one-to-one correspondence to the dataflow specification
• Simple architecture generation

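GRAPE Standard Interface

[Figure: standard interfaces of the three block kinds. Receive node (R): data, strb, rdy on one side and data, wr, wr_ok on the other. Buffer (B): data, wr, wr_ok on one side and data, rd_ok, rd on the other. User task node (U): data, rd_ok, rd on one side and data, wr, wr_ok on the other. An example network connects receive nodes R1, R2, buffers B1 to B6, user task nodes U1 to U4, and send node S1.]

Asynchronous communication between all blocks using a handshaking protocol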

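Meyr’s Approach

Using Synchronous Dataflow

Serialized I/O of multi-rate port

Fully static I/O timing analysis

[Figure: a synchronous dataflow graph (IN, M1, M2, M3, OUT, with signals SIG1, SIG2, SIG3) and the corresponding RTL target architecture, in which interface blocks IF1, IF2, IF3 connect IN, M1, M2, M3, and OUT. The interface logic includes pattern adjust, initial values, and shimming registers, plus stall generation, reset generation, and the clock.]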

Ptolemy Approach: VHDL domain

Sequential VHDL code generation
• For simulation
• The entire application is described in a single process using only variables.

Structural VHDL code generation
• For synthesis
• Individual firings (or invocations) of a node are instantiated as separate hardware resources
  − Fully parallel HW architecture for multi-rate specification

[Figure: for an arc on which A produces 4 samples and B consumes 1, the structural code instantiates one A block and four B blocks.]
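The number of B instances in the fully parallel architecture follows from the SDF balance condition on the arc: the producer's firing count times its production rate equals the consumer's firing count times its consumption rate. A minimal Python sketch of that arithmetic for a single arc; `repetitions` is a hypothetical helper name and is not part of Ptolemy:

```python
from math import gcd

def repetitions(produce, consume):
    """Smallest firing counts (q_src, q_dst) with q_src * produce == q_dst * consume."""
    g = gcd(produce, consume)
    return consume // g, produce // g

# Arc from the slide: A produces 4 samples per firing, B consumes 1.
q_a, q_b = repetitions(produce=4, consume=1)
print(q_a, q_b)   # firings of A and B in one iteration
```

With one hardware resource per firing, `q_b` is also the number of B blocks instantiated, which is why the area grows quickly for large rate ratios.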

Limitations of Previous Approaches

GRAPE
• Distributed control using handshaking protocol
• Synthesis problem is simple
• Only for rapid prototyping

Meyr’s
• Supports only static I/O timings
• Resource sharing is not considered

Ptolemy
• Impractically large area overhead in case of multi-rate specification

Comparison among Approaches

Approaches                          Ptolemy       Meyr’s        GRAPE                PeaCE
Implementation of multi-rate spec.  Parallel      Sequential    Sequential           Parallel/Sequential/Hybrid
Resource allocation                 Multiple      Single        Single               Multiple/Single/Shared
Inter-block communication           Synchronous   Synchronous   Asynchronous (FIFO)  Synchronous
Block control                       Centralized   Centralized   Distributed          Centralized
Block exec. time                    Fixed         Fixed         Variable             Variable

Block Libraries

Types of block implementations
• A : Combinational logic
• B : Single-cycle sequential logic
• C : Multi-cycle sequential logic with fixed execution time
• D : Multi-cycle sequential logic with variable execution time

Timing model of HW block
• Execution time of block
• I/O of multi-rate block

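Block Types & Control Signals

Type A : combinational logic
Type B : single-cycle sequential logic
Type C : multi-cycle sequential logic with fixed execution time
Type D : multi-cycle sequential logic with variable execution time

[Figure: chain RCV, A, B, C, D, SND with the control signals of each block. Signals shown: state_update signal, clock, and reset for the single-cycle sequential block; start signal, clock, and reset for the fixed multi-cycle block; start signal, done signal, clock, and reset for the variable multi-cycle block; enable signals en_a, en_b, en_c, en_d.]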

Type A : Combinational logic

Execution Time = propagation delay / clock period (cycles)

[Figure: an ADDER block with inputs A and B and output C; the execution time spans from the inputs to the output.]
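The execution-time formula can be evaluated directly. The sketch below is only an illustration: the picosecond units, the function names, and the rounding up to whole cycles (for the case where the result is latched at a clock edge) are assumptions, not something the slides specify.

```python
from fractions import Fraction
from math import ceil

def execution_time(delay_ps, clock_ps):
    """Execution time in cycles as the exact ratio delay / clock period."""
    return Fraction(delay_ps, clock_ps)

def whole_cycles(delay_ps, clock_ps):
    """Assumed rounding: cycles needed if the output is registered."""
    return ceil(execution_time(delay_ps, clock_ps))

# Hypothetical adder with 7.5 ns propagation delay on a 10 ns clock.
adder_time = execution_time(7_500, 10_000)
print(adder_time, whole_cycles(7_500, 10_000))
```

Keeping the exact fraction lets several combinational blocks be chained and their delays summed before rounding.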

VHDL Star with States

Type B : Single-cycle sequential logic

The logic is separated into combinational logic and a state register and is implemented as a Mealy machine.

Execution Time = propagation delay / clock period (cycles)

[Figure: an Accumulator between A and C, implemented as a Mealy-type state machine: an adder plus a state register that is updated by the state update signal.]

Type C : Multi-cycle sequential logic with fixed execution time

Multi-cycle logic (fixed)

The number of cycles is fixed.

Clock and reset signals are needed.

The controller should provide the start and output update signals.

Execution time = specified number of cycles

[Figure: an FIR filter between A and C with clock, reset, and start inputs, and an output register latched by the output update signal.]

Type D : Multi-cycle sequential logic with variable execution time

Multi-cycle logic (variable)

The number of cycles varies at run time.

Clock and reset signals are needed.

The controller should provide the start and output update signals.

A done signal should be generated by the library block; the controller uses it to determine when the block finishes.

[Figure: a Divider between A and C with clock, reset, and start inputs, a done signal output, and an output register latched by the output update signal.]

Execution time = number of cycles from the start signal to the done signal (known only at run time)


Timing Model of HW Library Block

Strict Execution
• A block can start its execution after all its inputs are valid and finishes its execution after all its outputs are valid.

[Figure: a HW block with inputs and outputs; the execution time spans from valid inputs to valid outputs.]

Timing Model of HW Library Block

Start time = 3, End time = 8

Execution time = End time – Start time + 1 = 6 (cycles)

[Figure: timing diagram over clock cycles 2 to 10 showing the clock, start signal, counter, input valid timing, output latch signal, and output valid timing; the execution time covers cycles 3 to 8.]

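Timing Model of Multi-rate Block

Only parallel I/O of a multi-rate block is supported.

An FRDF implementation makes it possible to serialize the I/O operation.

[Figure: serial I/O versus parallel I/O for a multi-rate block A with input I and outputs O1 and O2. With parallel I/O, O1 and O2 become valid together; with serial I/O, they are produced one after another over time.]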

Controller Synthesis

Issue 1: Solving non-deterministic timing of I/O
• Communication with the outside of the hardware module
  − HW/SW Interface
• Communication between blocks inside the hardware module
  − Blocks with variable execution time

Issue 2: Supporting various schedules
• Looped scheduling
• Resource sharing
• Buffer management

Communication between Modules

The types of communication schemes
• Synchronous communication
  − Communication timing is predetermined.
  − Drawback: tasks must be scheduled assuming the worst-case execution time.
• Batch communication
  − Synchronous communication can be emulated with buffers in an asynchronous interface.
  − Drawback: it cannot be applied to a DFG with global feedback.
• Asynchronous communication
  − Communication timing varies at run time.
• There are many cases in which an asynchronous communication scheme is an efficient solution or the only one.
• Asynchronous communication with the outside is not considered in the previous approaches except GRAPE.

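Basic Idea

Counter-based solution
• Simple and intuitive

[Figure: hardware subgraph with RCV, blocks A (combinational logic), B, C, D, a state register, and SND; block execution times 10, 30, 20, 20. A counter (count : 60) is enabled by the valid signal (v) of the receive node; its zero signal (z) is the enable signal of the send buffer and the state register.]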

Multiple RCV nodes

Main goal
• Obtaining the earliest time at which the output is ready, regardless of the order in which the inputs arrive

Valid timing equation of send node

$$ VT = \max_i \left( RT_i + D_i \right) $$

VT : valid timing

RT_i : receive timing of the i-th receive node

D_i : critical path length from the i-th receive node

[Figure: subgraph with receive nodes RCV1, RCV2, RCV3, blocks A, B, C, D (execution times 30, 10, 20, 20), and send node SND; D1 = 60, D2 = 40, D3 = 50.]
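The valid timing of the send node is a one-line computation once the receive timings are known. A minimal Python sketch using the critical path lengths from the slide (D1 = 60, D2 = 40, D3 = 50); the receive timings and the function name are made up for illustration:

```python
def valid_timing(receive_times, path_lengths):
    """VT = max over receive nodes of (RT_i + D_i)."""
    return max(receive_times[n] + path_lengths[n] for n in path_lengths)

D = {"RCV1": 60, "RCV2": 40, "RCV3": 50}    # from the slide
RT = {"RCV1": 0, "RCV2": 30, "RCV3": 5}     # hypothetical arrival cycles

vt = valid_timing(RT, D)
print(vt)   # RCV2 arrives late enough to dominate: 30 + 40
```

Note that the node with the longest critical path does not necessarily determine VT; a late arrival on a short path can.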

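Cascaded Counter: Idea

[Figure: the subgraph RCV1, RCV2, RCV3, A, B, C, D, SND with D1 = 60, D2 = 40, D3 = 50 (critical path length computation). On a time line of remaining path length, RCV1 sits at 60, RCV3 at 50, RCV2 at 40, and SND at 0. The cascaded counter chains three counters with counts 10, 10, and 40, gated by the valid signals (v) of RCV1, RCV3, and RCV2 respectively and linked by their zero signals (z). The zero signal of the last counter is the enable signal for the send buffer and delay register update and the clear signal for the valid and zero registers.]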
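The counter values in the figure (10, 10, 40) are the differences between consecutive critical path lengths sorted in decreasing order. The Python sketch below derives them and checks that chaining the counters reproduces VT = max(RT_i + D_i). The stage rule used here (a stage starts counting once the previous stage has reached zero and its own receive node is valid) is an assumption read off the figure, and the receive timings are hypothetical.

```python
def cascade_stages(path_lengths):
    """Sort receive nodes by decreasing D and return (node, count) per stage."""
    ordered = sorted(path_lengths.items(), key=lambda kv: kv[1], reverse=True)
    stages = []
    for k, (name, d) in enumerate(ordered):
        next_d = ordered[k + 1][1] if k + 1 < len(ordered) else 0
        stages.append((name, d - next_d))
    return stages

def send_time(stages, receive_times):
    """Cycle at which the last stage reaches zero (receive times >= 0)."""
    t = 0
    for name, count in stages:
        # wait for the previous stage and for this node's valid signal
        t = max(t, receive_times[name]) + count
    return t

D = {"RCV1": 60, "RCV2": 40, "RCV3": 50}    # from the slide
RT = {"RCV1": 0, "RCV2": 30, "RCV3": 5}     # hypothetical arrival cycles

stages = cascade_stages(D)
t_send = send_time(stages, RT)
print(stages, t_send)
```

Because the counts of the stages from node i onward sum to D_i, the cascade yields the same result as evaluating the max directly, without needing a comparator per receive node.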

With Multiple Send Nodes

[Figure: the same subgraph with two send nodes, SND1 and SND2. Critical path lengths: D1,1 = 30, D1,2 = 60, D2,1 = 10, D2,2 = 40, D3,2 = 50. The cascaded counter (counts 10, 10, 40, gated by the valid signals of RCV1, RCV3, RCV2) generates the signal for SND2: the enable signal for the send buffer and delay register update and the clear signal for the valid and zero registers. An added comparator (compare : 30) generates the signal for SND1.]

Delay elements

Delay elements may exist; they correspond to data registers in the hardware implementation.

[Figure: blocks A, B, C, D with a delay register on an arc; the central controller (clock, reset) drives the enable signal of the delay register.]

With Delay Registers

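[Figure: subgraph RCV1, A, B, C, SND1 with execution times 10, 20, 30 and a delay element D1, drawn on a time line marked 40, 20, 0. A counter (count : 40) started by the valid signal (v) of RCV1 and a comparator (compare : 20) generate the enable signals for SND1 and for the delay register D1.]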

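Nodes with Variable Execution Time

The cascaded counter controller provides a clean solution for this kind of node.

[Figure: in the chain A, B, C, node B is an asynchronous node that takes a non-deterministic number of time units to execute. The graph is modified so that B is handled like a send/receive pair: the controller issues start toward B as for an SND node and waits for done as for an RCV node; B also receives the clock.]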

Equivalent FSM Controller Implementation

[Figure: central controller implemented as an FSM with inputs rst, clk, and done or rcv signals (check points), internal CounterValue and IterationBound, and outputs start signals and RegisterEnable signals.]

Currently, we implement an FSM controller equivalent to the cascaded counter in the VHDL domain of PeaCE.

This implementation uses only one incrementing counter with multiple check logic blocks.

Equivalent FSM Controller Implementation

Example Code

    if rst = '1' then
        Counter <= 0;
    elsif rising_edge(clk) then
        -- Hold at a check point until its done/rcv signal is asserted.
        if Counter = CheckValue0 and CheckSig(0) = '0' then
            Counter <= Counter;       -- hold value
        elsif Counter = CheckValue1 and CheckSig(1) = '0' then
            Counter <= Counter;       -- hold value
        elsif Counter = LastValue then
            Counter <= 0;             -- initialize
        else
            Counter <= Counter + 1;   -- counting
        end if;
        -- Flag the end of one schedule iteration.
        if Counter = LastValue then
            IterationBound <= '1';
        else
            IterationBound <= '0';
        end if;
    end if;
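The behavior of the single counter with check points can be explored with a small cycle-level model. This Python sketch mirrors the if/elsif chain of the example code; the check value, the arrival cycle of the check signal, and the assumption that a check signal stays asserted once it arrives are all illustrative choices.

```python
def step(counter, check_values, check_sig, last_value):
    """Next counter value after one rising clock edge."""
    for value, sig in zip(check_values, check_sig):
        if counter == value and not sig:
            return counter            # hold at the check point
    if counter == last_value:
        return 0                      # wrap: start the next iteration
    return counter + 1

def run(check_values, arrival_cycle, last_value, n_cycles):
    """Counter value seen in each cycle; signal k is high from arrival_cycle[k] on."""
    counter, trace = 0, []
    for cycle in range(n_cycles):
        trace.append(counter)
        check_sig = [cycle >= a for a in arrival_cycle]
        counter = step(counter, check_values, check_sig, last_value)
    return trace

# One check point at counter value 2 whose done/rcv signal arrives in cycle 5.
trace = run(check_values=[2], arrival_cycle=[5], last_value=4, n_cycles=10)
print(trace)
```

The counter stalls at the check value until the signal arrives and then resumes, so every start or enable signal decoded from a later counter value is delayed by exactly the stall time.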

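Looping Control

PeaCE supports controller generation for looped schedules.

Looped schedules can be structured hierarchically.

[Figure: for an arc on which A produces 4 samples and B consumes 1, the schedule "A, then B four times" is shown over time. Instead of four B blocks, the hardware uses one A block, one B block, and a MUX, with B executed in LOOP 4.]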

Looping Control

[Figure: control flow between the top-level counter (CounterValue) and the loop counter of Loop1 (Loop1_Counter) for the schedule "A, then B four times", with one B block fed through a MUX.]

    A_start <=
        '1' when CounterValue = 0 else '0';

    B_start <=
        '1' when Loop1_Counter = 0 and Loop1_busy = '1' else '0';

    Loop1_start <=
        '1' when CounterValue = 20 else '0';
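The interplay of the top-level counter and the loop counter can be checked with a cycle-level model. The Python sketch below reuses the signal names from the slide; the loop period of 10 cycles, the loop count of 4, and the assumption that Loop1_busy is a registered flag (so it rises one cycle after Loop1_start) are illustrative and not taken from the PeaCE implementation.

```python
def simulate(loop_start_at=20, loop_count=4, loop_period=10, n_cycles=70):
    """Return the cycles in which A_start and B_start are asserted."""
    a_starts, b_starts = [], []
    busy, loop_counter, iters_left = False, 0, 0
    for counter_value in range(n_cycles):
        if counter_value == 0:
            a_starts.append(counter_value)        # A_start
        if busy and loop_counter == 0:
            b_starts.append(counter_value)        # B_start
        # register updates at the clock edge
        if busy:
            loop_counter += 1
            if loop_counter == loop_period:       # one loop iteration finished
                loop_counter = 0
                iters_left -= 1
                busy = iters_left > 0
        elif counter_value == loop_start_at:      # Loop1_start
            busy, loop_counter, iters_left = True, 0, loop_count
    return a_starts, b_starts

a_starts, b_starts = simulate()
print(a_starts, b_starts)
```

B is started once per loop iteration by the same decode logic, which is what allows a single B block to replace the four instances of the fully parallel design.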

Buffer Management

Multi-rate buffering

Data types
• Int, Macroblock (16x16), Frame (176x144)

Register : small data type
• I/O timing control
• Buffer allocation

Memory : large data type
• Memory access logic
• Synchronization
• Memory allocation

[Figure: a multi-rate arc from A (rate 2) to B (rate 3) with the firing sequence of A and B, buffer registers selected through a MUX, and memory synchronization between A and B.]

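Resource Management

Resource sharing or multiple instantiation

Input multiplexing and output buffer access

[Figure: several multiplier nodes (X) in the graph are mapped onto a multiplier resource; MUXes on the inputs are driven by the mux select signal, and the results are stored with the output buffer latch signal.]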

Schedule-based HW Synthesis

[Figure: a DFG (B, C, D) and the block libraries are synthesized into hardware blocks B, C, D driven by a generated controller, with clk and rst inputs.]

Previous Works

2-dimensional DCT algorithm

[Figure: dataflow graph Transpose 8x8, DCT1D, Transpose 8x8, DCT1D with sample rates 64 and 8 on the arcs.]

Generated Hardware Architecture from Ptolemy

[Figure: each DCT1D node is expanded into 8 one-dimensional DCT blocks, each stage preceded by an 8x8 matrix transpose block; labels: 64 16-bit inputs, 8 16-bit signals.]

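Previous Works

Generated Hardware Architecture from Meyr’s works

[Figure: DCT 1D blocks with MUXes under a controller.]

Generated Hardware Architecture from GRAPE

[Figure: DCT 1D blocks, each with its own control logic (ctrl), connected through a FIFO with 64 buffers using the wr, wr_ok, rd, and rd_ok signals.]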

Motivation

Previous works assume a single execution schedule for the HW implementation.

The proposed approach instead lets the designer provide the execution schedule, so a multi-rate dataflow graph can be implemented as many different hardware architectures.

[Figure: a multi-rate dataflow graph with nodes B and C (rates 1, 4, 1, 1) and three implementations plotted as hardware resources over time: fully sequential, fully parallel, and hybrid.]

Motivation

[Figure: two implementations of the 2-dimensional DCT graph (Transpose 8x8, DCT1D, Transpose 8x8, DCT1D; rates 64 and 8). Sharing: one DCT 1D block between input and output, shared through a MUX under a controller. Multiple instantiation: several DCT 1D blocks with MUXes.]

Schedule Information 1

    # resource allocation table
    Transpose 2
    DCT1D 2
    # resource mapping & schedule information
    # (instance name, resource number, start, duration)
    # loop ( loop count, start, loop period)
    Transpose_0 0 0 1
    Loop 8 1 2 {
        DCT1D_0 0 0 2
    }
    Transpose_1 1 17 1
    Loop 8 18 2 {
        DCT1D_1 1 0 2
    }

[Figure: the 2-dimensional DCT graph (Transpose 8x8, DCT1D; rates 64 and 8) and the generated hardware with DCT 1D blocks, MUXes, and a controller; DCT1D_0 is mapped to resource 0 and DCT1D_1 to resource 1.]

1-to-1 mapping of graph node ↔ HW resource
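To see what this schedule means in absolute time, the loops can be unrolled into individual firings. The Python sketch below parses the text format shown on the slide; the parser itself and the interpretation that a start time inside a loop body is relative to the start of each loop iteration are assumptions for illustration, not the PeaCE tool's documented behavior.

```python
SCHEDULE = """
# resource allocation table
Transpose 2
DCT1D 2
# resource mapping & schedule information
Transpose_0 0 0 1
Loop 8 1 2 {
    DCT1D_0 0 0 2
}
Transpose_1 1 17 1
Loop 8 18 2 {
    DCT1D_1 1 0 2
}
"""

def expand(text):
    """Return (instance, resource, absolute start, duration) for every firing."""
    firings, loop = [], None          # loop = (count, start, period) inside a body
    for raw in text.splitlines():
        line = raw.split("#")[0].strip()
        if not line:
            continue
        tokens = line.replace("{", " ").split()
        if line.startswith("Loop"):
            loop = tuple(int(t) for t in tokens[1:4])
        elif line == "}":
            loop = None
        elif len(tokens) == 4:        # allocation lines have 2 tokens and are skipped
            name, res, start, dur = tokens[0], *map(int, tokens[1:])
            if loop is None:
                firings.append((name, res, start, dur))
            else:
                count, loop_start, period = loop
                for k in range(count):
                    firings.append((name, res, loop_start + k * period + start, dur))
    return firings

firings = expand(SCHEDULE)
for f in firings[:3]:
    print(f)
```

Under this reading, the eight DCT1D_0 firings occupy cycles 1 to 16, so Transpose_1 can start at cycle 17 as the schedule states.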

Schedule Information 2 : Sharing

    DCT1D 1
    # resource mapping & schedule information
    # (instance name, resource number, start, duration)
    # loop ( loop count, start, loop period)
    Transpose_0 0 0 1
    Loop 8 1 2 {
        DCT1D_0 0 0 2
    }
        DCT1D_1 0 0 2
    }

[Figure: the 2-dimensional DCT graph (Transpose 8x8, DCT1D; rates 64 and 8) and the generated hardware with a single DCT 1D block between input and output, shared through a MUX; DCT1D_0 and DCT1D_1 are both mapped to resource 0.]

N-to-1 mapping of graph node ↔ HW resource

Schedule Information 3 : Multiple Instantiation

    DCT1D 4
    # resource mapping & schedule information
    Transpose_0 0 0 1
    Loop 4 1 2 {
        DCT1D_0 0 0 2
        DCT1D_0 1 0 2
    }
        DCT1D_1 2 0 2
        DCT1D_1 3 0 2
    }

[Figure: the 2-dimensional DCT graph (Transpose 8x8, DCT1D; rates 64 and 8) and the generated hardware with four DCT 1D blocks, MUXes, and a controller; DCT1D_0 is mapped to resources 0 and 1, DCT1D_1 to resources 2 and 3.]

1-to-N mapping of graph node ↔ HW resource

HW Controller Generation

Counter-based Controller
• Buffer control, MUX control, start and done signals of blocks

[Figure: controller for a DCT 1D resource. The Loop1 Counter and Loop1 IterNum feed a MUX controller and a buffer controller. Signals shown: DCT1D_res0_sel and DCT1D_res0_input at the input MUX, and DCT1D_0_output_0_en and DCT1D_0_output_1_en at the output buffers.]

Experiment 1 : 2-dimensional DCT Algorithm

[Figure: area (gates, 0 to 300000) versus 1/throughput (ns/sample, 0 to 800) for the Ptolemy, GRAPE, Auto, and Manual designs. Annotated design points: 16 IDCT resources, and 1 IDCT resource (sharing).]

Fractional Rate Dataflow Specification

The gap between automatic and manual design still exists.
• We cannot optimize the automatic design further because of the dataflow semantics.
• Dataflow semantics impose stricter rules for firing.
  − This requires more buffers to satisfy the firing condition.
• A real design has more freedom of implementation for an efficient design.

It is necessary to reduce the buffer requirements for a practical, efficient design.
• We choose FRDF, in which a fractional number of data samples can be produced or consumed.
• FRDF brings the automatic design a little closer to the manual design.

Fractional Rate Dataflow Specification

Every block with a multi-rate specification has an equivalent block with an FRDF specification.
• Functionally equivalent
• The internal algorithm and its schedule can be different.

Add4 with multi-rate specification (input rate 4, output rate 1)
• In one invocation (or firing),
  − Consumes 4 input data samples
  − Produces 1 output data sample
• Requires 4 input buffers

Add4 with FRDF specification (input rate 1, output rate 1/4)
• Consumes 1 input data sample in one invocation
• Produces only 1 output data sample during 4 invocations
• Requires only 1 input buffer
• Requires 4 invocations to perform the entire function

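Fractional Rate Dataflow (FRDF)

Non-FRDF implementation
• The “Add4” block is invoked after all its inputs are valid.
• Parallel I/O

[Figure: graph Ramp, Add4, Sink with rates 1 and 4 on the Ramp-to-Add4 arc and 1 and 1 on the Add4-to-Sink arc. Schedule over time: Ramp fires four times (LOOP 4), then Add4, then Sink. Add4 is combinational logic.]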

Fractional Rate Dataflow (FRDF)

FRDF implementation
• The execution of block “Add4” is divided into 4 phases.
• Serial I/O at each phase

[Figure: graph Ramp, Add4, Sink with rates 1 and 1 on the Ramp-to-Add4 arc and 1/4 and 1 on the Add4-to-Sink arc. Schedule over time: Ramp and Add4 fire in LOOP 4 (phase0 to phase3), then Sink. Add4 is sequential logic with internal state: sum and phase.]
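The difference between the two Add4 variants can be made concrete with a behavioral model. This Python sketch is an illustration only (the class and function names are hypothetical): the multi-rate version needs all four samples buffered before it fires, while the FRDF version keeps a running sum and a phase counter as internal state and emits a result on every fourth firing.

```python
def add4_multirate(samples):
    """Multi-rate Add4: one firing consumes 4 buffered samples, produces 1."""
    assert len(samples) == 4
    return sum(samples)

class Add4Frdf:
    """FRDF Add4: one sample per firing, one output per 4 firings."""
    def __init__(self):
        self.total = 0    # internal state: running sum
        self.phase = 0    # internal state: current phase (0..3)

    def fire(self, sample):
        self.total += sample
        self.phase += 1
        if self.phase < 4:
            return None               # no output in phases 0..2
        result, self.total, self.phase = self.total, 0, 0
        return result

block = Add4Frdf()
outputs = [block.fire(x) for x in range(8)]   # ramp input 0, 1, 2, ...
print(outputs)
```

Both variants compute the same sums; the FRDF version trades the four-entry input buffer for a small amount of internal state.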

Experiment 2 : Parts of H.263 Decoder

No Resource Sharing

[Figure: dataflow graph with three parallel chains DeQ, IZ, IDCT, Skip, Mux (for YBlock, UBlock, and VBlock) feeding Motion Compensation, with inputs dx, dy, mode, and QP, Repeat blocks, and the previous Y, U, V frame. Data types: 16-bit 8x8 Block, FRAME, 8-bit integer. Most arcs have rate 1; some have rate 4 or the fractional rate 1/99.]

Core: 282380, Buffer: 172032, Glue logic: 52575

Total Area: 506,987 gates

Experiment 2 : Parts of H.263 Decoder

Maximum Resource Sharing

[Figure: dataflow graph with three parallel chains DeQ, IZ, IDCT, Skip, Mux (for YBlock, UBlock, and VBlock) feeding Motion Compensation, with inputs dx, dy, mode, and QP, Repeat blocks, and the previous Y, U, V frame. Data types: 16-bit 8x8 Block, FRAME, 8-bit integer. Most arcs have rate 1; some have rate 4 or the fractional rate 1/99.]

Core: 161164, Buffer: 172032, Glue logic: 66304

Total Area: 399,500 gates

Experiment 2 : Parts of H.263 Decoder

[Figure: merged data path DeQ, IZ, IDCT, Skip, Mux, followed by Read Prev Block & Half Pixel, Truncation & ADD, Saturation, Mux, and WriteBlock, with an SRAM and inputs dx, dy, mode, and CBP. Data types: 16-bit 8x8 Block, 8-bit 8x8 Block, FRAME, 8-bit integer. Most arcs have rate 1; fractional rates 1/6 and 1/(6x99) appear on the control inputs.]

• More fractional rate specification
• Separate Y, U, and V data paths are merged
• MC block is divided into several small blocks for FRDF

Core: 89033, Buffer: 65536, Glue logic: 22574

Total Area: 177,143 gates

Experiment 2 : Parts of H.263 Decoder

Area (gates)    original    original_shared    advanced
core            282380      161164             89033
buffer          172032      172032             65536
glue logics     52575       66304              22574
total           506987      399500             177143
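The totals and the relative savings can be recomputed from the component figures. A short Python check using only the numbers reported above; the dictionary layout and the `reduction` helper are just for illustration:

```python
areas = {
    "original":        {"core": 282380, "buffer": 172032, "glue": 52575},
    "original_shared": {"core": 161164, "buffer": 172032, "glue": 66304},
    "advanced":        {"core": 89033,  "buffer": 65536,  "glue": 22574},
}

# Total area per design is the sum of its three components.
totals = {name: sum(parts.values()) for name, parts in areas.items()}

def reduction(name, baseline="original"):
    """Fraction of total area saved relative to the baseline design."""
    return 1 - totals[name] / totals[baseline]

for name in areas:
    print(f"{name:16s} {totals[name]:7d} gates  {reduction(name):6.1%} saved")
```

Resource sharing alone reduces only the core area and even increases the glue logic, whereas the FRDF-based design also shrinks the buffers, which is where most of the additional saving comes from.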

Conclusions

Synthesize efficient hardware from dataflow specification.
• We use SDF and its extension to FRDF.

The main goal of our research
• Overcoming the limitations of previous approaches
  − Solving non-deterministic timing of I/O
  − Schedule-based design: resource sharing, looping control
• Efficient hardware synthesis applicable to practical HW design
  − Supporting FRDF specification

All of these techniques are implemented in the VHDL domain of the PeaCE codesign environment and verified with two examples: DCT and an H.263 decoder.

Future work

Extension of expressiveness
• Piggybacked dataflow
• Dynamic constructs : for (data-dependent iteration), case (if-then-else, conditional execution)

Support of legacy HW platforms
• Support legacy HW IP ← SW code generation
• Support various types of bus & memory interface
  − Local SRAM, dual-port memory, shared memory
• Current : Shared memory through AMBA interface

Optimization issues
• Buffer elimination
• Buffer sharing
• FRDF
