Program


The 2013 Electronic System Level Synthesis Conference

May 31-June 1, 2013
Austin, Texas, USA

co-located with DAC!
50th ACM/EDAC/IEEE Design Automation Conference,
June 2-6, 2013 at the Austin Convention Center in Austin, TX

 



Friday, May 31

09:30-10:00 Registration & Welcome Coffee
10:00-10:15 Welcome - General Chair: Achim Rettberg, University of Oldenburg
10:15-11:45 Session 1: High-Level Synthesis
Partial Controller Retiming in High-Level Synthesis
Ryoya Sobue (Ritsumeikan University), Yuko Hara-Azumi (Nara Institute of Science and Technology) and Hiroyuki Tomiyama (Ritsumeikan University)
System Level Synthesis of Dataflow Programs: HEVC Decoder Case Study
Mariem Abid, Khaled Jerbi, Mickael Raulet, Olivier Deforges (INSA of Rennes) and Mohamed Abid (ENIS Sfax)
Synthesis and Optimization of High-Level Stream Programs
Endri Bezati, Simone Casale Brunet, Marco Mattavelli (EPFL SCI-STI-MM) and Jorn Janneck (Lund University)
11:45-12:45 Lunch Break
12:45-13:30 Invited Presentation 1: Automatic Prototyping of Declarative Properties on FPGA
Dominique Borrione, TIMA
13:30-14:15 Invited Presentation 2: Precision Timed Infrastructure: Design Challenges
David Broman, UC Berkeley
14:15-14:30 Coffee Break
14:30-16:00 Session 2: Work-in-Progress
Automatic Partitioning of Behavioral Description for High-Level Synthesis with Multiple Internal Throughputs
Benjamin Carrion Schafer (The Hong Kong Polytechnic University)
From Multicore Simulation to Hardware Synthesis Using Transactions
Amine Anane and El Mostapha Aboulhamid (Université de Montréal)
Efficient Preemption of Loops for Dynamic HW/SW Partitioning on Configurable Systems on Chip
Marko Roessler, Jan Langer and Ulrich Heinkel (Chemnitz University of Technology)
18:00 Social Dinner

 

Saturday, June 1

08:30-09:00 Registration & Welcome Coffee
09:00-10:00 Keynote: Rapid Prototyping: Why and How
Professor Arvind,
Computer Science and Artificial Intelligence Laboratory,
Massachusetts Institute of Technology
10:00-10:15 Coffee Break
10:15-11:45 Session 3: System-Level Modelling and Synthesis
Scalable High Quality Hierarchical Scheduling
Wei Tang and Forrest Brewer (University of California, Santa Barbara)
Multi-Core Cache Hierarchy Modeling for Host-Compiled Performance Simulation
Parisa Razaghi and Andreas Gerstlauer (The University of Texas at Austin)
Pre- and Post-Scheduling Memory Allocation Strategies on MPSoCs
Karol Desnos, Maxime Pelcat, Jean-François Nezan (IETR, INSA de Rennes, CNRS UMR 6164, UEB) and Slaheddine Aridhi (Texas Instruments France)
11:45-12:45 Lunch Break
12:45-13:30 Invited Presentation 3: From Alchemy to Chemistry? A Few Observations on High-Level Synthesis
Jorn W. Janneck, Lund University
13:30-14:15 Invited Presentation 4: System Synthesis from UML/MARTE Models: The PHARAON Approach
Eugenio Villar, University of Cantabria
14:15-14:30 Coffee Break
14:30-16:00 Panel Discussion: Is there a future for System Level Design?
Moderator:
Achim Rettberg, University of Oldenburg
Panelists:
Dominique Borrione, TIMA
David Broman, UC Berkeley
Jorn W. Janneck, Lund University
Eugenio Villar, University of Cantabria
Professor Arvind, MIT


PAPER PRESENTATIONS

Partial Controller Retiming in High-Level Synthesis
Ryoya Sobue (Ritsumeikan University), Yuko Hara-Azumi (Nara Institute of Science and Technology) and Hiroyuki Tomiyama (Ritsumeikan University)

Abstract
Various optimization techniques for high-level synthesis (HLS) have been studied to improve clock frequency. However, they focus only on the datapath and cannot handle the controller delay, even though most critical paths run across the controller and datapath (i.e., from state registers in the controller to storage units in the datapath) and the controller delay occupies a non-negligible portion of these paths. This paper proposes a novel HLS technique to remove such controller delays. Our method, "register-transfer (RT) level register retiming", is applied only to the parts of the control logic that generate the control signals of multiplexers (MUXs) on critical paths, so that these signals are generated and stored in registers in the previous cycle. The MUXs then obtain their control signals directly from the registers, which reduces the critical path delay. Experiments on several benchmark programs demonstrate that our RT-level retiming achieves clock improvements comparable to conventional gate-level retiming while mitigating the area overhead.
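
The core idea can be sketched in a few lines. The following cycle-based C++ fragment is purely illustrative (it is not the authors' implementation, and all names are invented for the example): the MUX select for the next cycle is computed from the next-state logic and stored in a register, so the datapath reads it directly from that register and the control-logic delay drops out of the critical path.

// Hypothetical cycle-based sketch of RT-level controller retiming.
// Names and structure are illustrative only, not the authors' code.
#include <array>
#include <cstdio>

struct Datapath {
    std::array<int, 2> inputs{ {10, 20} };
    int result = 0;
    // The MUX on the critical path selects one of two operands.
    void cycle(int mux_sel) { result = inputs[mux_sel]; }
};

// Control logic: derives the MUX select from a state.  In the baseline,
// this is evaluated combinationally in the same cycle, so the path runs
// state register -> control logic -> MUX -> storage unit.
int select_from_state(int state) { return (state == 1) ? 1 : 0; }

int main() {
    Datapath dp;
    int state = 0;
    int mux_sel_reg = select_from_state(state);  // retimed: registered select

    for (int cycle = 0; cycle < 4; ++cycle) {
        int next_state = (state + 1) % 2;

        // Retimed version: the datapath reads the select directly from the
        // register filled in the previous cycle, while the select for the
        // *next* cycle is computed now from the next state and stored.
        dp.cycle(mux_sel_reg);
        mux_sel_reg = select_from_state(next_state);

        std::printf("cycle %d: state=%d result=%d\n", cycle, state, dp.result);
        state = next_state;
    }
    return 0;
}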

System Level Synthesis of Dataflow Programs: HEVC Decoder Case Study
Mariem Abid, Khaled Jerbi, Mickael Raulet, Olivier Deforges (INSA of Rennes) and Mohamed Abid (ENIS Sfax)

Abstract
While dealing with the increasing complexity of signal processing algorithms, the primary motivation for the development of High-Level Synthesis (HLS) tools for the automatic generation of a Register Transfer Level (RTL) description from a high-level description language is the reduction of time-to-market. However, most existing HLS tools operate at the component level, so the entire system is not taken into consideration.
We provide an original technique that raises the level of abstraction to the system level in order to obtain an RTL description from a dataflow description. First, we design image processing algorithms using an actor-oriented language under the Reconfigurable Video Coding (RVC) standard. Once the design is achieved, we use a dataflow compilation infrastructure called Open RVC-CAL Compiler (Orcc) to generate C-based code. Afterward, a Xilinx HLS tool called Vivado is used to automatically generate a synthesizable hardware implementation.
In this paper, we show that a hardware implementation of High Efficiency Video Coding (HEVC) under the RVC specifications is rapidly obtained with promising preliminary results.

Synthesis and Optimization of High-Level Stream Programs
Endri Bezati, Simone Casale Brunet, Marco Mattavelli (EPFL SCI-STI-MM) and Jorn Janneck (Lund University)

Abstract
In this paper we address the problem of translating high-level stream programs, such as those written in MPEG's RVC-CAL dataflow language, into implementations in programmable hardware. Our focus is on two aspects: sufficient language coverage to make synthesis available for a large class of programs, and methodology and tool support providing analysis and guidance to improve and optimize an initial implementation. Our main results are (1) a synthesis tool that for the first time translates a complete and unmodified MPEG reference implementation into a working hardware description, and (2) a suite of profiling and analysis tools that analyze the structure of computation weighted by data obtained from the synthesis process, and accurately pinpoint parts of the program that are targets for optimization.

Scalable High Quality Hierarchical Scheduling
Wei Tang and Forrest Brewer (University of California, Santa Barbara)

Abstract
List scheduling is well known for its implementation simplicity and O(N^{2}) scalability, but not for result quality. The ant-colony scheduling algorithm, imitating the cooperative behaviors of ants, does generate high-quality results but, like any stochastic search, has potentially long run times to assure high result quality. This paper presents a hierarchical scheduling algorithm based on the ideas of ant colonies, whose run-time complexity is at the same scale as ordinary list scheduling, while generating results as good as the classic ant-colony scheduling algorithm. In practice, this implies a very substantial run-time improvement, enabling scheduling exploration of much larger problems while avoiding the pitfalls of over-constraining and lower-quality results that hierarchical solutions are generally known for.
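
For readers unfamiliar with the baseline, the sketch below shows a plain resource-constrained list scheduler of the kind the paper improves upon; the hierarchical ant-colony algorithm itself is not reproduced here, and all names and the tiny example DAG are illustrative.

// Minimal resource-constrained list scheduler, the O(N^2) baseline the
// paper compares against; all names are illustrative.
#include <cstdio>
#include <vector>

struct Op {
    std::vector<int> preds;  // indices of predecessor operations
    int start = -1;          // assigned control step
};

// Schedule ops in priority (index) order; at most `units` ops per step.
void list_schedule(std::vector<Op>& ops, int units) {
    int step = 0;
    std::size_t placed = 0;
    while (placed < ops.size()) {
        int used = 0;
        for (auto& op : ops) {
            if (op.start >= 0 || used == units) continue;
            bool ready = true;
            for (int p : op.preds)
                if (ops[p].start < 0 || ops[p].start >= step) ready = false;
            if (ready) { op.start = step; ++used; ++placed; }
        }
        ++step;
    }
}

int main() {
    // Small DAG: op2 and op3 depend on op0 and op1; op4 depends on op2.
    std::vector<Op> ops = { {{}}, {{}}, {{0, 1}}, {{0, 1}}, {{2}} };
    list_schedule(ops, /*units=*/2);
    for (std::size_t i = 0; i < ops.size(); ++i)
        std::printf("op%zu -> step %d\n", i, ops[i].start);
    return 0;
}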

Multi-Core Cache Hierarchy Modeling for Host-Compiled Performance Simulation
Parisa Razaghi and Andreas Gerstlauer (The University of Texas at Austin)

Abstract
The need for early software evaluation has increased interest in host-compiled or source-level simulation techniques. For accurate real-time performance evaluation, dynamic cache effects have to be considered in this process. However, in the context of coarse-grained simulation, fast yet accurate modeling of complex multi-core cache hierarchies poses several challenges. In this paper, we present a novel generic multi-core cache modeling approach that incorporates accurate reordering in the presence of coarse-grained temporal decoupling. Our results show that our reordering approach is as accurate as a fine-grained simulation while maintaining almost the full performance benefits of a temporally decoupled simulation.
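
As a point of reference, the fragment below sketches the kind of simple cache model that a host-compiled simulator invokes from back-annotated source code; the paper's multi-core hierarchy and reordering mechanism are not reproduced, and all names, sizes, and latencies are illustrative assumptions.

// Minimal direct-mapped cache model of the kind annotated into a
// host-compiled simulation to account for dynamic cache effects.
// All names and parameters are illustrative assumptions.
#include <cstdint>
#include <cstdio>
#include <vector>

class CacheModel {
public:
    CacheModel(unsigned sets, unsigned line_bytes)
        : tags_(sets, UINT64_MAX), line_bytes_(line_bytes) {}

    // Returns the latency (in cycles) charged for this access.
    unsigned access(std::uint64_t addr) {
        std::uint64_t line = addr / line_bytes_;
        std::size_t set = line % tags_.size();
        if (tags_[set] == line) return kHitCycles;
        tags_[set] = line;          // fill the line on a miss
        return kMissCycles;
    }

private:
    static constexpr unsigned kHitCycles = 1;
    static constexpr unsigned kMissCycles = 20;
    std::vector<std::uint64_t> tags_;
    unsigned line_bytes_;
};

int main() {
    CacheModel l1(/*sets=*/256, /*line_bytes=*/64);
    std::uint64_t cycles = 0;
    // In a host-compiled simulator these calls would be generated as
    // back-annotations next to the corresponding source-level accesses.
    for (std::uint64_t a = 0; a < 4096; a += 4) cycles += l1.access(a);
    std::printf("modelled memory cycles: %llu\n",
                static_cast<unsigned long long>(cycles));
    return 0;
}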

Pre- and Post-Scheduling Memory Allocation Strategies on MPSoCs
Karol Desnos, Maxime Pelcat, Jean-François Nezan (IETR, INSA de Rennes, CNRS UMR 6164, UEB) and Slaheddine Aridhi (Texas Instruments France)

Abstract
This paper introduces and assesses a new method to allocate memory for applications implemented on a shared memory Multiprocessor System-on-Chip (MPSoC).
This method consists of first deriving, from a Synchronous Dataflow (SDF) algorithm description, a Memory Exclusion Graph (MEG) that models all the memory objects of the application and their allocation constraints. Based on the MEG, memory allocation can be performed at three different stages of the implementation process: prior to the scheduling process, after an untimed multicore schedule is decided, or after a timed multicore schedule is decided. Each of these three alternatives offers a distinct trade-off between the amount of allocated memory and the flexibility of the application's multicore execution. Tested use cases are based on descriptions of real applications and a set of random SDF graphs generated with the SDF For Free (SDF3) tool.
Experimental results compare several allocation heuristics at the three implementation stages. They show that allocating memory after an untimed schedule of the application has been decided offers a reduced memory footprint as well as a flexible multicore execution.
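
To make the exclusion-graph idea concrete, the following sketch shows a naive first-fit allocation over a MEG, where an edge between two buffers forbids them from sharing addresses; it illustrates the general principle only, not the heuristics evaluated in the paper, and all names are invented for the example.

// Illustrative first-fit allocation over a memory exclusion graph (MEG):
// vertices are buffers, an edge means the two buffers may never share the
// same address range.  A sketch of the general idea only.
#include <cstdio>
#include <vector>

struct Buffer { int size; int offset = -1; };

bool overlaps(const Buffer& a, const Buffer& b) {
    return a.offset < b.offset + b.size && b.offset < a.offset + a.size;
}

// excl[i][j] == true means buffers i and j may not overlap in memory.
void allocate(std::vector<Buffer>& bufs,
              const std::vector<std::vector<bool>>& excl) {
    for (std::size_t i = 0; i < bufs.size(); ++i) {
        int offset = 0;
        bool placed = false;
        while (!placed) {
            bufs[i].offset = offset;
            placed = true;
            for (std::size_t j = 0; j < i; ++j) {
                if (excl[i][j] && overlaps(bufs[i], bufs[j])) {
                    offset = bufs[j].offset + bufs[j].size;  // jump past j
                    placed = false;
                    break;
                }
            }
        }
    }
}

int main() {
    std::vector<Buffer> bufs = { {64}, {32}, {128} };
    // Buffers 0 and 1 exclude each other; buffer 2 excludes nobody,
    // so it may reuse the same addresses as 0 and 1.
    std::vector<std::vector<bool>> excl = {
        {false, true,  false},
        {true,  false, false},
        {false, false, false},
    };
    allocate(bufs, excl);
    for (std::size_t i = 0; i < bufs.size(); ++i)
        std::printf("buffer %zu: offset %d, size %d\n",
                    i, bufs[i].offset, bufs[i].size);
    return 0;
}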

Automatic Partitioning of Behavioral Description for High-Level Synthesis with Multiple Internal Throughputs
Benjamin Carrion Schafer (The Hong Kong Polytechnic University)

Abstract
This work presents a method for automatically partitioning single-process behavioral descriptions (ANSI-C or SystemC) into separate processes under a given global throughput constraint. The proposed method identifies parts of the process with different internal Data Initiation Intervals (DIIs) and partitions the process into sub-processes that can in turn be optimized independently. Experimental results show that our proposed method can reduce the overall design area by up to ~38% and on average by ~22% compared to the original single-process synthesis. Our method can further reduce the overall design area by, on average, another ~12% if a design space exploration (DSE) is performed for each newly generated process.

From Multicore Simulation to Hardware Synthesis Using Transactions
Amine Anane and El Mostapha Aboulhamid (Université de Montréal)

Abstract
With the increasing complexity of digital systems, which are becoming more and more parallel, a better abstraction to describe such systems has become necessary. This paper shows how, by using the powerful mechanism of transactions as a concurrency model, and by taking advantage of .NET introspection and attribute programming capabilities, we were able to achieve an automatic high-level synthesis flow. Indeed, we kept the same object-oriented programming concepts to describe the architecture of high-level models, such as encapsulation and interfacing. However, unlike SystemC, the behaviour is no longer described as processes and events but as transactions. Transactions can be seen as atomic actions interacting through shared variables. We then transform such a high-level transactional model into a SystemC behavioral model ready to be synthesized by a behavioral synthesizer.
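
The transaction model can be illustrated with a minimal sketch, rendered here in C++ rather than .NET for brevity: each transaction pairs a guard with an action on shared variables, and a scheduler fires at most one enabled transaction at a time. The attribute-based introspection flow described in the paper is not shown, and all names are illustrative.

// Minimal sketch of "transactions as atomic actions on shared variables":
// a transaction has a guard and an action, and the scheduler executes at
// most one enabled transaction at a time.  Illustrative only.
#include <cstdio>
#include <functional>
#include <vector>

struct Transaction {
    std::function<bool()> guard;   // may the action fire?
    std::function<void()> action;  // atomic update of shared state
};

int main() {
    int fifo = 0;            // shared variable: tokens in a FIFO
    const int capacity = 4;

    std::vector<Transaction> rules = {
        { [&] { return fifo < capacity; },           // produce
          [&] { ++fifo; std::puts("produce"); } },
        { [&] { return fifo > 0; },                  // consume
          [&] { --fifo; std::puts("consume"); } },
    };

    // One-at-a-time scheduler: repeatedly pick an enabled transaction and
    // run its action atomically (trivially so here, since we are sequential).
    for (int step = 0; step < 8; ++step) {
        for (auto& t : rules) {
            if (t.guard()) { t.action(); break; }
        }
    }
    return 0;
}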

Efficient Preemption of Loops for Dynamic HW/SW Partitioning on Configurable Systems on Chip
Marko Roessler, Ulrich Heinkel and Jan Langer (Chemnitz University of Technology)

Abstract
With the advance of high-level synthesis methodologies, it has become possible to transform software tasks, typically running on a processor, into hardware tasks running on an FPGA device. Furthermore, dynamic reconfiguration techniques allow dynamic scheduling of hardware tasks on an FPGA area at runtime. Combining these techniques allows dynamic scheduling across the hardware-software boundary. However, to interrupt and resume a task, its context has to be identified and stored. Given a set of breakpoints in the general control flow of a task that guarantees a maximum latency between interrupts, loop bodies have to be interruptible. This work presents an efficient way to synchronize loop implementations between the software and the hardware world, even if their control and data flows are of a fundamentally different nature.


INVITED SPEAKERS & KEYNOTES



Rapid Prototyping: Why and How
Keynote Speaker: Professor Arvind, MIT

Abstract
Modern systems often contain special-purpose hardware for performance and power reasons. It is sometimes difficult to know a priori the best hardware-software decomposition of the system. Since different components are designed by different teams, it is also difficult to ensure that the whole system will function properly when the various components are put together. These risks can be mitigated substantially if one can rapidly build an accurate and fast prototype of the system being designed. Such prototyping does not extend the time-to-market if the design methodology ensures that there is an automatic or semi-automatic path from the prototype design to the real product design. Our methodology has three essential aspects: (1) reusing complex blocks involving domain expertise; (2) experimenting with designs to achieve goals such as cost, performance, and power; and (3) conducting high-fidelity full-system simulation, including software. We will illustrate this methodology using several prototypes we have built over the past few years – AirBlue, a cycle-accurate high-performance multicore simulator, H.264, Sparse FFT, and BlueDBM.

Bio
Arvind is the Johnson Professor of Computer Science and Engineering at the Massachusetts Institute of Technology and a member of CSAIL (Computer Science and Artificial Intelligence Laboratory). From 1974 to 1978, prior to coming to MIT, he taught at the University of California, Irvine. Arvind received his M.S. and Ph.D. in Computer Science from the University of Minnesota in 1972 and 1973, respectively. He received his B. Tech. in Electrical Engineering from the Indian Institute of Technology, Kanpur, in 1969, and also taught there from 1977-78.

Arvind's current research interests are synthesis and verification of large digital systems described using Guarded Atomic Actions; and Memory Models and Cache Coherence Protocols for parallel architectures and languages.

In the past, Arvind's research interests have included all aspects of parallel computing and declarative programming languages. He has contributed to the development of dynamic dataflow architectures, the implicitly parallel programming languages Id and pH, and the compilation of these types of languages on parallel machines. Dr. R. S. Nikhil and Arvind published the book "Implicit parallel programming in pH" in 2001.

In 1992, Arvind's group, in collaboration with Motorola, completed the Monsoon dataflow machine and its associated software. A dozen of these machines were built and installed at Los Alamos National Labs and other universities, before Monsoon was retired to the Computer Museum in California.

In 2000, Arvind took a two-year leave of absence to start Sandburst, a fabless semiconductor company to produce a chip set for 10G-bit Ethernet routers. He served as its President until his return to MIT in September 2002. Sandburst was acquired by Broadcom in 2006. In 2003, Arvind co-founded Bluespec Inc, an EDA company to produce a set of tools for high-level synthesis, and serves on its board.

Arvind has served on the editorial boards of many journals, including the Journal of Parallel and Distributed Computing and the Journal of Functional Programming. He has chaired and served on the program committees of many meetings sponsored by ACM and IEEE. From 1986 to 1992, he was the Chief Technical Advisor for the UN-sponsored Knowledge Based Computer Systems project in India. During 1992-93, Arvind was the Fujitsu Visiting Professor at the University of Tokyo. He managed the Nokia-CSAIL research collaboration from 2006 to 2010. Since 2009, Arvind has also been a WCU (World Class University) Distinguished Professor at Seoul National University.

Arvind has delivered more than a hundred keynote and distinguished lectures.

http://www.csg.csail.mit.edu/Users/arvind/


Automatic Prototyping of Declarative Properties on FPGA
Invited Speaker: Dominique Borrione, TIMA

Work performed with Fatemeh (Negin) Javaheri, Katell Morin-Allory, Alexandre Porcher
TIMA Laboratory (CNRS, Grenoble INP, UJF), Grenoble, France

Abstract
Despite the use of pre-designed and pre-verified processor cores, memories and functional operators, designing a new system on a chip right the first time remains a challenge: control and communications need to be specially adjusted or developed, and their complexity is reaching that of the other components. Two main strategies are classically applied to keep designs manageable: raising the abstraction level, and modularity. This presentation addresses both.

We aim at automatically generating a correct-by-construction prototype of the control part, for validation and design exploration purposes. High-level synthesis (HLS) tools produce register transfer level (RTL) designs from an algorithmic description of a design's behavior, possibly complemented with rewrite rules to express control and concurrency. While augmenting the productivity of designers, HLS has not eased the verification problem. We believe the difficulty lies in the initial specification: for many designs that are inherently concurrent, providing a sequential algorithm is inefficient. We prefer declarative specifications, which state the expected behavior in mathematical terms. Statements are unordered, parallelism is inherent, and one or more statements may be active at any given time point.

Declarative specifications are now widely adopted in the context of verification: declarative properties about the behavior of a design (Assertions) or its environment (Assumptions) are checked using dynamic or static verification tools.

In contrast, Assertion-Based Synthesis (ABS) has not reached the same maturity. In ABS, a collection of properties constitutes the specification of the module to be designed, where some operand variables are inputs to the module and others are outputs. The objective is then to produce the synthesizable RTL design directly from the properties.

The talk will review previous work, which laid the foundations for the automatic synthesis of verification IPs (monitors and generators) from PSL properties. Our method is modular: it is based on the interconnection of elementary library modules for the logical and temporal operators of the property, according to its syntactic structure. Our method produces RTL designs that are correct by construction: both the library elements and the interconnection procedure are proven correct, using the formal trace semantics of the PSL language and the PVS proof system.

Shifting from assertion-based verification to assertion-based design, we generate the design itself rather than verification IPs to be linked to the design. More precisely, we produce, for each property, a compliant RTL component called a reactant: its inputs and outputs are operands of the property, and it reacts to the input values and produces output values so that the property holds. The construction method is again based on the proven-correct interconnection of proven-correct elementary modules.

In general, the specification has many properties, and the same variable may appear in several distinct properties. The originality of our approach is to avoid combining all the properties into one big automaton. Our method constructs the dependencies between the design variables, and identifies which properties monitor a variable and which properties generate its value. If a variable is an output for several reactants, these are combined with a solver to produce the final design.

SyntHorus-2 is a prototype software tool that implements the principles described above. It generates a synthesizable RTL circuit description from a set of PSL properties that define the expected circuit behavior. The obtained circuit is still more costly than a hand-designed circuit, due to the solvers and the presence of many duplicated signals. Yet, our method provides a reference model from the first step of the design flow, to be used for architectural verification and prototyping. The computing time is proportional to the size of the specification.

Bio
Dominique Borrione has been a Professor at the University of Grenoble since 1988 and has been the director of the TIMA Laboratory since January 2007. From the University of Grenoble, she received the MSc in Computer Science in 1972, the PhD in Computer Science in 1976, and the Thèse d'Etat in 1981. Before joining TIMA, she was director of the ARTEMIS Laboratory from 1991 to 1995. She was a team leader at ARTEMIS (1988-1995), then at TIMA (1996-2006). From December 1983 to August 1988, she was a Professor at the University of Marseille. She developed the theme of formal methods in hardware design, particularly taking as input designs described in VHDL. Most of her research has been supported by contracts, through industrial and academic cooperative projects in the context of the ESPRIT and MEDEA European programs. Professor Borrione has published over 90 refereed journal papers, international refereed conference papers, and book chapters. She has been a member of numerous working groups and program committees of international conference and workshop series (CHDL, CHARME/FMCAD, DATE, SBCCI, VLSI-SOC). She was program chair of CHDL'81, DATE'99, CHARME'05, and FDL'09.


Precision Timed Infrastructure: Design Challenges
Invited Speaker: David Broman, UC Berkeley

David Broman, Michael Zimmer, Yooseong Kim, Hokeun Kim, Jian Cai, Aviral Shrivastava, Stephen A. Edwards, and Edward A. Lee

Abstract
In general-purpose software applications, computation time is just a quality factor: faster is better. In cyber-physical systems (CPS), however, computation time is a correctness factor: missed deadlines for hard real-time applications, such as avionics and automobiles, can result in devastating, life-threatening consequences. Although many modern modeling languages for CPS include the notion of time, implementation languages such as C lack any temporal semantics. Consequently, models and programs for CPS are neither portable nor guaranteed to execute correctly on the real system; timing is merely a side effect of the realization of a software system on a specific hardware platform. In this position paper, we present the research initiative for a precision timed (PRET) infrastructure, consisting of languages, compilers, and microarchitectures, where timing is a correctness factor. In particular, the timing semantics in models and programs must be preserved during compilation to ensure that the behavior of real systems complies with models. We also outline new research and design challenges present in such infrastructure.

Bio
David Broman is currently a visiting scholar at UC Berkeley, USA, working in the Ptolemy group at the Electrical Engineering & Computer Science department. He is an assistant professor at Linköping University in Sweden, where he also received his PhD in computer science in 2010. David's research interests include programming and modeling language theory, compiler technology, software engineering, and mathematical modeling and simulation of cyber-physical systems. He has worked for five years in the software security industry, co-founded the EOOLT workshop series, and is a member of the Modelica Association and the Modelica language design group.


From Alchemy to Chemistry? A Few Observations on High-Level Synthesis
Invited Speaker: Jorn W. Janneck, Lund University

Abstract
By now, high-level synthesis and its correlate, system-level design, have a rich history that includes a wide variety of techniques being applied to a large range of application areas with different degrees of success. In this talk I will draw from my own experiences in one corner of this field and speculate that high-level synthesis might find its proper place in future design flows by deemphasizing its alchemical quest of compiling sequential languages to hardware, and instead engaging in the ongoing groundswell of renewed interest in parallel programming languages, tools, and methodologies.

Bio
Jorn W. Janneck is an associate professor in the computer science department at Lund University. He graduated from the University of Bremen in 1995 and received a PhD from ETH Zurich in 2000. He worked at the Fraunhofer Institute for Material Flow and Logistics (IML) in Dortmund, was a postdoctoral scholar at the University of California at Berkeley in the EECS department, and worked in industrial research from 2003 to 2010, first at Xilinx Research in San Jose, CA, and more recently at the United Technologies Research Center in Berkeley, CA. He is one of the authors of the CAL actor language and has been working on tools and methodology focused on making dataflow a practical programming model in a wide range of application areas, including image processing, video coding, networking/packet processing, DSP and wireless baseband processing. He has made major contributions to the standardization of RVC-CAL and dataflow by MPEG and ISO. His research is focused on aspects of programming parallel computing machines, including programming languages, machine models, tools, code generation, profiling, and architecture.

 


System Synthesis from UML/MARTE Models: The PHARAON Approach
Invited Speaker: Eugenio Villar, University of Cantabria

Pablo Peñil, Hector Posadas, Alejandro Nicolás, Eugenio Villar (University of Cantabria)

Abstract
Model-Driven Engineering (MDE) based on modeling languages like UML is a mature methodology for software development. However, its application to the specification and design of HW/SW embedded systems requires specific features. In order to cover them, the UML/MARTE profile for Real-Time and Embedded systems was defined. It has proven to be powerful enough to support holistic system modeling under different views. This single-source model is able to capture the information required for the automatic generation of executable and configurable models for fast performance analysis without additional engineering effort. As a result of this performance analysis, a suitable system architecture can be decided. At this point, the SW stack to be executed by each processing node in the selected heterogeneous platform has to be generated. In the general case this is a tedious and error-prone process with little assistance from available tools. Current practices constrain the SW engineer to develop the code for each node of the many-core platform by hand. The code has to be written for the specific architecture and architectural mapping decided, thus reducing reusability. In order to overcome these limitations, the FP7 PHARAON project aims to develop tools able to automatically generate the code to be executed on each node from the initial system model. This affects not only the application code, the static and run-time libraries (e.g. OpenMP/OpenCL), and the middleware and communication functions, but also the OS and driver calls in each node.

Bio
Prof. Eugenio Villar received his Ph.D. in Electronics from the University of Cantabria in 1984. Since 1992 he has been a Full Professor at the Electronics Technology, Automatics and Systems Engineering Department of the University of Cantabria, where he is currently responsible for the area of HW/SW Embedded Systems Design at the Microelectronics Engineering Group. His research activity has always been related to system specification and modeling. His current research interests cover system specification and design, MPSoC modeling, and performance estimation using SystemC and UML/MARTE. He is the author of more than 130 papers in international conferences, journals and books in the area of specification and design of electronic systems. Prof. Villar has served on several technical committees of international conferences such as the VHDL Forum, Euro-VHDL, EuroDAC, DATE, VLSI-SoC and FDL. He has participated in several international projects in electronic system design under the FP5, FP6, FP7, ITEA, Medea-Catrene and Artemis programs. He is the representative of the University of Cantabria in the ArtemisIA JU.
Additional information can be found at: www.teisa.unican.es/gim
