ESLsyn 2013 Proceedings



The 2013 Electronic System Level Synthesis Conference

May 31-June 1, 2013
Austin, Texas, USA

co-located with DAC!
50th ACM/EDAC/IEEE Design Automation Conference, June 2-6, 2013

 

The ESLsyn 2014 proceedings are available to ECSI members, conference attendees and presenters. Contact us to get your credentials.

The ESLsyn 2011-2013 articles are open to the public. All other ESLsyn materials are available to ECSI members and conference attendees and presenters only. To find out more about becoming an ECSI member, please click here.

ESLsyn 2013 Program (public)

ESLsyn 2013 Proceedings (restricted access)

ESLsyn 2013 Proceedings on IEEE Xplore

BibTeX of ESLsyn 2013 Proceedings

ESLsyn 2013 Proceedings Publication Information

ISSN 2117-4628

ISBN - ECSI Media
978-2-9539987-8-8

ISBN - IEEE Xplore Compliant PDF Files
978-2-9539987-9-5

Editors
Dr. Adam Morawiec
Jinnie Hinderscheit

ECSI
Electronic Chips & Systems design Initiative

Parc Equation
2, Avenue de Vignate
38610 Gières, France
office [at] ecsi [dot] org

ESLsyn 2013 Welcome

Keynote: Rapid Prototyping: Why and How

Arvind, Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology

Invited Presentation 1: Automatic Prototyping of declarative properties on FPGA

Dominique Borrione, TIMA

Invited Presentation 2: Precision Timed Infrastructure: Design Challenges

David Broman, UC Berkeley

Paper (restricted access)          IEEE Xplore

Invited Presentation 3: From alchemy to chemistry? A few observations on high-level synthesis

Jorn W. Janneck, Lund University

Invited Presentation 4: System Synthesis from UML/MARTE Models: The PHARAON approach

Eugenio Villar, University of Cantabria

Paper (restricted access)          Slides (restricted access)          IEEE Xplore
 

Panel Discussion: Is there a future for System Level Design? (restricted access)

Session 1: High-Level Synthesis

Partial Controller Retiming in High-Level Synthesis
Ryoya Sobue, Yuko Hara-Azumi, and Hiroyuki Tomiyama

Various optimization techniques of high-level synthesis (HLS) have been studied for improving clock frequency. However, they focus only on the datapath and cannot handle the controller delay, even though most critical paths lie across the controller and datapath (i.e., from state registers in the controller to storage units in the datapath) and the controller delay occupies a non-negligible portion of those paths. This paper proposes a novel HLS technique to remove such controller delays. Our method, "Register-Transfer (RT) level register retiming", is applied only to the parts of the control logic that generate control signals of multiplexers (MUXs) on critical paths, in such a way that the signals are generated and stored in registers in the previous cycle. It then lets the MUXs obtain their control signals directly from the registers, leading to a reduction in critical path delay. Experiments on several benchmark programs demonstrate that our RT-level retiming can achieve comparable clock improvement while mitigating area overhead, compared with conventional gate-level retiming.

Paper (restricted access)          IEEE Xplore

System Level Synthesis Of Dataflow Programs: HEVC Decoder Case Study
Mariem Abid, Khaled Jerbi, Mickael Raulet, Olivier Deforges, and Mohamed Abid

While dealing with the increasing complexity of signal processing algorithms, the primary motivation for the development of High Level Synthesis (HLS) tools for the automatic generation of Register Transfer Level (RTL) descriptions from high-level description languages is the reduction of time-to-market. However, most existing HLS tools operate at the component level, so the entire system is not taken into consideration. We provide an original technique that raises the level of abstraction to the system level in order to obtain an RTL description from a dataflow description. First, we design image processing algorithms using an actor-oriented language under the Reconfigurable Video Coding (RVC) standard. Once the design is complete, we use a dataflow compilation infrastructure called Open RVC-CAL Compiler (Orcc) to generate C-based code. Afterward, a Xilinx HLS tool called Vivado is used to automatically generate a synthesizable hardware implementation. In this paper, we show that a hardware implementation of High Efficiency Video Coding (HEVC) under the RVC specifications is rapidly obtained with promising preliminary results.

Paper (restricted access)          Slides (restricted access)          IEEE Xplore

Synthesis and Optimization of High-Level Stream Programs
Endri Bezati, Simone Casale Brunet, Marco Mattavelli, and Jorn Janneck

In this paper we address the problem of translating high-level stream programs, such as those written in MPEG's RVC-CAL dataflow language, into implementations in programmable hardware. Our focus is on two aspects: sufficient language coverage to make synthesis available for a large class of programs, and methodology and tool support providing analysis and guidance to improve and optimize an initial implementation. Our main results are (1) a synthesis tool that for the first time translates a complete and unmodified MPEG reference implementation into a working hardware description, and (2) a suite of profiling and analysis tools that analyze the structure of computation weighted by data obtained from the synthesis process, and accurately pinpoint parts of the program that are targets for optimization.

Paper (restricted access)          Slides (restricted access)          IEEE Xplore

 

Session 2: Work-in-Progress

Automatic Partitioning of Behavioral Description for High-Level Synthesis with Multiple Internal Throughputs
Benjamin Carrion Schafer

This work presents a method for automatically partitioning single-process behavioral descriptions (ANSI-C or SystemC) into separate processes under a given global throughput constraint. The proposed method identifies parts of the process with different internal Data Initiation Intervals (DIIs) and partitions it into sub-processes that can in turn be optimized independently. Experimental results show that our proposed method can reduce the overall design area by up to ~38% and on average by ~22% compared to the original single-process synthesis. Our method can further reduce the overall design area by another ~12% on average if a design space exploration (DSE) is performed for each newly generated process.

Paper (restricted access)          Slides (restricted access)          IEEE Xplore

From Multicore Simulation to Hardware Synthesis Using Transactions
Amine Anane and El Mostapha Aboulhamid

With the increasing complexity of digital systems that are becoming more and more parallel, a better abstraction to describe such systems has become necessary. This paper shows how, by using the powerful mechanism of transactions as a concurrency model, and by taking advantage of .NET introspection and attribute programming capabilities, we were able to achieve an automatic high-level synthesis flow. We kept the same object-oriented programming concepts to describe the architecture of high-level models, such as encapsulation and interfacing. However, unlike SystemC, the behaviour is no longer described as processes and events but as transactions. Transactions can be seen as atomic actions interacting through shared variables. We then transform this high-level transactional model into a SystemC behavioral model ready to be synthesized by a behavioral synthesizer.

Paper (restricted access)          Slides (restricted access)          IEEE Xplore

Efficient Preemption of Loops for dynamic HW/SW partitioning on Configurable Systems on Chip
Marko Roessler, Ulrich Heinkel, and Jan Langer

With the advance of high-level synthesis methodologies it has become possible to transform software tasks, typically running on a processor, into hardware tasks running on an FPGA device. Furthermore, dynamic reconfiguration techniques allow dynamic scheduling of hardware tasks on an FPGA area at runtime. Combining these techniques allows dynamic scheduling across the hardware-software boundary. However, to interrupt and resume a task, its context has to be identified and stored. Given a set of breakpoints in the general control flow of a task that guarantees a maximum latency between interrupts, loop bodies have to be interruptible. This work presents an efficient way to synchronize loop implementations between the software and the hardware world, even if control and data flows are of a fundamentally different nature.

Paper (restricted access)          Slides (restricted access)          IEEE Xplore

 

Session 3: System-Level Modelling and Synthesis

Scalable High Quality Hierarchical Scheduling
Wei Tang and Forrest Brewer

List scheduling is well known for its implementation simplicity and O(N^2) scalability, but not for result quality. The Ant-Colony scheduling algorithm, imitating the cooperative behaviors of ants, does generate high quality results, but like any stochastic search, has potentially long run times to assure high result quality. This paper presents a hierarchical scheduling algorithm using the ideas of ant colony, whose run time complexity is at the same scale as ordinary list scheduling, while generating results as good as the classic ant-colony scheduling algorithm. In practice, this implies a very substantial run-time improvement, enabling scheduling exploration of much larger problems while avoiding the pitfalls of over-constraint and lower quality results that hierarchical solutions are generally known for.

Paper (restricted access)          Slides (restricted access)          IEEE Xplore
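
For readers unfamiliar with the baseline that the Tang and Brewer abstract compares against, here is a minimal sketch of ordinary resource-constrained list scheduling. It is not the hierarchical ant-colony algorithm from the paper. The function name list_schedule, the unit-latency operations, the identical functional units, and the longest-path priority are all assumptions chosen for illustration.

```python
from collections import defaultdict


def list_schedule(deps, num_units):
    """Illustrative list scheduler (assumed setup, not the paper's method).

    deps maps every operation to the list of operations it depends on;
    every operation must appear as a key. Each operation takes one cycle
    and at most num_units operations may start in the same cycle.
    Returns a dict mapping operation -> start cycle.
    """
    # Build successor lists so priorities can be computed.
    succs = defaultdict(list)
    for op, preds in deps.items():
        for p in preds:
            succs[p].append(op)

    # Priority = length of the longest path from the operation to a sink.
    priority = {}

    def longest_path(op):
        if op not in priority:
            priority[op] = 1 + max(
                (longest_path(s) for s in succs[op]), default=0
            )
        return priority[op]

    for op in deps:
        longest_path(op)

    start = {}
    cycle = 0
    while len(start) < len(deps):
        # Ready: not yet scheduled, all predecessors started in an earlier cycle.
        ready = [
            op for op in deps
            if op not in start
            and all(p in start and start[p] < cycle for p in deps[op])
        ]
        # Highest priority first; ties broken by name for determinism.
        ready.sort(key=lambda op: (-priority[op], op))
        for op in ready[:num_units]:
            start[op] = cycle
        cycle += 1
    return start


if __name__ == "__main__":
    example = {"a": [], "b": [], "c": ["a", "b"], "d": ["c"], "e": []}
    print(list_schedule(example, num_units=2))
```

Each cycle rescans all N operations and there are at most N cycles, which is where the quadratic cost mentioned in the abstract comes from in this naive form. The input is assumed to be acyclic.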

Multi-Core Cache Hierarchy Modeling for Host-Compiled Performance Simulation
Parisa Razaghi and Andreas Gerstlauer

The need for early software evaluation has increased interest in host-compiled or source-level simulation techniques. For accurate real-time performance evaluation, dynamic cache effects have to be considered in this process. However, in the context of coarse-grained simulation, fast yet accurate modeling of complex multi-core cache hierarchies poses several challenges. In this paper, we present a novel generic multi-core cache modeling approach that incorporates accurate reordering in the presence of coarse-grained temporal decoupling. Our results show that our reordering approach is as accurate as a fine-grained simulation while maintaining almost the full performance benefits of a temporally decoupled simulation.

Paper (restricted access)          Slides (restricted access)          IEEE Xplore

Pre- and Post-Scheduling Memory Allocation Strategies on MPSoCs
Karol Desnos, Maxime Pelcat, Jean-François Nezan, and Slaheddine Aridhi

This paper introduces and assesses a new method to allocate memory for applications implemented on a shared memory Multiprocessor System-on-Chip (MPSoC). This method first derives, from a Synchronous Dataflow (SDF) algorithm description, a Memory Exclusion Graph (MEG) that models all the memory objects of the application and their allocation constraints. Based on the MEG, memory allocation can be performed at three different stages of the implementation process: prior to the scheduling process, after an untimed multicore schedule is decided, or after a timed multicore schedule is decided. Each of these three alternatives offers a distinct trade-off between the amount of allocated memory and the flexibility of the application multicore execution. Tested use cases are based on descriptions of real applications and a set of random SDF graphs generated with the SDF For Free (SDF3) tool. Experimental results compare several allocation heuristics at the three implementation stages. They show that allocating memory after an untimed schedule of the application has been decided offers a reduced memory footprint as well as a flexible multicore execution.

Paper (restricted access)          IEEE Xplore

 

List of ESLsyn 2013 Participants (restricted access)
