Electronic System Level Synthesis Conference
ESLsyn 2014 Proceedings
The 2014 Electronic System Level Synthesis Conference
May 31 - June 1, 2014
San Francisco, CA, USA
co-located with DAC!
The ESLsyn 2014 proceedings are available to ECSI members, conference attendees and presenters. Contact us to get your credentials.
The ESLsyn 2011-2013 articles are open to the public. All other ESLsyn materials are available to ECSI members and conference attendees and presenters only. To find out more about becoming an ECSI member, please click here.
ESLsyn 2014 Proceedings on IEEE Xplore (available after the conference)
BibTex of ESLsyn 2014 Proceedings (available after the conference)
ESLsyn 2014 Proceedings Publication Information
Print ISBN - IEEE Xplore
Keynote 1: Lech Jozwiak, Eindhoven University of Technology, The Netherlands
Architecture Synthesis of Heterogeneous MPSoCs for Highly-Demanding Applications
Keynote 2: Jorn W. Janneck, Lund University, Sweden
Wither High-Level Synthesis?
Invited Talk 1: Kazutoshi Wakabayashi, NEC
FPGA+HLS: New computing Paradigm for Complex Algorithm Synthesis
Invited Talk 2: Andres Takach, Calypto
HLS Current State, Adoption Drivers, and Future Directions
Invited Talk 3: Andy Pimentel, University of Amsterdam, The Netherlands
Perspectives on System-level MPSoC Design Space Exploration
Session 1: Application Analysis
Accelerating Full-System Simulation ESL Design and Application Analysis Through Multi-Granularity and Focused Profiling
Tzu-Hsiang Su, Wei-Shan Wu, Chen-Te Chou, Yuan-Chun Cheng, Meng-Ting Tsai, and Tien-Fu Chen
Faced with the rapid divergence of hardware used on embedded devices, there is a need for a tool that can efficiently assist with hardware/software co-design and architecture verification. Speeding up those ESL phases greatly reduces the length of development periods. To address this issue, our work implements a novel multi-granularity tracer for Android’s simulator to provide ESL hardware design performance analysis and verification. In addition, we propose a flexible ESL module interface for system hardware designers to explore new hardware components via simple modules. Our work also enables software developers to identify performance bottlenecks and assess software performance on new hardware components. Our case studies and experimental results show that our multi-granularity Android tracer can strip away irrelevant information to shave time off the architecture development period.
Precise Deadlock Detection for Polychronous Data-flow Specifications
Chan Ngo, Jean-Pierre Talpin, and Thierry Gautier
Dependency graphs are a commonly used data structure to encode the streams of values in data-flow programs and play a central role in scheduling instructions during automated code generation from such specifications. In this work, we propose a precise and effective method that combines a dependency graph structure with first-order logic formulas to check whether multi-clocked data-flow specifications are deadlock free before generating code from them. We represent the flow of values in the source programs by means of a dependency graph and attach first-order logic formulas to condition these dependencies. We use an SMT solver to effectively reason about the implied formulas and check deadlock freedom.
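The idea of conditioned dependencies can be illustrated with a minimal sketch. It is not the authors' method: the edge conditions, the variable name c, and the helper names below are hypothetical, and brute-force enumeration of boolean assignments stands in for the SMT solver mentioned in the abstract. The sketch assumes that a dependency cycle indicates a deadlock only if all conditions along the cycle can hold at the same time.

```python
from itertools import product


def simple_cycles(edges):
    """Enumerate simple cycles (as lists of edges) by depth-first search."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
    cycles = []

    def dfs(start, node, visited, path):
        for nxt in graph.get(node, []):
            if nxt == start:
                cycles.append(path + [(node, nxt)])
            elif nxt > start and nxt not in visited:
                # Only visit nodes larger than the start node so that each
                # cycle is reported once, rooted at its smallest node.
                dfs(start, nxt, visited | {nxt}, path + [(node, nxt)])

    for start in sorted(graph):
        dfs(start, start, {start}, [])
    return cycles


def deadlock_witness(edges, variables):
    """Return (cycle, assignment) if every condition on some cycle can be
    true under one assignment of the boolean variables, else None.
    Assumption: exhaustive enumeration replaces a real SMT query."""
    for cycle in simple_cycles(edges):
        for values in product([False, True], repeat=len(variables)):
            env = dict(zip(variables, values))
            if all(edges[edge](env) for edge in cycle):
                return cycle, env
    return None


# Hypothetical specification: x depends on y when c holds, y depends on x
# when c does not hold. The graph has a cycle, but it is never active.
safe = {
    ("x", "y"): lambda env: env["c"],
    ("y", "x"): lambda env: not env["c"],
}
# Same graph, but both dependencies are active when c holds.
unsafe = {
    ("x", "y"): lambda env: env["c"],
    ("y", "x"): lambda env: env["c"],
}

print(deadlock_witness(safe, ["c"]))
print(deadlock_witness(unsafe, ["c"]))
```

A purely structural cycle check would flag both specifications. Taking the conditions into account rejects only the second one, which is the kind of precision the abstract refers to.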
Session 2: Work in Progress
A Memory-First Language and Model for Hardware-Software Cosynthesis
Kunal Arya and Forrest Brewer
This paper presents a memory-centric model & language tailored for hardware/software co-synthesis. The model sets up a large potential design space where any part of the application is realizable on any software/hardware component. This is achieved by enforcing data & control locality, coupled with a unique copying semantic based on explicit knowledge of variable lifetime. The language eschews traditional array indexing for an iterator model which not only codifies access to large arrays, but also enables exploitation of concurrency while simplifying implementation in either software or hardware. A hierarchical guarded-rule language describes applications and is backed by a fully featured compiler & simulator. We demonstrate a real-world iterator-based FFT and discuss two different architectural realizations of that design.
An Assisted Single Source Verification Metric Model Code Generation Methodology
Christoph Kuznik, Gilles Bertrand Defo, and Wolfgang Mueller
The ever-increasing complexity of heterogeneous electronic systems demands intensified abstraction and automation efforts to improve design, verification and validation productivity, especially in earlier phases of system engineering. Within the verification activity various metrics can be applied to determine functional correctness or the overall progress. Here, a supporting verification methodology defining high-level verification planning down to the actual metric code development is essential. Moreover, advanced assistance for the designer, such as a tooling infrastructure to automate and accelerate the metric code implementation, is needed to minimize the influence of error-prone manual coding. In this article we present a single-source verification metric code-generation methodology for improved coverage automation. We determine (i) a suitable metric model for model-based capture of verification metrics as well as (ii) an assisted model-based processing and generation flow of the verification environment and metric skeletons. We apply our method to a SystemC case study, targeting enhanced productivity and consistency of metric code implementation.
Automated Implementation of Operand Isolation on Netlists
Matthias Sauppe, Thomas Horn, Erik Markert, Ulrich Heinkel, and Klaus-Holger Otto
Due to the increasing microchip design complexity and its growing ecologic and economic system requirements, minimizing energy consumption is a crucial target in today’s microchip design process. Operand isolation is a well-known technique to reduce energy consumption of hardware components. If certain signals do not influence the overall system behavior in specific system states, their calculation can be omitted in these states by introducing an isolation logic, resulting in less switching power. Using a use case based approach, isolation candidates can be found automatically and the saved power for the isolated circuitry can be estimated. In this paper, an algorithm is presented which processes a synthesized netlist and corresponding toggle data from a simulation run to generate and implement operand isolation logic for netlists without affecting overall system behavior. The approach has been tested on two industrial network components and the results are presented.
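The effect of operand isolation on switching activity can be shown with a small sketch. It is not the algorithm from the paper: the 8-bit operand stream, the enable signal, and the helper names are made up for illustration, and the sketch assumes a simple AND-style isolation that forces the operand to zero whenever its result is unused.

```python
def toggles(values, width=8):
    """Count bit flips between consecutive values on a width-bit bus."""
    mask = (1 << width) - 1
    return sum(bin((a ^ b) & mask).count("1")
               for a, b in zip(values, values[1:]))


def isolate(operands, enable):
    """AND-style isolation: pass the operand only when its result is used."""
    return [op if en else 0 for op, en in zip(operands, enable)]


def observed(operands, enable):
    """Results that the rest of the system actually consumes (a stand-in
    functional unit that multiplies its operand by 3)."""
    return [op * 3 for op, en in zip(operands, enable) if en]


# Hypothetical toggle data: the operand changes every cycle, but the
# result is only consumed in the first cycle.
raw = [0x0F, 0xF0, 0x0F, 0xF0, 0x0F, 0xF0]
enable = [1, 0, 0, 0, 0, 0]
gated = isolate(raw, enable)

print("toggles without isolation:", toggles(raw))
print("toggles with isolation:   ", toggles(gated))
# The consumed results are identical, so system behavior is unchanged.
print(observed(raw, enable) == observed(gated, enable))
```

Forcing the operand to zero can itself cause transitions when the enable signal changes often, which is one reason why estimating the saved power from toggle data, as the abstract describes, matters before inserting isolation logic.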
Session 3: High-Level Synthesis
Machine-Learning based Simulated Annealer method for High Level Synthesis Design Space Exploration
Anushree Mahapatra and Benjamin Schafer
This paper presents a modified simulated annealing technique, based on machine learning, for effective multi-objective design space exploration in High Level Synthesis (HLS). In this work, we present a more efficient simulated annealer called Fast Simulated Annealer (FSA), which is based on a decision tree machine learning algorithm. Our proposed exploration method makes use of a standard simulated annealer to generate a training set, and uses this set to implement a decision tree. Based on the outcome of the decision tree, the algorithm fixes the synthesis directives (pragmas) which contribute to minimizing/maximizing one of the cost function objectives and continues the annealing procedure using the decision tree. Experimental results show that our proposed tree-based simulated annealing algorithm runs on average 36% faster than the standard annealer, and up to 48% faster, while leading to similar results.
A Hierarchical Framework to Enhance Scalability and Performance of Scheduling and Mapping Algorithms
Wei Tang and Forrest Brewer
Crucial to design productivity, architecture level synthesis algorithms trade off between design quality and algorithm complexity. The well-known list scheduling algorithm has an O(N) complexity but has well-known deficiencies. Ant Colony, FDLS and Simulated Annealing have at least O(N^3) time complexity. These considerations force a limitation on the scale of design instances that can be synthesized. A hierarchical analysis framework is proposed that improves both the run-time and ultimate performance of classical scheduling and mapping algorithms. Since the design hierarchy is not imposed, classical induced constraint issues from hierarchy are avoided. Compared to state-of-the-art heuristics, the framework runs an order of magnitude faster while achieving 12% performance improvement. The framework is able to efficiently address designs with more than 10^4 operations, which is beyond the capability of any high-quality flat heuristics.
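As background for the list scheduling baseline named in the abstract, the following is a minimal sketch of resource-constrained list scheduling. It is not the hierarchical framework from the paper. It assumes unit-latency operations, an acyclic dependency graph, a single resource type, and a longest-path-to-sink priority. The function name and the example graph are hypothetical, and the code is written for readability rather than for the complexity bound quoted above.

```python
def list_schedule(deps, units):
    """deps maps each operation to the operations it depends on.
    Returns {operation: start cycle}, issuing at most `units` operations
    per cycle. Assumes unit latency and an acyclic dependency graph."""
    succs = {op: [] for op in deps}
    for op, preds in deps.items():
        for pred in preds:
            succs[pred].append(op)

    # Priority: length of the longest path from the operation to a sink.
    priority = {}

    def depth(op):
        if op not in priority:
            priority[op] = 1 + max((depth(s) for s in succs[op]), default=0)
        return priority[op]

    for op in deps:
        depth(op)

    start, cycle = {}, 0
    while len(start) < len(deps):
        # An operation is ready once all predecessors finished earlier.
        ready = [op for op in deps if op not in start
                 and all(p in start and start[p] < cycle for p in deps[op])]
        # Most critical operations first; ties broken by name.
        ready.sort(key=lambda op: (-priority[op], op))
        for op in ready[:units]:
            start[op] = cycle
        cycle += 1
    return start


# Hypothetical data-flow graph: c needs a and b, d needs c, e is independent.
deps = {"a": [], "b": [], "c": ["a", "b"], "d": ["c"], "e": []}
schedule = list_schedule(deps, units=2)
print(schedule)
```

With two units, the independent operation e is deferred to the second cycle because a and b lie on the critical path. Greedy decisions of this kind are cheap but can be suboptimal on larger graphs, which is the trade-off the abstract addresses.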
Session 4: System-Level Synthesis
Considering Variation and Aging in a Full Chip Design Methodology at System Level
Domenik Helms, Kim Gruettner, Reef Eilers, Malte Metzdorf, Kai Hylla, Frank Poppen, and Wolfgang Nebel
We present a new system-level design methodology enabling the consideration of process variations and degradation due to aging in early stages of the design process. By mapping an executable system specification to SoC processing, communication and memory components in combination with component wise timing and power characterization with a source-level backannotation, we enable efficient full SoC power and temperature over time simulations. Based on the resulting temporal and spatial power and temperature distribution we use a high-level multiphysics simulation to assess the impact of degradation and aging. We evaluate our approach using an ARM7 based SoC design.
Coarse Grain Clock Gating of Streaming Applications in Programmable Logic Implementations
Endri Bezati, Simone Casale Brunet, Marco Mattavelli, and Jorn W. Janneck
Streaming applications describe a broad class of computing algorithms in areas such as signal processing, media coding and compression, cryptography, video analytics, network routing and packet processing, and many others. For many of these applications, programmable logic devices such as FPGAs are the implementation platform of choice due to their higher flexibility compared to ASICs and lower power consumption and higher performance compared to processors. This paper presents a set of techniques for taking advantage of the streaming character of the algorithm by selectively switching off parts of the circuit that cannot execute, thus saving power. The implementation is integrated into an existing high-level synthesis flow, and applied to a variety of applications, resulting in up to 20% power reduction with a very small additional logic footprint and no loss in throughput.
System Level Synthesis of Many-Core Architectures using Parallel Stream Rewriting
Lars Middendorf and Christian Haubelt
When designing the software and hardware architecture of many-core systems with hundreds of processors on a single chip, a central problem is the scheduling and binding of work-items to execution units. We present a novel synthesis flow for applications with highly dynamic and unpredictable behaviour, which is based on the concept of parallel stream rewriting. In our model, tasks are self-timed and do not require explicit book-keeping by a central scheduler, so that dynamic and recursive tasks can also be managed and synchronized by local rewriting operations on the stream. Complex examples, evaluated using an FPGA prototype, show the effectiveness of our approach.
ESLsyn 2014 List of Participants (restricted access)