DASIP 2010 Proceedings


The 2010 Conference on Design and Architectures
for Signal and Image Processing

October 26-28, 2010
Edinburgh, Scotland


The DASIP 2013 papers and presentations are available for ECSI members and conference attendees only. Please contact the ECSI office for the credentials.

The DASIP 2007-2012 articles are open to the public. All other DASIP materials are available for ECSI members, conference attendees, and presenters only.

DASIP 2010 Program

DASIP 2010 Proceedings

DASIP 2010 Proceedings on IEEE Xplore

BibTex of DASIP 2010 Proceedings

DASIP 2010 Proceedings Publication Information

2010 Conference on Design and Architectures for Signal and Image Processing (DASIP), Edinburgh, Scotland, October 26-28, 2010


Print ISBN - IEEE Xplore


Dr. Adam Morawiec
Jinnie Hinderscheit

Electronic Chips & Systems design Initiative

Parc Equation - 2, Avenue de Vignate
38610 Gières, France

office [at] ecsi [dot] org

DASIP 2010 Welcome

Keynote 1: Transport Triggered Architecture: Development Tools and Applications (restricted access)

Jarmo Takala, Tampere University of Technology, Finland

Keynote 2: Dataflow-based Design and Implementation of Signal Processing Systems: State-of-the-Art and Emerging Trends (restricted access)

Shuvra S. Bhattacharyya, University of Maryland, USA

Keynote 3: Joint Algorithm/Architecture Design of Communication Systems (restricted access)

Emmanuel Boutillon, Université de Bretagne Sud, France

Session 1: Reconfigurable Computing Architectures - Part 1

Scheduling, Binding and Routing System for a Run-Time Reconfigurable Operator Based Multimedia Architecture
Erwan Raffin, Christophe Wolinski, François Charot, Krzysztof Kuchcinski, Stéphane Guyetant, Stéphane Chevobbe, and Emmanuel Casseau

This paper presents a system for application scheduling, binding and routing for a run-time reconfigurable operator based multimedia architecture (ROMA). We use constraint programming to formalize our architecture model together with a specific application program. For this purpose we use an abstract representation of our architecture, which models memories, reconfigurable operator cells and communication networks. We also model the network topology. Constraint programming makes it possible to express application scheduling, binding and routing, together with the architectural and temporal constraints, in a single model and to solve them simultaneously. We have used several multimedia applications from the Mediabench set to evaluate our system. In 78% of the cases, the results provided by our system are proven optimal.
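The abstract frames scheduling and binding as a constraint problem solved to proven optimality. The Python sketch below is not the authors' solver or model: it is a tiny branch-and-bound search over a hypothetical data-flow graph, assuming unit-latency operations, interchangeable operators, and no routing or memory constraints. It only shows the ingredients such a model combines (precedence constraints, a one-operation-per-operator-per-cycle resource constraint, a makespan objective) and why exhaustive search with pruning can prove a schedule optimal.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy data-flow graph: operation -> list of predecessor operations.
Graph = Dict[str, List[str]]
Slot = Tuple[int, int]  # (start cycle, operator index)


def topological_order(graph: Graph) -> List[str]:
    """Depth-first topological sort (predecessors before successors)."""
    order: List[str] = []
    seen: Set[str] = set()

    def visit(op: str) -> None:
        if op not in seen:
            seen.add(op)
            for pred in graph[op]:
                visit(pred)
            order.append(op)

    for op in graph:
        visit(op)
    return order


def schedule_and_bind(graph: Graph, n_operators: int) -> Tuple[int, Dict[str, Slot]]:
    """Branch-and-bound search for a makespan-optimal schedule and binding.
    Assumes every operation takes one cycle on any operator; routing and
    memory constraints are not modelled."""
    order = topological_order(graph)
    best_len = len(order) + 1  # worse than the trivial sequential schedule
    best: Dict[str, Slot] = {}
    assign: Dict[str, Slot] = {}
    busy: Set[Slot] = set()  # resource constraint: one operation per slot

    def search(i: int, makespan: int) -> None:
        nonlocal best_len, best
        if makespan >= best_len:
            return  # prune: cannot improve on the incumbent
        if i == len(order):
            best_len, best = makespan, dict(assign)
            return
        op = order[i]
        # Precedence constraint: start after all predecessors have finished.
        earliest = max((assign[p][0] + 1 for p in graph[op]), default=0)
        for t in range(earliest, best_len - 1):
            for k in range(n_operators):
                if (t, k) in busy:
                    continue
                busy.add((t, k))
                assign[op] = (t, k)
                search(i + 1, max(makespan, t + 1))
                busy.discard((t, k))
                del assign[op]

    search(0, 0)
    return best_len, best
```

With the graph `{"a": [], "b": [], "c": ["a", "b"], "d": ["c"]}`, two operators let `a` and `b` run in parallel, so the optimal makespan is 3 cycles instead of 4 with a single operator. A real constraint solver adds propagation and symmetry breaking, which this sketch omits.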

Paper          Slides (restricted access)          BibTex          IEEE Xplore

Designing dynamically reconfigurable SoCs: From UML MARTE models to automatic code generation
Imran Rafiq Quadri, Samy Meftali, and Jean-Luc Dekeyser

Due to the continuous hardware/software evolution of Systems-on-Chip (SoC) and the addition of features such as Partial Dynamic Reconfiguration, the complexity of SoC design and development has escalated exponentially. This has resulted in increased time to market and development costs. Without effective design tools and methodologies, large, complex SoCs are becoming increasingly difficult to manage, resulting in a productivity gap. The design space, representing all the technical decisions that the SoC design team must make, is therefore becoming immense and difficult to explore. Similarly, manipulating these systems at low implementation levels such as the Register Transfer Level (RTL) is hindered by human intervention and the resulting errors. This paper presents a novel design methodology that decreases design complexity by raising the level of design abstraction. It makes use of Model Driven Engineering and the UML MARTE profile to move from high-level UML models to automatic code generation for implementing dynamically reconfigurable SoCs.

         Slides (restricted access)          BibTex          IEEE Xplore

A High-Level Language for Programming a NoC-based Dynamic Reconfiguration Infrastructure
Wim Vanderbauwhede and Waqar Nabi

We present an infrastructure for the dynamic reconfiguration of heterogeneous coarse-grained reconfigurable architectures (CGRAs) based on our Gannet SoC platform. We introduce the infrastructure, in particular its domain-specific high-level programming language Gannet-C, and discuss the language features that support dynamic reconfiguration and how they are supported by the compiler and the hardware. We illustrate our approach with simulation results obtained using a cycle-approximate SystemC model of the Gannet platform.

Paper          Slides (restricted access)          BibTex          IEEE Xplore


Session 2: Reconfigurable Computing Architectures - Part 2

FPGA-Based Rectification of Stereo Images
João Rodrigues and João Canas Ferreira

To obtain depth information about a scene in computer vision, one needs to process pairs of stereo images. Calculating dense depth maps in real time is computationally challenging, as it requires searching for matches between objects in both images. The task is significantly simplified if the images are rectified, a process which horizontally aligns the objects in both images. Stereo image rectification has several steps with different computational requirements, including 2D searches for high-fidelity matches, precise matrix calculations, and fast pixel coordinate transformations and interpolations. In this project, the complete process is implemented in a Spartan-3 FPGA, using a MicroBlaze soft core for the slow but precise calculations and fast dedicated hardware to meet the real-time requirements. The implemented system performs real-time rectification of the images from two video cameras at a resolution of 640×480 pixels and a frame rate of 25 fps, and is easily configured for higher-resolution video. The experimental results show very good quality, with rectified images having a maximum vertical disparity of two pixels, which shows that stereo image rectification can be achieved efficiently in a low-resource FPGA (with 64 KB for program instructions and data).
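The per-pixel part of rectification (coordinate transformation followed by interpolation) is the part the abstract assigns to dedicated hardware. A minimal Python sketch of that step follows. It assumes the 3x3 inverse rectifying homography `h_inv` has already been computed (the abstract assigns the precise matrix calculations to the MicroBlaze) and that bilinear interpolation is used; the interpolation scheme and the zero fill outside the source image are our assumptions, not details from the paper.

```python
from typing import List, Sequence

Image = List[List[float]]


def bilinear(img: Image, x: float, y: float) -> float:
    """Bilinear interpolation of img at real-valued (x, y); 0.0 outside the image."""
    h, w = len(img), len(img[0])
    if x < 0 or y < 0 or x > w - 1 or y > h - 1:
        return 0.0
    x0, y0 = int(x), int(y)  # floor, since x and y are non-negative here
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * img[y0][x0] + fx * img[y0][x1]
    bottom = (1 - fx) * img[y1][x0] + fx * img[y1][x1]
    return (1 - fy) * top + fy * bottom


def rectify(img: Image, h_inv: Sequence[Sequence[float]]) -> Image:
    """Inverse mapping: each output pixel (u, v) is projected through the
    inverse rectifying homography h_inv (3x3) and sampled bilinearly."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for v in range(h):
        for u in range(w):
            xs = h_inv[0][0] * u + h_inv[0][1] * v + h_inv[0][2]
            ys = h_inv[1][0] * u + h_inv[1][1] * v + h_inv[1][2]
            ws = h_inv[2][0] * u + h_inv[2][1] * v + h_inv[2][2]
            out[v][u] = bilinear(img, xs / ws, ys / ws)  # perspective divide
    return out
```

A hardware version would typically replace the floating-point arithmetic with fixed point and stream pixels row by row; the sketch favors readability.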

         Slides (restricted access)          BibTex          IEEE Xplore

Real-Time Classification Based on Color and Texture Attributes on a FPGA-Based Architecture
Mario-Alberto Ibarra-Manzano, Michel Devy, and Jean-Louis Boizard

The design and implementation of algorithms on FPGA-based architectures is a complex task, above all for image processing. Many vision applications (video monitoring, obstacle detection from a vehicle) require real-time performance. This paper analyzes only one classical function involved in these applications: characterizing each pixel by an attribute vector and classifying it as belonging or not to a class of interest. Typical attributes are color and texture. Color is described by the chrominance given by the a∗ and b∗ coordinates in the CIE Lab color space. Texture is computed only from the L∗ coordinate, describing the local intensity variations in a neighborhood of every pixel. AdaBoost has been selected to learn how to classify every pixel from its attribute vector. The selection and combination of a given number of weak classifiers is learned off-line from a training database; the classifier parameters are then loaded onto an FPGA-based kit. This paper proposes different architectures and presents some results obtained from images acquired from a robot, in order to classify each pixel as Ground or Obstacle.
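At run time, an AdaBoost classifier only evaluates the weak classifiers selected off-line and takes the sign of their weighted vote, which is what makes it attractive for an FPGA. The Python sketch below assumes decision stumps (a threshold on a single attribute) as the weak classifiers; the abstract does not say which weak learners were used, and the attribute layout and parameter values in the example are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Stump:
    """A weak classifier learned off-line: a threshold on one attribute."""
    feature: int      # index into the attribute vector (e.g. a*, b*, texture)
    threshold: float
    polarity: int     # +1: vote +1 when value >= threshold; -1: the reverse
    alpha: float      # AdaBoost weight of this weak classifier

    def predict(self, x: Sequence[float]) -> int:
        return self.polarity if x[self.feature] >= self.threshold else -self.polarity


def classify(stumps: Sequence[Stump], x: Sequence[float]) -> int:
    """Strong classifier H(x) = sign(sum_t alpha_t * h_t(x)).
    Returns +1 (e.g. Ground) or -1 (e.g. Obstacle)."""
    score = sum(s.alpha * s.predict(x) for s in stumps)
    return 1 if score >= 0 else -1
```

With the hypothetical stumps `Stump(0, 5.0, 1, 0.8)` and `Stump(2, 0.3, -1, 0.5)`, the attribute vector `[7.0, 2.0, 0.1]` scores 0.8 + 0.5 = 1.3 and is classified +1. In hardware, each stump is a comparator and the weighted vote is an adder tree.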

Paper          Slides (restricted access)          BibTex          IEEE Xplore


Session 3: Smart Image Sensors

On-Chip Compression for HDR Image Sensors
Fadoua Guezzi Messaoud, Arnaud Peizerat, Antoine Dupret, and Yves Blanchard

An image compression technique for High Dynamic Range (HDR) image sensors is introduced. Compression is performed in two steps: pixel value coding optimization followed by DCT-based (Discrete Cosine Transform) compression. A floating-point coding technique is first used, with a common exponent shared between the pixels of the same block, and then a DCT is applied to each group of pixels. This new concept maintains a low-complexity architecture while achieving a compression ratio of 75% and retaining good image quality, with a PSNR of about 40 dB.
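The first compression step can be illustrated with block floating point: one exponent per block, chosen from the brightest pixel, and a short mantissa per pixel. The Python sketch below assumes 8-bit mantissas, truncation (right shift) rather than rounding, and an orthonormal 1-D DCT-II applied to the block afterwards. The block size, mantissa width, DCT dimensionality, and the quantization and coding of the DCT coefficients (omitted here) are not specified in the abstract.

```python
import math
from typing import List, Sequence, Tuple


def shared_exponent_encode(block: Sequence[int], mantissa_bits: int = 8) -> Tuple[int, List[int]]:
    """Block floating point: one exponent for the whole block, chosen so that
    the largest pixel value fits in `mantissa_bits` bits (assumed width)."""
    exp = 0
    while (max(block) >> exp) >= (1 << mantissa_bits):
        exp += 1
    return exp, [v >> exp for v in block]  # truncating shift, no rounding


def dct_ii(x: Sequence[float]) -> List[float]:
    """Orthonormal 1-D DCT-II of a block of values."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                               for i in range(n)))
    return out
```

For instance, the block `[1000, 1000, 1020, 4000]` needs exponent 4, giving mantissas `[62, 62, 63, 250]`: every pixel loses its four low-order bits, which costs the darker pixels the most relative precision.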

Paper          Slides (restricted access)          BibTex          IEEE Xplore

Architectures and Signal Reconstruction Methods for Nanosecond Resolution Integrated Streak Camera in Standard CMOS Technology
Martin Zlatanski, Wilfried Uhring, Virginie Zint, Jean-Pierre Le Normand, and Daniel Mathiot

This paper presents the state of the art of Integrated Streak Camera (ISC) architectures in standard CMOS technology. It focuses on some of the methods required to reconstruct the profile of luminous events from the raw chip data. Two main ISC architectures are presented. The first adopts the pixel array configuration traditional for most silicon imagers, where the photocharge-induced signal is processed directly in-pixel. The second is based on a single light-detecting vector, comparable to the slit of a Conventional Streak Camera (CSC), coupled to an amplifier stage and an analog sampling and storage unit. For both architectures, depending on the on-chip processing of the photocharges, appropriate signal reconstruction techniques are required to restore the shape of the luminous signal. A novel single-vector ISC front-end architecture with an asynchronous photodiode reset scheme is presented. Algorithms for reconstructing the luminous events are proposed and validated through simulations for all the ISCs considered.

         Slides (restricted access)          BibTex          IEEE Xplore

Exploration Platform of Embedded SIMD Architecture for Autonomous Retinas
Stéphane Chevobbe, Suresh Pajaniradja, and Laurent Letellier

An integrated smart camera is a single chip composed of a sensor tightly coupled with one or more processing elements. The image processing applications mapped on such systems can require processing power in the range of supercomputers. To meet these growing application needs, we propose in this paper a SIMD-based processor optimized for low- and intermediate-level image processing. The architecture is composed of several SIMD clusters. Each cluster includes a configurable number of 2-way processing elements (PEs), ranging from 32 to 256, running at 200 MHz. These cluster configurations provide between 12 and 102 GOPS.
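The quoted range follows from simple peak-rate arithmetic if one assumes that each 2-way PE completes two operations per cycle (our assumption; the abstract does not define how operations are counted):

```latex
\begin{aligned}
P_{\text{peak}} &= N_{\text{PE}} \times 2\ \tfrac{\text{ops}}{\text{cycle}} \times f_{\text{clk}},
  \qquad f_{\text{clk}} = 200\ \text{MHz}\\
N_{\text{PE}} = 32 &:\quad 32 \times 2 \times 200 \times 10^{6} = 12.8 \times 10^{9}\ \text{ops/s} = 12.8\ \text{GOPS}\\
N_{\text{PE}} = 256 &:\quad 256 \times 2 \times 200 \times 10^{6} = 102.4 \times 10^{9}\ \text{ops/s} = 102.4\ \text{GOPS}
\end{aligned}
```

These are peak figures; sustained throughput depends on how well an application keeps all PEs busy.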

Paper          Slides (restricted access)          BibTex          IEEE Xplore


Session 4: Advances in Reconfigurable Video Coding (RVC) – Part 1

A Portable Video Tool Library for MPEG Reconfigurable Video Coding Using LLVM Representation
Jérôme Gorin, Françoise Prêteux, Jean-François Nezan, Matthieu Wipliez, and Mickaël Raulet

MPEG Reconfigurable Video Coding (RVC) is the latest answer of MPEG to the lack of interoperability between the codecs deployed on the market today. The main goal of MPEG RVC is to provide a set of coding tools employed in all MPEG standards, the Video Tool Library (VTL), encapsulated into independent entities called Functional Units (FUs). FUs are described as dataflow actors in the RVC-CAL actor language (RVC-CAL), and decoders are described as dataflow programs called Abstract Decoder Models (ADMs). An ADM of an MPEG decoder therefore corresponds in MPEG RVC to a network of FUs taken from the VTL. The typical use of MPEG RVC is to translate an ADM into a hardware or software description language that targets one specific platform. In [1], we propose to skip this ADM synthesis step and to integrate directly into platforms a portable version of the VTL described in the Low-Level Virtual Machine Intermediate Representation (LLVM IR). This portable VTL is coupled with a new RVC decoder, called the Just-In-Time Adaptive Decoder Engine (Jade), that dynamically instantiates an ADM to decode any encoded video using its associated network description. In this paper, we introduce the compilation steps required to automatically translate a VTL described in RVC-CAL into a portable VTL described in LLVM. This translation is based on a new RVC-CAL compiler called the Open RVC-CAL Compiler (Orcc).

Paper          Slides (restricted access)          BibTex          IEEE Xplore

RVC: a Multi-Decoder CAL Composer Tool
Francesca Palumbo, Danilo Pani, Emanuele Manca, Luigi Raffo, Marco Mattavelli, and Ghislain Roquier

The Reconfigurable Video Coding (RVC) framework is a recent ISO standard that aims to provide a unified specification of MPEG video technology in the form of a library of components. The word “reconfigurable” evokes the run-time instantiation of different decoders based on an on-the-fly analysis of the input bitstream. In this paper we take a first step towards defining systematic procedures that, based on the MPEG RVC specification formalism, can produce multi-decoder platforms capable of fast switching between different configurations. Exploiting the similarities between the decoding algorithms to be implemented, the paper describes an automatic tool that composes them into a single configurable multi-decoder built from all the required modules, reusing the shared components to reduce the overall footprint (from either a hardware or a software perspective). The proposed approach, implemented in C++ using the Flex and Bison code generation tools typically employed in compiler front-ends, successfully composes two different MPEG-4 Part 2 Simple Profile (SP) decoders: a serial one and a parallel one.
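The composition idea, reusing the components shared by several decoder networks, can be sketched as a merge of dataflow graphs. The Python sketch below is not the authors' C++/Flex/Bison tool. It assumes each decoder is given as a set of connections between Functional Units identified by name (identical names meaning identical FUs; the FU names in the example are hypothetical), keeps every FU once, and tags each connection with the configurations that use it, which is the information a run-time switch between decoders would need.

```python
from typing import Dict, FrozenSet, Set, Tuple

Edge = Tuple[str, str]  # (producer FU, consumer FU)


def compose(networks: Dict[str, Set[Edge]]) -> Dict[Edge, FrozenSet[str]]:
    """Merge decoder networks: every connection appears once and is tagged
    with the set of configurations (decoders) that use it."""
    merged: Dict[Edge, Set[str]] = {}
    for config, edges in networks.items():
        for edge in edges:
            merged.setdefault(edge, set()).add(config)
    return {edge: frozenset(configs) for edge, configs in merged.items()}


def functional_units(merged: Dict[Edge, FrozenSet[str]]) -> Set[str]:
    """FUs instantiated once in the multi-decoder, shared where possible."""
    return {fu for edge in merged for fu in edge}
```

With two hypothetical networks that differ only in their IDCT, the merged network has five FUs instead of eight, and only the IDCT connections are configuration-specific.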

Paper          Slides (restricted access)          BibTex          IEEE Xplore

High-Level Design Space Exploration of RVC Codec Specifications for Multicore Heterogeneous
Christophe Lucarz, Marco Mattavelli, and Ghislain Roquier

Nowadays, the design flow of complex signal processing embedded systems starts with a specification of the application in the form of a large sequential program (usually in C/C++). As we enter the multicore era, sequential programs are no longer the most appropriate way to specify algorithms targeted to run on several processing units. The new ISO/MPEG Reconfigurable Video Coding (RVC) standard proposes a new paradigm for specifying and designing complex signal processing systems. The RVC standard enables new codecs to be specified by assembling blocks, so-called Functional Units (FUs), from a standard Video Tool Library (VTL). Flexibility, reusability, and modularity are the key features of RVC. This new way of specifying algorithms simplifies the design of future video coding applications by allowing software and hardware reuse across multiple video coding standards. Specifications are provided in an actor- and dataflow-based language called CAL. Although the RVC standard does not imply any specific implementation design flow, it is an appropriate starting point for targeting platforms with multiple processing units. This paper describes a new model-driven design flow that considers both algorithm and architecture to map RVC codec specifications onto heterogeneous and multi-core systems.

Paper          Slides (restricted access)          BibTex          IEEE Xplore


Session 5: Advances in Reconfigurable Video Coding (RVC) – Part 2

Classification and Transformation of Dynamic Dataflow Programs
Matthieu Wipliez and Mickaël Raulet

Dataflow programming has been used to describe signal processing applications for many years, traditionally with cyclo-static dataflow (CSDF) or synchronous dataflow (SDF) models that restrict expressive power in favor of compile-time analysis and predictability. Dynamic dataflow is not restricted in expressive power, but it does require run-time scheduling in the general case. Fortunately, most signal processing applications are far from being entirely dynamic, and parts with static behavior need not be dynamically scheduled. This paper presents a method to automatically analyze and classify the blocks of a dynamic dataflow program within more restrictive dataflow models when possible, and to transform the blocks classified as static to improve execution speed by reducing the number of FIFO accesses. We applied this method to the actors of two dynamic dataflow descriptions of an MPEG-4 Part 2 decoder, and study how classification and transformation increase decoding speed.
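A simplified version of the classification step can be expressed on token rates alone. The Python sketch below is an approximation under stated assumptions, not the paper's analysis: it treats an actor as static (SDF) when all its actions have identical consumption and production rates and no guards, as cyclo-static (CSDF) when its unguarded actions fire in a fixed cyclic order given by a hypothetical `fsm_cycle` list of action indices, and as dynamic otherwise.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class Action:
    consumes: Dict[str, int]   # input port -> tokens read per firing
    produces: Dict[str, int]   # output port -> tokens written per firing
    has_guard: bool = False    # data-dependent firing condition


def classify(actions: Sequence[Action], fsm_cycle: Sequence[int] = ()) -> str:
    """Simplified rate-based classification of one actor:
    'SDF'     - no guards, every action has the same token rates;
    'CSDF'    - no guards, actions fire in the fixed cyclic order fsm_cycle;
    'dynamic' - anything else (needs run-time scheduling)."""
    if not actions or any(a.has_guard for a in actions):
        return "dynamic"
    rates = {
        (tuple(sorted(a.consumes.items())), tuple(sorted(a.produces.items())))
        for a in actions
    }
    if len(rates) == 1:
        return "SDF"
    if fsm_cycle and set(fsm_cycle) == set(range(len(actions))):
        return "CSDF"
    return "dynamic"
```

An actor that reads a 1-token header and then a 64-token block in strict alternation would be classified CSDF, so its firings could be scheduled at compile time instead of being selected by run-time tests on FIFO contents.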

Paper          Slides (restricted access)          BibTex          IEEE Xplore

A Coarse-Grain Reconfigurable Hardware Architecture for RVC-CAL-Based Design
Cecile Beaumin, Olivier Sentieys, Emmanuel Casseau, and Arnaud Carer

The MPEG Reconfigurable Video Coding project aims to provide more flexible and easier solutions for specifying video coders and decoders. Many contributions are devoted to RVC-CAL, the standard description language. There are also contributions on the general framework of this new model of video coding, and many CAL descriptions of video algorithms. However, RVC-compliant implementations have so far been studied only through code generation, which does not take advantage of the major characteristics of CAL networks as dataflow graphs. Consequently, there are no dedicated architectures inherently bound to the CAL language. The objective of this article is to present preliminary work on the design of a co-processor-based architecture. The co-processor is a reconfigurable architecture that uses CAL network features and provides a dynamic memory allocation system, which improves communication between the implemented processes and makes it possible to allocate a minimal amount of memory.

Paper          Slides (restricted access)          BibTex          IEEE Xplore

Hardware Code Generation from Dataflow Programs
Nicolas Siret, Aimad Rhatay, Matthieu Wipliez, and Jean-Francois Nezan

Developing new systems on embedded targets is becoming more and more complex. In particular, multimedia devices are now implemented using mixed hardware and software architectures, which improve computational power but also increase design complexity and time to market. New design flows have been developed to help designers develop complex architectures. These design flows are often based on languages with a higher level of abstraction. RVC-CAL is a dataflow programming language that provides suitable features in this context. An RVC-CAL dataflow program can be compiled to various target software languages (e.g. C, Java, LLVM) with the Open RVC-CAL Compiler (Orcc). In this paper, we present a new hardware code generator that generates high-quality, portable VHDL code with a hierarchical architecture from an RVC-CAL dataflow program in a matter of seconds. The paper explains the underlying principles of the hardware code generator and presents the results obtained from an Inverse DCT described as an RVC-CAL dataflow program.

Paper          Slides (restricted access)          BibTex          IEEE Xplore

RVC-CAL Dataflow Implementations of MPEG AVC/H.264 CABAC Decoding
Endri Bezati, Marco Mattavelli, and Mickaël Raulet

This paper describes the implementation of the MPEG AVC CABAC entropy decoder using the RVC-CAL dataflow programming language. CABAC (Context-based Adaptive Binary Arithmetic Coding) is the entropy coding method used by the Main and High profiles of the MPEG AVC/H.264 video standard. The CABAC algorithm provides increased compression efficiency but has a higher complexity than other entropy coding algorithms. This RVC-CAL implementation of the CABAC entropy decoder proves that complex algorithms can be implemented using a high-level design language. This paper analyzes in detail two possible methods of implementing the CABAC entropy decoder in the dataflow paradigm.

Paper          Slides (restricted access)          BibTex          IEEE Xplore


Session 6: Technologies for Novel Applications

An FPGA Softcore Based Implementation of a Bird Call Recognition System for Sensor Networks
Hongzhi Liu and Neil Bergmann

To investigate the on-sensor processing capabilities of FPGAs, this paper presents a bird call recognition system for sensor network applications based on linear predictive cepstral coefficients (LPCC) and dynamic time warping (DTW), and compares two different implementations on a Xilinx Spartan-3E FPGA with a MicroBlaze soft processor. The experimental results show that, compared to the software-only solution, the software/hardware (SW/HW) implementation with a hardware coprocessor for DTW improves performance by factors of 13.8 and 33.4 for two example inputs, and improves energy efficiency by a factor of about 31.1 while using only 7.5% more power.
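DTW is the kernel the authors moved to a hardware coprocessor. For reference, the classic DTW recurrence is sketched below in Python, assuming each recording has already been converted into a sequence of LPCC vectors (feature extraction is not shown), a Euclidean frame distance, and the three standard step directions; the paper's exact distance measure and step constraints are not given in the abstract.

```python
import math
from typing import Dict, List, Sequence

Frame = Sequence[float]  # e.g. one LPCC vector per analysis frame


def dtw_distance(a: Sequence[Frame], b: Sequence[Frame]) -> float:
    """Classic dynamic time warping with Euclidean frame distance and
    steps (i-1, j), (i, j-1), (i-1, j-1)."""
    n, m = len(a), len(b)
    # d[i][j]: cost of the best alignment of a[:i] with b[:j]
    d: List[List[float]] = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def recognize(query: Sequence[Frame], templates: Dict[str, Sequence[Frame]]) -> str:
    """Return the label of the reference template closest to the query."""
    return min(templates, key=lambda label: dtw_distance(query, templates[label]))
```

The quoted figures are consistent with energy being power multiplied by run time: a 33.4x speedup at 1.075x the power gives 33.4 / 1.075 ≈ 31.1, the energy-efficiency factor reported above.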

Paper          Slides (restricted access)          BibTex          IEEE Xplore

Design and Hardware Implementation of a Low-Complexity Multiuser Vector Precoder
Maitane Barrenechea, Mikel Mendicute, Luis Barbero, and John Thompson

In the multiuser MIMO broadcast channel, precoding techniques are required so that the signal can be detected at the users’ terminals without any cooperation between them. This contribution presents the design and hardware implementation of a high-capacity precoder based on vector perturbation. The most challenging part of the vector precoding (VP) scheme, the search for the perturbing signal in an infinite lattice, is performed by the reduced-complexity yet high-performance Fixed Sphere Encoder (FSE) algorithm. The most remarkable feature of the FSE is its fixed complexity, which makes it highly suitable for hardware implementation on field-programmable gate arrays (FPGAs), where parallelization and pipelining can be applied to enhance the system throughput. An optimized reduced-complexity implementation is proposed, which achieves high performance with reduced hardware resource usage.

Paper          Slides (restricted access)          BibTex          IEEE Xplore


Low Power Noise Detection Circuit Utilizing Switching Activity Measurement Method
Zulhakimi Razak, Ahmet Erdogan, and Tughrul Arslan

Noise often limits the performance of transmitted signals and degrades signal quality. Moreover, the stochastic nature of noise makes it difficult to predict and hence hard to detect. In hardware implementations, noise reduction can only be optimized in the baseband, where complex and intensive computation is executed using digital signal processors (DSPs). Although analog pre-filtering is applied at the receiver front-end to reduce interference, some noise remains due to the non-ideality of the devices. We present a method to detect noise in a signal using switching activity measurement (SWAM) of the analog-to-digital converter (ADC) output. Simulation results show that the switching activity of the digital outputs varies with the signal-to-noise ratio (SNR), a high SNR leading to low switching activity. A noise detection unit (NDU) is implemented and synthesized using the AMS 0.35μm/3.3V CMOS standard library. The results show minimal overhead: the NDU occupies only 774 equivalent 2-input NAND gates and consumes only 978μW of power. Given its low complexity and power usage, the NDU is attractive for ADC-based systems and noise-prone devices, where detecting noise can lead to further signal improvement.
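Switching activity can be measured by counting the output bits that toggle between consecutive ADC samples. The Python sketch below assumes 8-bit ADC words, normalizes the toggle count to [0, 1], and uses a hypothetical fixed threshold of 0.3 to flag a noisy signal; the measurement window and threshold used in the paper are not given in the abstract.

```python
from typing import Sequence


def switching_activity(samples: Sequence[int], bits: int = 8) -> float:
    """Fraction of ADC output bits that toggle between consecutive samples."""
    if len(samples) < 2:
        return 0.0
    mask = (1 << bits) - 1
    # XOR of consecutive words has a 1 wherever an output bit switched.
    toggles = sum(bin((x ^ y) & mask).count("1") for x, y in zip(samples, samples[1:]))
    return toggles / (bits * (len(samples) - 1))


def is_noisy(samples: Sequence[int], threshold: float = 0.3, bits: int = 8) -> bool:
    """Flag the signal as noisy when activity exceeds a hypothetical threshold."""
    return switching_activity(samples, bits) > threshold
```

A slowly varying ramp such as `[10, 11, 12, 13]` toggles 5 of 24 possible bits (about 0.21), below the threshold, whereas a signal jumping between 0 and 255 toggles every bit.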

Paper          Slides (restricted access)          BibTex          IEEE Xplore

A Roadmap for Autonomous Fault-Tolerant Systems
Xabier Iturbe, Khaled Benkrid, Tughrul Arslan, I. Martinez, M. Azkarate, and M. D. Santambrogio

An Autonomous Fault-Tolerant System (AFTS) is a system that is able to configure its own resources in the presence of permanent defects and spontaneous random faults occurring in its silicon substrate in order to maintain its functionality. This work analyzes how an AFTS could be built, focusing specifically on hardware platform-dependent issues, and gives an overview of the state of the art in this field, which is still in its infancy. Three technological levels are used to classify the research efforts conducted to date. By describing the current state of the art and the constraints imposed by current technology, this work tries to envision future trends towards the ultimate objective of a fully adaptive system capable of modifying its architecture on the fly as needed. Finally, the general structure and organization of a Reliable Reconfigurable Real-Time Operating System (R3TOS) is presented. This OS aims at making the aforementioned adaptability easily exploitable by future commercial applications.

Paper          Slides (restricted access)          BibTex          IEEE Xplore


Session 7: Applications and Algorithms for Cameras

Camera-Based System for Tracking and Position Estimation of Humans
Robert Hartmann, Fadi Al Machot, Philipp Mahr, and Christophe Bobda

According to current studies, the human population is getting older and older. Because elderly people are at a higher risk of in-house accidents, there is an increasing need for ambient assisted living systems. These systems should detect accidents or dangerous situations in order to improve the quality of life of these people. The goal of this work is to build a robust and intelligent system which estimates the position of humans using only one camera. The position is used to detect falls and to allow an immediate call for help. The solution is based on foreground-background segmentation using Gaussian Mixture Models to first detect people and then analyze their main and ideal orientation using moments. This makes it possible to decide whether a person is standing or lying on the floor. The system has a low latency and a detection rate of 88% in our case study.
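The orientation test can be sketched with the second-order central moments of the foreground silhouette, which give the angle of the blob's principal axis. The Python sketch below assumes a binary mask already produced by the GMM background subtraction (not shown) and a hypothetical 45° threshold between standing and lying; the paper's exact decision rule is not given in the abstract.

```python
import math
from typing import Sequence

Mask = Sequence[Sequence[int]]  # binary foreground mask, rows of 0/1


def orientation_deg(mask: Mask) -> float:
    """Angle of the silhouette's principal axis w.r.t. the image x-axis,
    from second-order central moments (mu20, mu02, mu11)."""
    pts = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not pts:
        raise ValueError("empty foreground mask")
    n = len(pts)
    cx = sum(x for x, _ in pts) / n
    cy = sum(y for _, y in pts) / n
    mu20 = sum((x - cx) ** 2 for x, _ in pts)
    mu02 = sum((y - cy) ** 2 for _, y in pts)
    mu11 = sum((x - cx) * (y - cy) for x, y in pts)
    return math.degrees(0.5 * math.atan2(2 * mu11, mu20 - mu02))


def posture(mask: Mask, threshold_deg: float = 45.0) -> str:
    """'standing' if the principal axis is closer to vertical, else 'lying'.
    The 45-degree threshold is a hypothetical choice."""
    return "standing" if abs(orientation_deg(mask)) > threshold_deg else "lying"
```

Taking the absolute angle makes the test independent of whether the image y-axis points up or down.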

Paper          Slides (restricted access)          BibTex          IEEE Xplore

A Wavelet-Based Demosaicking Algorithm for Embedded Applications
Sébastien Courroux, Stéphane Guyetant, Stéphane Chevobbe, and Michel Paindavoine

This paper presents an alternative to the spatial reconstruction of the sampled color filter array image acquired through a digital image sensor. A demosaicking operation has to be applied to the raw image to recover the full-resolution color image. We present a low-complexity demosaicking algorithm operating in the wavelet domain. The output images are available either in the spatial representation or directly in the wavelet domain, for high-level post-processing in the latter domain. Results show that the computational complexity is lowered by a factor of five compared to state-of-the-art demosaicking algorithms.

Paper          Slides (restricted access)          BibTex          IEEE Xplore

Dual-Core Reconfigurable Demosaicing Engine for Next Generation of Portable Camera Systems
Xin Zhao, Ying Yi, Ahmet Erdogan, and Tughrul Arslan

This paper presents a high-performance dual-core reconfigurable processor implementation methodology for a demosaicing system targeting next-generation camera systems. The methodology is based on a dual-core architecture with coarse-grained dynamically reconfigurable processors. The demosaicing system adopts Freeman’s algorithm, which has been partitioned and mapped onto two customized heterogeneous processor cores. The demosaicing engine’s implementation has been optimized using compilation techniques and approaches specific to the target processors. Simulation results demonstrate that the resulting demosaicing system achieves a throughput of up to 241.6 Mpixels/s, a 1.82x speedup compared to a single-core implementation.

Paper          Slides (restricted access)          BibTex          IEEE Xplore


Poster Session

A Dynamically Reconfigurable Asynchronous Processor for Low Power Applications
Khodor Ahmad Fawaz, Tughrul Arslan, Sami Khawam, Mark Muir, Ioannis Nousias, Iain Lindsay, and Ahmet Erdogan

There is an increasing demand for programmability and energy efficiency in high-throughput mobile applications. Conventional mobile Central Processing Units (CPUs) and Very Long Instruction Word (VLIW) processors cannot meet these demands. In this paper, we present a novel dynamically reconfigurable processor that targets these requirements. The processor consists of a heterogeneous array of coarse-grain asynchronous cells. The architecture maintains most of the benefits of custom asynchronous design while also providing programmability via conventional high-level languages. Compared to an equivalent synchronous design, our processor reduces power by up to 18%. Additionally, it consumes considerably less power than a market-leading VLIW processor and a low-power ARM processor while maintaining their throughput. When running the bilinear demosaicing algorithm at the same throughput, our processor consumes about 9.5 times less power than the ARM7 processor.

Paper          Poster (restricted access)          BibTex          IEEE Xplore

A Hybrid Dual-Core Reconfigurable Processor for EBCOT Tier-1 Encoder in JPEG2000 on Next Generation Digital Cameras
Xin Zhao, Ahmet Erdogan, and Tughrul Arslan

In this paper, we present a JPEG2000 EBCOT tier-1 encoder based on a hybrid dual-core processor, composed of a coarse-grained Dynamically Reconfigurable Processor (DRP) and an ARM core, targeting the next generation of cameras. The complete EBCOT tier-1 encoder is partitioned into two tasks, which are mapped onto the two cores according to the respective strengths of each processor. A Partial Parallel Architecture (PPA) for Context Modeling (CM) is employed, which can easily be tailored to a DRP implementation for higher performance. The Arithmetic Encoder (AE) has been optimized as well, with a shared Dual-Port RAM (DPRAM) acting as the communication buffer between the two cores. Across the entire application, the two tasks can be pipelined via the global DPRAM for better performance. Simulation results show that the resulting architecture reaches a throughput of up to 40 fps for the 256x256 8-bit grayscale standard Lena test image and compares favorably with some DSP and VLIW implementations. In addition, this hybrid processor shows high potential for implementing the complete JPEG2000 encoder in next-generation camera applications.

Paper          Poster (restricted access)          BibTex          IEEE Xplore

A Methodology for Precise Comparison of Processor Core Architectures for Homogeneous Many-Core DSP Platforms
Bertrand Rousseau, Philippe Manet, Igor Loiselle, Jean-Didier Legat, and H. Vandierendonck

The power efficiency of a homogeneous many-core platform (HMCP) depends heavily on the architecture of its processor cores, which must therefore be chosen carefully. When comparing processor architectures for use in a many-core platform, one must evaluate not only their IPC but also their power and area. Precise power and area evaluations can only be done with real implementations. However, comparing processor implementations is difficult, since implementation specificities interfere with the performance results. This paper proposes a methodology that enables precise performance comparisons between different processor architectures. Using this methodology, it is possible to choose the best architecture for an HMCP targeting DSP applications. The methodology is based on using a common architectural template to build the cores, and on applying specific optimizations when relevant. To validate the methodology, three RISC cores are implemented: a single-issue core and two VLIW processors with 3 and 5 issue slots, respectively. The implemented cores are precisely compared on a set of DSP kernels.

Paper          Poster (restricted access)          BibTex          IEEE Xplore

An In-band Reconfigurable Network Node Based on a Heterogeneous Platform
Erik Markert, Enrico Billich, Ulrich Heinkel, Claudia Tischendorf, Uwe Proß, Thilo Leibelt, Axel Schneider, and Joachim Knäblein

This paper describes the implementation of a heterogeneous network node as a reconfigurable application based on embedded ASIC technology. The key points of the paper are the in-band distribution of reconfiguration data over the network and the in-service reconfiguration of the network node itself. The node consists of a static ASIC part and three reconfigurable FPGA-like areas of different granularities, all integrated in one chip. The overall goal is a system that can monitor an Ethernet data stream, extract configuration data marked by the EtherType field in the Ethernet header, and update its functionality during operation.
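
The EtherType-based filtering described above can be sketched in Python as follows. This is a minimal illustration, not the authors' hardware. It makes three assumptions:
- frames are untagged or carry a single 802.1Q tag;
- preamble and FCS have already been removed;
- the marker is 0x88B5 (one of the IEEE 802 local experimental EtherTypes), a hypothetical choice because the abstract does not say which value the node uses.

```python
import struct
from typing import Optional

# Hypothetical marker for reconfiguration frames (0x88B5 is an IEEE 802
# "local experimental" EtherType); the paper's actual value is not given.
CONFIG_ETHERTYPE = 0x88B5
VLAN_TPID = 0x8100  # IEEE 802.1Q tag protocol identifier


def extract_config_payload(frame: bytes) -> Optional[bytes]:
    """Return the payload of a reconfiguration frame, or None for ordinary traffic.

    Assumes the frame starts at the destination MAC (no preamble/SFD) and that
    the trailing FCS has already been removed.
    """
    if len(frame) < 14:
        return None  # too short to hold an Ethernet header
    offset = 12  # EtherType follows the 6-byte destination and source MACs
    (ethertype,) = struct.unpack_from("!H", frame, offset)
    if ethertype == VLAN_TPID:  # skip one 802.1Q tag if present
        if len(frame) < 18:
            return None
        offset += 4
        (ethertype,) = struct.unpack_from("!H", frame, offset)
    if ethertype != CONFIG_ETHERTYPE:
        return None  # normal data frame: forward unchanged
    return frame[offset + 2:]
```

Frames that yield a payload would be handed to the reconfiguration logic. All other frames pass through the data path untouched.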

Paper          Poster (restricted access)          BibTex          IEEE Xplore

Task Placement for Dynamic and Partial Reconfigurable Architecture
Antoine Eiche, Daniel Chillet, Sébastien Pillement, and Olivier Sentieys

Managing the tasks and resources of a reconfigurable system-on-chip is a complex problem that requires specific operating system (OS) functionalities. One of the most important is task placement, which must be performed online when the application requires flexibility. To place tasks efficiently within the reconfigurable resource, OS services must take the heterogeneity of that resource into account. Most publications model the reconfigurable resource as a homogeneous area. Modern reconfigurable circuits, however, are clearly heterogeneous: their rectangular grid contains logic blocks and also other blocks such as memories, digital signal processing blocks, or hard processor cores. In this paper, we tackle the problem of task placement within a heterogeneous reconfigurable area. Our solution is a neural network structure specifically designed to optimize task placement. It relies on knowledge of the task instantiations within the reconfigurable resource. Compared with other methods, our proposal provides better results in terms of task rejection.
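
The Python sketch below makes the heterogeneity constraint concrete through online placement on a column-based heterogeneous fabric. It deliberately uses a plain first-fit search rather than the neural network proposed in the paper. The fabric layout in `COLUMN_TYPES`, the task patterns, and the helper names are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical column-based fabric: each column holds one resource type
# ('L' = logic, 'M' = memory, 'D' = DSP), as in many FPGA families.
COLUMN_TYPES = "LLMLLDLLMLL"
ROWS = 4
Grid = List[List[bool]]


def first_fit(occupied: Grid, pattern: str, height: int) -> Optional[Tuple[int, int]]:
    """Return (row, col) of the first free rectangle whose column types match `pattern`."""
    width = len(pattern)
    for col in range(len(COLUMN_TYPES) - width + 1):
        if COLUMN_TYPES[col:col + width] != pattern:
            continue  # heterogeneity constraint: resource types must line up
        for row in range(ROWS - height + 1):
            cells = [(r, c) for r in range(row, row + height) for c in range(col, col + width)]
            if not any(occupied[r][c] for r, c in cells):
                return row, col
    return None  # no feasible position: the task is rejected


def place(occupied: Grid, pattern: str, height: int) -> Optional[Tuple[int, int]]:
    """Place a task with first fit and mark its cells as occupied."""
    pos = first_fit(occupied, pattern, height)
    if pos is not None:
        row, col = pos
        for r in range(row, row + height):
            for c in range(col, col + len(pattern)):
                occupied[r][c] = True
    return pos
```

A task is rejected (`None`) when no free rectangle with a matching column pattern exists. This corresponds to the notion of task rejection in the abstract. A smarter placer would try to keep such matching regions available for later tasks.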

Paper          Poster (restricted access)          BibTex          IEEE Xplore

Wireless Sensor Network Node Global Energy Consumption Modeling
Antoine Courtay, Alain Pegatoquet, Michel Auguin, and Chiraz Chabaane

This paper deals with global power consumption modeling for Wireless Sensor Network (WSN) nodes. We first introduce various existing approaches to energy modeling. We then present our choices and experiments using the Ns-2 simulator and the iMote2 hardware platform (over the IEEE 802.15.4 protocol). First results show that the Ns-2 simulator provides RF energy consumption metrics very close to the values measured on the real platform under the same experimental conditions. An extension of Ns-2 with a processor energy model is also discussed. Finally, by considering different use cases for audio transfer over a ZigBee network, we show the need for a global approach to optimizing power consumption.

Paper          Poster (restricted access)          BibTex          IEEE Xplore


An In-Memory Monitoring Database for Self Adaptive MP²SoCs
Etienne Faure, François Pêcheux, Gabriel Marchesan Almeida, Pascal Benoit, Gilles Sassatelli, Lionel Torres, Mounir Benabdenbi, and Fabien Clermidy

Upcoming MP²SoC architectures are so complex that many issues arise simultaneously: multicore programming, system performance, reliability, scalability, etc. The key to solving these issues is self-adaptability. Future chips must integrate the software and hardware means required to monitor, and react autonomously to, the various kinds of events likely to occur during the chip's lifetime. The paper describes the design principles of a software-based approach to monitoring many-core architectures that run multithreaded functional applications. The approach uses an in-memory database, together with appropriate means to store, handle, and retrieve the accumulated monitoring data, to keep the monitoring task as simple as possible. After a presentation of the distributed database (called DRET), a case study is presented, along with performance results that demonstrate the benefits of the approach.

Paper          Poster (restricted access)          BibTex          IEEE Xplore

Runtime Adaptive Allocation of Dynamically Mixed Tasks on a Heterogeneous MPSoC Platform
Jia Huang, Andreas Raabe, Christian Buckl, and Alois Knoll

Multiprocessor System-on-Chip platforms typically co-host multiple tasks, which may start and stop execution independently at instants unknown at design time. In such systems, the runtime resource manager is responsible for allocating adequate and appropriate resources to each task. We identify a key issue in existing work: resource management algorithms consider the problem only at the task level, i.e., the optimization is performed for each individual task upon its activation. However, it can be shown that such strategies are suboptimal from the system point of view. In contrast, this paper proposes a new task allocation flow that performs resource management at the system level. Evaluation on a standard benchmark set shows significant performance improvements (up to 29.5%) over traditional techniques. In addition, the proposed task allocator adapts itself at runtime to changes in hardware and/or applications.
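
A toy example shows why purely task-level allocation can be suboptimal. The Python sketch below is not the authors' allocation flow. It assumes:
- a hypothetical cost table;
- at most one task per core;
- an exhaustive search for the system-level variant, which is only practical for a handful of tasks.

```python
from itertools import permutations
from typing import Dict, List

Costs = Dict[str, Dict[str, int]]  # hypothetical cost of each task on each core


def task_level(arrivals: List[str], cores: List[str], cost: Costs) -> Dict[str, str]:
    """Map each task, on activation, to its cheapest free core; earlier choices are never revisited."""
    mapping: Dict[str, str] = {}
    for task in arrivals:
        free = [c for c in cores if c not in mapping.values()]
        mapping[task] = min(free, key=lambda c: cost[task][c])
    return mapping


def system_level(active: List[str], cores: List[str], cost: Costs) -> Dict[str, str]:
    """Jointly re-map all active tasks, minimising total cost by exhaustive search."""
    best = min(permutations(cores, len(active)),
               key=lambda perm: sum(cost[t][c] for t, c in zip(active, perm)))
    return dict(zip(active, best))


def total(mapping: Dict[str, str], cost: Costs) -> int:
    """Total cost of a task-to-core mapping."""
    return sum(cost[t][c] for t, c in mapping.items())
```

Take the hypothetical costs `{"A": {"dsp": 2, "risc": 3}, "B": {"dsp": 1, "risc": 10}}`. Task `A` grabs the DSP on activation, which forces `B` onto the RISC core (total cost 12). Jointly re-mapping both tasks instead yields a total cost of 4.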

Paper          Poster (restricted access)          BibTex          IEEE Xplore

Physical Layer Study in a Goal of Robustness and Energy Efficiency for Wireless Sensor Networks
Denis Dessales, Anne-Marie Poussard, Rodolphe Vauzelle, François Gaudaire, and Christophe Martinsons

In this paper, we propose a method to specify a robust and energy-efficient physical layer for wireless sensor networks. Starting from an energy model, we show the influence of the channel on energy consumption. Our channel model is a realistic propagation model that accounts for the geometric and electrical characteristics of a real environment. The Bit Error Rate (BER) is used to assess radio link quality and to calculate the energy per successfully transmitted bit. Simulation results clearly show that adjusting the transmission power to the type of link between the central node and each individual node yields significant gains. This study also shows the importance of a cross-layer approach for effectively optimizing the robustness and lifetime of a wireless sensor network.
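
The link between BER and energy per successfully transmitted bit can be sketched as follows. This is a simplified model, not the authors'. It assumes independent bit errors, no forward error correction, retransmission until success, and transmitter energy only. The power levels and BER values in the example are hypothetical. The 250 kbit/s rate is the IEEE 802.15.4 data rate in the 2.4 GHz band.

```python
def packet_success(ber: float, bits: int) -> float:
    """Probability that a frame of `bits` arrives intact (independent bit errors, no FEC)."""
    return (1.0 - ber) ** bits


def energy_per_useful_bit(p_tx_w: float, bitrate_bps: float, ber: float,
                          payload_bits: int, overhead_bits: int) -> float:
    """Transmit energy (J) per correctly delivered payload bit, retransmitting until success."""
    frame_bits = payload_bits + overhead_bits
    energy_per_frame = p_tx_w * frame_bits / bitrate_bps   # E = P * t, with t = bits / rate
    expected_tx = 1.0 / packet_success(ber, frame_bits)    # mean of a geometric distribution
    return energy_per_frame * expected_tx / payload_bits


# Hypothetical operating points: 800-bit payload, 200 bits of overhead, 250 kbit/s.
low = energy_per_useful_bit(0.030, 250e3, 1e-3, 800, 200)   # lower power, higher BER
high = energy_per_useful_bit(0.060, 250e3, 1e-5, 800, 200)  # double power, lower BER
```

With these numbers, the higher-power setting needs far fewer retransmissions. It ends up cheaper per delivered bit: about 3.0e-7 J versus 4.1e-7 J. This is the kind of trade-off that link-dependent power adjustment exploits.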

Paper          Poster (restricted access)          BibTex          IEEE Xplore

Energy Modeling of the Virtual Memory Subsystem for Real-Time Embedded Systems
Mickaël Lanoe and Eric Senn

Although operating systems are now widely used in embedded system design, their energy consumption is far from negligible. Determining their share of the system's overall energy budget is therefore essential. This paper proposes a methodology to model the power and energy consumption of the virtual memory management mechanisms in complex operating systems. This work is part of a larger project that considers all the energy-consuming components of embedded systems. This paper studies the virtual memory subsystem of a complete, recent Linux kernel (patched for real-time). It also covers the subsystem's relation to the processor's memory management resources (Memory Management Unit and Translation Look-aside Buffer). A method is proposed to generate different categories of page faults and to model the resulting time and energy penalties for different page allocation strategies. The precision of the model is presented and finally checked against actual measurements for an image processing application.

Paper          Poster (restricted access)          BibTex          IEEE Xplore

An Inter-Task Real Time DVFS Scheme for Multiprocessor Embedded Systems
Muhammad Khurram Bhatti, Cécile Belleudy, and Michel Auguin

In this paper, we address energy-efficient scheduling of real-time applications intended to be executed on multiprocessor systems. Our proposed technique, called Deterministic Stretch-to-Fit (DSF), is based on inter-task real-time dynamic voltage and frequency scaling (RT-DVFS). It comprises three main components:
1. An online algorithm that reclaims energy by adapting to variations in the actual workload of the target application tasks.
2. An adaptive and speculative speed adjustment mechanism that extends the online algorithm. It anticipates the early completion of future task instances based on their average workload.
3. A one-task extension technique for multi-task multiprocessor systems.

The proposed technique violates none of the target application's real-time constraints. Simulation results show that the online slack reclamation algorithm alone achieves energy savings of up to 53%. The speculative speed adjustment mechanism, together with the one-task extension technique, provides additional gains and reaches a theoretical lower bound on the scalable frequency and voltage.
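
The core idea of inter-task slack reclamation can be sketched in Python. This is a generic greedy stretch-to-fit policy on a single processor, not the DSF algorithm itself: it has no speculation, no one-task extension, and no multiprocessor mapping. It assumes continuous normalised frequencies and a hypothetical lower limit `F_MIN`. It also assumes that dynamic energy per cycle is proportional to f², i.e. that supply voltage is scaled with frequency.

```python
from typing import List, Tuple

F_MAX = 1.0  # normalised maximum frequency
F_MIN = 0.2  # hypothetical lowest supported frequency


def reclaim_slack(wcet: List[float], actual: List[float]) -> Tuple[List[float], float]:
    """Run tasks back to back; each gets its WCET budget at F_MAX plus the slack left
    by earlier tasks, and runs at the lowest frequency that still fits its WCET.

    Workloads are in cycles at F_MAX = 1. Returns the chosen frequencies and a
    normalised dynamic energy, assuming energy per cycle proportional to f**2.
    """
    slack = 0.0
    freqs: List[float] = []
    energy = 0.0
    for w, a in zip(wcet, actual):
        budget = w / F_MAX + slack               # time this task may use
        f = min(F_MAX, max(F_MIN, w / budget))   # stretch the WCET to fill the budget
        freqs.append(f)
        energy += a * f ** 2                     # only the cycles actually executed cost energy
        slack = budget - a / f                   # unused time is passed to the next task
    return freqs, energy
```

Consider `reclaim_slack([4, 4, 4], [2, 2, 4])`. The second and third tasks run at f = 2/3 and f = 4/7, and the normalised energy drops from 8 (everything at `F_MAX`) to about 4.2. Each task still finishes within its cumulative WCET budget.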

Paper          Poster (restricted access)          BibTex          IEEE Xplore

Hardware/Software Co-Design of H.264/AVC Encoders for Multi-Core Embedded Systems
Tiago Dias, Nuno Roma, and Leonel Sousa

This paper presents a multi-core H.264/AVC encoder suitable for implementation in small- and medium-complexity embedded systems. The proposed structure results from an efficient hardware/software co-design methodology. The encoder software is highly optimized and structured in a very modular way, so that its most complex and time-consuming operations can be offloaded to dedicated hardware accelerators. A simple and efficient core interconnection mechanism makes it easy to add or remove such optimized processing cores. The authors implemented an H.264/AVC encoder on a Virtex-4 FPGA, with an ASIP IP core as the motion estimation (ME) hardware accelerator. Experimental results with this implementation demonstrate the advantages of the methodology. For the considered system, speedup factors greater than 15 were obtained with a very modest increase in hardware resources.
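
Motion estimation is the kind of operation such a flow offloads to an accelerator. For reference, here is a plain Python full-search block-matching sketch using the sum of absolute differences (SAD). It is not the ASIP's algorithm. The toy 4x4 block size and ±2 search range are assumptions; H.264/AVC uses 16x16 macroblocks with smaller partitions and typically larger search windows.

```python
from typing import List, Tuple

Frame = List[List[int]]  # luma samples indexed as frame[y][x]


def sad(cur: Frame, ref: Frame, bx: int, by: int, dx: int, dy: int, n: int) -> int:
    """Sum of absolute differences between the n x n block at (bx, by) in `cur`
    and the block displaced by (dx, dy) in `ref`."""
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(n) for x in range(n))


def full_search(cur: Frame, ref: Frame, bx: int, by: int,
                n: int = 4, search: int = 2) -> Tuple[int, int]:
    """Test every displacement in [-search, search]^2 that stays inside `ref`
    and return the motion vector (dx, dy) with the lowest SAD."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            inside = (0 <= bx + dx and bx + dx + n <= width
                      and 0 <= by + dy and by + dy + n <= height)
            if not inside:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    if best is None:
        raise ValueError("no candidate block fits inside the reference frame")
    return best[1], best[2]
```

The SAD loop is highly regular and data-parallel. This is why it maps well onto dedicated hardware such as the accelerator described above.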

Paper          Poster (restricted access)          BibTex          IEEE Xplore

Automated Generation of an Efficient MPEG-4 Reconfigurable Video Coding Decoder Implementation
Ruirui Gu, Jonathan Piat, Mickael Raulet, Jorn W. Janneck, and Shuvra S. Bhattacharyya

This paper proposes an automatic design flow from user-friendly design to efficient implementation of video processing systems. The flow starts from coarse-grain dataflow representations written in CAL, a complete language for dataflow programming of embedded systems. Our approach integrates previously developed techniques for two purposes:
- detecting synchronous dataflow (SDF) regions within larger CAL networks;
- exploiting the static structure of such regions with the analysis tools of the Dataflow Interchange Format Package (TDP).

We developed a new XML format to exchange dataflow information between different dataflow tools. With it, we explore the systematic implementation of signal processing systems using CAL, SDF-like region detection, TDP-based static scheduling, and CAL-to-C (CAL2C) translation. Our approach is a novel integration of three complementary dataflow tools (the CAL parser, TDP, and CAL2C). It is demonstrated on an MPEG Reconfigurable Video Coding (RVC) decoder.
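
The static structure that SDF regions expose comes from their balance equations. Each edge requires q[src]·prod = q[dst]·cons, where q is the number of firings of each actor per schedule period. The Python sketch below solves these equations for a small connected graph. It illustrates the classical SDF consistency analysis underlying static scheduling; it does not use TDP's actual API or XML format, and the example graph is hypothetical.

```python
from fractions import Fraction
from functools import reduce
from math import gcd, lcm
from typing import Dict, List, Tuple

Edge = Tuple[str, str, int, int]  # (producer, consumer, tokens produced, tokens consumed)


def repetitions(actors: List[str], edges: List[Edge]) -> Dict[str, int]:
    """Smallest positive integer solution of q[src] * prod == q[dst] * cons on every edge."""
    rate: Dict[str, Fraction] = {actors[0]: Fraction(1)}
    changed = True
    while changed:  # propagate relative firing rates along edges, in both directions
        changed = False
        for src, dst, prod, cons in edges:
            if src in rate and dst not in rate:
                rate[dst] = rate[src] * prod / cons
                changed = True
            elif dst in rate and src not in rate:
                rate[src] = rate[dst] * cons / prod
                changed = True
    if len(rate) != len(actors):
        raise ValueError("graph is not connected")
    for src, dst, prod, cons in edges:
        if rate[src] * prod != rate[dst] * cons:
            raise ValueError("inconsistent rates: no bounded periodic schedule exists")
    scale = reduce(lcm, (r.denominator for r in rate.values()))
    q = {a: int(r * scale) for a, r in rate.items()}
    g = reduce(gcd, q.values())
    return {a: v // g for a, v in q.items()}
```

Consider a chain A → B → C, where A produces 2 tokens per firing, B consumes 3 and produces 1, and C consumes 2. The sketch returns q = {A: 3, B: 2, C: 1}. Three firings of A produce exactly the six tokens that two firings of B consume, and so on down the chain.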

Paper          Poster (restricted access)          BibTex          IEEE Xplore

A Case Study of the Stochastic Modeling Approach for Range Estimation
Andrei Banciu, Thierry Michel, Emmanuel Casseau, and Daniel Menard

Floating-point to fixed-point conversion is an important part of hardware design for efficient implementations. To optimize the integer wordlength under performance constraints, one must determine the range of values the variables take during execution. Traditional simulation-based range estimation methods are data-dependent and time-consuming. Analytical methods such as interval and affine arithmetic, on the other hand, give pessimistic results because they lack a statistical foundation. Recently, a novel approach based on the Karhunen-Loève expansion (KLE) was presented for linear time-invariant (LTI) systems, offering a solid stochastic foundation. Our paper presents an implementation of this theory and shows its efficiency on an OFDM modulator case study. We also review the uncertainty quantification problem and the different phases of the range estimation methodology.
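
An FIR filter shows the gap between worst-case analytical bounds and statistically informed estimates. The Python sketch below is not the KLE method the paper implements. It uses a simpler variance-propagation estimate. It assumes a zero-mean, white, uniformly distributed input and a hypothetical 64-tap filter with all coefficients equal to 0.1. The k-sigma bound (k = 4 here, an arbitrary choice) accepts a small overflow probability in exchange for fewer integer bits. Interval arithmetic, by contrast, guarantees no overflow.

```python
import math
import random
from typing import List


def interval_range(h: List[float], x_max: float) -> float:
    """Guaranteed bound on |y| for y[n] = sum_k h[k] x[n-k] with |x| <= x_max."""
    return x_max * sum(abs(c) for c in h)


def stochastic_range(h: List[float], sigma_x: float, k: float = 4.0) -> float:
    """k-sigma estimate for a zero-mean white input: var(y) = sigma_x**2 * sum(h**2)."""
    return k * sigma_x * math.sqrt(sum(c * c for c in h))


def simulated_range(h: List[float], n_samples: int = 5000, seed: int = 0) -> float:
    """Largest |y| observed when filtering uniform noise in [-1, 1] (data-dependent)."""
    rng = random.Random(seed)
    x = [rng.uniform(-1.0, 1.0) for _ in range(n_samples + len(h))]
    return max(abs(sum(c * x[n - k] for k, c in enumerate(h)))
               for n in range(len(h), len(x)))


def integer_bits(r: float) -> int:
    """Integer bits, sign included, so that [-r, r] fits in [-2**(m-1), 2**(m-1)). Assumes r > 0."""
    return math.floor(math.log2(r)) + 2


h = [0.1] * 64                   # hypothetical filter, sum(|h|) = 6.4
sigma = 1.0 / math.sqrt(3.0)     # standard deviation of a uniform input in [-1, 1]
print(integer_bits(interval_range(h, 1.0)), integer_bits(stochastic_range(h, sigma)))  # prints: 4 2
```

Here the stochastic estimate saves two integer bits relative to interval arithmetic. The simulated maximum illustrates how dependent the empirical approach is on the chosen input data.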

Paper          Poster (restricted access)          BibTex          IEEE Xplore

GPU Architecture Evaluation for Multispectral and Hyperspectral Image Analysis
Virginie Fresse, Dominique Houzet, and Christophe Gravier

Graphics Processing Unit (GPU) architectures are widely used for resource-intensive computation. Initially dedicated to imaging, vision, and graphics, these architectures now serve a wide range of general-purpose applications. The GPU structure, however, does not suit all applications, which can lead to poor performance. This work analyzes GPU structures for image analysis applications ranging from multispectral to ultraspectral imaging. The experiments use multispectral and hyperspectral imaging algorithms dedicated to art authentication. Such algorithms process large amounts of spatial and spectral data. They require both a high number of memory accesses and high storage capacity. Timing performance is compared with a CPU architecture, and a global analysis is made according to the algorithms and the GPU architecture. This paper shows that GPU architectures are suitable for complex multispectral image analysis algorithms.

Paper          Poster (restricted access)          BibTex          IEEE Xplore

A New Single-Error Correction Scheme Based on Self-Diagnosis Residue Number Arithmetic
Yangyang Tang, Emmanuel Boutillon, Christophe Jego, and Michel Jezequel

As electronic devices rapidly shrink, radiation-induced soft errors have emerged as a major concern in current circuit manufacturing. In this paper, we present a new error correction scheme based on residue number arithmetic to cope with single soft errors. The proposed technique, called the bidirectional redundant residue number system, requires the redundant moduli to satisfy certain constraints in order to achieve fast error correction. This system needs neither iterations for decoding the valid number nor an error-correcting table containing all combinations of erroneous digits. Detection and diagnosis are performed simultaneously by multiple parallel consistency checks, which can locate the corrupted digit. Finally, an efficient pipelined architecture for the self-diagnosis decoder is detailed.
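
The Python sketch below shows classical single-error correction by consistency checking in a redundant residue number system (RRNS). It follows the textbook scheme rather than the bidirectional RRNS proposed in the paper, and the moduli are hypothetical. It works in two steps:
1. A word is valid if its Chinese Remainder Theorem (CRT) reconstruction falls in the legitimate range.
2. Otherwise, the faulty residue is located by dropping one residue at a time.

Correction works because the two redundant moduli are larger than every information modulus. Any reconstruction that still includes the faulty residue therefore lands outside the legitimate range.

```python
from math import prod
from typing import List, Optional, Tuple

# Hypothetical moduli: three information moduli, then two redundant moduli,
# each redundant modulus larger than every information modulus.
INFO = [3, 5, 7]
REDUNDANT = [11, 13]
MODULI = INFO + REDUNDANT
LEGIT = prod(INFO)  # legitimate range is [0, 105)


def encode(x: int) -> List[int]:
    """Residue representation of x for every modulus."""
    return [x % m for m in MODULI]


def crt(residues: List[int], moduli: List[int]) -> int:
    """Chinese Remainder Theorem reconstruction modulo prod(moduli)."""
    big_m = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        mi = big_m // m
        total += r * mi * pow(mi, -1, m)  # pow(..., -1, m) is the modular inverse (Python 3.8+)
    return total % big_m


def decode(residues: List[int]) -> Tuple[int, Optional[int]]:
    """Return (value, index of the corrected residue, or None if no error was found)."""
    x = crt(residues, MODULI)
    if x < LEGIT:
        return x, None  # consistent word: no error detected
    for i in range(len(MODULI)):  # drop one residue at a time
        keep = [j for j in range(len(MODULI)) if j != i]
        y = crt([residues[j] for j in keep], [MODULI[j] for j in keep])
        if y < LEGIT:
            return y, i
    raise ValueError("more than one residue is in error")
```

For example, `encode(42)` gives `[0, 2, 0, 9, 3]`. If the third residue is corrupted, `decode` recovers 42 and reports index 2. The iterative dropping loop is exactly what the paper's scheme is designed to avoid.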

Paper          Poster (restricted access)          BibTex          IEEE Xplore

A Flexible Implementation of a Global Navigation Satellite System (GNSS) Receiver for On-Board Satellite Navigation
Arnaud Dion, Vincent Calmettes, Emmanuel Boutillon, and Emmanuel Liegon

In this paper, we present the implementation of the acquisition algorithm of a versatile Global Navigation Satellite System (GNSS) receiver for satellite applications. For versatility, the receiver algorithms were chosen for two capabilities:
1. fulfilling the application requirements with moderate complexity;
2. being factorized into a small set of elementary modules that can be configured and combined in various ways to process both current and future GPS and Galileo signals.

These algorithms have been specified using SystemC, a modeling language that hardware and software flows can share. Simulation on a virtual platform allows us to identify bottlenecks in the architecture and to propose algorithm modifications to resolve them.
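
Acquisition searches for the code phase at which a local replica of a satellite's spreading code best correlates with the received signal. A real receiver also searches for the Doppler shift. The Python sketch below performs only a serial code-phase search, with a hypothetical random ±1 code standing in for a GPS or Galileo code. Doppler search, sampling, and carrier wipe-off are omitted. Real receivers usually compute the correlation with FFTs rather than this O(N²) loop.

```python
import random
from typing import List, Tuple


def prn_code(length: int, seed: int) -> List[int]:
    """Hypothetical +/-1 spreading code standing in for a GPS or Galileo ranging code."""
    rng = random.Random(seed)
    return [rng.choice((-1, 1)) for _ in range(length)]


def acquire(received: List[float], code: List[int]) -> Tuple[int, float]:
    """Serial code-phase search: circularly correlate the received chips with every
    shifted replica of the local code; return (best shift, peak |correlation|)."""
    n = len(code)
    best_shift, best_peak = 0, -1.0
    for shift in range(n):
        corr = sum(received[i] * code[(i - shift) % n] for i in range(n))
        if abs(corr) > best_peak:
            best_shift, best_peak = shift, abs(corr)
    return best_shift, best_peak
```

A detector would then compare the peak against a threshold derived from the noise floor before declaring the satellite acquired.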

Paper          Poster (restricted access)          BibTex          IEEE Xplore

Generation of Static Tables in Embedded Memory with Dense Scheduling
Benoît Miramond and Liliana Cucu-Grosjean

In a real-time context, software design relies on ensuring deterministic behavior and predictability. Systems often control several sensors and actuators sampled at different rates. For such systems, scheduling theory introduces the notion of the hyperperiod. The hyperperiod is a major source of complexity, both for schedule validation (simulation) and for generating the corresponding tables of purely off-line schedules. This paper presents a compression method for static real-time schedules and a design flow for generating real-time hardware schedulers. The goal is to minimize the embedded memory footprint of the scheduling tables defined at compile time. The method exploits idle times in multiprocessor systems to identify cyclic patterns called dense schedules. On our case studies, the average compression rate of our technique is close to 90% of the initial schedule size.
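
The hyperperiod is the least common multiple of the task periods. An off-line schedule table must cover one full hyperperiod, which is why the table grows quickly when task rates do not divide evenly. The Python sketch below computes the hyperperiod and fills a table with a simple rate-monotonic simulation in unit time slots. It then compresses the table with plain run-length encoding. The run-length step is a stand-in chosen for brevity, not the dense-schedule compression described in the paper, and the task set is hypothetical.

```python
from math import lcm
from typing import List, Sequence, Tuple

Task = Tuple[str, int, int]  # (name, period, WCET), all in integer time slots


def hyperperiod(periods: Sequence[int]) -> int:
    """Least common multiple of all periods: the length of one full schedule table."""
    return lcm(*periods)


def build_table(tasks: List[Task]) -> List[str]:
    """Fill one slot per time unit using preemptive rate-monotonic priorities.

    Assumes the task set is schedulable, so each job finishes before its next release.
    """
    remaining = {name: 0 for name, _, _ in tasks}
    by_priority = sorted(tasks, key=lambda t: t[1])  # shorter period = higher priority
    table: List[str] = []
    for t in range(hyperperiod([p for _, p, _ in tasks])):
        for name, period, wcet in tasks:
            if t % period == 0:
                remaining[name] += wcet  # release a new job
        for name, _, _ in by_priority:
            if remaining[name] > 0:
                remaining[name] -= 1
                table.append(name)
                break
        else:
            table.append("idle")
    return table


def run_length(table: List[str]) -> List[Tuple[str, int]]:
    """Collapse consecutive identical slots into (entry, length) pairs."""
    runs: List[List] = []
    for slot in table:
        if runs and runs[-1][0] == slot:
            runs[-1][1] += 1
        else:
            runs.append([slot, 1])
    return [(slot, n) for slot, n in runs]
```

Take tasks ("A", period 4, WCET 1) and ("B", period 6, WCET 2). Their 12-slot table shrinks to 8 runs. With periods of 10, 12, 15, and 16 slots, the hyperperiod is already 240 slots. This illustrates why compressing static tables matters for embedded memory.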

Paper          Poster (restricted access)          BibTex          IEEE Xplore

Characterization of Capture Actions in Video Sequences
Ana Pinzari and Mohamed Shawky

This paper deals with the automatic characterization of video capture actions in a video sequence, such as scale changes (zoom) or translations (traveling). The overall objective is to improve the scripted quality of video sequences captured by professionals or non-professionals. Ultimately, capture would follow a predefined video template. The camera operator would progressively fill in the template, and the acquisition system would automatically check the result against the template scenario. We focus on the detection of the zoom action. The characteristics of a zoom are detected by analyzing pairs of frames of the target sequence. The method selects characteristic points in a pair of images and matches them. It then estimates the plane projective transformation between the images using the RANSAC method. From this transformation model, relevant information about the scale and translation factors can be extracted. This information is needed to verify the type and the quality of the zoom sequence. We present several experiments that show very promising results.
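
The estimation step can be sketched with a simplified motion model. The Python code below uses RANSAC to fit an isotropic scale plus translation (x' = s·x + tx, y' = s·y + ty) to already-matched point pairs. The paper instead estimates a full plane projective transformation. Feature detection and matching are omitted, and the tolerance, iteration count, and synthetic data are assumptions. Under this model, s > 1 suggests a zoom-in between the two frames and s < 1 a zoom-out.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]


def fit(src: List[Point], dst: List[Point]) -> Tuple[float, float, float]:
    """Least-squares fit of dst = s * src + (tx, ty): isotropic zoom plus translation."""
    n = len(src)
    mx = sum(p[0] for p in src) / n
    my = sum(p[1] for p in src) / n
    nx = sum(q[0] for q in dst) / n
    ny = sum(q[1] for q in dst) / n
    num = sum((q[0] - nx) * (p[0] - mx) + (q[1] - ny) * (p[1] - my) for p, q in zip(src, dst))
    den = sum((p[0] - mx) ** 2 + (p[1] - my) ** 2 for p in src)
    s = num / den
    return s, nx - s * mx, ny - s * my


def ransac_zoom(src: List[Point], dst: List[Point], iters: int = 200,
                tol: float = 1.0, seed: int = 0) -> Tuple[float, float, float, int]:
    """RANSAC: fit the model to random pairs of matches, keep the largest consensus
    set, then refit on it. Returns (s, tx, ty, number of inliers)."""
    rng = random.Random(seed)
    best: List[int] = []
    for _ in range(iters):
        i, j = rng.sample(range(len(src)), 2)  # minimal sample: two correspondences
        if src[i] == src[j]:
            continue
        s, tx, ty = fit([src[i], src[j]], [dst[i], dst[j]])
        inliers = [k for k, ((x, y), (u, v)) in enumerate(zip(src, dst))
                   if math.hypot(s * x + tx - u, s * y + ty - v) < tol]
        if len(inliers) > len(best):
            best = inliers
    if len(best) < 2:
        raise ValueError("no consensus set found")
    s, tx, ty = fit([src[k] for k in best], [dst[k] for k in best])
    return s, tx, ty, len(best)
```

One possible quality check: the estimated s should change steadily across consecutive frame pairs while the translation stays small. Mismatched feature pairs are discarded as outliers rather than biasing the estimate.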

Paper          Poster (restricted access)          BibTex          IEEE Xplore

GPU Implementation of Motion Estimation for Visual Saliency
Anis Rahman, Dominique Houzet, Denis Pellerin, and Lionel Agud

Visual attention is a complex concept that involves many processes for finding the regions to focus on in a visual scene. In this paper, we discuss a spatio-temporal visual saliency model that divides the visual information contained in videos into two types, static and dynamic. Two separate pathways process these two types. The pathways produce intermediate saliency maps, which are merged to obtain salient regions that stand out from their surroundings. A more robust model would require more complex processes. The dynamic pathway of the model involves compute-intensive motion estimation. Implemented on a GPU, this motion estimation achieved a speedup of up to 40x over its sequential counterpart. The implementation relies on a number of code and memory optimizations to obtain these performance gains. As a result, the visual saliency model can analyze video in real time.

Paper          Poster (restricted access)          BibTex          IEEE Xplore


List of DASIP 2010 Participants
