
SMD 2015
Solutions for MultiCore Debug Conference

December 8, 2015
Hilton City Center, Munich, Germany


9:00 - 9:15  Welcome (Andreas Herkersdorf, Albrecht Mayer)
9:15 - 10:15  Keynote: Runtime Verification for Multicore Systems (Fadi Kurdahi, UC Irvine)
10:15 - 10:45  Coffee Break
10:45 - 11:05  Non-Intrusive Online Observation of Multicore Processors (Alexander Weiss, Accemic)
11:05 - 11:25  Combining Runtime Verification and Execution Time Estimation (Boris Dreyer, Fachgebiet Rechnersysteme, TU Darmstadt)
11:25 - 11:45  Online Runtime Verification on Multicore Systems (Normann Decker, ISP, Universität zu Lübeck)
11:45 - 12:05  CApToR: Causal Trace Reconstruction from Semantic Metadata (Randolf Rotta, Brandenburgische Technische Universität)
12:05 - 13:30  Lunch
13:30 - 14:30  Keynote: Verification of a Cache-Coherent Multiprocessor System (Bodo Hoppe, IBM Böblingen)
14:30 - 14:50  A framework for systematic analysis of event traces for software debugging and optimization (Salman Rafiq, Fraunhofer ESK)
14:50 - 15:20  Coffee Break
15:20 - 15:40  Task Centric Debugging for High Gain Data Processing Units (Catalin Horghidan, Freescale)
15:40 - 16:00  A Practical Approach: Automatic Bug Search Engines (Lin Li, Infineon Technologies)
16:00 - 16:20  Pre-Silicon Multi-Core Hardware/Software Debug (Russel Klein, Mentor)
16:20 - 16:30  Closing (Andreas Herkersdorf, Albrecht Mayer)


Non-Intrusive Online Observation of Multicore Processors, Alexander Weiss, Accemic GmbH & Co. KG
Observability is an essential requirement for the success and predictability of multiprocessor projects.
Today's solutions, which are based on trace data recording and offline processing, have serious limitations: trace trigger conditions are restricted, and the gap between trace data output bandwidth and trace data processing bandwidth limits the observation time.
A new FPGA-based online trace data processing methodology [1] helps to overcome these limitations of the state of the art.
A stream of events, including information on task switches, data accesses, and an arbitrary number of watchpoints, is made available for further processing (e.g. WCET analysis and runtime verification).
The presentation will
- discuss requirements for multiprocessor observation,
- give an overview of state-of-the-art observation methodologies,
- explain the trace data generation capabilities of different multicore processor architectures and design considerations for efficient exploitation of the embedded trace structures,
- introduce our FPGA-based online trace data processing methodology and outline how the resulting events are extracted from the multi-Gbps trace data stream.
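As a rough illustration of the last point, the following Python sketch models online event extraction in software: decoded trace messages go in, a stream of task-switch and watchpoint events comes out in a single pass. The `TraceMsg` record, its `"ownership"`/`"data_write"` kinds and `extract_events` are hypothetical simplifications, not the Accemic implementation; real trace protocols are architecture-specific, and the actual processing runs in FPGA logic at multi-Gbps rates.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Set

@dataclass(frozen=True)
class TraceMsg:
    """Hypothetical decoded trace message (real formats are architecture-specific)."""
    core: int
    kind: str       # "ownership" (task switch) or "data_write"
    value: int      # task id for "ownership", data address for "data_write"
    timestamp: int

@dataclass(frozen=True)
class Event:
    core: int
    name: str       # "task_switch" or "watchpoint_hit"
    detail: int
    timestamp: int

def extract_events(msgs: Iterable[TraceMsg], watchpoints: Set[int]) -> Iterator[Event]:
    """Turn a stream of decoded trace messages into high-level events, in one online pass."""
    current_task: Dict[int, int] = {}
    for m in msgs:
        if m.kind == "ownership":
            # Report a task switch only when the task on this core actually changes.
            if current_task.get(m.core) != m.value:
                current_task[m.core] = m.value
                yield Event(m.core, "task_switch", m.value, m.timestamp)
        elif m.kind == "data_write" and m.value in watchpoints:
            # Any number of watchpoints: a set lookup instead of a few fixed comparators.
            yield Event(m.core, "watchpoint_hit", m.value, m.timestamp)
```

Because `extract_events` is a generator, it never buffers the trace, which mirrors the point of online processing: observation time is not bounded by storage.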

Combining Runtime Verification and Execution Time Estimation, Boris Dreyer, Fachgebiet Rechnersysteme, TU Darmstadt
Precise estimation of the Worst-Case Execution Time (WCET) of embedded software is a necessary precondition in most safety-critical systems.
Depending on the certification requirements, one can use measurement-based methods, which rely on exhaustive measurements performed on the real hardware.
In [1], we introduced a novel FPGA-based approach which eases the measuring by incorporating continuous aggregation of execution time data at runtime, thus allowing arbitrarily long periods of observation.
This is particularly important for multicore systems to catch both typical behaviour and rare circumstances.
Often, the execution time depends on the data being processed. If the data changes, the timing behaviour changes as well.
Sometimes, this effect is directly visible, e.g. because of a deadline miss.
However, even if the effect is not directly visible, it might be a symptom of a sporadic system failure, corresponding to the "embedded health" approach [2].
We thus propose an extension to our method in [1] where we combine runtime verification with measurement-based execution time estimation.
The idea is as follows: Instead of computing basic block maxima at runtime, we preload the aggregation module with threshold values.
Then, monitors running on the FPGA compare the measured execution times with the provided thresholds.
Violations of the timing behaviour assumptions can thus be detected in a fine-grained way, even if they do not result in visible misbehaviour.
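The sketch below combines, in one software model, the per-block maxima aggregation from [1] and the proposed comparison against preloaded thresholds. It is a sketch under stated assumptions: the `TimingMonitor` class, the basic-block labels and cycle counts as the unit are hypothetical, and the actual monitors run on the FPGA.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class TimingMonitor:
    """Software model of per-basic-block timing monitors (hypothetical names)."""
    thresholds: Dict[str, int]                      # preloaded limits, in cycles
    maxima: Dict[str, int] = field(default_factory=dict)
    violations: List[Tuple[str, int]] = field(default_factory=list)

    def observe(self, block: str, cycles: int) -> None:
        # Aggregation: keep one running maximum per block instead of every sample.
        if block not in self.maxima or cycles > self.maxima[block]:
            self.maxima[block] = cycles
        # Runtime verification: flag any measurement above the preloaded threshold,
        # even if it never causes a visible deadline miss.
        limit = self.thresholds.get(block)
        if limit is not None and cycles > limit:
            self.violations.append((block, cycles))
```

Only one value per block is stored, so memory stays constant however long the system is observed, which is what makes arbitrarily long observation periods possible.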

Online Runtime Verification on Multicore Systems, Normann Decker, Institut für Softwaretechnik und Programmiersprachen (ISP), Universität zu Lübeck
Checking the correct functionality of multicore processors is a non-trivial task.
One possibility is to observe the execution of the system via the processor's trace interface.
However, because trace information is typically transmitted at a huge data rate, it is impossible to store the data over long periods.
One approach to checking whether certain properties of such systems hold is runtime verification, where the trace data is analyzed online on reconfigurable hardware, exploiting the massive parallelism of FPGAs.
The properties to be checked can typically be specified in a specification language that allows the user to formulate natural and intuitive specifications.
The presentation introduces a language for specifying properties that describe desired or undesired behaviour of software, as well as its realization within an application framework.
This specification language has been developed especially for this application area and fulfills the needs for specifying important properties of multicore systems.
To observe the system during runtime, corresponding monitors are computed and synthesized to run on an FPGA.
Properties that can be checked include violations of timing constraints, ordering violations, and violations of concurrency restrictions.
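To make the monitoring idea concrete, here is a hypothetical Python model of a single monitor that checks a timing constraint, an ordering rule and a concurrency restriction over an event stream. The property (every `req` on a channel is acknowledged by an `ack` within `deadline` time units, with at most one outstanding request per channel) and all names are illustrative assumptions; this is not the specification language presented in the talk, and a synthesized FPGA monitor would evaluate such a property in hardware.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class ResponseMonitor:
    """Hypothetical property: every 'req' on a channel is answered by an 'ack'
    on the same channel within `deadline` time units, and at most one request
    per channel may be outstanding at a time."""
    deadline: int
    pending: Dict[int, int] = field(default_factory=dict)  # channel -> time of req
    verdicts: List[str] = field(default_factory=list)

    def step(self, name: str, channel: int, t: int) -> None:
        # Timing constraint: drop and report requests older than the deadline.
        for ch, t_req in list(self.pending.items()):
            if t - t_req > self.deadline:
                self.verdicts.append(f"timeout ch{ch} req@{t_req}")
                del self.pending[ch]
        if name == "req":
            # Concurrency restriction: a second outstanding request is a violation.
            if channel in self.pending:
                self.verdicts.append(f"overlap ch{channel} @{t}")
            self.pending[channel] = t
        elif name == "ack":
            # Ordering: an ack must be preceded by a matching req.
            if self.pending.pop(channel, None) is None:
                self.verdicts.append(f"unexpected ack ch{channel} @{t}")
```

Note that a late `ack` produces two verdicts, a timeout and an unexpected ack, because the expired request is dropped before the `ack` is processed.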

CApToR: Causal Trace Reconstruction from Semantic Metadata, Randolf Rotta, Brandenburgische Technische Universität
A recurring challenge in the analysis of execution traces from highly parallel systems is the reconstruction of cross-location causal dependencies and the causal event order. Unfortunately, adapting reconstruction and analysis algorithms to new systems and programming models is time-consuming, especially when events from multiple hardware and software layers have to be combined. The typically huge number of trace events can be overwhelming but is often necessary to capture all relevant causal relations. Hence, aggregation strategies are needed that remove low-level events without losing their impact on high-level relations.
We analysed the events found in task-parallel algorithms that communicate via shared memory. From this, we derived an approach for reusable reconstruction algorithms, called CApToR. A high level of abstraction is achieved by representing traces as attributed event graphs. Aggregation removes event nodes from the graph and updates the related edges accordingly. Portability is achieved by exploiting semantic type information about synchronising events during reconstruction and aggregation. Practical experience shows that this enables easy integration into new systems, while existing graph-based query engines can be used to search for complex event patterns.
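The aggregation step can be pictured with a small event graph whose edges are happens-before relations. In this hypothetical Python sketch (not CApToR's actual data model), removing a node reconnects each of its predecessors to each of its successors, so a release/acquire pair that was linked only through low-level memory events stays causally ordered. The `layer` attribute is an assumed stand-in for the semantic type information mentioned above.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, Set

class EventGraph:
    """Minimal attributed event graph; edges mean 'happens before'.
    Hypothetical structure for illustration, not CApToR's data model."""

    def __init__(self) -> None:
        self.attrs: Dict[str, dict] = {}
        self.succ: DefaultDict[str, Set[str]] = defaultdict(set)
        self.pred: DefaultDict[str, Set[str]] = defaultdict(set)

    def add_event(self, node: str, **attrs) -> None:
        self.attrs[node] = attrs

    def add_edge(self, a: str, b: str) -> None:
        self.succ[a].add(b)
        self.pred[b].add(a)

    def aggregate(self, node: str) -> None:
        """Remove one event but keep its causal impact by connecting every
        predecessor directly to every successor."""
        preds = self.pred.pop(node, set())
        succs = self.succ.pop(node, set())
        for p in preds:
            self.succ[p].discard(node)
            self.succ[p].update(succs)
        for s in succs:
            self.pred[s].discard(node)
            self.pred[s].update(preds)
        del self.attrs[node]

    def aggregate_where(self, predicate: Callable[[dict], bool]) -> None:
        # Collect matches first: self.attrs must not change while we iterate it.
        for node in [n for n, a in self.attrs.items() if predicate(a)]:
            self.aggregate(node)
```

For example, aggregating every node with `layer == "mem"` in the chain release → store → load → acquire leaves a direct edge release → acquire.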

A framework for systematic analysis of event traces for software debugging and optimization, Salman Rafiq, Fraunhofer

As computational systems grow more and more complex, debugging them and optimizing their performance becomes a challenging task. Tracing, the recording of events at run time, can provide helpful data on a system’s behavior. One problem is that tracing can generate a huge amount of data, and it is not easy for the developer to extract exactly the information needed. What is needed is analysis that processes the data, raises the level of abstraction and supports developers in finding what they need.
We believe that tracing in complex systems becomes even more effective when it combines several sources of information, such as application traces, operating system kernel events, hardware traces, or communication traces. For many of these classes of information, tracing and monitoring solutions exist. Unfortunately, each of them comes as a tightly coupled bundle of trace collection, analysis and visualization, and can therefore only answer a narrow range of questions. We aim to provide a flexible trace analysis solution that
• can handle and integrate different trace sources and different storage formats
• is easy to adapt to new analysis scenarios
• can be used to model abstract descriptions of trace analysis workflows
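One common way to integrate heterogeneous trace sources is to normalize each into a shared event record and then merge the streams by timestamp. The sketch below assumes every source is already time-sorted and mapped to a common time base; the two input formats and the `TraceEvent` record are hypothetical, while `heapq.merge` is the Python standard-library lazy k-way merge.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple

class TraceEvent(NamedTuple):
    timestamp: int   # assumes all sources were mapped to one common time base
    source: str
    name: str

def from_csv(lines: Iterable[str], source: str) -> Iterator[TraceEvent]:
    """Adapter for a hypothetical 'timestamp,name' text format (e.g. an application log)."""
    for line in lines:
        ts, name = line.split(",", 1)
        yield TraceEvent(int(ts), source, name.strip())

def from_kernel(records: Iterable[dict]) -> Iterator[TraceEvent]:
    """Adapter for hypothetical kernel-event records given as dicts."""
    for r in records:
        yield TraceEvent(int(r["time_ns"]), "kernel", str(r["event"]))

def merge_sources(*sources: Iterable[TraceEvent]) -> Iterator[TraceEvent]:
    """Merge already time-sorted streams into one timeline without loading them fully."""
    return heapq.merge(*sources, key=lambda e: e.timestamp)
```

Keeping format-specific code in small adapters means a new trace source only needs one more adapter, while the analyses downstream operate on a single record type.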

Task centric debugging for high gain data processing units, Catalin Horghidan, Freescale
On modern high-traffic network or big-data processing units, it is common to split the work across multiple cores, threads or tasks. One of the challenges embedded developers face is debugging multiple cores, threads or tasks simultaneously in a consistent, scalable, non-intrusive way. Things get more complicated when debugging hundreds of threads or tasks that call accelerators.
To cope with the heavy load on the debug system and to provide reliable control of the cores or tasks, the debugger needs a new approach.
One example of such an approach is the CodeWarrior Debugger for the Advanced I/O Processor (AIOP). The AIOP is included in high-end communication processors and focuses on accelerating data packet processing. Its primary design goal is to split packet processing across up to 256 lightweight hardware tasks. Some high-end units contain more than one AIOP device, which leaves the debugger handling hundreds of tasks.
The CodeWarrior Debugger maintains reliable, non-intrusive control of all AIOP tasks by employing several techniques, such as task-centric debugging, task stepping and per-task breakpoints.
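Per-task breakpoints matter when many tasks run the same code, because a breakpoint address alone would then stop every task that reaches it. The following Python sketch shows only the filtering decision; the `PerTaskBreakpoints` class and the `"halt"`/`"resume"` actions are hypothetical and do not describe how CodeWarrior actually implements the feature.

```python
from typing import Dict, Optional, Set

class PerTaskBreakpoints:
    """Hypothetical debugger-side filter for per-task breakpoints."""

    def __init__(self) -> None:
        # address -> task ids that should stop there, or None for "stop in every task"
        self.bps: Dict[int, Optional[Set[int]]] = {}

    def set_breakpoint(self, addr: int, tasks: Optional[Set[int]] = None) -> None:
        self.bps[addr] = tasks

    def on_hit(self, addr: int, task_id: int) -> str:
        """Decide what happens when a task reaches a breakpoint address:
        'halt' stops that task, 'resume' lets it continue."""
        if addr not in self.bps:
            return "resume"
        tasks = self.bps[addr]
        return "halt" if tasks is None or task_id in tasks else "resume"
```

Tasks outside the filter are resumed immediately, so the other tasks running the same code are disturbed as little as possible.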

A Practical Approach: Automatic Bug Search Engines, Lin Li, Infineon

As the complexity of multicore embedded systems increases, more potential system issues arise, e.g. shared resource contention, cache false sharing and data races. Compared to functional bugs, these issues often have no obvious symptoms. For example, shared resource contention degrades system performance, but the system still works. A severe system breakdown may happen when the load suddenly increases and the current performance cannot handle it. Conventional methods such as breakpoints are ineffective at coping with these issues. In this presentation, a practical approach, automatic bug search engines, is proposed to detect these types of potential system issues even without symptoms or before symptoms appear. The approach uses trace data to collect system information and detect potential system issues. Based on this approach, an example tool named “ChipCoach” has been implemented to demonstrate its basic ideas.
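As an illustration of a symptom-free issue that a trace-based search can flag, the sketch below scans a memory-access trace for false-sharing candidates: cache lines that are written, used by several cores, and in which no individual address is touched by more than one core. The `Access` record, the 64-byte line size and this detection rule are assumptions made for illustration; this is not ChipCoach's algorithm.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, NamedTuple, Set

CACHE_LINE = 64  # bytes; an assumption, the real value depends on the target

class Access(NamedTuple):
    core: int
    addr: int
    is_write: bool

def false_sharing_candidates(trace: Iterable[Access], line_size: int = CACHE_LINE) -> Set[int]:
    """Return base addresses of cache lines that are written, used by several
    cores, and where no single address is shared between cores."""
    cores_per_line: DefaultDict[int, Set[int]] = defaultdict(set)
    cores_per_addr: DefaultDict[int, Set[int]] = defaultdict(set)
    written: Set[int] = set()
    for a in trace:
        line = a.addr - a.addr % line_size
        cores_per_line[line].add(a.core)
        cores_per_addr[a.addr].add(a.core)
        if a.is_write:
            written.add(line)
    # A line containing an address used by several cores is true sharing, not false sharing.
    truly_shared = {addr - addr % line_size
                    for addr, cores in cores_per_addr.items() if len(cores) > 1}
    return {line for line, cores in cores_per_line.items()
            if len(cores) > 1 and line in written and line not in truly_shared}
```

A practical detector would also need access timestamps, since accesses that are far apart in time do not contend for the cache line.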

Pre-Silicon Multi-core Hardware/Software Debug, Russel Klein, Mentor

Multi-core software bugs are often intermittent, with a failure only occurring as a result of specific timing of events between two or more cores. Debugging these types of problems is very challenging using traditional sequential “stop-and-stare” debug techniques. Effective debugging requires capturing an instance of the failure and being able to examine, in a non-intrusive fashion, the software running on each of the cores. This session will detail a trace-based, non-intrusive debug approach for capturing an instance of a multi-core failure and presenting a comprehensive and scalable debug view of the activity across multiple cores, with visibility into the activity on the processor interconnects and the hardware of the surrounding system-on-chip (SoC). This approach can be used with hardware emulation systems, FPGA prototypes, or, with some limitations, even on physical silicon. A demonstration of the approach can be provided in the context of an emulation system.
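Capturing an instance of an intermittent failure is commonly done with a circular trace buffer that keeps recent history until a trigger fires and then records a fixed number of further events. The Python sketch below models that general idea; `capture_failure` and its `pre`/`post` parameters are hypothetical and say nothing about how the Mentor tooling captures traces.

```python
from collections import deque
from typing import Callable, Deque, Iterable, List, Optional, TypeVar

T = TypeVar("T")

def capture_failure(events: Iterable[T], is_trigger: Callable[[T], bool],
                    pre: int, post: int) -> Optional[List[T]]:
    """Keep the last `pre` events in a ring buffer; when the trigger fires,
    return them plus the trigger event and up to `post` following events."""
    history: Deque[T] = deque(maxlen=pre)  # oldest entries are overwritten
    it = iter(events)
    for ev in it:
        if is_trigger(ev):
            window = list(history) + [ev]
            # zip() exhausts range() first, so no extra event is consumed from `it`.
            window.extend(after for _, after in zip(range(post), it))
            return window
        history.append(ev)
    return None  # the trigger never fired
```

Because the buffer has a fixed size, the system can run indefinitely until the rare failure occurs, and the captured window still shows what led up to it.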