# ARPA Team Publications - 2009

1. Soraya Zertal, Claude Timsit and Majed Chatti. Communication / Synchronisation Mechanism for Multiprocessor on Chip Architectures. In IEEE Workshop on Performance Evaluation of Communications in Distributed systems and Web Based Service Architectures. 2009. BibTeX

@inproceedings{ZTCh09,
author = "Zertal, Soraya and Timsit, Claude and Chatti, Majed",
title = "{Communication / Synchronisation Mechanism for Multiprocessor on Chip Architectures}",
booktitle = "{IEEE Workshop on Performance Evaluation of Communications in Distributed systems and Web Based Service Architectures}",
year = "{2009}",
owner = "MOIS",
timestamp = "2011.10.19"
}

2. Sid Touati. Towards a Statistical Methodology to Evaluate Program Speedups and their Optimisation Techniques. 2009, 12 pages.
Abstract {The community of program optimisation and analysis, code performance evaluation, parallelisation and optimising compilation has published since many decades hundreds of research and engineering articles in major conferences and journals. These articles study efficient algorithms, strategies and techniques to accelerate programs execution times, or optimise other performance metrics (MIPS, code size, energy/power, MFLOPS, etc.). Many speedups are published, but nobody is able to reproduce them exactly. The non-reproducibility of our research results is a dark point of the art, and we cannot be qualified as {\it computer scientists} if we do not provide rigorous experimental methodology. This article provides a first effort towards a correct statistical protocol for analysing and measuring speedups. As we will see, some common mistakes are done by the community inside published articles, explaining part of the non-reproducibility of the results. Our current article is not sufficient by its own to deliver a complete experimental methodology, further efforts must be done by the community to decide about a common protocol for our future experiences. Anyway, our community should take care about the aspect of reproducibility of the results in the future.} URL BibTeX

@unpublished{Toua09,
author = "Touati, Sid",
title = "{Towards a Statistical Methodology to Evaluate Program Speedups and their Optimisation Techniques}",
note = "{12 pages}",
month = "",
year = "{2009}",
abstract = "{The community of program optimisation and analysis, code performance evaluation, parallelisation and optimising compilation has published since many decades hundreds of research and engineering articles in major conferences and journals. These articles study efficient algorithms, strategies and techniques to accelerate programs execution times, or optimise other performance metrics (MIPS, code size, energy/power, MFLOPS, etc.). Many speedups are published, but nobody is able to reproduce them exactly. The non-reproducibility of our research results is a dark point of the art, and we cannot be qualified as {\it computer scientists} if we do not provide rigorous experimental methodology. This article provides a first effort towards a correct statistical protocol for analysing and measuring speedups. As we will see, some common mistakes are done by the community inside published articles, explaining part of the non-reproducibility of the results. Our current article is not sufficient by its own to deliver a complete experimental methodology, further efforts must be done by the community to decide about a common protocol for our future experiences. Anyway, our community should take care about the aspect of reproducibility of the results in the future.}",
affiliation = "Parall{\'e}lisme, R{\'e}seaux, Syst{\`e}mes d'information, Mod{\'e}lisation - PRISM - CNRS : UMR8144 - Universit{\'e} de Versailles-Saint Quentin en Yvelines",
file = "_eng.pdf:http\://hal.archives-ouvertes.fr/hal-00356529/PDF/stat\\_eval\\_perf\\_eng.pdf:PDF",
hal_id = "hal-00356529",
keywords = "Program optimisation ; Statistical Performance Evaluation ; Performance; Measurement, Experimentation",
language = "Anglais",
url = "http://hal.archives-ouvertes.fr/hal-00356529/en/"
}

3. Sid-Ahmed-Ali Touati. Cyclic Task Scheduling with Storage Requirement Minimization under Specific Architectural Constraints: Case of Buffers and Rotating Storage Facilities. 2009. This is a continuation work to SIRA (Sid-Ahmed-Ali Touati and Christine Eisenbeis. Early Periodic Register Allocation on ILP Processors. Parallel Processing Letters, Vol. 14, No. 2, June 2004. World Scientific.). We extend that work with new heuristics and experimental results.
Abstract {In this report, we study the exact and an approximate formulation of the general problem of one-dimensional periodic task scheduling under storage requirement, irrespective of machine constraints. We rely on the SIRA theoretical framework that allows an optimisation of periodic storage requirement \cite{Touati:PPL:04}. SIRA is based on inserting some storage dependence arcs ({\it storage reuse} arcs) labeled with {\it reuse distances} directly on the data dependence graph. In this new graph, we are able to bound the storage requirement measured as the exact number of necessary storage locations. The determination of storage and distance reuse is parametrised by the desired minimal scheduling period (respectively maximal execution throughput) as well as by the storage requirement constraints - either can be minimised while the other one is bounded, or alternatively, both are bounded \cite{siralina07,RR-INRIA-HAL-00436348}. This report recalls our fundamental results on this problem, and proposes new experimental heuristics. We typically show how we can deal with some specific storage architectural constraints such as buffers and rotating storage facilities.} URL BibTeX

@techreport{Toua09a,
author = "Touati, Sid-Ahmed-Ali",
title = "{Cyclic Task Scheduling with Storage Requirement Minimization under Specific Architectural Constraints: Case of Buffers and Rotating Storage Facilities}",
year = "{2009}",
note = "{This is a continuation work to SIRA (Sid-Ahmed-Ali Touati and Christine Eisenbeis. Early Periodic Register Allocation on ILP Processors. Parallel Processing Letters, Vol. 14, No. 2, June 2004. World Scientific.). We extend that work with new heuristics and experimental results.}",
abstract = "{In this report, we study the exact and an approximate formulation of the general problem of one-dimensional periodic task scheduling under storage requirement, irrespective of machine constraints. We rely on the SIRA theoretical framework that allows an optimisation of periodic storage requirement \cite{Touati:PPL:04}. SIRA is based on inserting some storage dependence arcs ({\it storage reuse} arcs) labeled with {\it reuse distances} directly on the data dependence graph. In this new graph, we are able to bound the storage requirement measured as the exact number of necessary storage locations. The determination of storage and distance reuse is parametrised by the desired minimal scheduling period (respectively maximal execution throughput) as well as by the storage requirement constraints - either can be minimised while the other one is bounded, or alternatively, both are bounded \cite{siralina07,RR-INRIA-HAL-00436348}. This report recalls our fundamental results on this problem, and proposes new experimental heuristics. We typically show how we can deal with some specific storage architectural constraints such as buffers and rotating storage facilities.}",
affiliation = "Parall{\'e}lisme, R{\'e}seaux, Syst{\`e}mes d'information, Mod{\'e}lisation - PRISM - CNRS : UMR8144 - Universit{\'e} de Versailles-Saint Quentin en Yvelines - ALCHEMY - INRIA Saclay - Ile de France - INRIA - CNRS : UMR8623 - Universit{\'e} Paris Sud - Paris XI",
file = "PSSR.pdf:http\://hal.inria.fr/inria-00440446/PDF/PSSR.pdf:PDF",
hal_id = "inria-00440446",
language = "Anglais",
url = "http://hal.inria.fr/inria-00440446/en/"
}

4. Ling Shang, Serge Petiton, Nahid Emad, Xiaolin Yang and Zhijian Wang. Extending YML to Be a Middleware for Scientific Cloud Computing. In CloudCom 2009. 2009, 662-667. BibTeX

@inproceedings{SPE+09,
author = "Shang, Ling and Petiton, Serge and Emad, Nahid and Yang, Xiaolin and Wang, Zhijian",
title = "{Extending YML to Be a Middleware for Scientific Cloud Computing}",
booktitle = "{CloudCom 2009}",
year = "{2009}",
pages = "{662-667}",
owner = "MOIS",
timestamp = "2011.08.18"
}

5. Laurent Plagne, Frank Hulsemann, Denis Barthou and Julien Jaeger. Parallel expression template for large vectors. In Proceedings of the 8th Workshop on Parallel/High-Performance Object-Oriented Scientific. 2009, 8:1-8:8. URL BibTeX

@inproceedings{PHBJ09,
author = "Plagne, Laurent and Hulsemann, Frank and Barthou, Denis and Jaeger, Julien",
title = "{Parallel expression template for large vectors}",
booktitle = "{Proceedings of the 8th Workshop on Parallel/High-Performance Object-Oriented Scientific}",
year = "{2009}",
pages = "{8:1-8:8}",
month = jul,
affiliation = "EDF R\&D - EDF - Parall{\'e}lisme, R{\'e}seaux, Syst{\`e}mes d'information, Mod{\'e}lisation - PRISM - CNRS : UMR8144 - Universit{\'e} de Versailles-Saint Quentin en Yvelines",
audience = "internationale",
hal_id = "hal-00551682",
keywords = "C++ template, OpenMP, expression template, intel TBB, multi-core processors, parallel computing",
language = "Anglais",
url = "http://hal.archives-ouvertes.fr/hal-00551682/en/"
}

6. Souad Koliai, Stephane Zuckerman, Emmanuel Oseret, Mickael Ivascot, Tipp Moseley, Ding Quang and William Jalby. A Balanced Approach to Application Performance Tuning. In LCPC' 09: Proceedings of the 22nd International Conference on Languages and Compilers for Parallel Computing. 2009, 111-125. BibTeX

@inproceedings{KZO+09,
author = "Koliai, Souad and Zuckerman, Stephane and Oseret, Emmanuel and Ivascot, Mickael and Moseley, Tipp and Quang, Ding and Jalby, William",
title = "{A Balanced Approach to Application Performance Tuning}",
booktitle = "{LCPC' 09: Proceedings of the 22nd International Conference on Languages and Compilers for Parallel Computing}",
year = "{2009}",
pages = "{111-125}",
publisher = "{Springer-Verlag}",
owner = "MOIS",
timestamp = "2011.07.28"
}

7. Minhaj Ahmad Khan, Henri-Pierre Charles and Denis Barthou. Improving Performance of Optimized Kernels Through Fast Instantiations of Templates. Concurrency and Computation: Practice and Experience (21):59-70, 2009. BibTeX

@article{KCBa11,
author = "Khan, Minhaj Ahmad and Charles, Henri-Pierre and Barthou, Denis",
title = "{Improving Performance of Optimized Kernels Through Fast Instantiations of Templates}",
journal = "{Concurrency and Computation: Practice and Experience}",
year = "{2009}",
pages = "{59-70}",
number = 21,
owner = "MOIS",
timestamp = "2011.08.19"
}

8. Richard Dusseaux, K Ait Braham and Nahid Emad. Eigenvalue System for the Scattering from Rough Surfaces – Saving in Computation Time by a Physical Approach. Optics Communications 282(18):3820-3826, 2009. URL, DOI BibTeX

@article{DAEm09,
author = "Dusseaux, Richard and Ait Braham, K. and Emad, Nahid",
title = "{Eigenvalue System for the Scattering from Rough Surfaces -- Saving in Computation Time by a Physical Approach}",
journal = "{Optics Communications}",
year = "{2009}",
volume = "{282}",
pages = "{3820-3826}",
number = "{18}",
affiliation = "Institut Pierre-Simon-Laplace - IPSL - CNRS : FR636 - IRD - CEA - CNES - INSU - Universit{\'e} Pierre et Marie Curie - Paris VI - Universit{\'e} de Versailles-Saint Quentin en Yvelines - Ecole Normale Sup{\'e}rieure de Paris - ENS Paris - Laboratoire Atmosph{\`e}res, Milieux, Observations Spatiales - LATMOS - CNRS : UMR8190 - Universit{\'e} Pierre et Marie Curie - Paris VI - Universit{\'e} de Versailles-Saint Quentin en Yvelines - INSU - Parall{\'e}lisme, R{\'e}seaux, Syst{\`e}mes d'information, Mod{\'e}lisation - PRISM - CNRS : UMR8144 - Universit{\'e} de Versailles-Saint Quentin en Yvelines",
audience = "internationale",
doi = "10.1016/j.optcom.2009.06.010",
hal_id = "hal-00410092",
keywords = "Scattering; Rough surfaces; C method; Eigenvalue system; Beam simulation method; Huygens principle",
language = "Anglais",
url = "http://hal.archives-ouvertes.fr/hal-00410092/en/"
}

9. Lamia Djoudi, Vasil Khachazide and William Jalby. KBS-MAQAO: A Knowledge Based System for MAQAO Tool. In HPCC' 09: Proceedings of the 2009 11th IEEE International Conference on High Performance Computing and Communications. 2009, 571-578. BibTeX

@inproceedings{DKJa09,
author = "Djoudi, Lamia and Khachazide, Vasil and Jalby, William",
title = "{KBS-MAQAO: A Knowledge Based System for MAQAO Tool}",
booktitle = "{HPCC' 09: Proceedings of the 2009 11th IEEE International Conference on High Performance Computing and Communications}",
year = "{2009}",
pages = "{571-578}",
publisher = "{IEEE Computer Society}",
owner = "MOIS",
timestamp = "2011.07.28"
}

10. Lamia Djoudi, Jean-Thomas Acquaviva and Denis Barthou. Compositional Approach applied to Loop Specialization. Concurrency and Computation: Practice and Experience 21(1):71-84, 2009. URL BibTeX

@article{DAB+09,
author = "Djoudi, Lamia and Acquaviva, Jean-Thomas and Barthou, Denis",
title = "{Compositional Approach applied to Loop Specialization}",
journal = "Concurrency and Computation: Practice and Experience",
year = "{2009}",
volume = "{21}",
pages = "{71-84}",
number = "{1}",
month = jan,
affiliation = "Parall{\'e}lisme, R{\'e}seaux, Syst{\`e}mes d'information, Mod{\'e}lisation - PRISM - CNRS : UMR8144 - Universit{\'e} de Versailles-Saint Quentin en Yvelines",
audience = "internationale",
hal_id = "hal-00575934",
language = "Anglais",
url = "http://hal.archives-ouvertes.fr/hal-00575934/en/"
}

11. Laurent Choy, Olivier Delannoy, Nahid Emad and Serge Petiton. Federation and Abstraction of Heterogeneous Global Computing Platforms with the YML Framework. In CISIS 2009. 2009, 451-456. BibTeX

@inproceedings{CDEP09,
author = "Choy, Laurent and Delannoy, Olivier and Emad, Nahid and Petiton, Serge",
title = "{Federation and Abstraction of Heterogeneous Global Computing Platforms with the YML Framework}",
booktitle = "{CISIS 2009}",
year = "{2009}",
pages = "{451-456}",
owner = "MOIS",
timestamp = "2011.07.29"
}

12. Sebastien Briais and Sid-Ahmed-Ali Touati. Schedule-Sensitive Register Pressure Reduction in Innermost Loops, Basic Blocks and Super-Blocks. 2009.
Abstract {This report makes a massive experimental study of an efficient heuristic for the SIRA framework \cite{sira04}. The heuristic, called SIRALINA \cite{siralina07}, bounds the register requirement of a data dependence graph before instruction scheduling under resource constraints. Our aim is to guarantee the absence of spilling before any instruction scheduling process, without hurting instruction level parallelism if possible. Our register pressure reduction methods are sensitive for both software pipelining (innermost loops) and acyclic scheduling (basic blocks and super-blocks). The SIRALINA method that we experiment in this report is shown efficient in terms of compilation times, in terms of register requirement reduction and in terms of shorted schedule increase. Our experiments are done on thousands standalone DDG extracted from FFMPEG, MEDIABENCH, SPEC2000 and SPEC2006 benchmarks. We consider processor architectures with multiple register type and we model delayed access times to registers. Our register pressure reduction method is distributed as a C independent library (\texttt{SIRAlib}).} URL BibTeX

@techreport{BrTo09,
author = "Briais, Sebastien and Touati, Sid-Ahmed-Ali",
title = "{Schedule-Sensitive Register Pressure Reduction in Innermost Loops, Basic Blocks and Super-Blocks}",
year = "{2009}",
abstract = "{This report makes a massive experimental study of an efficient heuristic for the SIRA framework \cite{sira04}. The heuristic, called SIRALINA \cite{siralina07}, bounds the register requirement of a data dependence graph before instruction scheduling under resource constraints. Our aim is to guarantee the absence of spilling before any instruction scheduling process, without hurting instruction level parallelism if possible. Our register pressure reduction methods are sensitive for both software pipelining (innermost loops) and acyclic scheduling (basic blocks and super-blocks). The SIRALINA method that we experiment in this report is shown efficient in terms of compilation times, in terms of register requirement reduction and in terms of shorted schedule increase. Our experiments are done on thousands standalone DDG extracted from FFMPEG, MEDIABENCH, SPEC2000 and SPEC2006 benchmarks. We consider processor architectures with multiple register type and we model delayed access times to registers. Our register pressure reduction method is distributed as a C independent library (\texttt{SIRAlib}).}",
affiliation = "Parall{\'e}lisme, R{\'e}seaux, Syst{\`e}mes d'information, Mod{\'e}lisation - PRISM - CNRS : UMR8144 - Universit{\'e} de Versailles-Saint Quentin en Yvelines",
file = "_report.pdf:http\://hal.inria.fr/inria-00436348/PDF/main\\_siralina\\_report.pdf:PDF",
hal_id = "inria-00436348",
keywords = "Compilation, Code optimisation, Register pressure, Instruction level parallelism",
language = "Anglais",
pages = 53,
url = "http://hal.inria.fr/inria-00436348/en/"
}

13. Sebastien Briais and Sid-Ahmed-Ali Touati. Experimental Study of Register Saturation in Basic Blocks and Super-Blocks: Optimality and heuristics. 2009. Experimental data and free software are included (made public).
Abstract {Register saturation (RS) is the exact maximal register need of all valid schedules of a data dependence graph \cite{Touati:RSILP05}. Its optimal computation is NP-complete. This report proposes two variants of heuristics for computing the acyclic RS of directed acyclic graphs (DAG). The first one improves the previous greedy-k heuristic \cite{Touati:RSILP05} in terms of approximating the RS with equivalent computation times. The second heuristic is faster, has better RS approximation than greedy-k, but sacrifices the computation of saturating values. In order to evaluate the efficiency of these two heuristics, we designed an optimal combinatorial algorithm computing the optimal RS for tractable cases, which turns out to be satisfactory in practice. Extensive experiments have been conducted on thousands of data dependence graphs extracted from FFMPEG, MEDIABENCH, SPEC2000 and SPEC2006 benchmarks. Numerical results are presented to demonstrate the efficiency of the two proposed heuristics, so hence they can replace the greedy-k heuristic presented in \cite{Touati:RSILP05}. Our RS computation methods are distributed as a C independent library (\texttt{RSlib}) under LGPL licence.} URL BibTeX

@techreport{BrTo09a,
author = "Briais, Sebastien and Touati, Sid-Ahmed-Ali",
title = "{Experimental Study of Register Saturation in Basic Blocks and Super-Blocks: Optimality and heuristics}",
year = "{2009}",
note = "{Experimental data and free software are included (made public)}",
abstract = "{Register saturation (RS) is the exact maximal register need of all valid schedules of a data dependence graph \cite{Touati:RSILP05}. Its optimal computation is NP-complete. This report proposes two variants of heuristics for computing the acyclic RS of directed acyclic graphs (DAG). The first one improves the previous greedy-k heuristic \cite{Touati:RSILP05} in terms of approximating the RS with equivalent computation times. The second heuristic is faster, has better RS approximation than greedy-k, but sacrifices the computation of saturating values. In order to evaluate the efficiency of these two heuristics, we designed an optimal combinatorial algorithm computing the optimal RS for tractable cases, which turns out to be satisfactory in practice. Extensive experiments have been conducted on thousands of data dependence graphs extracted from FFMPEG, MEDIABENCH, SPEC2000 and SPEC2006 benchmarks. Numerical results are presented to demonstrate the efficiency of the two proposed heuristics, so hence they can replace the greedy-k heuristic presented in \cite{Touati:RSILP05}. Our RS computation methods are distributed as a C independent library (\texttt{RSlib}) under LGPL licence.}",
affiliation = "Parall{\'e}lisme, R{\'e}seaux, Syst{\`e}mes d'information, Mod{\'e}lisation - PRISM - CNRS : UMR8144 - Universit{\'e} de Versailles-Saint Quentin en Yvelines",
file = "_report.pdf:http\://hal.inria.fr/inria-00431103/PDF/main\\_RS\\_report.pdf:PDF",
hal_id = "inria-00431103",
keywords = "Compilation, Code optimisation, Register saturation, Instruction level parallelism",
language = "Anglais",
pages = 33,
url = "http://hal.inria.fr/inria-00431103/en/"
}

14. Samir Ammenouche, Sid-Ahmed-Ali Touati and William Jalby. On Instruction-Level Method for Reducing Cache Penalties in Embedded VLIW Processors. In HPCC' 09: Proceedings of the 2009 11th IEEE International Conference on High Performance Computing and Communications. 2009, 196-205. BibTeX

@inproceedings{AmTo10,
author = "Ammenouche, Samir and Touati, Sid-Ahmed-Ali and Jalby, William",
title = "{On Instruction-Level Method for Reducing Cache Penalties in Embedded VLIW Processors}",
booktitle = "{HPCC' 09: Proceedings of the 2009 11th IEEE International Conference on High Performance Computing and Communications}",
year = "{2009}",
pages = "{196-205}",
publisher = "{IEEE Computer Society}",
owner = "MOIS",
timestamp = "2011.07.28"
}

