ARPA team publications, 2009 to 2012

  1. Eric Petit, Pablo De Oliveira Castro Herrero, Tarek Menouer, Bettina Krammer and William Jalby. Computing-Kernels Performance Prediction using Dataflow Analysis and Microbenchmarking. In 16th Workshop on Compilers for Parallel Computing (CPC 2012), Padova, Italy. January 2012. BibTeX

    @inproceedings{PDM+12,
    	author = "Petit, Eric and De Oliveira Castro Herrero, Pablo and Menouer, Tarek and Krammer, Bettina and Jalby, William",
    	title = "{Computing-Kernels Performance Prediction using Dataflow Analysis and Microbenchmarking}",
    	booktitle = "{16th Workshop on Compilers for Parallel Computing (CPC 2012), Padova, Italy}",
    	year = "{2012}",
    	month = "{January}",
    	owner = "MOIS",
    	timestamp = "2012.11.08"
    }
    
  2. Marine Minier and Maria Naya-Plasencia. A Related Key Impossible Differential Attack Against 22 Rounds of the Lightweight Block Cipher LBlock. Information Processing Letters 112(16):624-629, 2012. BibTeX

    @article{MiNa12,
    	author = "Minier, Marine and Naya-Plasencia, Maria",
    	title = "{A Related Key Impossible Differential Attack Against 22 Rounds of the Lightweight Block Cipher LBlock}",
    	journal = "{Information Processing Letters}",
    	year = "{2012}",
    	volume = "{112}",
    	pages = "{624-629}",
    	number = "{16}",
    	owner = "MOIS",
    	timestamp = "2012.11.08"
    }
    
  3. Abdelhafid Mazouz. Une étude empirique des performances des applications OpenMP sur plateformes multi-coeurs. Université de Versailles, laboratoire Prism, 2012. BibTeX

    @phdthesis{Mazo12,
    	author = "Mazouz, Abdelhafid",
    	title = "{Une étude empirique des performances des applications OpenMP sur plateformes multi-coeurs}",
    	school = "{Université de Versailles, laboratoire Prism}",
    	year = "{2012}",
    	owner = "MOIS",
    	timestamp = "2012.12.07"
    }
    
  4. Aurèle Mahéo, Souad Kolai, Patrick Carribault, Marc Pérache and William Jalby. Adaptive OpenMP for Large NUMA Nodes. In IWOMP'12: Proceedings of the 8th International Conference on OpenMP in a Heterogeneous World. 2012. BibTeX

    @inproceedings{MKC12,
    	author = "Mahéo, Aurèle and Kolai, Souad and Carribault, Patrick and Pérache, Marc and Jalby, William",
    	title = "{Adaptive OpenMP for Large NUMA Nodes}",
    	booktitle = "{IWOMP'12: Proceedings of the 8th International Conference on OpenMP in a Heterogeneous World}",
    	year = "{2012}",
    	publisher = "{Springer-Verlag}",
    	owner = "MOIS",
    	timestamp = "2012.12.11"
    }
    
  5. Jeremy Jean, Maria Naya-Plasencia and Thomas Peyrin. Improved Rebound Attack on the Finalist Grostl. In Anne Canteaut (ed.). Fast Software Encryption - 19th International Workshop, FSE 2012, Washington, DC, USA, March 19-21, 2012, Revised Selected Papers. Lecture Notes in Computer Science 7549, Springer, 2012, 110-126. BibTeX

    @inproceedings{JNPe12,
    	author = "Jean, Jeremy and Naya-Plasencia, Maria and Peyrin, Thomas",
    	title = "{Improved Rebound Attack on the Finalist Grostl}",
    	booktitle = "{Fast Software Encryption - 19th International Workshop, FSE 2012, Washington, DC, USA, March 19-21, 2012}",
    	year = "{2012}",
    	editor = "Canteaut, Anne",
    	volume = "{7549}",
    	series = "{Lecture Notes in Computer Science}",
    	pages = "{110-126}",
    	publisher = "{Springer}",
    	note = "{Revised Selected Papers}",
    	owner = "MOIS",
    	timestamp = "2012.11.09"
    }
    
  6. William Jalby, David C Wong, David J Kuck, Jean-Thomas Acquaviva and Jean-Christophe Beyler. High-Performance Scientific Computing. Chapter Measuring Computer Performance, pages 75-95, Springer Verlag, 2012. BibTeX

    @inbook{JWK+12,
    	chapter = "Measuring Computer Performance",
    	pages = "{75-95}",
    	title = "{High-Performance Scientific Computing}",
    	publisher = "{Springer Verlag}",
    	year = "{2012}",
    	author = "Jalby, William and Wong, David C. and Kuck, David J. and Acquaviva, Jean-Thomas and Beyler, Jean-Christophe",
    	owner = "MOIS",
    	timestamp = "2012.03.21"
    }
    
  7. Julien Jaeger. Transformations source-à-source pour l'optimisation de codes irréguliers et multithreads. Université de Versailles Saint-Quentin en Yvelines, laboratoire PRISM, 2012. BibTeX

    @phdthesis{Jaeg12,
    	author = "Jaeger, Julien",
    	title = "{Transformations source-à-source pour l'optimisation de codes irréguliers et multithreads}",
    	school = "{Université de Versailles Saint-Quentin en Yvelines, laboratoire PRISM}",
    	year = "{2012}",
    	owner = "MOIS",
    	timestamp = "2012.07.16"
    }
    
  8. Peter Harrison, Sarah K Harrison, Naresh Patel and Soraya Zertal. Storage Workload Modelling by Hidden Markov Models: Application to Flash Memory. Performance Evaluation 69(1):17-40, 2012. BibTeX

    @article{HHPZ12,
    	author = "Harrison, Peter and Harrison, Sarah K. and Patel, Naresh and Zertal, Soraya",
    	title = "{Storage Workload Modelling by Hidden Markov Models: Application to Flash Memory}",
    	journal = "{Performance Evaluation}",
    	year = "{2012}",
    	volume = "{69}",
    	pages = "{17-40}",
    	number = "{1}",
    	owner = "MOIS",
    	timestamp = "2012.03.14"
    }
    
  9. Pablo De Oliveira Castro, Eric Petit, Jean Christophe Beyler and William Jalby. ASK: Adaptive Sampling Kit for Performance Characterization. In Euro-Par'12: Proceedings of the 18th International Conference on Parallel Processing. 2012. BibTeX

    @inproceedings{DPB+12,
    	author = "De Oliveira Castro, Pablo and Petit, Eric and Beyler, Jean Christophe and Jalby, William",
    	title = "{ASK: Adaptive Sampling Kit for Performance Characterization}",
    	booktitle = "{Euro-Par'12: Proceedings of the 18th International Conference on Parallel Processing}",
    	year = "{2012}",
    	publisher = "{Springer-Verlag}",
    	owner = "MOIS",
    	timestamp = "2012.12.11"
    }
    
  10. Florian Dang, Nahid Emad and Pierre Fiorini. Toward Reusable Numerical Library for Solving Hamilton-Jacobi-Bellman (HJB) Equations. In 7th International Workshop on Parallel Matrix Algorithms and Applications (PMAA 2012), 28-30 June 2012, Birkbeck University of London, UK. 2012. BibTeX

    @inproceedings{DEFi12,
    	author = "Dang, Florian and Emad, Nahid and Fiorini, Pierre",
    	title = "{Toward Reusable Numerical Library for Solving Hamilton-Jacobi-Bellman (HJB) Equations}",
    	booktitle = "{7th International Workshop on Parallel Matrix Algorithms and Applications (PMAA 2012), 28-30 June 2012, Birkbeck University of London, UK}",
    	year = "{2012}",
    	owner = "MOIS",
    	timestamp = "2012.12.11"
    }
    
  11. Makarem Dandouna. Librairies numériques réutilisables pour le calcul distribué à grande échelle. Université de Versailles-Saint-Quentin en Yvelines, laboratoire Prism, 2012. BibTeX

    @phdthesis{Dand12,
    	author = "Dandouna, Makarem",
    	title = "{Librairies numériques réutilisables pour le calcul distribué à grande échelle}",
    	school = "{Université de Versailles-Saint-Quentin en Yvelines, laboratoire Prism}",
    	year = "{2012}",
    	owner = "MOIS",
    	timestamp = "2012.12.03"
    }
    
  12. Andres Charif-Rubial. Analyse et optimisation de performances sur architectures multicoeurs. Université de Versailles Saint-Quentin en Yvelines, laboratoire Prism, 2012. BibTeX

    @phdthesis{CHAR12,
    	author = "Charif-Rubial, Andres",
    	title = "{Analyse et optimisation de performances sur architectures multicoeurs}",
    	school = "{Université de Versailles Saint-Quentin en Yvelines, laboratoire Prism}",
    	year = "{2012}",
    	owner = "MOIS",
    	timestamp = "2012.10.26"
    }
    
  13. Anne Canteaut and Maria Naya-Plasencia. Parity-Check Relations on Combination Generators. IEEE Transactions on Information Theory 58(6):3900-3911, 2012. BibTeX

    @article{CaNa12a,
    	author = "Canteaut, Anne and Naya-Plasencia, Maria",
    	title = "{Parity-Check Relations on Combination Generators}",
    	journal = "{IEEE Transactions on Information Theory}",
    	year = "{2012}",
    	volume = "{58}",
    	pages = "{3900-3911}",
    	number = "{6}",
    	owner = "MOIS",
    	timestamp = "2012.11.08"
    }
    
  14. Anne Canteaut and Maria Naya-Plasencia. Correlation Attacks on Combination Generators. Cryptography and Communications 4(3-4):147-171, 2012. BibTeX

    @article{CaNa12,
    	author = "Canteaut, Anne and Naya-Plasencia, Maria",
    	title = "{Correlation Attacks on Combination Generators}",
    	journal = "{Cryptography and Communications}",
    	year = "{2012}",
    	volume = "{4}",
    	number = "{3-4}",
    	pages = "{147-171}",
    	owner = "MOIS",
    	timestamp = "2012.11.08"
    }
    
  15. Anne Canteaut, Thomas Fuhr, Maria Naya-Plasencia, Pascal Paillier, Jean-Rene Reinhard and Marion Videau. A Unified Indifferentiability Proof for Permutation or Block Cipher-Based Hash Functions. IACR Cryptology ePrint Archive 2012, page 363, 2012. BibTeX

    @article{CFN+12,
    	author = "Canteaut, Anne and Fuhr, Thomas and Naya-Plasencia, Maria and Paillier, Pascal and Reinhard, Jean-Rene and Videau, Marion",
    	title = "{A Unified Indifferentiability Proof for Permutation or Block Cipher-Based Hash Functions}",
    	journal = "{IACR Cryptology ePrint Archive 2012}",
    	year = "{2012}",
    	pages = "{363}",
    	owner = "MOIS",
    	timestamp = "2012.11.09"
    }
    
  16. Nicolas Benoit. Etude des compilateurs "backend" spécifiques pour les processeurs dédiés des systèmes multiprocesseurs sur puce. Université de Versailles, laboratoire Prism, 2012. BibTeX

    @phdthesis{Beno12,
    	author = "Benoit, Nicolas",
    	title = {{Etude des compilateurs "backend" spécifiques pour les processeurs dédiés des systèmes multiprocesseurs sur puce}},
    	school = "{Université de Versailles, laboratoire Prism}",
    	year = "{2012}",
    	owner = "MOIS",
    	timestamp = "2012.12.03"
    }
    

  1. Tipp Moseley, Neil Vachharajani and William Jalby. Hardware Performance Monitoring for the Rest of Us: A Position and Survey. In NPC'11: Proceedings of the 8th IFIP International Conference on Network and Parallel Computing. 2011, 293-312. BibTeX

    @inproceedings{MVJa11,
    	author = "Moseley, Tipp and Vachharajani, Neil and Jalby, William",
    	title = "{Hardware Performance Monitoring for the Rest of Us: A Position and Survey}",
    	booktitle = "{NPC'11: Proceedings of the 8th IFIP International Conference on Network and Parallel Computing}",
    	year = "{2011}",
    	pages = "{293-312}",
    	address = "{The Centre for Graduate Studies of the Universiti Teknologi PETRONAS, Tronoh, Perak, Malaysia}",
    	month = "{September 19-20}",
    	publisher = "{Springer-Verlag}",
    	owner = "MOIS",
    	timestamp = "2011.11.24"
    }
    
  2. Samir Ammenouche. Etude de l'interaction bas niveau entre le parallélisme d'instructions et les caches. Université de Versailles Saint-Quentin en Yvelines, UFR des Sciences, bâtiment Descartes, Laboratoire PRISM, 45 avenue des Etats-Unis, 78035 Versailles cedex, 2011. BibTeX

    @phdthesis{Amme11,
    	author = "Ammenouche, Samir",
    	title = "{Etude de l'interaction bas niveau entre le parallélisme d'instructions et les caches}",
    	year = "{2011}",
    	school = "{Université de Versailles Saint-Quentin en Yvelines, UFR des Sciences, Laboratoire PRISM}",
    	address = "{Bâtiment Descartes, 45 avenue des Etats-Unis, 78035 Versailles cedex}",
    	month = "{March}",
    	owner = "MOIS",
    	timestamp = "2012.02.23"
    }
    

  1. Nahid Emad, Olivier Delannoy and Makarem Dandouna. A Design Approach for Numerical Libraries in Large Scale Distributed Systems. In The eighth ACS/IEEE International Conference on Computer Systems and Applications 2010. May 2010, 1-9. BibTeX

    @inproceedings{EDMa10,
    	author = "Emad, Nahid and Delannoy, Olivier and Dandouna, Makarem",
    	title = "{A Design Approach for Numerical Libraries in Large Scale Distributed Systems}",
    	booktitle = "{The eighth ACS/IEEE International Conference on Computer Systems and Applications 2010}",
    	year = "{2010}",
    	pages = "1-9",
    	address = "Hammamet, Tunisia",
    	month = "May",
    	owner = "MOIS",
    	timestamp = "2012.02.10"
    }
    
  2. Stephane Zuckerman and William Jalby. Tackling Cache-Line Stealing Effects Using Run-Time Adaptation. In LCPC'10: Proceedings of the 23rd International Conference on Languages and Compilers for Parallel Computing. 2010, 62-76. BibTeX

    @inproceedings{ZuJa10,
    	author = "Zuckerman, Stephane and Jalby, William",
    	title = "{Tackling Cache-Line Stealing Effects Using Run-Time Adaptation}",
    	booktitle = "{LCPC'10: Proceedings of the 23rd International Conference on Languages and Compilers for Parallel Computing}",
    	year = "{2010}",
    	pages = "{62-76}",
    	publisher = "{Springer-Verlag}",
    	owner = "MOIS",
    	timestamp = "2011.07.28"
    }
    
  3. Sid-Ahmed-Ali Touati, Julien Worms and Sebastien Briais. The Speedup Test. 2010. Software is included with the document: it implements the Speedup-Test protocol.
    Abstract Numerous code optimisation methods are usually experimented by doing multiple observations of the initial and the optimised executions times in order to declare a speedup. Even with fixed input and execution environment, programs executions times vary in general. So hence different kinds of speedups may be reported: the speedup of the average execution time, the speedup of the minimal execution time, the speedup of the median, etc. Many published speedups in the literature are observations of a set of experiments. In order to improve the reproducibility of the experimental results, this technical report presents a rigorous statistical methodology regarding program performance analysis. We rely on well known statistical tests (Shapiro-Wilk's test, Fisher's F-test, Student's t-test, Kolmogorov-Smirnov's test, Wilcoxon-Mann-Whitney's test) to study if the observed speedups are statistically significant or not. By fixing $0\frac{1}{2}$, the probability that an individual execution of the optimised code is faster than the individual execution of the initial code. Our methodology defines a consistent improvement compared to the usual performance analysis method in high performance computing as in \cite{Jain:1991:ACS,lilja:book}. We explain in each situation what are the hypotheses that must be checked to declare a correct risk level for the statistics. The Speedup-Test protocol certifying the observed speedups with rigorous statistics is implemented and distributed as an open source tool based on R software. URL BibTeX

    @techreport{TWBr10,
    	author = "Touati, Sid-Ahmed-Ali and Worms, Julien and Briais, Sebastien",
    	title = "{The Speedup Test}",
    	year = "{2010}",
    	note = "{Software is included with the document: it implements the Speedup-Test protocol.}",
    	abstract = "{Numerous code optimisation methods are usually experimented by doing multiple observations of the initial and the optimised executions times in order to declare a speedup. Even with fixed input and execution environment, programs executions times vary in general. So hence different kinds of speedups may be reported: the speedup of the average execution time, the speedup of the minimal execution time, the speedup of the median, etc. Many published speedups in the literature are observations of a set of experiments. In order to improve the reproducibility of the experimental results, this technical report presents a rigorous statistical methodology regarding program performance analysis. We rely on well known statistical tests (Shapiro-Wilk's test, Fisher's F-test, Student's t-test, Kolmogorov-Smirnov's test, Wilcoxon-Mann-Whitney's test) to study if the observed speedups are statistically significant or not. By fixing $0\frac{1}{2}$, the probability that an individual execution of the optimised code is faster than the individual execution of the initial code. Our methodology defines a consistent improvement compared to the usual performance analysis method in high performance computing as in \cite{Jain:1991:ACS,lilja:book}. We explain in each situation what are the hypotheses that must be checked to declare a correct risk level for the statistics. The Speedup-Test protocol certifying the observed speedups with rigorous statistics is implemented and distributed as an open source tool based on R software.}",
    	affiliation = "Parall{\'e}lisme, R{\'e}seaux, Syst{\`e}mes d'information, Mod{\'e}lisation - PRISM - CNRS : UMR8144 - Universit{\'e} de Versailles-Saint Quentin en Yvelines - ALCHEMY - INRIA Saclay - Ile de France - INRIA - CNRS : UMR8623 - Universit{\'e} Paris Sud - Paris XI - Laboratoire de Math{\'e}matiques de Versailles - LM-Versailles - CNRS : UMR8100 - Universit{\'e} de Versailles-Saint Quentin en Yvelines",
    	file = "SpeedupTestDocument.pdf:http\://hal.inria.fr/inria-00443839/PDF/SpeedupTestDocument.pdf:PDF",
    	hal_id = "inria-00443839",
    	keywords = "Code optimisation, program performance evaluation and analysis, statistics",
    	language = "English",
    	owner = "MOIS",
    	timestamp = "2011.07.25",
    	url = "http://hal.inria.fr/inria-00443839/en/"
    }
    
  4. Claude Timsit. Du transistor à l'ordinateur. Hermann, 2010. BibTeX

    @book{Tims10,
    	title = "{Du transistor à l'ordinateur}",
    	publisher = "{Hermann}",
    	year = "{2010}",
    	author = "Timsit, Claude",
    	owner = "MOIS",
    	timestamp = "2012.01.31"
    }
    
  5. Claude Timsit and Soraya Zertal. Using Spreadsheets to Teach Computer Architecture. In International Conference on Computer Supported Education (CSEDU). 2010. BibTeX

    @inproceedings{TiZe10,
    	author = "Timsit, Claude and Zertal, Soraya",
    	title = "{Using Spreadsheets to Teach Computer Architecture}",
    	booktitle = "{International Conference on Computer Supported Education (CSEDU)}",
    	year = "{2010}",
    	owner = "MOIS",
    	timestamp = "2011.11.04"
    }
    
  6. Abdelhafid Mazouz, Sid-Ahmed-Ali Touati and Denis Barthou. Measuring and Analysing the Variations of Program Execution Times on Multicore Platforms: Case Study. 2010.
    Abstract The recent growth in the number of processing units in today's multicore processor architectures enables multiple threads to execute simultaneously achieving better performances by exploiting thread level parallelism. With the architectural complexity of these new state of the art designs, comes a need to better understand the interactions between the operating system layers, the applications and the underlying hardware platforms. The ability to characterise and to quantify those interactions can be useful in the processes of performance evaluation and analysis, compiler optimisations and operating system job scheduling allowing to achieve better performance stability, reproducibility and predictability. We consider in our study performances instability as variations in program execution times. While these variations are statistically insignificant for large sequential applications, we observe that parallel native OpenMP programs have less performance stability. Understanding the performance instability in current multicore architectures is even more complicated by the variety of factors and sources influencing the applications performances. URL BibTeX

    @techreport{MTBa10a,
    	author = "Mazouz, Abdelhafid and Touati, Sid-Ahmed-Ali and Barthou, Denis",
    	title = "{Measuring and Analysing the Variations of Program Execution Times on Multicore Platforms: Case Study}",
    	year = "{2010}",
    	month = "{September}",
    	abstract = "{The recent growth in the number of processing units in today's multicore processor architectures enables multiple threads to execute simultaneously achieving better performances by exploiting thread level parallelism. With the architectural complexity of these new state of the art designs, comes a need to better understand the interactions between the operating system layers, the applications and the underlying hardware platforms. The ability to characterise and to quantify those interactions can be useful in the processes of performance evaluation and analysis, compiler optimisations and operating system job scheduling allowing to achieve better performance stability, reproducibility and predictability. We consider in our study performances instability as variations in program execution times. While these variations are statistically insignificant for large sequential applications, we observe that parallel native OpenMP programs have less performance stability. Understanding the performance instability in current multicore architectures is even more complicated by the variety of factors and sources influencing the applications performances.}",
    	affiliation = "Parall{\'e}lisme, R{\'e}seaux, Syst{\`e}mes d'information, Mod{\'e}lisation - PRISM - CNRS : UMR8144 - Universit{\'e} de Versailles-Saint Quentin en Yvelines - ALCHEMY - INRIA Saclay - Ile de France - INRIA - CNRS : UMR8623 - Universit{\'e} Paris Sud - Paris XI - Laboratoire Bordelais de Recherche en Informatique - LaBRI - CNRS : UMR5800 - Universit{\'e} Sciences et Technologies - Bordeaux I - Ecole Nationale Sup{\'e}rieure d'Electronique, Informatique et Radiocommunications de Bordeaux - Universit{\'e} Victor Segalen - Bordeaux II",
    	file = "VarExecTime.pdf:http\://hal.inria.fr/inria-00514548/PDF/VarExecTime.pdf:PDF",
    	hal_id = "inria-00514548",
    	keywords = "OpenMP, Multicore, Parallelism, Performance evaluation",
    	language = "English",
    	owner = "MOIS",
    	pages = 36,
    	timestamp = "2011.07.25",
    	url = "http://hal.inria.fr/inria-00514548/en/"
    }
    
  7. Abdelhafid Mazouz, Sid-Ahmed-Ali Touati and Denis Barthou. Study of Variations of Native Program Execution Times on Multi-core Architectures. In International IEEE Conference on Complex, Intelligent and Software Intensive Systems. 2010, 919-924. URL BibTeX

    @inproceedings{MTBa10,
    	author = "Mazouz, Abdelhafid and Touati, Sid-Ahmed-Ali and Barthou, Denis",
    	title = "{Study of Variations of Native Program Execution Times on Multi-core Architectures}",
    	booktitle = "{International IEEE Conference on Complex, Intelligent and Software Intensive Systems}",
    	year = "{2010}",
    	pages = "{919-924}",
    	address = "{Krakow, Poland}",
    	month = "{February}",
    	affiliation = "Parall{\'e}lisme, R{\'e}seaux, Syst{\`e}mes d'information, Mod{\'e}lisation - PRISM - CNRS : UMR8144 - Universit{\'e} de Versailles-Saint Quentin en Yvelines - ALCHEMY - INRIA Saclay - Ile de France - INRIA - CNRS : UMR8623 - Universit{\'e} Paris Sud - Paris XI",
    	audience = "international",
    	hal_id = "hal-00551581",
    	language = "English",
    	owner = "MOIS",
    	timestamp = "2011.07.25",
    	url = "http://hal.archives-ouvertes.fr/hal-00551581/en/"
    }
    
  8. Yuanjie Huang, Liang Peng, Chengyong Wu, Yury Kashnikov, Jorn Rennecke and Grigori Fursin. Transforming GCC into a research-friendly environment: plugins for optimization tuning and reordering, function cloning and program instrumentation. In 2nd International Workshop on GCC Research Opportunities (GROW'10). 2010. Google Summer of Code'09.
    Abstract Computer scientists are always eager to have a powerful, robust and stable compiler infrastructure. However, until recently, researchers had to either use available and often unstable research compilers, create new ones from scratch, try to hack open-source non-research compilers or use source to source tools. It often requires duplication of a large amount of functionality available in current production compilers while making questionable the practicality of the obtained research results. The Interactive Compilation Interface (ICI) has been introduced to avoid such time-consuming replication and transform popular, production compilers such as GCC into research toolsets by providing an ability to access, modify and extend GCC's internal functionality through a compiler-dependent hook and clear compiler-independent API with external portable plugins without interrupting the natural evolution of a compiler. In this paper, we describe our recent extensions to GCC and ICI with the preliminary experimental data to support selection and reordering of optimization passes with a dependency grammar, control of individual transformations and their parameters, generic function cloning and program instrumentation. We are synchronizing these developments implemented during Google Summer of Code'09 program with the mainline GCC 4.5 and its native low-level plugin system. These extensions are intended to enable and popularize the use of GCC for realistic research on empirical iterative feedback-directed compilation, statistical collective optimization, run-time adaptation and development of intelligent self-tuning computing systems among other important topics. Such research infrastructure should help researchers prototype and validate their ideas quickly in realistic, production environments while keeping portability of their research plugins across different releases of a compiler. 
Moreover, it should also allow to move successful ideas back to GCC much faster thus helping to improve, modularize and clean it up. Furthermore, we are porting GCC with ICI extensions for performance/power auto-tuning for data centers and cloud computing systems with heterogeneous architectures or for continuous whole system optimization. URL BibTeX

    @inproceedings{HPW+10,
    	author = "Huang, Yuanjie and Peng, Liang and Wu, Chengyong and Kashnikov, Yury and Rennecke, Jorn and Fursin, Grigori",
    	title = "{Transforming GCC into a research-friendly environment: plugins for optimization tuning and reordering, function cloning and program instrumentation}",
    	booktitle = "{2nd International Workshop on GCC Research Opportunities (GROW'10)}",
    	year = "{2010}",
    	address = "{Pisa, Italy}",
    	month = "{January}",
    	note = "{Google Summer of Code'09}",
    	abstract = "{Computer scientists are always eager to have a powerful, robust and stable compiler infrastructure. However, until recently, researchers had to either use available and often unstable research compilers, create new ones from scratch, try to hack open-source non-research compilers or use source to source tools. It often requires duplication of a large amount of functionality available in current production compilers while making questionable the practicality of the obtained research results. The Interactive Compilation Interface (ICI) has been introduced to avoid such time-consuming replication and transform popular, production compilers such as GCC into research toolsets by providing an ability to access, modify and extend GCC's internal functionality through a compiler-dependent hook and clear compiler-independent API with external portable plugins without interrupting the natural evolution of a compiler. In this paper, we describe our recent extensions to GCC and ICI with the preliminary experimental data to support selection and reordering of optimization passes with a dependency grammar, control of individual transformations and their parameters, generic function cloning and program instrumentation. We are synchronizing these developments implemented during Google Summer of Code'09 program with the mainline GCC 4.5 and its native low-level plugin system. These extensions are intended to enable and popularize the use of GCC for realistic research on empirical iterative feedback-directed compilation, statistical collective optimization, run-time adaptation and development of intelligent self-tuning computing systems among other important topics. Such research infrastructure should help researchers prototype and validate their ideas quickly in realistic, production environments while keeping portability of their research plugins across different releases of a compiler. 
Moreover, it should also allow to move successful ideas back to GCC much faster thus helping to improve, modularize and clean it up. Furthermore, we are porting GCC with ICI extensions for performance/power auto-tuning for data centers and cloud computing systems with heterogeneous architectures or for continuous whole system optimization.}",
    	affiliation = "Institute of Computing Technology - Chinese Academy of Science - ICT - Chinese Academy of Science (CAS) - Parall{\'e}lisme, R{\'e}seaux, Syst{\`e}mes d'information, Mod{\'e}lisation - PRISM - CNRS : UMR8144 - Universit{\'e} de Versailles-Saint Quentin en Yvelines - ALCHEMY - INRIA Saclay - Ile de France - INRIA - CNRS : UMR8623 - Universit{\'e} Paris Sud - Paris XI",
    	audience = "international",
    	file = "hpwp2010.pdf:http\://hal.inria.fr/inria-00451106/PDF/hpwp2010.pdf:PDF",
    	hal_id = "inria-00451106",
    	language = "English",
    	owner = "MOIS",
    	timestamp = "2011.07.25",
    	url = "http://hal.inria.fr/inria-00451106/en/"
    }
    
  9. Nahid Emad, Olivier Delannoy and Makarem Dandouna. Numerical Library Reuse in Parallel and Distributed Platforms. In VECPAR 2010. 2010, 271-278. BibTeX

    @inproceedings{EDDa10,
    	author = "Emad, Nahid and Delannoy, Olivier and Dandouna, Makarem",
    	title = "{Numerical Library Reuse in Parallel and Distributed Platforms}",
    	booktitle = "{VECPAR 2010}",
    	year = "{2010}",
    	pages = "{271-278}",
    	owner = "MOIS",
    	timestamp = "2011.08.18"
    }
    
  10. Majed Chatti, S Yahia, Claude Timsit and Soraya Zertal. A Hypercube-Based NoC Routing Algorithm for Efficient All-to-All Communications in Embedded Image and Signal Processing Applications. In IEEE Conference on High Performance Computing and Simulation (HPCS). 2010. Best Short Paper Award. BibTeX

    @inproceedings{CYTZ10,
    	author = "Chatti, Majed and Yahia, S. and Timsit, Claude and Zertal, Soraya",
    	title = "{A Hypercube-Based NoC Routing Algorithm for Efficient All-to-All Communications in Embedded Image and Signal Processing Applications}",
    	booktitle = "{IEEE Conference on High Performance Computing and Simulation (HPCS)}",
    	year = "{2010}",
    	address = "{Caen}",
    	note = "{Best Short Paper Award}",
    	owner = "MOIS",
    	timestamp = "2011.11.04"
    }
    
  11. Sebastien Briais, Sid-Ahmed-Ali Touati and Karine Deschinkel. Ensuring Lexicographic-Positive Data Dependence Graphs in the SIRA Framework. 2010.
    Abstract Usual cyclic scheduling problems, such as software pipelining, deal with precedence constraints having non-negative latencies. This seems a natural way for modelling scheduling problems, since instructions delays are generally non-negative quantities. However, in some cases, we need to consider edges latencies that do not only model instructions latencies, but model other precedence constraints. For instance in register optimisation problems, a generic machine model can allow considering access delays into/from registers (VLIW, EPIC, DSP). In this case, edge latencies may be non-positive leading to a difficult scheduling problem in presence of resources constraints. This research report studies the problem of cyclic instruction scheduling with register requirement minimisation (without resources constraints). We show that pre-conditioning a data dependence graph (DDG) to satisfy register constraints before software pipelining under resources constraints may create cycles with non-positive distances, resulted from the acceptance of non-positive edges latencies. Such DDG is called non lexicographic positive because it does not define a topological sort between the instructions instances: in other words, its full unrolling does not define an acyclic graph. As a compiler construction strategy, we cannot allow the creation of cycles with non-positive distances during the compilation flow, because non lexicographic positive DDG does not guarantee the existence of a valid instruction schedule under resource constraints. This research report examines two strategies to avoid the creation of these problematic DDG cycles. A first strategy is reactive, it tolerates the creation of non-positive cycles in a first step, and if detected in a further check step, makes a backtrack to eliminate them. A second strategy is proactive, it prevents the creation of non-positive cycles in the DDG during the register minimisation process. 
Our extensive experiments on FFMPEG, MEDIABENCH, SPEC2000 and SPEC2006 benchmarks show that the reactive strategy is faster and works well in practice, but may require more registers than the proactive strategy. Consequently, the reactive strategy is a suitable working solution for compilation if the number of available architectural registers is already fixed and register minimisation is not necessary (just consume less registers than the available capacity). However, the proactive strategy, while more time consuming, is a better alternative for register requirement minimisation: this may be the case when dealing with reconfigurable architectures, i.e. when the number of available architectural registers is defined posterior to the compilation of the application. URL BibTeX

    @techreport{BTDe10,
    	author = "Briais, Sebastien and Touati, Sid-Ahmed-Ali and Deschinkel, Karine",
    	title = "{Ensuring Lexicographic-Positive Data Dependence Graphs in the SIRA Framework}",
    	year = "{2010}",
    	month = "",
    	abstract = "{Usual cyclic scheduling problems, such as software pipelining, deal with precedence constraints having non-negative latencies. This seems a natural way for modelling scheduling problems, since instructions delays are generally non-negative quantities. However, in some cases, we need to consider edges latencies that do not only model instructions latencies, but model other precedence constraints. For instance in register optimisation problems, a generic machine model can allow considering access delays into/from registers (VLIW, EPIC, DSP). In this case, edge latencies may be non-positive leading to a difficult scheduling problem in presence of resources constraints. This research report studies the problem of cyclic instruction scheduling with register requirement minimisation (without resources constraints). We show that pre-conditioning a data dependence graph (DDG) to satisfy register constraints before software pipelining under resources constraints may create cycles with non-positive distances, resulted from the acceptance of non-positive edges latencies. Such DDG is called {\it non lexicographic positive} because it does not define a topological sort between the instructions instances: in other words, its full unrolling does not define an acyclic graph. As a compiler construction strategy, we cannot allow the creation of cycles with non-positive distances during the compilation flow, because non lexicographic positive DDG does not guarantee the existence of a valid instruction schedule under resource constraints. This research report examines two strategies to avoid the creation of these problematic DDG cycles. A first strategy is reactive, it tolerates the creation of non-positive cycles in a first step, and if detected in a further check step, makes a backtrack to eliminate them. A second strategy is proactive, it prevents the creation of non-positive cycles in the DDG during the register minimisation process. 
Our extensive experiments on FFMPEG, MEDIABENCH, SPEC2000 and SPEC2006 benchmarks show that the reactive strategy is faster and works well in practice, but may require more registers than the proactive strategy. Consequently, the reactive strategy is a suitable working solution for compilation if the number of available architectural registers is already fixed and register minimisation is not necessary (just consume less registers than the available capacity). However, the proactive strategy, while more time consuming, is a better alternative for register requirement minimisation: this may be the case when dealing with reconfigurable architectures, i.e. when the number of available architectural registers is defined posterior to the compilation of the application.}",
    	affiliation = "Parall{\'e}lisme, R{\'e}seaux, Syst{\`e}mes d'information, Mod{\'e}lisation - PRISM - CNRS : UMR8144 - Universit{\'e} de Versailles-Saint Quentin en Yvelines - ALCHEMY - INRIA Saclay - Ile de France - INRIA - CNRS : UMR8623 - Universit{\'e} Paris Sud - Paris XI - Laboratoire d'Informatique de Franche-Comt{\'e} - LIFC - Universit{\'e} de Franche-Comt{\'e} : EA4269",
    	collaboration = "PRiSM-INRIA",
    	file = "_negcycle.pdf:http\://hal.inria.fr/inria-00452695/PDF/main\\_report\\_negcycle.pdf:PDF",
    	hal_id = "inria-00452695",
    	keywords = "Compilation, Code optimisation, Register pressure, Cyclic instruction scheduling, Instruction level parallelism",
    	language = "Anglais",
    	owner = "MOIS",
    	timestamp = "2011.07.25",
    	url = "http://hal.inria.fr/inria-00452695/en/"
    }
    
  12. Frederic Brault, Benoit Dupont-De-Dinechin, Sid-Ahmed-Ali Touati and Albert Cohen. Software Pipelining and Register Pressure in VLIW Architectures: Preconditionning Data Dependence Graphs is Experimentally Better Than Lifetime-Sensitive Scheduling. In 8th Workshop on Optimizations for DSP and Embedded Systems (ODES'10). 2010.
    Abstract Embedding register-pressure control in software pipelining heuristics is the dominant approach in modern back-end compilers. However, aggressive attempts at combining resource and register constraints in software pipelining have failed to scale to real-life loops, leaving weaker heuristics as the only practical solutions. We propose a decoupled approach where register pressure is controlled before scheduling, and evaluate its effectiveness in combination with three representative software pipelining algorithms. We present conclusive experiments in a production compiler on a wealth of media processing and general purpose benchmarks. URL BibTeX

    @inproceedings{BDB+10,
    	author = "Brault, Frederic and Dupont-De-Dinechin, Benoit and Touati, Sid-Ahmed-Ali and Cohen, Albert",
    	title = "{Software Pipelining and Register Pressure in VLIW Architectures: Preconditionning Data Dependence Graphs is Experimentally Better Than Lifetime-Sensitive Scheduling}",
    	booktitle = "{8th Workshop on Optimizations for DSP and Embedded Systems (ODES'10)}",
    	year = "{2010}",
    	address = "{Toronto, Canada}",
    	month = "{April}",
    	abstract = "{Embedding register-pressure control in software pipelining heuristics is the dominant approach in modern back-end compilers. However, aggressive attempts at combining resource and register constraints in software pipelining have failed to scale to real-life loops, leaving weaker heuristics as the only practical solutions. We propose a decoupled approach where register pressure is controlled before scheduling, and evaluate its effectiveness in combination with three representative software pipelining algorithms. We present conclusive experiments in a production compiler on a wealth of media processing and general purpose benchmarks.}",
    	affiliation = "ALCHEMY - INRIA Saclay - Ile de France - INRIA - CNRS : UMR8623 - Universit{\'e} Paris Sud - Paris XI - Kalray - Kalray - Parall{\'e}lisme, R{\'e}seaux, Syst{\`e}mes d'information, Mod{\'e}lisation - PRISM - CNRS : UMR8144 - Universit{\'e} de Versailles-Saint Quentin en Yvelines",
    	audience = "internationale",
    	file = "SubmitODES2010.pdf:http\://hal.inria.fr/inria-00551515/PDF/SubmitODES2010.pdf:PDF",
    	hal_id = "inria-00551515",
    	language = "Anglais",
    	owner = "MOIS",
    	timestamp = "2011.07.25",
    	url = "http://hal.inria.fr/inria-00551515/en/"
    }
    
  13. Marouane Belaoucha, Denis Barthou, Adrien Eliche and Sid-Ahmed-Ali Touati. FADAlib: an open source C++ library for fuzzy array dataflow analysis. In Procedia Computer Science. 2010, 2075-2084. URL BibTeX

    @inproceedings{BBET10,
    	author = "Belaoucha, Marouane and Barthou, Denis and Eliche, Adrien and Touati, Sid-Ahmed-Ali",
    	title = "{FADAlib: an open source C++ library for fuzzy array dataflow analysis}",
    	booktitle = "{Procedia Computer Science}",
    	year = "{2010}",
    	pages = "{2075-2084}",
    	address = "{Amsterdam, Pays-Bas}",
    	month = "{May}",
    	affiliation = "Parall{\'e}lisme, R{\'e}seaux, Syst{\`e}mes d'information, Mod{\'e}lisation - PRISM - CNRS : UMR8144 - Universit{\'e} de Versailles-Saint Quentin en Yvelines - ALCHEMY - INRIA Saclay - Ile de France - INRIA - CNRS : UMR8623 - Universit{\'e} Paris Sud - Paris XI",
    	audience = "internationale",
    	hal_id = "hal-00551673",
    	language = "Anglais",
    	owner = "MOIS",
    	timestamp = "2011.07.25",
    	url = "http://hal.archives-ouvertes.fr/hal-00551673/en/"
    }
    
  14. Mounira Bachir. Minimisation du facteur de déroulage de boucle dans une allocation périodique de registres. Université de Versailles Saint-Quentin en Yvelines UFR des Sciences - bâtiment Descartes Laboratoire Prism - salle 301 45 avenue des Etats-Unis 78035 Versailles cedex, 2010. BibTeX

    @phdthesis{Bach10,
    	author = "Bachir, Mounira",
    	title = "{Minimisation du facteur de déroulage de boucle dans une allocation périodique de registres}",
    	year = "{2010}",
    	address = "Université de Versailles Saint-Quentin en Yvelines UFR des Sciences - bâtiment Descartes Laboratoire Prism - salle 301 45 avenue des Etats-Unis 78035 Versailles cedex",
    	owner = "MOIS",
    	timestamp = "2012.02.22"
    }
    
  15. Jean-Christian Angles D'Auriac, Denis Barthou, Damir Becirevic, Rene Bilhaut, François Bodin, Philippe Boucaud, Olivier Brand-Foissac, Jaume Carbonell, Christine Eisenbeis, P Gallard, Gilbert Grosdidier, P Guichon, P F Honore, G Le Meur, P Pene, L Rilling, P Roudeau, André Seznec and A Stocchi. Towards the Petaflop for Lattice QCD Simulations the PetaQCD Project. In J Gruntorad and M Lokajicek (eds.). Journal of Physics Conference Series 219. 2010, 052021.
    Abstract The study and design of a very ambitious petaflop cluster exclusively dedicated to Lattice QCD simulations started in early '08 among a consortium of 7 laboratories (IN2P3, CNRS, INRIA, CEA) and 2 SMEs. This consortium received a grant from the French ANR agency in July '08, and the PetaQCD project kickoff took place in January '09. Building upon several years of fruitful collaborative studies in this area, the aim of this project is to demonstrate that the simulation of a 256 x 128^3 lattice can be achieved through the HMC/ETMC software, using a machine with efficient speed/cost/reliability/power consumption ratios. It is expected that this machine can be built out of a rather limited number of processors (e.g. between 1000 and 4000), although capable of a sustained petaflop CPU performance. The proof-of-concept should be a mock-up cluster built as much as possible with off-the-shelf components, and 2 particularly attractive axis will be mainly investigated, in addition to fast all-purpose multi-core processors: the use of the new brand of IBM-Cell processors (with on-chip accelerators) and the very recent Nvidia GP-GPUs (off-chip co-processors). This cluster will obviously be massively parallel, and heterogeneous. Communication issues between processors, implied by the Physics of the simulation and the lattice partitioning, will certainly be a major key to the project. URL, DOI BibTeX

    @inproceedings{ABB+10,
    	author = "Angles D'Auriac, Jean-Christian and Barthou, Denis and Becirevic, Damir and Bilhaut, Rene and Bodin, François and Boucaud, Philippe and Brand-Foissac, Olivier and Carbonell, Jaume and Eisenbeis, Christine and Gallard, P. and Grosdidier, Gilbert and Guichon, P. and Honore, P.F. and Le Meur, G. and Pene, P. and Rilling, L. and Roudeau, P. and Seznec, André and Stocchi, A.",
    	title = "{Towards the Petaflop for Lattice QCD Simulations the PetaQCD Project}",
    	booktitle = "{Journal of Physics Conference Series}",
    	year = "{2010}",
    	editor = "Gruntorad, J. and Lokajicek, M.",
    	volume = 219,
    	pages = 052021,
    	address = "Prague, Czech Republic",
    	publisher = "IOP Publishing",
    	abstract = "{The study and design of a very ambitious petaflop cluster exclusively dedicated to Lattice QCD simulations started in early '08 among a consortium of 7 laboratories (IN2P3, CNRS, INRIA, CEA) and 2 SMEs. This consortium received a grant from the French ANR agency in July '08, and the PetaQCD project kickoff took place in January '09. Building upon several years of fruitful collaborative studies in this area, the aim of this project is to demonstrate that the simulation of a 256 x 128^3 lattice can be achieved through the HMC/ETMC software, using a machine with efficient speed/cost/reliability/power consumption ratios. It is expected that this machine can be built out of a rather limited number of processors (e.g. between 1000 and 4000), although capable of a sustained petaflop CPU performance. The proof-of-concept should be a mock-up cluster built as much as possible with off-the-shelf components, and 2 particularly attractive axis will be mainly investigated, in addition to fast all-purpose multi-core processors: the use of the new brand of IBM-Cell processors (with on-chip accelerators) and the very recent Nvidia GP-GPUs (off-chip co-processors). This cluster will obviously be massively parallel, and heterogeneous. Communication issues between processors, implied by the Physics of the simulation and the lattice partitioning, will certainly be a major key to the project.}",
    	affiliation = "Laboratoire de Physique Subatomique et de Cosmologie - LPSC - CNRS : UMR5821 - IN2P3 - Universit{\'e} Joseph Fourier - Grenoble I - Institut Polytechnique de Grenoble - Parall{\'e}lisme, R{\'e}seaux, Syst{\`e}mes d'information, Mod{\'e}lisation - PRISM - CNRS : UMR8144 - Universit{\'e} de Versailles-Saint Quentin en Yvelines - Laboratoire de Physique Th{\'e}orique d'Orsay - LPT - CNRS : UMR8627 - Universit{\'e} Paris Sud - Paris XI - Laboratoire de l'Acc{\'e}l{\'e}rateur Lin{\'e}aire - LAL - CNRS : UMR8607 - IN2P3 - Universit{\'e} Paris Sud - Paris XI - Institut de Recherches sur les lois Fondamentales de l'Univers (ex DAPNIA) - IRFU - CEA : DSM/IRFU - ALF - INRIA - IRISA - INRIA - Universit{\'e} de Rennes I",
    	audience = "internationale",
    	doi = "10.1088/1742-6596/219/5/052021",
    	hal_id = "in2p3-00380246",
    	owner = "MOIS",
    	timestamp = "2011.07.25",
    	url = "http://hal.in2p3.fr/in2p3-00380246/en/"
    }
    

  1. Soraya Zertal, Claude Timsit and Majed Chatti. Communication / Synchronisation Mechanism for Multiprocessor on Chip Architectures. In IEEE Workshop on Performance Evaluation of Communications in Distributed systems and Web Based Service Architectures. 2009. BibTeX

    @inproceedings{ZTCh09,
    	author = "Zertal, Soraya and Timsit, Claude and Chatti, Majed",
    	title = "{Communication / Synchronisation Mechanism for Multiprocessor on Chip Architectures}",
    	booktitle = "{IEEE Workshop on Performance Evaluation of Communications in Distributed systems and Web Based Service Architectures}",
    	year = "{2009}",
    	owner = "MOIS",
    	timestamp = "2011.10.19"
    }
    
  2. Sid Touati. Towards a Statistical Methodology to Evaluate Program Speedups and their Optimisation Techniques. 2009, 12 pages.
    Abstract The community of program optimisation and analysis, code performance evaluation, parallelisation and optimising compilation has published since many decades hundreds of research and engineering articles in major conferences and journals. These articles study efficient algorithms, strategies and techniques to accelerate programs execution times, or optimise other performance metrics (MIPS, code size, energy/power, MFLOPS, etc.). Many speedups are published, but nobody is able to reproduce them exactly. The non-reproducibility of our research results is a dark point of the art, and we cannot be qualified as computer scientists if we do not provide rigorous experimental methodology. This article provides a first effort towards a correct statistical protocol for analysing and measuring speedups. As we will see, some common mistakes are done by the community inside published articles, explaining part of the non-reproducibility of the results. Our current article is not sufficient by its own to deliver a complete experimental methodology, further efforts must be done by the community to decide about a common protocol for our future experiences. Anyway, our community should take care about the aspect of reproducibility of the results in the future. URL BibTeX

    @unpublished{Toua09,
    	author = "Touati, Sid",
    	title = "{Towards a Statistical Methodology to Evaluate Program Speedups and their Optimisation Techniques}",
    	note = "{12 pages}",
    	month = "",
    	year = "{2009}",
    	abstract = "{The community of program optimisation and analysis, code performance evaluation, parallelisation and optimising compilation has published since many decades hundreds of research and engineering articles in major conferences and journals. These articles study efficient algorithms, strategies and techniques to accelerate programs execution times, or optimise other performance metrics (MIPS, code size, energy/power, MFLOPS, etc.). Many speedups are published, but nobody is able to reproduce them exactly. The non-reproducibility of our research results is a dark point of the art, and we cannot be qualified as {\it computer scientists} if we do not provide rigorous experimental methodology. This article provides a first effort towards a correct statistical protocol for analysing and measuring speedups. As we will see, some common mistakes are done by the community inside published articles, explaining part of the non-reproducibility of the results. Our current article is not sufficient by its own to deliver a complete experimental methodology, further efforts must be done by the community to decide about a common protocol for our future experiences. Anyway, our community should take care about the aspect of reproducibility of the results in the future.}",
    	affiliation = "Parall{\'e}lisme, R{\'e}seaux, Syst{\`e}mes d'information, Mod{\'e}lisation - PRISM - CNRS : UMR8144 - Universit{\'e} de Versailles-Saint Quentin en Yvelines",
    	file = "_eng.pdf:http\://hal.archives-ouvertes.fr/hal-00356529/PDF/stat\\_eval\\_perf\\_eng.pdf:PDF",
    	hal_id = "hal-00356529",
    	keywords = "Program optimisation ; Statistical Performance Evaluation ; Performance; Measurement, Experimentation",
    	language = "Anglais",
    	url = "http://hal.archives-ouvertes.fr/hal-00356529/en/"
    }
    
  3. Sid-Ahmed-Ali Touati. Cyclic Task Scheduling with Storage Requirement Minimization under Specific Architectural Constraints: Case of Buffers and Rotating Storage Facilities. 2009. This is a continuation work to SIRA (Sid-Ahmed-Ali Touati and Christine Eisenbeis. Early Periodic Register Allocation on ILP Processors. Parallel Processing Letters, Vol. 14, No. 2, June 2004. World Scientific.). We extend that work with new heuristics and experimental results.
    Abstract In this report, we study the exact and an approximate formulation of the general problem of one-dimensional periodic task scheduling under storage requirement, irrespective of machine constraints. We rely on the SIRA theoretical framework that allows an optimisation of periodic storage requirement \cite{Touati:PPL:04}. SIRA is based on inserting some storage dependence arcs (storage reuse arcs) labeled with reuse distances directly on the data dependence graph. In this new graph, we are able to bound the storage requirement measured as the exact number of necessary storage locations. The determination of storage and distance reuse is parametrised by the desired minimal scheduling period (respectively maximal execution throughput) as well as by the storage requirement constraints - either can be minimised while the other one is bounded, or alternatively, both are bounded \cite{siralina07,RR-INRIA-HAL-00436348}. This report recalls our fundamental results on this problem, and proposes new experimental heuristics. We typically show how we can deal with some specific storage architectural constraints such as buffers and rotating storage facilities. URL BibTeX

    @techreport{Toua09a,
    	author = "Touati, Sid-Ahmed-Ali",
    	title = "{Cyclic Task Scheduling with Storage Requirement Minimization under Specific Architectural Constraints: Case of Buffers and Rotating Storage Facilities}",
    	year = "{2009}",
    	note = "{This is a continuation work to SIRA (Sid-Ahmed-Ali Touati and Christine Eisenbeis. Early Periodic Register Allocation on ILP Processors. Parallel Processing Letters, Vol. 14, No. 2, June 2004. World Scientific.). We extend that work with new heuristics and experimental results.}",
    	abstract = "{In this report, we study the exact and an approximate formulation of the general problem of one-dimensional periodic task scheduling under storage requirement, irrespective of machine constraints. We rely on the SIRA theoretical framework that allows an optimisation of periodic storage requirement \cite{Touati:PPL:04}. SIRA is based on inserting some storage dependence arcs ({\it storage reuse} arcs) labeled with {\it reuse distances} directly on the data dependence graph. In this new graph, we are able to bound the storage requirement measured as the exact number of necessary storage locations. The determination of storage and distance reuse is parametrised by the desired minimal scheduling period (respectively maximal execution throughput) as well as by the storage requirement constraints - either can be minimised while the other one is bounded, or alternatively, both are bounded \cite{siralina07,RR-INRIA-HAL-00436348}. This report recalls our fundamental results on this problem, and proposes new experimental heuristics. We typically show how we can deal with some specific storage architectural constraints such as buffers and rotating storage facilities.}",
    	affiliation = "Parall{\'e}lisme, R{\'e}seaux, Syst{\`e}mes d'information, Mod{\'e}lisation - PRISM - CNRS : UMR8144 - Universit{\'e} de Versailles-Saint Quentin en Yvelines - ALCHEMY - INRIA Saclay - Ile de France - INRIA - CNRS : UMR8623 - Universit{\'e} Paris Sud - Paris XI",
    	file = "PSSR.pdf:http\://hal.inria.fr/inria-00440446/PDF/PSSR.pdf:PDF",
    	hal_id = "inria-00440446",
    	keywords = "Task scheduling, Storage requirement, Periodic scheduling, Task parallelism",
    	language = "Anglais",
    	url = "http://hal.inria.fr/inria-00440446/en/"
    }
    
  4. Ling Shang, Serge Petiton, Nahid Emad, Xiaolin Yang and Zhijian Wang. Extending YML to Be a Middleware for Scientific Cloud Computing. In CloudCom 2009. 2009, 662-667. BibTeX

    @inproceedings{SPE+09,
    	author = "Shang, Ling and Petiton, Serge and Emad, Nahid and Yang, Xiaolin and Wang, Zhijian",
    	title = "{Extending YML to Be a Middleware for Scientific Cloud Computing}",
    	booktitle = "{CloudCom 2009}",
    	year = "{2009}",
    	pages = "{662-667}",
    	owner = "MOIS",
    	timestamp = "2011.08.18"
    }
    
  5. Laurent Plagne, Frank Hulsemann, Denis Barthou and Julien Jaeger. Parallel expression template for large vectors. In Proceedings of the 8th Workshop on Parallel/High-Performance Object-Oriented Scientific. 2009, 8:1-8:8. URL BibTeX

    @inproceedings{PHBJ09,
    	author = "Plagne, Laurent and Hulsemann, Frank and Barthou, Denis and Jaeger, Julien",
    	title = "{Parallel expression template for large vectors}",
    	booktitle = "{Proceedings of the 8th Workshop on Parallel/High-Performance Object-Oriented Scientific}",
    	year = "{2009}",
    	pages = "{8:1-8:8}",
    	address = "{Genova, Italie}",
    	month = "{July}",
    	affiliation = "EDF R\&D - EDF - Parall{\'e}lisme, R{\'e}seaux, Syst{\`e}mes d'information, Mod{\'e}lisation - PRISM - CNRS : UMR8144 - Universit{\'e} de Versailles-Saint Quentin en Yvelines",
    	audience = "internationale",
    	hal_id = "hal-00551682",
    	keywords = "C++ template, OpenMP, expression template, intel TBB, multi-core processors, parallel computing",
    	language = "Anglais",
    	url = "http://hal.archives-ouvertes.fr/hal-00551682/en/"
    }
    
  6. Souad Koliai, Stephane Zuckerman, Emmanuel Oseret, Mickael Ivascot, Tipp Moseley, Ding Quang and William Jalby. A Balanced Approach to Application Performance Tuning. In LCPC' 09: Proceedings of the 22nd International Conference on Languages and Compilers for Parallel Computing. 2009, 111-125. BibTeX

    @inproceedings{KZO+09,
    	author = "Koliai, Souad and Zuckerman, Stephane and Oseret, Emmanuel and Ivascot, Mickael and Moseley, Tipp and Quang, Ding and Jalby, William",
    	title = "{A Balanced Approach to Application Performance Tuning}",
    	booktitle = "{LCPC' 09: Proceedings of the 22nd International Conference on Languages and Compilers for Parallel Computing}",
    	year = "{2009}",
    	pages = "{111-125}",
    	publisher = "{Springer-Verlag}",
    	owner = "MOIS",
    	timestamp = "2011.07.28"
    }
    
  7. Minhaj Ahmad Khan, Henri-Pierre Charles and Denis Barthou. Improving Performance of Optimized Kernels Through Fast Instantiations of Templates. Concurrency and Computation: Practice and Experience (21):59-70, 2009. BibTeX

    @article{KCBa11,
    	author = "Khan, Minhaj Ahmad and Charles, Henri-Pierre and Barthou, Denis",
    	title = "{Improving Performance of Optimized Kernels Through Fast Instantiations of Templates}",
    	journal = "{Concurrency and Computation: Practice and Experience}",
    	year = "{2009}",
    	pages = "{59-70}",
    	number = 21,
    	owner = "MOIS",
    	timestamp = "2011.08.19"
    }
    
  8. Richard Dusseaux, K Ait Braham and Nahid Emad. Eigenvalue System for the Scattering from Rough Surfaces – Saving in Computation Time by a Physical Approach. Optics Communications 282(18):3820-3826, 2009. URL, DOI BibTeX

    @article{DAEm09,
    	author = "Dusseaux, Richard and Ait Braham, K. and Emad, Nahid",
    	title = "{Eigenvalue System for the Scattering from Rough Surfaces -- Saving in Computation Time by a Physical Approach}",
    	journal = "{Optics Communications}",
    	year = "{2009}",
    	volume = "{282}",
    	pages = "{3820-3826}",
    	number = "{18}",
    	affiliation = "Institut Pierre-Simon-Laplace - IPSL - CNRS : FR636 - IRD - CEA - CNES - INSU - Universit{\'e} Pierre et Marie Curie - Paris VI - Universit{\'e} de Versailles-Saint Quentin en Yvelines - Ecole Normale Sup{\'e}rieure de Paris - ENS Paris - Laboratoire Atmosph{\`e}res, Milieux, Observations Spatiales - LATMOS - CNRS : UMR8190 - Universit{\'e} Pierre et Marie Curie - Paris VI - Universit{\'e} de Versailles-Saint Quentin en Yvelines - INSU - Parall{\'e}lisme, R{\'e}seaux, Syst{\`e}mes d'information, Mod{\'e}lisation - PRISM - CNRS : UMR8144 - Universit{\'e} de Versailles-Saint Quentin en Yvelines",
    	audience = "internationale",
    	doi = "10.1016/j.optcom.2009.06.010",
    	hal_id = "hal-00410092",
    	keywords = "Scattering; Rough surfaces; C method; Eigenvalue system; Beam simulation method; Huygens principle",
    	language = "Anglais",
    	url = "http://hal.archives-ouvertes.fr/hal-00410092/en/"
    }
    
  9. Lamia Djoudi, Vasil Khachazide and William Jalby. KBS-MAQAO: A Knowledge Based System for MAQAO Tool. In HPCC' 09: Proceedings of the 2009 11th IEEE International Conference on High Performance Computing and Communications. 2009, 571-578. BibTeX

    @inproceedings{DKJa09,
    	author = "Djoudi, Lamia and Khachazide, Vasil and Jalby, William",
    	title = "{KBS-MAQAO: A Knowledge Based System for MAQAO Tool}",
    	booktitle = "{HPCC' 09: Proceedings of the 2009 11th IEEE International Conference on High Performance Computing and Communications}",
    	year = "{2009}",
    	pages = "{571-578}",
    	publisher = "{IEEE Computer Society}",
    	owner = "MOIS",
    	timestamp = "2011.07.28"
    }
    
  10. Lamia Djoudi, Jean-Thomas Acquaviva and Denis Barthou. Compositional Approach applied to Loop Specialization. Concurrency and Computation: Practice and Experience 21(1):71-84, 2009. URL BibTeX

    @article{DAB+09,
    	author = "Djoudi, Lamia and Acquaviva, Jean-Thomas and Barthou, Denis",
    	title = "{Compositional Approach applied to Loop Specialization}",
    	journal = "Concurrency and Computation: Practice and Experience",
    	year = "{2009}",
    	volume = "{21}",
    	pages = "{71-84}",
    	number = "{1}",
    	month = "{January}",
    	affiliation = "Parall{\'e}lisme, R{\'e}seaux, Syst{\`e}mes d'information, Mod{\'e}lisation - PRISM - CNRS : UMR8144 - Universit{\'e} de Versailles-Saint Quentin en Yvelines",
    	audience = "internationale",
    	hal_id = "hal-00575934",
    	language = "Anglais",
    	url = "http://hal.archives-ouvertes.fr/hal-00575934/en/"
    }
    
  11. Laurent Choy, Olivier Delannoy, Nahid Emad and Serge Petiton. Federation and Abstraction of Heterogeneous Global Computing Platforms with the YML Framework. In CISIS 2009. 2009, 451-456. BibTeX

    @inproceedings{CDEP09,
    	author = "Choy, Laurent and Delannoy, Olivier and Emad, Nahid and Petiton, Serge",
    	title = "{Federation and Abstraction of Heterogeneous Global Computing Platforms with the YML Framework}",
    	booktitle = "{CISIS 2009}",
    	year = "{2009}",
    	pages = "{451-456}",
    	owner = "MOIS",
    	timestamp = "2011.07.29"
    }
    
  12. Sebastien Briais and Sid-Ahmed-Ali Touati. Schedule-Sensitive Register Pressure Reduction in Innermost Loops, Basic Blocks and Super-Blocks. 2009.
    Abstract This report makes a massive experimental study of an efficient heuristic for the SIRA framework \cite{sira04}. The heuristic, called SIRALINA \cite{siralina07}, bounds the register requirement of a data dependence graph before instruction scheduling under resource constraints. Our aim is to guarantee the absence of spilling before any instruction scheduling process, without hurting instruction level parallelism if possible. Our register pressure reduction methods are sensitive for both software pipelining (innermost loops) and acyclic scheduling (basic blocks and super-blocks). The SIRALINA method that we experiment in this report is shown efficient in terms of compilation times, in terms of register requirement reduction and in terms of shorted schedule increase. Our experiments are done on thousands standalone DDG extracted from FFMPEG, MEDIABENCH, SPEC2000 and SPEC2006 benchmarks. We consider processor architectures with multiple register type and we model delayed access times to registers. Our register pressure reduction method is distributed as a C independent library (SIRAlib). URL BibTeX

    @techreport{BrTo09,
    	author = "Briais, Sebastien and Touati, Sid-Ahmed-Ali",
    	title = "{Schedule-Sensitive Register Pressure Reduction in Innermost Loops, Basic Blocks and Super-Blocks}",
    	year = "{2009}",
    	abstract = "{This report makes a massive experimental study of an efficient heuristic for the SIRA framework \cite{sira04}. The heuristic, called SIRALINA \cite{siralina07}, bounds the register requirement of a data dependence graph before instruction scheduling under resource constraints. Our aim is to guarantee the absence of spilling before any instruction scheduling process, without hurting instruction level parallelism if possible. Our register pressure reduction methods are sensitive for both software pipelining (innermost loops) and acyclic scheduling (basic blocks and super-blocks). The SIRALINA method that we experiment in this report is shown efficient in terms of compilation times, in terms of register requirement reduction and in terms of shorted schedule increase. Our experiments are done on thousands standalone DDG extracted from FFMPEG, MEDIABENCH, SPEC2000 and SPEC2006 benchmarks. We consider processor architectures with multiple register type and we model delayed access times to registers. Our register pressure reduction method is distributed as a C independent library (\texttt{SIRAlib}).}",
    	affiliation = "Parall{\'e}lisme, R{\'e}seaux, Syst{\`e}mes d'information, Mod{\'e}lisation - PRISM - CNRS : UMR8144 - Universit{\'e} de Versailles-Saint Quentin en Yvelines",
    	file = "_report.pdf:http\://hal.inria.fr/inria-00436348/PDF/main\\_siralina\\_report.pdf:PDF",
    	hal_id = "inria-00436348",
    	keywords = "Compilation, Code optimisation, Register pressure, Instruction level parallelism",
    	language = "English",
    	pages = 53,
    	url = "http://hal.inria.fr/inria-00436348/en/"
    }
    
  13. Sebastien Briais and Sid-Ahmed-Ali Touati. Experimental Study of Register Saturation in Basic Blocks and Super-Blocks: Optimality and heuristics. 2009. Experimental data and free software are included (made public).
    Abstract {Register saturation (RS) is the exact maximal register need of all valid schedules of a data dependence graph \cite{Touati:RSILP05}. Its optimal computation is NP-complete. This report proposes two variants of heuristics for computing the acyclic RS of directed acyclic graphs (DAG). The first one improves the previous greedy-k heuristic \cite{Touati:RSILP05} in terms of approximating the RS with equivalent computation times. The second heuristic is faster, has better RS approximation than greedy-k, but sacrifices the computation of saturating values. In order to evaluate the efficiency of these two heuristics, we designed an optimal combinatorial algorithm computing the optimal RS for tractable cases, which turns out to be satisfactory in practice. Extensive experiments have been conducted on thousands of data dependence graphs extracted from FFMPEG, MEDIABENCH, SPEC2000 and SPEC2006 benchmarks. Numerical results are presented to demonstrate the efficiency of the two proposed heuristics, so they can replace the greedy-k heuristic presented in \cite{Touati:RSILP05}. Our RS computation methods are distributed as an independent C library (\texttt{RSlib}) under LGPL licence.} URL BibTeX

    @techreport{BrTo09a,
    	author = "Briais, Sebastien and Touati, Sid-Ahmed-Ali",
    	title = "{Experimental Study of Register Saturation in Basic Blocks and Super-Blocks: Optimality and heuristics}",
    	year = "{2009}",
    	note = "{Experimental data and free software are included (made public)}",
    	abstract = "{Register saturation (RS) is the exact maximal register need of all valid schedules of a data dependence graph \cite{Touati:RSILP05}. Its optimal computation is NP-complete. This report proposes two variants of heuristics for computing the acyclic RS of directed acyclic graphs (DAG). The first one improves the previous greedy-k heuristic \cite{Touati:RSILP05} in terms of approximating the RS with equivalent computation times. The second heuristic is faster, has better RS approximation than greedy-k, but sacrifices the computation of saturating values. In order to evaluate the efficiency of these two heuristics, we designed an optimal combinatorial algorithm computing the optimal RS for tractable cases, which turns out to be satisfactory in practice. Extensive experiments have been conducted on thousands of data dependence graphs extracted from FFMPEG, MEDIABENCH, SPEC2000 and SPEC2006 benchmarks. Numerical results are presented to demonstrate the efficiency of the two proposed heuristics, so they can replace the greedy-k heuristic presented in \cite{Touati:RSILP05}. Our RS computation methods are distributed as an independent C library (\texttt{RSlib}) under LGPL licence.}",
    	affiliation = "Parall{\'e}lisme, R{\'e}seaux, Syst{\`e}mes d'information, Mod{\'e}lisation - PRISM - CNRS : UMR8144 - Universit{\'e} de Versailles-Saint Quentin en Yvelines",
    	file = "_report.pdf:http\://hal.inria.fr/inria-00431103/PDF/main\\_RS\\_report.pdf:PDF",
    	hal_id = "inria-00431103",
    	keywords = "Compilation, Code optimisation, Register saturation, Instruction level parallelism",
    	language = "English",
    	pages = 33,
    	url = "http://hal.inria.fr/inria-00431103/en/"
    }
    
  14. Samir Ammenouche, Sid-Ahmed-Ali Touati and William Jalby. On Instruction-Level Method for Reducing Cache Penalties in Embedded VLIW Processors. In HPCC' 09: Proceedings of the 2009 11th IEEE International Conference on High Performance Computing and Communications. 2009, 196-205. BibTeX

    @inproceedings{AmTo10,
    	author = "Ammenouche, Samir and Touati, Sid-Ahmed-Ali and Jalby, William",
    	title = "{On Instruction-Level Method for Reducing Cache Penalties in Embedded VLIW Processors}",
    	booktitle = "{HPCC' 09: Proceedings of the 2009 11th IEEE International Conference on High Performance Computing and Communications}",
    	year = "{2009}",
    	pages = "{196-205}",
    	publisher = "{IEEE Computer Society}",
    	owner = "MOIS",
    	timestamp = "2011.07.28"
    }
    
