Uday Kumar Reddy Bondhugula
Professor, Mindtree Chair
Dept of Computer Science and Automation
Indian Institute of Science
Bengaluru 560012 INDIA
Phone: +91-80-2293-3249
Fax: +91-80-2360-2911
Email: udayb@iisc.ac.in
Web: http://www.csa.iisc.ac.in/~udayb
------------------------------------------------------------------------------
Name as it appears on all publications: Uday Bondhugula

Research Interests

Compilation and parallelization for multicores, accelerators, and domain-specific hardware; high-performance domain-specific languages and compilers; automatic parallelization; the polyhedral framework; MLIR. Domains of interest: high-performance AI, deep learning, stencils, and dense linear algebra.

Education

- Ph.D., Computer Science & Engineering (Sep '04 - Aug '08)
  The Ohio State University (OSU), Columbus, OH, USA
  Thesis: Effective Automatic Parallelization and Locality Optimization using the Polyhedral Framework
  Advisor: Prof. P. Sadayappan
- Bachelor of Technology, Computer Science & Engineering (Jul 2004)
  Indian Institute of Technology (IIT) Madras, Chennai, India

Professional Experience

- Professor (Sep 2023 - present); Mindtree Chair (May 2022 - present)
  Department of Computer Science and Automation, Indian Institute of Science, Bangalore, India
- Founder, CEO and CTO (May 2019 - present)
  PolyMage Labs, Bangalore, India (ML/AI compiler startup)
- Associate Professor (Dec 2016 - Aug 2020, May 2022 - Sep 2023; on leave Sep 2020 - Apr 2022)
  Department of Computer Science and Automation, Indian Institute of Science, Bangalore, India
- Visiting Researcher (Mar 2018 - Mar 2019)
  Google Brain team, Google, Mountain View, California, USA
- Assistant Professor (Jan 2011 - Dec 2016)
  Department of Computer Science and Automation, Indian Institute of Science, Bangalore, India
- Postdoctoral Research Scientist (Oct 2008 - Dec 2010)
  Advanced Compiler Technologies, IBM T.J. Watson Research Center, Yorktown Heights, New York
- Visiting Researcher (Mar 2008 - May 2008)
  ALCHEMY team, INRIA Futurs (INRIA Saclay), Ile de France, Orsay, France
- Research Intern (Jun 2007 - Sep 2007)
  Advanced Compilation Technologies, IBM T.J. Watson Research Center, Yorktown Heights, NY
- Graduate Research Associate (Apr '05 - Jun '07, Oct '07 - Aug '08)
  Dept. of CSE, The Ohio State University, Columbus, OH, USA
  Automatic parallelization, polyhedral model, loop nest optimization
- Graduate Teaching Associate (Sep 2004 - Mar 2005)
  Department of CSE, OSU, Columbus, OH, USA
  Instructor for CSE 459.21 'Programming in C' and CSE 459.23 'Programming in Java'
- Summer Intern (May 2003 - Jul 2003)
  Trilogy Software Inc., Bangalore, India

Publications

1. HIR: An MLIR-based Intermediate Representation for Hardware Accelerator Description
   Kingshuk Majumder and Uday Bondhugula.
   ASPLOS 2024 (to appear).
2. Treebeard: An Optimizing Compiler for Decision Tree-Based ML Inference
   Ashwin Prasad, Sampath Rajendra, Kaushik Rajan, R. Govindarajan, and Uday Bondhugula.
   IEEE/ACM International Symposium on Microarchitecture (MICRO), Oct 2022.
3. MLIR-Based Code Generation for GPU Tensor Cores
   Navdeep Katel, Vivek Khandelwal, and Uday Bondhugula.
   ACM/IEEE International Conference on Compiler Construction (CC), Apr 2022.
4. A Practical Tile Size Selection Model for Affine Loop Nests
   Kumudha Narasimhan, Aravind Acharya, Abhinav Baid, and Uday Bondhugula.
   ACM International Conference on Supercomputing (ICS '21), Jun 2021.
5. MLIR: Scaling Compiler Infrastructure for Domain-Specific Computation
   Chris Lattner, Mehdi Amini, Uday Bondhugula, Albert Cohen, Andy Davis, Jacques Pienaar, River Riddle, Tatiana Shpeisman, Nicolas Vasilache, and Oleksandr Zinenko.
   ACM CGO 2021.
6. An Effective Fusion and Tile Size Model for PolyMage
   Abhinav Jangda and Uday Bondhugula.
   ACM Transactions on Programming Languages and Systems (TOPLAS), volume 42, issue 3, Article 12, 27 pages, Nov 2020.
7. Optimizing the Linear Fascicle Evaluation Algorithm for Multi-Core and Many-Core Systems
   Karan Aggarwal and Uday Bondhugula.
   ACM Transactions on Parallel Computing, Nov 2020.
8. Effective Loop Fusion in Polyhedral Compilation using Fusion Conflict Graphs
   Aravind Acharya, Uday Bondhugula, and Albert Cohen.
   ACM Transactions on Architecture and Code Optimization (TACO), Sep 2020.
9. Bitwidth Customization in Image Processing Pipelines using Interval Analysis and SMT Solvers
   Suresh Purini, Vinamra Benara, Ziaul Chowdhury, and Uday Bondhugula.
   ACM SIGPLAN International Conference on Compiler Construction (CC), Feb 2020.
10. Optimizing the Linear Fascicle Evaluation Algorithm for Many-Core Systems
    Karan Aggarwal and Uday Bondhugula.
    International Conference on Supercomputing (ICS), Jun 2019.
11. Polyhedral Auto-Transformation with No Integer Linear Programming
    Aravind Acharya, Uday Bondhugula, and Albert Cohen.
    ACM SIGPLAN PLDI 2018.
12. An Effective Fusion and Tile Size Model for Optimizing Image Processing Pipelines
    Abhinav Jangda and Uday Bondhugula.
    ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), Feb 2018 (to appear). Artifact evaluated (reusable and available).
13. Optimizing Geometric Multigrid Method Computation using a DSL Approach
    Vinay Vasista, Kumudha KN, Siddharth Bhat, and Uday Bondhugula.
    Supercomputing (SC), Nov 2017.
14. Diamond Tiling: Tiling Techniques to Maximize Parallelism for Stencil Computations
    Uday Bondhugula, Vinayaka Bandishti, and Irshad Pananilath.
    IEEE Transactions on Parallel and Distributed Systems (TPDS), volume 27, issue 3, pp. 1285-1298, May 2017.
15. A DSL Compiler for Accelerating Image Processing Pipelines on FPGAs
    Nitin Chugh, Vinay Vasista, Suresh Purini, and Uday Bondhugula.
    IEEE International Conference on Parallel Architectures and Compilation Techniques (PACT 2016), Sep 2016.
16. Compiling Affine Loop Nests for a Dynamic Scheduling Runtime on Shared and Distributed Memory
    Roshan Dathathri, Ravi Teja Mullapudi, and Uday Bondhugula.
    ACM Transactions on Parallel Computing, volume 3, issue 2, Jul 2016.
17. SMO: An Integrated Approach to Intra-Array and Inter-Array Storage Optimization
    Somashekaracharya Bhaskaracharya, Uday Bondhugula, and Albert Cohen.
    ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL), Jan 2016.
18. The Pluto+ Algorithm: A Practical Approach for Parallelization and Locality Optimization of Affine Loop Nests
    Uday Bondhugula, Aravind Acharya, and Albert Cohen.
    ACM Transactions on Programming Languages and Systems (TOPLAS), volume 38, issue 3, Apr 2016.
19. Automatic Storage Optimization for Arrays
    Somashekaracharya Bhaskaracharya, Uday Bondhugula, and Albert Cohen.
    ACM Transactions on Programming Languages and Systems (TOPLAS), volume 38, issue 3, Apr 2016.
20. An Optimizing Code Generator for a Class of Lattice-Boltzmann Computations
    Irshad Pananilath, Aravind Acharya, Vinay Vasista, and Uday Bondhugula.
    ACM Transactions on Architecture and Code Optimization (TACO), volume 12, issue 2, Article 14, Jul 2015.
21. PolyMage: Automatic Optimization for Image Processing Pipelines
    Ravi Teja Mullapudi, Vinay Vasista, and Uday Bondhugula.
    International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2015), Mar 2015.
22. Pluto+: Near-Complete Modeling of Affine Transformations for Parallelism and Locality
    Aravind Acharya and Uday Bondhugula.
    ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), Feb 2015.
23. Tiling and Optimizing Time-Iterated Computations over Periodic Domains
    Uday Bondhugula, Vinayaka Bandishti, Albert Cohen, Guillain Potron, and Nicolas Vasilache.
    IEEE International Conference on Parallel Architectures and Compilation Techniques (PACT 2014), Aug 2014. Nominated for the best paper award.
24. Effective Automatic Computation Placement and Data Allocation for Parallelization of Regular Programs
    Chandan Reddy and Uday Bondhugula.
    ACM International Conference on Supercomputing (ICS), Jun 2014, Munich, Germany.
25. Automatic Data Allocation and Buffer Management for Multi-GPU Machines
    Thejas Ramashekar and Uday Bondhugula.
    ACM Transactions on Architecture and Code Optimization (TACO), accepted Nov 2013 (also selected for presentation at HiPEAC '14, Jan 2014, Vienna).
26. Compiling Affine Loop Nests for Distributed-Memory Parallel Architectures
    Uday Bondhugula.
    ACM/IEEE Supercomputing (SC '13), Nov 2013.
27. Generating Efficient Data Movement Code for Heterogeneous Architectures with Distributed-Memory
    Roshan Dathathri, Chandan Reddy, Thejas Ramashekar, and Uday Bondhugula.
    International Conference on Parallel Architectures and Compilation Techniques (PACT 2013), Sep 2013.
28. PolyGLoT: A Polyhedral Loop Transformation Framework for a Graphical Dataflow Language
    Somashekar B and Uday Bondhugula.
    International Conference on Compiler Construction (CC 2013), Mar 2013, Rome, Italy.
29. Tiling Stencil Computations to Maximize Parallelism
    Vinayaka Bandishti, Irshad Pananilath, and Uday Bondhugula.
    ACM/IEEE Supercomputing (SC), Nov 2012, Utah, USA.
30. Loop Transformations: Convexity, Pruning, and Optimization
    Louis-Noel Pouchet, Uday Bondhugula, Cedric Bastoul, Albert Cohen, J. Ramanujam, P. Sadayappan, and Nicolas Vasilache.
    ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages (POPL), 2011.
31. Combined Iterative and Model-driven Optimization in an Automatic Parallelization Framework
    Louis-Noel Pouchet, Uday Bondhugula, Cedric Bastoul, Albert Cohen, J. Ramanujam, and P. Sadayappan.
    Supercomputing (SC) 2010.
32. A Model for Fusion and Code Motion in an Integrated Auto-Parallelizing Compiler
    Uday Bondhugula, Oktay Gunluk, Sanjeeb Dash, and L. Renganarayana.
    International Conference on Parallel Architectures and Compilation Techniques (PACT), Sep 2010, Vienna, Austria.
33. Compact Multi-Dimensional Kernel Extraction for Register Tiling
    L. Renganarayana, Uday Bondhugula, Salem Derisavi, Alexandre E. Eichenberger, and Kevin O'Brien.
    Supercomputing (SC) 2009.
34. Compiler-Assisted Dynamic Scheduling for Effective Parallelization of Loop Nests on Multicore Processors
    M. Baskaran, N. Vydyanathan, Uday Bondhugula, J. Ramanujam, A. Rountev, and P. Sadayappan.
    ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '09), Feb 2009, Raleigh, North Carolina.
35. Data Layout Transformation for Enhancing Locality on NUCA Chip Multiprocessors
    Qingda Lu, Christophe Alias, Uday Bondhugula, Thomas Henretty, Sriram Krishnamoorthy, J. Ramanujam, Atanas Rountev, P. Sadayappan, Yongjian Chen, Haibo Lin, and Tin-fook Ngai.
    International Conference on Parallel Architectures and Compilation Techniques (PACT), 2009.
36. A Practical Automatic Polyhedral Parallelizer and Locality Optimizer
    Uday Bondhugula, A. Hartono, J. Ramanujam, and P. Sadayappan.
    ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '08), Jun 2008, Tucson, Arizona. Received the ACM SIGPLAN Most Influential PLDI Paper Award in 2018.
37. Automatic Transformations for Communication-Minimized Parallelization and Locality Optimization in the Polyhedral Model
    Uday Bondhugula, M. Baskaran, S. Krishnamoorthy, J. Ramanujam, A. Rountev, and P. Sadayappan.
    International Conference on Compiler Construction (CC), Apr 2008, Budapest, Hungary.
38. A Compiler Framework for Optimization of Affine Loop Nests for GPGPUs
    M. Baskaran, Uday Bondhugula, S. Krishnamoorthy, J. Ramanujam, A. Rountev, and P. Sadayappan.
    ACM International Conference on Supercomputing (ICS '08), Jun 2008, Kos, Greece.
39. Automatic Data Movement and Computation Mapping for Multi-level Parallel Architectures with Explicitly Managed Memories
    M. Baskaran, Uday Bondhugula, S. Krishnamoorthy, J. Ramanujam, A. Rountev, and P. Sadayappan.
    ACM SIGPLAN PPoPP '08, Feb 2008, Salt Lake City, Utah.
40. Automatic Mapping of Nested Loops to FPGAs
    Uday Bondhugula, J. Ramanujam, and P. Sadayappan.
    ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '07), Mar 2007, San Jose, California.
41. Effective Automatic Parallelization of Stencil Computations
    S. Krishnamoorthy, M. Baskaran, Uday Bondhugula, J. Ramanujam, A. Rountev, and P. Sadayappan.
    ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '07), Jun 2007, San Diego, California.
42. Hardware/Software Integration for FPGA-based All-Pairs Shortest-Paths
    Uday Bondhugula, A. Devulapalli, J. Dinan, J. Fernando, P. Wyckoff, E. Stahlberg, and P. Sadayappan.
    IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM), Apr 2006, Napa Valley, California.
43. Parallel FPGA-based All-Pairs Shortest-Paths in a Directed Graph
    Uday Bondhugula, A. Devulapalli, J. Fernando, P. Wyckoff, and P. Sadayappan.
    20th IEEE International Parallel and Distributed Processing Symposium (IPDPS), Apr 2006, Rhodes, Greece.
44. High Performance RDMA-based All-to-all Broadcast for InfiniBand Clusters
    S. Sur, Uday Bondhugula, A. Mamidala, H.-W. Jin, and D. K. Panda.
    12th IEEE International Conference on High Performance Computing (HiPC '05), Dec 2005.

Software/Tools

1. MLIR (https://mlir.llvm.org/)
   Founding team member of the MLIR project and co-developer of its early infrastructure, especially the polyhedral/mid-level analysis and optimization infrastructure. MLIR was open-sourced by Google in Apr 2019 and is now an LLVM sub-project with strong industry and community traction. The MLIR project was initiated to deliver next-generation optimizing compiler infrastructure focused on serving the computational demands of AI and machine learning programming models. At Google itself, one of the project's goals is to address the compiler challenges associated with the TensorFlow ecosystem.
   MLIR is a new intermediate representation designed to provide a unified, modular, and extensible infrastructure to progressively lower dataflow compute graphs, potentially through loop nests, to high-performance target-specific code. MLIR shares similarities with traditional CFG-based three-address SSA representations (such as LLVM IR or the Swift Intermediate Language), but it also introduces notions from the polyhedral compiler framework as first-class concepts, enabling powerful analysis and transformation of loop nests and multi-dimensional arrays. MLIR supports multiple front- and back-ends and uses LLVM IR as one of its primary code generation targets. It is thus a very useful infrastructure for developing new compilers, especially for the challenges involved in compiling emerging AI and machine learning programming languages/models to the plethora of specialized accelerator chips.

2. Pluto (http://pluto-compiler.sourceforge.net)
   I am the original and lead author of Pluto. Pluto is a source-to-source parallelization and optimization tool based on the polyhedral compiler framework. It can automatically optimize affine loop nests (sequences of imperfectly nested loops with regular data access patterns) for parallelism and locality using affine transformations. It can target both shared-memory multicore architectures (by generating code with OpenMP parallel pragmas) and distributed-memory architectures (by generating message-passing MPI code). Pluto/Pluto+ is extensively used for advanced experimentation with loop optimization and parallelization, for optimizing scientific stencil computations, and in university courses that teach loop transformations.

3. PolyMage (http://mcl.csa.iisc.ernet.in/polymage.html)
   PolyMage is a domain-specific language and compiler for automatic parallelization and optimization of image processing pipelines.
   PolyMage takes an image processing pipeline expressed by the user in a high-level language (embedded in Python) and generates a C++ implementation of the pipeline, optimized using the polyhedral framework as the intermediate representation. It uses OpenCV for image I/O, islpy/ISL for integer set operations, 'cgen' for AST code generation, and OpenMP to mark parallel loops. PolyMage uses an asymmetric overlapped tiling technique (overlapped tiling extended for heterogeneous accesses and non-constant dependence vectors) to exploit locality and parallelism simultaneously. It uses a model-driven approach to automatically fuse pipeline stages for tiling, and employs a built-in autotuner to find the best-performing code within a small, well-defined search space.

Awards & Honors

- Qualcomm Faculty Research Award 2022
- Awarded the Mindtree Chair position at the Department of CSA
- Honorable Mention, ACM India Early Career Research Award 2020
- Cray APJ Abdul Kalam HPC Award 2019 in the Young Researcher category
- ACM SIGPLAN Most Influential PLDI Paper Award in 2018 for the PLDI 2008 paper
- ACM SIGPLAN PLDI 2017 Distinguished Reviewer Award as PC member
- Indian National Science Academy Medal for Young Scientists 2017
- Indian National Academy of Engineering Young Engineer Award 2016
- Indian Academy of Sciences Young Associate 2016--2019
- Google Faculty Research Award 2015
- Nominated for the best paper award at PACT 2014 for 'Tiling and Optimizing Time-Iterated Computations over Periodic Domains'
- INRIA Associate Team award (2013--2015) on a worldwide competitive basis
- Nominated for the ACM SIGPLAN Doctoral Dissertation Award 2008
- ACM SIGPLAN Professional Activities Committee travel award for PLDI 2008
- All-India Rank 84 (top 0.06%) in the Indian Institutes of Technology Joint Entrance Examination (IIT-JEE) 2000, out of about 1,27,000 candidates
- Represented the state of Andhra Pradesh, India, at the Indian National Mathematical Olympiad in 1999
- Pratibha scholarship from the Govt of Andhra Pradesh (2000-2004) for performance at IIT-JEE 2000
- National Talent Search Exam (NTSE) scholarship (India), 1998

Research Grants

- DST/SERB EMR grant 2017-2020
- Google Faculty Research Award 2015
- INRIA 'Associate Team' award (2013--2015) with Albert Cohen (INRIA/ENS)
- Gift from National Instruments in support of research on compiler optimizations for LabVIEW (2013--2015)
- AMD research gift in support of research on compilation for heterogeneous architectures (2011--)
- NVIDIA CUDA Research Center award for 2012--2013
- Research grant from Intel Labs, India (2013--2014)
- Research grant from C-DAC, Bangalore (2013--2014)

Students and Advising

- Ph.D.: 2 graduated (1 best CS thesis medal), 4 ongoing
- Masters (Res.): 9 graduated (4 best CS thesis medals), 1 ongoing
- M-Tech (courses): 8 graduated, 1 ongoing

Miscellaneous

- Program committee member: ASPLOS 2024 (Spring and Fall cycles), ASPLOS 2018, PLDI 2017, Supercomputing 2016, Compiler Construction 2016, PPoPP 2016, PPoPP 2012, IMPACT 2011--2016
- Associate Editor: ACM TACO
- External Review Committee (ERC): PLDI 2014
- Program chair: IMPACT 2012
- Reviewer for LCPC 2006, PPoPP 2007, ICS 2007, LCPC 2007, PACT 2009, GPGPU workshop 2010, PPoPP 2011, HPCA 2011, IMPACT 2011--2016, ACM TOPLAS, ACM TACO, IEEE TPDS, JPDC
- Table Tennis: Ohio State University team (2007, briefly); IISc TT tournament champions 2013 (CSA team)
- Football: IISc university tournament champions 2012, 2013; university football team (2012 -- present); Bangalore C-division player
- Swimming: Karnataka state masters championships 2013 (50m freestyle bronze, 4x50m medley relay bronze, 4x50m freestyle relay bronze; NCBS/IISc team)
- Languages: English (fluent), Hindi (native), Telugu (native), French, Kannada