Speakers

Mark D. Hill

Gene M. Amdahl and John P. Morgridge Professor Emeritus of Computer Sciences

University of Wisconsin-Madison

Talk Title: In Computer Architecture, We Don't Change the Questions, We Change the Answers

Abstract: When I was a new professor in the late 1980s, my senior colleague Jim Goodman told me, "On the computer architecture PhD qualifying exam, we don't change the questions, we only change the answers." More generally, I now augment this to say, "In computer architecture, we don't change the questions; application and technology innovations change the answers, and it's our job to recognize those changes." The eternal questions this talk will sample concern how best to handle the following interacting factors: compute, memory, storage, interconnect/networking, security, power, cooling, and one more. The talk will not provide the answers but will leave them as an audience exercise.

Bio: Mark D. Hill is the Gene M. Amdahl and John P. Morgridge Professor Emeritus of Computer Sciences at the University of Wisconsin-Madison (http://www.cs.wisc.edu/~markhill), following his 1988-2020 service in Computer Sciences and Electrical and Computer Engineering. His research interests include parallel-computer system design, memory system design, and computer simulation. Hill's work is highly collaborative, with over 170 co-authors. He received the 2019 Eckert-Mauchly Award and is a fellow of AAAS, ACM, and IEEE. He served on the Computing Community Consortium (CCC) 2013-21, including as CCC Chair 2018-20, on the Computing Research Association (CRA) Board of Directors 2018-20 & 2025-present, and as Wisconsin Computer Sciences Department Chair 2014-2017. Hill was Partner Hardware Architect at Microsoft (2020-2024), where he led some software-hardware pathfinding for Azure. Hill has a PhD in computer science from the University of California, Berkeley.

Manosiz Bhattacharyya

Chief Technology Officer

Nutanix

Talk Title: Cloud Native and AI in a Multicloud World: Simplifying Innovation and Resilience

Abstract: Explore practical and strategic considerations for enabling developers, platform engineers, and IT teams to move faster and more efficiently when implementing AI, bringing new apps online, and adopting hybrid and multicloud environments. Learn how cloud native architectures, data locality, and infrastructure abstraction can simplify deployment, enhance resilience, and optimize cost, helping leaders align technology with business outcomes.

Bio: Mano has been a driving force at Nutanix for over a decade, beginning as one of the earliest engineers building the foundation of the industry-leading Nutanix hyper-converged storage stack. He sets Nutanix's architecture and technology direction, shaping how Nutanix serves customers across hybrid cloud, AI, and enterprise workloads. Before assuming the CTO role at Nutanix, Mano served as the SVP & GM of the Core Products division, growing it into a billion-dollar-plus business in a short span of time. Prior to Nutanix, Mano held key technical leadership positions at Oracle, working on the core database architecture, and at HP, developing an industry-leading dynamic optimizer for the Java Virtual Machine (JVM). With a B.Tech from IIT Kharagpur, an M.S. from UC Santa Cruz, and over 40 U.S. patents, Mano brings deep technical expertise and a relentless curiosity to everything he does. In recent years, Mano has led Nutanix's bold leap into agentic AI, evolving from single-shot generative models to secure, autonomous multi-agent systems. Under his leadership, Nutanix Enterprise AI now powers both internal tools like Support GPT and customer-facing solutions built in partnership with Nvidia. Outside of work, Mano is known for his calm presence, sharp wit, and love of whiteboard sessions that somehow always end with a breakthrough idea. He is not just building the future of AI; he is inspiring the people around him to dream bigger and build smarter.

R. Govindarajan

Professor

IISc, Bangalore

Talk Title: TREEBEARD: A Retargetable Compiler for Decision Tree Inference

Abstract: Decision-tree-based models are the most popular models for tabular data. Decision tree ensemble inference is usually performed with libraries. While these libraries apply a fixed set of optimizations, the solutions lack portability and fail to fully exploit hardware- or model-specific information. In this talk, we present the design of a schedule-guided, retargetable compiler for decision-tree-based models, called Treebeard, which has two core components. The first is a scheduling language that encapsulates the optimization space, together with techniques to efficiently explore this space. The second is an optimizing retargetable compiler that can generate code for any specified schedule by lowering the inference computation to optimized CPU or GPU code through multiple intermediate abstractions. By applying model-specific optimizations at the higher levels, tree-walk optimizations at the middle level, and machine-specific optimizations lower down, Treebeard can specialize inference code for each model on each supported target. Treebeard combines several novel optimizations at various abstraction levels and uses different data layouts, loop structures, and caching strategies to mitigate architectural bottlenecks and achieve portable performance across a range of targets. Treebeard is implemented using the MLIR compiler infrastructure and can generate code for single- and multi-core CPUs as well as GPUs (both Nvidia and AMD MI GPUs). Treebeard demonstrates significant performance gains over state-of-the-art methods, both on CPUs and on GPUs.

Bio: Govindarajan received his B.Sc. degree in Mathematics from Madras University in 1981, and his B.E. (Electronics and Communication) and Ph.D. (Computer Science) degrees from the Indian Institute of Science, Bangalore, in 1984 and 1989, respectively. Since 1995, he has been with the Supercomputer Education and Research Centre and the Department of Computer Science and Automation, Indian Institute of Science, Bangalore. His research interests are in the areas of High Performance Computing, Compilation Techniques, and Computer Architecture. He is a fellow of the Indian National Academy of Engineering.

Murali Annavaram

Professor

University of Southern California

Talk Title: Let's Make ML Affordable

Abstract: Machine learning (ML) training and inference services are some of the most compute-hungry workloads we have ever encountered. As ML parameter counts explode into the trillions, these workloads have an insatiable appetite for GPUs and high-bandwidth memory (HBM), both of which are expensive resources. In this talk I present our group's work on reducing our reliance on GPUs and HBM in two ML domains: recommendation models and LLMs. Recommendation models have trillions of embedding table entries that are usually distributed across GPU clusters. In our cDLRM research we made the observation that ML training is highly predictable, since we can look ahead into the future to extract the training batches. This predictability can be exploited to move the embedding tables to CPU DRAM (think very cheap memory!) and transfer only a tiny, but relevant, portion of the embedding tables to GPU HBM just in time. In our follow-up work, titled LEAF, we explored an orthogonal design space: embedding table compression. LEAF is a multi-level hashing framework that compresses the large embedding tables based on the real-time access frequency distribution. In particular, LEAF leverages a streaming algorithm to estimate access distributions on the fly, without relying on model gradients or requiring a priori knowledge of the access distribution, and achieves two orders of magnitude of compression with limited loss of model accuracy. In the second part of the talk I will present resource-efficient solutions for LLMs. In our KVPR research we argued for offloading KV caches to CPU DRAM. But CPU-GPU PCIe bandwidth can be a serious impediment to performance. We present a novel cache+recompute approach where part of the KV cache data is transferred from CPU to GPU, while part of the KV values are recomputed on the GPU to overlap the communication delay with computation. Finally, in our DEL research we tackle the problem of making speculative decoding affordable. Speculative decoding is a key technique for enhancing token generation speed. But identifying the right speculative decoding architecture is challenging, as some tokens demand large speculation resources while other tokens are much easier to predict. DEL provides a dynamic approach to select an optimal speculation resource for each token.

Bio: Murali Annavaram is the Lloyd Hunt Chair Professor in the Ming Hsieh Department of Electrical and Computer Engineering and the Thomas Lord Department of Computer Science (joint appointment) at the University of Southern California. Prior to that he was the Dean's Professor at USC and held the Rukmini Gopalakrishnachar Visiting Chair Professorship at the Indian Institute of Science. He is the founding director of the REAL@USC-Meta center, which is focused on research and education in AI and learning. His research group tackles a wide range of computer system design challenges relating to energy efficiency, security, and privacy. He has been inducted into the halls of fame of three prestigious computer architecture conferences: ISCA, MICRO, and HPCA. He served as Technical Program Chair for ICS 2024 and HPCA 2021, and as General Co-Chair for ISCA 2018. Prior to his appointment at USC he worked at Intel Microprocessor Research Labs from 2001 to 2007. His work at Intel led to the first 3D microarchitecture design and also influenced Intel's TurboBoost technology. In 2007 he was a visiting researcher at the Nokia Research Center, working on mobile phone-based wireless traffic sensing using virtual trip lines, which later became the Nokia Traffic Works product. In 2020 he was a visiting faculty scientist at Facebook, where he designed the checkpoint systems for distributed training. Murali co-authored Parallel Computer Organization and Design, a widely used textbook that teaches both basic and advanced principles of computer architecture. Murali received his Ph.D. in Computer Engineering from the University of Michigan, Ann Arbor, in 2001. He is a Fellow of IEEE and a Distinguished Member of ACM.

Saurabh Goyal

Senior Principal Research Engineer

Microsoft Research India

Talk Title: Sparse Attention Techniques for Long Context Inference

Abstract: Long context workloads have become increasingly common in LLM inference, driven by applications such as RAG and multimodal inference, and by recent progress in chain-of-thought reasoning. The self-attention operation, which scales quadratically with context length, becomes a dominant cost in long context inference. This talk will discuss case studies in reducing the cost of the self-attention operation using training-free sparse attention mechanisms.

Bio: Saurabh Goyal is a Senior Principal Research Engineer at Microsoft Research India, currently working on research in LLM inference efficiency. Prior to Microsoft Research, Saurabh spent over a decade at Google working in the Search Infrastructure and Google Assistant teams.

Jayant Haritsa

Professor

IISc, Bangalore

Talk Title: Robust Query Processing: Where Geometry Beats ML!

Abstract: Over the past half-century, the design and implementation of declarative query processing techniques in relational database systems has been a foundational topic. Despite this sustained study, the solutions have largely remained a "black art" due to the complexities of database query processing. Recent work explores two directions: learning-based query performance prediction and geometric search strategies. This talk argues that geometry, despite its simplicity, offers stronger guarantees for robust query processing.

Bio: Jayant Haritsa has been on the computer science faculty at the Indian Institute of Science, Bangalore, since 1993. He received a BTech degree from IIT Madras, and MS and PhD degrees from the University of Wisconsin-Madison. He is a Fellow of ACM and IEEE for his contributions to database engine design and analysis.

Mohan Parthasarathy

Distinguished Technologist

Hewlett Packard Enterprise

Talk Title: CXL - From Research to Reality

Abstract: Compute Express Link (CXL) is an open industry-standard interconnect offering high-bandwidth, low-latency connectivity between host processors and devices such as accelerators, memory buffers, and smart I/O devices. It is designed to address growing high-performance computational workloads by supporting heterogeneous processing and memory systems through cache coherency and memory semantics. This talk will cover the following areas:
- Motivation for CXL and use cases
- CXL research areas in academia and industry
- CXL hardware (vendors/products) and the industry landscape
- Software/solution impacts and opportunities with CXL

Bio: Mohan Parthasarathy is a Distinguished Technologist in the Server Firmware R&D team in HPE Compute Engineering, based in Bangalore, India. His current focus is on platform architecture, including hardware, firmware, and software for HPE's server portfolio. He has over 25 years of industry experience and has worked on multiple areas of operating systems, virtualization, and platform firmware. Mohan received his Master's in Electrical Engineering from the Indian Institute of Science, Bangalore, and is a co-inventor on 14 US patents.

Biswabandan Panda

Associate Professor

IIT Bombay

Talk Title: The Micro Things That Matter: Microarchitecture for Macro Servers

Abstract: Many-core servers are the compute engines that drive large-scale datacenters 24/7, 365 days a year. These servers consist of tens to hundreds of processor cores running applications with huge code and data footprints, and the performance of the memory hierarchy plays an important role in their overall throughput. The talk will be about our journey in designing micro things to improve the cache hierarchy for macro servers, keeping huge code/data footprints and limited DRAM bandwidth in mind. My awesome mentees (Sweta, Vedant, Prerna, and Hrishikesh) and I embarked on this journey together.

Bio: A mortal who is excited about microarchitecture and, in general, computer architecture research. A micro-architect does all the heavy lifting to squeeze the best performance out of applications running on computing systems. Biswa is interested in microarchitecture ideas for improving the performance and security of modern processors. Biswa is a recipient of the Qualcomm India Faculty Award 2022, the Google India Research Award 2022, the Prof. Krithi Ramamritham Award for Creative Research 2023, the Qualcomm Faculty Award 2024, the Shridhar Shukla Chair Professorship in Digital Trust (2025-2028), and the USENIX Security 2025 Distinguished Artifact Award.

Uday Reddy Bondhugula

Professor

IISc, Bangalore

Talk Title: Building Effective Compilers for AI Programming Frameworks

Abstract: This talk will focus on the role of compilers in the era of AI programming frameworks (e.g. PyTorch, JAX) and AI hardware accelerators. AI models are evolving and continue to heavily rely on high-performance computing. Specialized hardware for AI is often hard to program to exploit peak performance. AI models are also evolving in a way that is coupled with hardware strengths. This talk will describe how to build effective compiler systems using the MLIR infrastructure in a layered way to improve hardware usability and deliver high performance as automatically as possible.

Bio: Uday Bondhugula is a professor of Computer Science and Engineering at the Indian Institute of Science (IISc), Bangalore, India. He is also the founder, CEO, and CTO at PolyMage Labs, a deep-tech compiler startup. His research interests and expertise lie in the areas of compilers for AI, high-performance computing, polyhedral framework, automatic parallelization, and high-performance systems/accelerators for AI. As a visiting researcher at the Google Brain team in 2018, he was a founding team member of the MLIR project. He is also the original author and maintainer of Pluto, a source-to-source loop parallelization and optimization tool based on the polyhedral framework. In 2018, he received the ACM SIGPLAN Most Influential Paper award for his PLDI 2008 paper on polyhedral optimization for parallelism and locality. He received his Ph.D. from the Ohio State University in 2008 and his Bachelor's in Computer Science from the Indian Institute of Technology, Madras, in 2004.

Arun Ramachandran

Principal Member of Technical Staff

AMD

Talk Title: Evolving LLM Systems: Inference Opportunities and AMD MI Roadmap

Abstract: Large language models (LLMs) are rapidly increasing in scale and capability, placing growing demand on hardware, software, and deployment stacks. In this talk we present key research challenges in engineering LLM inference, and motivate systems research for efficient, scalable deployment. Building on this foundation, we describe IISc-AMD collaborative work on optimizations that accelerate LLM inference on CPUs and GPUs. Finally, we present AMD's roadmap for MI-class accelerators for inference and training and show how upcoming hardware and software capabilities will address critical systems challenges while enabling the research community through an open ecosystem.

Bio: Arun Ramachandran is a Principal Member of Technical Staff, Machine Learning, at AMD with 17+ years of industry experience. He is advised by Dr. Prakash Raghavendra (AMD India) and Prof. Govindarajan Ramaswamy (IISc).

Sorav Bansal

Professor

IIT Delhi

Talk Title: Imagining a next-generation superoptimizer

Abstract: A program superoptimizer uses a search procedure, e.g., a probabilistic backtracking algorithm, to find an optimized (and sometimes optimal) implementation of a program specification on a given machine architecture. Each candidate implementation proposed by the search procedure is checked for equivalence with the input program specification, to eventually identify a sound optimization that can subsequently be stored as a peephole optimization rule. This is in contrast to a traditional compiler optimizer, which is typically organized as a sequence of algorithmic passes that transform the program successively towards an optimized implementation.
I will share my thoughts on why the traditional model of compiler development may be unsustainable, and why a superoptimizer is likely to become a mainstream method of optimizing programs in the foreseeable future, considering recent advances in AI. I will then present our formal program equivalence checker, which is intended to enable such a superoptimizer.

Bio: Sorav Bansal is a professor in the CS department at IIT Delhi and works on program superoptimization. He is currently on long leave from IIT Delhi, working on superoptimization in the context of quantitative trading algorithms at Graviton Research Capital. Sorav obtained his B.Tech. from IIT Delhi and his Ph.D. from Stanford University.

Jayesh Gaur

Chief Architect

IBM

Talk Title: Advancing General Purpose CPU Computing in the AI Era

Abstract: In the age of artificial intelligence, the ongoing evolution and optimization of general-purpose, high-performance, out-of-order cores remains crucial for modern computing. As advancements in microprocessor technology become increasingly challenging, substantial research efforts are now directed towards enhancing the performance of these cores through innovations in micro-architecture. This talk will delve into the primary bottlenecks encountered in contemporary server cores, the micro-architectural innovations necessary to address these challenges, and the role that AI can play in improving overall core performance and efficiency. Furthermore, it will explore how CPUs with advanced AI extensions can significantly benefit critical AI inference applications.

Bio: Jayesh Gaur is the chief architect of the Power server cores at IBM. He has nearly two decades of experience in building high performance microprocessors. Jayesh has filed 60+ patents in computer architecture and has published widely in top conferences. He received best paper awards at ISCA 2024 and HPCA 2017, was a runner-up for best paper at HPCA 2021, and has been inducted into the ISCA Hall of Fame. Jayesh completed his education at the Indian Institute of Technology, Kanpur, and the University of Michigan, Ann Arbor.

Ranjita Bhagwan

Principal Engineer

Google

Talk Title: Challenges in Observability of the Google Network

Abstract: Google owns and operates one of the world's largest networks, supporting billions of users. Today, this network not only supports the users of Google's various applications such as Gemini, Search, YouTube, Gmail, and Maps, but also forms a critical part of the infrastructure supporting enterprise customers of the Google Cloud Platform. Given the scale and complexity of such varied applications, observing the performance and reliability of the network and its various components is of prime importance. In this talk, I will present some of the challenges that we envision for network observability in the coming years, and how we plan to address them.

Bio: Ranjita Bhagwan is currently a Principal Engineer at Google, working on making Google's network highly reliable. Recently, her work has focused on using data-driven approaches to improve networks and services and has led to several publications and awards. She is an ACM Distinguished Member, INAE Fellow, and is the recipient of the 2020 ACM India Outstanding Contributions to Computing by a Woman Award. She received her PhD and MS in Computer Engineering from University of California, San Diego and a BTech in Computer Science and Engineering from the Indian Institute of Technology, Kharagpur.

Subrata Mitra

Senior Research Scientist

Adobe

Talk Title: Computing Less by Understanding More

Abstract: Large foundation models have significantly reshaped how machine learning and computer vision problems are approached. However, these models are often treated as black boxes, with interaction limited to prompting or fine-tuning via loss functions. This talk advocates for a deeper examination of their internal behavior, e.g., understanding latent spaces and attention mechanisms to uncover signals that can be systematically leveraged to improve efficiency, accuracy, and interpretability across tasks. The talk will present a set of methods that utilize internal computation patterns to achieve systems-level optimizations. For instance, it will show how attention logits in transformers reveal stable contextual relationships that can inform both cache management and fine-grained attribution. In retrieval-augmented generation, attention states corresponding to frequently retrieved chunks can be reused across queries, provided their validity is carefully checked and maintained. For generative models such as diffusion pipelines, the reuse of intermediate noise states or prompt-derived visual concepts offers a path to accelerate inference without compromising output quality. Finally, in continuous-time generative models, the model's own recent outputs can be used to speculatively predict future steps, allowing computation to be skipped when these predictions are accurate. These techniques do not require retraining and are compatible with existing model architectures. By interpreting the computations that foundation models already perform, these works highlight how internal structure can be exposed and repurposed to build systems that are faster, more efficient, and more explainable.

Bio: Subrata Mitra is a Senior Research Scientist at Adobe Research, Bangalore. His current research lies at the intersection of computer systems and machine learning, with a focus on efficiency and scalability. Previously, his work primarily addressed improving the performance and reliability of cloud and distributed systems and the scalability of big-data processing and recommender systems. His research contributions include approximately 40 papers published in computer systems conferences (e.g., USENIX NSDI, EuroSys, USENIX ATC, SenSys, SIGMOD, VLDB, PLDI) and artificial intelligence conferences (e.g., AAAI, ICML, NeurIPS, ACL, ECCV). He also serves on the PC of several major conferences. He received his Ph.D. in Electrical and Computer Engineering from Purdue University, West Lafayette, an MS in Computer Engineering from the University of Florida, Gainesville, and a BE in Electronics and Telecommunication Engineering from Jadavpur University, Kolkata. Prior to Adobe Research, he spent time as a research intern at Microsoft Research, AT&T Research, and Lawrence Livermore National Labs. Before that, he worked in software engineering roles at Intel and at a start-up (now acquired by Synopsys) on Electronic Design Automation, building software tools that help explore the internals of microchips.

V. Krishna Nandivada

Professor, Senior Member IEEE, Senior Member ACM

IIT Madras

Talk Title: Extracting Useful Parallelism from User Perceived Ideal Parallelism

Abstract: Multicore systems have taken the computing world by storm, with an ever-increasing amount of parallelism in the hardware and a continuously changing landscape of parallel programming. Programmers are expected to think in parallel and express the program logic (ideal parallelism) using parallel languages of their choice. However, a parallel program is not guaranteed to be efficient just because it is parallel. The problem becomes challenging because many of the traditional assumptions about serial programs do not hold in the context of parallel programs. In this talk, we will discuss some of our experiences in bridging the gap between ideal and useful parallelism.
The talk will first explain the performance challenges in parallel programs. We will follow that with our experience in identifying different patterns in parallel programs that can be exploited to realize highly performant code; these can be seen as both manual and compiler optimizations. We will focus on the safety, profitability, and opportunities of such optimizations in the context of task-parallel programs. We will also explain the insufficiency of traditional analyses for safely transforming parallel programs and discuss how may-happen-in-parallel analysis plays a vital role in the sound and precise analysis of parallel programs. In addition to covering traditional HPC kernels, we will have a particular focus on irregular task-parallel programs, which are becoming critical workloads. We will explain the challenges with irregular task-parallel programs and then discuss how we can achieve high performance on them.

Bio: V. Krishna Nandivada is currently a Professor in the Department of Computer Science and Engineering at IIT Madras. He is a senior member of ACM and IEEE. Before joining IIT Madras in 2011, he spent nearly 5.5 years at the IBM India Research Lab (Programming Technologies and Software Engineering group). Prior to starting his PhD, he was associated with Hewlett-Packard. He holds a BE degree from REC (now NIT) Rourkela, an ME degree from IISc Bangalore, and a PhD degree from UCLA. His research interests are Compilers, Program Analysis, Programming Languages, and Multicore Systems.

Pradeep Ramachandran

Director

KLA

Talk Title: Enabling Angstrom-scale Manufacturing with AI & HPC

Abstract: Semiconductor manufacturing is approaching the Angstrom era, with innovations such as gate-all-around transistors and chip-to-chip integration technologies enabling the continuation of Moore's law. This talk will highlight some of the challenges that these advanced technologies pose to manufacturing semiconductors, and will cover how modern AI & HPC technologies are being leveraged to address these challenges and enable high-volume manufacturing. We will also give a peek into some of the solutions that KLA is pioneering in this space.

Bio: Pradeep Ramachandran is the Director and Head of Research at KLA's Artificial Intelligence and Advanced Computing Lab (ACL), based out of the IIT Madras Research Park. He holds a BTech from IIT Madras, and an MS and PhD from the University of Illinois at Urbana-Champaign. Pradeep's research interests lie at the intersection of hardware and software, and he has developed several system-level solutions that leverage the synergies at that boundary. Pradeep enjoys cooking, travel, running outdoors, knocking around shuttles on the badminton court, and rolling on the ground with his 6-year-old son!