Research Student · IISc Bangalore
MTech (Research) student at the Robert Bosch Centre for Cyber-Physical Systems, Indian Institute of Science, Bangalore. My research focuses on solving Constrained Optimization problems using Offline Reinforcement Learning.
01 — About
I'm a research student at the Indian Institute of Science (IISc), Bangalore, working at the Robert Bosch Centre for Cyber-Physical Systems (RBCCPS). My research lies at the intersection of Reinforcement Learning, Optimization, and Safety, with a focus on developing frameworks for safe and performant Offline RL that can handle hard, state-wise safety constraints.
Prior to IISc, I completed my engineering degree at IIT Gandhinagar, where I developed a strong interest in computers and systems. I've also worked as an R&D Intern at SanDisk (Western Digital), where I contributed to optimizing LLM inference on edge devices.
02 — Publications
M. Tayal, M. Tayal. Devised a value-function-guided Offline RL framework, based on an epigraph reformulation, that co-optimizes the agent's policy for safety and performance, maximizing reward while satisfying hard, state-wise safety constraints at every step.
M. Tayal, M. Tayal, S. Kolathaya, R. Prakash. A value-guided framework that learns actuation-aware, infinite-horizon safe Neural Control Barrier Functions from offline demonstrations, producing deployable safety filters that satisfy hard, state-wise constraints.
M. Tayal, M. Tayal, R. Prakash. A variational, parameter-conditioned imitation-learning framework that encodes noisy/stochastic obstacle and goal sensor readings into a probabilistic latent space to improve robustness of offline-learned models.
M. Tayal, Y. Simmhan. Investigated the unexplored paradigm of heterogeneous mobile accelerators, evaluating multi-instance DNN inference performance on NVIDIA Jetson AGX Orin to understand resource congestion between CUDA cores, Tensor Cores, and DLAs under concurrent workloads.
03 — Experience
RBCCPS, Indian Institute of Science, Bangalore
Conducting research on Constrained Optimization using Offline Reinforcement Learning under Prof. Ravi Prakash. Developing frameworks (EpiFlow, V-OCBF, RISE) for safe and performant offline RL with hard safety constraints.
Indian Institute of Science, Bangalore
Tutoring a course covering Offline Reinforcement Learning. Collaborating with two other TAs to design problem statements and course slides.
SanDisk (Western Digital), Bangalore
Worked on an end-to-end software framework to reduce LLM inference latency on consumer-grade edge devices, resulting in a U.S. patent filing. Achieved over 60% reduction in the latency of loading model weights. Worked with low-level HPC toolkits, including SPDK for kernel bypass, and implemented a deep learning architecture for predictive memory fetching across model components.
CDS, Indian Institute of Science, Bangalore
Under Prof. Yogesh Simmhan, investigated heterogeneous mobile accelerators and explored the performance behavior of multi-instance DNN inference on NVIDIA Jetson AGX Orin. Awarded Best Student Paper Presentation at HiPC 2024.
04 — Projects
Research under Prof. Ravi Prakash at RBCCPS, IISc. Devised an epigraph reformulation-based Offline RL framework (EpiFlow) for co-optimizing safety and performance. Developed V-OCBF for learning deployable safety filters from offline data. Proposed RISE, a variational imitation-learning framework for robust policy learning under sensor noise.
Research under Prof. Yogesh Simmhan at CDS, IISc. Evaluated multi-instance DNN inference on NVIDIA Jetson AGX Orin, exploring resource congestion between co-located accelerators (CUDA Cores, Tensor Cores, and DLAs) under concurrent workloads. The findings motivate intelligent, workload-aware load-allocation frameworks.
Project under Prof. Balagopal Komarath at IIT Gandhinagar. Contributed a novel algorithm to the SageMath repository to reduce the time complexity of finding the Vertex Cover size. Developed a fixed-parameter tractable algorithm that solves the Vertex Cover problem in time polynomial in the graph size for a fixed parameter k.
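To illustrate what fixed-parameter tractability means for Vertex Cover, here is a minimal Python sketch of the textbook bounded-search-tree method. It is not the algorithm contributed to SageMath; the function names `vertex_cover_at_most_k` and `min_vertex_cover_size` and the edge-list input format are assumptions made for this example.

```python
from typing import List, Optional, Set, Tuple

Edge = Tuple[int, int]


def vertex_cover_at_most_k(edges: List[Edge], k: int) -> Optional[Set[int]]:
    """Return a vertex cover with at most k vertices, or None if none exists.

    Textbook bounded search tree (illustrative only, not the SageMath code):
    any cover must contain an endpoint of every edge, so pick one uncovered
    edge and branch on its two endpoints.
    """
    if not edges:
        return set()   # nothing left to cover
    if k == 0:
        return None    # edges remain but the budget is spent
    u, v = edges[0]
    for chosen in (u, v):
        # Taking `chosen` covers every edge incident to it.
        remaining = [e for e in edges if chosen not in e]
        sub_cover = vertex_cover_at_most_k(remaining, k - 1)
        if sub_cover is not None:
            return sub_cover | {chosen}
    return None


def min_vertex_cover_size(edges: List[Edge]) -> int:
    """Smallest k for which a vertex cover of size k exists."""
    k = 0
    while vertex_cover_at_most_k(edges, k) is None:
        k += 1
    return k


if __name__ == "__main__":
    path = [(0, 1), (1, 2), (2, 3)]
    print(min_vertex_cover_size(path))
```

Each call branches on the two endpoints of one uncovered edge, so the recursion depth is at most k and the running time is O(2^k · m) for m edges: exponential only in k and polynomial in the graph size.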
Full-stack online judge for competitive programming under Prof. Neeldhara Misra. Built with Golang/GoFiber backend and ReactJS+Vite frontend. Designed a separate "judge server" for containerized, secure user code execution using Docker, with parallel submission processing via goroutines and channels.
05 — Skills
06 — Contact
I'm always open to discussing research collaborations, new ideas in safe RL and constrained optimization, or opportunities to contribute to impactful work.