Nishit Anand

Nishit Anand

I am a final-year MS CS student at the University of Maryland, College Park. My research focuses on Multimodal LLMs, AI Agents, and Audio models under the guidance of Prof. Dinesh Manocha in the GAMMA Lab and Prof. Ramani Duraiswami in the PIRL Lab.

I worked as a Research Scientist Intern at Adobe Research San Francisco in Summer 2025, where I built the first speech model which can understand both the paralinguistic and acoustic aspects of speech. I have also collaborated with NVIDIA on multiple research projects, including the Audio-Flamingo series of models: Audio Flamingo Next and MMOU, a large-scale omni-modal benchmark.

Before my Master's, I worked as a Research Scientist at Radien, a Seattle-based AI startup funded by the Allen Institute for AI (AI2). There, I built agents to predict visual similarity in UI elements and intelligently simplify codebases. I was also the Founding Research Engineer at Ananas Labs, a Bengaluru-based startup founded by a former Staff Research Scientist at Google Research India. I developed our low-latency Indic-SpeechLLM, optimizing it for our app using model quantization, and researched a novel multilingual phoneme-based tokenizer to reduce token inefficiency in Indic languages.

Prior to my industry roles, I was a Research Fellow at the Vision and Graphics Lab, IIT Delhi, under Prof. Chetan Arora. Working on a government-funded project, I created state-of-the-art Vision Transformer-based OCR models for 13 Indian languages, deploying them with inference optimization and caching techniques for higher throughput. Earlier, as a Research Engineer at the Autonomous Networked Systems Lab at IIIT-Delhi under Prof. Saket Anand, I led the Driver Status Monitoring (DSM) module of a $300K government-funded autonomous driving project. I trained and optimized facial landmark detection and object detection models for deployment on our custom Android app and NVIDIA Jetson devices.

I completed my B.Tech (Honours) in Computer Science from Jaypee Institute of Information Technology Noida (JIIT Noida) in 2022, graduating in the top 5% of the CS department and scoring the highest grades in all courses in my final-year. I have also had the privilege of collaborating with Prof. Salman Khan, Prof. Mohamed Elhoseiny, and Prof. Ruohan Gao, among other exceptional mentors.

Email / CV / Google Scholar / Github / LinkedIn

News

[2026/04] 🎉 FIGMA: Towards Fine-Grained Music Retrieval accepted to ACL 2026!
[2026/04] 🎉 We release Audio Flamingo Next, the next-generation and most capable large audio-language model in the Audio Flamingo series.
[2026/03] 🎉 We release MMOU, a large-scale benchmark for evaluating omni-modal models on joint audio-visual understanding in long, complex real-world videos.
[2026/02] 🎉 Learning Illumination Control in Diffusion Models accepted to ICLR 2026 ReALM-GEN Workshop on Diffusion Models!
[2026/01] 🎉 Gencho accepted to ICASSP 2026 as Oral Presentation!
[2025/10] 🎉 MMAU-Pro accepted to AAAI 2026!
[2025/09] 🎉 Two papers accepted to EMNLP 2025! MultiVox received an Outstanding Paper Nominee award!
[2025/07] 🎉 Aurelia accepted to ICCV 2025!
[2025/06] 💼 Started Research Scientist Internship at Adobe Research, San Francisco!
[2025/05] 🎉 Paper on Linguistic Variations in Audio-Language Models accepted to NAACL 2025!
[2024/12] 🎉 TSPE paper accepted to SALMA Workshop at ICASSP 2025!
[2024/08] 🎓 Started MS in Computer Science at University of Maryland, College Park.
[2024/02] 📄 VidSum - Video Summarization paper now available on IEEE Xplore!
[2024/02] 🏆 Honored to be serving as a Judge at the LLM Hackathon, Vendata Event at Oneiros Technical Fest 2024, Manipal University Jaipur.
[2023/10] 🎉 Video Summarization Paper accepted to ICI 2023!
[2023/05] 🎉 Hurst-based Influence Maximization Paper accepted to Journal - Expert Systems!
[2023/04] 💼 Joined IIT Delhi as a Research Fellow in the CS Dept, working under Dr. Chetan Arora in Vision & Graphics Lab on MultiLingual OCR ML Models for Indic Languages Project.
[2022/06] 💼 Joined IIIT Delhi as a Research Engineer under Dr. Saket Anand to work in the Perception module of the Autonomous Driving Project - ALIVE.
[2022/05] 🎓 Graduated from Jaypee Institute of Information Technology Noida, with B.Tech in Computer Science with Honors!
[2022/02] 🎉 SaveLives Threat Detection Paper accepted to ICI 2022!
[2021/09] 🎉 DeepFake Detection Paper accepted to ICSC 2021!

Research

	FIGMA: Towards Fine-Grained Music Retrieval Nishit Anand, Ashish Seth, Sreyan Ghosh, Dinesh Manocha, Ramani Duraiswami ACL, 2026 paper
	Learning Illumination Control in Diffusion Models Nishit Anand, Manan Suri, Christopher Metzler, Dinesh Manocha, Ramani Duraiswami* ICLR 2026, ReALM-GEN Workshop on Diffusion Models arXiv
	ParA-LLM: A Unified Approach to Paralinguistic and Acoustic Speech Understanding Nishit Anand, Jiaqi Su, Ke Chen, Yunyun Wang, Dinesh Manocha, Ramani Duraiswami, Rithesh Kumar, Zeyu Jin paper
	MMOU: A Massive Multi-Task Omni Understanding and Reasoning Benchmark Arushi Goel, Sreyan Ghosh, Vatsal Agarwal, Nishit Anand, Kaousheik Jayakumar, Lasha Koroshinadze et al. arXiv, 2026 arXiv
	Audio Flamingo Next: Next-Generation Audio-Language Models for Speech, Sound, Music Sreyan Ghosh, Arushi Goel, Kaousheik Jayakumar, Lasha Koroshinadze, Nishit Anand, Zhifeng Kong et al. arXiv, 2026 arXiv
	Audio Hallucination Attacks: Probing the Reliability of Large Audio Language Models Ashish Seth, Sonal Kumar, Ramaneswaran S, Nishit Anand, Utkarsh Tyagi et al. arXiv*, 2026 arXiv
	Gencho: Room Impulse Response Generation via Diffusion Transformers Jackie Lin, Jiaqi Su, Nishit Anand, Zeyu Jin, Minje Kim, Paris Smaragdis ICASSP, 2026 (Oral) arXiv
	Why Stop at Words? Unveiling the Bigger Picture through Line-Level OCR Shashank Vempati, Nishit Anand, Gaurav Talebailkar, Arpan Garai, Chetan Arora Preprint, 2025 arXiv
	MMAU-Pro: A Comprehensive Benchmark for Holistic Evaluation of Audio General Intelligence Sonal Kumar, Šimon Sedláček, Vaibhavi L, Fernando López, Wenyi Yu, Nishit Anand, Hyeonggon Ryu, Lichang Chen et al. AAAI, 2026 arXiv
	MultiVox: Benchmark for Evaluating Multimodal Voice Assistants Ramaneswaran S, Ashish Seth, Nishit Anand, Utkarsh Tyagi, Sonal Kumar, Sreyan Ghosh, Dinesh Manocha EMNLP, 2025 (Outstanding Paper Nominee) arXiv
	EgoIllusion: Benchmarking Hallucinations in Egocentric Video Understanding Ashish Seth, Utkarsh Tyagi, Ramaneswaran S, Nishit Anand, Sonal Kumar, Sreyan Ghosh, Ramani Duraiswami, Chirag Agarwal, Dinesh Manocha EMNLP, 2025 arXiv
	Aurelia: Test-time Reasoning Distillation in Audio-Visual LLMs Sanjoy Chowdhury, Hanan Gani, Nishit Anand, Sayan Nag, Ruohan Gao, Mohamed Elhoseiny, Salman Khan, Dinesh Manocha ICCV, 2025 arXiv
	Do Audio-Language Models Understand Linguistic Variations? Ramaneswaran S, Sonal Kumar, Hemant Giri, Nishit Anand, Ashish Seth, Sreyan Ghosh, Dinesh Manocha NAACL*, 2025 arXiv
	TSPE: Task Specific Prompt Ensemble for Improved Zero-Shot Audio Classification Nishit Anand, Ashish Seth, Ramani Duraiswami, Dinesh Manocha SALMA Workshop ICASSP, 2025 arXiv
	Mitigating Memorization in LLMs using Activation Steering Manan Suri, Nishit Anand, Amisha Bhaskar Preprint, 2024 arXiv
	VidSum - Video Summarization using Deep Learning Nishit Anand, Rupesh Koshariya, Varsha Garg International Conference on Informatics (ICI), 2023 paper
	A Hurst-based Diffusion Model using Time Series Characteristics for Influence Maximization in Social Networks Bhawna Saxena, Vikas Saxena, Nishit Anand, Vikas Hassija, Vinay Chamola, Amir Hussain Expert Systems (Wiley), 2023 paper
	SaveLives - A Real-Time Threat Detection System Nishit Anand, Rupesh Koshariya International Conference on Informatics (ICI), 2022 paper
	IsSwap: Deep Fake Detection Aakriti Aggarwal, Siddhant Wadhwa, Pallav Gupta, Nishit Anand, Rashmi Kushwah International Conference on Signal Processing and Communication (ICSC), 2021 paper

Website template from Jon Barron