Kiwhan Song

Researcher @ OpenAI

I am a researcher at OpenAI working on GPT image generation. Previously at MIT CSAIL, I worked on generative models and video world models with Vincent Sitzmann. Before that, I worked on graph representation learning at the MIT-IBM Watson AI Lab with Jie Chen.

news

Oct 2025	Joined OpenAI imagegen team.
Sep 2025	Graduated from MIT.
May 2025	DFoT accepted to ICML 2025.
Feb 2025	Launched kiwhan.dev.

selected publications

arXiv
Selective Underfitting in Diffusion Models

Kiwhan Song^*, Jaeyeon Kim^* , Sitan Chen, Yilun Du, Sham Kakade, and Vincent Sitzmann

2025.

Abstract Website PAPER Bibtex

Diffusion models have emerged as the principal paradigm for generative modeling across various domains. During training, they learn the score function, which in turn is used to generate samples at inference. They raise a basic yet unsolved question: which score do they actually learn? In principle, a diffusion model that matches the empirical score in the entire data space would simply reproduce the training data, failing to generate novel samples. Recent work addresses this question by arguing that diffusion models underfit the empirical score due to training-time inductive biases. In this work, we refine this perspective, introducing the notion of selective underfitting: instead of underfitting the score everywhere, better diffusion models more accurately approximate the score in certain regions of input space, while underfitting it in others. We characterize these regions and design empirical interventions to validate our perspective. Our results establish that selective underfitting is essential for understanding diffusion models, yielding new, testable insights into their generalization and generative performance.
@misc{song2025selective, title = {Selective Underfitting in Diffusion Models}, author = {Song, Kiwhan and Kim, Jaeyeon and Chen, Sitan and Du, Yilun and Kakade, Sham and Sitzmann, Vincent}, year = {2025}, eprint = {2510.01378}, archiveprefix = {arXiv}, primaryclass = {cs.LG}, url = {https://arxiv.org/abs/2510.01378}, }
ICML
History-Guided Video Diffusion

Kiwhan Song^* , Boyuan Chen^*, Max Simchowitz, Yilun Du, Russ Tedrake, and Vincent Sitzmann

In the 42nd International Conference on Machine Learning, 2025.

Abstract Website PAPER Code Demo Bibtex

Classifier-free guidance (CFG) is a key technique for improving conditional generation in diffusion models, enabling more accurate control while enhancing sample quality. It is natural to extend this technique to video diffusion, which generates video conditioned on a variable number of context frames, collectively referred to as history. However, we find two key challenges to guiding with variable-length history: architectures that only support fixed-size conditioning, and the empirical observation that CFG-style history dropout performs poorly. To address this, we propose the Diffusion Forcing Transformer (DFoT), a video diffusion architecture and theoretically grounded training objective that jointly enable conditioning on a flexible number of history frames. We then introduce History Guidance, a family of guidance methods uniquely enabled by DFoT. We show that its simplest form, vanilla history guidance, already significantly improves video generation quality and temporal consistency. A more advanced method, history guidance across time and frequency further enhances motion dynamics, enables compositional generalization to out-of-distribution history, and can stably roll out extremely long videos.
@misc{song2025historyguidedvideodiffusion, title = {History-Guided Video Diffusion}, author = {Song, Kiwhan and Chen, Boyuan and Simchowitz, Max and Du, Yilun and Tedrake, Russ and Sitzmann, Vincent}, year = {2025}, eprint = {2502.06764}, archiveprefix = {arXiv}, primaryclass = {cs.LG}, url = {https://arxiv.org/abs/2502.06764}, }