Authors: Prakhar Gupta, Mayuresh Bhosale
Institution: Clemson University
Project Report: Report Paper, PPT
Course Name: CPSC8810 ML for Image Synthesis, Dr. Siyu Huang
Motivation & Key Idea
We build on the ideas of Retrospective Cycle GAN (RCGAN) by Kwon et al. Their idea of forward and backward temporal consistency for training the generator achieved strong performance compared with the state of the art (SOTA). However, they do not condition on physics or explicitly restrict the movement of pixels.
Figure 1: This image from Kwon et al. (2019) shows blurring and distortions in longer-term predictions. Although RCGAN outperforms PredNet, there is room for improvement.
We ask the following question: “Can we reduce blurring in longer-term predictions by using physics constraints?”
Formulation
RCGAN baseline model
Kwon et al. use two discriminators: one for frame reconstruction and one for the temporal consistency of the image sequence. The loss function of the baseline model is given by:
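A generic sketch of a generator objective with two discriminators, assuming an L1 reconstruction term and non-saturating adversarial terms. The function name `rcgan_style_generator_loss` and the weights `lam_frame` and `lam_seq` are hypothetical and are not Kwon et al.'s exact objective. The sketch only shows how a per-frame score and a per-sequence score enter one scalar loss.

```python
import numpy as np

def rcgan_style_generator_loss(pred, target, d_frame_prob, d_seq_prob,
                               lam_frame=0.01, lam_seq=0.01, eps=1e-8):
    """Hypothetical generator objective: L1 reconstruction plus two adversarial terms.

    pred, target:  predicted and ground-truth frames (same shape).
    d_frame_prob:  frame discriminator's probability that `pred` is real.
    d_seq_prob:    sequence discriminator's probability that the sequence
                   containing `pred` is real.
    lam_frame, lam_seq: hypothetical weights (not taken from Kwon et al.).
    """
    # Pixel-wise reconstruction term (mean absolute error).
    rec = np.mean(np.abs(pred - target))
    # Non-saturating adversarial terms: the generator is rewarded when both
    # discriminators output probabilities close to 1 ("real").
    adv_frame = -np.mean(np.log(d_frame_prob + eps))
    adv_seq = -np.mean(np.log(d_seq_prob + eps))
    return float(rec + lam_frame * adv_frame + lam_seq * adv_seq)
```

The retrospective scheme described in the motivation would apply such a term to both the forward prediction and the backward (retrospective) prediction. Training also needs a differentiable framework so gradients reach the generator; NumPy is used here only to show the arithmetic.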
Optical Flow Conditioned RCGAN
To indirectly restrict pixel movement to realistic regions, we use RAFT (Teed et al.), a pre-trained optical flow model, to condition the generations on an optical flow loss.
Figure 2: The optical flow estimation performance of RAFT on the KITTI dataset has been well established by Teed et al.
The new loss function is designed as:
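A minimal sketch of one possible flow-consistency term, assuming it is an L1 distance between the flow from the last input frame to the predicted frame and the flow from the last input frame to the ground-truth frame. `flow_fn` is a hypothetical callable returning an (H, W, 2) field of (dx, dy) displacements. In practice it would wrap RAFT (for example torchvision's `raft_large`). To keep the sketch self-contained, `toy_global_flow` stands in for RAFT by searching for a single global integer shift.

```python
import numpy as np

def toy_global_flow(a, b, max_shift=3):
    """Toy stand-in for RAFT: best global integer shift mapping a onto b, as a flow field."""
    best, best_err = (0, 0), np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            err = np.mean((np.roll(a, (dy, dx), axis=(0, 1)) - b) ** 2)
            if err < best_err:
                best, best_err = (dy, dx), err
    dy, dx = best
    flow = np.empty(a.shape[:2] + (2,))
    flow[..., 0] = dx  # horizontal displacement (u)
    flow[..., 1] = dy  # vertical displacement (v)
    return flow

def flow_consistency_loss(prev_frame, pred_frame, true_frame, flow_fn):
    """Hypothetical loss: mean L1 gap between predicted and ground-truth motion."""
    flow_pred = flow_fn(prev_frame, pred_frame)  # motion implied by the generator
    flow_true = flow_fn(prev_frame, true_frame)  # motion actually observed
    return float(np.mean(np.abs(flow_pred - flow_true)))
```

Comparing flows instead of raw pixels penalizes motion that does not match the observed motion, even when the predicted colors are plausible. When RAFT is the flow estimator, it is typically kept frozen while gradients still pass through it to the generator.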
Kinematics Constraint Flow Conditioned RCGAN
The new loss function is designed as:
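One possible kinematic term, sketched under stated assumptions: frames are equally spaced in time, so the flow between consecutive frames serves as a per-pixel velocity in pixels per frame. The motion into the predicted frame is then penalized for deviating from a kinematic extrapolation of past flows. The names `extrapolate_flow` and `kinematic_loss` and the `order` parameter are hypothetical and do not reproduce the report's equation.

```python
import numpy as np

def extrapolate_flow(past_flows, order=1):
    """Predict the next flow field from past flow fields (each of shape (H, W, 2)).

    order=1: constant velocity,     v_{t+1} = v_t
    order=2: constant acceleration, v_{t+1} = v_t + (v_t - v_{t-1})
    """
    if order == 1:
        return past_flows[-1]
    if order == 2:
        if len(past_flows) < 2:
            raise ValueError("order=2 needs at least two past flow fields")
        return 2 * past_flows[-1] - past_flows[-2]
    raise ValueError("order must be 1 or 2")

def kinematic_loss(past_flows, flow_pred, order=1):
    """Hypothetical loss: mean L1 deviation of predicted motion from the extrapolation."""
    expected = extrapolate_flow(past_flows, order)
    return float(np.mean(np.abs(flow_pred - expected)))
```

This compares velocities at fixed pixel locations, which is only a reasonable approximation for small motions. Following each pixel along its trajectory would instead require pixel tracking.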
Combined Constraints Conditioned RCGAN
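One natural form of the combined objective, assuming the optical flow and kinematic terms are added to the baseline RCGAN objective as a weighted sum. The weights λ are hypothetical hyperparameters, and this is not the report's exact equation:

```latex
\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{RCGAN}}
  + \lambda_{\text{flow}}\,\mathcal{L}_{\text{flow}}
  + \lambda_{\text{kin}}\,\mathcal{L}_{\text{kin}}
```

Each added term brings its own weight to tune, which makes the scalar objective harder to balance than the baseline's.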
Project Results
We trained the conditional GAN on frame sequences of length 5 from KITTI city-driving scenes. The models were evaluated on about 70 test sequences. The results below show the performance of the 3 models at different training epochs for 2 selected test sequences.
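A minimal sketch of how an ordered list of frames can be cut into length-5 training sequences with a sliding window. The names `make_windows` and `split_context_target`, the stride, and the split into 4 input frames plus 1 target frame are hypothetical assumptions, not details taken from the report.

```python
def make_windows(frames, length=5, stride=1):
    """Split an ordered list of frames into overlapping windows of `length` frames."""
    # Windows that would run past the end of the clip are dropped.
    return [frames[i:i + length] for i in range(0, len(frames) - length + 1, stride)]

def split_context_target(window):
    """Hypothetical split: all but the last frame are inputs, the last is the target."""
    return window[:-1], window[-1]
```

For a 7-frame clip, `make_windows(list(range(7)))` gives three overlapping windows starting at frames 0, 1, and 2. A larger `stride` reduces overlap between training sequences.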
Epoch 10
Epoch 30
Epoch 50
By epoch 50 of training, the kinematics-conditioned RCGAN is already starting to show less blurring than the baseline and approach 1 (optical flow conditioned) models.
Epoch 100
Here, the kinematics conditioning approach clearly outperforms all others. We also explore conditioning the GAN on flow and kinematics combined. Expanding the scalar loss function to more variables means this model needs more training to reduce blurring, but its PSNR, SSIM, and MSE scores are the most promising.
More results on other test frames from the kinematics-conditioned approach:
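For reference, a minimal sketch of two of the metrics named above. It assumes pixel values scaled to [0, 1] (set `data_range=255.0` for 8-bit images). The report does not state its exact metric settings. SSIM involves local windowed statistics and is best taken from an established implementation such as scikit-image's `skimage.metrics.structural_similarity`.

```python
import numpy as np

def mse(pred, target):
    """Mean squared error over all pixels and channels."""
    diff = np.asarray(pred, dtype=np.float64) - np.asarray(target, dtype=np.float64)
    return float(np.mean(diff ** 2))

def psnr(pred, target, data_range=1.0):
    """Peak signal-to-noise ratio in dB; higher means closer to the ground truth."""
    err = mse(pred, target)
    if err == 0.0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(data_range ** 2 / err))
```

Because PSNR is a logarithmic transform of MSE, halving the MSE raises PSNR by about 3 dB. Per-sequence scores are usually averaged over the predicted frames.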
Project Insights and Conclusions
We develop a conditional GAN model that restricts pixel movements to realistic ones using optical flow or kinematic velocity constraints. This reduces some of the blurring seen in the baseline model. It is also a first step toward differential-equation-based physics conditioning of the GAN model using object pixel tracking.