Conditional Retrospective Cycle GAN for Video Prediction

Authors: Prakhar Gupta, Mayuresh Bhosale

Institution: Clemson University

Course Name: CPSC8810 ML for Image Synthesis, Dr. Siyu Huang

Motivation & Key Idea

We build on the Retrospective Cycle GAN (Kwon et al.). Their forward and backward temporal consistency scheme for training the generator performed strongly against the state of the art. However, they do not condition on physics or explicitly restrict the movement of pixels.

kitti_paper
Figure 1: This image from Kwon et al. (2019) shows blurring and distortions in longer-term predictions. Although their model outperforms PredNet, there is room for improvement.

We ask the following question: “Can we reduce blurring in longer-term predictions through the use of physics constraints?”

Formulation

RCGAN baseline model

Kwon et al. use two discriminators: one for image frame reconstruction and one for the temporal consistency of the image sequence. The loss function of the baseline model is given by:

loss

Optical Flow Conditioned RCGAN

To indirectly restrict pixel movement to realistic regions, we use RAFT (Teed et al.), a pre-trained optical flow model, to condition generation on an optical flow loss.

kitti_paper
Figure 2: Optical flow estimation performance on the KITTI dataset has been well established by Teed et al.

The new loss function is designed as:

loss
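As a rough illustration of how an optical flow term can be attached to a generator loss, here is a minimal sketch. It assumes the two flow fields have already been estimated by a pre-trained model such as RAFT, one from the predicted frame and one from the ground-truth frame. The function names, the mean endpoint error form, and the weight `lambda_flow` are assumptions made for illustration and are not taken from the project code.

```python
import numpy as np


def flow_consistency_loss(flow_pred, flow_true):
    """Mean endpoint error between two flow fields of shape (H, W, 2).

    Assumption: both fields come from a pre-trained flow estimator
    (e.g. RAFT) and are expressed in pixels per frame.
    """
    diff = np.asarray(flow_pred, dtype=np.float64) - np.asarray(flow_true, dtype=np.float64)
    # Euclidean norm of the per-pixel flow difference, averaged over the image.
    return float(np.mean(np.linalg.norm(diff, axis=-1)))


def flow_conditioned_loss(base_loss, flow_pred, flow_true, lambda_flow=1.0):
    """Add a weighted flow term to an existing scalar generator loss.

    `lambda_flow` is a hypothetical weighting hyperparameter.
    """
    return base_loss + lambda_flow * flow_consistency_loss(flow_pred, flow_true)
```

In a training loop the same computation would be written with differentiable tensors so that gradients reach the generator; NumPy is used here only to keep the sketch self-contained.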

Kinematics Constraint Flow Conditioned RCGAN

The new loss function is designed as:

loss
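One possible form of a kinematic velocity constraint is sketched below. It is an assumption for illustration only, not the formulation used in this project. It penalizes per-pixel displacements that exceed a plausible upper bound between consecutive frames. The function name and the bound `max_disp_px` are hypothetical.

```python
import numpy as np


def kinematic_penalty(flow, max_disp_px):
    """Mean squared excess of per-pixel displacement over a bound.

    `flow` has shape (H, W, 2) and holds displacements in pixels per frame.
    `max_disp_px` is a hypothetical upper bound on realistic motion.
    """
    magnitude = np.linalg.norm(np.asarray(flow, dtype=np.float64), axis=-1)
    # Only displacement beyond the bound contributes to the penalty.
    excess = np.maximum(magnitude - max_disp_px, 0.0)
    return float(np.mean(excess ** 2))
```

Motion within the bound contributes nothing, so the term only discourages implausibly large pixel jumps rather than all movement.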

Combined Constraints Conditioned RCGAN

loss

Project Results

We trained the conditional GAN on city driving frame sequences of length 5 from the KITTI dataset. The models were evaluated on ~70 test sequences. The results below show the performance of 3 models at different training epochs for 2 chosen test sequences.

Epoch 10

loss

Epoch 30

loss

Epoch 50

At epoch 50 of training, the blurring in the kinematics-conditioned RCGAN is already starting to improve over the baseline and approach-1 (optical flow conditioned) models.

loss

Epoch 100

Here, the kinematic conditioning approach clearly outperforms all others. We also explore combined flow and kinematics conditioning of the GAN. We notice that the expanded multi-term scalar loss function demands more training to reduce blurring. However, its statistical metrics (PSNR, SSIM, and MSE) are the most promising.

loss
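For reference, two of the metrics mentioned above can be computed per frame as in the sketch below. It assumes 8-bit frames, so the peak value is 255, and the function names are illustrative. SSIM is omitted because it requires a windowed implementation.

```python
import numpy as np


def mse(pred, target):
    """Mean squared error between two frames of identical shape."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    return float(np.mean((pred - target) ** 2))


def psnr(pred, target, max_val=255.0):
    """Peak signal-to-noise ratio in dB; assumes pixel values in [0, max_val]."""
    err = mse(pred, target)
    if err == 0.0:
        # Identical frames: PSNR is unbounded.
        return float("inf")
    return float(10.0 * np.log10(max_val ** 2 / err))
```

Lower MSE and higher PSNR indicate a prediction closer to the ground-truth frame. Both are pixel-wise measures, so they do not always track perceived sharpness.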

Some more details for different test frames from the kinematics-conditioned approach:

loss

Project Insights and Conclusions

We develop a conditional GAN model that restricts pixel movements to realistic ones using optical flow or kinematic velocity constraints. This helps reduce some of the blurring relative to the baseline model. It is also a first step towards differential-equation-based physics conditioning of a GAN model that utilizes object pixel tracking.