
Scaling the ground truth flow? #34

Open
joshua-heipel opened this issue Jan 27, 2020 · 3 comments

Comments

@joshua-heipel

Hello Mr. Ferriere,

thank you a lot for sharing your TensorFlow implementation of PWC-Net. I am currently using it as a starting point for my thesis. However, I'm wondering about the scaling factors you used for the ground truth/predicted flow, and I think there might be a mistake in your implementation.

In the paper it reads:

"We scale the ground truth flow by 20 and downsample it to obtain the supervision signals at different levels. Note that we do not further scale the supervision signal at each level, the same as [15]. As a result, we need to scale the upsampled flow at each pyramid level for the warping layer. For example, at the second level, we scale the upsampled flow from the third level by a factor of 5 (= 20/4) before warping features of the second image."

For me this implies two things:
First, if you divide the ground truth flow by 20, then the predicted flow (at each level) will be about 20 times smaller than the real flow. Therefore, to recover the real flow values, you have to multiply the predicted flow by 20. In particular, before any warping operation, the predicted flow has to be rescaled accordingly.
Second, to obtain the supervision signal for each level, you have to downsample the ground truth flow to the same height and width as the predicted flow. If you don't further scale the ground truth flow after downsampling (as the paper proposes), its magnitude will be too large for that resolution, and so will be the magnitude of the predicted flow at that level. That's why, before warping the feature maps, you have to divide the predicted flow by a factor of 2^lvl.
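To make these two points concrete, here is a minimal sketch (plain Python with hypothetical numbers, not code from this repo) that traces a single full-resolution displacement through the paper's convention:

true_flow = 20.0               # a 20 px displacement at full resolution
pred_full = true_flow / 20.0   # ideal network output after dividing gt by 20

for lvl in [2, 3, 4, 5]:
    scaler = 20.0 / 2**lvl            # 5.0, 2.5, 1.25, 0.625
    flow_for_warp = pred_full * scaler
    # at level lvl the feature maps are 2**lvl times smaller, so the
    # correct warping displacement is true_flow / 2**lvl
    assert flow_for_warp == true_flow / 2**lvl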
In your implementation you (correctly) account for that with the following lines:

scaler = 20. / 2**lvl  # scaler values are 0.625, 1.25, 2.5, 5.0
warp = self.warp(c2[lvl], up_flow * scaler, lvl)

But what about the supervision signal?
If I'm correct, you would have to divide the ground truth flow by a factor of 20. Otherwise the magnitude of the predicted (learned) flow will be about 20 times too large after multiplying it by the "scaler", and the warping won't do what it should. So I'm wondering: where do you downscale the ground truth flow by 20?
Additionally, in your pwcnet_loss function you downsample and downscale the supervision signal:

scaled_flow_gt = tf.image.resize_bilinear(y, (lvl_height, lvl_width))
scaled_flow_gt /= tf.cast(gt_height / lvl_height, dtype=tf.float32)

So, in the second line you divide the magnitude of the ground truth flow by 2^lvl. As far as I can see, this is not correct if you also rescale the predicted flow by multiplying it with the "scaler" before the warping operation. To be more precise: because of your loss function, the network learns to predict a flow which at each level is 2^lvl times smaller than the original flow. It therefore already has the correct magnitude for the height/width of that level. When you then multiply it with the scaler, you multiply it by 20 and divide it by 2^lvl again, so the displacement fed to the warping layer is off by a factor of 20/2^lvl and the warping will be wrong again.
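A minimal numeric sketch of the mismatch I mean (hypothetical numbers, not code from this repo):

true_flow = 20.0
for lvl in [2, 3, 4, 5]:
    pred = true_flow / 2**lvl        # what your loss trains the network towards
    scaler = 20.0 / 2**lvl
    flow_for_warp = pred * scaler    # = true_flow * 20 / 4**lvl
    correct = true_flow / 2**lvl     # displacement valid at level lvl
    # flow_for_warp differs from correct by a factor of 20 / 2**lvl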

I hope that my explanation is understandable. Thanks a lot for taking some time to think about it; maybe you can share your thoughts on my points.

Best, Joshua

@ysnan

ysnan commented Jan 31, 2020

I have the exact same question.
I can understand the following code: the flow to be warped should match the smaller resolution (1/2^lvl of full size), and 20 is a kind of "unit":

scaler = 20. / 2**lvl  # scaler values are 0.625, 1.25, 2.5, 5.0
warp = self.warp(c2[lvl], up_flow * scaler, lvl)

but I don't understand why, when computing the ground truth flow in the loss function, it is necessary to divide the ground truth flow by gt_height / lvl_height. Right now, the output flow from the NN should be in the same unit as scaled_flow_gt, i.e., the full-scale flow divided by 20.

scaled_flow_gt /= tf.cast(gt_height / lvl_height, dtype=tf.float32)

Any help is welcome! Thank you!

@lelelexxx

Yep, I have the same question.
In my opinion, if we don't divide the ground truth by 20.0 and only scale the ground truth down to the different pyramid levels, then we should not scale the estimated optical flow by [0.625, 1.25, 2.5, 5.0].
Or, if the estimated optical flow is scaled by [0.625, 1.25, 2.5, 5.0], then the ground truth must be divided by 20.0, and the supervision signal at each level does not need to be further scaled by gt_height / lvl_height (see the sketch below).

So, I think it is clearly wrong in this repo at this point ~
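As a hypothetical sketch (plain Python, illustrative numbers only, not code from this repo), the two conventions give the same warping displacement when each is applied consistently:

true_flow = 20.0
for lvl in [2, 3, 4, 5]:
    # Convention A: supervise with gt / 2**lvl, warp the prediction as-is
    target_a = true_flow / 2**lvl
    warp_a = target_a                     # no [0.625 ... 5.0] scaler
    # Convention B: supervise with gt / 20, warp with prediction * 20 / 2**lvl
    target_b = true_flow / 20.0
    warp_b = target_b * (20.0 / 2**lvl)
    assert warp_a == warp_b == true_flow / 2**lvl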

@lelelexxx

Sorry, it's my bad. I now think it is OK, NOT A WRONG IMPLEMENTATION! Apologies for the wrong comment above.
As far as I can tell, when preparing the warping for the correlation module, PWC-Net uses a transposed convolution to upsample the lower-level optical flow to the next, higher-resolution level. This upsampling is learnable, so it can automatically adjust to whatever scale is needed. Since there is a transposed convolution between the supervision signal (the ground truth optical flow) and the flow used for warping, the two are decoupled and both scalings can coexist.
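A minimal sketch of the kind of learnable upsampling meant here (TensorFlow 1.x style; the function name and hyperparameters are illustrative, not the repo's exact code):

import tensorflow as tf  # TensorFlow 1.x API

def upsample_flow(flow, name='up_flow'):
    # Learnable 2x upsampling of a 2-channel flow field. Because the
    # kernel weights are trained, this layer can absorb a constant scale
    # factor between the supervision convention at one level and the
    # magnitude the warping layer expects at the next level.
    return tf.layers.conv2d_transpose(flow, filters=2, kernel_size=4,
                                      strides=2, padding='same', name=name)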
