G_prompt for cut_turbo for dataset with single prompt #662
| self.fake_B = self.netG_A(self.real_with_z, G_prompt) | ||
| else: | ||
| fake_B = self.netG_A(real_A_with_z) | ||
| self.fake_B = self.netG_A(self.real_with_z) |
| # match batch size | ||
| captions_enc = caption_enc.repeat(x.shape[0], 1, 1) | ||
| batch_size = caption_enc.shape[0] |
Inside cut_model, x is created by concatenating real_A and real_B, which doubles the batch size of the x tensor. The prompt tensor keeps the original batch_size, so I made this modification to make the two tensors match. A detailed toy example is here: https://colab.research.google.com/drive/1RMvHt2PuQufH4zEc2Lrds561L9NEYzYf?usp=sharing
This should be fixed by modifying the prompt tensor outside turbo, in cut, at the point where A and B are concatenated for inference, not here.
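A minimal sketch of the batch-size mismatch discussed above and of one way to resolve it at the concatenation site, as the review suggests. This is not the PR's code: the helper name `expand_prompt_to_batch`, the tensor shapes, and the use of an already-encoded prompt tensor are assumptions made for illustration only.

```python
import torch


def expand_prompt_to_batch(prompt_enc: torch.Tensor, target_batch: int) -> torch.Tensor:
    """Tile a (B, seq_len, dim) prompt encoding along dim 0 up to target_batch rows.

    Hypothetical helper, not part of the repository.
    """
    n = prompt_enc.shape[0]
    if target_batch % n != 0:
        raise ValueError(
            f"target batch {target_batch} is not a multiple of prompt batch {n}"
        )
    # repeat() tiles the whole tensor, so the order is [p0..pN-1, p0..pN-1, ...],
    # which lines up with torch.cat((real_A, real_B), dim=0).
    return prompt_enc.repeat(target_batch // n, 1, 1)


batch_size = 4
real_A = torch.randn(batch_size, 3, 8, 8)
real_B = torch.randn(batch_size, 3, 8, 8)

# Concatenating A and B doubles the batch dimension of x.
x = torch.cat((real_A, real_B), dim=0)

# Stand-in for the encoded prompt; its batch dimension is still batch_size.
prompt_enc = torch.randn(batch_size, 77, 1024)

# Expand the prompt where A and B are concatenated, before calling the generator.
prompt_for_x = expand_prompt_to_batch(prompt_enc, x.shape[0])
print(x.shape[0], prompt_for_x.shape[0])
```

Doing the expansion next to the `torch.cat` call keeps the generator's `forward` free of assumptions about how its caller built the batch.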
| "D_lr": 0.0001, | ||
| "G_ema": false, | ||
| "G_ema_beta": 0.999, | ||
| "G_lr": 0.0002, |
| @@ -201,17 +201,14 @@ def forward(self, x, prompt): | |||
| ).input_ids.cuda() | |||
| caption_enc = self.text_encoder(caption_tokens)[0] | |||
With refs:
1. https://huggingface.co/transformers/v4.8.0/model_doc/clip.html#flaxcliptextmodel
2. https://github.com/huggingface/transformers/blob/f91c16d270e5e3ff32fdb32ccf286d05c03dfa66/src/transformers/models/clip/modeling_clip.py#L759
For "outputs = self.text_encoder(caption_tokens)":
type(outputs) = <class 'transformers.modeling_outputs.BaseModelOutputWithPooling'>
len(outputs) = 2
outputs[0].shape = torch.Size([4, 77, 1024]) (this is last_hidden_state)
outputs[1].shape = torch.Size([4, 1024]) (this is pooler_output)
According to the explanation in ref 1, outputs[0] should be used.
Add G_prompt for cut_turbo for unaligned datasets; it works for batch_size larger than 1.
The training works with the following command line
The inference works with the following command line