Assorted Neural Style Transfer: An extension of Vanilla NST
Neural Style Transfer explores methods for artistic style transfer based on Convolutional Neural Networks. The core idea proposed by Gatys et al. became very popular, and with further research, Johnson et al. overcame a significant limitation to achieve style transfer in real-time.
This article uses the VGG16 model for a manual implementation of Neural Style Transfer. A second implementation follows an article by TensorFlow, which uses a pre-trained model for NST.
So, what is Assorted Neural Style Transfer? (Yes, I came up with that name myself.)
Well, as we know, NST combines the style of a style image with the content of a content image as follows:
We propose Assorted NST, which combines the styles of three style images with the content of a content image. Below are a few examples:
We not only combine the three styles; we can also control how much weight each style receives. The above output was generated with weights [0.3, 0.3, 0.4]. The weights [0.1, 0.1, 0.8] (where style 3 has more weight) give the following output:
So, how does Assorted NST work?
It's pretty straightforward. Instead of giving the model a single style image as input, we take the weighted combination of all three style images and feed that to the model. Before taking the weighted combination, we resize the style images so that they all have the same dimensions.
In this way, the model extracts the style of the combined image, which is then used to generate the final output image.
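The blending step described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the article's actual code: the helper names `resize_nearest` and `blend_styles`, the 256x256 target size, the nearest-neighbour resizing, and the normalization of the weights are all choices made here for the example.

```python
import numpy as np


def resize_nearest(image, height, width):
    """Resize an H x W x C array with nearest-neighbour sampling (assumed resize method)."""
    src_h, src_w = image.shape[:2]
    rows = np.arange(height) * src_h // height  # source row for each target row
    cols = np.arange(width) * src_w // width    # source column for each target column
    return image[rows[:, None], cols]


def blend_styles(style_images, weights, size=(256, 256)):
    """Return the weighted combination of the style images as one float32 image."""
    weights = np.asarray(weights, dtype=np.float32)
    if len(style_images) != len(weights):
        raise ValueError("need exactly one weight per style image")
    # Assumption: weights are rescaled to sum to 1 so pixel values stay in range.
    weights = weights / weights.sum()
    # Bring every style image to the same dimensions before combining.
    resized = np.stack(
        [resize_nearest(np.asarray(img, dtype=np.float32), *size) for img in style_images]
    )
    # Contract the weight vector against the image axis: sum_i w_i * image_i.
    return np.tensordot(weights, resized, axes=1)


# Three dummy "style images" of different sizes, filled with constant values.
styles = [
    np.full((100, 120, 3), 0.0),
    np.full((300, 200, 3), 100.0),
    np.full((64, 64, 3), 200.0),
]
blended = blend_styles(styles, [0.3, 0.3, 0.4])
```

The resulting `blended` array would then take the place of the single style image in whichever NST pipeline is used. Changing the weights to, say, `[0.1, 0.1, 0.8]` shifts the combined image toward the third style.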
The Assorted NST examples above are based on the TF Hub model, while below are some examples from the manual Assorted NST implementation:
A few limitations of this method:
- For each set of content and style images, the weight values have to be fine-tuned to get a good output. No fixed set of weights works well on all images. If the weights are poorly chosen, the outcome might be unusable, like below:
- Generating the output image takes almost 8 seconds per iteration in the manual implementation, and we need at least ten iterations to get a valid output. This time can be reduced by using an end-to-end CNN model built explicitly for NST, as introduced in Johnson et al. (which is what the TF Hub implementation uses).
Thanks for reading!