AdaIN with MediaPipe

This is a work in progress – the Android app needs some UI work, and I'm still training the model.

TensorFlow implementation

I've been experimenting with MediaPipe for the last few months, and my latest project has been an Android MediaPipe app[1] with a custom implementation of "Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization" (AdaIN).

Among other things, I've optimised both the encoder and the decoder used in the paper, and I've added some additional loss functions described in other papers that should improve the output quality (although I haven't yet had the chance to compare outputs with and without them).
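For reference, the core operation from the paper re-normalises the encoder's content features x so that they match the channel-wise statistics of the style features y, and the decoder then maps the result back to image space:

    AdaIN(x, y) = σ(y) · ((x − μ(x)) / σ(x)) + μ(y)

where μ and σ are the mean and standard deviation computed per channel over spatial positions.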

Android

While the video is a standard stream, the style image had to be a lower-rate stream. My initial idea was to use a side packet, but that unfortunately didn't work, as side packets can't be injected asynchronously once the graph is running: any input value that changes over time needs to be a stream. Below is the code to take an image and asynchronously inject it into the MediaPipe graph.

@Override
public void onCaptureSuccess(ImageProxy image, int rotationDegrees) {
    AndroidPacketCreator packetCreator = processor.getPacketCreator();

    // Copy the captured JPEG bytes out of the ImageProxy and decode them.
    ByteBuffer buffer = image.getPlanes()[0].getBuffer();
    byte[] bytes = new byte[buffer.remaining()];
    buffer.get(bytes);
    Bitmap bmp = BitmapFactory.decodeByteArray(bytes, 0, bytes.length);

    long timestamp = image.getTimestamp();
    image.close();

    try {
        // Wrap the bitmap in a packet and inject it into the graph's
        // "input_image" stream; the graph takes ownership of the packet.
        Packet packet = packetCreator.createRgbImageFrame(bmp);
        processor.getGraph().addConsumablePacketToInputStream("input_image", packet, timestamp);
    } catch (MediaPipeException e) {
        // TODO: handle the failure properly instead of just rethrowing.
        throw e;
    }
}

MediaPipe

For faster inference, I've split the model into two sub-models: an encoder model and a decoder model. The decoder model expects both sets of encoded tensors, content and style, at the same time. But as I mentioned above, the video stream operates at a much higher rate than the style image stream.

This took me a while to figure out, as there wasn't much documentation around it, but the trick was to use the SyncSetInputStreamHandler. By adding the following to the node, the calculator was able to run as soon as either stream was ready, rather than waiting for matching timestamps on both.

input_stream_handler {
  input_stream_handler: "SyncSetInputStreamHandler"
  options {
    [mediapipe.SyncSetInputStreamHandlerOptions.ext] {
      sync_set { tag_index: "VIDEO" }
      sync_set { tag_index: "STYLE" }
    }
  }
}
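For context, this is roughly where the handler sits in the graph config; the calculator and stream names here are illustrative rather than the ones from my actual graph:

node {
  calculator: "AdaInDecoderCalculator"  # hypothetical name
  input_stream: "VIDEO:content_tensors"
  input_stream: "STYLE:style_tensors"
  output_stream: "TENSORS:stylized_tensors"
  input_stream_handler {
    input_stream_handler: "SyncSetInputStreamHandler"
    # options as above
  }
}

Each sync_set is timestamp-synchronised independently of the others, so a new style packet doesn't hold up the video packets (and vice versa).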

To keep as much as possible on the GPU and avoid round-trips to the CPU, I created another calculator that accepts a std::vector<GlBuffer> and outputs straight to a GpuBuffer.
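In case it helps anyone doing something similar, here's a rough sketch of that calculator's skeleton (the class and tag names are made up for the example, the output size is assumed, and the GL shader pass that actually renders the tensor into the texture is elided):

#include <vector>

#include "mediapipe/framework/calculator_framework.h"
#include "mediapipe/gpu/gl_calculator_helper.h"
#include "tensorflow/lite/delegates/gpu/gl/gl_buffer.h"

namespace mediapipe {

using ::tflite::gpu::gl::GlBuffer;

// Hypothetical skeleton: turns the decoder's output tensors (GL buffers)
// into a GpuBuffer image without ever leaving the GPU.
class GlBufferToGpuBufferCalculator : public CalculatorBase {
 public:
  static ::mediapipe::Status GetContract(CalculatorContract* cc) {
    cc->Inputs().Tag("TENSORS").Set<std::vector<GlBuffer>>();
    cc->Outputs().Tag("IMAGE").Set<GpuBuffer>();
    return GlCalculatorHelper::UpdateContract(cc);
  }

  ::mediapipe::Status Open(CalculatorContext* cc) override {
    return helper_.Open(cc);
  }

  ::mediapipe::Status Process(CalculatorContext* cc) override {
    return helper_.RunInGlContext([this, cc]() -> ::mediapipe::Status {
      const auto& tensors =
          cc->Inputs().Tag("TENSORS").Get<std::vector<GlBuffer>>();
      // Allocate a destination texture that is backed by a GpuBuffer.
      GlTexture output = helper_.CreateDestinationTexture(width_, height_);
      // ... bind tensors[0] and run the shader pass that writes the
      // decoded RGB values into `output` (elided) ...
      cc->Outputs().Tag("IMAGE").Add(output.GetFrame<GpuBuffer>().release(),
                                     cc->InputTimestamp());
      output.Release();
      return ::mediapipe::OkStatus();
    });
  }

 private:
  GlCalculatorHelper helper_;
  int width_ = 256;   // assumed decoder output size
  int height_ = 256;
};

REGISTER_CALCULATOR(GlBufferToGpuBufferCalculator);

}  // namespace mediapipe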

[1]: One of the MediaPipe calculators that I've built is missing its Metal implementation for iOS. I'll look into this next.