Abstract
Accurately detecting human pose is a long-standing problem in computer vision. Its complexity has attracted many researchers, whose work supports the development of models in this domain. Knowing only the pose of a person, we can train machine learning models to locate limb positions and use them for intelligent surveillance with unusual-behavior detection, to support medical treatment, and more. Motion retargeting, which transfers actions performed by humans, also provides data that can be used to animate digital characters and analyze poses. Related to this study, much work based on generative adversarial networks (GANs) has produced images that fuse the identity and shape of the inputs. Most of this exploration, however, has focused on images rather than videos. Convolutional neural networks have improved the accuracy of human pose estimation, but few experiments have tried to make this detection more efficient on edge devices. Tiny development boards like the Raspberry Pi or Arduino have been enablers for many recent startups. They inspire many knock-off projects and have been valuable to researchers for their speed, reliability, and scalability. However, these boards are limited in compute power. Another edge device, Nvidia’s Jetson TX2, is designed to run compute-intensive AI models. In this project, I have implemented motion retargeting: given a video of a source subject, the pose of the source is transferred onto a target that is trained with minimal poses. Using image-to-image translation, I divide the video into frames, estimate the pose in each source frame, transfer that pose to the target frame, and combine the frames back into a motion-retargeted video.
The resulting model is trained on a 1660 Ti GPU and then tested on an edge device, the Nvidia Jetson TX2, to exercise its computational capabilities. In addition, I have implemented a pruned version of the OpenPose model, which adapts the OpenPose approach to give similar inference on CPUs or edge devices. The goal was to utilize the power of the edge device and shift part of the workload from the machine that trains the model to the edge device. The simulated motions from the source video look appropriate to the character’s anatomy in the target, demonstrating that the Jetson TX2 is capable of edge machine learning.