VIDEO GENERATION OF THE UNDERLYING SURFACE BASED ON A SINGLE IMAGE
Abstract:
Video generation is one of the most pressing and challenging tasks in artificial intelligence and computer vision. Solving it opens up wide opportunities for the creative industries, business, education, and marketing. However, generating long, semantically coherent, high-resolution video remains an unsolved problem, which motivates both the development of new models and the study of existing ones. This article provides a comparative analysis of the main approaches to video generation: variational autoencoders (VAEs), generative adversarial networks (GANs), and autoregressive, flow-based, and diffusion models. Their key architectural features, advantages, and disadvantages are considered. Special attention is paid to diffusion models, which currently represent the leading approach to video generation. Over the past few years, a large number of diffusion-based video generation models have appeared, the best known of which include Sora (OpenAI), Gen-3 (Runway), Kandinsky (Sber AI), and Stable Video Diffusion (Stability AI). However, most of them are closed commercial products whose source code and architecture are not available for research or modification. In this work, the open-source Stable Video Diffusion model is used for generation. The practical part of the study consists of generating video from a single source image of the underlying surface and analyzing the result. The generated video sequences can be used to simulate various flight scenarios and to expand datasets for unmanned aerial vehicles (UAVs). Analysis of the generated video showed that, after 10-13 frames, the frame sequence requires additional processing to prevent the accumulation of artifacts and generation errors. The analysis was carried out using a set of metrics that reflect changes in the color characteristics and texture of the generated video.
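The abstract does not name the specific color and texture metrics used. The following is only a minimal sketch, assuming frames are available as a uint8 NumPy array of shape (T, H, W, 3) (for example, decoded from a generated clip). It tracks, for each frame relative to the first (conditioning) frame, the shift of the mean RGB color, a normalized color-histogram distance, and the relative change in the variance of a Laplacian as a texture proxy. All function names, the choice of metrics, and the thresholds in `first_degraded_frame` are hypothetical illustrations, not the authors' method.

```python
from typing import Optional

import numpy as np


def to_gray(frame: np.ndarray) -> np.ndarray:
    """Convert an RGB frame (H, W, 3) to float grayscale using BT.601 luma weights."""
    rgb = frame.astype(np.float64)
    return 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]


def laplacian_variance(gray: np.ndarray) -> float:
    """Texture proxy: variance of a 4-neighbour Laplacian over interior pixels."""
    lap = (gray[:-2, 1:-1] + gray[2:, 1:-1] + gray[1:-1, :-2] + gray[1:-1, 2:]
           - 4.0 * gray[1:-1, 1:-1])
    return float(lap.var())


def color_histogram(frame: np.ndarray, bins: int = 32) -> np.ndarray:
    """Concatenated per-channel histograms, each normalized to sum to 1."""
    hists = []
    for c in range(3):
        h, _ = np.histogram(frame[..., c], bins=bins, range=(0, 256))
        hists.append(h / h.sum())
    return np.concatenate(hists)


def frame_drift(frames: np.ndarray) -> list:
    """Per-frame color and texture change relative to the first frame."""
    ref = frames[0]
    ref_mean = ref.reshape(-1, 3).astype(np.float64).mean(axis=0)
    ref_hist = color_histogram(ref)
    ref_tex = laplacian_variance(to_gray(ref))
    report = []
    for i, frame in enumerate(frames):
        mean = frame.reshape(-1, 3).astype(np.float64).mean(axis=0)
        tex = laplacian_variance(to_gray(frame))
        report.append({
            "frame": i,
            # Euclidean distance between mean RGB colors (0..255 scale).
            "mean_color_shift": float(np.linalg.norm(mean - ref_mean)),
            # Total-variation distance per channel, averaged over 3 channels: 0..1.
            "hist_distance": float(0.5 * np.abs(color_histogram(frame) - ref_hist).sum() / 3),
            # Relative change of the texture proxy; guards against a flat reference.
            "texture_change": abs(tex - ref_tex) / max(ref_tex, 1e-12),
        })
    return report


def first_degraded_frame(report: list, hist_threshold: float = 0.2,
                         texture_threshold: float = 0.5) -> Optional[int]:
    """Index of the first frame exceeding either (hypothetical) threshold, else None."""
    for row in report:
        if row["hist_distance"] > hist_threshold or row["texture_change"] > texture_threshold:
            return row["frame"]
    return None
```

In a setting like the one described above, frames from the flagged index onward would be the candidates for additional processing; in practice the thresholds would have to be calibrated on real generated sequences rather than taken from this sketch.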

Keywords:
NEURAL NETWORKS, VIDEO GENERATION, IMAGE PROCESSING, FRAME, DIFFUSION MODELS, INTERPOLATION, COLOR CHARACTERISTICS