• Platinum Pass
  • Full Conference Pass
  • Full Conference One-Day Pass

Date: Monday, November 18th
Time: 4:15pm - 4:36pm
Venue: Plaza Meeting Room P2


Speaker(s):

Abstract: From angling smiles to duck faces, all kinds of facial expressions can be seen in selfies, portraits, and Internet pictures. These photos are taken with various camera types and under a vast range of angles and lighting conditions. We present a deep learning framework that can fully normalize unconstrained face images, i.e., remove perspective distortions, relight to an evenly lit environment, and predict a frontal and neutral face. Our method can produce a high-resolution image while preserving important facial details and the likeness of the subject, along with the original background. We divide this ill-posed problem into three consecutive normalization steps, each using a different generative adversarial network that acts as an image generator. Perspective distortion removal is performed using a dense flow field predictor. A uniformly illuminated face is obtained using a lighting translation network, and the facial expression is neutralized using a generalized facial expression synthesis framework combined with a regression network based on deep features for facial recognition. We introduce new data representations for conditional inference, as well as training methods for supervised learning, to ensure that different expressions of the same person yield not only a plausible but also a similar neutral face. We demonstrate our results on a wide range of challenging images collected in the wild. Key applications of our method range from robust image-based 3D avatar creation and portrait manipulation to facial enhancement and reconstruction tasks for crime investigation. We also found, through an extensive user study, that our normalization results can hardly be distinguished from ground-truth ones when the viewer is not familiar with the person.
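The abstract states that perspective distortion is removed with a dense flow field predictor but does not describe how the predicted field is applied. The sketch below shows only that application step, assuming the predictor outputs a per-pixel (dx, dy) displacement in pixels that is used for backward warping with bilinear sampling and border clamping. The function name `warp_with_flow` and this flow convention are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def warp_with_flow(image: np.ndarray, flow: np.ndarray) -> np.ndarray:
    """Backward-warp `image` (H x W x C) with a dense flow field (H x W x 2).

    Assumed convention (hypothetical): output pixel (y, x) is sampled from
    input location (y + flow[y, x, 1], x + flow[y, x, 0]) using bilinear
    interpolation, with sample coordinates clamped to the image border.
    """
    h, w = image.shape[:2]
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")

    # Source coordinates for every output pixel, clamped to the image.
    src_x = np.clip(xs + flow[..., 0], 0, w - 1)
    src_y = np.clip(ys + flow[..., 1], 0, h - 1)

    # Integer corners of the bilinear footprint.
    x0 = np.floor(src_x).astype(int)
    y0 = np.floor(src_y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)

    # Fractional weights, broadcast over the channel axis.
    wx = (src_x - x0)[..., None]
    wy = (src_y - y0)[..., None]

    top = (1 - wx) * image[y0, x0] + wx * image[y0, x1]
    bottom = (1 - wx) * image[y1, x0] + wx * image[y1, x1]
    return (1 - wy) * top + wy * bottom
```

A zero flow field returns the input unchanged, and a constant flow of +1 in x shifts the image one pixel to the left (the last column repeats because of clamping). Backward warping is chosen here because every output pixel receives exactly one value, so no holes appear.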

Speaker(s) Bio:

Date: Monday, November 18th
Time: 4:36pm - 4:57pm
Venue: Plaza Meeting Room P2


Speaker(s):

Abstract: The Ken Burns effect allows animating still images with a virtual camera scan and zoom. Adding parallax, which results in the 3D Ken Burns effect, enables significantly more compelling results. Creating such effects manually is time-consuming and demands sophisticated editing skills. Existing automatic methods, however, require multiple input images from varying viewpoints. In this paper, we introduce a framework that synthesizes the 3D Ken Burns effect from a single image, supporting both a fully automatic mode and an interactive mode with the user controlling the camera. Our framework first leverages a depth prediction pipeline, which estimates scene depth that is suitable for view synthesis tasks. To address the limitations of existing depth estimation methods, such as geometric distortions, semantic distortions, and inaccurate depth boundaries, we develop a semantic-aware neural network for depth prediction, couple its estimate with a segmentation-based depth adjustment process, and employ a refinement neural network that facilitates accurate depth predictions at object boundaries. Based on this depth estimate, our framework then maps the input image to a point cloud and synthesizes the resulting video frames by rendering the point cloud from the corresponding camera positions. To address disocclusions while maintaining geometrically and temporally coherent synthesis results, we utilize context-aware color and depth inpainting to fill in the missing information in the extreme views of the camera path, thus extending the scene geometry of the point cloud. Experiments with a wide variety of image content show that our method enables realistic synthesis results. Our study demonstrates that our system allows users to achieve better results with little effort compared to existing solutions for creating the 3D Ken Burns effect.
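The abstract describes mapping the input image to a point cloud using the estimated depth and rendering that point cloud from new camera positions. A minimal sketch of this geometric step follows, assuming a pinhole camera with a known focal length in pixels, a principal point at the image centre, a purely translational camera move, and nearest-pixel splatting resolved with a z-buffer. These choices and the function name `render_point_cloud` are illustrative assumptions rather than the paper's renderer. Pixels that receive no point are returned as NaN; they correspond to the disocclusions that the paper fills with color and depth inpainting.

```python
import numpy as np


def render_point_cloud(image, depth, focal, cam_offset):
    """Lift `image` (H x W x C) to 3D with `depth` (H x W, positive values)
    and re-render it from a camera translated by `cam_offset` (3-vector).

    Assumptions (hypothetical): pinhole model, principal point at the image
    centre, no rotation, nearest-pixel splatting. Unfilled pixels are NaN.
    """
    h, w = depth.shape
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")

    # Unproject every pixel to a 3D point in the original camera frame.
    X = (xs - cx) * depth / focal
    Y = (ys - cy) * depth / focal
    points = np.stack([X, Y, depth], axis=-1).reshape(-1, 3)
    colors = image.reshape(-1, image.shape[2])

    # Express the points in the translated camera's frame.
    points = points - np.asarray(cam_offset, dtype=float)
    front = points[:, 2] > 1e-6
    points, colors = points[front], colors[front]

    # Project back to the image plane and round to the nearest pixel.
    u = np.round(points[:, 0] * focal / points[:, 2] + cx).astype(int)
    v = np.round(points[:, 1] * focal / points[:, 2] + cy).astype(int)
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    u, v = u[inside], v[inside]
    zs, colors = points[inside, 2], colors[inside]

    # Z-buffer: keep the nearest point that lands on each pixel.
    out = np.full((h, w, image.shape[2]), np.nan)
    zbuf = np.full((h, w), np.inf)
    for ui, vi, zi, ci in zip(u, v, zs, colors):
        if zi < zbuf[vi, ui]:
            zbuf[vi, ui] = zi
            out[vi, ui] = ci
    return out
```

With a zero camera offset every pixel projects back onto itself, so the input image is reproduced. Moving the camera sideways over a constant-depth plane shifts the image and leaves a column of NaN holes at the uncovered edge, which is the kind of missing region an inpainting step would have to fill.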

Speaker(s) Bio:

Date: Monday, November 18th
Time: 4:57pm - 5:18pm
Venue: Plaza Meeting Room P2


Speaker(s):

Abstract: Automatic generation of artistic glyph images is a challenging task that attracts much research interest. Previous methods are either specifically designed for shape synthesis or focus on texture transfer. In this paper, we propose a novel model, AGIS-Net, to transfer both shape and texture styles in one stage with only a few stylized samples. To achieve this goal, we first disentangle the representations for content and style by using two encoders, enabling multi-content and multi-style generation. Then we utilize two collaboratively working decoders to generate the glyph shape image and its texture image simultaneously. In addition, we introduce a local texture refinement loss to further improve the quality of the synthesized textures. In this manner, our one-stage model is much more efficient and effective than other multi-stage stacked methods. We also propose a large-scale dataset with Chinese glyph images in various shape and texture styles, rendered from 35 professionally designed artistic fonts with 7,326 characters and 2,460 synthetic artistic fonts with 639 characters, to validate the effectiveness and extensibility of our method. Extensive experiments on both English and Chinese artistic glyph image datasets demonstrate the superiority of our model over other state-of-the-art methods in generating high-quality stylized glyph images.

Speaker(s) Bio:

Date: Monday, November 18th
Time: 5:18pm - 5:39pm
Venue: Plaza Meeting Room P2


Speaker(s):

Abstract: Procedural textures are powerful tools that have been used in graphics for decades. In contrast to the alternative exemplar-based texture synthesis techniques, procedural textures provide user control and fast texture generation with low storage cost and unlimited texture resolution. However, creating procedural models for complex textures requires a time-consuming process of selecting a combination of procedures and parameters. We present an example-based framework to automatically select procedural models and estimate their parameters. In our framework, we consider textures categorized by commonly used high-level classes. For each high-level class, we build a data-driven inverse modeling system based on an extensive collection of real-world textures and procedural texture models in the form of node graphs. We use unsupervised learning on collected real-world images in a texture class to learn sub-classes. We then classify the output of each of the collected procedural models into these sub-classes. For each of the collected models, we train a convolutional neural network (CNN) to learn the parameters that produce a specific output texture. To use our framework, a user provides an exemplar texture image within a high-level class. The system first classifies the texture into a sub-class and selects the procedural models that produce output in that sub-class. The pre-trained CNNs of the selected models are used to estimate the parameters of the texture example. With the predicted parameters, the system can generate appropriate procedural textures for the user. The user can easily edit the textures by adjusting the node graph parameters. In an optional final step, style transfer augmentation can be applied to the fitted procedural textures to recover details lost in the procedural modeling process.
We demonstrate our framework for four high-level classes and show that our inverse modeling system can produce high-quality procedural textures for both structural and non-structural textures.
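The abstract leaves the unsupervised learning method and the image features unspecified. Assuming each texture image (real or procedurally rendered) has already been reduced to a feature vector, and substituting plain k-means for the unspecified clustering step, the sub-class discovery and model-selection logic might look like the sketch below. The names `kmeans`, `build_model_index`, and `select_models`, the feature vectors, and the model names in the usage note are all hypothetical; the per-model CNN parameter regressors are not shown.

```python
import numpy as np


def kmeans(features, k, iters=50):
    """Lloyd's k-means with deterministic farthest-point initialisation.

    Stand-in (assumption) for the abstract's unspecified unsupervised step.
    Returns a (k x D) array of sub-class centroids.
    """
    centroids = [features[0]]
    for _ in range(1, k):
        # Next seed: the point farthest from all seeds chosen so far.
        d = np.min([np.linalg.norm(features - c, axis=1) for c in centroids], axis=0)
        centroids.append(features[d.argmax()])
    centroids = np.array(centroids, dtype=float)

    for _ in range(iters):
        dists = np.linalg.norm(features[:, None, :] - centroids[None], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = features[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return centroids


def nearest_subclass(feature, centroids):
    """Index of the sub-class centroid closest to `feature`."""
    return int(np.linalg.norm(centroids - feature, axis=1).argmin())


def build_model_index(model_output_features, centroids):
    """Map each sub-class to the procedural models whose output falls in it."""
    index = {j: [] for j in range(len(centroids))}
    for name, feat in model_output_features.items():
        index[nearest_subclass(feat, centroids)].append(name)
    return index


def select_models(exemplar_feature, centroids, index):
    """Classify the user's exemplar and return the candidate models."""
    return index[nearest_subclass(exemplar_feature, centroids)]
```

For example, two well-separated clusters of real-texture features yield two sub-classes; a hypothetical `wood_graph` model whose rendered output lies in the second cluster is then the only candidate returned for an exemplar near that cluster, and its CNN would be the one used to estimate parameters.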

Speaker(s) Bio:

Date: Monday, November 18th
Time: 5:39pm - 6:00pm
Venue: Plaza Meeting Room P2


Speaker(s):

Abstract: We introduce a novel approach for synthesizing realistic speech for comics. Using a comic page as input, our approach synthesizes speech for each comic character following the reading flow. It adopts a cascading strategy to synthesize speech in two stages: Comic Visual Analysis and Comic Speech Synthesis. In the first stage, the input comic page is analyzed to identify the gender and age of the characters, as well as the text each character speaks and its corresponding emotion. Guided by this analysis, in the second stage, our approach synthesizes realistic speech for each character that is consistent with the visual observations. Our experiments show that the proposed approach can synthesize realistic and lively speech for different types of comics. Perceptual studies performed on the synthesis results of multiple sample comics validate the efficacy of our approach.

Speaker(s) Bio:
