Design Guidelines for Prompt Engineering Text-to-Image Generative Models


Added by Vivian Liu

Publication date 2021

Field Informatics Engineering

Language English

Authors Vivian Liu - Lydia B. Chilton

Human-Computer Interaction


Abstract

Text-to-image generative models are a new and powerful way to generate visual artwork. The free-form nature of text as interaction is double-edged; while users have access to an infinite range of generations, they also must engage in brute-force trial and error with the text prompt when the result quality is poor. We conduct a study exploring what prompt components and model parameters can help produce coherent outputs. In particular, we study prompts structured to include subject and style and investigate success and failure modes within these dimensions. Our evaluation of 5493 generations over the course of five experiments spans 49 abstract and concrete subjects as well as 51 abstract and figurative styles. From this evaluation, we present design guidelines that can help people find better outcomes from text-to-image generative models.
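
The subject-and-style prompt structure described in the abstract can be illustrated with a minimal sketch. The template string, the function name `build_prompts`, and the example subjects and styles are all hypothetical placeholders; the abstract states only that prompts were structured to include a subject and a style.

```python
from itertools import product

# Hypothetical template: the abstract says only that prompts include
# a subject and a style, not how the two were phrased together.
TEMPLATE = "{subject} in the style of {style}"


def build_prompts(subjects, styles, template=TEMPLATE):
    """Return one prompt per (subject, style) pair, subjects varying slowest."""
    return [
        template.format(subject=subject, style=style)
        for subject, style in product(subjects, styles)
    ]


# Placeholder values, not the subjects or styles used in the study.
subjects = ["love", "a tree"]
styles = ["cubism", "ukiyo-e"]

prompts = build_prompts(subjects, styles)
for prompt in prompts:
    print(prompt)
```

A full cross of 49 subjects and 51 styles would give 49 × 51 = 2499 prompts, although the abstract does not say that every pair was evaluated.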


Generative Adversarial Text to Image Synthesis

Scott Reed, Zeynep Akata, Xinchen Yan (2016)

Automatic synthesis of realistic images from text would be interesting and useful, but current AI systems are still far from this goal. However, in recent years generic and powerful recurrent neural network architectures have been developed to learn discriminative text feature representations. Meanwhile, deep convolutional generative adversarial networks (GANs) have begun to generate highly compelling images of specific categories, such as faces, album covers, and room interiors. In this work, we develop a novel deep architecture and GAN formulation to effectively bridge these advances in text and image modeling, translating visual concepts from characters to pixels. We demonstrate the capability of our model to generate plausible images of birds and flowers from detailed text descriptions.

Neural and Evolutionary Computing, Computer Vision and Pattern Recognition

Mixed-Reality Robotic Games: Design Guidelines for Effective Entertainment with Consumer Robots

F. Gabriele Pratticò, Fabrizio Lamberti (2020)

In recent years, there has been increasing interest in the use of robotic technology at home. A number of service robots have appeared on the market, supporting customers in the execution of everyday tasks. At roughly the same time, consumer-level robots started to be used as toys or gaming companions as well. However, the gaming possibilities provided by current off-the-shelf robotic products are generally quite limited, which makes them quickly lose their attractiveness. One approach that has proven capable of boosting robotic gaming and related devices is to create playful experiences in which physical and digital elements are combined using Mixed Reality technologies. However, these games differ significantly from digital-only or physical-only experiences, and new design principles are required to support developers in their creative work. This paper addresses this need by drafting a set of guidelines that summarize developments carried out by the research community and their findings.

Human-Computer Interaction, Graphics, Robotics

Studying Visualization Guidelines According to Grounded Theory

Alexandra Diehl (2020)

Visualization guidelines, if defined properly, are invaluable to both practical applications and the theoretical foundation of visualization. In this paper, we present a collection of research activities for studying visualization guidelines according to Grounded Theory (GT). We used the discourses at VisGuides, which is an online discussion forum for visualization guidelines, as the main data source for enabling data-driven research processes as advocated by the grounded theory methodology. We devised a categorization scheme focusing on observing how visualization guidelines were featured in different threads and posts at VisGuides, and coded all 248 posts between September 27, 2017 (when VisGuides was first launched) and March 13, 2019. To complement manual categorization and coding, we used text analysis and visualization to help reveal patterns that may have been missed by the manual effort and summary statistics. To facilitate theoretical sampling and negative case analysis, we made an in-depth analysis of the 148 posts (with both questions and replies) related to a student assignment of a visualization course. Inspired by two discussion threads at VisGuides, we conducted two controlled empirical studies to collect further data to validate specific visualization guidelines. Through these activities guided by grounded theory, we have obtained some new findings about visualization guidelines.

Human-Computer Interaction, Graphics

SketchOpt: Sketch-based Parametric Model Retrieval for Generative Design

Mohammad Keshavarzi, Clayton Hutson, Chin-Yi Cheng (2020)

Developing fully parametric building models for performance-based generative design tasks often requires proficiency in advanced 3D modeling and visual programming, limiting its use for many building designers. Moreover, iterating on such models can be time-consuming and sometimes limiting, as major changes in the layout design may require remodeling the entire parametric definition. To address these challenges, we introduce a novel automated generative design system, which takes a basic floor plan sketch as input and provides a parametric model prepared for multi-objective building optimization as output. Furthermore, the user-designer can assign various design variables to the desired building elements by using simple annotations in the drawing. The system would recognize the corresponding element and define variable constraints to prepare for a multi-objective optimization problem.

Human-Computer Interaction

DM-GAN: Dynamic Memory Generative Adversarial Networks for Text-to-Image Synthesis

Minfeng Zhu, Pingbo Pan, Wei Chen (2019)

In this paper, we focus on generating realistic images from text descriptions. Current methods first generate an initial image with rough shape and color, and then refine the initial image to a high-resolution one. Most existing text-to-image synthesis methods have two main problems. (1) These methods depend heavily on the quality of the initial images. If the initial image is not well initialized, the following processes can hardly refine the image to a satisfactory quality. (2) Each word contributes a different level of importance when depicting different image contents; however, an unchanged text representation is used in existing image refinement processes. In this paper, we propose the Dynamic Memory Generative Adversarial Network (DM-GAN) to generate high-quality images. The proposed method introduces a dynamic memory module to refine fuzzy image contents when the initial images are not well generated. A memory writing gate is designed to select the important text information based on the initial image content, which enables our method to accurately generate images from the text description. We also utilize a response gate to adaptively fuse the information read from the memories and the image features. We evaluate the DM-GAN model on the Caltech-UCSD Birds 200 dataset and the Microsoft Common Objects in Context dataset. Experimental results demonstrate that our DM-GAN model performs favorably against the state-of-the-art approaches.

Computer Vision and Pattern Recognition