
SketchOpt: Sketch-based Parametric Model Retrieval for Generative Design

 Added by Mohammad Keshavarzi
Publication date: 2020
Language: English





Developing fully parametric building models for performance-based generative design tasks often requires proficiency in advanced 3D modeling and visual programming tools, limiting their use for many building designers. Moreover, iterating on such models can be time-consuming and sometimes restrictive, as major changes to the layout design may require remodeling the entire parametric definition. To address these challenges, we introduce a novel automated generative design system that takes a basic floor plan sketch as input and provides as output a parametric model prepared for multi-objective building optimization. Furthermore, the user-designer can assign design variables to the desired building elements by using simple annotations in the drawing. The system recognizes the corresponding elements and defines variable constraints to set up a multi-objective optimization problem.
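As a rough, hypothetical illustration of how such annotations could become optimization variables (the labels, bounds, and data structures below are assumptions, not the authors' implementation), recognized sketch elements might be mapped to bounded design variables for a downstream multi-objective optimizer:

from dataclasses import dataclass

@dataclass
class DesignVariable:
    """A single optimization variable derived from a sketch annotation."""
    name: str
    lower: float
    upper: float

# Hypothetical mapping from annotation labels found in the drawing to
# variable bounds; the real system infers constraints from the recognized elements.
ANNOTATION_BOUNDS = {
    "window": (0.5, 3.0),  # window width in metres
    "wall": (2.4, 4.0),    # wall height in metres
    "shade": (0.0, 1.5),   # shading-device depth in metres
}

def variables_from_annotations(annotations):
    """Turn recognized annotation labels into bounded design variables."""
    return [
        DesignVariable(f"{label}_{i}", *ANNOTATION_BOUNDS[label])
        for i, label in enumerate(annotations)
    ]

# e.g. a sketch annotated with two windows and one shading element
for v in variables_from_annotations(["window", "window", "shade"]):
    print(v)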



Related research

Current supervised sketch-based image retrieval (SBIR) methods achieve excellent performance. However, the cost of data collection and labeling imposes an intractable barrier to practical deployment in real applications. In this paper, we present the first attempt at unsupervised SBIR to remove the labeling cost (category annotations and sketch-photo pairings) that is conventionally needed for training. Existing single-domain unsupervised representation learning methods perform poorly in this application, due to the unique cross-domain (sketch and photo) nature of the problem. We therefore introduce a novel framework that simultaneously performs unsupervised representation learning and sketch-photo domain alignment. Technically, this is underpinned by exploiting joint distribution optimal transport (JDOT) to align data from different domains during representation learning, which we extend with trainable cluster prototypes and feature memory banks to further improve scalability and efficacy. Extensive experiments show that our framework achieves excellent performance in the new unsupervised setting, and performs comparably to or better than the state of the art in the zero-shot setting.
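To make the transport idea concrete, the following is a minimal NumPy sketch of entropically regularized optimal transport between sketch and photo embedding sets, assuming uniform marginals and a purely feature-based cost; the actual JDOT formulation also couples label/prediction disagreement, and the trainable cluster prototypes and memory banks of the paper are omitted here:

import numpy as np

def sinkhorn(cost, reg=0.1, n_iters=200):
    """Entropic-regularized optimal transport plan with uniform marginals."""
    n, m = cost.shape
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    K = np.exp(-cost / reg)
    u = np.ones(n)
    for _ in range(n_iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

rng = np.random.default_rng(0)
sketch_feats = rng.normal(size=(32, 64))  # stand-in sketch embeddings
photo_feats = rng.normal(size=(48, 64))   # stand-in photo embeddings

# Squared-distance cost between the two domains, normalized for stability.
cost = ((sketch_feats[:, None, :] - photo_feats[None, :, :]) ** 2).sum(-1)
cost = cost / cost.max()

plan = sinkhorn(cost)
print(plan.shape, plan.sum())  # (32, 48); total transported mass sums to ~1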
With the advent of off-the-shelf intelligent home products and broader internet adoption, researchers increasingly explore smart computing applications that provide easier access to health and wellness resources. AI-based systems like chatbots have the potential to provide mental health support services. However, existing therapy chatbots are often retrieval-based, requiring users to respond with a constrained set of answers, which may not be appropriate given that such pre-determined inquiries may not reflect each patient's unique circumstances. Generative approaches, such as the OpenAI GPT models, could allow for more dynamic conversations in therapy chatbot contexts than previous approaches. To investigate the potential of generative models in therapy chatbot contexts, we built a chatbot using the GPT-2 model. We fine-tuned it with 306 therapy session transcripts between family caregivers of individuals with dementia and therapists conducting Problem Solving Therapy. We then evaluated the pre-trained model and the fine-tuned model in terms of basic qualities using three meta-information measurements: the proportion of non-word outputs, the length of responses, and sentiment components. Results showed that: (1) the fine-tuned model created more non-word outputs than the pre-trained model; (2) the fine-tuned model generated outputs whose length was more similar to that of the therapists compared to the pre-trained model; (3) both the pre-trained and fine-tuned models were likely to generate more negative and fewer positive outputs than the therapists. We discuss potential reasons for these problems, the implications, and solutions for developing therapy chatbots, and call for further investigation of AI-based systems in this application.
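As a hedged sketch of what such meta-information measurements might look like in code (the tokenization, word list, and sentiment tooling used in the study are not specified here, so the choices below are assumptions), the non-word proportion and response length can be computed per generated response:

import re

def meta_metrics(response, vocabulary):
    """Return (proportion of non-word tokens, response length in tokens)."""
    tokens = re.findall(r"[a-zA-Z']+", response.lower())
    if not tokens:
        return 1.0, 0
    non_words = sum(1 for t in tokens if t not in vocabulary)
    return non_words / len(tokens), len(tokens)

# A toy vocabulary stands in for a full English word list;
# sentiment components would come from a separate sentiment analyzer.
vocab = {"i", "understand", "that", "this", "is", "hard", "for", "you"}

prop, length = meta_metrics("I understand that this is hrrd for you", vocab)
print(f"non-word proportion: {prop:.2f}, length: {length} tokens")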
Zero-Shot Sketch-Based Image Retrieval (ZS-SBIR) is a novel cross-modal retrieval task, where abstract sketches are used as queries to retrieve natural images under a zero-shot scenario. Most existing methods regard ZS-SBIR as a traditional classification problem and employ a cross-entropy or triplet-based loss to achieve retrieval, neglecting both the domain gap between sketches and natural images and the large intra-class diversity in sketches. Toward this end, we propose a novel Domain-Smoothing Network (DSN) for ZS-SBIR. Specifically, a cross-modal contrastive method is proposed to learn generalized representations that smooth the domain gap by mining relations with additional augmented samples. Furthermore, a category-specific memory bank with sketch features is explored to reduce intra-class diversity in the sketch domain. Extensive experiments demonstrate that our approach notably outperforms the state-of-the-art methods on both the Sketchy and TU-Berlin datasets. Our source code is publicly available at https://github.com/haowang1992/DSN.
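A minimal PyTorch sketch of a cross-modal contrastive objective of this kind, assuming paired (or pseudo-paired) sketch and photo embeddings in a batch; the category-specific memory bank and augmentation mining described in the paper are not reproduced here:

import torch
import torch.nn.functional as F

def cross_modal_contrastive_loss(sketch_emb, photo_emb, temperature=0.07):
    """InfoNCE-style loss: each sketch is pulled towards its paired photo
    and pushed away from the other photos in the batch."""
    s = F.normalize(sketch_emb, dim=1)
    p = F.normalize(photo_emb, dim=1)
    logits = s @ p.t() / temperature      # (B, B) cross-modal similarities
    targets = torch.arange(s.size(0))     # matching pairs lie on the diagonal
    return F.cross_entropy(logits, targets)

# Toy batch of 8 sketch/photo embedding pairs with dimension 128.
print(cross_modal_contrastive_loss(torch.randn(8, 128), torch.randn(8, 128)).item())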
Sketch-based image retrieval (SBIR) is a cross-modal matching problem which is typically solved by learning a joint embedding space where the semantic content shared between the photo and sketch modalities is preserved. However, a fundamental challenge in SBIR has been largely ignored so far: sketches are drawn by humans, and considerable style variations exist amongst different users. An effective SBIR model needs to explicitly account for this style diversity and, crucially, generalise to unseen user styles. To this end, a novel style-agnostic SBIR model is proposed. Different from existing models, a cross-modal variational autoencoder (VAE) is employed to explicitly disentangle each sketch into a semantic content part shared with the corresponding photo and a style part unique to the sketcher. Importantly, to make our model dynamically adaptable to any unseen user style, we propose to meta-train our cross-modal VAE by adding two style-adaptive components: a set of feature transformation layers to its encoder and a regulariser to the disentangled semantic content latent code. With this meta-learning framework, our model can not only disentangle the cross-modal shared semantic content for SBIR, but can also adapt the disentanglement to any unseen user style, making the SBIR model truly style-agnostic.
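The kind of lightweight feature transformation layer such a meta-trained encoder might use can be sketched as a per-channel scale and shift (a FiLM-style modulation; the exact form used in the paper may differ, and the names and shapes below are assumptions):

import torch
import torch.nn as nn

class FeatureTransform(nn.Module):
    """Learnable per-channel scale and shift applied to encoder features,
    small enough to be adapted quickly to an unseen sketching style."""
    def __init__(self, channels):
        super().__init__()
        self.gamma = nn.Parameter(torch.ones(channels))
        self.beta = nn.Parameter(torch.zeros(channels))

    def forward(self, x):  # x: (batch, channels, height, width)
        return x * self.gamma.view(1, -1, 1, 1) + self.beta.view(1, -1, 1, 1)

features = torch.randn(4, 64, 8, 8)  # toy encoder feature map
print(FeatureTransform(64)(features).shape)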
Text-to-image generative models are a new and powerful way to generate visual artwork. The free-form nature of text as interaction is double-edged; while users have access to an infinite range of generations, they also must engage in brute-force trial and error with the text prompt when the result quality is poor. We conduct a study exploring what prompt components and model parameters can help produce coherent outputs. In particular, we study prompts structured to include subject and style and investigate success and failure modes within these dimensions. Our evaluation of 5493 generations over the course of five experiments spans 49 abstract and concrete subjects as well as 51 abstract and figurative styles. From this evaluation, we present design guidelines that can help people find better outcomes from text-to-image generative models.
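The subject-plus-style prompt structure studied here can be illustrated with a trivial template (the particular subject and style strings below are made-up examples, not items from the study's evaluation set):

def build_prompt(subject, style):
    """Compose a text-to-image prompt from a subject and a style phrase."""
    return f"{subject} in the style of {style}"

print(build_prompt("a lighthouse at dusk", "a watercolor painting"))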
