The comprehension of business process models is crucial for enterprises. Prior research has shown that children and adolescents perceive and interpret graphical representations differently from adults. To evaluate this in the context of business process models, this paper presents observations obtained from a study on visual literacy in cultural education. We demonstrate that adolescents without expertise in process model comprehension are able to correctly interpret business process models expressed in BPMN 2.0. In a comprehensive study, n = 205 learners (i.e., pupils aged 15) answered questions about process models of different levels of complexity. In addition, the process models were created with varying styles of element labels. Study results indicate that an abstract description of process models (i.e., one using only alphabetic letters) is understood more easily than concrete or pseudo descriptions. As a benchmark, the results are compared with those of modeling experts (n = 40). Among other things, the study findings suggest using abstract descriptions to introduce novices to process modeling notations. With the obtained insights, we highlight that process models can be properly comprehended by novices.
Data scientists face a steep learning curve in understanding a new domain for which they want to build machine learning (ML) models. While input from domain experts could offer valuable help, such input is often limited, expensive, and generally not in a form readily consumable by a model development pipeline. In this paper, we propose Ziva, a framework to guide domain experts in sharing essential domain knowledge with data scientists for building NLP models. With Ziva, experts are able to distill and share their domain knowledge using domain concept extractors and five types of label justification over a representative data sample. The design of Ziva is informed by preliminary interviews with data scientists, conducted to understand current practices in the domain knowledge acquisition process for ML development projects. To assess our design, we run a mixed-method case study to evaluate how Ziva can facilitate interaction between domain experts and data scientists. Our results highlight that (1) domain experts are able to use Ziva to provide rich domain knowledge while maintaining low mental load and stress levels; and (2) data scientists find Ziva's output helpful for learning essential information about the domain, offering scalability of information, and lowering the burden on domain experts to share knowledge. We conclude this work by experimenting with building NLP models using the Ziva output from our case study.
Mixture of Experts (MoE) is a popular framework for modeling heterogeneity in data for regression, classification and clustering. For continuous data, which we consider here in the context of regression and cluster analysis, MoE models usually use normal experts, that is, expert components following the Gaussian distribution. However, for data containing a group or groups of observations with asymmetric behavior, heavy tails or atypical observations, the use of normal experts may be unsuitable and can unduly affect the fit of the MoE model. In this paper, we introduce new non-normal mixture of experts (NNMoE) models that can deal with possibly skewed or heavy-tailed data and with outliers. The proposed models are the skew-normal MoE, the robust $t$ MoE and the skew $t$ MoE, respectively named SNMoE, TMoE and STMoE. We develop dedicated expectation-maximization (EM) and expectation conditional maximization (ECM) algorithms to estimate the parameters of the proposed models by monotonically maximizing the observed data log-likelihood. We describe how the presented models can be used in prediction and in model-based clustering of regression data. Numerical experiments carried out on simulated data show the effectiveness and robustness of the proposed models in terms of modeling non-linear regression functions as well as in model-based clustering. Then, to show their usefulness for practical applications, the proposed models are applied to real-world tone perception data for musical data analysis and to temperature anomaly data for the analysis of climate change.
We present a crowdsourcing workflow to collect image annotations for visually similar synthetic categories without requiring experts. In animals, there is a direct link between taxonomy and visual similarity: e.g., a collie (a type of dog) looks more similar to other collies (e.g., a smooth collie) than to a greyhound (another type of dog). However, in synthetic categories such as cars, objects with similar taxonomy can have very different appearances: e.g., a 2011 Ford F-150 Supercrew-HD looks the same as a 2011 Ford F-150 Supercrew-LL but very different from a 2011 Ford F-150 Supercrew-SVT. We introduce a graph-based crowdsourcing algorithm to automatically group visually indistinguishable objects together. Using our workflow, we label 712,430 images with ~1,000 Amazon Mechanical Turk workers, resulting in the largest fine-grained visual dataset reported to date, with 2,657 categories of cars annotated at 1/20th the cost of hiring experts.
Molecular packing, crystallinity, and texture of semiconducting polymers are often critical to performance. Although frameworks exist to quantify the ordering, interpretations are often just qualitative, resulting in imprecise and liberal use of terminology. Here, we reemphasize the continuity of the degree of molecular ordering and advocate that a more nuanced and consistent terminology be used with regard to crystallinity, semicrystallinity, paracrystallinity, crystallite/aggregate, and related characteristics. We are motivated in part by our own imprecise and inconsistent use of terminology and the need to have a primer or tutorial reference to teach new group members. We show that a deeper understanding can be achieved by combining grazing-incidence wide-angle X-ray scattering and differential scanning calorimetry. We classify a broad range of representative polymers into four proposed categories based on a quantitative analysis of molecular order using the paracrystalline disorder parameter (g). A small database is presented for over 10 representative conjugated and insulating polymers ranging from amorphous to semicrystalline. Finally, we outline the challenges of rationally designing perfect polymer crystals and propose a new molecular design approach that envisions conceptual molecular grafting akin to strained and unstrained hetero-epitaxy in classic (compound) semiconductor thin-film growth.
Matching is a task at the heart of any data integration process, aimed at identifying correspondences among data elements. Matching problems were traditionally solved in a semi-automatic manner, with correspondences being generated by matching algorithms and outcomes subsequently validated by human experts. Human-in-the-loop data integration has recently been challenged by the introduction of big data, and recent studies have analyzed obstacles to effective human matching and validation. In this work, we characterize human matching experts, those humans whose proposed correspondences can mostly be trusted to be valid. We provide a novel framework for characterizing matching experts that, accompanied by a novel set of features, can be used to identify reliable and valuable human experts. We demonstrate the usefulness of our approach using an extensive empirical evaluation. In particular, we show that our approach can improve matching results by filtering out inexpert matchers.