No Arabic abstract
Conversation structure is useful for both understanding the nature of conversation dynamics and for providing features for many downstream applications such as summarization of conversations. In this work, we define the problem of conversation structure modeling as identifying the parent utterance(s) to which each utterance in the conversation responds to. Previous work usually took a pair of utterances to decide whether one utterance is the parent of the other. We believe the entire ancestral history is a very important information source to make accurate prediction. Therefore, we design a novel masking mechanism to guide the ancestor flow, and leverage the transformer model to aggregate all ancestors to predict parent utterances. Our experiments are performed on the Reddit dataset (Zhang, Culbertson, and Paritosh 2017) and the Ubuntu IRC dataset (Kummerfeld et al. 2019). In addition, we also report experiments on a new larger corpus from the Reddit platform and release this dataset. We show that the proposed model, that takes into account the ancestral history of the conversation, significantly outperforms several strong baselines including the BERT model on all datasets
Emotion Recognition in Conversation (ERC) is a more challenging task than conventional text emotion recognition. It can be regarded as a personalized and interactive emotion recognition task, which is supposed to consider not only the semantic information of text but also the influences from speakers. The current method models speakers interactions by building a relation between every two speakers. However, this fine-grained but complicated modeling is computationally expensive, hard to extend, and can only consider local context. To address this problem, we simplify the complicated modeling to a binary version: Intra-Speaker and Inter-Speaker dependencies, without identifying every unique speaker for the targeted speaker. To better achieve the simplified interaction modeling of speakers in Transformer, which shows excellent ability to settle long-distance dependency, we design three types of masks and respectively utilize them in three independent Transformer blocks. The designed masks respectively model the conventional context modeling, Intra-Speaker dependency, and Inter-Speaker dependency. Furthermore, different speaker-aware information extracted by Transformer blocks diversely contributes to the prediction, and therefore we utilize the attention mechanism to automatically weight them. Experiments on two ERC datasets indicate that our model is efficacious to achieve better performance.
This paper describes our approach to DSTC 9 Track 2: Cross-lingual Multi-domain Dialog State Tracking, the task goal is to build a Cross-lingual dialog state tracker with a training set in rich resource language and a testing set in low resource language. We formulate a method for joint learning of slot operation classification task and state tracking task respectively. Furthermore, we design a novel mask mechanism for fusing contextual information about dialogue, the results show the proposed model achieves excellent performance on DSTC Challenge II with a joint accuracy of 62.37% and 23.96% in MultiWOZ(en - zh) dataset and CrossWOZ(zh - en) dataset, respectively.
Numerous online conversations are produced on a daily basis, resulting in a pressing need to conversation understanding. As a basis to structure a discussion, we identify the responding relations in the conversation discourse, which link response utterances to their initiations. To figure out who responded to whom, here we explore how the consistency of topic contents and dependency of discourse roles indicate such interactions, whereas most prior work ignore the effects of latent factors underlying word occurrences. We propose a model to learn latent topics and discourse in word distributions, and predict pairwise initiation-response links via exploiting topic consistency and discourse dependency. Experimental results on both English and Chinese conversations show that our model significantly outperforms the previous state of the arts, such as 79 vs. 73 MRR on Chinese customer service dialogues. We further probe into our outputs and shed light on how topics and discourse indicate conversational user interactions.
For centuries extremely-long grazing fireball displays have fascinated observers and inspired people to ponder about their origins. The Desert Fireball Network (DFN) is the largest single fireball network in the world, covering about one third of Australian skies. This expansive size has enabled us to capture a majority of the atmospheric trajectory of a spectacular grazing event that lasted over90 seconds, penetrated as deep as ~58.5km, and traveled over 1,300 km through the atmosphere before exiting back into interplanetary space. Based on our triangulation and dynamic analyses of the event, we have estimated the initial mass to be at least 60 kg, which would correspond to a30 cm object given a chondritic density (3500 kg m-3). However, this initial mass estimate is likely a lower bound, considering the minimal deceleration observed in the luminous phase. The most intriguing quality of this close encounter is that the meteoroid originated from an Apollo-type orbit and was inserted into a Jupiter-family comet (JFC) orbit due to the net energy gained during the close encounter with the Earth. Based on numerical simulations, the meteoroid will likely spend ~200kyrs on a JFC orbit and have numerous encounters with Jupiter, the first of which will occur in January-March 2025. Eventually the meteoroid will likely be ejected from the Solar System or be flung into a trans-Neptunian orbit.
Discourse signals are often implicit, leaving it up to the interpreter to draw the required inferences. At the same time, discourse is embedded in a social context, meaning that interpreters apply their own assumptions and beliefs when resolving these inferences, leading to multiple, valid interpretations. However, current discourse data and frameworks ignore the social aspect, expecting only a single ground truth. We present the first discourse dataset with multiple and subjective interpretations of English conversation in the form of perceived conversation acts and intents. We carefully analyze our dataset and create computational models to (1) confirm our hypothesis that taking into account the bias of the interpreters leads to better predictions of the interpretations, (2) and show disagreements are nuanced and require a deeper understanding of the different contextual factors. We share our dataset and code at http://github.com/elisaF/subjective_discourse.