Task-oriented conversational systems often use dialogue state tracking to represent the user's intentions, which involves filling in values of pre-defined slots. Many approaches have been proposed, often using task-specific architectures with special-purpose classifiers. Recently, good results have been obtained using more general architectures based on pretrained language models. Here, we introduce a new variation of the language modeling approach that uses schema-driven prompting to provide task-aware history encoding that is used for both categorical and non-categorical slots. We further improve performance by augmenting the prompting with schema descriptions, a naturally occurring source of in-domain knowledge. Our purely generative system achieves state-of-the-art performance on MultiWOZ 2.2 and achieves competitive performance on two other benchmarks: MultiWOZ 2.1 and M2M. The data and code will be available at https://github.com/chiahsuan156/DST-as-Prompting.
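To make the schema-driven prompting idea concrete, here is a minimal sketch of how a dialogue history might be paired with one schema slot and its description before being fed to a sequence-to-sequence model. The prompt layout and function names are illustrative assumptions, not the exact format in the authors' repository.

```python
# Hypothetical sketch of schema-driven prompt construction for DST.
def build_prompt(history: str, domain: str, slot: str,
                 description: str, values=None) -> str:
    """Encode the dialogue history with one slot's schema information.

    `values` lists the candidates for a categorical slot; it is None
    for a non-categorical (free-form) slot.
    """
    prompt = f"{history} [domain] {domain} [slot] {slot}: {description}"
    if values:  # categorical slot: enumerate the allowed values
        prompt += " [values] " + ", ".join(values)
    return prompt

history = "[user] I need a cheap hotel in the north. [system] Any star preference?"
print(build_prompt(history, "hotel", "pricerange",
                   "price budget of the hotel", ["cheap", "moderate", "expensive"]))
# A generative model (e.g., T5) is then trained to emit the slot value, "cheap".
```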
Learning to compare two objects is essential in applications such as digital forensics, face recognition, and brain network analysis, especially when labeled data is scarce and imbalanced. As these applications make high-stakes decisions and involve societal values like fairness and transparency, it is critical to explain the learned models. We aim to study post-hoc explanations of Siamese networks (SN) widely used in learning to compare. We characterize the instability of gradient-based explanations due to the additional compared object in SN, in contrast to architectures with a single input instance. We propose an optimization framework that derives global invariance from unlabeled data using self-learning to promote the stability of local explanations tailored for specific query-reference pairs. The optimization problems can be solved using gradient descent-ascent (GDA) for constrained optimization, or SGD for KL-divergence-regularized unconstrained optimization, with convergence proofs, especially when the objective functions are nonconvex due to the Siamese architecture. Quantitative results and case studies on tabular and graph data from neuroscience and chemical engineering show that the framework respects the self-learned invariance while robustly optimizing the faithfulness and simplicity of the explanation. We further demonstrate the convergence of GDA experimentally.
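As a sketch of the constrained formulation, the gradient descent-ascent loop below alternates a descent step on the explanation parameters with an ascent step on a Lagrange multiplier. The objective names (`faithfulness_loss`, `invariance_gap`) and the simple penalty form are assumptions for illustration, not the paper's exact objectives.

```python
import torch

def gda(theta, lam, faithfulness_loss, invariance_gap, eps=0.1,
        lr_theta=1e-2, lr_lam=1e-2, steps=1000):
    """Solve min_theta max_{lam >= 0} f(theta) + lam * (g(theta) - eps).

    `theta` must be a tensor with requires_grad=True; `lam` is a scalar tensor.
    """
    for _ in range(steps):
        # Descent step on the explanation parameters (primal variable).
        loss = faithfulness_loss(theta) + lam * (invariance_gap(theta) - eps)
        grad_theta, = torch.autograd.grad(loss, theta)
        theta = (theta - lr_theta * grad_theta).detach().requires_grad_(True)
        # Ascent step on the multiplier (dual variable), kept nonnegative.
        with torch.no_grad():
            lam = torch.clamp(lam + lr_lam * (invariance_gap(theta) - eps), min=0.0)
    return theta, lam
```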
Event mentions in text correspond to real-world events of varying degrees of granularity. The task of subevent detection aims to resolve this granularity issue, recognizing the membership of multi-granular events in event complexes. Since knowing the span of descriptive contexts of event complexes helps infer the membership of events, we propose the task of event-based text segmentation (EventSeg) as an auxiliary task to improve the learning for subevent detection. To bridge the two tasks, we propose an approach to learning and enforcing constraints that capture dependencies between subevent detection and EventSeg prediction, as well as to guiding the model toward globally consistent inference. Specifically, we adopt Rectifier Networks for constraint learning and then convert the learned constraints to a regularization term in the loss function of the neural model. Experimental results show that the proposed method outperforms baseline methods by 2.3% and 2.5% on benchmark datasets for subevent detection, HiEve and IC, respectively, while achieving a decent performance on EventSeg prediction.
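The conversion from a learned constraint to a regularizer can be pictured as follows, assuming the Rectifier Network yields a linear constraint of the form w^T p + b >= 0 over the model's output probabilities p, with violations penalized by a ReLU hinge added to the task loss. Names and the exact constraint form are illustrative, not the paper's notation.

```python
import torch
import torch.nn.functional as F

def constrained_loss(logits, labels, w, b, rho=1.0):
    """Task loss plus a penalty for violating a learned linear constraint."""
    task_loss = F.cross_entropy(logits, labels)
    p = torch.softmax(logits, dim=-1)
    # The hinge is positive only where the constraint w^T p + b >= 0 fails.
    violation = F.relu(-(p @ w + b)).mean()
    return task_loss + rho * violation
```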
We report on the nonlinear temporal compression of mJ-energy pulses from a Ti:Sa chirped pulse amplifier system in a multipass cell filled with argon. The pulses are compressed from 30 fs down to 5.3 fs, corresponding to two optical cycles. The post-compressed beam exhibits excellent spatial quality and homogeneity. These results pave the way to robust and energy-scalable compression of Ti:Sa pulses down to the few-cycle regime.
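A quick back-of-the-envelope check of the "two optical cycles" figure, assuming the standard Ti:Sa central wavelength of about 800 nm (not stated above):

```python
c = 3.0e8                 # speed of light, m/s
wavelength = 800e-9       # assumed Ti:Sa central wavelength, m
period = wavelength / c   # one optical cycle ~ 2.67 fs
print(5.3e-15 / period)   # ~2.0 cycles for a 5.3 fs pulse
```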
In this paper, we propose the first optimum process scheduling algorithm for an increasingly prevalent type of heterogeneous multicore (HEMC) system that combines high-performance big cores and energy-efficient small cores with the same instruction-set architecture (ISA). Existing algorithms are all heuristics-based, and the well-known IPC-driven approach essentially tries to schedule high-scaling-factor processes on big cores. Our analysis shows that, for optimum solutions, it is also critical to consider placing long-running processes on big cores. Tests of SPEC 2006 cases on various big-small core combinations show that our proposed optimum approach is up to 34% faster than the IPC-driven heuristic approach in terms of total workload completion time. The complexity of our algorithm is O(N log N), where N is the number of processes. The proposed optimum algorithm is therefore practical for use.
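The contrast with the IPC-driven heuristic can be pictured with a toy greedy assignment: ranking processes by the extra work a big core retires, (s - 1) * t, rewards both a high scaling factor s and a long remaining run time t. This sketch only illustrates why run length matters; it is not the paper's optimum algorithm.

```python
def assign(processes, num_big):
    """processes: list of (name, scaling_factor, remaining_time) tuples."""
    # Extra work retired on a big core = (s - 1) * t, so both a high
    # scaling factor and a long run length favor a big-core slot.
    # The sort dominates the cost: O(N log N).
    ranked = sorted(processes, key=lambda p: (p[1] - 1.0) * p[2], reverse=True)
    return ranked[:num_big], ranked[num_big:]  # big-core set, small-core set

big, small = assign([("A", 2.0, 10), ("B", 1.2, 100), ("C", 1.8, 5)], num_big=1)
print([p[0] for p in big])  # ['B']: run length outweighs a modest speedup
```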
While post-training quantization is popular largely because it avoids access to the original complete training dataset, this limitation is also a source of its poor performance. To alleviate this limitation, in this paper, we leverage the synthetic data introduced by zero-shot quantization with a calibration dataset, and we propose a fine-grained data distribution alignment (FDDA) method to boost the performance of post-training quantization. The method is based on two important properties of batch normalization statistics (BNS) we observed in deep layers of the trained network, i.e., inter-class separation and intra-class incohesion. To preserve this fine-grained distribution information: 1) We calculate the per-class BNS of the calibration dataset as the BNS centers of each class and propose a BNS-centralized loss to force the synthetic data distributions of different classes to be close to their own centers. 2) We add Gaussian noise into the centers to imitate the incohesion and propose a BNS-distorted loss to force the synthetic data distribution of the same class to be close to the distorted centers. By introducing these two fine-grained losses, our method shows state-of-the-art performance on ImageNet, especially when the first and last layers are also quantized to low bit-width. Our project is available at https://github.com/viperit/FDDA.
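The two losses can be sketched as follows, assuming per-class batch-normalization statistics (mean, variance) have been pre-computed from the calibration set; shapes and names are illustrative, and the actual implementation is in the FDDA repository.

```python
import torch

def bns_losses(feat, labels, class_mu, class_var, noise_std=0.1):
    """feat: (B, C) activations at a BN layer; class_mu/class_var: (K, C)."""
    mu_hat = feat.mean(dim=0)              # statistics of the current
    var_hat = feat.var(dim=0)              # batch of synthetic data
    mu_c = class_mu[labels].mean(dim=0)    # calibration centers of the
    var_c = class_var[labels].mean(dim=0)  # classes present in the batch
    # BNS-centralized: pull synthetic data toward its own class centers.
    centralized = ((mu_hat - mu_c) ** 2 + (var_hat - var_c) ** 2).mean()
    # BNS-distorted: match Gaussian-perturbed centers to imitate incohesion.
    distorted = ((mu_hat - (mu_c + noise_std * torch.randn_like(mu_c))) ** 2).mean()
    return centralized, distorted
```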
Yejiong Zhu, Hao Chen (2021)
We provide sufficient conditions for the asymptotic normality of the generalized correlation coefficient $\sum a_{ij}b_{ij}$ under the permutation null distribution when both the $a_{ij}$'s and the $b_{ij}$'s are symmetric.
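For concreteness, the statistic and its permutation null can be simulated directly; the zero-diagonal convention below is a standard assumption for such statistics rather than something stated in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
a = rng.normal(size=(n, n)); a = (a + a.T) / 2; np.fill_diagonal(a, 0)
b = rng.normal(size=(n, n)); b = (b + b.T) / 2; np.fill_diagonal(b, 0)

gamma = (a * b).sum()  # generalized correlation coefficient sum_ij a_ij b_ij

# Permutation null: relabel the indices of b and recompute the statistic.
null = [(a * b[np.ix_(p, p)]).sum()
        for p in (rng.permutation(n) for _ in range(2000))]
print(gamma, np.mean(null), np.std(null))  # locate gamma within the null
```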
Hao Chen (2021)
List-decodable codes have been an active topic in theoretical computer science since the seminal papers of M. Sudan and V. Guruswami in 1997-1998. There are general results about the Johnson radius and the list-decoding capacity theorem for random codes. However, few results about general constraints on rates, list-decodable radius, and list sizes for list-decodable codes have been obtained. In this paper, we show that rates, list-decodable radius, and list sizes are closely related to the classical topic of covering codes. We prove new, simple but strong upper bounds for list-decodable codes based on various covering codes. Any good upper bound on the covering radius then implies a good upper bound on the size of list-decodable codes. Hence the list-decodability of codes is a strong constraint from the view of covering codes. Our covering code upper bounds for $(d,1)$ list-decodable codes give highly non-trivial upper bounds on the sizes of codes with the given minimum Hamming distances. Our results give exponential improvements on the recent generalized Singleton upper bound of Shangguan and Tamo in STOC 2020 when the code lengths are very large. The asymptotic forms of covering code bounds can partially recover the list-decoding capacity theorem, the Blinovsky bound, and the combinatorial bound of Guruswami-Håstad-Sudan-Zuckerman. We also suggest studying combinatorial covering list-decodable codes as a natural generalization of combinatorial list-decodable codes.
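For reference, the classical Johnson radius mentioned above, up to which any binary code of relative distance delta is list-decodable with polynomial list sizes, can be evaluated directly:

```python
import math

def johnson_radius(delta: float) -> float:
    """Binary Johnson radius J(delta) = (1 - sqrt(1 - 2*delta)) / 2."""
    return (1.0 - math.sqrt(1.0 - 2.0 * delta)) / 2.0

print(johnson_radius(0.3))  # ~0.184, strictly beyond half the distance, 0.15
```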
Yusen Wu, Hao Chen, Xin Wang (2021)
Adversarial attacks attempt to disrupt the training, retraining, and use of artificial intelligence and machine learning models in large-scale distributed machine learning systems, creating security risks for their prediction outcomes. For example, attackers attempt to poison the model by either presenting inaccurate, misrepresentative data or altering the model's parameters. In addition, Byzantine faults, including software, hardware, and network issues, occur in distributed systems and also negatively impact the prediction outcome. In this paper, we propose a novel distributed training algorithm, partial synchronous stochastic gradient descent (ParSGD), which defends against adversarial attacks and/or tolerates Byzantine faults. We demonstrate the effectiveness of our algorithm under three common adversarial attacks against ML models and a Byzantine fault during the training phase. Our results show that using ParSGD, ML models can still produce accurate predictions as if they were not being attacked or experiencing failures at all, even when almost half of the nodes are compromised or failed. We will report the experimental evaluations of ParSGD in comparison with other algorithms.
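An illustrative robust aggregation step of the kind such defenses rely on is a coordinate-wise trimmed mean, which discards the most extreme worker gradients so that up to f poisoned or faulty workers cannot drag the update arbitrarily. This is a standard defense shown for intuition, not necessarily ParSGD's exact aggregation rule.

```python
import numpy as np

def trimmed_mean(grads: np.ndarray, f: int) -> np.ndarray:
    """grads: (num_workers, dim) stacked gradients; f: workers trimmed per side."""
    g = np.sort(grads, axis=0)               # sort each coordinate independently
    return g[f:len(grads) - f].mean(axis=0)  # average only the middle values

workers = np.vstack([np.random.randn(8, 4),     # 8 honest workers
                     100.0 * np.ones((3, 4))])  # 3 poisoned gradients
print(trimmed_mean(workers, f=3))  # stays near zero despite the attackers
```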
Chao Chen, Wei Chen, Di Zhao (2021)
In this paper, we introduce an angle notion, called the singular angle, for stable nonlinear systems from an input-output perspective. The proposed system singular angle, based on the angle between $\mathcal{L}_2$-signals, describes an upper bound for the rotating effect from the system input to output signals. It is, thus, different from the recently proposed nonlinear system phase, which adopts the complexification of real-valued signals using the Hilbert transform. It can quantify passivity and serve as an angular counterpart to the system $\mathcal{L}_2$-gain. It also provides an alternative to the nonlinear system phase. A nonlinear small angle theorem, which involves a comparison of the loop system angle with $\pi$, is established for feedback stability analysis. When dealing with multi-input multi-output linear time-invariant (LTI) systems, we further introduce frequency-wise and $\mathcal{H}_\infty$ singular angle notions based on the matrix singular angle, and develop corresponding LTI small angle theorems.
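The signal-level notion underlying the definition is the ordinary angle between two $\mathcal{L}_2$-signals; the discretized computation below only illustrates that quantity, not the system-level singular angle itself.

```python
import numpy as np

t = np.linspace(0, 10, 1000)
u = np.exp(-t) * np.sin(t)        # input signal
y = np.exp(-t) * np.sin(t + 0.5)  # output signal, a rotated version of u

cos_angle = np.dot(u, y) / (np.linalg.norm(u) * np.linalg.norm(y))
print(np.arccos(np.clip(cos_angle, -1.0, 1.0)))  # angle in radians, vs. pi
```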