Deep learning has led to significant improvements in text summarization, with various methods investigated and improved ROUGE scores reported over the years. However, gaps still exist between summaries produced by automatic summarizers and those written by human professionals. To better understand the strengths and limits of summarization systems at a fine-grained syntactic and semantic level, we consult the Multidimensional Quality Metric (MQM) and manually quantify 8 major sources of errors on 10 representative summarization models. Primarily, we find that 1) under similar settings, extractive summarizers are in general better than their abstractive counterparts thanks to their strength in faithfulness and factual consistency; 2) milestone techniques such as copy, coverage and hybrid extractive/abstractive methods do bring specific improvements but also demonstrate limitations; 3) pre-training techniques, and in particular sequence-to-sequence pre-training, are highly effective for improving text summarization, with BART giving the best results.
Anonymous peer review is used by the great majority of computer science conferences. OpenReview is a platform that aims to promote openness in the peer review process. The paper, (meta) reviews, rebuttals, and final decisions are all released to the public. We collect 5,527 submissions and their 16,853 reviews from the OpenReview platform. We also collect these submissions' citation data from Google Scholar and their non-peer-review
Despite almost all being acquired as photons, astronomical data from different instruments and at different stages in its life may exist in different formats to serve different purposes. Beyond the data itself, descriptive information is associated with it as metadata, either included in the data format or in a larger multi-format data structure. Those formats may be used for the acquisition, processing, exchange, and archiving of data. It has been useful to use similar formats, or even a single standard to ease interaction with data in its various stages using familiar tools. Knowledge of the evolution and advantages of present standards is useful before we discuss the future of how astronomical data is formatted. The evolution of the use of world coordinates in FITS is presented as an example.
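As a rough illustration of what world coordinates in FITS mean in the simplest case, the sketch below maps a pixel index on one axis to a world value using the conventional CRPIXn, CRVALn, and CDELTn header keywords. It covers only a purely linear, unrotated axis; the header values and the function names `pixel_to_world` and `axis_world` are assumptions made for this example, not taken from the work summarized above.

```python
def pixel_to_world(pixel, crpix, crval, cdelt):
    """Linear mapping for one axis: world = CRVAL + CDELT * (pixel - CRPIX).

    FITS pixel indices are 1-based, so the reference pixel CRPIX is
    expressed in that same 1-based convention.
    """
    return crval + cdelt * (pixel - crpix)


def axis_world(header, axis, pixel):
    """Look up the keywords for the given axis number in a dict-like header."""
    return pixel_to_world(
        pixel,
        header[f"CRPIX{axis}"],
        header[f"CRVAL{axis}"],
        header[f"CDELT{axis}"],
    )


# Hypothetical header for a spectral axis starting at 4000 with a step of 2.
example_header = {"CRPIX1": 1.0, "CRVAL1": 4000.0, "CDELT1": 2.0}
```

Projections, rotations, and distortions need considerably more machinery than this one-line formula, so the sketch shows only the starting point of the convention.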
Text augmentation techniques are widely used in text classification problems to improve the performance of classifiers, especially in low-resource scenarios. While many creative text augmentation methods have been designed, they augment the text in a non-selective manner: less important or noisy words have the same chance of being augmented as informative words, which limits the performance of augmentation. In this work, we systematically summarize three kinds of role keywords, which have different functions for text classification, and design effective methods to extract them from the text. Based on these extracted role keywords, we propose STA (Selective Text Augmentation) to selectively augment the text, where the informative, class-indicating words are emphasized but the irrelevant or noisy words are diminished. Extensive experiments on four English and Chinese text classification benchmark datasets demonstrate that STA can substantially outperform the non-selective text augmentation methods.
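The general idea of selective augmentation can be sketched in a few lines. This is a minimal illustration, not the STA method: the word sets are supplied by hand rather than extracted, and the function names `selective_delete` and `selective_insert` are hypothetical. Deletion targets only words marked as noisy, and insertion repeats only words marked as informative.

```python
import random


def selective_delete(tokens, noisy, p=0.5, seed=0):
    """Drop each token found in `noisy` with probability p; keep all others."""
    rng = random.Random(seed)
    return [t for t in tokens if not (t.lower() in noisy and rng.random() < p)]


def selective_insert(tokens, informative, seed=0):
    """Insert one extra copy of a randomly chosen informative token."""
    rng = random.Random(seed)
    candidates = [t for t in tokens if t.lower() in informative]
    if not candidates:
        return list(tokens)  # nothing to emphasize, return an unchanged copy
    out = list(tokens)
    out.insert(rng.randrange(len(out) + 1), rng.choice(candidates))
    return out


# Hand-picked word sets, standing in for extracted role keywords.
example_tokens = "the movie was honestly a brilliant and moving film".split()
example_noisy = {"the", "was", "a", "honestly"}
example_informative = {"brilliant", "moving"}
```

A non-selective baseline would instead sample deletion and insertion positions uniformly over all tokens, which is the behavior the abstract argues against.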
I revisit two theories of cell differentiation in multicellular organisms published a half-century ago, Stuart Kauffman's global gene regulatory dynamics (GGRD) model and Roy Britten's and Eric Davidson's modular gene regulatory network (MGRN) model, in light of newer knowledge of mechanisms of gene regulation in the metazoans (animals). The two models continue to inform hypotheses and computational studies of differentiation of lineage-adjacent cell types. However, their shared notion (based on bacterial regulatory systems) of gene switches and networks built from them has constrained progress in understanding the dynamics and evolution of differentiation. Recent work has described unique write-read-rewrite chromatin-based expression encoding in eukaryotes, as well as metazoan-specific processes of gene activation and silencing in condensed-phase, enhancer-recruiting regulatory hubs, employing disordered proteins, including transcription factors, with context-dependent identities. These findings suggest an evolutionary scenario in which the origination of differentiation in animals, rather than depending exclusively on adaptive natural selection, emerged as a consequence of a type of multicellularity in which the novel metazoan gene regulatory apparatus was readily mobilized to amplify and exaggerate inherent cell functions of unicellular ancestors. The plausibility of this hypothesis is illustrated by the evolution of the developmental role of Grainyhead-like in the formation of epithelium.
A year after Fermi was launched, the number of known gamma-ray pulsars has increased dramatically. For the first time, a sizable population of pulsars has been discovered in gamma-ray data alone. For the first time, millisecond pulsars have been confirmed as powerful sources of gamma-ray emission, and a whole population of these objects is seen with the LAT. The remaining gamma-ray pulsars are young pulsars, discovered via an efficient collaboration with radio and X-ray telescopes. It is now clear that a large fraction of the nearby energetic pulsars are gamma-ray emitters, whose luminosity grows with the spin-down energy loss rate. Many previously unidentified EGRET sources turn out to be pulsars. Many of the detected pulsars are found to be powering pulsar wind nebulae, and some are associated with TeV sources. The Fermi LAT is expected to detect more pulsars in gamma rays in the coming years, while multi-wavelength follow-ups should detect Fermi-discovered pulsars. The data have already revealed that gamma-ray pulsars generally emit fan-like beams that sweep over a large fraction of the sky and are produced in the outer magnetosphere.