As the success of deep models has led to their deployment in all areas of computer vision, it is increasingly important to understand how these representations work and what they are capturing. In this paper, we shed light on deep spatiotemporal representations by visualizing what two-stream models have learned in order to recognize actions in video. We show that local detectors for appearance and motion objects arise to form distributed representations for recognizing human actions. Key observations include the following. First, cross-stream fusion enables the learning of true spatiotemporal features rather than simply separate appearance and motion features. Second, the networks can learn local representations that are highly class specific, but also generic representations that can serve a range of classes. Third, throughout the hierarchy of the network, features become more abstract and show increasing invariance to aspects of the data that are unimportant to desired distinctions (e.g. motion patterns across various speeds). Fourth, visualizations can be used not only to shed light on learned representations, but also to reveal idiosyncrasies of training data and to explain failure cases of the system.
Anonymous peer review is used by the great majority of computer science conferences. OpenReview is a platform that aims to promote openness in the peer review process: papers, (meta-)reviews, rebuttals, and final decisions are all released to the public. We collect 5,527 submissions and their 16,853 reviews from the OpenReview platform. We also collect these submissions' citation data from Google Scholar and their non-peer-review
I revisit two theories of cell differentiation in multicellular organisms published a half-century ago, Stuart Kauffman's global gene regulatory dynamics (GGRD) model and Roy Britten's and Eric Davidson's modular gene regulatory network (MGRN) model, in light of newer knowledge of mechanisms of gene regulation in the metazoans (animals). The two models continue to inform hypotheses and computational studies of differentiation of lineage-adjacent cell types. However, their shared notion (based on bacterial regulatory systems) of gene switches, and of networks built from them, has constrained progress in understanding the dynamics and evolution of differentiation. Recent work has described unique write-read-rewrite chromatin-based expression encoding in eukaryotes, as well as metazoan-specific processes of gene activation and silencing in condensed-phase, enhancer-recruiting regulatory hubs, employing disordered proteins, including transcription factors, with context-dependent identities. These findings suggest an evolutionary scenario in which the origination of differentiation in animals, rather than depending exclusively on adaptive natural selection, emerged as a consequence of a type of multicellularity in which the novel metazoan gene regulatory apparatus was readily mobilized to amplify and exaggerate inherent cell functions of unicellular ancestors. The plausibility of this hypothesis is illustrated by the evolution of the developmental role of Grainyhead-like in the formation of epithelium.
In recent years, a number of approaches based on 2D or 3D convolutional neural networks (CNNs) have emerged for video action recognition, achieving state-of-the-art results on several large-scale benchmark datasets. In this paper, we carry out an in-depth comparative analysis to better understand the differences between these approaches and the progress they have made. To this end, we develop a unified framework for both 2D-CNN and 3D-CNN action models, which enables us to remove bells and whistles and provides a common ground for fair comparison. We then conduct a large-scale analysis involving over 300 action recognition models. Our comprehensive analysis reveals that a) a significant leap has been made in efficiency for action recognition, but not in accuracy; b) 2D-CNN and 3D-CNN models behave similarly in terms of spatio-temporal representation abilities and transferability. Our code is available at https://github.com/IBM/action-recognition-pytorch.
Nearly 50 years ago, in the proceedings of the first IAU symposium on planetary nebulae, Lawrence H. Aller and Stanley J. Czyzak said that the determination of the chemical compositions of planetary and other gaseous nebulae constitutes one of the most exasperating problems in astrophysics. Although the situation has greatly improved over the years, many important problems remain open and new questions have arisen, so this is still an active field of study. Here I review some of the main aspects of the determination of gaseous abundances in PNe and some relevant results derived in the last five years, since the last IAU symposium on PNe.
We review the main results obtained from our seismic studies of B-type main sequence pulsators, based on ground-based, MOST, Kepler and BRITE observations. Important constraints on stellar opacities, convective overshooting and rotation are derived. In each studied case, a significant modification of the opacity profile at the depths corresponding to the temperature range $\log T \in (5.0, 5.5)$ is indispensable to explain all pulsational properties. In particular, a large amount of opacity (an increase of at least 200%) at the depth corresponding to $\log T = 5.46$ (the nickel opacity) has to be added in early B-type stellar models to account for the low frequencies that correspond to high-order g modes. The values of the overshooting parameter, $\alpha_{\rm ov}$, from our seismic studies are below 0.3. In the case of a few stars, the deeper interiors have to rotate faster to obtain g-mode instability in the whole observed frequency range.