
Learning from image-text data has recently proven successful for many recognition tasks, yet is currently limited to visual features or individual visual concepts such as objects. In this paper, we propose one of the first methods that learn from image-sentence pairs to extract a graphical representation of localized objects and their relationships within an image, known as a scene graph. To bridge the gap between images and texts, we leverage an off-the-shelf object detector to identify and localize object instances, match labels of detected regions to concepts parsed from captions, and thus create pseudo labels for learning scene graphs. Further, we design a Transformer-based model to predict these pseudo labels via a masked token prediction task. Learning from only image-sentence pairs, our model achieves a 30% relative gain over a recent method trained with human-annotated unlocalized scene graphs. Our model also shows strong results for weakly and fully supervised scene graph generation. In addition, we explore an open-vocabulary setting for detecting scene graphs, and present the first result for open-set scene graph generation. Our code is available at https://github.com/YiwuZhong/SGG_from_NLS.
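The pseudo-label step described in the abstract above can be illustrated with a toy sketch. It assumes the caption has already been parsed into (subject, predicate, object) triplets and that detections arrive as (label, box) pairs. The function name `make_pseudo_labels`, the exact-string matching rule, and the sample data are hypothetical simplifications and are not taken from the authors' code.

```python
from itertools import product


def make_pseudo_labels(detections, caption_triplets):
    """Pair detected regions with caption triplets by exact label match.

    detections: list of (label, box) tuples from any object detector.
    caption_triplets: list of (subject, predicate, object) strings.
    Returns a list of (subject_box, predicate, object_box) pseudo labels.
    """
    # Group detected boxes by their predicted label.
    boxes_by_label = {}
    for label, box in detections:
        boxes_by_label.setdefault(label, []).append(box)

    pseudo_labels = []
    for subj, pred, obj in caption_triplets:
        # A triplet is kept only if both of its concepts were detected.
        subj_boxes = boxes_by_label.get(subj, [])
        obj_boxes = boxes_by_label.get(obj, [])
        for s_box, o_box in product(subj_boxes, obj_boxes):
            if s_box != o_box:  # a region cannot relate to itself
                pseudo_labels.append((s_box, pred, o_box))
    return pseudo_labels


# Hypothetical detector output and parsed caption ("a man riding a horse on grass").
detections = [
    ("man", (10, 20, 110, 220)),
    ("horse", (90, 80, 300, 260)),
    ("tree", (310, 0, 400, 200)),
]
caption_triplets = [("man", "riding", "horse"), ("horse", "on", "grass")]

labels = make_pseudo_labels(detections, caption_triplets)
print(labels)
```

In this sketch the second triplet is dropped because no region was labeled "grass"; a practical system would likely need a softer matching rule (for example synonyms or embeddings) than exact string equality.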
171 - Jing Shi, Ning Xu, Yihang Xu 2021
Recently, language-guided global image editing has drawn increasing attention owing to its growing application potential. However, previous GAN-based methods are not only confined to domain-specific, low-resolution data but also lack interpretability. To overcome these difficulties, we develop a text-to-operation model that maps a vague editing request in natural language into a series of editing operations, e.g., changes to contrast, brightness, and saturation. Each operation is interpretable and differentiable. Furthermore, the only supervision in the task is the target image, which is insufficient for stable training of sequential decisions. Hence, we propose a novel operation planning algorithm to generate possible editing sequences from the target image as pseudo ground truth. Comparison experiments on the newly collected MA5k-Req dataset and the GIER dataset show the advantages of our method. Code is available at https://jshi31.github.io/T2ONet.
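To make the idea of "a series of editing operations" concrete, the toy sketch below applies a short plan of named operations to a tiny image. Everything here is an assumption made for illustration: the formulas for `brightness`, `contrast`, and `saturation`, the `OPERATIONS` table, and the `apply_sequence` helper are hypothetical, and the paper's actual operation definitions and planning algorithm are not reproduced.

```python
def clamp(x):
    """Keep a channel value inside the [0, 1] range."""
    return min(1.0, max(0.0, x))


def brightness(pixels, factor):
    # Scale every channel by the same factor.
    return [tuple(clamp(c * factor) for c in p) for p in pixels]


def contrast(pixels, factor):
    # Push channels away from (or toward) the global mean intensity.
    mean = sum(sum(p) for p in pixels) / (3 * len(pixels))
    return [tuple(clamp(mean + factor * (c - mean)) for c in p) for p in pixels]


def saturation(pixels, factor):
    # Blend each pixel with its own gray value; factor 0 gives grayscale.
    out = []
    for p in pixels:
        gray = sum(p) / 3.0
        out.append(tuple(clamp(gray + factor * (c - gray)) for c in p))
    return out


# Hypothetical lookup table from operation name to function.
OPERATIONS = {
    "brightness": brightness,
    "contrast": contrast,
    "saturation": saturation,
}


def apply_sequence(pixels, sequence):
    """Apply (operation name, parameter) pairs in order."""
    for name, param in sequence:
        pixels = OPERATIONS[name](pixels, param)
    return pixels


# Two RGB pixels with channels in [0, 1], and a two-step editing plan.
image = [(0.2, 0.4, 0.6), (0.8, 0.6, 0.4)]
plan = [("brightness", 0.5), ("saturation", 0.0)]
edited = apply_sequence(image, plan)
print(edited)
```

Because each step is a named operation with a single parameter, the resulting plan can be read and inspected step by step, which is the kind of interpretability the abstract contrasts with GAN-based editing.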
Emission line galaxies (ELGs), and more generally star-forming galaxies, are valuable tracers of large-scale structure and are therefore the main targets of upcoming wide-area spectroscopic galaxy surveys. We propose a fixed-aperture shape estimator of each ELG for extracting the intrinsic alignment (IA) signal, and assess the performance of the method using image simulations of ELGs generated from the IllustrisTNG simulation, including observational effects such as sky background noise. We show that our method enables a significant detection of the IA power spectrum with the linear-scale coefficient $A_{\rm IA}\simeq (13$--$15)\pm 3.0$ up to $z=2$, even from the small simulation volume $\sim0.009\,(h^{-1}{\rm Gpc})^3$, in contrast to the null detection with the standard method. Thus the ELG IA signal, measured with our method, opens up opportunities to exploit cosmology and galaxy physics in the high-redshift universe.
Using photometric galaxies from the HSC survey, we measure the stellar mass density profiles for satellite galaxies as a function of the projected distance, $r_p$, to isolated central galaxies (ICGs) selected from SDSS/DR7 spectroscopic galaxies at $z\sim0.1$. By stacking HSC images, we also measure the projected stellar mass density profiles for ICGs and their stellar halos. The total mass distributions are further measured from HSC weak lensing signals. ICGs dominate within $\sim$0.15 times the halo virial radius ($0.15R_{200}$). The stellar mass versus total mass fractions drop with the increase in $r_p$ up to $\sim0.15R_{200}$, beyond which they are less than 1% and stay almost constant, indicating that the radial distribution of satellites traces dark matter. The total stellar mass in satellites is proportional to the virial mass of the host halo, $M_{200}$, for ICGs more massive than $10^{10.5}M_\odot$, i.e., $M_{\ast,\mathrm{sat}} \propto M_{200}$, whereas the relation between the stellar mass of ICGs $+$ stellar halos and $M_{200}$ is close to $M_{\ast,\mathrm{ICG+diffuse}}\propto M_{200}^{1/2}$. Below $10^{10.5}M_\odot$, the change in $M_{200}$ is much slower with the decrease in $M_{\ast,\mathrm{ICG+diffuse}}$. At fixed stellar mass, red ICGs are hosted by more massive dark matter halos and have more satellites. At $M_{200}\sim10^{12.7}M_\odot$, both $M_{\ast,\mathrm{sat}}$ and the fraction of stellar mass in satellites versus total stellar mass, $f_\mathrm{sat}$, tend to be slightly higher around blue ICGs, perhaps implying the late formation of blue galaxies. $f_\mathrm{sat}$ increases with the increase in both $M_{\ast,\mathrm{ICG+diffuse}}$ and $M_{200}$, and scales more linearly with $M_{200}$. We provide best-fitting formulas for these scaling relations and for red and blue ICGs separately.
The miniaturization of electronics makes heat dissipation in related devices an increasing challenge. When the size of a material is smaller than the phonon mean free paths, phonons travel without internal scattering and the laws of diffusive thermal conduction fail, resulting in a significant reduction in the effective thermal conductivity. This work reports, for the first time, the temperature-dependent thermal conductivity of doped epitaxial 6H-SiC and monocrystalline porous 6H-SiC below room temperature, probed by time-domain thermoreflectance. Strong quasi-ballistic thermal transport was observed in these samples, especially at low temperatures. Doping and structural boundaries were applied to tune the quasi-ballistic thermal transport, since dopants selectively scatter high-frequency phonons while boundaries scatter phonons with long mean free paths. Exceptionally strong phonon scattering by boron dopants is observed, compared to nitrogen dopants. Furthermore, an orders-of-magnitude reduction in the measured thermal conductivity was observed at low temperatures for the porous 6H-SiC compared to the epitaxial 6H-SiC. Finally, first-principles calculations and a simple Callaway model are developed to understand the measured thermal conductivities. Our work sheds light on the fundamental understanding of thermal conduction in technologically important wide-bandgap semiconductors such as 6H-SiC and will impact applications such as thermal management of 6H-SiC-related electronics and devices.
95 - Jing Shi, Marcel Ausloos, 2020
This paper investigates the heterogeneous impacts of Global and Local Investor Sentiments on stock returns. We study 10 industry sectors through the lens of 6 so-called emerging countries: China, Brazil, India, Mexico, Indonesia and Turkey, over the 2000 to 2014 period. Using a panel data framework, our study sheds light on a significant effect of Local Investor Sentiments on expected returns for the basic materials, consumer goods, industrial, and financial industries. Moreover, our results suggest that from Global Investor Sentiments alone, one cannot predict expected stock returns in these markets.
226 - Peng Zhang, Jiaming Xu, Jing Shi 2020
Speech separation aims to separate individual voices from an audio mixture of multiple simultaneous talkers. Although audio-only approaches achieve satisfactory performance, they rely on strategies designed for predefined conditions, which limits their application in complex auditory scenes. To address the cocktail party problem, we propose a novel audio-visual speech separation model. In our model, we use a face detector to determine the number of speakers in the scene and use visual information to avoid the permutation problem. To improve our model's generalization ability to unknown speakers, we explicitly extract speech-related visual features from visual inputs by an adversarially disentangled method, and use these features to assist speech separation. In addition, a time-domain approach is adopted, which avoids the phase reconstruction problem that exists in time-frequency domain models. To compare our model's performance with that of other models, we create two benchmark datasets of 2-speaker mixtures from the GRID and TCD-TIMIT audio-visual datasets. Through a series of experiments, our proposed model is shown to outperform the state-of-the-art audio-only model and three audio-visual models.
Human-object interaction (HOI) detection is an important task in image understanding and reasoning. Its output takes the form of an HOI triplet <human; verb; object>, requiring bounding boxes for the human and the object as well as the action between them. In other words, this task requires strong supervision for training, which is hard to procure. A natural solution is to pursue weakly-supervised learning, where we only know that certain HOI triplets are present in an image but their exact locations are unknown. Most weakly-supervised learning methods make no provision for leveraging data with strong supervision when it is available; indeed, a naive combination of these two paradigms in HOI detection fails to make them contribute to each other. In this regard, we propose a mixed-supervised HOI detection pipeline, built on a specific design of momentum-independent learning that learns seamlessly across these two types of supervision. Moreover, in light of the annotation insufficiency in mixed supervision, we introduce an HOI element swapping technique to synthesize diverse and hard negatives across images and improve the robustness of the model. Our method is evaluated on the challenging HICO-DET dataset. It performs close to or even better than many fully-supervised methods by using a mixed amount of strong and weak annotations; furthermore, it outperforms representative state-of-the-art weakly and fully-supervised methods under the same supervision.
We address the problem of decomposing an image into albedo and shading. We propose the Fast Fourier Intrinsic Network, FFI-Net for short, which operates in the spectral domain, splitting the input into several spectral bands. Weights in FFI-Net are optimized in the spectral domain, allowing faster convergence to a lower error. FFI-Net is lightweight and does not need auxiliary networks for training. The network is trained end-to-end with a novel spectral loss which measures the global distance between the network prediction and the corresponding ground truth. FFI-Net achieves state-of-the-art performance on the MPI-Sintel, MIT Intrinsic, and IIW datasets.
Few-shot learning has recently emerged as a new challenge in the deep learning field: unlike conventional methods that train deep neural networks (DNNs) with a large amount of labeled data, it asks DNNs to generalize to new classes with few annotated samples. Recent advances in few-shot learning mainly focus on image classification, while in this paper we focus on object detection. Initial explorations in few-shot object detection tend to simulate a classification scenario by using the positive proposals in images with respect to a certain object class while discarding the negative proposals of that class. Negatives, especially hard negatives, however, are essential to embedding space learning in few-shot object detection. In this paper, we restore the negative information in few-shot object detection by introducing a new negative- and positive-representative based metric learning framework and a new inference scheme with negative and positive representatives. We build our work on a recent few-shot pipeline, RepMet, with several new modules to encode negative information for both training and testing. Extensive experiments on ImageNet-LOC and PASCAL VOC show that our method substantially improves on state-of-the-art few-shot object detection solutions. Our code is available at https://github.com/yang-yk/NP-RepMet.