143 - Ryan Razani, Ran Cheng, Enxu Li 2021
Panoptic segmentation, as an integrated task of both static environmental understanding and dynamic object identification, has recently begun to receive broad research interest. In this paper, we propose a new computationally efficient LiDAR-based panoptic segmentation framework, called GP-S3Net. GP-S3Net is a proposal-free approach in which no object proposals are needed to identify the objects, in contrast to conventional two-stage panoptic systems, where a detection network is incorporated to capture instance information. Our new design consists of a novel instance-level network that processes the semantic results by constructing a graph convolutional network to identify objects (foreground), which are later fused with the background classes. From the fine-grained clusters of the foreground objects produced by the semantic segmentation backbone, over-segmentation priors are generated and subsequently processed by 3D sparse convolution to embed each cluster. Each cluster is treated as a node in the graph, and its corresponding embedding is used as its node feature. A GCNN then predicts whether an edge exists between each pair of clusters. We use the instance labels to generate ground-truth edge labels for each constructed graph in order to supervise the learning. Extensive experiments demonstrate that GP-S3Net outperforms the current state-of-the-art approaches by a significant margin across available datasets such as nuScenes and SemanticPOSS, ranking first on the competitive public SemanticKITTI leaderboard upon publication.
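To make the graph-based grouping idea concrete, the following is a minimal, hypothetical sketch of edge prediction over cluster embeddings: node features are aggregated by a simple mean-aggregation graph convolution and an MLP scores each candidate edge. The layer sizes, the plain GCN layer, and the `EdgePredictor` class are illustrative assumptions, not the GP-S3Net implementation.

```python
import torch
import torch.nn as nn

class EdgePredictor(nn.Module):
    """Toy GCN-style edge classifier over per-cluster embeddings (illustrative only)."""
    def __init__(self, feat_dim=64, hidden_dim=64):
        super().__init__()
        self.gcn = nn.Linear(feat_dim, hidden_dim)        # shared node transform
        self.edge_mlp = nn.Sequential(                    # scores a pair of node features
            nn.Linear(2 * hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1))

    def forward(self, node_feats, adj):
        # node_feats: (N, feat_dim) cluster embeddings (e.g. from a sparse-conv encoder)
        # adj: (N, N) float adjacency of candidate edges (e.g. k-NN over cluster centroids)
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        agg = adj @ node_feats / deg                      # mean aggregation over neighbors
        h = torch.relu(self.gcn(node_feats + agg))        # simple graph convolution
        src, dst = adj.nonzero(as_tuple=True)             # candidate edges to score
        pair = torch.cat([h[src], h[dst]], dim=-1)
        return torch.sigmoid(self.edge_mlp(pair)).squeeze(-1)  # P(edge = same instance)

# Clusters whose predicted edge probability exceeds a threshold could then be merged
# into one instance, e.g. via connected components over the thresholded graph.
```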
Recent advancements in deep neural networks have made remarkable leaps forward in dense image prediction. However, the issue of feature alignment remains neglected by most existing approaches for simplicity. Direct pixel addition between upsampled and local features leads to feature maps with misaligned contexts that, in turn, translate to mis-classifications in prediction, especially on object boundaries. In this paper, we propose a feature alignment module that learns transformation offsets of pixels to contextually align upsampled higher-level features, and a feature selection module to emphasize the lower-level features with rich spatial details. We then integrate these two modules in a top-down pyramidal architecture and present the Feature-aligned Pyramid Network (FaPN). Extensive experimental evaluations on four dense prediction tasks and four datasets demonstrate the efficacy of FaPN, yielding an overall improvement of 1.2 - 2.6 points in AP / mIoU over FPN when paired with Faster / Mask R-CNN. In particular, our FaPN achieves state-of-the-art performance of 56.7% mIoU on ADE20K when integrated within Mask-Former. The code is available from https://github.com/EMI-Group/FaPN.
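Below is a simplified sketch of the offset-based alignment idea: a small convolution predicts per-pixel 2D offsets from the concatenated lateral and top-down features, and the upsampled feature map is warped accordingly before fusion. This uses grid-sample warping as an approximation; FaPN itself is built on deformable convolution, and the module name and layer sizes here are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureAlign(nn.Module):
    """Simplified offset-based alignment of upsampled top-down features (illustrative only)."""
    def __init__(self, channels):
        super().__init__()
        # predict a per-pixel 2D offset from the concatenation of both feature maps
        self.offset = nn.Conv2d(2 * channels, 2, kernel_size=3, padding=1)

    def forward(self, lateral, top_down):
        # lateral:  (N, C, H, W) lower-level feature with rich spatial detail
        # top_down: (N, C, H, W) upsampled higher-level feature (already resized to H x W)
        n, _, h, w = lateral.shape
        offset = self.offset(torch.cat([lateral, top_down], dim=1))  # (N, 2, H, W)
        # build a normalized sampling grid shifted by the predicted offsets
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, h, device=lateral.device),
            torch.linspace(-1, 1, w, device=lateral.device), indexing="ij")
        grid = torch.stack([xs, ys], dim=-1).expand(n, h, w, 2)
        grid = grid + offset.permute(0, 2, 3, 1)           # apply learned shifts
        aligned = F.grid_sample(top_down, grid, align_corners=True)
        return lateral + aligned                           # fuse after alignment
```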
Neural interfaces using biocompatible scaffolds provide crucial properties for the functional repair of nerve injuries and neurodegenerative diseases, including cell adhesion, structural support, and mass transport. Neural stimulation has also been found to be effective in promoting neural regeneration. This work provides a new strategy to integrate photoacoustic (PA) neural stimulation into hydrogel scaffolds using a nanocomposite hydrogel approach. Specifically, polyethylene glycol (PEG)-functionalized carbon nanotubes (CNT), highly efficient photoacoustic agents, are embedded into silk fibroin to form biocompatible and soft photoacoustic materials. We show that these photoacoustic functional scaffolds enable non-genetic activation of neurons with a spatial precision defined by the area of light illumination, promoting neuron regeneration. These CNT/silk scaffolds offered reliable and repeatable photoacoustic neural stimulation: 94% of photoacoustically stimulated neurons exhibited a fluorescence change larger than 10% in calcium imaging in the light-illuminated area. The on-demand photoacoustic stimulation increased neurite outgrowth by 1.74-fold in a dorsal root ganglion model when compared to the unstimulated group. We also confirmed that photoacoustic neural stimulation promoted neurite outgrowth by impacting the brain-derived neurotrophic factor (BDNF) pathway. As a multifunctional neural scaffold, CNT/silk scaffolds demonstrated non-genetic PA neural stimulation functions and promoted neurite outgrowth, providing a new method for non-pharmacological neural regeneration.
Autonomous driving vehicles and robotic systems rely on accurate perception of their surroundings. Scene understanding is one of the crucial components of perception modules. Among all available sensors, LiDARs are one of the essential sensing modalities of autonomous driving systems due to their active sensing nature and high-resolution sensor readings. Accurate and fast semantic segmentation methods are needed to fully utilize LiDAR sensors for scene understanding. In this paper, we present Lite-HDSeg, a novel real-time convolutional neural network for semantic segmentation of full 3D LiDAR point clouds. Lite-HDSeg achieves the best accuracy vs. computational complexity trade-off on the SemanticKITTI benchmark and is designed on the basis of a new encoder-decoder architecture with light-weight harmonic dense convolutions as its core. Moreover, we introduce ICM, an improved global contextual module to capture multi-scale contextual features, and MCSPN, a multi-class Spatial Propagation Network to further refine the semantic boundaries. Our experimental results show that the proposed method outperforms state-of-the-art semantic segmentation approaches that can run in real time, and is thus suitable for robotic and autonomous driving applications.
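For intuition on the "harmonic dense" connectivity, here is a small sketch assuming HarDNet-style skip connections, where layer k receives input from layer k - 2^n whenever 2^n divides k; this keeps the number of connections logarithmic rather than linear. Plain 2D dense convolutions and the channel widths are simplifying assumptions, not the exact Lite-HDSeg block.

```python
import torch
import torch.nn as nn

class HarmonicDenseBlock(nn.Module):
    """Sketch of HarDNet-style harmonic dense connectivity (illustrative only)."""
    def __init__(self, in_channels, growth, num_layers=4):
        super().__init__()
        self.num_layers = num_layers
        self.layers = nn.ModuleList()
        channels = [in_channels]                       # channels[k] = output width of layer k
        for k in range(1, num_layers + 1):
            in_ch = sum(channels[j] for j in self._links(k))
            self.layers.append(nn.Sequential(
                nn.Conv2d(in_ch, growth, 3, padding=1, bias=False),
                nn.BatchNorm2d(growth), nn.ReLU(inplace=True)))
            channels.append(growth)

    @staticmethod
    def _links(k):
        # earlier layers feeding layer k: k - 2**n for every power of two that divides k
        return [k - 2 ** n for n in range(k.bit_length()) if k % (2 ** n) == 0]

    def forward(self, x):
        outs = [x]
        for k, layer in enumerate(self.layers, start=1):
            inp = torch.cat([outs[j] for j in self._links(k)], dim=1)
            outs.append(layer(inp))
        return outs[-1]
```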
360 - Ran Cheng, Ryan Razani, Yuan Ren 2021
Semantic segmentation is a crucial component in the perception systems of many applications, such as robotics and autonomous driving, that rely on accurate environmental perception and understanding. In the literature, several approaches have been introduced to tackle the LiDAR semantic segmentation task, such as projection-based (range-view or bird's-eye-view) and voxel-based approaches. However, they either abandon the valuable 3D topology and geometric relations and suffer from the information loss introduced in the projection process, or are inefficient. Therefore, there is a need for accurate models capable of processing the 3D driving-scene point cloud in 3D space. In this paper, we propose S3Net, a novel convolutional neural network for LiDAR point cloud semantic segmentation. It adopts an encoder-decoder backbone that consists of a Sparse Intra-channel Attention Module (SIntraAM) and a Sparse Inter-channel Attention Module (SInterAM) to emphasize the fine details both within each feature map and among nearby feature maps. To extract the global contexts in deeper layers, we introduce a Sparse Residual Tower based upon sparse convolution, which suits the varying sparsity of LiDAR point clouds. In addition, a geo-aware anisotropic loss is leveraged to emphasize the semantic boundaries and penalize the noise within each predicted region, leading to robust predictions. Our experimental results show that the proposed method leads to a large improvement (12%) compared to its baseline counterpart (MinkNet42) on the SemanticKITTI test set and achieves state-of-the-art mIoU accuracy among semantic segmentation approaches.
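As a rough illustration of channel re-weighting of the kind the attention modules perform, the sketch below uses a squeeze-and-excitation-style gate over dense voxel features. S3Net's SIntraAM/SInterAM operate on sparse tensors and differ in detail; the dense layout, reduction ratio, and class name here are assumptions.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """SE-style channel re-weighting as a stand-in for sparse intra/inter-channel
    attention (illustrative only)."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):
        # x: (N, C, D, H, W) dense voxel features; S3Net itself works on sparse tensors
        w = self.fc(x.mean(dim=(2, 3, 4)))         # squeeze: global context per channel
        return x * w.view(*w.shape, 1, 1, 1)       # excite: emphasize informative channels
```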
201 - Hantao Zhang, Ran Cheng 2021
The magnon spin Nernst effect was recently proposed as an intrinsic effect in antiferromagnets, where spin diffusion and boundary spin transmission have been ignored. However, diffusion processes are essential to convert a bulk spin current into boundary spin accumulation, which determines the spin injection rate into detectors through imperfect transmission. We formulate a diffusive theory of the magnon spin Nernst effect with boundary conditions reflecting real device geometry. Thanks to the diffusion effect, the output signals in both electronic and optical detection grow rapidly with increasing system size in the transverse dimension, eventually saturating. Counterintuitively, the measurable signals are even functions of the magnetic field, making optical detection more reliable than electronic detection.
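As a point of reference for the diffusive treatment, the generic steady-state magnon spin diffusion equation is reproduced below in its standard textbook form; the paper's full drift terms and device-specific boundary conditions are not reproduced, and the notation (diffusion constant D, relaxation time tau, diffusion length lambda_s) is an assumed convention rather than quoted from the paper.

```latex
% Generic steady-state spin diffusion (standard form): the nonequilibrium magnon
% spin accumulation \mu_s decays over the spin diffusion length \lambda_s.
\nabla^{2}\mu_{s} \;=\; \frac{\mu_{s}}{\lambda_{s}^{2}}, \qquad \lambda_{s} = \sqrt{D\,\tau}
```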
Autonomous robotic systems and self-driving cars rely on accurate perception of their surroundings, as the safety of passengers and pedestrians is the top priority. Semantic segmentation is one of the essential components of environmental perception that provides semantic information about the scene. Recently, several methods have been introduced for 3D LiDAR semantic segmentation. While they can lead to improved performance, they are either afflicted by high computational complexity, and therefore inefficient, or lack fine details of smaller instances. To alleviate this problem, we propose AF2-S3Net, an end-to-end encoder-decoder CNN network for 3D LiDAR semantic segmentation. We present a novel multi-branch attentive feature fusion module in the encoder and a unique adaptive feature selection module with feature map re-weighting in the decoder. Our AF2-S3Net fuses voxel-based learning and point-based learning into a single framework to effectively process the large 3D scene. Our experimental results show that the proposed method outperforms the state-of-the-art approaches on the large-scale SemanticKITTI benchmark, ranking 1st on the competitive public leaderboard upon publication.
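The sketch below conveys the general pattern of attentive multi-branch fusion: parallel branches are summarized by global pooling, a small gate predicts soft per-branch weights, and the branches are re-weighted and summed. The gating design, dense 5D tensors, and class name are illustrative assumptions, not the AF2-S3Net module.

```python
import torch
import torch.nn as nn

class AttentiveFusion(nn.Module):
    """Toy multi-branch fusion with learned per-branch weights (illustrative only)."""
    def __init__(self, channels, num_branches=3):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(num_branches * channels, num_branches), nn.Softmax(dim=-1))

    def forward(self, branches):
        # branches: list of (N, C, D, H, W) feature maps from parallel branches
        pooled = torch.cat([b.mean(dim=(2, 3, 4)) for b in branches], dim=1)  # (N, B*C)
        w = self.gate(pooled)                                                 # (N, B)
        fused = sum(w[:, i].view(-1, 1, 1, 1, 1) * b for i, b in enumerate(branches))
        return fused
```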
119 - Hantao Zhang, Ran Cheng 2020
In an easy-plane antiferromagnet with the Dzyaloshinskii-Moriya interaction (DMI), magnons are subject to an effective spin-momentum locking. An in-plane temperature gradient can generate interfacial accumulation of magnons with a specified polarization, realizing the magnon thermal Edelstein effect. We theoretically investigate the injection and detection of this thermally-driven spin polarization in an adjacent heavy metal with strong spin Hall effect. We find that the inverse spin Hall voltage depends monotonically on both temperature and the DMI but non-monotonically on the hard-axis anisotropy. Counterintuitively, the magnon thermal Edelstein effect is an even function of a magnetic field applied along the Neel vector.
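For context on the electrical detection step, the standard inverse spin Hall conversion in the heavy metal is summarized by the textbook relation below; this is a generic expression, not a result quoted from the paper.

```latex
% Inverse spin Hall effect: a spin current density \mathbf{j}_s with polarization
% \boldsymbol{\sigma} injected into the heavy metal is converted into a transverse
% charge current, which is read out as a voltage across the detector.
\mathbf{j}_{c} \;=\; \theta_{\mathrm{SH}}\,\frac{2e}{\hbar}\;\mathbf{j}_{s}\times\boldsymbol{\sigma}
```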
Despite the remarkable successes of Convolutional Neural Networks (CNNs) in computer vision, it is time-consuming and error-prone to manually design a CNN. Among the various Neural Architecture Search (NAS) methods that aim to automate the design of high-performance CNNs, differentiable NAS and population-based NAS are attracting increasing interest due to their unique characteristics. To benefit from the merits while overcoming the deficiencies of both, this work proposes a novel NAS method, RelativeNAS. As the key to efficient search, RelativeNAS performs joint learning between fast-learners (i.e., networks with relatively higher accuracy) and slow-learners in a pairwise manner. Moreover, since RelativeNAS only requires low-fidelity performance estimation to distinguish each pair of fast-learner and slow-learner, it reduces the computation cost of training the candidate architectures. The proposed RelativeNAS brings several unique advantages: (1) it achieves state-of-the-art performance on ImageNet with a top-1 error rate of 24.88%, i.e., outperforming DARTS and AmoebaNet-B by 1.82% and 1.12% respectively; (2) it spends only nine hours with a single 1080Ti GPU to obtain the discovered cells, i.e., 3.75x and 7875x faster than DARTS and AmoebaNet respectively; (3) it shows that the discovered cells obtained on CIFAR-10 can be directly transferred to object detection, semantic segmentation, and keypoint detection, yielding competitive results of 73.1% mAP on PASCAL VOC, 78.7% mIoU on Cityscapes, and 68.5% AP on MSCOCO, respectively. The implementation of RelativeNAS is available at https://github.com/EMI-Group/RelativeNAS
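To illustrate the pairwise slow-/fast-learner idea in the simplest possible terms, the sketch below assumes architectures encoded as continuous vectors and moves each slow-learner toward its paired fast-learner. The update rule, step sizes, and function names are generic assumptions for illustration and are not claimed to be RelativeNAS's exact operator.

```python
import numpy as np

def pairwise_update(population, fitness, lr=0.5, rng=None):
    """Toy slow-/fast-learner update over continuous architecture encodings (illustrative only).

    In each randomly formed pair, the slow-learner (lower estimated accuracy) moves
    toward the fast-learner (higher estimated accuracy), so only a cheap, low-fidelity
    performance estimate is needed to decide the direction of transfer.
    """
    rng = rng or np.random.default_rng()
    pop = np.array(population, dtype=float)
    idx = rng.permutation(len(pop))
    for a, b in zip(idx[::2], idx[1::2]):                  # random disjoint pairs
        fast, slow = (a, b) if fitness[a] >= fitness[b] else (b, a)
        step = rng.uniform(0.0, lr, size=pop[slow].shape)  # per-dimension random step
        pop[slow] += step * (pop[fast] - pop[slow])        # slow-learner moves toward fast-learner
    return pop

# Example: 6 candidate encodings of dimension 4 with low-fidelity accuracy estimates.
pop = np.random.rand(6, 4)
fitness = np.random.rand(6)
new_pop = pairwise_update(pop, fitness)
```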
Although significant advances in image-to-image (I2I) translation with Generative Adversarial Networks (GANs) have been made, it remains challenging to effectively translate an image into a set of diverse images in multiple target domains using a single pair of generator and discriminator. Existing multimodal I2I translation methods adopt multiple domain-specific content encoders for different domains, where each domain-specific content encoder is trained with images from the same domain only. Nevertheless, we argue that the content (domain-invariant) features should be learned from images across all the domains. Consequently, each domain-specific content encoder of existing schemes fails to extract the domain-invariant features efficiently. To address this issue, we present a flexible and general SoloGAN model for efficient multimodal I2I translation among multiple domains with unpaired data. In contrast to existing methods, the SoloGAN algorithm uses a single projection discriminator with an additional auxiliary classifier and shares the encoder and generator for all domains. As such, the SoloGAN model can be trained effectively with images from all domains such that the domain-invariant content representation can be efficiently extracted. Qualitative and quantitative results over a wide range of datasets against several counterparts and variants of the SoloGAN model demonstrate the merits of the method, especially for challenging I2I translation tasks, i.e., tasks that involve extreme shape variations or need to keep the complex backgrounds unchanged after translation. Furthermore, we demonstrate the contribution of each component using ablation studies.
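The sketch below shows the general pattern of a projection discriminator (a domain embedding projected onto the image features) combined with an auxiliary domain-classification head. The backbone layers and dimensions are assumptions for illustration and do not reproduce SoloGAN's actual architecture.

```python
import torch
import torch.nn as nn

class ProjectionDiscriminator(nn.Module):
    """Single projection discriminator with an auxiliary domain classifier (illustrative only)."""
    def __init__(self, num_domains, feat_dim=256):
        super().__init__()
        self.backbone = nn.Sequential(                     # shared image feature extractor
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, feat_dim, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.linear = nn.Linear(feat_dim, 1)               # unconditional real/fake score
        self.embed = nn.Embedding(num_domains, feat_dim)   # domain embedding for projection
        self.aux = nn.Linear(feat_dim, num_domains)        # auxiliary domain classifier

    def forward(self, x, domain):
        # x: (N, 3, H, W) image; domain: (N,) target-domain index
        phi = self.backbone(x)                             # (N, feat_dim)
        adv = self.linear(phi) + (self.embed(domain) * phi).sum(dim=1, keepdim=True)
        return adv, self.aux(phi)                          # adversarial score, domain logits
```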