No Arabic abstract
Examining locomotion has improved our basic understanding of motor control and aided in treating motor impairment. Mice and rats are the model system of choice for basic neuroscience studies of human disease. High frame rates are needed to quantify the kinematics of running rodents, due to their high stride frequency. Manual tracking, especially for multiple body landmarks, becomes extremely time-consuming. To overcome these limitations, we proposed the use of superpixels based image segmentation as superpixels utilized both spatial and color information for segmentation. We segmented some parts of body and tested the success of segmentation as a function of color space and SLIC segment size. We used a simple merging function to connect the segmented regions considered as neighbor and having the same intensity value range. In addition, 28 features were extracted, and t-SNE was used to demonstrate how much the methods are capable to differentiate the regions. Finally, we compared the segmented regions to a manually outlined region. The results showed for segmentation, using the RGB image was slightly better compared to the hue channel. For merg- ing and classification, however, the hue representation was better as it captures the relevant color information in a single channel.
Examining locomotion has improved our basic understanding of motor control and aided in treating motor impairment. Mice and rats are premier models of human disease and increasingly the model systems of choice for basic neuroscience. High frame rates (250 Hz) are needed to quantify the kinematics of these running rodents. Manual tracking, especially for multiple markers, becomes time-consuming and impossible for large sample sizes. Therefore, the need for automatic segmentation of these markers has grown in recent years. We propose two methods to segment and track these markers: first, using SLIC superpixels segmentation with a tracker based on position, speed, shape, and color information of the segmented region in the previous frame; second, using a thresholding on hue channel following up with the same tracker. The comparison showed that the SLIC superpixels method was superior because the segmentation was more reliable and based on both color and spatial information.
Real-world visual recognition requires handling the extreme sample imbalance in large-scale long-tailed data. We propose a divide&conquer strategy for the challenging LVIS task: divide the whole data into balanced parts and then apply incremental learning to conquer each one. This derives a novel learning paradigm: class-incremental few-shot learning, which is especially effective for the challenge evolving over time: 1) the class imbalance among the old-class knowledge review and 2) the few-shot data in new-class learning. We call our approach Learning to Segment the Tail (LST). In particular, we design an instance-level balanced replay scheme, which is a memory-efficient approximation to balance the instance-level samples from the old-class images. We also propose to use a meta-module for new-class learning, where the module parameters are shared across incremental phases, gaining the learning-to-learn knowledge incrementally, from the data-rich head to the data-poor tail. We empirically show that: at the expense of a little sacrifice of head-class forgetting, we can gain a significant 8.3% AP improvement for the tail classes with less than 10 instances, achieving an overall 2.0% AP boost for the whole 1,230 classes.
Nodule segmentation from breast ultrasound images is challenging yet essential for the diagnosis. Weakly-supervised segmentation (WSS) can help reduce time-consuming and cumbersome manual annotation. Unlike existing weakly-supervised approaches, in this study, we propose a novel and general WSS framework called Flip Learning, which only needs the box annotation. Specifically, the target in the label box will be erased gradually to flip the classification tag, and the erased region will be considered as the segmentation result finally. Our contribution is three-fold. First, our proposed approach erases on superpixel level using a Multi-agent Reinforcement Learning framework to exploit the prior boundary knowledge and accelerate the learning process. Second, we design two rewards: classification score and intensity distribution reward, to avoid under- and over-segmentation, respectively. Third, we adopt a coarse-to-fine learning strategy to reduce the residual errors and improve the segmentation performance. Extensively validated on a large dataset, our proposed approach achieves competitive performance and shows great potential to narrow the gap between fully-supervised and weakly-supervised learning.
We consider the problem of segmenting an image into superpixels in the context of $k$-means clustering, in which we wish to decompose an image into local, homogeneous regions corresponding to the underlying objects. Our novel approach builds upon the widely used Simple Linear Iterative Clustering (SLIC), and incorporate a measure of objects structure based on the spectral residual of an image. Based on this combination, we propose a modified initialisation scheme and search metric, which helps keeps fine-details. This combination leads to better adherence to object boundaries, while preventing unnecessary segmentation of large, uniform areas, while remaining computationally tractable in comparison to other methods. We demonstrate through numerical and visual experiments that our approach outperforms the state-of-the-art techniques.
While deep learning-based 3D face generation has made a progress recently, the problem of dynamic 3D (4D) facial expression synthesis is less investigated. In this paper, we propose a novel solution to the following question: given one input 3D neutral face, can we generate dynamic 3D (4D) facial expressions from it? To tackle this problem, we first propose a mesh encoder-decoder architecture (Expr-ED) that exploits a set of 3D landmarks to generate an expressive 3D face from its neutral counterpart. Then, we extend it to 4D by modeling the temporal dynamics of facial expressions using a manifold-valued GAN capable of generating a sequence of 3D landmarks from an expression label (Motion3DGAN). The generated landmarks are fed into the mesh encoder-decoder, ultimately producing a sequence of 3D expressive faces. By decoupling the two steps, we separately address the non-linearity induced by the mesh deformation and motion dynamics. The experimental results on the CoMA dataset show that our mesh encoder-decoder guided by landmarks brings a significant improvement with respect to other landmark-based 3D fitting approaches, and that we can generate high quality dynamic facial expressions. This framework further enables the 3D expression intensity to be continuously adapted from low to high intensity. Finally, we show our framework can be applied to other tasks, such as 2D-3D facial expression transfer.