
Automatically describing video, or video captioning, has been widely studied in the multimedia field. This paper proposes a new task of sensor-augmented egocentric-video captioning, a newly constructed dataset for it called MMAC Captions, and a method for the task that effectively utilizes multi-modal data from video and motion sensors, or inertial measurement units (IMUs). While conventional video captioning tasks have difficulty producing detailed descriptions of human activities because of the limited view of a fixed camera, egocentric vision has greater potential for generating finer-grained descriptions of human activities on the basis of a much closer view. In addition, we utilize wearable-sensor data as auxiliary information to mitigate the inherent problems of egocentric vision: motion blur, self-occlusion, and out-of-camera-range activities. We propose a method for effectively combining the sensor data with the video data on the basis of an attention mechanism that dynamically determines which modality requires more attention, taking the contextual information into account. We compared the proposed sensor-fusion method with strong baselines on the MMAC Captions dataset and found that using sensor data as supplementary information to the egocentric-video data was beneficial and that our proposed method outperformed the strong baselines.
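The abstract above does not spell out the attention mechanism, so the following is only a minimal sketch of the general idea of context-dependent weighting over two modalities. The dot-product scoring, the function names, and the fixed-length feature vectors are all illustrative assumptions, not the authors' architecture.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def fuse_modalities(context, video_feat, sensor_feat):
    """Weight video and sensor features by their relevance to the context.

    Assumption: relevance is scored with a plain dot product between a
    context vector (e.g. a decoder state) and each modality's feature.
    """
    scores = [dot(context, video_feat), dot(context, sensor_feat)]
    weights = softmax(scores)  # one weight per modality, summing to 1
    fused = [weights[0] * v + weights[1] * s
             for v, s in zip(video_feat, sensor_feat)]
    return fused, weights


# Hypothetical inputs: the context aligns with the sensor feature,
# so the sensor modality should receive the larger weight.
fused, weights = fuse_modalities([1.0, 0.0], [0.0, 1.0], [2.0, 0.0])
```

In this toy setting the weights change whenever the context vector changes, which is the sense in which such a mechanism can shift attention between video and IMU data over time.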
In this study, we analyze giant Galactic spurs seen in both radio and X-ray all-sky maps to reveal their origins. We discuss two types of giant spurs: one is the brightest diffuse emission near the center of the maps, which is likely to be related to the Fermi bubbles (NPSs/SPSs, north/south polar spurs, respectively), and the other comprises weaker spurs that coincide positionally with local spiral arms in our Galaxy (LAS, local arm spur). Our analysis finds that the X-ray emissions from not only the NPS but also the SPS are closer to the Galactic center by ~5 deg than the corresponding radio emission. Furthermore, larger offsets of 10-20 deg are observed in the LASs; however, they are attributed to different physical origins. Moreover, the temperature of the X-ray emission is kT ~ 0.2 keV for the LAS, which is systematically lower than those of the NPS and SPS (kT ~ 0.3 keV) but consistent with the typical temperature of Galactic halo gas. We argue that the radio/X-ray offset and the slightly higher temperature of the NPS/SPS X-ray gas are due to the shock compression/heating of halo gas during a significant Galactic explosion in the past, whereas the enhanced X-ray emission from the LAS may be due to the weak condensation of halo gas in the arm potential or to star formation activity without shock heating.
Variational Inference (VI) combined with Bayesian nonlinear filtering produces state-of-the-art results for latent trajectory inference. A body of recent work has focused on Sequential Monte Carlo (SMC) and its extensions, e.g., Forward Filtering Backward Simulation (FFBSi). These studies have achieved great success; however, particle degeneracy remains a serious problem. In this paper, we propose Ensemble Kalman Objectives (EnKOs), a hybrid method of VI and the Ensemble Kalman Filter (EnKF), to infer State Space Models (SSMs). Unlike the SMC-based methods, our proposed method can identify the latent dynamics with fewer particles because of its rich particle diversity. We demonstrate that EnKOs outperform the SMC-based methods in terms of predictive ability on three benchmark nonlinear dynamical system tasks.
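To illustrate why an ensemble Kalman update tends to preserve particle diversity, here is a minimal sketch of the textbook stochastic EnKF analysis step for a scalar state. It is not the EnKO objective itself; the scalar observation model `z = h * x + noise`, the function name, and the default parameter values are assumptions made for illustration.

```python
import random


def enkf_update(ensemble, z, h=1.0, r=0.1, rng=random):
    """Stochastic EnKF analysis step for a scalar state.

    ensemble: list of prior particles
    z:        scalar observation, assumed to follow z = h * x + N(0, r)
    """
    n = len(ensemble)
    mean = sum(ensemble) / n
    # Sample variance of the prior ensemble stands in for the covariance.
    var = sum((x - mean) ** 2 for x in ensemble) / (n - 1)
    # Kalman gain computed from ensemble statistics.
    gain = var * h / (h * var * h + r)
    # Every particle is shifted toward its own perturbed observation;
    # none is discarded or duplicated, unlike weight-based resampling in SMC.
    return [x + gain * (z + rng.gauss(0.0, r ** 0.5) - h * x)
            for x in ensemble]


# Hypothetical usage: a standard-normal prior ensemble and one observation.
rng = random.Random(0)
prior = [rng.gauss(0.0, 1.0) for _ in range(200)]
posterior = enkf_update(prior, z=2.0, rng=rng)
```

Because each particle is moved rather than reweighted and resampled, the ensemble does not collapse onto a handful of duplicated particles, which is the degeneracy issue the abstract attributes to SMC-based methods.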
Class imbalance is one of the major challenges in real-world datasets, where a few classes (called majority classes) contain far more data samples than the rest (called minority classes). Training deep neural networks on such datasets leads to performance that is typically biased towards the majority classes. Most prior works try to solve class imbalance by assigning more weight to the minority classes in various ways (e.g., data re-sampling, cost-sensitive learning). However, we argue that the amount of available training data may not always be a good clue for determining the weighting strategy, because some of the minority classes might be sufficiently represented even by a small number of training samples. Overweighting samples of such classes can lead to a drop in the model's overall performance. We claim that the difficulty of a class as perceived by the model is more important for determining the weighting. In this light, we propose a novel loss function named Class-wise Difficulty-Balanced loss, or CDB loss, which dynamically distributes weights to each sample according to the difficulty of the class that the sample belongs to. Note that the assigned weights change dynamically, as the difficulty for the model may change as learning progresses. Extensive experiments are conducted on both image (artificially induced class-imbalanced MNIST, long-tailed CIFAR and ImageNet-LT) and video (EGTEA) datasets. The results show that CDB loss consistently outperforms recently proposed loss functions on class-imbalanced datasets irrespective of the data type (i.e., video or image).
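The abstract does not state how class difficulty is measured, so the sketch below makes two labeled assumptions: difficulty is taken as one minus the class-wise accuracy on held-out data, and the weight is that difficulty raised to a hypothetical exponent `tau`. It is meant only to show the shape of a difficulty-weighted cross-entropy, not the paper's exact formulation.

```python
import math


def class_difficulty(correct, total):
    """Per-class difficulty, assumed here to be 1 - class-wise accuracy."""
    return [1.0 - c / t for c, t in zip(correct, total)]


def cdb_weights(difficulties, tau=1.0):
    """Per-class weights; tau is a hypothetical focusing exponent."""
    return [d ** tau for d in difficulties]


def weighted_cross_entropy(probs, labels, weights):
    """Mean cross-entropy where each sample is scaled by its class weight.

    probs:  per-sample lists of predicted class probabilities
    labels: per-sample integer class indices
    """
    losses = [-weights[y] * math.log(p[y]) for p, y in zip(probs, labels)]
    return sum(losses) / len(losses)


# Hypothetical validation counts: class 0 is easy (90/100 correct),
# class 1 is hard (5/10 correct), so class 1 gets the larger weight.
difficulties = class_difficulty(correct=[90, 5], total=[100, 10])
weights = cdb_weights(difficulties, tau=1.0)
loss = weighted_cross_entropy([[0.5, 0.5]], [1], weights)
```

Recomputing `difficulties` periodically during training (for example once per epoch) would make the weights follow the model's current behaviour, in line with the dynamic weighting the abstract describes.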
The Kalman filter is the most powerful tool for estimating the states of a linear Gaussian system. In addition, with this method, an expectation-maximization algorithm can be used to estimate the parameters of the model. However, this algorithm cannot function in real time. Thus, we propose a new method that can estimate the transition matrices and the states of the system in real time. The proposed method uses three ideas: estimation in an observation space, a time-invariant interval, and an online learning framework. Applied to a damped oscillation model, the method achieved extraordinary performance in estimating the matrices. In addition, by introducing localization and spatial uniformity into the proposed method, we have demonstrated that noise can be reduced in high-dimensional spatio-temporal data. Moreover, the proposed method has potential for use in areas such as weather forecasting and vector field analysis.
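For reference, the standard Kalman recursion that the abstract builds on can be sketched for a scalar linear Gaussian system as follows. This is the textbook filter with a known, fixed transition coefficient, not the proposed real-time method for estimating transition matrices; the parameter names and default values are assumptions.

```python
def kalman_step(x, p, z, a=1.0, h=1.0, q=1e-4, r=0.1):
    """One predict/update cycle of a scalar Kalman filter.

    x, p: previous state estimate and its variance
    z:    new observation, assumed to follow z = h * x + N(0, r)
    a:    state transition coefficient; q: process noise variance
    """
    # Predict: propagate the estimate and its uncertainty through the model.
    x_pred = a * x
    p_pred = a * p * a + q
    # Update: blend prediction and observation using the Kalman gain.
    gain = p_pred * h / (h * p_pred * h + r)
    x_new = x_pred + gain * (z - h * x_pred)
    p_new = (1.0 - gain * h) * p_pred
    return x_new, p_new


def kalman_filter(observations, x0=0.0, p0=1.0, **params):
    """Run the filter over a sequence and return the state estimates."""
    x, p = x0, p0
    estimates = []
    for z in observations:
        x, p = kalman_step(x, p, z, **params)
        estimates.append(x)
    return estimates


# Hypothetical usage: repeated observations of a constant signal.
estimates = kalman_filter([1.0] * 50)
```

Here `a` is supplied by the caller. In the setting the abstract describes, the transition matrices are themselves unknown and must be estimated online, which this sketch does not attempt.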
Owing to their high photon detection efficiency, compactness, and low operating voltage, silicon photomultipliers (SiPMs) have found widespread application in many fields, including medical imaging, particle physics, and high-energy astrophysics. However, the so-called optical crosstalk (OCT) phenomenon of SiPMs is a major drawback to their adoption. Secondary infrared photons are emitted spontaneously inside the silicon substrate after the avalanche process caused by the primary incident photons, and they can be detected by the surrounding photodiodes. As a result, large output pulses equivalent to multiple photoelectrons are observed with a certain probability (the OCT rate), even for single-photon events, which worsens the charge resolution and increases the rate of accidental triggers by single-photon events in applications such as atmospheric Cherenkov telescopes. In our previous study, we found that the OCT rates of single-channel SiPMs were dependent on the thickness of their protection resin window, which may be explained by photon propagation inside the resin. In the present study, we measured the OCT rate of a multichannel SiPM and those of neighboring channels caused by photon propagation. Both OCT rates were found to be dependent on the protection-window thickness. We report our OCT measurements of a multichannel SiPM and comparisons with a ray-tracing simulation.