Activity detection from first-person videos (FPV) captured using a wearable camera is an active research field with potential applications in many sectors, including healthcare, law enforcement, and rehabilitation. State-of-the-art methods use optical-flow-based hybrid techniques that rely on features derived from the motion of objects across consecutive frames. In this work, we develop a two-stream network, \emph{SegCodeNet}, that uses a network branch containing video streams with color-coded semantic segmentation masks of relevant objects in addition to the original RGB video stream. We also include a stream-wise attention gating that prioritizes between the two streams and a frame-wise attention module that prioritizes the video frames containing relevant features. Experiments are conducted on an FPV dataset containing $18$ activity classes in office environments. Compared to a single-stream network, the proposed two-stream method achieves absolute improvements of $14.366\%$ and $10.324\%$ in averaged F1 score and accuracy, respectively, when results are averaged over three frame sizes: $224\times224$, $112\times112$, and $64\times64$. The proposed method provides significant performance gains for lower-resolution images, with absolute improvements of $17\%$ and $26\%$ in F1 score for input dimensions of $112\times112$ and $64\times64$, respectively. The best performance is achieved for a frame size of $224\times224$, yielding an F1 score and accuracy of $90.176\%$ and $90.799\%$, which outperforms the state-of-the-art Inflated 3D ConvNet (I3D) \cite{carreira2017quo} method by absolute margins of $4.529\%$ and $2.419\%$, respectively.
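A minimal sketch of the two-stream fusion with frame-wise attention and stream-wise gating described above, assuming a PyTorch setting. The encoders, feature sizes, and class names (`TwoStreamFusion`, etc.) are illustrative placeholders, not the authors' architecture.

```python
import torch
import torch.nn as nn

class TwoStreamFusion(nn.Module):
    def __init__(self, feat_dim=512, num_classes=18):
        super().__init__()
        # Placeholder per-frame encoders; in the paper each branch is a
        # full video CNN over RGB frames / color-coded segmentation masks.
        self.rgb_encoder = nn.Linear(3 * 64 * 64, feat_dim)
        self.mask_encoder = nn.Linear(3 * 64 * 64, feat_dim)
        # Frame-wise attention: one score per frame, softmax over time.
        self.frame_attn = nn.Linear(feat_dim, 1)
        # Stream-wise gate: a weight per stream from the pooled features.
        self.stream_gate = nn.Linear(2 * feat_dim, 2)
        self.classifier = nn.Linear(feat_dim, num_classes)

    def _encode(self, frames, encoder):
        # frames: (batch, time, C*H*W) flattened per-frame pixels.
        feats = encoder(frames)                  # (B, T, D)
        scores = self.frame_attn(feats)          # (B, T, 1)
        weights = torch.softmax(scores, dim=1)   # attend over frames
        return (weights * feats).sum(dim=1)      # (B, D) pooled clip feature

    def forward(self, rgb, mask):
        f_rgb = self._encode(rgb, self.rgb_encoder)
        f_mask = self._encode(mask, self.mask_encoder)
        # Stream-wise attention gating prioritizes between the two streams.
        gate = torch.softmax(
            self.stream_gate(torch.cat([f_rgb, f_mask], dim=-1)), dim=-1)
        fused = gate[:, :1] * f_rgb + gate[:, 1:] * f_mask
        return self.classifier(fused)

# Example: batch of 2 clips, 8 frames of 64x64 RGB, flattened per frame.
model = TwoStreamFusion()
rgb = torch.randn(2, 8, 3 * 64 * 64)
mask = torch.randn(2, 8, 3 * 64 * 64)
logits = model(rgb, mask)  # (2, 18) class scores
```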
Human Activity Recognition from body-worn sensor data poses an inherent challenge in capturing the spatial and temporal dependencies of time-series signals. In this regard, the existing recurrent, convolutional, or hybrid models for activity recognition
The capture of scintillation light emitted by liquid argon and xenon under molecular excitations by charged particles is still a challenging task. Here we present a first attempt to design a device able to collect sufficiently high luminosity in order to
FlatCam is a thin form-factor lensless camera that consists of a coded mask placed on top of a bare, conventional sensor array. Unlike a traditional, lens-based camera, where an image of the scene is directly recorded on the sensor pixels, each pixel in a FlatCam records a linear combination of light from across the scene.
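This linear measurement process lends itself to a small worked example: generically, the sensor reading is $y = Ax$ for scene intensities $x$, and the scene is recovered by inverting the linear system. The sketch below uses a random multiplexing matrix and plain least squares purely for illustration; FlatCam's actual calibrated, separable mask model differs.

```python
import numpy as np

rng = np.random.default_rng(0)
scene = rng.random(16 * 16)            # toy 16x16 scene, flattened to x
A = rng.random((400, scene.size))      # multiplexing matrix from the mask (assumed)
y = A @ scene                          # each sensor pixel mixes the whole scene

# Recover the scene from the multiplexed measurements by least squares.
x_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.allclose(x_hat, scene, atol=1e-6))  # True: exact in the noiseless case
```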
Seamlessly blending features from multiple images is extremely challenging because of complex relationships in lighting, geometry, and partial occlusion, which cause coupling between different parts of the image. Even though recent work on GANs enables
Event cameras are biologically inspired sensors that gather the temporal evolution of the scene. They capture pixel-wise brightness variations and output a corresponding stream of asynchronous events. Despite having multiple advantages with respect to
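The event-generation principle stated here can be illustrated with the standard contrast-threshold model: a pixel fires an ON or OFF event whenever its log-brightness drifts by a fixed threshold since its last event. The threshold value and the toy signal in the sketch below are assumptions for illustration only.

```python
import numpy as np

C = 0.2                                        # contrast threshold (assumed)
t = np.linspace(0.0, 1.0, 1000)
log_I = np.log(1.5 + np.sin(2 * np.pi * t))    # toy log-brightness at one pixel

events = []                                    # asynchronous (timestamp, polarity) stream
ref = log_I[0]                                 # log-brightness at the last event
for ti, li in zip(t, log_I):
    while abs(li - ref) >= C:                  # may fire several events per step
        polarity = 1 if li > ref else -1       # ON (+1) or OFF (-1) event
        events.append((ti, polarity))
        ref += polarity * C                    # reset reference after each event
print(f"{len(events)} asynchronous events from one pixel")
```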