Given a sequence of possibly sparse and noisy GPS traces and a map of the road network, map matching algorithms can infer the most accurate trajectory on the road network. However, if the road network is wrong (for example, due to missing or incorrectly mapped roads, missing parking lots, misdirected turn restrictions or misdirected one-way streets), standard map matching algorithms fail to reconstruct the correct trajectory. In this paper, an algorithm for tracking vehicles that can move both on and off the known road network is formulated. It efficiently unifies existing hidden Markov model (HMM) approaches for map matching and standard free-space tracking methods (e.g. Kalman smoothing) in a principled way. The algorithm is a form of interacting multiple model (IMM) filter subject to an additional assumption on the type of model interaction permitted, termed here a semi-interacting multiple model (sIMM) filter. A forward filter (suitable for real-time tracking) and a backward MAP sampling step (suitable for MAP trajectory inference and map matching) are described. The framework set out here is agnostic to the specific tracking models used, and makes clear how to replace these components with others of a similar type. In addition to avoiding misleading map matching trajectories, this algorithm can be applied to learn map features by detecting unmapped or incorrectly mapped roads and parking lots, as well as incorrectly mapped turn restrictions and road directions.
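To make the on-road versus off-road idea concrete, the following is a minimal sketch, not the sIMM filter itself: a plain discrete HMM decoded with the Viterbi algorithm, in which two hypothetical road segments are joined by one extra "off-road" state with a broad emission density. The distances, noise scales, transition probabilities and the names `log_gauss` and `viterbi` are all assumptions made for illustration; the abstract's method additionally carries a continuous free-space tracker (e.g. a Kalman smoother), which is omitted here.

```python
import math

def log_gauss(d, sigma):
    # Log density of an isotropic 2-D Gaussian GPS error at distance d,
    # dropping the constant -log(2*pi) that is shared by every state.
    return -0.5 * (d / sigma) ** 2 - 2.0 * math.log(sigma)

def viterbi(log_emis, log_trans, log_prior):
    """Return the most likely state sequence of a discrete HMM.

    log_emis[t][s]   : log-likelihood of observation t under state s
    log_trans[s][s2] : log-probability of moving from state s to s2
    log_prior[s]     : log-probability of starting in state s
    """
    n_states = len(log_prior)
    score = [log_prior[s] + log_emis[0][s] for s in range(n_states)]
    back = []
    for t in range(1, len(log_emis)):
        new_score, ptr = [], []
        for s2 in range(n_states):
            best = max(range(n_states), key=lambda s: score[s] + log_trans[s][s2])
            ptr.append(best)
            new_score.append(score[best] + log_trans[best][s2] + log_emis[t][s2])
        score = new_score
        back.append(ptr)
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]

# Hypothetical distances (metres) from each GPS fix to road A and road B.
dists = [(5, 80), (8, 70), (60, 65), (70, 60), (65, 6), (70, 4)]
ROAD_SIGMA, OFFROAD_SIGMA = 10.0, 50.0  # assumed noise scales

# States: 0 = on road A, 1 = on road B, 2 = off the mapped network.
log_emis = [
    [log_gauss(a, ROAD_SIGMA), log_gauss(b, ROAD_SIGMA), log_gauss(0.0, OFFROAD_SIGMA)]
    for a, b in dists
]
stay, switch = math.log(0.8), math.log(0.1)
log_trans = [[stay if i == j else switch for j in range(3)] for i in range(3)]
log_prior = [math.log(1.0 / 3.0)] * 3

path = viterbi(log_emis, log_trans, log_prior)
print(path)  # [0, 0, 2, 2, 1, 1]
```

In this toy trace the two middle fixes lie far from both roads, so the decoder routes them through the off-road state instead of forcing them onto a mapped segment, which is the failure mode of standard map matching that the abstract describes.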
Prior gradient-based attribution-map methods rely on handcrafted propagation rules for the non-linear/activation layers during the backward pass, so as to produce gradients of the input and then the attribution map. Despite the promising results achieved, such methods are sensitive to non-informative high-frequency components and lack adaptability to different models and samples. In this paper, we propose a dedicated method for generating attribution maps that learns the propagation rules automatically, overcoming the flaws of the handcrafted ones. Specifically, we introduce a learnable plugin module, which enables adaptive propagation rules for each pixel, into the non-linear layers during the backward pass for mask generation. The masked input image is then fed into the model again to obtain a new output that can be used as guidance when combined with the original one. The introduced learnable module can be trained under any auto-grad framework with higher-order differential support. As demonstrated on five datasets and six network architectures, the proposed method yields state-of-the-art results and gives cleaner and more visually plausible attribution maps.
Convolutional neural networks (CNNs) achieve state-of-the-art accuracy in a variety of tasks in computer vision and beyond. One of the major obstacles hindering the ubiquitous use of CNNs for inference on low-power edge devices is their high computational complexity and memory bandwidth requirements. The latter often dominates the energy footprint on modern hardware. In this paper, we introduce a lossy transform coding approach, inspired by image and video compression, designed to reduce the memory bandwidth due to the storage of intermediate activation calculation results. Our method does not require fine-tuning the network weights and halves the data transfer volumes to the main memory by compressing feature maps, which are highly correlated, with variable length coding. Our method outperforms previous approaches in terms of the number of bits per value, with minor accuracy degradation on ResNet-34 and MobileNetV2. We analyze the performance of our approach on a variety of CNN architectures and demonstrate that an FPGA implementation of ResNet-18 with our approach results in a reduction of around 40% in the memory energy footprint, compared to a quantized network, with negligible impact on accuracy. When accuracy degradation of up to 2% is allowed, a reduction of 60% is achieved. A reference implementation is available at https://github.com/CompressTeam/TransformCodingInference
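The general idea of transform coding for correlated feature maps can be sketched as follows. This is an illustrative toy under stated assumptions, not the reference implementation linked above: the activations are synthetic, the transform is assumed to be a PCA basis computed across channels, the quantizer is a uniform rounding with a hypothetical step size `step`, and the zeroth-order entropy stands in for the bit rate an ideal variable length code would reach.

```python
import numpy as np

def entropy_bits(symbols):
    # Zeroth-order entropy in bits per value: a proxy for the average
    # code length of an ideal variable length code over these symbols.
    _, counts = np.unique(symbols, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

rng = np.random.default_rng(0)

# Synthetic "feature map": 16 channels x 4096 positions, driven by only
# two latent sources so that the channels are highly correlated.
latent = rng.normal(size=(2, 4096))
mixing = rng.normal(size=(16, 2))
acts = mixing @ latent + 0.05 * rng.normal(size=(16, 4096))

step = 0.25  # assumed uniform quantization step

# Baseline: quantize the activations directly.
direct = np.round(acts / step).astype(int)
bits_direct = entropy_bits(direct)

# Transform coding: decorrelate channels with an orthonormal PCA basis,
# then quantize the coefficients.
mean = acts.mean(axis=1, keepdims=True)
_, basis = np.linalg.eigh(np.cov(acts))  # columns are orthonormal eigenvectors
coeffs = basis.T @ (acts - mean)
coded = np.round(coeffs / step).astype(int)
bits_coded = entropy_bits(coded)

# Decoder side: dequantize and invert the transform.
recon = basis @ (coded * step) + mean
mse = float(np.mean((recon - acts) ** 2))

print(f"bits/value direct: {bits_direct:.2f}, transform coded: {bits_coded:.2f}")
print(f"reconstruction MSE: {mse:.5f}")
```

Because the basis is orthonormal, the quantization error is not amplified by the inverse transform, while most coefficients collapse to zero and become cheap to encode. How large the saving is on real networks depends on the actual activation statistics, which this toy does not model.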
We discuss a diffusion-based implementation of the self-organizing map on the unit hypersphere. We show that this approach can be efficiently implemented using just linear algebra methods, give a Python NumPy implementation, and illustrate the approach using the well-known MNIST dataset.
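As a rough illustration of how a self-organizing map on the unit hypersphere reduces to linear algebra, here is a minimal batch-SOM sketch. It is not the authors' implementation: the function name `spherical_som`, the 1-D node grid, the Gaussian smoothing kernel (used here as a stand-in for a diffusion step over the grid) and all parameter values are assumptions, and random data replaces MNIST.

```python
import numpy as np

def spherical_som(data, n_nodes=10, n_iter=20, sigma=1.5, seed=0):
    """Batch SOM with unit-norm codebook vectors on a 1-D grid (sketch)."""
    rng = np.random.default_rng(seed)
    # Project the data onto the unit hypersphere.
    x = data / np.linalg.norm(data, axis=1, keepdims=True)
    # Random unit-norm initial codebook.
    w = rng.normal(size=(n_nodes, x.shape[1]))
    w /= np.linalg.norm(w, axis=1, keepdims=True)
    # Smoothing kernel over grid positions; it spreads each node's
    # data to its neighbours, loosely like a diffusion step.
    idx = np.arange(n_nodes)
    kernel = np.exp(-0.5 * ((idx[:, None] - idx[None, :]) / sigma) ** 2)
    for _ in range(n_iter):
        # Best matching unit = largest cosine similarity (dot product).
        bmu = np.argmax(x @ w.T, axis=1)
        onehot = np.zeros((x.shape[0], n_nodes))
        onehot[np.arange(x.shape[0]), bmu] = 1.0
        sums = onehot.T @ x       # per-node sum of assigned samples
        w_new = kernel @ sums     # smooth across neighbouring nodes
        norms = np.linalg.norm(w_new, axis=1, keepdims=True)
        # Renormalise back onto the sphere; keep the old vector if degenerate.
        w = np.where(norms > 0, w_new / np.maximum(norms, 1e-12), w)
    return w

samples = np.random.default_rng(1).normal(size=(500, 5))
codebook = spherical_som(samples)
print(codebook.shape)
```

Every step is a matrix product, an argmax or a normalisation, so the whole loop vectorises cleanly in NumPy. On the sphere, the dot product replaces the Euclidean distance as the matching criterion.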
We prove a claim by Williams that the coassembly map is a homotopy limit map. As an application, we show that the homotopy limit map for the coarse version of equivariant $A$-theory agrees with the coassembly map for bivariant $A$-theory that appears in the statement of the topological Riemann-Roch theorem.
Most existing Multi-Object Tracking (MOT) approaches follow the Tracking-by-Detection paradigm and the data association framework, in which objects are first detected and then associated. Although deep-learning-based methods can noticeably improve object detection performance and also provide good appearance features for cross-frame association, the framework is not completely end-to-end, so the computational cost is high while the performance is limited. To address this problem, we present a completely end-to-end approach that takes an image sequence or video as input and directly outputs the located and tracked objects of learned types. Specifically, with our multi-object representation strategy, a global response map can be accurately generated over frames, from which the trajectory of each tracked object can be easily picked up, just as a detector takes an image as input and outputs the bounding boxes of each detected object. The proposed model is fast and accurate. Experimental results on the MOT16 and MOT17 benchmarks show that our proposed online tracker achieves state-of-the-art performance on several tracking metrics.