High-dimensional Convolutional Networks for Geometric Pattern Recognition

216 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Christopher Choy

تاريخ النشر 2020

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Christopher Choy - Junha Lee - Rene Ranftl

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Many problems in science and engineering can be formulated in terms of geometric patterns in high-dimensional spaces. We present high-dimensional convolutional networks (ConvNets) for pattern recognition problems that arise in the context of geometric registration. We first study the effectiveness of convolutional networks in detecting linear subspaces in high-dimensional spaces with up to 32 dimensions: much higher dimensionality than prior applications of ConvNets. We then apply high-dimensional ConvNets to 3D registration under rigid motions and image correspondence estimation. Experiments indicate that our high-dimensional ConvNets outperform prior approaches that relied on deep networks based on global pooling operators.

قيم البحث

159 - Yi Zhu , Zhenzhong Lan , Shawn Newsam 2017

Analyzing videos of human actions involves understanding the temporal relationships among video frames. State-of-the-art action recognition approaches rely on traditional optical flow estimation methods to pre-compute motion information for CNNs. Suc h a two-stage approach is computationally expensive, storage demanding, and not end-to-end trainable. In this paper, we present a novel CNN architecture that implicitly captures motion information between adjacent frames. We name our approach hidden two-stream CNNs because it only takes raw video frames as input and directly predicts action classes without explicitly computing optical flow. Our end-to-end approach is 10x faster than its two-stage baseline. Experimental results on four challenging action recognition datasets: UCF101, HMDB51, THUMOS14 and ActivityNet v1.2 show that our approach significantly outperforms the previous best real-time approaches.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي الوسائط المتعددة

Post-OCR Paragraph Recognition by Graph Convolutional Networks

137 - Renshen Wang , Yasuhisa Fujii , Ashok C. Popat 2021

Paragraphs are an important class of document entities. We propose a new approach for paragraph identification by spatial graph convolutional neural networks (GCN) applied on OCR text boxes. Two steps, namely line splitting and line clustering, are p erformed to extract paragraphs from the lines in OCR results. Each step uses a beta-skeleton graph constructed from bounding boxes, where the graph edges provide efficient support for graph convolution operations. With only pure layout input features, the GCN model size is 3~4 orders of magnitude smaller compared to R-CNN based models, while achieving comparable or better accuracies on PubLayNet and other datasets. Furthermore, the GCN models show good generalization from synthetic training data to real-world images, and good adaptivity for variable document styles.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي

Convolutional Kernel Networks

525 - Julien Mairal 2014

An important goal in visual recognition is to devise image representations that are invariant to particular transformations. In this paper, we address this goal with a new type of convolutional neural network (CNN) whose invariance is encoded by a re producing kernel. Unlike traditional approaches where neural networks are learned either to represent data or for solving a classification task, our network learns to approximate the kernel feature map on training data. Such an approach enjoys several benefits over classical ones. First, by teaching CNNs to be invariant, we obtain simple network architectures that achieve a similar accuracy to more complex ones, while being easy to train and robust to overfitting. Second, we bridge a gap between the neural network literature and kernels, which are natural tools to model invariance. We evaluate our methodology on visual recognition tasks where CNNs have proven to perform well, e.g., digit recognition with the MNIST dataset, and the more challenging CIFAR-10 and STL-10 datasets, where our accuracy is competitive with the state of the art.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي التعلم الالي

Fully Convolutional Networks for Handwriting Recognition

103 - Felipe Petroski Such , Dheeraj Peri , Frank Brockler 2019

Handwritten text recognition is challenging because of the virtually infinite ways a human can write the same message. Our fully convolutional handwriting model takes in a handwriting sample of unknown length and outputs an arbitrary stream of symbol s. Our dual stream architecture uses both local and global context and mitigates the need for heavy preprocessing steps such as symbol alignment correction as well as complex post processing steps such as connectionist temporal classification, dictionary matching or language models. Using over 100 unique symbols, our model is agnostic to Latin-based languages, and is shown to be quite competitive with state of the art dictionary based methods on the popular IAM and RIMES datasets. When a dictionary is known, we further allow a probabilistic character error rate to correct errant word blocks. Finally, we introduce an attention based mechanism which can automatically target variants of handwriting, such as slant, stroke width, or noise.

الرؤية الحاسوبية وتمييز الأنماط

Orientation Convolutional Networks for Image Recognition

114 - Yalan Qin , Guorui Feng , Hanzhou Wu 2021

Deep Convolutional Neural Networks (DCNNs) are capable of obtaining powerful image representations, which have attracted great attentions in image recognition. However, they are limited in modeling orientation transformation by the internal mechanism . In this paper, we develop Orientation Convolution Networks (OCNs) for image recognition based on the proposed Landmark Gabor Filters (LGFs) that the robustness of the learned representation against changed of orientation can be enhanced. By modulating the convolutional filter with LGFs, OCNs can be compatible with any existing deep learning networks. LGFs act as a Gabor filter bank achieved by selecting $ p $ $ left( ll nright) $ representative Gabor filters as andmarks and express the original Gabor filters as sparse linear combinations of these landmarks. Specifically, based on a matrix factorization framework, a flexible integration for the local and the global structure of original Gabor filters by sparsity and low-rank constraints is utilized. With the propogation of the low-rank structure, the corresponding sparsity for representation of original Gabor filter bank can be significantly promoted. Experimental results over several benchmarks demonstrate that our method is less sensitive to the orientation and produce higher performance both in accuracy and cost, compared with the existing state-of-art methods. Besides, our OCNs have few parameters to learn and can significantly reduce the complexity of training network.

الرؤية الحاسوبية وتمييز الأنماط