
Human Action Recognition from Video Data


Publication date: 2017
Language of research: Arabic





In this work, our goal is to recognize human actions from video data. We first give an overview of human action recognition, covering well-known methods and earlier algorithms, and then propose an algorithm of our own and describe its implementation in MATLAB.
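The abstract does not spell out the proposed algorithm, so purely as an illustration, here is a minimal Python sketch of one classic baseline from this literature: a motion history image (MHI) descriptor matched against labelled templates. The function names, the motion threshold, and the temporal window `tau` are all assumptions made for this sketch, not the paper's method (which the authors implement in MATLAB).

```python
import numpy as np

def motion_history_image(frames, tau=20, threshold=30):
    """Build a motion history image from a list of HxW uint8 grayscale frames.

    Pixels that just moved get the value tau; older motion fades linearly,
    so the image encodes both where and how recently motion occurred.
    """
    mhi = np.zeros(frames[0].shape, dtype=np.float32)
    for prev, curr in zip(frames[:-1], frames[1:]):
        moving = np.abs(curr.astype(np.int16) - prev.astype(np.int16)) > threshold
        mhi = np.where(moving, float(tau), np.maximum(mhi - 1.0, 0.0))
    return mhi / tau  # normalized to [0, 1]

def classify_nearest(mhi, templates):
    """Nearest-neighbour match of a flattened MHI against {label: template MHI}."""
    x = mhi.ravel()
    return min(templates, key=lambda label: np.linalg.norm(x - templates[label].ravel()))
```

A full pipeline would add person segmentation and a trained classifier in place of the nearest-neighbour match, but the temporal-template idea is the same.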

Related research

Current work in named entity recognition (NER) shows that data augmentation techniques can produce more robust models. However, most existing techniques focus on augmenting in-domain data in low-resource scenarios where annotated data is quite limited. In this work, we take the opposite research direction and study cross-domain data augmentation for the NER task. We investigate the possibility of leveraging data from high-resource domains by projecting it into the low-resource domains. Specifically, we propose a novel neural architecture that transforms the data representation from a high-resource to a low-resource domain by learning the patterns (e.g., style, noise, abbreviations) in the text that differentiate them, together with a shared feature space where both domains are aligned. We experiment with diverse datasets and show that transforming the data to the low-resource domain representation achieves significant improvements over using data from high-resource domains alone.
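As a loose illustration of the transformation idea described above (not the authors' architecture, which the abstract does not detail), the sketch below encodes sentences from both domains into a shared feature space and decodes them in the low-resource domain's style; the module choices and sizes are assumptions.

```python
import torch
import torch.nn as nn

class DomainTransformer(nn.Module):
    """Toy sequence-to-sequence mapper from a high-resource domain's text
    representation toward a low-resource domain's style."""

    def __init__(self, vocab_size=10_000, d_model=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.encoder = nn.GRU(d_model, d_model, batch_first=True)  # shared feature space
        self.decoder = nn.GRU(d_model, d_model, batch_first=True)  # target-domain style
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, src_tokens):
        shared, _ = self.encoder(self.embed(src_tokens))
        styled, _ = self.decoder(shared)
        return self.out(styled)  # logits over target-domain tokens

model = DomainTransformer()
logits = model(torch.randint(0, 10_000, (2, 12)))  # (batch=2, seq=12, vocab)
```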
We investigate video-aided grammar induction, which learns a constituency parser from both unlabeled text and its corresponding video. Existing methods of multi-modal grammar induction focus on grammar induction from text-image pairs, with promising results showing that the information from static images is useful in induction. However, videos provide even richer information, including not only static objects but also actions and state changes useful for inducing verb phrases. In this paper, we explore rich features (e.g. action, object, scene, audio, face, OCR and speech) from videos, taking the recent Compound PCFG model as the baseline. We further propose a Multi-Modal Compound PCFG model (MMC-PCFG) to effectively aggregate these rich features from different modalities. Our proposed MMC-PCFG is trained end-to-end and outperforms each individual modality and previous state-of-the-art systems on three benchmarks, i.e. DiDeMo, YouCook2 and MSRVTT, confirming the effectiveness of leveraging video information for unsupervised grammar induction.
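The abstract does not give the aggregation details, so the sketch below shows only the generic step such a model needs: projecting per-modality video features into one space and combining them with learned weights. The modality names and dimensions are made up for illustration; this is not the MMC-PCFG code.

```python
import torch
import torch.nn as nn

class MultiModalAggregator(nn.Module):
    """Combine per-modality video features into one vector via learned gates."""

    def __init__(self, modality_dims, d_out=128):
        super().__init__()
        self.proj = nn.ModuleDict({m: nn.Linear(d, d_out) for m, d in modality_dims.items()})
        self.gate = nn.ModuleDict({m: nn.Linear(d, 1) for m, d in modality_dims.items()})

    def forward(self, feats):  # feats: {modality: (batch, dim)}
        weights = torch.cat([self.gate[m](x) for m, x in feats.items()], dim=-1).softmax(dim=-1)
        projected = torch.stack([self.proj[m](x) for m, x in feats.items()], dim=1)
        return (weights.unsqueeze(-1) * projected).sum(dim=1)  # (batch, d_out)

agg = MultiModalAggregator({"action": 512, "object": 2048, "audio": 128})
video_vec = agg({"action": torch.randn(4, 512),
                 "object": torch.randn(4, 2048),
                 "audio": torch.randn(4, 128)})  # (4, 128)
```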
The aim of this research is to build a system for classifying spoken English digits, relying on hidden Markov models for the classification and on the signal spectrum for extracting the signals' features.
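A minimal sketch of this kind of per-digit HMM classifier, assuming the hmmlearn library and a plain magnitude-spectrogram front end; the paper's actual feature extraction and model sizes are not specified in the abstract, so everything below is illustrative.

```python
import numpy as np
from hmmlearn import hmm  # pip install hmmlearn

def spectral_frames(signal, frame_len=256, hop=128):
    """Slice a 1-D waveform into frames and keep low-frequency magnitude spectra."""
    frames = [signal[i:i + frame_len] for i in range(0, len(signal) - frame_len, hop)]
    return np.array([np.abs(np.fft.rfft(f))[:32] for f in frames])  # (T, 32)

def train_digit_models(training_data, n_states=5):
    """training_data: {digit: [waveform arrays]} -> {digit: fitted GaussianHMM}."""
    models = {}
    for digit, signals in training_data.items():
        feats = [spectral_frames(s) for s in signals]
        X, lengths = np.vstack(feats), [len(f) for f in feats]
        m = hmm.GaussianHMM(n_components=n_states, covariance_type="diag")
        m.fit(X, lengths)
        models[digit] = m
    return models

def classify(signal, models):
    """Pick the digit whose HMM assigns the utterance the highest log-likelihood."""
    X = spectral_frames(signal)
    return max(models, key=lambda d: models[d].score(X))
```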
Temporal language grounding in videos aims to localize the temporal span relevant to a given query sentence. Previous methods treat it either as a boundary regression task or as a span extraction task. This paper formulates temporal language grounding as video reading comprehension and proposes a Relation-aware Network (RaNet) to address it. This framework selects a video moment choice from a predefined answer set with the aid of coarse-and-fine choice-query interaction and choice-choice relation construction. A choice-query interactor is proposed to match the visual and textual information simultaneously at the sentence-moment and token-moment levels, leading to coarse-and-fine cross-modal interaction. Moreover, a novel multi-choice relation constructor is introduced, leveraging graph convolution to capture the dependencies among video moment choices for the best choice selection. Extensive experiments on ActivityNet-Captions, TACoS, and Charades-STA demonstrate the effectiveness of our solution. Code will be available at https://github.com/Huntersxsx/RaNet.
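The released code lives at the URL above; purely to illustrate the choice-choice relation idea, the sketch below runs one graph-convolution-style update over moment-choice features before scoring each choice. The dimensions and the adjacency matrix are placeholders, not the RaNet implementation.

```python
import torch
import torch.nn as nn

class ChoiceRelationLayer(nn.Module):
    """One relational update: each moment choice aggregates related choices, then is scored."""

    def __init__(self, d=256):
        super().__init__()
        self.proj = nn.Linear(d, d)
        self.score = nn.Linear(d, 1)

    def forward(self, choices, adj):
        # choices: (batch, n_choices, d); adj: (batch, n, n), row-normalized
        related = torch.bmm(adj, choices)                   # neighbor aggregation
        updated = torch.relu(self.proj(related)) + choices  # residual update
        return self.score(updated).squeeze(-1)              # relevance score per choice

layer = ChoiceRelationLayer()
adj = torch.softmax(torch.randn(2, 16, 16), dim=-1)  # dummy relation graph
scores = layer(torch.randn(2, 16, 256), adj)         # (2, 16)
```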
Meta-learning has recently been proposed to learn models and algorithms that can generalize from a handful of examples. However, applications to structured prediction and textual tasks pose challenges for meta-learning algorithms. In this paper, we apply two meta-learning algorithms, Prototypical Networks and Reptile, to few-shot Named Entity Recognition (NER), including a method for incorporating language model pre-training and Conditional Random Fields (CRF). We propose a task generation scheme for converting classical NER datasets into the few-shot setting, for both training and evaluation. Using three public datasets, we show that these meta-learning algorithms outperform a reasonable fine-tuned BERT baseline. In addition, we propose a novel combination of Prototypical Networks and Reptile.
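For reference, here is the generic prototypical-network classification step (not the paper's CRF-augmented variant), assuming token embeddings already produced by some encoder such as a pre-trained language model:

```python
import torch

def prototypes(support_emb, support_labels, n_classes):
    """Mean support embedding per class: (n_support, d) -> (n_classes, d)."""
    return torch.stack([support_emb[support_labels == c].mean(dim=0)
                        for c in range(n_classes)])

def classify(query_emb, protos):
    """Assign each query to the class with the nearest prototype (squared Euclidean)."""
    dists = torch.cdist(query_emb, protos) ** 2  # (n_query, n_classes)
    return dists.argmin(dim=-1)

support = torch.randn(20, 64)
labels = torch.arange(20) % 4                  # every class has support examples
protos = prototypes(support, labels, n_classes=4)
pred = classify(torch.randn(5, 64), protos)    # (5,) predicted class indices
```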
