
A SURVEY STUDY ON INFORMATION EXTRACTION FROM TEXT

A Survey Study of Methods for Extracting Information from Text

 Publication date 2017
Research language: Arabic
 Created by Shamra Editor





Information extraction is the task of finding structured information from unstructured or semi-structured text. It is an important task in text mining and has been extensively studied in various research communities including natural language processing, information retrieval and Web mining. It has a wide range of applications in domains such as biomedical literature mining and business intelligence. Two fundamental tasks of information extraction are named entity recognition and relation extraction. The former refers to finding names of entities such as people, organizations and locations. The latter refers to finding the semantic relations between entities.
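To make the two tasks concrete, here is a minimal Python sketch that is not taken from the survey. It assumes a tiny hand-written gazetteer (`GAZETTEER`, hypothetical) in place of a learned named entity recognizer, and a single hypothetical pattern ("PERSON works for ORGANIZATION") in place of a learned relation extractor.

```python
import re

# Hypothetical gazetteer: a toy stand-in for a trained NER model.
GAZETTEER = {
    "Alice Smith": "PERSON",
    "Acme Corp": "ORGANIZATION",
    "Paris": "LOCATION",
}

def recognize_entities(text):
    """Return (start, end, surface form, type) for each gazetteer match, in text order."""
    found = []
    for name, etype in GAZETTEER.items():
        for match in re.finditer(re.escape(name), text):
            found.append((match.start(), match.end(), name, etype))
    return sorted(found)

def extract_relations(text):
    """Emit (person, 'works_for', organization) when 'works for' links adjacent entities."""
    entities = recognize_entities(text)
    relations = []
    for (_, end1, name1, type1), (start2, _, name2, type2) in zip(entities, entities[1:]):
        between = text[end1:start2].strip()
        if type1 == "PERSON" and type2 == "ORGANIZATION" and between == "works for":
            relations.append((name1, "works_for", name2))
    return relations

sentence = "Alice Smith works for Acme Corp in Paris."
entities = recognize_entities(sentence)    # named entity recognition
relations = extract_relations(sentence)    # relation extraction
```

The unstructured sentence goes in, and structured records (typed entity spans and a subject-relation-object triple) come out. Real systems replace both the gazetteer and the pattern with statistical models.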


Artificial intelligence review:
Research summary
This survey covers methods for extracting information from unstructured or semi-structured text, a core task in text mining and natural language processing. It focuses on two main tasks: named entity recognition and the extraction of semantic relations between those entities. Techniques such as hidden Markov models and conditional random fields are used to accomplish these tasks. The study also reviews applications of information extraction in domains such as biomedicine and business intelligence. Its methodology relies on follow-up studies to track the latest techniques and algorithms used in the field. The study also discusses the challenges of unsupervised information extraction and of open information extraction from large corpora such as the Web.
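The summary names hidden Markov models as one of the techniques used for entity recognition. As an illustration only, the sketch below tags a token sequence with standard Viterbi decoding over a toy HMM. The tag set and every probability in `START`, `TRANS`, and `EMIT` are hypothetical values set by hand; a real tagger would estimate them from annotated text.

```python
import math

STATES = ["O", "PER", "LOC"]  # outside, person, location (hypothetical tag set)

# Hypothetical hand-set probabilities, for illustration only.
START = {"O": 0.6, "PER": 0.3, "LOC": 0.1}
TRANS = {
    "O":   {"O": 0.7, "PER": 0.2, "LOC": 0.1},
    "PER": {"O": 0.6, "PER": 0.3, "LOC": 0.1},
    "LOC": {"O": 0.8, "PER": 0.1, "LOC": 0.1},
}
EMIT = {
    "O":   {"visited": 0.5, "alice": 0.01, "paris": 0.01},
    "PER": {"visited": 0.01, "alice": 0.8, "paris": 0.05},
    "LOC": {"visited": 0.01, "alice": 0.05, "paris": 0.8},
}
UNSEEN = 1e-6  # small emission probability for words missing from EMIT

def viterbi(tokens):
    """Return the most probable tag sequence for a non-empty token list."""
    # Each column maps state -> (best log-probability, previous state on that path).
    first = tokens[0]
    columns = [{
        s: (math.log(START[s]) + math.log(EMIT[s].get(first, UNSEEN)), None)
        for s in STATES
    }]
    for token in tokens[1:]:
        column = {}
        for s in STATES:
            prev, score = max(
                ((p, columns[-1][p][0] + math.log(TRANS[p][s])) for p in STATES),
                key=lambda pair: pair[1],
            )
            column[s] = (score + math.log(EMIT[s].get(token, UNSEEN)), prev)
        columns.append(column)
    # Backtrack from the best final state.
    state = max(STATES, key=lambda s: columns[-1][s][0])
    path = [state]
    for column in reversed(columns[1:]):
        state = column[state][1]
        path.append(state)
    return list(reversed(path))

tags = viterbi(["alice", "visited", "paris"])  # ['PER', 'O', 'LOC']
```

Working in log space avoids numeric underflow on long sentences. A conditional random field would replace the generative probabilities with feature-based scores while still decoding with a similar dynamic program.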
Critical review
The study is comprehensive and detailed in its treatment of information extraction from text, but it may be somewhat complex for non-specialist readers. Including practical examples and real-world applications would help illustrate the practical benefits of these techniques. The study could also be improved by comparing the various algorithms and techniques used and setting out the advantages and disadvantages of each. Covering practical applications in fields other than biomedicine and business intelligence would add further value.
Questions related to the research
  1. What are the two main tasks in information extraction from text?

    The two main tasks are named entity recognition and the extraction of semantic relations between those entities.

  2. What techniques are used in information extraction from text?

    The techniques used include hidden Markov models and conditional random fields.

  3. What practical applications of information extraction does the study mention?

    The applications include biomedical literature mining and business intelligence.

  4. What challenges are associated with unsupervised information extraction?

    The challenges include defining the structures of the information to be extracted and annotating documents according to those defined structures, both of which require human expertise and are time-consuming.



Read More

Relation extraction systems have made extensive use of features generated by linguistic analysis modules. Errors in these features lead to errors in relation detection and classification. In this work, we depart from these traditional approaches with complicated feature engineering by introducing a convolutional neural network for relation extraction that automatically learns features from sentences and minimizes the dependence on external toolkits and resources. Our model takes advantage of multiple window sizes for filters and pre-trained word embeddings as an initializer on a non-static architecture to improve performance.
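The idea of filters with multiple window sizes can be sketched in a few lines of plain Python. This is an illustrative assumption about the general technique (convolution over word embeddings followed by max-over-time pooling), not the authors' model: the embeddings and filter weights are random placeholders, the classification layer is omitted, and the sentence is assumed to be at least as long as the largest window, so no padding is applied.

```python
import random

def conv_max_pool(embeddings, filt):
    """Slide one filter (window x dim weights) over the sentence, apply ReLU, keep the max."""
    window = len(filt)
    activations = []
    for i in range(len(embeddings) - window + 1):
        score = sum(
            w * x
            for filt_row, emb_row in zip(filt, embeddings[i:i + window])
            for w, x in zip(filt_row, emb_row)
        )
        activations.append(max(0.0, score))  # ReLU
    return max(activations)                  # max-over-time pooling

def sentence_features(embeddings, filters_by_window):
    """Concatenate the pooled outputs of every filter, grouped by window size."""
    return [
        conv_max_pool(embeddings, filt)
        for window in sorted(filters_by_window)
        for filt in filters_by_window[window]
    ]

# Hypothetical random stand-ins for pre-trained embeddings and learned filters.
rng = random.Random(0)
dim, length = 4, 6
sentence = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(length)]
filters_by_window = {
    window: [
        [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(window)]
        for _ in range(2)  # two filters per window size
    ]
    for window in (2, 3, 4)
}
features = sentence_features(sentence, filters_by_window)  # one value per filter
```

Each window size captures n-grams of a different length, and pooling yields a fixed-size feature vector regardless of sentence length. In a "non-static" setup the embeddings themselves would also be updated during training, which this sketch does not show.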
We cast a suite of information extraction tasks into a text-to-triple translation framework. Instead of solving each task relying on task-specific datasets and models, we formalize the task as a translation between task-specific input text and output triples. By taking the task-specific input, we enable a task-agnostic translation by leveraging the latent knowledge that a pre-trained language model has about the task. We further demonstrate that a simple pre-training task of predicting which relational information corresponds to which input text is an effective way to produce task-specific outputs. This enables the zero-shot transfer of our framework to downstream tasks. We study the zero-shot performance of this framework on open information extraction (OIE2016, NYT, WEB, PENN), relation classification (FewRel and TACRED), and factual probe (Google-RE and T-REx). The model transfers non-trivially to most tasks and is often competitive with a fully supervised method without the need for any task-specific training. For instance, we significantly outperform the F1 score of the supervised open information extraction without needing to use its training set.
Information extraction and question answering have the potential to introduce a new paradigm for how machine learning is applied to criminal law. Existing approaches generally use tabular data for predictive metrics. An alternative approach is needed for matters of equitable justice, where individuals are judged on a case-by-case basis, in a process involving verbal or written discussion and interpretation of case factors. Such discussions are individualized, but they nonetheless rely on underlying facts. Information extraction can play an important role in surfacing these facts, which are still important to understand. We analyze unsupervised, weakly supervised, and pre-trained models' ability to extract such factual information from the free-form dialogue of California parole hearings. With a few exceptions, most F1 scores are below 0.85. We use this opportunity to highlight some opportunities for further research for information extraction and question answering. We encourage new developments in NLP to enable analysis and review of legal cases to be done in a post-hoc, not predictive, manner.
Recent information extraction approaches have relied on training deep neural models. However, such models can easily overfit noisy labels and suffer from performance degradation. While it is very costly to filter noisy labels in large learning resources, recent studies show that such labels take more training steps to be memorized and are more frequently forgotten than clean labels, and are therefore identifiable in training. Motivated by such properties, we propose a simple co-regularization framework for entity-centric information extraction, which consists of several neural models with identical structures but different parameter initialization. These models are jointly optimized with the task-specific losses and are regularized to generate similar predictions based on an agreement loss, which prevents overfitting on noisy labels. Extensive experiments on two widely used but noisy benchmarks for information extraction, TACRED and CoNLL03, demonstrate the effectiveness of our framework. We release our code to the community for future research.
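The abstract does not give the formula for the agreement loss, so the following is only one plausible form, stated as an assumption: the mean KL divergence from the models' average predicted distribution to each model's own distribution, added to the averaged cross-entropy task loss with a hypothetical weight `gamma`.

```python
import math

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two distributions over the same labels."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def agreement_loss(logits_per_model):
    """Assumed form: mean KL divergence from the average prediction to each model."""
    probs = [softmax(logits) for logits in logits_per_model]
    n = len(probs)
    average = [sum(column) / n for column in zip(*probs)]
    return sum(kl_divergence(average, p) for p in probs) / n

def joint_loss(logits_per_model, gold_index, gamma=1.0):
    """Average cross-entropy over the models plus the weighted agreement term."""
    task = sum(
        -math.log(softmax(logits)[gold_index]) for logits in logits_per_model
    ) / len(logits_per_model)
    return task + gamma * agreement_loss(logits_per_model)

agree = agreement_loss([[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]])     # models agree
disagree = agreement_loss([[5.0, 0.0, 0.0], [0.0, 5.0, 0.0]])  # models disagree
```

The term is zero when all models predict the same distribution and grows as they diverge. The intuition from the abstract is that a label which models disagree on is more likely to be noisy, so penalizing disagreement discourages any one model from memorizing it.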
Key management in Wireless Sensor Networks (WSNs) is an important issue due to the absence of trusted infrastructures, on one hand, and the limited resources of sensor nodes, on the other hand. This paper surveys some recent key management approaches in WSNs. It first identifies some of the problems that confront key management. Then, it defines some criteria for viable solutions to key management problems. Next, it explores some of the proposed key management approaches and analyzes them according to the presented criteria. Finally, some open research issues are discussed.