Data science and knowledge extraction from raw data

علم البيانات واستخراج المعرفة من البيانات الخام

Added by Shamra (lecture)

Publication date 2019

fields Informatics Engineering

Research language: Arabic

Created by Shadi Saleh

Research summary

The paper addresses data science and knowledge extraction from raw data. The author, Shadi Saleh, gives a historical overview of the field's development since 1965, when John Tukey proposed restructuring the discipline of statistics. He explains how the field expanded to include data preparation and cleaning as well as the application of statistical models. He also discusses the difference between data science and statistics, noting that a data scientist must be skilled in applying non-statistical techniques. The paper covers the skills a data scientist needs, such as programming, natural language processing, and machine learning. In addition, the author presents case studies showing how these skills are used in areas such as detecting cyberattacks and analyzing medical and political texts. Finally, he highlights some tasks that fall within the scope of data analysis and machine learning, such as predicting health risks and analyzing digital texts.

Critical review

Critical study: Although the paper offers a comprehensive and detailed view of data science and knowledge extraction, it lacks practical examples and real-world applications that would help readers understand how these concepts are applied in practice. The heavy focus on the historical side may also be tedious for readers looking for recent, applied information. The paper could further be improved by adding more charts and visual explanations to make complex concepts easier to grasp. Finally, it would have been better to include real case studies showing how data science is used to solve actual problems in different fields.

Questions related to the research

What is the main difference between data science and statistics?

The main difference is that data science includes non-statistical techniques such as programming and natural language processing, while statistics focuses on traditional statistical models.

What skills does a data scientist need?

The required skills include programming, natural language processing, machine learning, statistics, and data analysis.

What practical applications of data science are mentioned in the paper?

The applications include detecting cyberattacks, analyzing medical texts, and analyzing political texts.

Why is data science important for scientific discovery?

Data science is considered a driving force in scientific discovery through experimentation, modeling, and computation based on massive data, which helps in collecting, managing, analyzing, and plotting data to draw useful conclusions.

Keywords

data science, knowledge extraction, statistics, machine learning, natural language processing, data analysis

Big Data with Machine Learning

2861 - Damascus University 2018 research seminar

In recent years, time-critical or real-time processing and analytics of big data have received a significant amount of attention. There are many domains where processing data in real time and making timely decisions can save thousands of human lives, minimize risks to lives and resources, enhance quality of life, improve profitability, and support efficient resource management. This paper presents such real-time big data analytics applications and a classification of them. In addition, it gives the time requirements of each type of application along with its main benefits. It also provides a general overview of big data as background.

Machine learning Data Analytics Big Data Applications

Mitigating Data Scarceness through Data Synthesis, Augmentation and Curriculum for Abstractive Summarization

502 - Association for Computational Linguistics 2021 article

This paper explores three simple data manipulation techniques (synthesis, augmentation, curriculum) for improving abstractive summarization models without the need for any additional data. We introduce a method of data synthesis with paraphrasing, a data augmentation technique with sample mixing, and curriculum learning with two new difficulty metrics based on specificity and abstractiveness. We conduct experiments to show that these three techniques can help improve abstractive summarization across two summarization models and two different small datasets. Furthermore, we show that these techniques can improve performance when applied in isolation and when combined.

mitigating data scarceness data scarceness

Data Collection vs. Knowledge Graph Completion: What is Needed to Improve Coverage?

401 - Association for Computational Linguistics 2021 article

This survey/position paper discusses ways to improve coverage of resources such as WordNet. Rapp estimated correlations, rho, between corpus statistics and psycholinguistic norms. rho improves with quantity (corpus size) and quality (balance). 1M words is enough for simple estimates (unigram frequencies), but at least 100x more is required for good estimates of word associations and embeddings. Given such estimates, WordNet's coverage is remarkable. WordNet was developed on SemCor, a small sample (200k words) from the Brown Corpus. Knowledge Graph Completion (KGC) attempts to learn missing links from subsets. But Rapp's estimates of sizes suggest it would be more profitable to collect more data than to infer missing information that is not there.

needed to improve

Multilingual and Cross-Lingual Intent Detection from Spoken Data

729 - Association for Computational Linguistics 2021 article

We present a systematic study on multilingual and cross-lingual intent detection (ID) from spoken data. The study leverages a new resource put forth in this work, termed MInDS-14, a first training and evaluation resource for the ID task with spoken data. It covers 14 intents extracted from a commercial system in the e-banking domain, associated with spoken examples in 14 diverse language varieties. Our key results indicate that combining machine translation models with state-of-the-art multilingual sentence encoders (e.g., LaBSE) yields strong intent detectors in the majority of target languages covered in MInDS-14. We also offer comparative analyses across different axes: e.g., translation direction, impact of speech recognition, data augmentation from a related domain. We see this work as an important step towards more inclusive development and evaluation of multilingual ID from spoken data, hopefully in a much wider spectrum of languages compared to prior work.

cross-lingual intent detection spoken data

NLI Data Sanity Check: Assessing the Effect of Data Corruption on Model Performance

411 - Association for Computational Linguistics 2021 article

Pre-trained neural language models give high performance on natural language inference (NLI) tasks. But whether they actually understand the meaning of the processed sequences is still unclear. We propose a new diagnostic test suite that makes it possible to assess whether a dataset constitutes a good testbed for evaluating the models' meaning understanding capabilities. We specifically apply controlled corruption transformations to widely used benchmarks (MNLI and ANLI), which involve removing entire word classes and often lead to nonsensical sentence pairs. If model accuracy on the corrupted data remains high, then the dataset is likely to contain statistical biases and artefacts that guide prediction. Conversely, a large decrease in model accuracy indicates that the original dataset provides a proper challenge to the models' reasoning capabilities. Hence, our proposed controls can serve as a crash test for developing high quality data for NLI tasks.

data sanity check sanity check assessing the effect
