
How much coffee was consumed during EMNLP 2019? Fermi Problems: A New Reasoning Challenge for AI


Added by Association for Computational Linguistics (article)

Publication date: 2021

Field: Artificial Intelligence

Research language: English

Created by Shamra Editor

consumed during EMNLP, coffee was consumed, Fermi problems


Abstract


Many real-world problems require the combined application of multiple reasoning abilities: employing suitable abstractions, commonsense knowledge, and creative synthesis of problem-solving strategies. To help advance AI systems towards such capabilities, we propose a new reasoning challenge, namely Fermi Problems (FPs), which are questions whose answers can only be approximately estimated because their precise computation is either impractical or impossible. For example, "How much would the sea level rise if all ice in the world melted?" FPs are commonly used in quizzes and interviews to bring out and evaluate the creative reasoning abilities of humans. To do the same for AI systems, we present two datasets: 1) a collection of 1k real-world FPs sourced from quizzes and olympiads; and 2) a bank of 10k synthetic FPs of intermediate complexity to serve as a sandbox for the harder real-world challenge. In addition to question-answer pairs, the datasets contain detailed solutions in the form of an executable program and supporting facts, helping in supervision and evaluation of intermediate steps. We demonstrate that even extensively fine-tuned large-scale language models perform poorly on these datasets, on average making estimates that are off by two orders of magnitude. Our contribution is thus the crystallization of several unsolved AI problems into a single, new challenge that we hope will spur further advances in building systems that can reason.
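To make the idea of a solution expressed as an executable program with supporting facts more concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption: the toy question, the fact values, the names `facts`, `solve`, and `orders_of_magnitude_off`, and the choice of an absolute base-10 log ratio as the error measure are not taken from the paper or its datasets.

```python
import math


def orders_of_magnitude_off(estimate: float, reference: float) -> float:
    """Absolute base-10 log ratio between an estimate and a reference value.

    A result of 2.0 means the estimate differs from the reference by a
    factor of 100, in either direction.
    """
    if estimate <= 0 or reference <= 0:
        raise ValueError("both values must be positive")
    return abs(math.log10(estimate / reference))


# Hypothetical supporting facts for a toy question:
# "How many cups of coffee are drunk at a 5-day conference?"
# The numbers are made up for illustration only.
facts = {
    "attendees": 2_000,
    "cups_per_person_per_day": 2,
    "days": 5,
}


def solve(f: dict) -> int:
    """Toy 'program': multiply the supporting facts to get an estimate."""
    return f["attendees"] * f["cups_per_person_per_day"] * f["days"]


estimate = solve(facts)

# A guess of 2,000,000 cups would be two orders of magnitude off
# from this program's estimate.
error = orders_of_magnitude_off(2_000_000, estimate)
```

Measuring error on a log scale treats a 100-fold overestimate and a 100-fold underestimate as equally wrong, which suits questions where only the rough magnitude of the answer can be known.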

References used

https://aclanthology.org/


The Reality Of The Russian-European Partnership During The Period 2000-2019

1159 - Tishreen University 2021 research paper

This research aimed to determine the reality of the partnership between Russia and the European Union during the period 2000-2019, the extent of the relationship of GDP to foreign trade, and the degree of economic openness. The descriptive and analytical approach was used to analyze Russia's tools and policy towards trade exchange and partnership with the European Union. Data on GDP, exports and imports, and the trade balance were used to calculate the average annual increase, the average growth rate, and the degree of economic exposure. The most important conclusions were Russia's endeavor to build a partnership with the European Union based on joint cooperation and dealing with issues of security and common neighborhood, promoting and diversifying trade exchanges, and that the Russian economy is not exposed to the European economy. The most important recommendation was that Russia needs to diversify its exports to European markets and not rely solely on the export of oil and natural gas.

foreign trade, partnership, gross domestic product, Russia and the European Union

New Domain, Major Effort? How Much Data is Necessary to Adapt a Temporal Tagger to the Voice Assistant Domain

322 - Association for Computational Linguistics 2021 article

Reliable tagging of Temporal Expressions (TEs, e.g., Book a table at L'Osteria for Sunday evening) is a central requirement for Voice Assistants (VAs). However, there is a dearth of resources and systems for the VA domain, since publicly-available temporal taggers are trained only on substantially different domains, such as news and clinical text. Since the cost of annotating large datasets is prohibitive, we investigate the trade-off between in-domain data and performance in DA-Time, a hybrid temporal tagger for the English VA domain which combines a neural architecture for robust TE recognition, with a parser-based TE normalizer. We find that transfer learning goes a long way even with as little as 25 in-domain sentences: DA-Time performs at the state of the art on the news domain, and substantially outperforms it on the VA domain.

major effort, voice assistant domain

How many data points is a prompt worth?

531 - Association for Computational Linguistics 2021 article

When fine-tuning pretrained models for classification, researchers either use a generic model head or a task-specific prompt for prediction. Proponents of prompting have argued that prompts provide a method for injecting task-specific guidance, which is beneficial in low-data regimes. We aim to quantify this benefit through rigorous testing of prompts in a fair setting: comparing prompted and head-based fine-tuning in equal conditions across many tasks and data sizes. By controlling for many sources of advantage, we find that prompting does indeed provide a benefit, and that this benefit can be quantified per task. Results show that prompting is often worth 100s of data points on average across classification tasks.

data points, prompting

FCM: A Fine-grained Comparison Model for Multi-turn Dialogue Reasoning

474 - Association for Computational Linguistics 2021 article

Despite the success of neural dialogue systems in achieving high performance on the leader-board, they cannot meet users' requirements in practice, due to their poor reasoning skills. The underlying reason is that most neural dialogue models only capture the syntactic and semantic information, but fail to model the logical consistency between the dialogue history and the generated response. Recently, a new multi-turn dialogue reasoning task has been proposed, to facilitate dialogue reasoning research. However, this task is challenging, because there are only slight differences between the illogical response and the dialogue history. How to effectively solve this challenge is still worth exploring. This paper proposes a Fine-grained Comparison Model (FCM) to tackle this problem. Inspired by human behavior in reading comprehension, a comparison mechanism is proposed to focus on the fine-grained differences in the representation of each response candidate. Specifically, each candidate representation is compared with the whole history to obtain a history consistency representation. Furthermore, the consistency signals between each candidate and the speaker's own history are considered to drive the model to prefer a candidate that is logically consistent with the speaker's history logic. Finally, the above consistency representations are employed to output a ranking list of the candidate responses for multi-turn dialogue reasoning. Experimental results on two public dialogue datasets show that our method obtains higher ranking scores than the baseline models.

multi-turn dialogue reasoning, dialogue reasoning, fine-grained comparison model

CrossFit: A Few-shot Learning Challenge for Cross-task Generalization in NLP

309 - Association for Computational Linguistics 2021 article

Humans can learn a new language task efficiently with only a few examples, by leveraging their knowledge obtained when learning prior tasks. In this paper, we explore whether and how such cross-task generalization ability can be acquired, and further applied to build better few-shot learners across diverse NLP tasks. We introduce CrossFit, a problem setup for studying cross-task generalization ability, which standardizes seen/unseen task partitions, data access during different learning stages, and the evaluation protocols. To instantiate different seen/unseen task partitions in CrossFit and facilitate in-depth analysis, we present the NLP Few-shot Gym, a repository of 160 diverse few-shot NLP tasks created from open-access NLP datasets and converted to a unified text-to-text format. Our analysis reveals that the few-shot learning ability on unseen tasks can be improved via an upstream learning stage using a set of seen tasks. We also observe that the selection of upstream learning tasks can significantly influence few-shot performance on unseen tasks, calling for further analysis on task similarity and transferability.

few-shot learning challenge, learning challenge, cross-task generalization ability
