
Large language models (LMs) generate remarkably fluent text and can be efficiently adapted across NLP tasks. Measuring and guaranteeing the quality of generated text in terms of safety is imperative for deploying LMs in the real world; to this end, prior work often relies on automatic evaluation of LM toxicity. We critically discuss this approach, evaluate several toxicity mitigation strategies with respect to both automatic and human evaluation, and analyze the consequences of toxicity mitigation in terms of model bias and LM quality. We demonstrate that while basic intervention strategies can effectively optimize previously established automatic metrics on the RealToxicityPrompts dataset, this comes at the cost of reduced LM coverage for both texts about, and dialects of, marginalized groups. Additionally, we find that human raters often disagree with high automatic toxicity scores after strong toxicity reduction interventions, further highlighting the nuances involved in careful evaluation of LM toxicity.
The Shared Task on Evaluating Accuracy focused on techniques (both manual and automatic) for evaluating the factual accuracy of texts produced by neural NLG systems in a sports-reporting domain. Four teams submitted evaluation techniques for this task, using very different approaches. The best-performing submissions did encouragingly well at this difficult task. However, all automatic submissions struggled to detect factual errors that are semantically or pragmatically complex (for example, those based on incorrect computation or inference).
Automated storytelling has long captured the attention of researchers because of the ubiquity of narratives in everyday life. The best human-crafted stories exhibit coherent plots, strong characters, and adherence to genre, attributes that the current state of the art still struggles to produce, even with transformer architectures. In this paper, we analyze works in story generation that use machine learning approaches to (1) address story generation controllability, (2) incorporate commonsense knowledge, (3) infer reasonable character actions, and (4) generate creative language.
Only a small portion of research papers with human evaluation for text summarization provide information about participant demographics, task design, and experiment protocol. Additionally, many researchers use human evaluation as a gold standard without questioning its reliability or investigating the factors that might affect it. As a result, there is a lack of best practices for reliable human summarization evaluation grounded in empirical evidence. To investigate human evaluation reliability, we conduct a series of human evaluation experiments, provide an overview of participant demographics, task design, and experimental setup, and compare the results across experiments. Based on our empirical analysis, we provide guidelines to ensure the reliability of expert and non-expert evaluations, and we identify the factors that might affect the reliability of human evaluation.
This paper presents a platform for monitoring press narratives with respect to several social challenges, including gender equality, migration, and minority languages. As narratives are encoded in natural language, we use natural language processing techniques to automate their analysis. Crawled news articles are processed by several NLP modules, including named entity recognition, keyword extraction, document classification for social challenge detection, and sentiment analysis. A Flask-powered interface provides data visualization for user-based analysis of the data. The paper presents the architecture of the system and describes its components in detail. An evaluation is provided for the modules related to the extraction and classification of information about social challenges.
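The pipeline described in this abstract (named entity recognition, keyword extraction, social-challenge classification, and sentiment analysis over crawled news) can be sketched as a chain of independent modules whose outputs are bundled per article. The sketch below is purely illustrative: the module bodies are naive placeholder stubs, not the authors' actual models, and every function name here is an assumption.

```python
# Illustrative sketch of chaining NLP modules over a crawled article,
# as described in the abstract. All module implementations are naive
# placeholders standing in for trained models.

def named_entities(text):
    # Placeholder NER: tag capitalized tokens (a real system would use a trained model).
    return [(w.strip(".,"), "ENT") for w in text.split() if w[:1].isupper()]

def keywords(text):
    # Placeholder keyword extraction: keep long words, deduplicated and sorted.
    return sorted({w.lower().strip(".,") for w in text.split() if len(w) > 7})

def detect_social_challenge(text):
    # Placeholder document classification: keyword lookup per social challenge.
    topics = {"gender": "gender equality",
              "migration": "migration",
              "language": "minority languages"}
    return [label for key, label in topics.items() if key in text.lower()]

def sentiment(text):
    # Placeholder lexicon-based polarity.
    negative_hits = sum(w in text.lower() for w in ("crisis", "conflict", "violence"))
    return "negative" if negative_hits else "neutral"

def analyze(article):
    """Run all modules over one crawled article and bundle the results."""
    return {
        "entities": named_entities(article),
        "keywords": keywords(article),
        "challenges": detect_social_challenge(article),
        "sentiment": sentiment(article),
    }
```

In a real deployment, each stub would be replaced by a dedicated model, and a web layer (Flask, per the abstract) would visualize the per-article dictionaries this function returns.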
Despite the progress made in recent years on natural language understanding (NLU) challenges, the majority of this progress remains concentrated on resource-rich languages such as English. This work focuses on Persian, one of the most widely spoken languages in the world, for which few NLU datasets are available. The availability of high-quality evaluation datasets is a necessity for reliable assessment of progress on different NLU tasks and domains. We introduce ParsiNLU, the first benchmark in the Persian language that includes a range of language understanding tasks, such as reading comprehension and textual entailment. These datasets are collected in a multitude of ways, often involving manual annotation by native speakers, resulting in over 14.5k new instances across 6 distinct NLU tasks. Additionally, we present the first results of state-of-the-art monolingual and multilingual pre-trained language models on this benchmark and compare them with human performance, which provides valuable insights into our ability to tackle natural language understanding challenges in Persian. We hope ParsiNLU fosters further research and advances in Persian language understanding.
The crisis in Syria since the beginning of 2011 has had devastating effects on the Syrian economy, which shifted to the priorities of a war economy and halted the economic reform programs and long-term development plans intended to move toward a social market economy and integration into the world economy.
This study defines e-commerce and its sectors, examines its reality in the Syrian Arab Republic, identifies the most important problems and challenges facing the establishment of complete and successful e-commerce in Syria, and then reviews solutions and suggestions for reaching this aim.
This research aims to shed light on the motives for and challenges of applying Basel II in banks operating in Syria. To achieve this goal, the researcher used a survey methodology: data were collected using a questionnaire and then analyzed with a set of statistical methods using the SPSS program. The results revealed that banks operating in Syria apply the Basel II Accord in response to regulatory and international requirements. With respect to the challenges of applying the Accord, these banks face difficulties associated with the first pillar (minimum capital requirements), as they lack the comprehensive historical data needed to measure credit, operational, and market risks, and they do not hold any international credit rating. There are also challenges associated with the second pillar (supervisory review), particularly the low number of staff working in banking supervision, in addition to the challenges posed by political circumstances and prevailing economic conditions. Finally, the results showed an inverse relationship between the extent of Basel II implementation in conventional banks operating in Syria and each of the challenges associated with implementing the three pillars of the Accord, as well as those associated with political and economic conditions. In public banks only, there is also an inverse relationship between the extent of implementation and the challenges associated with material and human resources.
The research aimed to identify the challenges that hinder the application of the educational quality standards adopted by the Arab Universities Union Council in higher education (HE), according to teaching staff members at TU. The sample included 431 teaching staff members from different faculties and higher institutes at TU. To fulfill this goal, a questionnaire of 39 items was prepared, distributed over five axes: teaching staff members; academic programs; teaching methods and learning resources; university books; and academic research, assessment, and university ethics. The results showed that the most important challenges for each axis were as follows. Faculty members axis: the lack of incentives for faculty excellence in teaching and academic research. Academic programs axis: the absence of periodic studies verifying that the various programs in place at the university fit the vision, mission, and objectives of the university and of its colleges or affiliated higher institutes. Teaching methods and learning resources axis: the lack of research assessing the teaching methods and teaching aids used in the process. University book axis: the absence of good design for the university book in terms of shape, printing, paper, graphics, and other aspects. Academic research axis: the lack of support for research, publication, and development.