Reference-based automatic evaluation metrics are notoriously limited for NLG due to their inability to fully capture the range of possible outputs. We examine a referenceless alternative: evaluating the adequacy of English sentences generated from Abstract Meaning Representation (AMR) graphs by parsing into AMR and comparing the parse directly to the input. We find that the errors introduced by automatic AMR parsing substantially limit the effectiveness of this approach, but a manual editing study indicates that as parsing improves, parsing-based evaluation has the potential to outperform most reference-based metrics.
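The core idea can be sketched as follows: parse the generated sentence back into an AMR graph and score its overlap with the input graph. The snippet below is a minimal illustration only; it represents graphs as hand-written triple sets and computes a plain triple-overlap F1, omitting both the automatic AMR parser the approach relies on and the variable-alignment search performed by the actual Smatch metric.

```python
# Minimal sketch of parsing-based evaluation: parse the generated sentence
# back into an AMR graph, then score its overlap with the input graph.
# Real pipelines use an automatic AMR parser and Smatch (which searches over
# variable alignments); here graphs are hand-supplied triple sets and the
# alignment step is omitted for brevity.

def triple_f1(predicted, gold):
    """F1 over (source, relation, target) triples of two AMR graphs."""
    predicted, gold = set(predicted), set(gold)
    if not predicted or not gold:
        return 0.0
    matched = len(predicted & gold)
    precision = matched / len(predicted)
    recall = matched / len(gold)
    return 0.0 if matched == 0 else 2 * precision * recall / (precision + recall)

# Input AMR for "The boy wants to go" and a parse of a generated sentence.
input_graph = {("w", "instance", "want-01"), ("b", "instance", "boy"),
               ("g", "instance", "go-02"), ("w", "ARG0", "b"),
               ("w", "ARG1", "g"), ("g", "ARG0", "b")}
parsed_graph = {("w", "instance", "want-01"), ("b", "instance", "boy"),
                ("w", "ARG0", "b")}  # parser missed the "go" subgraph

print(f"adequacy score: {triple_f1(parsed_graph, input_graph):.2f}")
```

A low score can thus reflect either a genuinely inadequate generated sentence or a parser error, which is exactly the limitation the study quantifies.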
Summarization evaluation remains an open research problem: current metrics such as ROUGE are known to be limited and to correlate poorly with human judgments. To alleviate this issue, recent work has proposed evaluation metrics which rely on question answering models to assess whether a summary contains all the relevant information in its source document. Though promising, the proposed approaches have so far failed to correlate better than ROUGE with human judgments. In this paper, we extend previous approaches and propose a unified framework, named QuestEval. In contrast to established metrics such as ROUGE or BERTScore, QuestEval does not require any ground-truth reference. Nonetheless, QuestEval substantially improves the correlation with human judgments over four evaluation dimensions (consistency, coherence, fluency, and relevance), as shown in extensive experiments.
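The question-answering loop behind such metrics can be illustrated with a short, self-contained sketch (this is not the QuestEval library's API): questions grounded in the source document are answered against the summary, and token-level answer overlap approximates how much source information the summary preserves. The generate_questions and answer_question functions below are toy stand-ins for the learned QG and QA models used in practice.

```python
# Sketch of a QA-based, reference-free summary metric in the spirit of
# QuestEval (not the library's actual API). Questions generated from the
# source are answered against the summary; answer overlap approximates how
# much source information the summary preserves.

from collections import Counter

def token_f1(pred, gold):
    """Token-level F1 between a predicted and a gold answer span."""
    pred_tokens, gold_tokens = pred.lower().split(), gold.lower().split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

def generate_questions(source):
    """Toy stand-in for a learned question-generation model."""
    return [("Who acquired the startup?", "Acme Corp"),
            ("When was the deal announced?", "Tuesday")]

def answer_question(question, context):
    """Toy stand-in for an extractive QA model: returns a span from `context`."""
    if "who" in question.lower() and "Acme Corp" in context:
        return "Acme Corp"
    return ""  # treated as unanswerable from the summary

def qa_based_score(source, summary):
    """Average answer F1 over source-grounded questions answered on the summary."""
    qa_pairs = generate_questions(source)
    scores = [token_f1(answer_question(q, summary), gold) for q, gold in qa_pairs]
    return sum(scores) / len(scores)

source = "Acme Corp announced on Tuesday that it had acquired the startup."
summary = "The startup was acquired by Acme Corp."
print(f"QA-based score: {qa_based_score(source, summary):.2f}")
```

Because the questions are grounded in the source rather than in a gold summary, the score requires no reference, which is the property the framework exploits.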
ROUGE is a widely used evaluation metric in text summarization. However, it is not suitable for evaluating abstractive summarization systems because it relies on lexical overlap between the gold standard and the generated summaries. This limitation becomes more apparent for agglutinative languages with very large vocabularies and high type/token ratios. In this paper, we present semantic similarity models for Turkish and apply them as evaluation metrics for an abstractive summarization task. To achieve this, we translated the English STSb dataset into Turkish, thereby also presenting the first semantic textual similarity dataset for Turkish. We show that our best similarity models align better with average human judgments than ROUGE in terms of both Pearson and Spearman correlation.
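As an illustration of replacing lexical overlap with embedding similarity, the sketch below scores candidate summaries against references with a sentence-embedding model and correlates the scores with human ratings. The multilingual checkpoint and the toy data are illustrative assumptions only, not the models or data used in the paper.

```python
# Sketch of using sentence-embedding similarity as a summarization metric and
# checking its agreement with human judgments. The checkpoint below is an
# illustrative multilingual model, not the Turkish models from the paper.

from sentence_transformers import SentenceTransformer, util
from scipy.stats import pearsonr, spearmanr

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

def semantic_score(reference, candidate):
    """Cosine similarity between embeddings of a reference and a candidate summary."""
    emb = model.encode([reference, candidate], convert_to_tensor=True)
    return util.cos_sim(emb[0], emb[1]).item()

# Toy data: gold summaries, system summaries, and human adequacy ratings.
gold = ["Şirket yeni fabrikasını İzmir'de açtı.",
        "Takım finalde rakibini 2-1 yendi.",
        "Bakanlık yeni eğitim programını duyurdu."]
system = ["Yeni fabrika İzmir'de hizmete girdi.",
          "Final maçı ertelendi.",
          "Yeni eğitim programı bakanlık tarafından açıklandı."]
human_ratings = [4.5, 1.5, 4.0]

metric_scores = [semantic_score(g, s) for g, s in zip(gold, system)]
print("Pearson:", pearsonr(metric_scores, human_ratings)[0])
print("Spearman:", spearmanr(metric_scores, human_ratings)[0])
```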
Bridges are vital structures that link different regions. Because of their importance, their design has received great attention worldwide, as evidenced by the continuous development of seismic design codes such as AASHTO, which now adopts performance-based design and requires, through a set of criteria, that the bridge perform at the life safety (LS) level under the design earthquake, thereby ensuring that it does not collapse. Since most local bridges were built before these criteria were adopted, it is important to verify that their performance meets them. In this paper, the seismic performance of an existing bridge, representative of a wide range of local multi-span simply supported bridges, was evaluated by developing a 3D model of the bridge in SAP2000 V19.1 and applying nonlinear static analysis. The analysis results were used to verify that the bridge meets the AASHTO seismic requirements, which include the P-Δ check, the displacement demand/capacity check, the ductility check for the columns, and the shear demand/capacity check for the columns. The results showed that under the seismic intensities adopted in the Syrian code, the studied bridge achieves the acceptable life safety (LS) performance level, but it exceeds this level under high seismic intensity. The study also showed that the results of the nonlinear static analysis are consistent with the AASHTO requirements.
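The acceptance criteria listed above reduce to a set of demand/capacity comparisons once the pushover results are extracted from the structural model. The sketch below illustrates that post-processing step only; all numerical values and limits (including the P-Δ factor) are placeholders standing in for the quantities prescribed by AASHTO and the Syrian code and the demands obtained from the SAP2000 analysis.

```python
# Sketch of the checks applied to pushover-analysis results: displacement
# demand/capacity, column ductility, column shear, and the P-Delta
# requirement. Demands and capacities would come from the SAP2000 model;
# the numbers and limits below are illustrative placeholders, not values
# quoted from AASHTO or the Syrian code.

from dataclasses import dataclass

@dataclass
class ColumnCheck:
    disp_demand_m: float       # displacement demand from pushover analysis
    disp_capacity_m: float     # displacement capacity at the target performance level
    ductility_demand: float    # displacement demand / yield displacement
    ductility_limit: float     # allowable ductility per the governing code (placeholder)
    shear_demand_kN: float     # plastic-hinging shear demand
    shear_capacity_kN: float   # factored shear capacity
    axial_load_kN: float       # unfactored dead-load axial force
    plastic_moment_kNm: float  # idealized plastic moment of the column

    def checks(self):
        return {
            "displacement D/C": self.disp_demand_m <= self.disp_capacity_m,
            "ductility": self.ductility_demand <= self.ductility_limit,
            "shear D/C": self.shear_demand_kN <= self.shear_capacity_kN,
            # P-Delta check: secondary moment limited to a fraction of the
            # plastic moment (the 0.25 factor is a placeholder for the
            # code-specified limit).
            "P-Delta": self.axial_load_kN * self.disp_demand_m
                       <= 0.25 * self.plastic_moment_kNm,
        }

column = ColumnCheck(disp_demand_m=0.12, disp_capacity_m=0.18,
                     ductility_demand=3.2, ductility_limit=5.0,
                     shear_demand_kN=850.0, shear_capacity_kN=1100.0,
                     axial_load_kN=2400.0, plastic_moment_kNm=3200.0)

for name, ok in column.checks().items():
    print(f"{name}: {'OK' if ok else 'NOT satisfied'}")
```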