
An item response theory evaluation of the Light and Spectroscopy Concept Inventory national data set

Added by Colin Wallace
Publication date: 2017
Field: Physics
Language: English


This paper presents the first item response theory (IRT) analysis of the national data set on introductory, general education, college-level astronomy teaching using the Light and Spectroscopy Concept Inventory (LSCI). We used the difference between students' pre- and post-instruction IRT-estimated abilities as a measure of learning gain. This analysis provides deeper insights than prior publications into both the LSCI as an instrument and the effectiveness of teaching and learning in introductory astronomy courses. Our IRT analysis supports the classical test theory findings of prior studies using the LSCI with this population. In particular, we found that students in classes that used active learning strategies at least 25% of the time had average IRT-estimated learning gains approximately 1 logit larger than those of students in classes that spent less time on active learning strategies. We also found that instructors who want their classes to achieve an average improvement in ability of $\Delta\theta = 1$ logit must spend at least 25% of class time on active learning strategies. However, our analysis also powerfully illustrates how little insight into student learning is gained from a single measure of learning gain, such as the average $\Delta\theta$. Educators and researchers should also examine the distributions of students' abilities pre- and post-instruction in order to understand how many students actually improved their abilities and whether a majority of students have moved to post-instruction abilities significantly greater than the national average.
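To make the idea of an IRT-estimated learning gain concrete, here is a minimal sketch. It assumes a Rasch (1PL) model with item difficulties already known, which is an assumption for illustration; the paper's own model and calibration procedure may differ. Each student's ability $\theta$ is estimated by maximum likelihood (Newton-Raphson) separately for the pre- and post-test, and the gain is $\Delta\theta = \theta_{post} - \theta_{pre}$. The helper names (`estimate_theta`, `summarize_gains`) and the "gain exceeds twice its combined standard error" rule are hypothetical, included only to show why looking at the distribution of individual gains says more than the mean alone.

```python
import math
from statistics import mean

def rasch_prob(theta, b):
    """P(correct) under the Rasch (1PL) model; theta and b are in logits."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def estimate_theta(responses, difficulties, iters=50, tol=1e-8):
    """Maximum-likelihood ability estimate via Newton-Raphson.

    responses: list of 0/1 item scores; difficulties: known item difficulties.
    Returns (theta, standard_error).
    """
    score = sum(responses)
    if score == 0 or score == len(responses):
        # The likelihood has no finite maximum for all-wrong or all-right patterns.
        raise ValueError("MLE is infinite for a zero or perfect score")
    theta = 0.0
    for _ in range(iters):
        probs = [rasch_prob(theta, b) for b in difficulties]
        gradient = sum(x - p for x, p in zip(responses, probs))  # dlogL/dtheta
        information = sum(p * (1 - p) for p in probs)             # -d2logL/dtheta2
        step = gradient / information
        theta += step
        if abs(step) < tol:
            break
    probs = [rasch_prob(theta, b) for b in difficulties]
    return theta, 1.0 / math.sqrt(sum(p * (1 - p) for p in probs))

def summarize_gains(pre_patterns, post_patterns, difficulties):
    """Mean Delta theta, plus the share of students whose gain exceeds
    twice its combined standard error (a hypothetical, illustrative rule)."""
    gains, improved = [], 0
    for pre, post in zip(pre_patterns, post_patterns):
        t_pre, se_pre = estimate_theta(pre, difficulties)
        t_post, se_post = estimate_theta(post, difficulties)
        gain = t_post - t_pre
        gains.append(gain)
        if gain > 2 * math.sqrt(se_pre ** 2 + se_post ** 2):
            improved += 1
    return mean(gains), improved / len(gains)

# Usage with made-up difficulties and response patterns:
b = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
pre = [[1, 1, 0, 0, 0, 0, 0], [1, 0, 1, 0, 0, 0, 0]]
post = [[1, 1, 1, 1, 1, 1, 0], [1, 1, 0, 0, 0, 0, 0]]
print(summarize_gains(pre, post, b))
```

In this toy data the second student has the same raw score before and after, so under the Rasch model their $\Delta\theta$ is exactly zero even though the class mean is positive, which is the kind of detail a single average hides.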



Ishimoto, Davenport, and Wittmann have previously reported analyses of data from student responses to the Force and Motion Conceptual Evaluation (FMCE), in which they used item response curves (IRCs) to make claims about American and Japanese students' relative likelihood of choosing certain incorrect responses to some questions. We have used an independent data set of over 6,500 American students' responses to the FMCE to generate IRCs to test their claims. Converting the IRCs to vectors, we used dot product analysis to compare each response item quantitatively. For most questions, our analyses are consistent with those of Ishimoto, Davenport, and Wittmann, with some results suggesting smaller differences between American and Japanese students than previously reported. We also highlight the pedagogical advantages of using IRCs to determine the differences in response patterns for different populations to better understand student thinking prior to instruction.
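One way to picture "converting IRCs to vectors" is sketched below: an IRC for one answer option is the fraction of students at each total score who chose that option, and two populations' curves are compared with a normalized dot product (cosine similarity). The record format, the function names, and the choice to normalize and to skip score levels with no students are assumptions for illustration, not the authors' exact procedure.

```python
import math

def item_response_curve(records, item, option, max_score):
    """Fraction of students at each total score who chose `option` on `item`.

    records: list of (total_score, choices) pairs, where choices maps item -> option.
    Score levels with no students are returned as None.
    """
    n_at = [0] * (max_score + 1)
    chose = [0] * (max_score + 1)
    for total, choices in records:
        n_at[total] += 1
        if choices[item] == option:
            chose[total] += 1
    return [c / n if n else None for c, n in zip(chose, n_at)]

def irc_similarity(curve_a, curve_b):
    """Cosine of the angle between two IRC vectors (a normalized dot product),
    using only score levels observed in both populations."""
    pairs = [(x, y) for x, y in zip(curve_a, curve_b) if x is not None and y is not None]
    dot = sum(x * y for x, y in pairs)
    norm_a = math.sqrt(sum(x * x for x, _ in pairs))
    norm_b = math.sqrt(sum(y * y for _, y in pairs))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

A similarity near 1 means the two populations choose that option in nearly the same proportions across score levels; lower values flag options whose response patterns differ.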
Brahim Lamine, 2015
Conceptual tests are widely used by physics instructors to assess students' conceptual understanding and compare teaching methods. It is common to look at students' changes in their answers between a pre-test and a post-test to quantify a transition in students' conceptions. This is often done by looking at the proportion of incorrect answers in the pre-test that change to correct answers in the post-test (the gain) and the proportion of correct answers that change to incorrect answers (the loss). By comparing theoretical predictions to experimental data on the Force Concept Inventory, we show that item response theory (IRT) predicts the observed gains and losses fairly well. We then use IRT to quantify students' changes in a test-retest situation in which no learning occurs and show that (i) up to 25% of total answers can change due to the non-deterministic nature of students' answers and (ii) gains and losses can range from 0% to 100%. Still using IRT, we highlight the conditions that a test must satisfy in order to minimize gains and losses when no learning occurs. Finally, we propose recommendations on interpreting such pre/post-test progression with respect to the initial level of the students.
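The test-retest argument can be sketched as follows, assuming a Rasch model and assuming that, with no learning, a student's pre- and post-test answers to an item are independent draws with the same probability $p$ of being correct. Then $P(\text{wrong}\to\text{right}) = P(\text{right}\to\text{wrong}) = p(1-p)$, so for a single item the gain is $p$ and the loss is $1-p$. This is why gains and losses can span 0% to 100% depending on students' initial level. The function name and inputs are hypothetical; the abstract does not specify which IRT model the author used.

```python
import math

def rasch_prob(theta, b):
    """P(correct) under the Rasch (1PL) model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def expected_no_learning_changes(thetas, difficulties):
    """Expected test-retest changes when abilities stay the same.

    Assumes pre and post answers are independent Bernoulli draws with the
    same Rasch probability p for each student-item pair.
    Returns (fraction of all answers that change, gain, loss), where
    gain = share of pre-test incorrect answers that become correct and
    loss = share of pre-test correct answers that become incorrect.
    """
    flips = pre_wrong = pre_right = 0.0
    n = 0
    for theta in thetas:
        for b in difficulties:
            p = rasch_prob(theta, b)
            flips += p * (1 - p)  # P(wrong->right), equal to P(right->wrong)
            pre_wrong += 1 - p
            pre_right += p
            n += 1
    return 2 * flips / n, flips / pre_wrong, flips / pre_right

# Usage: a low-ability group shows a small "gain" and a large "loss"
# even though nothing was learned or forgotten.
print(expected_no_learning_changes([-2.0, -1.5], [0.0, 0.5, 1.0]))
```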
Research-based assessment instruments (RBAIs) are ubiquitous throughout both physics instruction and physics education research. The vast majority of analyses involving student responses to RBAI questions have focused on whether or not a student selects correct answers and on using correctness to measure growth. This approach often undervalues the rich information that may be obtained by examining students' particular choices of incorrect answers. In the present study, we aim to reveal some of this valuable information by quantitatively determining the relative correctness of various incorrect responses. To accomplish this, we propose an assumption that allows us to define relative correctness: students who have a high understanding of Newtonian physics are likely to answer more questions correctly and also more likely to choose better incorrect responses than students who have a low understanding. Analyses using item response theory align with this assumption, and Bock's nominal response model allows us to uniquely rank each incorrect response. We present results from over 7,000 students' responses to the Force and Motion Conceptual Evaluation.
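In Bock's nominal response model, the probability that a student with ability $\theta$ chooses option $k$ is $P_k(\theta) = \exp(a_k\theta + c_k) / \sum_j \exp(a_j\theta + c_j)$. A larger slope $a_k$ means the option becomes relatively more attractive as ability rises. The sketch below ranks incorrect options by slope, which is one natural reading of "relative correctness" and is labeled here as an assumption. The parameter values are made up, not fitted FMCE estimates.

```python
import math

def nrm_probs(theta, slopes, intercepts):
    """Bock's nominal response model: probability of each option at ability theta."""
    z = [a * theta + c for a, c in zip(slopes, intercepts)]
    m = max(z)  # subtract the max before exponentiating for numerical stability
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]

def rank_incorrect(options, slopes, correct):
    """Order incorrect options from most to least 'correct' by slope a_k
    (an assumed ranking criterion)."""
    wrong = [(a, o) for o, a in zip(options, slopes) if o != correct]
    return [o for a, o in sorted(wrong, key=lambda t: t[0], reverse=True)]

# Hypothetical parameters for a five-option item whose correct answer is "A"
# (slopes and intercepts each sum to zero, a common identification constraint).
options = ["A", "B", "C", "D", "E"]
slopes = [1.2, -0.9, 0.3, -0.5, -0.1]
intercepts = [0.5, 0.2, -0.1, -0.3, -0.3]
print(rank_incorrect(options, slopes, "A"))
print(nrm_probs(0.0, slopes, intercepts))
```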
Data mining has been widely recognized as a powerful tool for extracting added value from large-scale databases. Finding frequent item sets in databases is a crucial step in the data mining process of extracting association rules. Many algorithms have been developed to find frequent item sets. This paper presents a summary and a comparative study of the available FP-growth algorithm variations for mining frequent item sets, showing their capabilities and efficiency in terms of time and memory consumption for association rule mining while taking application-specific information into account. It proposes an FP-tree-based growth algorithm following the pattern-growth mining paradigm, which employs a tree structure to compress the database. The performance study shows that the anti-FP-growth method is efficient and scalable for mining both long and short frequent patterns and is about an order of magnitude faster than the Apriori algorithm, as well as faster than some recently reported new frequent-pattern mining methods.
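For readers unfamiliar with the pattern-growth idea, here is a compact sketch of textbook FP-growth, not of the specific variants compared in the paper. Transactions are compressed into a prefix tree (the FP-tree) with items inserted in descending support order. Each frequent item's conditional pattern base (the prefix paths above its nodes) is then mined recursively, so candidate itemsets are never generated explicitly as in Apriori. The function and class names are hypothetical.

```python
from collections import defaultdict

class FPNode:
    """One FP-tree node: an item, the count of transactions sharing this path, and links."""
    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}

def build_fp_tree(weighted_transactions, min_support):
    """Build an FP-tree from (items, count) pairs; return item supports and node lists."""
    support = defaultdict(int)
    for items, count in weighted_transactions:
        for item in set(items):
            support[item] += count
    frequent = {i: s for i, s in support.items() if s >= min_support}
    root = FPNode(None, None)
    header = defaultdict(list)  # item -> every tree node holding that item
    for items, count in weighted_transactions:
        # Insert frequent items in descending support order so prefixes are shared.
        ordered = sorted((i for i in set(items) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            if item not in node.children:
                node.children[item] = FPNode(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += count
    return frequent, header

def fp_growth(weighted_transactions, min_support, suffix=frozenset()):
    """Recursively mine frequent itemsets via conditional FP-trees."""
    frequent, header = build_fp_tree(weighted_transactions, min_support)
    results = {}
    for item, item_support in frequent.items():
        pattern = suffix | {item}
        results[pattern] = item_support
        # Conditional pattern base: the prefix path above each node of `item`.
        conditional = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                conditional.append((path, node.count))
        results.update(fp_growth(conditional, min_support, pattern))
    return results

def mine_frequent_itemsets(transactions, min_support):
    """Map each frequent itemset (a frozenset) to its support count."""
    return fp_growth([(t, 1) for t in transactions], min_support)

# Usage on a toy basket data set with a minimum support of 2 transactions.
baskets = [["a", "b"], ["b", "c", "d"], ["a", "c", "d", "e"], ["a", "d", "e"], ["a", "b", "c"]]
for itemset, count in sorted(mine_frequent_itemsets(baskets, 2).items(),
                             key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```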
Recent years have seen numerous NLP datasets introduced to evaluate the performance of fine-tuned models on natural language understanding tasks. Recent results from large pretrained models, though, show that many of these datasets are largely saturated and unlikely to be able to detect further progress. What kinds of datasets are still effective at discriminating among strong models, and what kinds of datasets should we expect to be able to detect future improvements? To measure this uniformly across datasets, we draw on item response theory and evaluate 29 datasets using predictions from 18 pretrained Transformer models on individual test examples. We find that Quoref, HellaSwag, and MC-TACO are best suited for distinguishing among state-of-the-art models, while SNLI, MNLI, and CommitmentBank seem to be saturated for current strong models. We also observe that the span selection task format, which is used for QA datasets like QAMR or SQuAD2.0, is effective in differentiating between strong and weak models.