Analytic Selection of a Valid Subtest for DIF Analysis when DIF has Multiple Potential Causes among Multiple Groups
Document
Description
The study examined how ATFIND, Mantel-Haenszel, SIBTEST, and Crossing SIBTEST function when items in the dataset are modelled to differentially advantage a lower ability focal group over a higher ability reference group. The primary purpose of the study was to examine ATFIND's usefulness as a valid subtest selection tool, but it also explored the influence of DIF items, item difficulty, and presence of multiple examinee populations with different ability distributions on both its selection of the assessment test (AT) and partitioning test (PT) lists and on all three differential item functioning (DIF) analysis procedures. The results of SIBTEST were also combined with those of Crossing SIBTEST, as might be done in practice.
ATFIND was found to be a less-than-effective matching subtest selection tool with DIF items that are modelled unidimensionally. If an item was modelled with uniform DIF or if it had a referent difficulty parameter in the Medium range, it was found to be selected slightly more often for the AT List than the PT List. These trends were seen to increase as sample size increased. All three DIF analyses, and the combined SIBTEST and Crossing SIBTEST, generally were found to perform less well as DIF contaminated the matching subtest, as well as when DIF was modelled less severely or when the focal group ability was skewed. While the combined SIBTEST and Crossing SIBTEST was found to have the highest power among the DIF analyses, it also was found to have Type I error rates that were sometimes extremely high.
ATFIND was found to be a less-than-effective matching subtest selection tool with DIF items that are modelled unidimensionally. If an item was modelled with uniform DIF or if it had a referent difficulty parameter in the Medium range, it was found to be selected slightly more often for the AT List than the PT List. These trends were seen to increase as sample size increased. All three DIF analyses, and the combined SIBTEST and Crossing SIBTEST, generally were found to perform less well as DIF contaminated the matching subtest, as well as when DIF was modelled less severely or when the focal group ability was skewed. While the combined SIBTEST and Crossing SIBTEST was found to have the highest power among the DIF analyses, it also was found to have Type I error rates that were sometimes extremely high.