Psychological Test Construction
Terms
- What is RELEVANCE in Item Analysis? (This domain relates to the ethical use of tests.)
- The extent to which test items contribute to the stated goals of testing
- 1st dimension of Relevance: CONTENT APPROPRIATENESS
- If the content is appropriate, the item assesses the behavior domain the test is intended to evaluate
- 2nd dimension of Relevance: TAXONOMIC LEVEL
- Does the item reflect the appropriate cognitive or ability level of the population it is intended for?
- 3rd dimension of Relevance: EXTRANEOUS ABILITIES
- To what extent does the item require knowledge or skills outside the domain being evaluated?
- ITEM DIFFICULTY
- The percentage of examinees who answer an item correctly. p = 1 means all answered correctly; p = 0 means none did, so lower p values indicate more difficult items. Items near p = .50 are typically retained to ensure a moderate difficulty level, except on true/false tests, where the optimal value is about .75 (see the formula below).
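In symbols, with the usual rule of thumb that optimal difficulty is the midpoint between a perfect score and the chance-level score (which is where the .75 figure for true/false items comes from):

```latex
p = \frac{\text{number answering correctly}}{\text{total number of examinees}},
\qquad
p_{\text{optimal}} \approx \frac{1.0 + \text{chance rate}}{2}
\quad \left(\text{true/false: } \frac{1.0 + .50}{2} = .75\right)
```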
- ITEM DISCRIMINATION
- The extent to which an item differentiates between examinees who obtain high versus low total scores. D = U - L, where U and L are the proportions of the highest- and lowest-scoring groups answering the item correctly; D = .35 or greater is acceptable.
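The discrimination index in symbols, with an illustrative (made-up) pair of values:

```latex
D = U - L
\qquad \text{e.g., } U = .80,\; L = .30 \;\Rightarrow\; D = .50 \;(\ge .35,\ \text{acceptable})
```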
- CLASSICAL TEST THEORY
- Obtained scores reflect true score plus error; item and test parameters are sample-dependent. Issues considered: item difficulty, reliability, validity.
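The decomposition behind "truth and error" in classical test theory:

```latex
X = T + E, \qquad \sigma^2_X = \sigma^2_T + \sigma^2_E
```

where X is the obtained score, T the true score, and E random measurement error.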
- ITEM RESPONSE THEORY (IRT)
- Interprets test performance in terms of the examinee's level on the latent trait being measured rather than the total test score.
- "Item Characteristic Curve" (ICC)
- Plots the proportion of examinees who answered an item correctly against the total test score, an external criterion, or a derived estimate of ability
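A minimal sketch of how an item characteristic curve is generated under one common IRT model, the three-parameter logistic; the parameter values here are hypothetical, chosen only to illustrate the shape:

```python
import math

def icc_3pl(theta, a, b, c):
    """P(correct) under the three-parameter logistic (3PL) IRT model.

    theta: examinee ability
    a: item discrimination   b: item difficulty   c: guessing parameter
    """
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))

# An item of average difficulty (b=0), moderate discrimination (a=1.2),
# and a 25% guessing floor (c=.25): probability rises with ability.
for theta in (-2, -1, 0, 1, 2):
    print(f"theta={theta:+d}  P(correct)={icc_3pl(theta, a=1.2, b=0.0, c=0.25):.2f}")
```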
- RELIABILITY
- The ability of a measure to provide consistent, dependable results.
- RELIABILITY COEFFICIENT
- The proportion of variability in obtained test scores that reflects true score variability. Reliability coefficients are never squared; they are interpreted directly.
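As a ratio, with a direct (unsquared) interpretation:

```latex
r_{xx} = \frac{\sigma^2_T}{\sigma^2_X}
\qquad \text{e.g., } r_{xx} = .80 \;\Rightarrow\; 80\% \text{ of observed score variance is true score variance}
```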
- TEST-RETEST RELIABILITY
- Administering the same test to the same group on two different occasions and correlating the two sets of scores.
- ALTERNATE FORM RELIABILITY
- Two EQUIVALENT FORMS are administered, usually at different times, and the consistency of responding across the different versions of the test is assessed.
- INTERNAL CONSISTENCY RELIABILITY
- The test is administered once to a single group and a coefficient of internal consistency is calculated. Two types: (A) split-half; (B) coefficient alpha.
- Split-Half
- Two scores are derived by splitting the test into equal halves (often odd- versus even-numbered items) and correlating them. This often underestimates true reliability because each half is shorter than the full test; the Spearman-Brown prophecy formula corrects this by estimating what the reliability coefficient would have been for the full-length test (see below).
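The Spearman-Brown prophecy formula for a test split into halves, with a worked value:

```latex
r_{\text{full}} = \frac{2\,r_{hh}}{1 + r_{hh}}
\qquad \text{e.g., } r_{hh} = .60 \;\Rightarrow\; r_{\text{full}} = \frac{1.20}{1.60} = .75
```

(The general form for lengthening a test by a factor of n is r_new = n·r / (1 + (n - 1)·r).)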
- Cronbach's Coeff. Alpha
- The test is given once and a formula is applied to determine reliability. The result is the average reliability that would be obtained from all possible splits of the test.
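The usual formula, where k is the number of items, the numerator sums the individual item variances, and the denominator is the variance of total test scores:

```latex
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma^2_i}{\sigma^2_X}\right)
```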
- Kuder-Richardson Formula
- A variation of Cronbach's alpha used when items are scored dichotomously (e.g., yes/no, correct/incorrect)
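KR-20 replaces the item variances with p·q per item, where p is the proportion passing the item and q = 1 - p:

```latex
KR_{20} = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} p_i q_i}{\sigma^2_X}\right)
```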
- Inter-Rater Reliability
- Reliability determined by the percentage of agreement between two or more raters. Associated statistic: Cohen's kappa, which corrects for chance agreement.
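Kappa adjusts observed agreement for the agreement expected by chance:

```latex
\kappa = \frac{p_o - p_e}{1 - p_e}
```

where p_o is the observed proportion of agreement and p_e the proportion expected by chance.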
- Standard Error of Measurement (SEM)
- An index of the amount of error that can be expected in a person's obtained score due to the unreliability of the test. The greater the reliability, the smaller the SEM. Know the formula (below).
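The formula, with a worked example using WAIS-like values (SD = 15 is the usual IQ metric; the reliability value is illustrative):

```latex
SEM = SD\sqrt{1 - r_{xx}}
\qquad \text{e.g., } SD = 15,\; r_{xx} = .91 \;\Rightarrow\; SEM = 15\sqrt{.09} = 4.5
```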
- Standard Error of Estimation (SEE)
- Another form of the SEM that takes regression to the mean into account; used with the WAIS and WMS. It centers the confidence interval on the estimated true score rather than the observed score, correcting for true-score regression toward the mean.
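One common formulation of the estimated true score and its standard error; treat the exact expressions as an assumption to verify against your own study materials:

```latex
\hat{T} = \bar{X} + r_{xx}\,(X - \bar{X}),
\qquad
SE_E = SD\sqrt{r_{xx}(1 - r_{xx})} = SEM\sqrt{r_{xx}}
```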
- Validity
- Test accuracy: does the test measure what it is intended to measure?
- Content Validity
- The extent to which a test adequately samples the content or behavior domain it is intended to measure
- What is the primary way that content validity is established?
- Answer: The judgment of subject matter experts. If experts agree the items are adequate and representative, the test is said to have content validity.
- What qualitative evidence do you look for in a test that has good content validity?
- 1) The coefficient of internal consistency will be large
  2) The test will correlate highly with other tests of the same domain
  3) Pre- and post-test evaluations of a program designed to increase familiarity with the domain will indicate appropriate changes
- Construct Validity
- When the test has been found to measure the trait or hypothetical construct that it is intended to measure
- In order to establish construct validity, what must occur?
- Answer: There needs to have been a systematic accumulation of evidence showing that the test actually measures the construct it was designed to measure (e.g., intelligence, self-esteem)
- Convergent Validity
- One method of evaluating a test's construct validity: correlate test scores with scores on measures that do and do not purport to assess the same trait. High correlations with measures of the same trait provide evidence of convergent validity.
- Discriminant Validity
- Another aspect of construct validity: low correlations with measures of unrelated characteristics provide evidence of discriminant validity
- Multitrait-Multimethod Matrix
- A method of systematically organizing data to assess a test's convergent and discriminant validity. A matrix table of correlation coefficients is generated; it requires two or more traits that have each been assessed using two or more methods.
- Factor Analysis
- A statistical analysis conducted to identify the minimum number of common factors required to account for the intercorrelations among a set of tests, subtests, or test items.
- What method could you use to see if there is good construct validity and associated good convergent and discriminant validity?
- Factor Analysis
- Question: How do you determine the meaning of a factor loading and the amount of variability in test scores that is explained by the factor?
- Square the correlation coefficient (the factor loading) obtained in the factor analysis (see below).
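For example:

```latex
\text{loading} = .60 \;\Rightarrow\; .60^2 = .36 \;\Rightarrow\; 36\% \text{ of test score variance explained by the factor}
```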
- The correlation between a test and a factor is referred to as a what?
- A Factor Loading
- Communality
- Indicates the total amount of variability in test scores that is explained by the identified common factors (e.g., Factor I and Factor II)
- Specificity
- The portion of true score variability that has not been explained by the factor analysis
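A sketch of the arithmetic, assuming a two-factor solution with hypothetical loadings; the specificity relation (reliability minus communality) is the standard decomposition but should be checked against your source:

```latex
h^2 = \sum_j a_j^2 = .50^2 + .40^2 = .41,
\qquad
\text{specificity} = r_{xx} - h^2
```

where each a_j is the test's loading on factor j and r_xx is the test's reliability coefficient.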
- Orthogonal Rotation
- When a rotation is orthogonal, the resulting factors are uncorrelated. The attribute measured by one factor is independent from the attributes measured by the other factor.
- Oblique Rotation
- When the rotation is oblique, the resulting factors are correlated and the attributes measured by the factors are not independent.
- Criterion-Related Validity
- Especially important whenever test scores are used to draw conclusions about an examinee's likely standing or performance on another measure (the criterion)
- Which type of validity is key in a situation where the goal of testing is to predict how well an applicant will do on a measure of job performance after they are hired?
- Answer: Criterion-related validity. When the resulting criterion-related validity coefficient is sufficiently large, this confirms that the predictor (or test) has criterion-related validity.
- Concurrent Validity (a form of criterion-related validity)
- When criterion data are collected prior to or at the same time as predictor data
- Predictive Validity
- Criterion data are collected after predictor data. Preferred when the purpose of testing is to predict future performance on the criterion.
- How do you interpret criterion-related validity coefficients?
- Square the correlation coefficient. Unlike a reliability coefficient, a validity coefficient represents the correlation between two different tests or other variables, so its square is meaningful: it gives the proportion of variability in the criterion that is explained by the predictor.
- What is shared variability?
- When the correlation between two measures is squared, it provides a measure of shared variability.
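For example:

```latex
r_{xy} = .50 \;\Rightarrow\; r_{xy}^2 = .25 \;\Rightarrow\; 25\% \text{ shared variability}
```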
- Standard Error of Estimate (SEE)
- Used to construct a confidence interval around a predicted or estimated criterion score. The magnitude of the SEE is affected by the standard deviation of the criterion scores and by the predictor's criterion-related validity coefficient (see below).
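In symbols, with illustrative numbers:

```latex
SEE = SD_Y\sqrt{1 - r_{XY}^2}
\qquad \text{e.g., } SD_Y = 10,\; r_{XY} = .60 \;\Rightarrow\; SEE = 10\sqrt{1 - .36} = 8
```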
- Incremental Validity
- The increase in correct decisions that can be expected if the predictor is used as a decision-making tool. Criterion and predictor cutoff scores must be set.
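In the decision-making framework defined by the entries below, incremental validity is typically computed as:

```latex
\text{incremental validity} = \text{positive hit rate} - \text{base rate}
```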
- True Positives
- Predicted to succeed by the predictor and are successful on the criterion
- False positives
- Predicted to succeed by the predictor and are not successful on the criterion
- True Negatives
- Predicted to be unsuccessful by predictor and are unsuccessful on the criterion
- False Negatives
- Predicted to be unsuccessful by the predictor and are successful on the criterion
- Base rate
- The proportion of people who were selected without use of the predictor and who are currently considered successful on the criterion
- Positive Hit Rate
- The true positives divided by the total positives. Positives are people identified by the predictor as having the disorder; negatives are people not identified by the predictor as having the disorder.
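A minimal sketch tying the decision outcomes together; all counts are hypothetical:

```python
# Hypothetical decision counts from predictor and criterion cutoffs.
tp, fp, tn, fn = 40, 10, 35, 15  # true/false positives and negatives

# Of those the predictor selects, the proportion actually successful.
positive_hit_rate = tp / (tp + fp)            # 40/50 = .80

# Proportion successful on the criterion among everyone,
# i.e., the success rate without using the predictor.
base_rate = (tp + fn) / (tp + fp + tn + fn)   # 55/100 = .55

print(f"positive hit rate:    {positive_hit_rate:.2f}")
print(f"base rate:            {base_rate:.2f}")
print(f"incremental validity: {positive_hit_rate - base_rate:.2f}")  # .25
```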