
Psychological Test Construction

Terms

What is RELEVANCE in Item Analysis? (This domain is related to the ethical use of tests.)
The extent to which items contribute to the stated goals of testing.
The first dimension of Relevance is
CONTENT APPROPRIATENESS
If an item is content appropriate, it assesses the behavior domain the test is intended to evaluate.
2nd dimension of Relevance: TAXONOMIC LEVEL
Does the item reflect the appropriate cognitive or ability level of the population it is intended for?
3rd dimension of Relevance: EXTRANEOUS ABILITIES
To what extent does the item require knowledge or skills outside the domain being evaluated?
ITEM DIFFICULTY
The percentage of examinees who answer an item correctly. p = 1.0 means all answered correctly; p = 0 means none did, so lower p values indicate more difficult items. Items with p near .50 are typically retained to ensure a moderate difficulty level, except on true/false items, where the optimal value is about .75 (midway between the chance level of .50 and a perfect 1.0).
ITEM DISCRIMINATION
The extent to which an item differentiates between examinees who obtain high versus low total scores. D = H (proportion of the highest scorers who answered correctly) minus L (proportion of the lowest scorers who answered correctly). D values of .35 or greater are generally acceptable.
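A minimal sketch of both statistics, assuming a hypothetical 0/1 response matrix (examinees by items); the data and the thirds-based extreme grouping are illustrative choices, not a fixed rule:

```python
import numpy as np

# Hypothetical response matrix: rows = examinees, columns = items (1 = correct).
responses = np.array([
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
])

# Item difficulty: proportion of examinees answering each item correctly.
p = responses.mean(axis=0)  # lower p = more difficult item; ~.50 is ideal

# Item discrimination: D = proportion correct among the highest scorers
# minus proportion correct among the lowest scorers.
totals = responses.sum(axis=1)
order = np.argsort(totals)
n_group = len(responses) // 3          # extreme-groups size (illustrative)
low = responses[order[:n_group]]
high = responses[order[-n_group:]]
D = high.mean(axis=0) - low.mean(axis=0)

print("p:", p)   # item difficulty indexes
print("D:", D)   # values of .35 or greater are generally acceptable
```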
CLASSICAL TEST THEORY
Obtained scores reflect truth plus error; item and test parameters are sample dependent. Issues considered: item difficulty, reliability, and validity.
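In symbols, the classical model decomposes each obtained score into a true component and an error component, and the observed score variance accordingly:

```latex
X = T + E, \qquad \sigma^2_X = \sigma^2_T + \sigma^2_E
```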
ITEM RESPONSE THEORY (IRT)
Test interpretation is based on the examinee's level on the trait being measured rather than on the total test score.
"Item Characteristic Curve".
Proportion of ppl who answered correctly against the total test score, or on an external criterion, or a derived estimate of ability
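In modern IRT applications the ICC is usually modeled with a logistic function; a minimal sketch of the three-parameter logistic (3PL) curve, with made-up parameter values:

```python
import numpy as np

def icc_3pl(theta, a=1.0, b=0.0, c=0.0):
    """Three-parameter logistic item characteristic curve.

    theta: examinee ability
    a: item discrimination (slope at the curve's steepest point)
    b: item difficulty (ability level at the curve's inflection)
    c: guessing parameter (lower asymptote)
    """
    return c + (1 - c) / (1 + np.exp(-a * (theta - b)))

# Probability of a correct response rises with ability.
print(icc_3pl(np.linspace(-3, 3, 7), a=1.2, b=0.5, c=0.2))
```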
RELIABILITY
The ability of a measure to provide consistent, dependable results.
RELIABILITY COEFFICIENT
The proportion of variability in obtained test scores that reflects true score variability. Reliability coefficients are interpreted directly and are never squared.
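As a formula, the reliability coefficient is the ratio of true score variance to obtained score variance, which is why it is read directly as a proportion rather than squared:

```latex
r_{xx} = \frac{\sigma^2_T}{\sigma^2_X}
```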
TEST-RETEST RELIABILITY
The same test is administered to the same group on two different occasions, and the two sets of scores are correlated.
ALTERNATE FORM RELIABILITY
Two equivalent forms of the test are administered, usually on different occasions, and the consistency of responding across the versions is assessed.
INTERNAL CONSISTENCY RELIABILITY: 2 types:
A. Split Half
B. Coefficient Alpha
The test is administered once to a single group, and a coefficient of internal consistency is calculated.
Split-Half
Two scores are derived by splitting the test into equal halves (often odd- versus even-numbered items), and the halves are then correlated. This often underestimates the true reliability, so it is corrected with the Spearman-Brown prophecy formula, which estimates what the reliability coefficient would have been for a full-length test.
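A minimal sketch of the prophecy formula; with n = 2 it corrects a split-half correlation up to an estimate for the full-length test:

```python
def spearman_brown(r, n=2.0):
    """Spearman-Brown prophecy formula: n*r / (1 + (n - 1)*r).

    r: reliability of the shortened test (e.g., half-test correlation)
    n: factor by which the test is lengthened (n=2 corrects a split half)
    """
    return (n * r) / (1 + (n - 1) * r)

print(round(spearman_brown(0.70), 2))  # a .70 half-test r corrects to 0.82
```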
Cronbach's Coeff. Alpha
The test is given once and a formula is applied to determine reliability. The result is the average reliability that would be obtained from all possible splits of the test.
Kuder-Richardson Formula (KR-20)
A variation of Cronbach's alpha used when items are scored dichotomously (e.g., yes/no, correct/incorrect).
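A minimal sketch of coefficient alpha, assuming a hypothetical examinee-by-item score matrix; because the example items are dichotomous, the same computation yields KR-20:

```python
import numpy as np

def cronbach_alpha(items):
    """Coefficient alpha: k/(k-1) * (1 - sum(item variances)/variance(totals)).

    With dichotomously scored (0/1) items this reduces to KR-20.
    """
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical dichotomous responses: rows = examinees, columns = items.
scores = [[1, 1, 1, 0],
          [1, 1, 0, 0],
          [1, 0, 0, 0],
          [1, 1, 1, 1],
          [0, 0, 0, 0]]
print(round(cronbach_alpha(scores), 2))  # 0.8 for this made-up data
```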
Inter-Rater Reliability
Reliability determined by the percentage of agreement between two or more raters. Associated statistic: the kappa statistic, which corrects for chance agreement.
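A minimal sketch of Cohen's kappa from a hypothetical 2x2 agreement table (rows = rater 1's categories, columns = rater 2's):

```python
import numpy as np

def cohens_kappa(table):
    """Cohen's kappa: (p_observed - p_chance) / (1 - p_chance)."""
    table = np.asarray(table, dtype=float)
    n = table.sum()
    p_obs = np.trace(table) / n                      # observed agreement
    p_chance = (table.sum(axis=1) * table.sum(axis=0)).sum() / n**2
    return (p_obs - p_chance) / (1 - p_chance)

# Hypothetical: two raters classifying 50 cases into two diagnoses.
print(round(cohens_kappa([[20, 5], [10, 15]]), 2))  # 0.4
```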
Standard Error of Measurement (SEM)
An index of the amount of error that can be expected in a person's obtained scores due to the unreliability of the test. The greater the reliability, the smaller the SEM. Formula: SEM = SD × √(1 − r_xx).
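A minimal sketch using the formula above, with hypothetical IQ-style values (SD = 15, reliability = .91):

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SEM = SD * sqrt(1 - r_xx)."""
    return sd * math.sqrt(1 - reliability)

s = sem(15, 0.91)   # 4.5 for this hypothetical scale
score = 100
# 95% confidence interval around the obtained score: X +/- 1.96 * SEM
print(f"SEM = {s:.1f}; 95% CI = {score - 1.96 * s:.1f} to {score + 1.96 * s:.1f}")
```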
Standard Error of Estimation(SEE)
Another form of the SEM that takes regression to the mean into account. The SEE formula is used with the WAIS and WMS. It centers the confidence interval on the estimated true score rather than the observed score, correcting for true-score regression toward the mean.
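One common formulation (an assumption here, since the card does not give the formulas): the estimated true score regresses the obtained score toward the mean, and the interval is built with the standard error of estimation:

```latex
\hat{T} = \bar{X} + r_{xx}\,(X - \bar{X}), \qquad
SE_E = SD\,\sqrt{r_{xx}\,(1 - r_{xx})}
```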
Validity
Test accuracy: does the test measure what it is intended to measure?
Content Validity
The extent to which a test adequately samples the content or behavior domain it is trying to measure.
What is the primary way that content validity is established?
Answer: The judgment of subject matter experts. If experts agree the items are adequate and representative, the test is said to have content validity.
What evidence do you look for in a test that has good content validity?
1) The coefficient of internal consistency will be large.

2) The test will correlate highly with other tests of the same domain.

3) Pre- and posttest evaluations of a program designed to increase familiarity with the domain will indicate appropriate changes.
Construct validity
When the test has been found to measure the trait or hypothetical construct that it is intended to measure
In order to establish construct validity, what must occur?
Answer: There needs to have been a systematic accumulation of evidence showing that the test actually measures the construct it was designed to measure (e.g., intelligence, self-esteem).
Convergent Validity
One method of evaluating a test's construct validity: test scores are correlated with scores on measures that do and do not purport to assess the same trait. High correlations with measures of the same trait provide evidence of convergent validity.
Discriminant Validity
Another aspect of construct validity: low correlations with measures of unrelated characteristics provide evidence of discriminant validity.
Multitrait-Multimethod Matrix
A method of systematically organizing data to assess a test's convergent and discriminant validity. A matrix table of correlation coefficients is generated. It requires two or more traits that have each been assessed using two or more methods.
Factor Analysis
A statistical analysis conducted to identify the minimum number of common factors required to account for the intercorrelations among a set of tests, subtests, or test items.
What method could you use to see if there is good construct validity and associated good convergent and discriminant validity?
Factor Analysis
Question: How do you determine the meaning of a factor loading and the amount of variability in test scores that is explained by the factor?
Square the factor loading (the correlation coefficient obtained in the factor analysis). For example, a loading of .60 means the factor explains (.60)^2 = .36, or 36%, of the variability in test scores.
The correlation between a test and a factor is referred to as a what?
A Factor Loading
Communality
Indicates the total amount of variability in test scores that is explained by the identified factors combined (e.g., Factor I and Factor II).
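For uncorrelated (orthogonal) factors, the communality is the sum of the squared factor loadings; a worked example with hypothetical loadings of .60 on Factor I and .40 on Factor II:

```latex
h^2 = \sum_i a_i^2 = (.60)^2 + (.40)^2 = .36 + .16 = .52
```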
Specificity
The portion of true score variability that has not been explained by the factor analysis
Orthogonal Rotation
When a rotation is orthogonal, the resulting factors are uncorrelated. The attribute measured by one factor is independent from the attributes measured by the other factor.
Oblique Rotation
When the rotation is oblique, the resulting factors are correlated and the attributes measured by the factors are not independent.
Criterion-Related Validity
This is especially important whenever test scores are used to draw conclusions about an examinee's likely standing or performance on another measure (the criterion).
Which type of validity is key in a situation where the goal of testing is to predict how well an applicant will do on a measure of job performance after they are hired?
Answer: Criterion-related validity. When the resulting criterion-related validity coefficient is sufficiently large, this confirms that the predictor (or test) has criterion-related validity.
Concurrent Validity
(associated with criterion related validity)
When criterion data are collected prior to, or at the same time as, the predictor data.
Predictive Validity
Criterion data are collected after predictor data. Preferred when the purpose of testing is to predict future performance on the criterion.
How do you interpret criterion-related validity coefficients?
You square the correlation coefficient to interpret it. Unlike a reliability coefficient, a validity coefficient is squared because it represents the correlation between two different tests or other variables.
What is shared variability?
When the correlation between two measures is squared, it provides a measure of shared variability. For example, a validity coefficient of .50 means the predictor and criterion share (.50)^2 = .25, or 25%, of their variability.
Standard Error of Estimate (SEE)
Used to construct a confidence interval around a predicted (estimated) criterion score. The magnitude of the SEE is affected by the standard deviation of the criterion scores and by the predictor's criterion-related validity coefficient.
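Both influences appear directly in the standard formula, where SD_Y is the criterion standard deviation and r_XY is the validity coefficient:

```latex
SE_{est} = SD_Y\,\sqrt{1 - r_{XY}^2}
```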
Incremental Validity
The increase in correct decisions that can be expected if the predictor is used as a decision-making tool. Criterion and predictor cutoff scores must be set.
True Positives
Predicted to succeed by the predictor and are successful on the criterion
False positives
Predicted to succeed by the predictor and are not successful on the criterion
True Negatives
Predicted to be unsuccessful by predictor and are unsuccessful on the criterion
False Negatives
Predicted to be unsuccessful by the predictor and are successful on the criterion
Base rate
The proportion of people who were selected without use of the predictor and who are currently considered successful on the criterion
Positive Hit Rate
The true positives divided by the total positives. Positives are people identified by the predictor as having the disorder; negatives are people not identified by the predictor as having the disorder.
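A minimal sketch tallying the four outcomes and the two rates from hypothetical (predicted success, actual success) pairs:

```python
def decision_stats(outcomes):
    """Tally predictor/criterion outcomes from (predicted, actual) booleans."""
    tp = sum(p and a for p, a in outcomes)           # true positives
    fp = sum(p and not a for p, a in outcomes)       # false positives
    tn = sum(not p and not a for p, a in outcomes)   # true negatives
    fn = sum(not p and a for p, a in outcomes)       # false negatives
    return {
        "TP": tp, "FP": fp, "TN": tn, "FN": fn,
        "positive hit rate": tp / (tp + fp),         # TP / total positives
        "base rate": (tp + fn) / len(outcomes),      # proportion successful
    }

# Hypothetical: 10 applicants, (predicted to succeed, succeeded on criterion).
data = [(True, True), (True, True), (True, False), (False, False),
        (False, True), (True, True), (False, False), (True, False),
        (False, False), (True, True)]
print(decision_stats(data))  # positive hit rate = 4/6, base rate = 5/10
```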
