EPPP Test Construction 2
Terms
- Reliability Coefficient
-
Measure of how much of the variability in obtained scores reflects true ability
-Interpret directly (0.70 means 70% of score variance is true variance, 30% is error variance)
-A good test should have a reliability coefficient of at least 0.7
- Classical Test Theory
-
An obtained score is made up of:
1. True Score (Ability)
-true variance
2. Some Error (Fatigue...)
-error variance
- Reliability
-
-Establish reliability first (Test can be reliable but not valid.)
-Consistency
- Validity
- Accuracy
- Validity cannot exceed...
- the square root of reliability
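The variance split from the Classical Test Theory card and the validity ceiling can be checked with a few lines of arithmetic. This is a minimal sketch: the function names and the variance values (70 and 30) are made up for illustration and are not part of the deck.

```python
import math


def reliability_from_variances(true_var: float, error_var: float) -> float:
    """Proportion of obtained-score variance that is true-score variance."""
    return true_var / (true_var + error_var)


def max_validity(reliability: float) -> float:
    """Ceiling on a validity coefficient: the square root of reliability."""
    return math.sqrt(reliability)


# Hypothetical test: 70 units of true variance, 30 units of error variance.
r_xx = reliability_from_variances(true_var=70.0, error_var=30.0)
print(round(r_xx, 2))                # 0.7
print(round(max_validity(r_xx), 3))  # 0.837
```

With a reliability of 0.7, no validity coefficient for that test could be expected to exceed about 0.84.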
- Types of Reliability
-
1. Test-retest reliability (Coefficient of stability)
2. Alternate Forms (Considered the best but least used)
3. Internal Consistency (Compares test against itself)
- Types of Internal Consistency Reliability
-
1. Split-Half (split the test in two and correlate the halves; problem is that each half is shorter, which restricts the range of scores and lowers the estimate)
-can use Spearman-Brown Prophecy Formula to estimate what the reliability would be at full test length
2. Inter-Item Consistency (compare the items on one test against one another in a systematic way)
-can use Cronbach's Alpha (compare each item on the test against all the others systematically) or Kuder-Richardson Formula 20 (special case of Cronbach's Alpha; use when test items are dichotomous, such as true/false or yes/no)
- Kappa Coefficient
- Inter-rater reliability
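The formulas named on the internal consistency and kappa cards can be sketched as below. This is an illustrative sketch under stated assumptions, not a reference implementation: the function names and the tiny data sets are invented, the kappa shown is the two-rater (Cohen's) version, and variances are computed as population variances.

```python
from statistics import pvariance


def spearman_brown(r_half: float, length_factor: float = 2.0) -> float:
    """Projected reliability when the test is lengthened by length_factor."""
    return (length_factor * r_half) / (1 + (length_factor - 1) * r_half)


def cronbach_alpha(item_scores: list[list[float]]) -> float:
    """item_scores[i][j] is person i's score on item j.

    With dichotomous (0/1) items this gives the same value as KR-20.
    """
    k = len(item_scores[0])
    item_vars = [pvariance([row[j] for row in item_scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def cohen_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Agreement between two raters, corrected for chance agreement."""
    n = len(rater_a)
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    categories = set(rater_a) | set(rater_b)
    p_chance = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories
    )
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical data: 4 test takers, 3 true/false items (1 = correct).
answers = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(round(spearman_brown(0.6), 2))       # 0.75
print(round(cronbach_alpha(answers), 2))   # 0.75
print(round(cohen_kappa(["y", "y", "n", "n"], ["y", "n", "n", "n"]), 2))  # 0.5
```

In the first line of output, a half-test correlation of 0.6 projects to 0.75 for the full-length test, which is why the split-half estimate is usually corrected upward.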
- Standard Error of Measurement
-
-Based on reliability coefficient
-Try to get an idea of what a person's true ability is
-Based on a person's single score but has properties of a normal curve
-the more reliable the test, the smaller the standard error of measurement
- Standard Error of Mean
- How well will the sample represent the population?
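The two standard errors above are easy to confuse, so a small worked sketch may help. It assumes the usual textbook formulas (standard error of measurement = SD × √(1 − reliability); standard error of the mean = SD / √N); the function names, the SD of 15, and the reliability of 0.91 are illustrative assumptions, not values from the deck.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Expected spread of one person's obtained scores around the true score."""
    return sd * math.sqrt(1 - reliability)


def true_score_band(observed: float, sd: float, reliability: float,
                    z: float = 1.96) -> tuple[float, float]:
    """Normal-curve band around a single obtained score (about 95% at z=1.96)."""
    sem = standard_error_of_measurement(sd, reliability)
    return observed - z * sem, observed + z * sem


def standard_error_of_mean(sd: float, n: int) -> float:
    """Expected spread of sample means around the population mean."""
    return sd / math.sqrt(n)


# Hypothetical test with SD = 15 and reliability = 0.91.
print(round(standard_error_of_measurement(15, 0.91), 2))  # 4.5
low, high = true_score_band(100, 15, 0.91)
print(round(low, 2), round(high, 2))                      # 91.18 108.82
print(standard_error_of_mean(15, 100))                    # 1.5
```

Raising the reliability toward 1.0 drives the standard error of measurement toward 0; raising the sample size drives the standard error of the mean toward 0.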
- It is best to have ___________ items and _____________ test takers for a test to be most reliable.
-
Homogeneous items
Heterogeneous test takers
- Content Validity
-
Based on expert judgement
-used with academic tests
- Criterion-Related Validity
-
Based on an outcome (the criterion)
-look at relationship between predictor and outcome
-used most often in personnel psych (predicting job performance, etc.)
-two types are predictive validity (predicts future behavior, e.g., who will become schizophrenic?) and concurrent validity (test results NOW, e.g., who is schizophrenic now?)
- Construct Validity
-
The construct cannot be directly defined or observed
-Two types are convergent (compare new test with an established test that measures the same construct) and divergent, also called discriminant validity (you want your test to have nothing in common with a test of a different construct)
- Multitrait-Multimethod Matrix
-
If it's a single trait measured by different methods (monotrait), need a HIGH monotrait coefficient to establish convergent validity
-If it's different traits (heterotrait), need a LOW heterotrait coefficient to establish divergent validity
- Face Validity
- Does the test make sense to the people who are taking it?
- Cross Validation
-
Give the test instrument again to a new sample to re-check its validity
-Shrinkage may occur (the validity coefficient tends to shrink slightly when you initially cross validate an instrument)
- Incremental Validity
- Can we increase the number of correct decisions we are already making?
- Three things to establish Incremental Validity
-
1. base rate - moderate (proportion of decisions you are already making correctly)
2. selection ratio - need low selection ratio (number of jobs available relative to number of applicants)
3. validity coefficient - high (strong relationship between predictor and criterion)
- Criterion-Referenced Scores
- Do not compare score to anyone else's, just to whether it meets a standard
- Norm-Referenced Scores
-
-Score is compared to other individuals
-Two types: percentile ranks (not used as much now) and standard scores (transformed scores that allow you to compare across tests)
- Floor Effect
-
-test takers bunch up at the bottom of the test's score range
-need to have enough easy items
- Ceiling Effect
- Need to have enough difficult items to discriminate between the best test takers
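For the two kinds of norm-referenced scores named above (percentile ranks and standard scores), a small sketch of the transformations follows. The raw scores and function names are invented for illustration, the percentile rank here uses one common definition (percent of the group scoring below a given score), and the T-score scale (mean 50, SD 10) is just one example of a standard score.

```python
from statistics import mean, pstdev


def z_scores(raw: list[float]) -> list[float]:
    """Distance of each raw score from the group mean, in SD units."""
    m, s = mean(raw), pstdev(raw)
    return [(x - m) / s for x in raw]


def t_scores(raw: list[float]) -> list[float]:
    """Standard scores rescaled to mean 50, SD 10."""
    return [50 + 10 * z for z in z_scores(raw)]


def percentile_rank(raw: list[float], score: float) -> float:
    """Percent of the group scoring below the given score."""
    return 100 * sum(x < score for x in raw) / len(raw)


# Hypothetical norm group of 8 raw scores (mean 5, SD 2).
group = [2, 4, 4, 4, 5, 5, 7, 9]
print(z_scores(group)[-1])        # 2.0
print(t_scores(group)[-1])        # 70.0
print(percentile_rank(group, 7))  # 75.0
```

A raw score of 9 means little on its own; as a z-score of 2.0 or a T-score of 70 it can be compared with scores from a test that uses a different raw scale.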