
Test construction EPPP

Terms

1) what does "p" stand for
2) what is the formula for calculating "p"
3) what is the range of "p"?
4) What do larger/smaller values mean?
5) many tests retain items with moderate difficulty levels. What value of p does this correspond to?
1) item difficulty index
2) total # examinees passing item / total # examinees
3) 0 - 1.0
4) larger = easy item
smaller = difficult item
5) p is close to .50
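A minimal Python sketch (not part of the original deck) of how p is computed; the response data are hypothetical:

```python
# Item difficulty index p = examinees passing the item / total examinees.
# 1 = passed the item, 0 = failed (hypothetical data).
responses = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]

p = sum(responses) / len(responses)
print(p)  # 0.7 -> a fairly easy item (larger p = easier)
```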
when p is close to .50:
1) test score variability increases/decreases?
2) test reliability increases/decreases?
3) discrimination between examinees increases/decreases?
4) what is the distribution
1) increase
2) increase
3) increase
4) normal
1) while most tests look for moderate difficulty levels (p close to .50), what type of test prefers retaining items with a higher p?
2) what value of p is usually optimal for these tests
1) true/false
2) .75 (whereas most other tests aim for moderate difficulty, p = .50)
1) what is item discrimination
2) to measure item discrimination, you calculate __. What is the symbol for this index?
3) how is it calculated
4) what is the range of this index
5) an item with what index is generally considered acceptable?
1) extent to which a test item discriminates BETWEEN examinees
2) item discrimination index (D)
3) D = U - L
U = % of the upper-scoring group who answered the item correctly
L = % of the lower-scoring group who answered the item correctly
4) -1.0 to +1.0
5) .35 or higher
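A minimal Python sketch (not from the deck) of D = U - L, using hypothetical pass counts for the upper- and lower-scoring groups:

```python
# Item discrimination index D = U - L, where U and L are the proportions
# of the upper- and lower-scoring groups who passed the item.
upper_passed, upper_n = 18, 20  # hypothetical counts
lower_passed, lower_n = 6, 20

U = upper_passed / upper_n  # 0.90
L = lower_passed / lower_n  # 0.30
D = U - L
print(round(D, 2))  # 0.6 -> above .35, so generally acceptable
```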
1) What is "D"
2) when D = +1.0, what does this mean
3) when D = -1.0, what does this mean
1) D = item discrimination index
2) all upper-scoring group got the item correct
none of the lower-scoring group
3) all of the lower-scoring group got the item correct
none of the upper-scoring group
classical test theory vs. item response theory
1) which is sample invariant (same across different samples)
2) which is sample dependent (varies from sample to sample)
1) item response theory
2) classical test theory
1) Item response theory involves deriving __ for EACH item
2) what does it show
1) item characteristic curve
2) level of ability AND probability of answering item correctly
On an item characteristic curve
1) what is on the x axis? y axis?
2) how do you determine difficulty level
3) how do you determine the item's ability to discriminate btwn high and low achievers
4) how do you determine the probability
1) x = ability level
y = probability of correct response
2) find the point on the curve where the probability of a correct response is .50, then read the corresponding ability level on the x axis
3) slope of the curve
4) y intercept (point at which the curve crosses the vertical axis) = proportion of low-ability examinees who answered the item correctly
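A minimal Python sketch of an item characteristic curve. The deck doesn't name a specific IRT model, so this assumes a 2-parameter logistic curve, where b is the ability level at which the probability of a correct response is .50 (difficulty) and a is the slope (discrimination); the parameter values are hypothetical:

```python
import math

def icc(theta, a=1.5, b=0.0):
    # Probability of a correct response at ability level theta
    # (2-parameter logistic form; a and b are hypothetical values).
    return 1 / (1 + math.exp(-a * (theta - b)))

print(round(icc(0.0), 2))  # 0.5 at theta = b, the item's difficulty level
print(round(icc(2.0), 2))  # ~0.95: high-ability examinees usually pass
```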
on an item characteristic curve, what does a steep slope indicate
the steeper the slope, the greater the item's ability to discriminate btwn high and low achievers
when using an achievement test developed on the basis of item response theory, an examinee's test score will indicate __
ability level
in classical test theory, an examinee's score is composed of 2 components: __ & __
true score and error
reliability provides an estimate of the proportion of variability in an examinee's score that is due to __
TRUE differences among examinees on attributes measured by test
reliability coefficient
1) range
2) what does a low "r" mean?
3) what does a high "r" mean?
1) 0.0 to +1.0
2) r = 0 -> all variability in score is due to error
3) r = +1 -> all variability reflects true score variability (reliable)
1) r with subscript "xx" stands for
2) r with subscript "xy" stands for
1) reliability coefficient
2) validity coefficient
a reliability coefficient of .84 indicates that
1) __% of variability in scores is due to TRUE score differences
2) __% is due to error
1) 84%
2) 16%
which method for estimating reliability is associated with:
1) degree of stability (consistency)
2) coefficient of equivalence
3) coefficient alpha
1) test-retest
2) alternate forms
3) internal consistency
test-retest reliability
1) what is primary source of measurement error
2) it is inappropriate for determining reliability of test measuring what type of attribute
1) time sampling factors
2) attribute that is unstable over time, or is affected by repeated measurements (e.g., mood)
which method for estimating reliability can be used for the following:
1) aptitude
2) mood
3) speeded test
1) test-retest, alternate forms, internal consistency
2) internal consistency
3) alternate forms
alternate forms reliability
1) 2 primary sources of measurement error
2) it is inappropriate for determining reliability of test measuring what type of attribute
1) content sampling and time sampling factors
2) attribute that is unstable over time, or is affected by repeated measurements (e.g., mood)
internal consistency reliability
1) methods for evaluating
2) not appropriate for assessing reliability of what type of test? It will produce a coefficient that is too high/low?
1) split-half and coefficient alpha
2) speeded; too high
1) split-half reliability assesses what type of reliability
2) it usually under/over? estimates a test's true reliability
3) how is this corrected
1) internal consistency
2) under
3) Spearman-Brown prophecy formula
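A minimal Python sketch of the Spearman-Brown prophecy formula, r_new = n * r / (1 + (n - 1) * r), where n is the factor by which test length changes; the coefficients are hypothetical:

```python
def spearman_brown(r, n):
    # Estimated reliability when test length is multiplied by n.
    return (n * r) / (1 + (n - 1) * r)

# n = 2 corrects a split-half coefficient up to full-length reliability:
print(round(spearman_brown(0.70, 2), 2))    # 0.82
# n = 0.5 shows that shortening a test lowers reliability:
print(round(spearman_brown(0.70, 0.5), 2))  # 0.54
```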
As the length of a test decreases, the reliability decreases/increases?
decreases
1) Cronbach's coefficient alpha assesses what type of reliability
2) it provides the lower/upper boundary of a test's reliability
1) internal consistency
2) lower
1) KR-20 is used for what type of reliability?
2) it is a variation of what other method
3) how does it differ
1) internal consistency
2) coefficient alpha
3) KR-20 is used when items are scored dichotomously
which method for evaluating internal consistency reliability is used when items are scored dichotomously
KR-20
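A minimal Python sketch of KR-20 for dichotomously scored items, using the standard formula KR-20 = (k / (k - 1)) x (1 - sum(p x q) / total-score variance); the score matrix is hypothetical:

```python
def kr20(item_scores):
    # item_scores[examinee][item], each entry 0 (fail) or 1 (pass).
    k = len(item_scores[0])   # number of items
    n = len(item_scores)      # number of examinees
    totals = [sum(row) for row in item_scores]
    mean_t = sum(totals) / n
    var_total = sum((t - mean_t) ** 2 for t in totals) / n
    pq = 0.0
    for i in range(k):
        p = sum(row[i] for row in item_scores) / n  # item difficulty
        pq += p * (1 - p)
    return (k / (k - 1)) * (1 - pq / var_total)

scores = [[1, 1, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0], [1, 1, 1, 1]]  # hypothetical
print(round(kr20(scores), 2))  # 0.67
```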
coefficient alpha
1) as the test content becomes more heterogeneous, coefficient alpha increases/decreases?
1) decreases
what correlation coefficient is used with inter-rater reliability
kappa statistic
for inter-rater reliability, percent agreement will provide an over/under? estimate of the test's reliability
overestimate
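A minimal Python sketch contrasting percent agreement with Cohen's kappa, which corrects for chance agreement (the reason plain percent agreement overestimates reliability); the ratings are hypothetical:

```python
def kappa(rater1, rater2):
    # Cohen's kappa = (observed agreement - chance agreement) / (1 - chance).
    n = len(rater1)
    p_obs = sum(a == b for a, b in zip(rater1, rater2)) / n
    p_chance = sum(
        (rater1.count(c) / n) * (rater2.count(c) / n)
        for c in set(rater1) | set(rater2)
    )
    return (p_obs - p_chance) / (1 - p_chance)

r1 = ["yes", "yes", "no", "yes", "no", "yes"]  # hypothetical ratings
r2 = ["yes", "no", "no", "yes", "no", "yes"]
print(round(kappa(r1, r2), 2))  # 0.67, vs. percent agreement of 5/6 = 0.83
```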
consensual observer drift will artificially inflate/deflate inter-rater reliability
inflate
1) what is the most thorough method for estimating reliability
2) which method is NOT appropriate for speed tests
1) alternate forms
2) internal consistency
what method is used to estimate the effects of lengthening and shortening a test on its reliability coefficient
Spearman-Brown
Spearman-Brown tends to over/under? estimate a test's true reliability
overestimate
when the range of scores is restricted, the reliability coefficient is high/low
low
is the reliability coefficient high or low when:
1) item has low difficulty
2) item has average difficulty
3) item has high difficulty
1) low
2) high
3) low
to maximize reliability coefficient
1) increase/decrease test length
2) increase/decrease range of scores
3) increase/decrease heterogeneity among examinees
4) increase/decrease the probability of guessing correctly
5) p should be
1) increase
2) increase
3) increase
4) decrease
5) .50 (average item difficulty)
a reliability coefficient of __ is considered acceptable
.80 or larger
1) what is the standard error of measurement
2) what is the standard error of estimate
1) used to construct a confidence interval around a measured (obtained) score
2) used to construct a confidence interval around a predicted (estimated) criterion score
what is the formula for
1) standard error of measurement
2) standard error of estimate
1) SD x square root of (1 minus the reliability coefficient)

2) SD x square root of (1 minus the validity coefficient squared)
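A minimal Python sketch of both formulas, plus a 95% confidence interval around an obtained score; the SD, reliability, and validity values are hypothetical:

```python
import math

sd, r_xx, r_xy = 15.0, 0.91, 0.60  # hypothetical SD, reliability, validity

sem = sd * math.sqrt(1 - r_xx)       # standard error of measurement: 4.5
see = sd * math.sqrt(1 - r_xy ** 2)  # standard error of estimate: 12.0

obtained = 110.0
ci_95 = (obtained - 1.96 * sem, obtained + 1.96 * sem)
print(round(sem, 1), round(see, 1), [round(x, 1) for x in ci_95])
# 4.5 12.0 [101.2, 118.8]
```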
obtained test scores tend to be inaccurate estimates of true scores
1) scores ABOVE the mean tend to over/under?estimate true scores
2) scores BELOW the mean tend to over/under?estimate true scores
1) over
2) under
when the standard error of measurement = __,
an examinee's obtained score can be interpreted as her true score
0
which of the following would be most appropriate for estimating the reliability of an anxiety measure
1) test-retest
2) alternate forms
3) coefficient alpha
3
what are the minimum and maximum values of the standard error of measurement
minimum = 0
maximum = SD of test scores
how do you establish content validity
judgment of subject matter experts
1) what are the types of construct validity
2) what methods are used to assess
1) convergent and discriminant
2) multitrait-multimethod matrix
AND
factor analysis
multitrait-multimethod matrix
1) which coefficient provides evidence of convergent validity? Is the coefficient large/small?
2) which coefficient provides evidence of discriminant validity? Is the coefficient large/small?
1) large monotrait-heteromethod
2) small heterotrait-monomethod
OR
small heterotrait-heteromethod
what does a factor analysis assess?
construct validity (convergent and discriminant)
In a factor matrix, correlation coefficients (factor loadings) indicate the degree of association btwn __ and __
each test and each factor
a test has a factor loading of .78 for Factor I. This means that __% of variability in test scores is accounted for by Factor I.
61% (.78 squared)
what is communality
total amount of variability in test scores explained by identified factors
Communality for a test is .64

This means that __% of variability in scores is explained by a combination of identified factors
64%

NOT squared, because it is already a squared value. Communality IS the amount of shared variance.
according to factor analysis, a test's reliability consists of what two components
communality and specificity

communality = factors tests share in common
specificity = factors specific to the test (not measured by other tests)
a communality is a lower/upper? estimate of a test's reliability coefficient
lower-limit
if a test has a communality of .64, the reliability coefficient will necessarily be __
.64 or larger
two types of rotations in a factor analysis
1) which one is associated with uncorrelated factors?
2) with correlated factors?
1) orthogonal
2) oblique
when factors are orthogonal/oblique?, a test's communality can be calculated from factor loadings
orthogonal
In factor analysis, when factors are orthogonal, how do you calculate communality?
communality = SUM of squared factor loadings
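A minimal Python sketch: with orthogonal factors, communality is the sum of the test's squared factor loadings; the loadings are hypothetical:

```python
# One test's row of a factor matrix (hypothetical loadings).
loadings = {"Factor I": 0.78, "Factor II": 0.20}

communality = sum(l ** 2 for l in loadings.values())
print(round(communality, 2))  # 0.65: proportion of variance the factors explain
```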
when a criterion-related validity coefficient is large, what does this indicate
predictor has criterion-related validity
what are the forms of criterion-related validity
concurrent and predictive
1) validity coefficients rarely exceed __
2) validity coefficients as low as __ might be acceptable
1) .60
2) .20-.30
how do you evaluate a predictor's incremental validity
scatterplot
for a scatterplot used to assess incremental validity, what determines:
1) positive/negative
2) true/false
1) predictor
2) criterion
how do you calculate incremental validity? How is each component calculated?
incremental validity = positive hit rate - base rate

base rate = (true positives + false negatives) / total # of people

positive hit rate = true positives / total positives (true positives + false positives)
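A minimal Python sketch of the calculation, using hypothetical counts from the four quadrants of the scatterplot:

```python
# Hypothetical scatterplot counts: true/false positives and negatives.
tp, fp, fn, tn = 30, 10, 20, 40

base_rate = (tp + fn) / (tp + fp + fn + tn)  # successful without the test: 0.50
positive_hit_rate = tp / (tp + fp)           # successful among those selected: 0.75

incremental_validity = positive_hit_rate - base_rate
print(incremental_validity)  # 0.25 -> 25% more successful employees
```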
if incremental validity = .34

test can be expected to increase the proportion of successful employees by __%
34
relationship btwn predictor reliability and validity
1) what is the equation

relationship btwn predictor AND criterion reliability and validity
2) what is the equation
1) predictor's criterion-related validity coefficient is less than or equal to (cannot exceed) the square root of its reliability coefficient

2) predictor's validity coefficient is less than or equal to (cannot exceed) the square root of the predictor's reliability coefficient TIMES the criterion's reliability coefficient
If a predictor has a reliability coefficient of .81, its validity coefficient will necessarily be __ (exact number)
.90 or less
1) what is the correction for attenuation formula used for
2) does it tend to over/under?estimate the actual validity coefficient
1) to estimate what a predictor's validity coefficient WOULD be if the predictor and/or criterion were perfectly reliable (reliability coefficients = 1.0)
2) overestimate
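A minimal Python sketch of the standard correction-for-attenuation formula, r_corrected = r_xy / square root of (r_xx x r_yy); the coefficients are hypothetical:

```python
import math

def corrected_validity(r_xy, r_xx=1.0, r_yy=1.0):
    # Estimated validity if predictor (r_xx) and/or criterion (r_yy)
    # were perfectly reliable; pass 1.0 for the one you are not correcting.
    return r_xy / math.sqrt(r_xx * r_yy)

# Observed validity .40, predictor reliability .81, criterion reliability .64:
print(round(corrected_validity(0.40, 0.81, 0.64), 2))  # 0.56 (an overestimate)
```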
criterion contamination
1) tends to inflate the relationship btwn __
2) results in an artificially high __ coefficient
1) predictor and criterion

2) criterion-related validity coefficient
when cross-validating a predictor on another sample, the cross-validation coefficient tends to __
shrink
"shrinkage" refers to the the shrinking of __ when __
validity coefficient when the predictor is cross-validated
norm-referenced vs. criterion-referenced
Which are the following:
1) percentile ranks
2) percentages
3) regression equation
4) z-score
5) IQ score
1) norm
2) criterion
3) criterion
4) norm
5) norm
distribution of percentile ranks has what kind of shape
flat (rectangular), regardless of the shape of the raw score distribution
what is the transformation called when:
1) distribution of transformed scores DIFFERS in shape from the distribution of raw scores
2) has SAME shape?
3) example of the first?
1) nonlinear transformation
2) linear transformation
3) percentile ranks
when using correction for guessing, the resulting distribution will have (compared to original distribution)
1) lower/higher mean
2) smaller/larger SD
1) lower
2) larger
