Test 2 for Final Preparation
Terms
- observational study
- observes individuals and measures variables of interest but doesn't attempt to influence responses; poor way to gauge the effects of an intervention; frequently confounded with lurking variables
- experiment
- deliberately imposes some treatment on individuals in order to observe their responses
- confounded
- the explanatory or lurking variables' effects on a response variable can't be distinguished from each other
- population
- the entire group of individuals that we want information about
- sample
- part of the population that we actually examine in order to gather information
- design of a sample
- the method used to choose the sample from the population; poor sample designs produce misleading conclusions
- voluntary response sample
- people who choose themselves by responding to a general appeal; biased because people with strong opinions, especially negative opinions, are most likely to respond
- convenience sampling
- chooses the individuals who are easiest to reach
- biased
- systematically favors certain outcomes
- simple random sample
- of size n consists of n individuals from the population chosen in such a way that every set of n individuals has an equal chance to be the sample actually selected
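A minimal sketch of drawing an SRS in Python, assuming the population can be listed in full. The function name `simple_random_sample`, the labels, and the seed are hypothetical choices for illustration only.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Return n distinct individuals; every set of n is equally likely."""
    rng = random.Random(seed)  # the seed is only there to make the run repeatable
    return rng.sample(population, n)

# Hypothetical population of 100 labeled individuals (an assumption, not real data).
population = [f"individual_{i:02d}" for i in range(1, 101)]
srs = simple_random_sample(population, n=5, seed=1)
print(srs)
```

`random.sample` draws without replacement, so no individual can appear twice in the sample.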
- table of random digits
- long string of digits with 2 properties: 1. Each entry in the table is equally likely to be any of the 10 digits 0-9 2. The entries are independent of each other. That is, knowledge of one part of the table gives no information about any other part
- probability sample
- sample chosen by chance. We must know what samples are possible and what chance, or probability, each possible sample has
- stratified random sample
- divide the population into groups of similar individuals, called strata, and then choose a separate SRS in each stratum and combine these SRSs to form the full sample
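The stratified design can be sketched the same way: one separate SRS per stratum, then combine. The strata names, their sizes, and the per-stratum sample sizes below are made up for illustration.

```python
import random

def stratified_random_sample(strata, sizes, seed=None):
    """Draw a separate SRS from each stratum and combine them."""
    rng = random.Random(seed)
    sample = []
    for name, members in strata.items():
        # SRS of the requested size within this stratum only
        sample.extend(rng.sample(members, sizes[name]))
    return sample

# Hypothetical strata (assumed for this sketch).
strata = {
    "undergraduate": [f"u{i}" for i in range(200)],
    "graduate": [f"g{i}" for i in range(50)],
}
sizes = {"undergraduate": 8, "graduate": 2}
full_sample = stratified_random_sample(strata, sizes, seed=2)
print(full_sample)
```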
- undercoverage
- some groups in the population are left out of the process of choosing the sample
- nonresponse
- occurs when an individual chosen for the sample can't be contacted or refuses to cooperate
- response bias
- respondents lie, especially if asked about illegal or unpopular behavior. The sample then underestimates such behavior in the population
- subjects
- the individuals studied in an experiment
- factors
- the explanatory variables in an experiment
- treatment
- a specific experimental condition applied to the subjects
- randomization
- use of chance to divide experimental subjects into groups
- randomized comparative experiment
- an experiment that uses both comparison and randomization
- the basic principles of statistical design of experiments
- 1. Control the effects of lurking variables by comparing several treatments 2. Randomize--use impersonal chance to assign subjects to treatments 3. Use enough subjects in each group to reduce chance variation in the results
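One way to carry out the randomization principle, as a rough sketch: shuffle the subjects with impersonal chance, then deal them out to the treatments in turn. The function name, subject labels, and the two treatment names are assumptions for illustration.

```python
import random

def randomize_to_treatments(subjects, treatments, seed=None):
    """Assign subjects to treatment groups by chance, in near-equal group sizes."""
    rng = random.Random(seed)
    shuffled = list(subjects)   # copy so the caller's list is left untouched
    rng.shuffle(shuffled)       # chance, not judgment, decides the order
    groups = {t: [] for t in treatments}
    for i, subject in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(subject)
    return groups

# Hypothetical subjects and treatments.
subjects = [f"subject_{i}" for i in range(1, 21)]
groups = randomize_to_treatments(subjects, ["treatment", "control"], seed=3)
for name, members in groups.items():
    print(name, len(members))
```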
- statistically significant
- an observed effect so large that it would rarely occur by chance
- double blind experiment
- neither the subjects nor the people who work with them know which treatment each subject is receiving
- matched pairs design
- compares 2 treatments. Choose pairs of subjects that are as closely matched as possible. Assign one of the treatments to one of the subjects in a pair by tossing a coin or reading odd and even digits. The other subject gets the remaining treatment. Sometimes each 'pair' in a matched pairs design consists of just one subject, who gets both treatments one after the other.
- random
- individual outcomes are uncertain but there is nonetheless a regular distribution of outcomes in a large number of repetitions
- probability
- of any outcome of a random phenomenon is the proportion of times the outcome would occur in a very long series of repetitions
- sample space S
- set of all possible outcomes
- event
- outcome or a set of outcomes of a random phenomenon. Subset of the sample space
- probability model
- mathematical description of a random phenomenon consisting of 2 parts, a sample space S and a way of assigning probabilities to events
- probability rules (4)
- 1. Any probability is a number between 0 and 1 2. All possible outcomes together must have probability 1 3. The probability that an event does not occur is 1 minus the probability that the event does occur 4. If 2 events have no outcomes in common, the probability that one or the other occurs is the sum of their individual probabilities
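The four rules can be checked on a small probability model. This sketch assumes two fair six-sided dice with all 36 outcomes equally likely; the helper name `probability` is hypothetical.

```python
from fractions import Fraction
from itertools import product

# Sample space S: all 36 ordered outcomes of rolling two dice (assumed fair).
sample_space = list(product(range(1, 7), repeat=2))

def probability(event):
    """P(event) when every outcome in the sample space is equally likely."""
    favorable = sum(1 for outcome in sample_space if event(outcome))
    return Fraction(favorable, len(sample_space))

p_all = probability(lambda o: True)                 # rule 2: all outcomes together
p_seven = probability(lambda o: sum(o) == 7)        # rule 1: between 0 and 1
p_not_seven = 1 - p_seven                           # rule 3: complement
p_eleven = probability(lambda o: sum(o) == 11)
p_seven_or_eleven = p_seven + p_eleven              # rule 4: no outcomes in common
print(p_all, p_seven, p_not_seven, p_seven_or_eleven)
```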
- normal distributions are probability models
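- an idealized description of data; a Normal curve works as a probability model because the total area under the curve equals 1, so areas under the curve assign probabilities to events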
- parameter
- number that describes the population. In statistical practice, the value of a parameter is not known because we cannot examine the entire population. Example: the population mean mu
- statistic
- number that can be computed from the sample data without making use of any unknown parameters. In practice, we often use a statistic to estimate an unknown parameter. Example: the sample mean x bar
- If x bar is rarely exactly right and varies from sample to sample, why is it nonetheless a reasonable estimate of the population mean mu?
- if we keep on taking larger and larger samples, the statistic x bar is guaranteed to get closer and closer to the parameter mu.
- law of large numbers
- draw observations from any population with finite mean mu. As the number of observations drawn increases, the mean x bar of the observed values gets closer and closer to the mean mu of the population
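The law of large numbers can be watched in a simulation. This sketch assumes a fair six-sided die, whose population mean mu is 3.5; the function name and seed are hypothetical.

```python
import random

def mean_of_die_rolls(n_rolls, seed=0):
    """Sample mean x bar of n_rolls simulated rolls of a fair die (mu = 3.5)."""
    rng = random.Random(seed)
    total = sum(rng.randint(1, 6) for _ in range(n_rolls))
    return total / n_rolls

# x bar should settle near 3.5 as the number of observations grows.
for n in (10, 1_000, 100_000):
    print(n, mean_of_die_rolls(n))
```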
- sampling distribution
- is the distribution of values taken by the statistic in all possible samples of the same size from the same population
- mean of a sampling distribution of x bar
- mu; because the mean of x bar is equal to mu, we say that the statistic x bar is an unbiased estimator of the parameter mu.
- standard deviation of a sampling distribution of x bar
- sigma/square root of n. How close the estimator falls to the parameter in most samples is determined by the spread of the sampling distribution. If individual observations have standard deviation sigma, then sample means x bar from samples of size n have standard deviation sigma/root n. Averages are less variable than individual observations.
- if individual observations have the N(mu, sigma) distribution, then the sample mean x bar of n independent observations has the...
- N(mu, sigma/root n) distribution
- central limit theorem
- draw an SRS of size n from any population with mean mu and finite standard deviation sigma. When n is large, the sampling distribution of the sample mean x bar is approximately Normal: x bar is approximately N(mu, sigma/root n)
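A simulation sketch of the sampling distribution of x bar. It assumes a strongly skewed population (exponential with mean 1 and standard deviation 1), and the sample size of 40 and the 5000 repetitions are arbitrary choices. The center of the simulated x bar values should land near the population mean, and their spread near the population standard deviation divided by root n.

```python
import random
from math import sqrt
from statistics import mean, stdev

def simulate_sample_means(n, repetitions, seed=0):
    """Return x bar for many samples of size n from an exponential population."""
    rng = random.Random(seed)
    return [
        sum(rng.expovariate(1.0) for _ in range(n)) / n
        for _ in range(repetitions)
    ]

n = 40
x_bars = simulate_sample_means(n, repetitions=5000)
print("mean of x bar:", mean(x_bars))        # compare with the population mean, 1
print("sd of x bar:  ", stdev(x_bars))       # compare with 1 / sqrt(n)
print("1/sqrt(n):    ", 1 / sqrt(n))
```

A histogram of `x_bars` would look roughly Normal even though the individual observations are far from Normal.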
- statistical inference
- provides methods for drawing conclusions about a population from sample data
- basic calculation of confidence interval
- estimate plus-or-minus margin of error. The confidence level C gives the probability that the interval will capture the true parameter value in repeated samples. That is, the confidence level is the success rate for the method
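A sketch of the z confidence interval for a population mean, assuming an SRS and a known population standard deviation. The function name and the numbers in the example are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(x_bar, sigma, n, confidence=0.95):
    """Return (low, high) = estimate plus-or-minus margin of error."""
    # Critical value z*: the central area `confidence` lies between -z* and z*.
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)
    margin = z_star * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin

# Made-up example: x bar = 100, known sigma = 15, n = 25.
low, high = z_confidence_interval(x_bar=100, sigma=15, n=25)
print(round(low, 2), round(high, 2))
```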
- goals of confidence levels
- we would like high confidence and a small margin of error. High confidence says that our method almost always gives correct answers. A small margin of error says that we have pinned down the parameter quite precisely
- Understanding confidence levels
- The margin of error is z* sigma/root n. 1. As z* gets smaller, the confidence level gets smaller. There's a trade-off between the confidence level and the margin of error. To obtain a smaller margin of error from the same data, accept a lower confidence. 2. The standard deviation sigma measures the variation in the population. It's easier to pin down mu when sigma is small. 3. Increasing the sample size n reduces the margin of error for any fixed confidence level. Because n appears under a square root sign, we must take four times as many observations in order to cut the margin of error in half
- how to solve for a specified margin of error
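- n = (z* sigma / m) squared, where z* is the critical value for the desired confidence level, sigma is the population standard deviation, and m is the desired margin of error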
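The sample-size formula as a short sketch, rounding up to the next whole observation. The function name and the example values (standard deviation 15, margins 3 and 1.5) are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_margin(sigma, margin, confidence=0.95):
    """Smallest n whose z margin of error is at most `margin`."""
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)
    return ceil((z_star * sigma / margin) ** 2)

n_wide = sample_size_for_margin(sigma=15, margin=3)
n_narrow = sample_size_for_margin(sigma=15, margin=1.5)
print(n_wide, n_narrow)
```

Halving the margin of error roughly quadruples the required sample size, because n sits under a square root in the margin of error.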
- when do you use confidence intervals?
- when you want to estimate a population parameter
- Null hypothesis
- the statement being tested in a statistical test. Usually the null hypothesis is a statement of no effect or no difference
- the P value
- the probability, computed assuming that the null hypothesis is true, that the test statistic would take a value as extreme or more extreme than that actually observed. The smaller the P value, the stronger the evidence against the null hypothesis provided by the data
- What types of P values are strong and why?
- small p values are evidence against Hnought because they say that the observed result is unlikely to occur when Hnought is true. Large P values fail to give evidence against Hnought.
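A sketch of how a P value comes out of a z test for a mean, assuming an SRS and a known population standard deviation. The function name, the `alternative` labels, and the example numbers are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist

def z_test_p_value(x_bar, mu0, sigma, n, alternative="two-sided"):
    """P value of the one-sample z test of the null hypothesis mu = mu0."""
    z = (x_bar - mu0) / (sigma / sqrt(n))   # test statistic
    std_normal = NormalDist()
    if alternative == "greater":            # Ha: mu > mu0
        return 1 - std_normal.cdf(z)
    if alternative == "less":               # Ha: mu < mu0
        return std_normal.cdf(z)
    return 2 * (1 - std_normal.cdf(abs(z)))  # Ha: mu != mu0

# Made-up example: x bar = 103, mu0 = 100, sigma = 15, n = 100, so z = 2.
p_two_sided = z_test_p_value(103, 100, 15, 100)
p_greater = z_test_p_value(103, 100, 15, 100, alternative="greater")
print(round(p_two_sided, 4), round(p_greater, 4))
```

With these numbers the two-sided P value falls just under 0.05, so the result would count as statistically significant at the 0.05 level but not at the 0.01 level.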
- statistically significant at level alpha
- if the p value is as small or smaller than alpha, we say that the data are statistically significant
- significance level
- the decisive value of P that determines how much evidence against H nought we will insist on
- alpha
- the probability of wrongly rejecting Hnought when Hnought is actually true (for example, when mu really is zero in a test of Hnought: mu = 0)
- power
- the probability that a test rejects the null hypothesis when an alternative is true
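Power can be computed directly for a one-sided z test. This is a sketch assuming the null hypothesis mu = mu0 against the alternative mu > mu0 with a known population standard deviation; the function name and example values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def power_one_sided_z(mu0, mu_alt, sigma, n, alpha=0.05):
    """Probability that the z test rejects mu = mu0 when the true mean is mu_alt."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)        # reject when z >= z_alpha
    shift = (mu_alt - mu0) / (sigma / sqrt(n))     # how far the alternative moves z
    return 1 - std_normal.cdf(z_alpha - shift)

# Made-up example: mu0 = 0, alternative mu = 1, sigma = 2, n = 25.
power = power_one_sided_z(mu0=0, mu_alt=1, sigma=2, n=25)
print(round(power, 3))
```

When `mu_alt` equals `mu0`, the same formula returns the significance level itself, and 1 minus the power is the probability of failing to reject when the alternative is true.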
- cautions for the Z procedure
- 1. Data must be an SRS from the population 2. Different methods are needed for different designs (z isn't the correct procedure for designs more complex than an SRS) 3. Outliers distort the results of confidence intervals and z tests 4. The shape of the population distribution matters; the procedures require Normality of the sample mean x bar, and z is reasonably accurate for any reasonably symmetric distribution for a sample of even moderate size 5. You must know the standard deviation sigma of the population
- margin of error in a confidence interval doesn't cover all errors--explain
- 1. The margin of error ignores everything except sample-to-sample variation due to choosing the sample randomly 2. Undercoverage and nonresponse are often more serious than random sampling error. The margin of error doesn't take these difficulties into account
- significance test-why is a small p key?
- the purpose of the significance test is to describe the degree of evidence provided by the sample against the null hypothesis. How small a P is convincing depends on 1. How plausible is Hnought 2. What are the consequences of rejecting Hnought?
- difference between confidence intervals and significance tests
- The confidence interval estimates the size of an effect rather than asking if it is too large to reasonably occur by chance alone
- When is inference most reliable?
- When the data comes from a probability sample or a randomized comparative experiment. The deliberate use of chance ensures that the laws of probability apply to the outcomes and this in turn ensures that statistical inference makes sense.
- type two error
- failing to reject Hnought when Ha is true; its probability is beta, and the power of the test is 1 - beta
- type one error
- rejecting Hnought when Hnought is true; its probability is the significance level alpha