Methods 2
Terms
- 4 main assumptions of the linear model and what they imply
-
1. Fixed X
2. Expected value of the errors = 0
leads to unbiasedness
3. Homoscedasticity
4. No autocorrelation
makes estimators efficient
- 5th assumption of linear model
- errors are normally distributed
- Solution for OLS in Linear Algebra
- B=(X'X)^-1(X'Y)
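A minimal sketch of the matrix solution above, in plain Python with no libraries. The function name `ols` and the toy data are made up for illustration. Rather than forming the inverse explicitly, it solves the equivalent normal equations (X'X)B = X'Y by Gauss-Jordan elimination, which gives the same B whenever X'X is invertible.

```python
def ols(X, y):
    """Solve (X'X) b = X'y for b. X is a list of rows, y a list of values."""
    n, k = len(X), len(X[0])
    # Build X'X (k x k) and X'y (length k).
    xtx = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
           for i in range(k)]
    xty = [sum(X[r][i] * y[r] for r in range(n)) for i in range(k)]
    # Augmented matrix [X'X | X'y], reduced by Gauss-Jordan elimination.
    a = [row + [v] for row, v in zip(xtx, xty)]
    for col in range(k):
        # Partial pivoting: bring the largest remaining entry into place.
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular (perfectly collinear columns)")
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(k):
            if r != col:
                factor = a[r][col]
                a[r] = [vr - factor * vc for vr, vc in zip(a[r], a[col])]
    return [a[i][k] for i in range(k)]


# Toy data (hypothetical): first column of 1's is the intercept, y = 1 + 2x.
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]]
y = [3.0, 5.0, 7.0, 9.0]
b = ols(X, y)  # b[0] is the intercept, b[1] the slope
```

The `ValueError` branch is the singular case: if one column of X is a linear function of another, X'X has no inverse and B cannot be computed.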
- What does SER do
- Best overall indicator of regression effectiveness
- BLUE
- best linear unbiased estimator
- What does GOF tell us
- How well the model fits the data
- Variance
- avg squared deviation from the mean -> how spread out the distribution is
- Central Tendency ->E(X)
- the mean, median, and mode
- Dispersion
- range, variance, and SD -> distribution about the mean
- Covariance
- measure of how two variables vary together
- correlation
- standardized covariance -> covariance divided by the product of the two SDs, so it runs from -1 to 1
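A small sketch of the two measures, in plain Python. The function names and data are hypothetical, and the sample (n-1) versions are assumed since the deck does not say which divisor it uses.

```python
import math


def mean(values):
    return sum(values) / len(values)


def covariance(x, y):
    """Sample covariance: average co-movement of x and y around their means."""
    mx, my = mean(x), mean(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (len(x) - 1)


def correlation(x, y):
    """Covariance rescaled by both standard deviations."""
    sd_x = math.sqrt(covariance(x, x))  # covariance of x with itself is its variance
    sd_y = math.sqrt(covariance(y, y))
    return covariance(x, y) / (sd_x * sd_y)


# Hypothetical data: y is exactly 2x, so the correlation should be about 1.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]
cov_xy = covariance(x, y)
r_xy = correlation(x, y)
```

Covariance depends on the units of x and y. Correlation does not, which is why it is easier to compare across variables.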
- standard error
- measure of accuracy -> standard deviation of the sampling distribution of that statistic
- probability
- measure of uncertainty
- Random variable
- assigns a numerical value to each outcome of a random event
- mutually exclusive
- when one thing happens another cannot
- conditional probability
- ratio of joint to marginal
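As a worked illustration of "joint over marginal", here is a sketch with a made-up joint distribution over weather and arrival. All names and numbers are hypothetical, and `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

# Hypothetical joint probabilities P(weather, arrival); they sum to 1.
joint = {
    ("rain", "late"): Fraction(3, 20),
    ("rain", "on_time"): Fraction(1, 20),
    ("dry", "late"): Fraction(2, 20),
    ("dry", "on_time"): Fraction(14, 20),
}

# Marginal P(rain): sum the joint over every arrival outcome.
p_rain = sum(p for (weather, _), p in joint.items() if weather == "rain")

# Conditional P(late | rain) = P(rain and late) / P(rain).
p_late_given_rain = joint[("rain", "late")] / p_rain
```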
- parameters
- defines what distributions look like
- 2 ways to define distributions
-
1. probability distribution function
2. cumulative distribution function
- Central limit theorem
- the larger the sample, the closer the sampling distribution of the mean is to normal
- 3 steps of a hypothesis test
-
1. compare mean to some hypothetical value
2. standardize its deviation
3. use properties of normal curve
- hypothesis
- statement that something is true
- Null
- a hypothesis to be tested
- Point estimate
- value of a statistic used to estimate a parameter
- P value requirement
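- 0.05 (5%) or below to reject null
- T-test requirement
- t value outside of about +-2 standard errors (roughly +-1.96 at the 5% level in large samples) -> reject null
- T value
- number of standard errors the estimate is away from the hypothesized value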
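A rough sketch of a one-sample t value, in plain Python, using the +-2 rule of thumb from the card above as the cutoff. The function name, the sample, and the hypothesized mean of 5.0 are all made up for illustration.

```python
import math


def t_statistic(sample, mu0):
    """(sample mean - hypothesized mean) / standard error of the mean."""
    n = len(sample)
    xbar = sum(sample) / n
    # Sample variance with the n-1 divisor.
    s2 = sum((v - xbar) ** 2 for v in sample) / (n - 1)
    # Standard error: SD of the sampling distribution of the mean.
    se = math.sqrt(s2 / n)
    return (xbar - mu0) / se


# Hypothetical measurements, tested against a hypothesized mean of 5.0.
sample = [5.1, 4.9, 5.3, 5.2, 5.0, 5.4, 5.1, 5.2]
t = t_statistic(sample, 5.0)

# Rule of thumb: reject the null when |t| is larger than about 2.
reject_null = abs(t) > 2
```

With a sample this small, the exact critical value from the t distribution is somewhat larger than 2, so the rule of thumb is only an approximation.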
- Identity Matrix
- square matrix with 1's along the diagonal and 0's everywhere else
- Regression Analysis
- process of estimating parameters from sample data
- Residuals
- difference between the actual values of Y and the values predicted by the regression
- RSS
- regression sum of squares -> sum of squared differences between the regression line and the mean of Y
- 4 measures of GOF
-
1. standard error of regression
2. test on each individual slope coefficient
3. R^2
4. F-test
- Total variance
- variation in Y in the absence of any info other than the mean
- Two parts of TSS
-
1. ESS
2. RSS
- ESS
- Error sum of squares
- What is ESS
- variation in Y not accounted for by X
- What does a large RSS tell us
- means more Y variation explained by X
- R^2
- coefficient of determination
- What does a large RSS with respect to TSS tell us
- how much variation of Y is explained by X
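A short sketch of the decomposition in plain Python, using this deck's naming: RSS is the regression (explained) sum of squares and ESS is the error (unexplained) sum of squares. Some textbooks swap the two labels. The function name and data are hypothetical.

```python
def sums_of_squares(x, y):
    """Fit a one-variable OLS line and split TSS into RSS + ESS."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
          / sum((xi - mx) ** 2 for xi in x))
    b0 = my - b1 * mx
    fitted = [b0 + b1 * xi for xi in x]
    tss = sum((yi - my) ** 2 for yi in y)                  # total variation in Y
    rss = sum((fi - my) ** 2 for fi in fitted)             # explained by X
    ess = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) # left unexplained
    return tss, rss, ess


# Hypothetical data that lies close to a straight line.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
tss, rss, ess = sums_of_squares(x, y)
r_squared = rss / tss  # share of the variation in Y explained by X
```

Because TSS = RSS + ESS for an OLS fit with an intercept, a large RSS relative to TSS and a small ESS relative to TSS say the same thing.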
- OLS
- ordinary least squares -> process by which we turn data into estimates of theoretical quantities (parameters), by minimizing the sum of squared residuals
- Diagnostics
- warn us of inappropriate uses of OLS
- Problems with data to focus on
-
1. unusual data
2. non-constant error variance
3. non-normal errors
- b1
- b1=sum((xi-meanx)(yi-meany))/sum((xi-meanx)^2)
- b0
- b0=meany-b1(meanx)
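The two formulas can be sketched directly in plain Python, taking the Y in the b0 formula to be the mean of Y. The function name and toy data are made up for illustration.

```python
def simple_ols(x, y):
    """Return (b0, b1) for the line y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: co-variation of x and y divided by the variation in x.
    b1 = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
          / sum((xi - mean_x) ** 2 for xi in x))
    # Intercept: forces the line through the point (mean_x, mean_y).
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Hypothetical data generated from y = 1 + 2x with no noise.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]
b0, b1 = simple_ols(x, y)
```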
- Null hypothesis
- hypothesis of no effect or no difference; it is retained when you fail to reject it
- skew
- where the tail in a set of data is elongated
- 3 units of linear algebra analysis
-
1. scalar
2. vector
3. matrix
- singularity
- when one column is a linear function of another -> when variables are perfectly correlated
- Linear model
- tells us the trend between variables
- Regression coefficients
- b0 and b1
- P(Y|X)
- probability distribution of Y for specific values of x...probability of seeing value of Y conditional on X
- Influence
- Leverage * Discrepancy
- Outlier
- unusual Y for its level of X -> does not necessarily mean separate from the data cloud
- Method of deletion
-
1. remove outlier
2. calculate line
3. calculate residual as if influential case had been present
- How do you fix heteroscedasticity
- transform the Y variable
- Leverage
- how much potential influence an observation exerts on all the fitted values
- discrepancy
- how far the outlier lies from the regression line
- 4 elements of modeling
-
1. question
2. DV
3. IV
4. Unit of analysis
- Bias
- systematic error in estimating a quantity -> difference between an estimator's expected value and the true value
- Complementarity
- probability of something not happening -> 1-P
- heteroscedasticity
- non-constant error variance