AP Statistics Review Sheet

Author: AnonTokyo

Summary: AP Statistics review sheet (frequently tested concepts + explanations)

Last modified: 2025-04-27 16:56:38.028694

Status: Published

Tags:

AP Statistics SOP

February 16, 2025

1 Property and Interpretation of Common Statistics

1.1 Mean

The arithmetic mean of the data ($\mu$ for population parameters, $\bar{x}$ for sample statistics):

$$\bar{x} = \frac{1}{n} \sum_i x_i$$

1.2 Mode

The value that occurs the most times.

1.3 Median

Median (Q2) is the value below which 50% of the data falls.

For an ordered sequence of length $n$, calculate $\frac{n+1}{2}$ to get the index of the median. If the result is a decimal, take the nearest integers as indices. Find one or two numbers (average if two) in the sequence to get the median.

1.4 Q1 and Q3

Q1 is the value below which 25% of the data falls; Q3 is the value below which 75% of the data falls.

For an ordered sequence of length $n$, calculate $\frac{n+1}{4}$ and $\frac{3(n+1)}{4}$ to get the indices of Q1 and Q3. For each index, if it is a decimal, take the nearest integers as indices. Find one or two numbers (average if two) in the sequence to get Q1 and Q3.

1.5 Range

A measure of the spread of the data (max - min).

1.6 Inter-quartile Range (IQR)

A measure of the range within which the middle 50% of the data falls:

$$IQR = Q_3 - Q_1$$

1.7 Variance

A measure of the dispersion of the data ($\sigma^2$ for population parameters, $s^2$ for sample statistics):

$$s^2 = \frac{1}{n-1} \sum_i (x_i - \bar{x})^2$$

1.8 Standard Deviation

The typical difference between the data and the mean ($\sigma$ for population parameters, $s$ for sample statistics):

$$s = \sqrt{s^2} = \sqrt{\frac{1}{n-1} \sum_i (x_i - \bar{x})^2}$$
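The definitions in 1.1 to 1.8 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `value_at` and `summarize` and the sample data are hypothetical, the quartiles follow the index rule given above (other textbooks and calculators use slightly different rules), and the data is assumed to have at least 3 values.

```python
import math

def value_at(sorted_data, index):
    """Value at a 1-based index; if the index is a decimal,
    average the two neighbouring values (rule from 1.3 and 1.4)."""
    lower = math.floor(index)
    upper = math.ceil(index)
    return (sorted_data[lower - 1] + sorted_data[upper - 1]) / 2

def summarize(data):
    """Summary statistics for a sample with at least 3 values (assumption)."""
    xs = sorted(data)
    n = len(xs)
    mean = sum(xs) / n
    q1 = value_at(xs, (n + 1) / 4)
    median = value_at(xs, (n + 1) / 2)
    q3 = value_at(xs, 3 * (n + 1) / 4)
    # Sample variance divides by n - 1, not n.
    variance = sum((x - mean) ** 2 for x in xs) / (n - 1)
    return {
        "mean": mean,
        "median": median,
        "q1": q1,
        "q3": q3,
        "range": xs[-1] - xs[0],
        "iqr": q3 - q1,
        "variance": variance,
        "sd": math.sqrt(variance),
    }

# Hypothetical sample, chosen so the arithmetic is easy to check by hand.
stats = summarize([2, 4, 4, 5, 7, 9, 11])
print(stats)
```

For this sample the indices are whole numbers (2, 4 and 6), so Q1, the median and Q3 are read directly from the ordered data.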
1.9 Z-score

A standardized value that indicates how many standard deviations a particular data point is from the mean:

$$z_i = \frac{x_i - \bar{x}}{s}$$

- Positive, 0, negative means above the mean, exactly at the mean, below the mean.
- Allows comparison across different distributions with different scales or units.

1.10 Residual

The residual is the difference between the actual value and the expected value:

$$e = y - \hat{y}$$

Residuals in linear regression are expected to have a mean of zero, and a smaller variance is better.

1.11 Correlation Coefficient

Measures the strength and direction of the linear relationship between two variables:

$$r = \frac{1}{n-1} \sum_i \frac{x_i - \bar{x}}{s_x} \cdot \frac{y_i - \bar{y}}{s_y} = \frac{1}{n-1} \sum z_x \cdot z_y$$

The sign of the correlation coefficient indicates whether the correlation is positive or negative.

| Correlation Strength $\lvert r \rvert$ | Linear Correlation Degree |
|---|---|
| 0.0 | No Correlation |
| 0.0 - 0.2 | Very Weak |
| 0.2 - 0.4 | Weak |
| 0.4 - 0.6 | Moderate |
| 0.6 - 0.8 | Strong |
| 0.8 - 1.0 | Very Strong |
| 1.0 | Linear Relationship |

1.12 Coefficient of Determination

A measure of the percentage of the variation in the response variable that can be explained by the linear relationship with the explanatory variable:

$$r^2 = 1 - \frac{SSR}{SST} = 1 - \frac{S_R^2}{S_T^2}$$

2 Describe/compare distribution

Use context and comparative language.

2.1 1-dimensional Data (SOCS)

S - Shape
- Unimodal, bimodal
- Skewed left, skewed right [mean > median: skewed right; mean < median: skewed left], uniform, symmetric, bell-shaped (write "approximately" when not sure)

C - Center
- Median, mean

S - Spread
- IQR (Q3 - Q1), range (max - min), standard deviation
O - Outliers
- A point $x$ is identified as an outlier if it satisfies one of the following criteria (remember to choose only one criterion):
  - $x < Q_1 - 1.5 \, IQR$ or $x > Q_3 + 1.5 \, IQR$ (robust)
  - $x < \bar{x} - 2s$ or $x > \bar{x} + 2s$ (not robust)

2.2 2-dimensional Data

- Direction (positive or negative)
- Strength (refer to Part 1: Correlation Coefficient)
- Form (linear or not)
- Unusual features

3 Graphs

3.1 Bar Graph

- Displays categorical data.
- The order of categories is not important!
- Convert to frequency before plotting.

3.2 Box-plot

- Displays numerical data.
- The box represents Q1, Q3 and IQR.
- The line inside the box is Q2 (median).
- The whiskers extend to the minimum and maximum, excluding outliers (refer to Part 2 on recognizing outliers).
- Outliers should be marked with an asterisk (*).

The distribution of the five most important lines (minimum, Q1, median, Q3, maximum) in the box-plot can be revealing. If the lines concentrate on the left side, the distribution is skewed to the right; if the lines concentrate on the right side, the distribution is skewed to the left.

3.3 Histogram

- The x-axis represents the intervals (bins) of the data.
- The y-axis represents the frequency (count) of data points within each bin.
- Most useful for displaying the shape of the distribution of numerical data.

3.4 Scatter Plot

- Explanatory variable (usually x); response variable (usually y).
- Clearly label or identify variables with their axes, and pay attention to units!
- By plotting dots according to their coordinates, we can find the best-fit line and calculate its slope and intercept.

Features:

1. Direction: Positive: points tend to rise as you move from left to right; Negative: points tend to fall as you move from left to right.
2. Strength: refer to Correlation Coefficient.
3. Form (linear, non-linear)
4. Unusual Features (Influential Points):
   - Leverage: distance to $\bar{x}$
   - Outlier: recognized in the residual plot by a significantly bigger residual compared to other points (or recognized if a point is significantly farther from the best-fit line)

3.5 Other Graphs

Pie chart

Helps compare parts of a whole and quickly identify dominant categories.

Contingency Table

Displays the frequency distribution of two categorical variables. Useful in analyzing the relationship between two categorical variables; often used for Chi-square tests of independence.

Stem plot

Identify the common part of the data values (often the tens digit) as the stem, group the data by stem, and append the distinguishing part of each data point to the row of its matching stem. The final graph is similar to a histogram, showing the shape of the distribution of numerical data.

4 Probability

4.1 Probability Basis

A Venn diagram is helpful.

Probability pairs that sum to 1 under the same given condition:

$$P(A \mid B) + P(A^C \mid B) = 1$$

$$P(A \mid B^C) + P(A^C \mid B^C) = 1$$

Test of Independence: If $P(A \cap B) = P(A) \times P(B)$ or $P(A \mid B) = P(A)$ is true, then Event A and Event B are independent.

4.2 Bayes Theorem

The core of Bayes Theorem:

$$\begin{cases} P(A \cap B) = P(A \mid B) \times P(B) = P(B \mid A) \times P(A) \\ P(A) = P(A \cap B) + P(A \cap B^C) \end{cases}$$

So we can derive:

$$P(A \mid B) = \frac{P(B \mid A) \times P(A)}{P(B \mid A) \times P(A) + P(B \mid A^C) \times P(A^C)}$$

Common settings (E: event, P: test positive):

- Sensitivity $= P(P \mid E)$
- Specificity $= P(P^C \mid E^C)$

5 Linear Regression

- Describing scattered dot plots: Strong/Weak/Negative associations.
- Describe the use of linear regression: uses an explanatory variable, $x$, to predict the response variable, $y$.
- Describe least-square regression: minimizes the sum of the squares of the residuals.
- Describe $r$ and $r^2$:
  - $r$ is the correlation coefficient, which indicates both the direction and strength of the linear relationship.
  - $r^2$ is the coefficient of determination, which describes the proportion of variation in the response variable that is explained by the explanatory variable in the model.

6 Randomly assign subjects

Describe how to randomly select/assign subjects:

1. Number all subjects from 1 to $n$.
2. Use a random number generator to generate an integer in the range from 1 to $n$.
3. Select the subject corresponding to the random number generated.

7 Design an experiment

Describe how to design an experiment:

- Determine variables (which are explanatory, which are response); be careful with confounding variables!
- Determine experiment method:
  - Single blind: subjects don't know the experiment objective.
  - Double blind: subjects and research members don't know the experiment objective.
  - Block: randomly assign treatments within each block of similar subjects.
  - Matched pair: set subjects A and B as a block, assuming A and B are similar; randomly assign one treatment to A and assign the other treatment to B.
- Determine control groups: use a placebo or give no treatment to some group of subjects.

8 Construct and interpret a confidence interval

Describe what a confidence interval is: a confidence interval is a range of values used to estimate a population parameter.

8.1 Construct confidence interval for population proportion

Conditions:

1. Random sample.
2. Sample size $n$ is less than 10% of the population.
3. Both counts of success $n\hat{p}$ and failure $n(1-\hat{p})$ are at least 10.

Where $\hat{p}$ is the sample proportion, and $n$ is the sample size.

The confidence interval for a population proportion $p$ is given by:

$$\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
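As a rough sketch of how the interval in 8.1 is computed, the following Python uses only the standard library. The function name `proportion_ci` and the numbers in the example are hypothetical; the random-sample and 10% conditions cannot be checked in code and are assumed to hold.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, confidence=0.95):
    """One-sample z-interval for a population proportion.

    Assumes a random sample with n below 10% of the population;
    only the success/failure count condition is checked here.
    """
    failures = n - successes
    if successes < 10 or failures < 10:
        raise ValueError("need at least 10 successes and 10 failures")
    p_hat = successes / n
    # z* is the critical value leaving (1 - confidence) / 2 in each tail.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Hypothetical data: 120 successes in a sample of 200.
low, high = proportion_ci(120, 200)
print(f"95% CI for p: ({low:.4f}, {high:.4f})")
```

With these numbers the interval is roughly 0.532 to 0.668, centered on the sample proportion 0.6.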
8.2 Construct confidence interval for the difference of two population proportions

Conditions:

1. Two populations are independent.
2. Random sample.
3. Sample sizes $n_1$, $n_2$ are less than 10% of the population.
4. Both samples have counts of success $n_1\hat{p}_1$, $n_2\hat{p}_2$ and failure $n_1(1-\hat{p}_1)$, $n_2(1-\hat{p}_2)$ of at least 10.

The confidence interval for the difference of two population proportions $p_1$, $p_2$ is given by:

$$(\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$

Where $\hat{p}_1$, $\hat{p}_2$ are sample proportions, and $n_1$, $n_2$ are sample sizes.

8.3 Construct confidence interval for population means

Conditions:

1. Random sample.
2. Sample size $n$ is less than 10% of the population.
3. Sample size $n \ge 30$ OR the population is approximately normally distributed OR the sample has no strong skewness or outliers.

The confidence interval for a population mean $\mu$ is given by:

$$\bar{x} \pm t^* \frac{s}{\sqrt{n}}$$

Where $\bar{x}$ is the sample mean, $s$ is the sample standard deviation, $n$ is the sample size, and degrees of freedom $df = n - 1$.

8.4 Construct confidence interval for the difference of two population means

Conditions:

1. Two populations are independent.
2. Random sample.
3. Sample sizes $n_1$, $n_2$ are less than 10% of the population.
4. Sample sizes $n_1, n_2 \ge 30$ OR both populations are approximately normally distributed OR both samples have no strong skewness or outliers.

The confidence interval for the difference of two population means $\mu_1$, $\mu_2$ is given by:

$$(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

Where $\bar{x}_1$, $\bar{x}_2$ are sample means, $s_1$, $s_2$ are sample standard deviations, $n_1$, $n_2$ are sample sizes, and degrees of freedom $df$ = the smaller of $n_1 - 1$ and $n_2 - 1$.
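The two-sample interval in 8.4 can be sketched as follows. The function name `two_sample_t_interval` and the data are hypothetical. Python's standard library has no t distribution, so the critical value $t^*$ is passed in by hand from a t-table (2.776 is the 95% value for $df = 4$). The samples here are small, so the sketch assumes both populations are approximately normal.

```python
from math import sqrt
from statistics import mean, variance

def two_sample_t_interval(sample1, sample2, t_star):
    """Interval for mu1 - mu2 with the conservative df = min(n1 - 1, n2 - 1).

    t_star must be looked up for the returned df and the chosen
    confidence level; it is not computed here.
    """
    n1, n2 = len(sample1), len(sample2)
    diff = mean(sample1) - mean(sample2)
    # statistics.variance is the sample variance (divides by n - 1).
    standard_error = sqrt(variance(sample1) / n1 + variance(sample2) / n2)
    df = min(n1 - 1, n2 - 1)
    margin = t_star * standard_error
    return diff - margin, diff + margin, df

# Hypothetical samples; df = 4, so t* = 2.776 for 95% confidence.
group_a = [10, 12, 14, 16, 18]
group_b = [9, 10, 11, 12, 13]
low, high, df = two_sample_t_interval(group_a, group_b, t_star=2.776)
print(f"df = {df}, 95% CI for mu1 - mu2: ({low:.3f}, {high:.3f})")
```

Because this interval contains 0, these two small samples alone do not give convincing evidence that the population means differ.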
9 Hypothesis testing

9.1 Describe hypothesis testing

1. Assume $H_0$ is valid.
2. Calculate the probability $P$ of the observed event happening under this assumption.
3. Compare $P$ with the significance level $\alpha$.
4. If $P < \alpha$, reject $H_0$. Otherwise, fail to reject $H_0$.

9.2 Hypothesis test for population proportion

Conditions:

1. Random sample.
2. Sample size $n$ is less than 10% of the population.
3. Both counts of success $np_0$ and failure $n(1-p_0)$ are at least 10.

The test statistic for a hypothesis test about a population proportion $p$ is given by:

$$z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}$$

Where $\hat{p}$ is the sample proportion, and $n$ is the sample size.

9.3 Hypothesis test for the difference of two population proportions

Conditions:

1. Two populations are independent.
2. Random sample.
3. Sample sizes $n_1$, $n_2$ are less than 10% of the population.
4. Both samples have counts of success $n_1\hat{p}_1$, $n_2\hat{p}_2$ and failure $n_1(1-\hat{p}_1)$, $n_2(1-\hat{p}_2)$ of at least 10.

The test statistic for a hypothesis test about the difference of population proportions $p_1$, $p_2$ is given by:

$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}_c(1-\hat{p}_c)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$

Where $\hat{p}_1$, $\hat{p}_2$ are sample proportions, $n_1$, $n_2$ are sample sizes, and $\hat{p}_c = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2}$ is the combined proportion.

9.4 Hypothesis test for population means

Conditions:

1. Random sample.
2. Sample size $n$ is less than 10% of the population.
3. Sample size $n \ge 30$ OR the population is approximately normally distributed OR the sample has no strong skewness or outliers.

The test statistic for a hypothesis test about a population mean $\mu$ is given by:

$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$

Where $\bar{x}$ is the sample mean, $s$ is the sample standard deviation, $n$ is the sample size, and degrees of freedom $df = n - 1$.
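To make the procedure in 9.1 concrete, here is a minimal sketch of the one-proportion z-test from 9.2 using only the standard library. The function name `one_proportion_z_test`, the `alternative` parameter and the example numbers are hypothetical; the sampling conditions are assumed to hold.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z_test(successes, n, p0, alternative="two-sided"):
    """Return (z, p_value) for H0: p = p0.

    alternative is "two-sided", "greater" (Ha: p > p0) or "less" (Ha: p < p0).
    """
    p_hat = successes / n
    # The standard error uses p0 because H0 is assumed to be true.
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    std_normal = NormalDist()
    if alternative == "greater":
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "less":
        p_value = std_normal.cdf(z)
    else:
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    return z, p_value

# Hypothetical data: 60 successes out of 100, testing H0: p = 0.5.
alpha = 0.05
z, p_value = one_proportion_z_test(60, 100, p0=0.5)
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"z = {z:.2f}, p-value = {p_value:.4f}, {decision}")
```

Here $z = 2$ and the two-sided p-value is about 0.0455, which is below $\alpha = 0.05$, so $H_0$ is rejected.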
9.5 Hypothesis test for the difference of two population means

Conditions:

1. Two populations are independent.
2. Random sample.
3. Sample sizes $n_1$, $n_2$ are less than 10% of the population.
4. Sample sizes $n_1, n_2 \ge 30$ OR both populations are approximately normally distributed OR both samples have no strong skewness or outliers.

The test statistic for a hypothesis test about the difference of population means $\mu_1$, $\mu_2$ is given by:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$

Where $\bar{x}_1$, $\bar{x}_2$ are sample means, $s_1$, $s_2$ are sample standard deviations, $n_1$, $n_2$ are sample sizes, and degrees of freedom $df$ = the smaller of $n_1 - 1$ and $n_2 - 1$.

10 Bias / Error Identification

10.1 Bias Identification

- Selection Bias (occurs when some groups of people have a low chance of being chosen, or some people are not included in the assumed population).
- Non-response (when some groups do not respond to the research, introducing differences between people who responded and people who did not, and probably introducing other variables such as accessibility to the Internet).
- Voluntary Bias (when some groups are more inclined to take part in research and might carry systematic differences in their features compared to the overall population).

How to reduce bias:

- Increase the randomness of the sampling process.
- (Double-)blind experiments.
- Stratified sampling / cluster sampling.
- Common Response: Increasing sample size.

10.2 Error Identification

Errors are common in hypothesis testing, and it is also important for us to recognize the potential underlying bias. Remember that the significance level ($\alpha$) defines "Impossible".

| | $H_0$ is true | $H_0$ is false |
|---|---|---|
| Reject $H_0$ | Type I Error ($\alpha$) | Happy ending! (power) |
| Not reject $H_0$ | Happy ending! | Type II Error ($\beta$) |

10.2.1 Type I Error

Probability: $\alpha$; the significance level is the probability of this error type.
Ways to reduce possibility:

- Set a smaller significance level ($\alpha$).
- Common Response: Increase sample size.
- Common Response: Increase the number of experiments to verify the conclusion.
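A small simulation can illustrate why the Type I error probability is about $\alpha$: when $H_0$ is actually true, a test at level $\alpha$ still rejects in roughly that fraction of samples. The sketch below is an assumption-laden illustration, not part of the original sheet: the function name `type_i_error_rate`, the seed, and the settings ($p_0 = 0.5$, $n = 200$, 2000 trials) are all hypothetical, and the test is a two-sided one-proportion z-test.

```python
import random
from math import sqrt
from statistics import NormalDist

def type_i_error_rate(p0=0.5, n=200, alpha=0.05, trials=2000, seed=1):
    """Fraction of simulated samples that wrongly reject a true H0: p = p0."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided cutoff
    standard_error = sqrt(p0 * (1 - p0) / n)
    rejections = 0
    for _ in range(trials):
        # Draw a sample from a population where H0 really is true.
        successes = sum(rng.random() < p0 for _ in range(n))
        z = (successes / n - p0) / standard_error
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials

print("alpha = 0.05 ->", type_i_error_rate(alpha=0.05))
print("alpha = 0.01 ->", type_i_error_rate(alpha=0.01))
```

The simulated rates land near the chosen $\alpha$ (not exactly, because counts are discrete and the run is random), and the smaller $\alpha$ gives the smaller rejection rate.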
10.2.2 Type II Error

Probability: $\beta = 1 - \text{power}$, where power is the probability of correctly rejecting $H_0$.

Ways to reduce possibility:

- Improve the power of the test by improving data quality or using a test with higher sensitivity.
- Set a bigger significance level ($\alpha$); this may help reduce Type II Error but will consequently increase Type I Error.
- Common Response: Increase sample size.
- Common Response: Increase the number of experiments to verify the conclusion.
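The effect of sample size and $\alpha$ on power can be sketched with a normal approximation. Everything specific here is an assumption for illustration: the function name `power_one_sided`, the one-sided test $H_0: p = p_0$ against $H_a: p > p_0$, and the hypothetical true proportion of 0.6.

```python
from math import sqrt
from statistics import NormalDist

def power_one_sided(p0, p_true, n, alpha=0.05):
    """Approximate power of the z-test H0: p = p0 vs Ha: p > p0
    when the true proportion is p_true (normal approximation)."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)
    # Smallest sample proportion that leads to rejecting H0.
    cutoff = p0 + z_crit * sqrt(p0 * (1 - p0) / n)
    # Chance that p_hat exceeds the cutoff when the truth is p_true.
    se_true = sqrt(p_true * (1 - p_true) / n)
    return 1 - std_normal.cdf((cutoff - p_true) / se_true)

for n in (50, 200):
    power = power_one_sided(p0=0.5, p_true=0.6, n=n)
    print(f"n = {n}: power = {power:.3f}, beta = {1 - power:.3f}")

# A bigger alpha also raises power, at the cost of more Type I errors.
print(f"n = 50, alpha = 0.10: power = {power_one_sided(0.5, 0.6, 50, alpha=0.10):.3f}")
```

Under these assumptions the power rises from roughly 0.41 at $n = 50$ to roughly 0.89 at $n = 200$, so $\beta$ shrinks as the sample grows.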