# Application of statistical concepts and techniques taught in QM 292.

PHASE 1 – January 26, 2018

a. Identify and state a research question of interest. The research question must be stated such that multiple linear regression can be applied to the analysis of the question. In Phase 1, address (a) why your research question is of interest to the reader and (b) why your research question is relevant.

Examples of questions where multiple linear regression is not applicable:

a. Is there any difference in BMI between males and females? This is a test of means.

b. Does Google stock have greater performance variability as compared to Yahoo? This is a variance test.

Examples of research questions to which multiple linear regression might be applied:

c. Females are more likely to attrite from college than males. Alternatively, this can be phrased as ‘Is there any difference in the college attrition rate between the genders?’

d. Is the consumer price index a good predictor of Christmas retail spending?

e. Was President Obama’s American Recovery and Reinvestment Act successful in stimulating the US economy after the May 2007 housing crash?

f. What factors are most likely to influence stock performance?

g. Do parental income and education influence BMI?

h. To what extent is defense spending influenced by world oil prices and US oil demand?

i. What factors (demographic, macroeconomic, industrial spending, environmental) are most likely to predict the outbreak of Ebola or other diseases (polio, malaria, bird flu)?

PHASE 2 – February 9, 2018

2. In a general format, state the model. In other words, identify your dependent and independent variables. Your model must include at least 3 but no more than 7 independent variables.

a. EXAMPLE: If your research question is “Do parental income and education influence BMI?”, then the dependent variable would be BMI and the independent variables would be parental education and parental income.

i. The general model statement: BMI = f(parental education, parental income).
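Written out as a multiple linear regression, the general statement above might take the following form. The symbols (the coefficients $\beta_0, \beta_1, \beta_2$, the error term $\varepsilon_i$, and the observation index $i$) are conventional notation assumed here for illustration; they do not come from the assignment text.

```latex
\mathrm{BMI}_i = \beta_0
  + \beta_1\,\mathrm{ParentalEducation}_i
  + \beta_2\,\mathrm{ParentalIncome}_i
  + \varepsilon_i
```

A model that meets the requirement of at least 3 independent variables would add further terms in the same pattern.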

2b. Identify the source or sources of your data. When using cross-sectional data, ALL data must be pulled from the same time period, but the data can be pulled from different sources. Identify the source and time period for each data element.

2c. Data sets must have a minimum of 50 observations and a maximum of 100 observations.

2d. Include a definition of each data element, its data source, and the period for which the data element is captured.

For example: BMI = Body Mass Index, measured as body mass divided by the square of the individual’s height. Data source: Health and Human Services, www.hhs.gov, fiscal year 2010, state-level data.

Example #2: Parental income = combined annual household income. Data source: www.bls.gov, state-level data for 2012.
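The BMI definition above (body mass divided by the square of height) can be sketched as a small function. The units are an assumption: kilograms and metres are used here because they are the usual convention for BMI, and the function name and sample values are hypothetical.

```python
def body_mass_index(mass_kg: float, height_m: float) -> float:
    """Return BMI: body mass (kg) divided by the square of height (m)."""
    if mass_kg <= 0 or height_m <= 0:
        raise ValueError("mass and height must be positive")
    return mass_kg / height_m ** 2


# Hypothetical individual: 70 kg and 1.75 m tall.
print(round(body_mass_index(70.0, 1.75), 2))  # 22.86
```

Stating the units in the variable definition, as the function signature does here, is what separates an explicit definition from a vague one.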

IMPORTANT: Your dependent variable cannot be binary, categorical/ranking, or strictly a discrete variable. Your dependent variable must be a continuous variable. Your independent variables can be all continuous or a mix of discrete and continuous. Your independent variables CANNOT be strictly discrete variables.
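The Phase 2 constraints (3 to 7 independent variables, 50 to 100 observations, a continuous dependent variable) can be checked before submission with a short script. This is only a sketch: the function name, the column names, and the placeholder data are hypothetical, and the "fewer than 10 distinct values" test is an assumed rule of thumb for spotting a discrete variable, not a rule from the assignment.

```python
def check_dataset(rows, dependent, independents, min_distinct=10):
    """Return a list of problems; an empty list means every check passed.

    rows         -- list of dicts, one per observation
    dependent    -- name of the dependent variable
    independents -- names of the independent variables
    min_distinct -- assumed heuristic cutoff, not part of the assignment
    """
    problems = []
    if not 3 <= len(independents) <= 7:
        problems.append("model needs 3 to 7 independent variables")
    if not 50 <= len(rows) <= 100:
        problems.append("data set needs 50 to 100 observations")
    # Heuristic: very few distinct values suggests a binary, categorical,
    # or otherwise discrete dependent variable.
    distinct_values = {row[dependent] for row in rows}
    if len(distinct_values) < min_distinct:
        problems.append("dependent variable looks discrete")
    return problems


# Placeholder data, generated only to exercise the checks.
rows = [
    {"bmi": 20 + 0.1 * i, "income": 30 + i, "education": 10 + i % 8, "age": 25 + i % 40}
    for i in range(60)
]
print(check_dataset(rows, "bmi", ["income", "education", "age"]))  # []
print(check_dataset(rows, "bmi", ["income", "education"]))
```

The second call reports a problem because only two independent variables are listed.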

Submission of Phases I – II: Phases I and II must be submitted in Word format. The Word document should be attached to your email; do not send Phases I and II in the body of an email. Include Phase I as part of the Phase II submission. The definition of each variable (see Phase II ‘2b’) should include the time period over which the data is captured and an explicit definition of how the variable is measured or how it will be transformed for inclusion in Phase IV. For example:

INCORRECT: Weight – ‘how much a person weighs’

CORRECT: Weight – ‘weight as measured in pounds’, data source: Centers for Disease Control and Prevention, www.cdc.gov, individual-level data, 2010.

INCORRECT: Unemployment rate – ‘the unemployment rate’

CORRECT: Unemployment rate – ‘the number of people unemployed per 1000’, data source: Bureau of Labor Statistics, www.bls.gov, state-level data, 2011.

Transformed Variables (see Phase 3 for example): Categorical variables (for example gender, race, color, manufacturing sector, team, or geographic region) need to be transformed into quantitative variables. For example, if gender is captured as M or F, it will need to be transformed into a 0,1 variable.

Example: M = 0 and F = 1, or M = 1 and F = 0. The definition should read: if M then gender = 0, and if F then gender = 1.
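A minimal sketch of the M/F recoding described above, using the M = 0, F = 1 convention. The names `GENDER_CODES` and `encode_gender` and the sample values are hypothetical.

```python
# Coding convention from the example above: M = 0, F = 1.
GENDER_CODES = {"M": 0, "F": 1}


def encode_gender(values):
    """Map each 'M'/'F' label to its 0/1 code; unknown labels raise KeyError."""
    return [GENDER_CODES[value] for value in values]


print(encode_gender(["M", "F", "F", "M"]))  # [0, 1, 1, 0]
```

Raising an error on an unexpected label, instead of silently assigning a code, makes data-entry problems visible before the regression is run.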

Example: Let’s assume your data contains 4 geographic regions: North, South, East, and West. For Phase II, you will need to define the states that comprise the North geographic region, and similarly for the East, West, and South. In Phase III, you will need to transform these variables into columns of 0,1 dummy variables. See examples below.
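As a rough illustration of the two steps just described, the sketch below first defines which states belong to each region and then builds one 0,1 column per region. The state-to-region assignments are hypothetical placeholders, not an official grouping, and the function name is an assumption.

```python
# Phase II step: define which states make up each region.
# These assignments are illustrative placeholders only.
STATE_TO_REGION = {
    "Maine": "North",
    "Texas": "South",
    "Virginia": "East",
    "Oregon": "West",
}


def region_dummies(states, regions=("North", "South", "East", "West")):
    """Phase III step: return one dict of 0/1 region indicators per state."""
    rows = []
    for state in states:
        region = STATE_TO_REGION[state]
        rows.append({name: int(name == region) for name in regions})
    return rows


for row in region_dummies(["Texas", "Oregon"]):
    print(row)
```

Each row has exactly one column equal to 1. When the regression includes an intercept, a common convention is to leave one region's column out of the model as the baseline, since the four columns always sum to 1; where the assignment's own examples differ, they take precedence.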

