Michigan State University (MSU) | PLS 202 Introduction to Data Analytics and the Social Sciences | R Assignment 5: Linear Regression

Question 1

  1. Inside the dataset qog, create a new variable called regime that takes the value “Democracy” if
    p_polity2 is higher than 0 (1 to 10) and “Autocracy” if it is lower than or equal to 0 (0 to -10), using
    the ifelse() function.
  2. Using a function from the dplyr package, create a separate dataset called lac_data that contains only
    the observations from the Latin America & Caribbean region.
  3. Calculate the yearly mean of the corruption index for countries in the Latin America & Caribbean region,
    separately for democracies and autocracies, and store the results in a new dataset called lac_means.
    [Hint: you need to include more than one variable in group_by and you can ignore the warning messages
    from the chunk.]
  4. Draw a line plot showing the mean corruption level of Latin American and Caribbean countries in
    each year.
    • Draw two lines, one for democracies and one for autocracies, using different colors or line types.
    • Give the plot an appropriate title and axis labels.
    • Use a theme different from the default setting.
    [Hint: you don’t need to worry about NAs that appear on the graph. However, if you want to remove them,
    you can remove the NAs from lac_means.]
  5. What trend can you observe from the resulting plot?
    Answer:
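
The data-preparation steps 1 to 3 above can be sketched on a small toy data frame. This is only an illustration under stated assumptions, not the expected solution: every value below is invented, the `_toy` object names are hypothetical, and the column names `region` and `year` are guesses about how qog is laid out (only p_polity2 and vdem_corr are taken from the assignment text; vdem_corr is named in Question 2).

```r
library(dplyr)

# Toy stand-in for qog. All values are invented for illustration only.
# `region` and `year` are ASSUMED column names; check names(qog) in the real data.
qog_toy <- data.frame(
  country   = c("A", "A", "B", "B", "C", "C", "D", "D"),
  year      = c(2000, 2001, 2000, 2001, 2000, 2001, 2000, 2001),
  region    = c(rep("Latin America & Caribbean", 6),
                rep("Europe & Central Asia", 2)),
  p_polity2 = c(8, 9, -3, 0, 6, 7, 10, 10),
  vdem_corr = c(0.40, 0.38, 0.70, 0.72, 0.50, NA, 0.05, 0.04)
)

# Step 1: label regimes with ifelse(); a score of exactly 0 counts as "Autocracy"
qog_toy$regime <- ifelse(qog_toy$p_polity2 > 0, "Democracy", "Autocracy")

# Step 2: keep only the rows for one region with dplyr::filter()
lac_toy <- qog_toy %>%
  filter(region == "Latin America & Caribbean")

# Step 3: one mean per year-and-regime combination.
# na.rm = TRUE skips missing index values; .groups = "drop" silences the
# grouping message that summarise() otherwise prints.
lac_means_toy <- lac_toy %>%
  group_by(year, regime) %>%
  summarise(mean_corr = mean(vdem_corr, na.rm = TRUE), .groups = "drop")

print(lac_means_toy)
```

Grouping by both year and regime is what yields one row per line-and-year point for the plot in step 4. Whether to pass na.rm = TRUE is a design choice; without it, any group containing a missing index value has a mean of NA.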

Question 2

  6. Now I would like to come back to the whole dataset qog and examine whether the economy of a country is
    related to its corruption level. Assuming that there is a linear relationship between the two variables,
    fit a model with vdem_corr as the dependent variable and mad_gdppc as the independent variable.
    Store the model results in an object named mod1.
  7. Provide a summary of the model results. According to the results, when a country’s economy grows, does
    it tend to be more or less corrupt?
    Answer:
  8. What is the predicted corruption index of a country with a GDP per capita of 20,000 USD?
  9. Next, we are going to predict the corruption index for all observations in the data. First, create a data
    frame called gdppc_data that contains only the mad_gdppc column from qog.
  10. Then, create a new dataset called qog_pred that contains all the columns from qog plus an additional
    column called pred_corr holding the predicted values of vdem_corr from mod1.
  11. Finally, plot the actual observations of GDP per capita and the corruption index, together with a line
    from the linear model result, using the qog_pred data.
    • Plot the actual GDP per capita and corruption index with points.
    • Change the shape of the points to something other than the default.
    • Add a line for the predicted corruption index using an additional geom_line() layer, and change the
    line color to something other than black.
    • Give appropriate titles and axis labels.
    [Hint: The dataset is quite large, so the graph may take some time to load. Please be patient.]
    [Hint2: The resulting plot may appear visually unbalanced, but this is expected.]
  12. Based on the plot comparing the linear model’s predictions with the actual observed data, as well as
    your understanding of linear models, what can you conclude from the plot above? How well does the
    model represent real-world patterns? Can you say that economic development causes lower political
    corruption? [Hint: remember that the corruption index ranges from 0 to 1.]
    Answer:
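
The modelling steps 6 to 10 above can be sketched in base R on a toy data frame. This is a hedged illustration rather than the expected solution: all values are invented, the `_toy` object names and `new_country` are hypothetical, and only the column names mad_gdppc and vdem_corr come from the assignment text. It does not address the interpretation questions, which depend on the real qog data.

```r
# Toy stand-in for qog. All values are invented for illustration only.
# One mad_gdppc value is missing to show how predictions line up with rows.
qog_toy <- data.frame(
  mad_gdppc = c(1000, 5000, 10000, NA, 30000, 40000),
  vdem_corr = c(0.90, 0.78, 0.72, 0.50, 0.28, 0.12)
)

# Steps 6-7: fit a simple linear regression and inspect it.
# lm() drops rows with missing values by default.
mod1_toy <- lm(vdem_corr ~ mad_gdppc, data = qog_toy)
print(summary(mod1_toy))

# Step 8: predict for one hypothetical country. The newdata column name
# must match the predictor name used in the model formula.
new_country <- data.frame(mad_gdppc = 20000)
pred_20000 <- predict(mod1_toy, newdata = new_country)
print(pred_20000)

# Step 9: single-bracket indexing with a column name keeps a data frame
gdppc_toy <- qog_toy["mad_gdppc"]

# Step 10: copy the data and attach one prediction per row.
# Passing newdata keeps the output the same length as the data,
# with NA where the predictor is missing.
qog_pred_toy <- qog_toy
qog_pred_toy$pred_corr <- predict(mod1_toy, newdata = gdppc_toy)
print(qog_pred_toy)
```

Supplying newdata to predict() is the reason for building the one-column data frame in step 9: it guarantees one prediction per row, so the result can be attached as a column even when some rows were dropped during fitting.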