Question 1
- Inside the dataset qog, create a new variable called regime that takes the value “Democracy” if p_polity2 is higher than 0 (1 to 10) and “Autocracy” if it is lower than or equal to 0 (0 to -10), using the ifelse() function.
- Using a function from the dplyr package, create a separate dataset called lac_data that contains only the observations from the Latin America & Caribbean region.
- Calculate the yearly mean of the corruption index for countries in the Latin America & Caribbean region, separately for democracies and autocracies, and store the results in a new dataset called lac_means.
[Hint: you need to include more than one variable in group_by(), and you can ignore the warning messages from the chunk.]
- Draw a line plot showing the mean corruption level of Latin American and Caribbean countries in each year.
• Draw two lines, one for democracies and one for autocracies, using different colors or line types.
• Give the plot an appropriate title and axis labels.
• Use a theme other than the default.
[Hint: you don’t need to worry about the NAs shown on the graph. However, if you want to remove them, you can remove the NAs from lac_means.]
- What trend can you observe from the resulting plot?
Answer:
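The steps in Question 1 follow a common dplyr and ggplot2 pattern: recode with ifelse(), subset with filter(), aggregate with group_by() and summarise(), then draw one line per group. The block below is only a minimal sketch of that pattern on a made-up data frame. The names toy, score, index, type, region_a and a_means are hypothetical and do not come from qog, so the code illustrates the technique rather than answering the question.

```r
library(dplyr)
library(ggplot2)

# Toy data standing in for a country-year panel.
# All column names and values are invented for illustration.
toy <- data.frame(
  year   = rep(2000:2002, times = 4),
  region = rep(c("A", "A", "B", "B"), each = 3),
  score  = c(3, -2, 5, -4, 1, 0, 7, 8, -1, 2, 2, 2),
  index  = c(0.2, 0.6, 0.3, 0.7, 0.4, 0.5, 0.1, 0.2, 0.8, 0.3, 0.3, 0.3)
)

# Recode a numeric score into two labels; a score of exactly 0 falls in "Low".
toy <- toy %>%
  mutate(type = ifelse(score > 0, "High", "Low"))

# Keep only the rows from one region.
region_a <- toy %>%
  filter(region == "A")

# Mean of the index for every combination of year and type.
# .groups = "drop" returns an ungrouped result and silences the
# grouping message that summarise() otherwise prints.
a_means <- region_a %>%
  group_by(year, type) %>%
  summarise(mean_index = mean(index, na.rm = TRUE), .groups = "drop")

# One line per type, distinguished by both color and line type.
p <- ggplot(a_means, aes(x = year, y = mean_index,
                         color = type, linetype = type)) +
  geom_line() +
  labs(title = "Mean index by year and type (toy data)",
       x = "Year", y = "Mean index",
       color = "Type", linetype = "Type") +
  theme_minimal()
```

Printing p draws the plot. If a grouping column contains NAs, they form their own group in the summary and can be dropped beforehand with filter(!is.na(...)) if desired.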
Question 2
- Now I would like to come back to the whole dataset qog and examine whether a country's economy is related to its corruption level. Assuming that there is a linear relationship between the two variables, fit a model with vdem_corr as the dependent variable and mad_gdppc as the independent variable. Store the model results in an object named mod1.
- Provide a summary of the model results. From the results, when a country's economy grows, does it tend to be more or less corrupt?
Answer:
- What is the predicted corruption index of a country with a GDP per capita of 20,000 USD?
- Next, we are going to predict the corruption index for all observations in the data. First, create a data frame called gdppc_data that contains only the mad_gdppc column from qog.
- Then, create a new dataset called qog_pred that contains all the columns from qog plus an additional column called pred_corr with all the predicted values of vdem_corr from mod1.
- Finally, plot the actual observations of GDP per capita and the corruption index, together with a line from the linear model result, using the qog_pred data.
• Plot the actual GDP per capita and corruption index with points.
• Change the shape of the points to something other than the default.
• Add a line of the predicted corruption index using an additional geom_line() and change the line color to something other than black.
• Give appropriate titles and axis labels.
[Hint: The dataset is quite large, so the graph may take some time to load. Please be patient.]
[Hint 2: The resulting plot may appear visually unbalanced, but this is expected.]
- Based on the plot comparing the linear model’s predictions and the actual observed data, as well as your understanding of linear models, what can you conclude from the plot above? How well does the model represent real-world patterns? Can you say that economic development causes lower political corruption? [Hint: remember that the corruption index ranges from 0 to 1.]
Answer:
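The workflow in Question 2 (fit with lm(), inspect with summary(), predict for new values, then overlay the fitted line on the observations) can be sketched on simulated data as follows. Everything in this block is an assumption made for illustration: the data frame toy, the columns x, y and pred_y, and the simulated relationship are hypothetical and are unrelated to the values in qog.

```r
library(ggplot2)

# Simulated data: y depends linearly on x plus noise (made-up numbers).
set.seed(1)
toy <- data.frame(x = runif(50, min = 0, max = 100))
toy$y <- 0.8 - 0.005 * toy$x + rnorm(50, sd = 0.05)

# Fit a simple linear regression of y on x and inspect the estimates.
fit <- lm(y ~ x, data = toy)
summary(fit)

# Prediction for one hypothetical value of the predictor.
# newdata must be a data frame whose column name matches the model term.
single_pred <- predict(fit, newdata = data.frame(x = 40))

# Predictions for every row: build a data frame holding only the
# predictor, then attach the predictions as a new column.
x_only <- data.frame(x = toy$x)
toy_pred <- toy
toy_pred$pred_y <- predict(fit, newdata = x_only)

# Observed values as points (shape 17 is a filled triangle),
# model predictions as a colored line.
p <- ggplot(toy_pred, aes(x = x)) +
  geom_point(aes(y = y), shape = 17) +
  geom_line(aes(y = pred_y), color = "steelblue") +
  labs(title = "Observed values and linear fit (toy data)",
       x = "x", y = "y")
```

When newdata is supplied, predict() returns one value per row of newdata, and rows where the predictor is missing receive NA, so the new column stays aligned with the original rows.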