Michigan State University (MSU) | PLS 202 Introduction to Data Analytics and the Social Sciences | R Assignment 3

Module 3 Assignment
Download the Assignment3_SS25_data.rda file from D2L and save it in the same location as the .rmd file.
Please be sure to submit your homework in PDF format, including all your code and output. Failure to do
so will result in a 10% deduction from the total grade of Assignment 3. Read all the questions
carefully and make sure to answer every sub-question, especially those that ask you to interpret your code results.
Load the Assignment3_SS25_data.rda file in your environment (Be sure to use the correct function! The
file format is .rda not .csv!)
#load your data here
You should see qog loaded in your environment. It is a subset of data from the Quality of Government
Institute within the Department of Political Science at the University of Gothenburg. The dataset contains
many variables from various sources that measure the quality of governments around the world.
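
As a minimal sketch of the loading step: .rda files are read with load(), which restores the saved objects under their original names, rather than with read.csv(). The example is self-contained, so it uses a small invented data frame and a temporary file (both are assumptions for illustration); in the assignment you would pass the path to Assignment3_SS25_data.rda instead.

```r
# Toy stand-in for the real data (values are invented for illustration).
qog <- data.frame(
  p_polity2 = c(-7, 3, 10, 10),
  vdem_corr = c(0.82, 0.55, 0.10, 0.07)
)

# Save it to a temporary .rda file, then remove the object from the environment.
rda_path <- tempfile(fileext = ".rda")
save(qog, file = rda_path)
rm(qog)

# load() restores the objects under their saved names
# and (invisibly) returns a character vector of those names.
loaded_names <- load(rda_path)
print(loaded_names)

# The data frame is available again under its original name.
str(qog)
```

Because load() assigns the objects itself, there is no need to write `qog <- load(...)`; doing so would store only the object name, not the data.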
Now get started with the questions!
Question 1
The p_polity2 variable is a numeric variable indicating the level of democracy in a country according to
the Polity V project. This variable can take on any value between -10 and 10, with 10 being the most democratic.

  1. On average, what is the polity2 score of the countries in the total dataset? Give both mean and median
    values.
  2. What is the most common value of polity2 score? [Hint: you can sort the table to print the mode]
  3. What is the standard deviation of polity2 score? How do you interpret the dispersion of data from the
    standard deviation result? (i.e. is it spread out or concentrated?)
  4. Draw a histogram of polity2 score. Set the number of breaks to 21 and give it an appropriate title and axis label.
  5. Does your histogram support your interpretation in Q1.3? Why or why not?
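
The steps in Question 1 can be sketched on a small invented vector (the values below are not from the qog data, and the name `scores` is hypothetical). This sketch uses only base R: mean(), median(), table() with sort() for the mode, sd(), and hist().

```r
# Toy polity-style scores (invented; the real column is qog$p_polity2).
scores <- c(-10, -7, -7, -3, 0, 2, 5, 8, 10, 10, 10, NA)

# Mean and median, ignoring missing values.
mean_score   <- mean(scores, na.rm = TRUE)
median_score <- median(scores, na.rm = TRUE)

# Mode: tabulate the values, sort by frequency, take the most frequent one.
# table() drops NA by default.
freq       <- sort(table(scores), decreasing = TRUE)
mode_score <- as.numeric(names(freq)[1])

# Sample standard deviation.
sd_score <- sd(scores, na.rm = TRUE)

print(c(mean = mean_score, median = median_score,
        mode = mode_score, sd = sd_score))

# Histogram with a requested number of breaks, a title, and an x-axis label.
hist(scores,
     breaks = 21,
     main   = "Distribution of polity2 scores (toy data)",
     xlab   = "polity2 score")
```

Note that when `breaks` is a single number, hist() treats it as a suggestion and may choose a nearby "pretty" number of bins, so the plot will not necessarily show exactly 21 bars.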

Question 2
Some scientific research argues that autocracies have higher levels of political corruption
than democracies. Let's test this theory with the given data.

  1. Create a dummy variable called autocracy that takes the value of 1 if p_polity2 is 0 or lower (-10 to
    0), and 0 if it is higher than 0 (1 to 10), using the ifelse() function.
  2. Find the mean level of political corruption (vdem_corr) for autocracies versus the mean
    for non-autocracies. vdem_corr ranges from 0 to 1, with higher values meaning more corruption. Is
    there a difference?
  3. I would like to be sure about that difference. Load the confintr package and find the confidence
    intervals of the mean corruption level for autocracies and democracies.
  4. Do they overlap? What does that tell us?
  5. Instead of making a dummy variable, we could simply calculate a correlation coefficient for corruption
    level and polity2 score. Try this.
  6. Interpret the result. You should interpret BOTH the correlation coefficient and the statistical
    significance of the result.
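
A minimal sketch of the Question 2 workflow on invented data (the data frame `toy` and its values are assumptions for illustration only). The confidence intervals are computed with base R's t.test() so the example runs without extra packages; the confintr call is guarded and only runs if that package is installed. Its ci_mean() function should, by default, give the same kind of t-based interval.

```r
# Toy data (invented for illustration; the real data frame is qog).
toy <- data.frame(
  p_polity2 = c(-9, -7, -5, -2, 0, 3, 6, 8, 9, 10),
  vdem_corr = c(0.90, 0.80, 0.85, 0.60, 0.70, 0.50, 0.30, 0.20, 0.15, 0.05)
)

# Dummy variable: 1 if p_polity2 is 0 or lower, otherwise 0.
toy$autocracy <- ifelse(toy$p_polity2 <= 0, 1, 0)

# Mean corruption within each group (names "0" and "1").
group_means <- tapply(toy$vdem_corr, toy$autocracy, mean, na.rm = TRUE)
print(group_means)

# 95% confidence interval for each group's mean (one-sample t interval).
ci_autocracy <- t.test(toy$vdem_corr[toy$autocracy == 1])$conf.int
ci_democracy <- t.test(toy$vdem_corr[toy$autocracy == 0])$conf.int
print(ci_autocracy)
print(ci_democracy)

# The same idea with confintr, if the package is available.
if (requireNamespace("confintr", quietly = TRUE)) {
  print(confintr::ci_mean(toy$vdem_corr[toy$autocracy == 1]))
  print(confintr::ci_mean(toy$vdem_corr[toy$autocracy == 0]))
}

# Pearson correlation between the raw score and corruption,
# together with its p-value and confidence interval.
ct <- cor.test(toy$p_polity2, toy$vdem_corr)
print(ct)
```

In the cor.test() output, `estimate` is the correlation coefficient and `p.value` is the significance of the test against a correlation of zero; both need to be read when interpreting the result.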

Question 3
Now I would like to compare the political corruption levels of countries in two different regions, Middle East
& North Africa and Sub-Saharan Africa.

  1. Create a subset of the data with only observations from Middle East & North Africa, and one with only
    Sub-Saharan Africa observations. [Hint: use the region variable to set the condition on rows]
  2. Conduct a t-test to see if there is a statistically significant difference in the means of vdem_corr in
    these two regions.
  3. Which region has a higher level of political corruption on average? Answer using the t-test result.
  4. Can you tell whether the difference is statistically significant? Explain your reasoning using the
    confidence interval shown in the t-test result.
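
The subsetting and t-test steps can be sketched as follows, again on invented data. The exact region labels used here are an assumption; in the real data, something like `table(qog$region)` would show how the labels are actually spelled.

```r
# Toy data (invented; region labels are assumed, not taken from qog).
toy <- data.frame(
  region = rep(c("Middle East & North Africa", "Sub-Saharan Africa", "Other"),
               times = c(5, 5, 2)),
  vdem_corr = c(0.70, 0.65, 0.80, 0.60, 0.75,   # Middle East & North Africa
                0.55, 0.60, 0.50, 0.70, 0.45,   # Sub-Saharan Africa
                0.10, 0.20)                     # Other
)

# Row subsets selected by a condition on the region variable.
mena <- toy[toy$region == "Middle East & North Africa", ]
ssa  <- toy[toy$region == "Sub-Saharan Africa", ]

# Two-sample t-test (Welch's test is R's default, no equal-variance assumption).
tt <- t.test(mena$vdem_corr, ssa$vdem_corr)
print(tt)

# Group means ("mean of x" is the first sample, "mean of y" the second).
print(tt$estimate)

# 95% confidence interval for the difference in means (x minus y).
print(tt$conf.int)
```

If the confidence interval for the difference in means excludes 0, the difference is statistically significant at the corresponding level; if it contains 0, it is not.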