R graphics construction:
I decided to base my use case on the relationship between GDP and employment rate. I was interested to see how strong the correlation, if any, was between these two data sets for OECD countries.
I obtained my raw data from http://stats.oecd.org/ and http://en.wikipedia.org/wiki/List_of_OECD_countries_by_GDP_per_capita.
I cleaned this raw data and created two CSV files – GDP.txt and Employment rate.txt.
The first file is Employment rate – this stores each country and its employment rate for 2012.
The second file is GDP – this stores each country and its Gross Domestic Product per capita for 2012.
I then downloaded the R GUI for Windows.
After opening the R environment, I created a data frame for each of my CSV files.
gdp <- read.csv("GDP.txt", header = TRUE)
ER <- read.csv("Employment rate.txt", header = TRUE)
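The read step can be tried without the files on disk. The following is a minimal sketch, assuming GDP.txt is comma-separated with a header row naming the columns Country and GDP (the file contents are not shown here, so that layout is an assumption). The three rows are copied from the merged table in this write-up, and gdp_demo is a hypothetical name used only for illustration.

```r
# Stand-in for GDP.txt: three rows copied from the merged table in this write-up.
# The header names (Country, GDP) are an assumption about the file layout.
gdp_text <- "Country,GDP
Australia,44407
Austria,44141
Belgium,40838"

# read.csv() accepts a `text` argument, so no file on disk is needed here.
gdp_demo <- read.csv(text = gdp_text, header = TRUE, stringsAsFactors = FALSE)

# Check that the columns were read with the expected types before merging.
str(gdp_demo)
head(gdp_demo)
```

Checking the structure with str() right after reading is a cheap way to catch a numeric column that was read as text because of stray characters left over from cleaning.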
I then merged the above data frames and stored the result in a new data frame called "countries".
countries <- merge(x = gdp, y = ER)
The above created a data frame with 3 columns – Country, GDP and Employment_Rate.
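By default merge() joins on the columns the two data frames have in common and silently drops rows that appear in only one of them. The sketch below makes the key explicit and checks for dropped countries. It assumes both files share a Country column; the names gdp_demo, er_demo, countries_demo and dropped are hypothetical, and the three rows are copied from the merged table in this write-up.

```r
# Two small data frames standing in for the GDP and employment rate files.
gdp_demo <- data.frame(Country = c("Australia", "Austria", "Belgium"),
                       GDP = c(44407, 44141, 40838))
er_demo <- data.frame(Country = c("Austria", "Australia", "Belgium"),
                      Employment_Rate = c(72.5, 72.4, 61.8))

# Name the join key explicitly rather than relying on shared column names.
countries_demo <- merge(x = gdp_demo, y = er_demo, by = "Country")

# Countries present in only one input are dropped by the default inner join;
# list them so spelling mismatches (e.g. "UK" vs "United Kingdom") are visible.
dropped <- setdiff(union(gdp_demo$Country, er_demo$Country),
                   countries_demo$Country)

print(countries_demo)
print(dropped)
```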
When I run print(countries), I obtain the following.
Country GDP Employment_Rate
Australia 44407 72.4
Austria 44141 72.5
Belgium 40838 61.8
Canada 42114 71.8
Chile 21486 61.7
CzechRepublic 27527 66.0
Denmark 42787 73.1
Estonia 24260 66.2
Finland 39160 69.4
France 36933 63.8
Germany 41927 72.7
Greece 25987 52.3
Hungary 22635 56.6
Iceland 39117 78.7
Ireland 43803 58.7
Israel 31364 66.0
Italy 34141 56.9
Japan 35482 70.4
Korea 30011 64.2
Luxembourg 89417 64.8
Netherlands 43348 75.3
NewZealand 32888 72.7
Norway 66135 75.8
Poland 22782 59.6
Portugal 25802 62.3
SlovakRepublic 25948 59.8
Slovenia 28482 64.8
Spain 32559 56.6
Sweden 42865 73.7
Switzerland 53641 79.0
Turkey 18328 48.2
UK 35671 69.4
USA 51689 67.0
I then plotted the GDP column against the Employment_Rate column.
I then fitted a linear regression line to show the positive relationship between employment rate and GDP, as seen below.
line <- lm(countries$Employment_Rate ~ countries$GDP)
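The plotting commands themselves are not reproduced in this write-up, so the following is only a sketch of one way to draw the scatter plot and regression line. The two vectors are copied from the merged table, in the same country order. The names gdp_2012, emp_2012, countries_demo and fit, the axis labels and the colour are all assumptions made for illustration.

```r
# GDP per capita and employment rate (2012), copied from the merged table,
# in the same country order (Australia ... USA).
gdp_2012 <- c(44407, 44141, 40838, 42114, 21486, 27527, 42787, 24260, 39160, 36933,
              41927, 25987, 22635, 39117, 43803, 31364, 34141, 35482, 30011, 89417,
              43348, 32888, 66135, 22782, 25802, 25948, 28482, 32559, 42865, 53641,
              18328, 35671, 51689)
emp_2012 <- c(72.4, 72.5, 61.8, 71.8, 61.7, 66.0, 73.1, 66.2, 69.4, 63.8,
              72.7, 52.3, 56.6, 78.7, 58.7, 66.0, 56.9, 70.4, 64.2, 64.8,
              75.3, 72.7, 75.8, 59.6, 62.3, 59.8, 64.8, 56.6, 73.7, 79.0,
              48.2, 69.4, 67.0)
countries_demo <- data.frame(GDP = gdp_2012, Employment_Rate = emp_2012)

# Regress employment rate on GDP; the data argument avoids repeating `$`.
fit <- lm(Employment_Rate ~ GDP, data = countries_demo)

# GDP goes on the x-axis so the plot matches the model formula above.
plot(countries_demo$GDP, countries_demo$Employment_Rate,
     xlab = "GDP per capita (2012)",
     ylab = "Employment rate (%), 2012",
     main = "Employment rate vs GDP, OECD countries")

# abline() reads the intercept and slope from the fitted model.
abline(fit, col = "red")
```

The axis order matters: because the model is Employment_Rate ~ GDP, the line only lands correctly on a plot that has GDP on the x-axis.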
I also ran cor.test on countries$GDP and countries$Employment_Rate, with the output shown below:
Pearson's product-moment correlation
data: countries$GDP and countries$Employment_Rate
t = 3.1327, df = 31, p-value = 0.003767
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
As seen from the above results, I got a p-value of 0.003767, which is well below the usual 0.05 threshold, so the correlation is statistically significant.
I obtained a correlation coefficient (cor) of 0.49; this indicates a moderate positive correlation between employment rate and GDP.
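The reported coefficient can be cross-checked against the t statistic and degrees of freedom in the output, since for a Pearson correlation test r = t / sqrt(t^2 + df). The sketch below uses only the figures quoted above; the variable names are chosen for illustration.

```r
# Values taken from the cor.test output quoted above.
t_stat <- 3.1327
df <- 31

# For a Pearson correlation test, r can be recovered from t and df.
r <- t_stat / sqrt(t_stat^2 + df)
print(round(r, 2))       # 0.49, matching the reported cor value

# r squared is the share of variance in one variable explained linearly by the other.
r_squared <- r^2
print(round(r_squared, 2))  # 0.24

# The sample size is df + 2, which should equal the number of countries in the table.
n <- df + 2
print(n)                 # 33
```

An r squared of roughly 0.24 means GDP accounts for about a quarter of the variation in employment rate in this sample, which is why "moderate" is a fairer description than "strong".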
Information gleaned at a glance.
The information gleaned from the dataset is that there is a moderate, statistically significant positive correlation between employment rate and GDP per country.
R graphics proved to be excellent for visualising the correlation between GDP and employment rate.
What other ideas/concepts could be represented via R graphics?
This could be further expanded to include every country in the world, to see whether the correlation would be similar to the above.
More data sets could be developed to test the correlations between:
IQ and GDP
IQ and Employment rate
Health and GDP
Health and Employment
Education and poverty
The above, amongst other concepts/ideas, would be very interesting to analyse further using R graphics.
Image of R course completion.