Purpose of this notebook

The main purpose of this notebook is to learn how to get started in R and to practise some basic features such as vectors, data importation, loops, if statements and, of course, linear regression.

While this notebook is beginner-friendly, it does assume a basic understanding of how ordinary least squares (OLS) regression works.
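As a quick refresher, OLS chooses the coefficients that minimise the sum of squared residuals. For a design matrix X that includes a column of ones for the intercept, the estimate solves the normal equations (X'X)β = X'y. The sketch below assumes a small made-up dataset (`x` and `y` are illustrative values, not from this notebook). It solves those equations directly in base R and compares the result with the built-in `lm()` function:

```r
# Small illustrative dataset (made-up values, not the FDI/GFCF series)
x = c(1, 2, 3, 4, 5)
y = c(2.1, 3.9, 6.2, 7.8, 10.1)

# Design matrix: a column of ones for the intercept, then x
X = cbind(1, x)

# Solve the normal equations (X'X) beta = X'y for the OLS coefficients
beta_hat = solve(t(X) %*% X, t(X) %*% y)
print(beta_hat)

# The same fit using R's built-in linear model function
fit = lm(y ~ x)
print(coef(fit))
```

Both approaches should give the same intercept and slope. In practice `lm()` is preferred, because it uses a numerically more stable QR decomposition and also returns standard errors and diagnostics.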

Theory

Gross fixed capital formation (GFCF) includes spending on land improvements (fences, ditches, drains, and so on); purchases of plant, machinery, and equipment; and the construction of roads, railways, private residential dwellings, and commercial and industrial buildings.

Foreign direct investment (FDI) is an investment from a party in one country into a business or corporation in another country with the intention of establishing a lasting interest.

Data Importation

The data were taken from the website “Perspective Monde”. Rather than downloading a CSV file, importing it and then extracting the vectors, we used the ready-made format that the site provides for R and Python (as vectors and data frames):

# Vectors (anything after '#' is a comment and is ignored when the code runs)
Date_FDI=c(1970:2019)

FDI=c( 20000000, 23100000, 13000000, 5490000, -20400000, 5020000, 38014962.74, 7994056.273,
      11759988.24, 7437548.637, 89416222.59, 58581335.99, 79528177.1, 46123623.5, 46989196.56, 
      19975166.86, 549182.4961, 59574900.78, 84661627.57, 167056032.1, 165122977.8, 317462140.6,
      422470462.5, 491466064.6, 550924373.9, 334768272.9, 357393801.8, 1079341332, 308712164.4,
      826974026.9, 426553283.9, 2824557252, 480355698, 2312829823, 893325392.8, 1670609689, 
      2460787164, 2825801376, 2466288357, 1970323920, 1240625859, 2521362081, 2841954371, 
      3360909924, 3525384612, 3252913902, 2153363905, 2680109856, 3544387229, 1599761098)

Date_GFCF=c(1960:2018)

GFCF=c( 199877917.2, 229133346.5, 250410018.8, 306056694, 296850449.6, 313012548.2, 336528030.8, 
       415769192.8, 433356367.9, 479596877.8, 590455488.6, 647326732.7, 690532081.4, 845163018.3, 
       1128655774, 2229981493, 2755187473, 3530966180, 3295653635, 3815239414, 5675430575, 
       5550459177, 5462138469, 4509216318, 3747639748, 3779465368, 4569944203, 
       4839690401, 5786813414, 7065573384, 7781959744, 8054243608, 8362423940, 8088129003, 
       8392956449, 9350603852, 9414815900, 8909512890, 10128138832, 10930128728, 10480934331, 
       10202074980, 11120357844, 13498857067, 16273601240, 17759604422, 20021964801, 
       25416061424, 31838380450, 29413188368, 28576723851, 31926847056, 32032590051, 
       32894652311, 32860711609, 28703644911, 31025847566, 31424989682, 33556322647)

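# Store the lengths in 'vec_lengths' rather than 'length' to avoid masking the built-in length() function
vec_lengths=c(length(Date_FDI),length(FDI),length(Date_GFCF),length(GFCF))
print(vec_lengths)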
## [1] 50 50 59 59
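The notebook also aims to practise loops and if statements, so the same length check can be written with them. The following is a minimal sketch. `check_lengths` is a hypothetical helper introduced here for illustration. It walks over named pairs of (dates, values) vectors, prints whether each pair matches, and returns `TRUE` only if every pair does:

```r
# Hypothetical helper: loop over named (dates, values) pairs and check lengths.
# Prints one line per pair and returns TRUE only if every pair matches.
check_lengths = function(pairs) {
  all_ok = TRUE
  for (name in names(pairs)) {
    dates  = pairs[[name]][[1]]
    values = pairs[[name]][[2]]
    if (length(dates) == length(values)) {
      cat(sprintf("%s: OK (%d observations)\n", name, length(values)))
    } else {
      cat(sprintf("%s: MISMATCH (%d dates vs %d values)\n",
                  name, length(dates), length(values)))
      all_ok = FALSE
    }
  }
  all_ok
}

# Illustrative call with small made-up vectors: 'A' matches, 'B' does not
result = check_lengths(list(
  A = list(2000:2002, c(1.5, 2.0, 2.5)),
  B = list(2000:2003, c(10, 20, 30))
))
print(result)
```

With the notebook's own vectors, `check_lengths(list(FDI = list(Date_FDI, FDI), GFCF = list(Date_GFCF, GFCF)))` should report 50 and 59 observations and return `TRUE`.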

If you want to import the data from an Excel or CSV file instead, use the following commands:

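# For CSV files: read.csv2() expects ';' separators and ',' decimal marks;
# use read.csv() instead for files with ',' separators and '.' decimal marks
data = read.csv2("filename.csv")

# For Excel files, install the 'readxl' package (only needed once), then load it and read the sheet
install.packages("readxl")
library(readxl)
data = read_excel("filename.xlsx", sheet = "name")

# Extract a single column as a vector
vector = data$column_name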
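You can try these import steps without an external file. The sketch below writes a small data frame to a temporary CSV file with `write.csv()`, reads it back with `read.csv()` and extracts one column with `$`. The file path is generated by `tempfile()`. The column names are illustrative, and the three values are the 1970 to 1972 FDI observations from the vector above:

```r
# Small data frame holding the first three FDI observations (1970-1972)
df = data.frame(Year = 1970:1972, FDI = c(20000000, 23100000, 13000000))

# Write it to a temporary CSV file (row.names = FALSE avoids an extra index column)
path = tempfile(fileext = ".csv")
write.csv(df, path, row.names = FALSE)

# Read it back (comma-separated, so read.csv rather than read.csv2)
df_read = read.csv(path)

# Extract one column as a plain vector
fdi_values = df_read$FDI
print(fdi_values)
```

The extracted vector avoids names such as `vector` or `data`, because those would mask base R functions of the same name.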

Data Cleaning

Note that the FDI series has 50 observations, from 1970 to 2019, while the GFCF series has 59 observations, from 1960 to 2018. We will remove the first 10 observations (1960 to 1969) from GFCF and the last observation (2019) from FDI, so that both vectors cover the same years (1970 to 2018) and have the same length:

GFCF_2 = GFCF[-(1:10)]
FDI_2 = FDI[-length(FDI)]
length_2=c(length(GFCF_2),length(FDI_2))
print(length_2)
## [1] 49 49
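Dropping elements by position works here because both series are consecutive yearly data. A more general alternative is to put each series in its own data frame with a `Year` column and let base R's `merge()` keep only the years present in both (an inner join). The sketch below uses only the first few observations of each series, copied from the vectors above:

```r
# First few observations of each series, copied from the vectors above
fdi_df  = data.frame(Year = 1970:1972, FDI = c(20000000, 23100000, 13000000))
gfcf_df = data.frame(Year = 1968:1972,
                     GFCF = c(433356367.9, 479596877.8, 590455488.6,
                              647326732.7, 690532081.4))

# Inner join on Year: only years present in both data frames are kept
merged = merge(gfcf_df, fdi_df, by = "Year")
print(merged)
```

With the full vectors, `merge(data.frame(Year = Date_GFCF, GFCF), data.frame(Year = Date_FDI, FDI), by = "Year")` should give the same 49 years (1970 to 2018) without counting how many elements to drop.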

Now that the vectors have the same length, we can combine them into a data frame and inspect its first and last rows:

data=data.frame(Year=c(1970:2018),GFCF_2,FDI_2)
print(head(data,2))
##   Year    GFCF_2    FDI_2
## 1 1970 590455489 20000000
## 2 1971 647326733 23100000
print(tail(data,2))
##    Year      GFCF_2      FDI_2
## 48 2017 31424989682 2680109856
## 49 2018 33556322647 3544387229

Data visualisation & Descriptive statistics

# Histograms

hist(GFCF_2,col='cornflowerblue')
hist(FDI_2,col='cornflowerblue')

# Line charts using the 'ggplot2' package
library(ggplot2)
# Use the data frame's Year column for the x-axis instead of rebuilding the years
ggplot(data, aes(x=Year, y=GFCF_2)) + geom_line(color="blue") + theme_bw()
ggplot(data, aes(x=Year, y=FDI_2)) + geom_line(color="blue") + theme_bw()
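For the descriptive statistics, base R provides `summary()`, `mean()`, `sd()` and `cor()`. So that the sketch runs on its own, it applies these functions to the first five observations of each series (1970 to 1974), copied from the vectors above. On the full data you would pass `data$GFCF_2` and `data$FDI_2` instead:

```r
# First five observations (1970-1974) of each series, copied from the vectors above
gfcf_sample = c(590455488.6, 647326732.7, 690532081.4, 845163018.3, 1128655774)
fdi_sample  = c(20000000, 23100000, 13000000, 5490000, -20400000)

# Minimum, quartiles, median, mean and maximum
print(summary(fdi_sample))

# Mean and (sample) standard deviation
fdi_mean = mean(fdi_sample)
fdi_sd   = sd(fdi_sample)
cat("FDI mean:", fdi_mean, "  FDI sd:", fdi_sd, "\n")

# Pearson correlation between the two series over these years
print(cor(gfcf_sample, fdi_sample))
```

Over these five years the correlation is negative, because GFCF rises while FDI falls. A five-point sample says little about the full 1970 to 2018 relationship, which the regression is meant to explore.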