First steps with R / Part II
Statistics and tables
This part of my small open series about the R programming language covers simple statistical functions and how to create and work with tables.
I'm a beginner in R, and since I don't have time for a full day's training, I search the net for instructions that explain things as simply to me as I try to explain them to you. I hope the exercises described here help me get better at creating tables that are presentable and easy to evaluate.
In the first part of my small open series I dealt with simple string functions.
As a total beginner in R, I am currently looking for tutorials that I can follow with the knowledge I have.
I came across the website The Programming Historian, which currently (as of January 2020) offers 80 lessons (link: https://programminghistorian.org/en/lessons/). They are written for historians who program, but they should be just as interesting for a designer who programs, like me. I have chosen Taylor Arnold's lesson Basic Text Processing in R to analyze Thomas Wolfe's text "Look Homeward, Angel". The transcript and my notes on that exercise will follow in the next episode.
Before I do that, I will work through Taryn Dewar's R Basics lesson and play around with the data and functions mentioned there to understand them better.
This working note deals among other things with these functions of the programming language R:
setwd(), dir(), paste(), write.table(), data(), mean(), median(), min(), max(), quantile(), summary(), cbind(), rbind(), rownames(), colnames(), t()
Simple statistical functions
mean() and median()
The tutorial contains some basics for handling statistical values, such as the mean value, the median value, maximum values and minimum values.
As you may have noticed, my blog, and this series of articles in particular, serves as a kind of public notebook for myself.
So here is a short log of my experiments with R and the functions mean() and median(), using the passenger data that ships with R and is also used in the Programming Historian lesson mentioned above.
In the first step I open R and set up my working directory.
setwd("C:/R/r-wolfe")
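If the folder does not exist, setwd() stops with an error. Here is a minimal sketch of a more defensive setup that also uses dir() to list the folder contents. The folder is hypothetical and sits in a temporary directory (instead of C:/R/r-wolfe) only so that the example runs on any machine; project_dir and old_dir are names I made up for illustration.

```r
# Hypothetical project folder; tempdir() is used only so the sketch runs anywhere
project_dir <- file.path(tempdir(), "r-wolfe")

# Create the folder if it is missing, otherwise setwd() would fail
if (!dir.exists(project_dir)) {
  dir.create(project_dir, recursive = TRUE)
}

old_dir <- setwd(project_dir)  # setwd() invisibly returns the previous working directory
getwd()                        # show the current working directory
dir()                          # list the files in it (empty for a new folder)
setwd(old_dir)                 # go back to where we started
```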
The function mean() returns the arithmetic mean of a data set. The function median() returns the so-called central value: the value that sits exactly in the middle of the sorted data. It does not necessarily equal the mean.
The passenger data that ships with R is loaded with data(AirPassengers). As you can see, it is a simple table with the monthly number of airline passengers (in thousands) from January 1949 to December 1960. R is case-sensitive, so please pay attention to upper and lower case.
> data(AirPassengers)
> AirPassengers
     Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1949 112 118 132 129 121 135 148 148 136 119 104 118
1950 115 126 141 135 125 149 170 170 158 133 114 140
1951 145 150 178 163 172 178 199 199 184 162 146 166
1952 171 180 193 181 183 218 230 242 209 191 172 194
1953 196 196 236 235 229 243 264 272 237 211 180 201
1954 204 188 235 227 234 264 302 293 259 229 203 229
1955 242 233 267 269 270 315 364 347 312 274 237 278
1956 284 277 317 313 318 374 413 405 355 306 271 306
1957 315 301 356 348 355 422 465 467 404 347 305 336
1958 340 318 362 348 363 435 491 505 404 359 310 337
1959 360 342 406 396 420 472 548 559 463 407 362 405
1960 417 391 419 461 472 535 622 606 508 461 390 432
The average value of all months is obtained with
> mean(AirPassengers)
[1] 280.2986
and the median with
> median(AirPassengers)
[1] 265.5
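Why mean and median can differ is easiest to see with a single outlier. A small sketch with made-up numbers (the vector values is purely illustrative and not part of the lesson data):

```r
# Made-up values for illustration: four small numbers and one outlier
values <- c(1, 2, 3, 4, 100)

mean(values)    # 22: the outlier pulls the mean upwards
median(values)  # 3: the middle value of the sorted data is unaffected
```

For AirPassengers the mean (280.2986) is also higher than the median (265.5), which suggests that a few very busy months pull the average up.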
We get the smallest value with the function min().
> min(AirPassengers)
[1] 104
We get the highest value in AirPassengers with the function max().
> max(AirPassengers)
[1] 622
Quantiles as measures of location
quantile() and summary()
We can also display quantiles with the function quantile(). Quantiles are important measures of location in statistics, especially in population statistics and ergonomics. The 25% quantile is the value below which 25% of all values lie. Called without further arguments, quantile() returns the 0%, 25%, 50%, 75% and 100% quantiles.
> quantile(AirPassengers)
   0%   25%   50%   75%  100%
104.0 180.0 265.5 360.5 622.0
Conveniently, R has a function that summarizes the most important statistics: summary(). From left to right it shows the minimum, the 25% quantile (1st Qu.), the median, the mean, the 75% quantile (3rd Qu.) and the maximum.
> summary(AirPassengers)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  104.0   180.0   265.5   280.3   360.5   622.0
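Other quantiles than the default five can be requested through the probs argument of quantile(). A short sketch, assuming we are interested in the 10% and 90% quantiles (the variable name deciles is my own choice):

```r
data(AirPassengers)

# probs takes the desired quantiles as fractions between 0 and 1
deciles <- quantile(AirPassengers, probs = c(0.1, 0.9))
deciles

# The 50% quantile is the same value that median() returns
quantile(AirPassengers, probs = 0.5) == median(AirPassengers)
```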
Simple table functions
Create your own tables
How can we create tables ourselves? In R this is surprisingly easy and quick.
Let's assume we want to write this data into a table:
> data1 <- c(2,30,38,13)
> data2 <- c(7,20,36,3)
> data1
[1]  2 30 38 13
> data2
[1]  7 20 36  3
To create a matrix from this data, we can use the cbind() function, short for column bind. The order of the arguments in the parentheses determines the order of the columns, so we can change it as we like. R uses our object names as column headers.
> table <- cbind(data1,data2)
> table
     data1 data2
[1,]     2     7
[2,]    30    20
[3,]    38    36
[4,]    13     3
With the function rbind(), short for row bind, it looks like this. Here the object names are used as row labels.
> table <- rbind(data1,data2)
> table
      [,1] [,2] [,3] [,4]
data1    2   30   38   13
data2    7   20   36    3
And we can also append rows, in this case the values from data3:
> data3 <- c(1,2,3,5)
> table <- rbind(table,data3)
> table
      [,1] [,2] [,3] [,4]
data1    2   30   38   13
data2    7   20   36    3
data3    1    2    3    5
Of course, the row and column names don't look very polished yet. That is quickly changed: we rename the rows with rownames() and the columns with colnames():
> rownames(table) <- c("Januar","Februar","März")
> table
        [,1] [,2] [,3] [,4]
Januar     2   30   38   13
Februar    7   20   36    3
März       1    2    3    5
> colnames(table) <- c("Woche 1","Woche 2","Woche 3","Woche 4")
> table
        Woche 1 Woche 2 Woche 3 Woche 4
Januar        2      30      38      13
Februar       7      20      36       3
März          1       2       3       5
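As an alternative to binding vectors and naming them afterwards, the same table can probably be built in one step with matrix() and its dimnames argument. This is only a sketch of that option, not what the lesson does. The name monthly is my own.

```r
# Same values as above, filled in row by row
monthly <- matrix(
  c(2, 30, 38, 13,
    7, 20, 36, 3,
    1, 2, 3, 5),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("Januar", "Februar", "März"),
                  c("Woche 1", "Woche 2", "Woche 3", "Woche 4"))
)
monthly

# With names in place, a single cell can be looked up by row and column name
monthly["Februar", "Woche 3"]  # 36
```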
If we do not like this arrangement, we simply flip the table with t(). The t stands for transpose: rows become columns and columns become rows.
> t(table)
        Januar Februar März
Woche 1      2       7    1
Woche 2     30      20    2
Woche 3     38      36    3
Woche 4     13       3    5
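Transposing also shows how cbind() and rbind() relate to each other: the transpose of one is the other. A small sketch with the two data vectors from above (by_col and by_row are names I picked for illustration):

```r
data1 <- c(2, 30, 38, 13)
data2 <- c(7, 20, 36, 3)

by_col <- cbind(data1, data2)  # vectors become columns
by_row <- rbind(data1, data2)  # vectors become rows

dim(by_col)  # 4 2 (rows, columns)
dim(by_row)  # 2 4

# Transposing the column version gives the same values as the row version
all(t(by_col) == by_row)  # TRUE
```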
And of course we can run evaluations on the table as well. For a matrix, summary() reports the statistics column by column:
> summary(table)
    Woche 1         Woche 2         Woche 3         Woche 4
 Min.   :1.000   Min.   : 2.00   Min.   : 3.00   Min.   : 3
 1st Qu.:1.500   1st Qu.:11.00   1st Qu.:19.50   1st Qu.: 4
 Median :2.000   Median :20.00   Median :36.00   Median : 5
 Mean   :3.333   Mean   :17.33   Mean   :25.67   Mean   : 7
 3rd Qu.:4.500   3rd Qu.:25.00   3rd Qu.:37.00   3rd Qu.: 9
 Max.   :7.000   Max.   :30.00   Max.   :38.00   Max.   :13
The same works for individual rows or columns: in the square brackets, the row index goes before the comma and the column index after it.
> summary(table[1,])
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   2.00   10.25   21.50   20.75   32.00   38.00
> summary(table[,1])
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  1.000   1.500   2.000   3.333   4.500   7.000
Or simply calculate an average value for the third column:
> mean(table[,3])
[1] 25.66667
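To get such values for all rows or columns at once, base R offers colMeans(), rowMeans() and the more general apply(). A minimal sketch with the same numbers. I call the matrix tab here, a name of my own, so that it does not share its name with R's built-in table() function.

```r
data1 <- c(2, 30, 38, 13)
data2 <- c(7, 20, 36, 3)
data3 <- c(1, 2, 3, 5)
tab <- rbind(data1, data2, data3)

colMeans(tab)       # one mean per column; the third is the 25.66667 from above
rowMeans(tab)       # one mean per row: 20.75, 16.50, 2.75

# apply() runs any function over rows (1) or columns (2)
apply(tab, 2, max)  # maximum of each column: 7, 30, 38, 13
```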
That was just a quick tour of some very simple table and statistics functions in the programming language R.
I welcome suggestions and criticism.
tl;dr
Simple statistical functions in R and creating and working with tables in this programming language.