First steps with R / Part II

Statistics and tables

This part of my small open series about the R programming language covers simple statistical functions and how to create and work with tables.

I'm a beginner in R, and since I don't have time for a full day of training, I'm working through instructions on the net, hoping they explain things as simply to me as I try to explain them to you. The exercises described here will hopefully help me create presentable tables that are easy to evaluate.

In the first part of my small open series I dealt with simple string functions.

As a total beginner in R, I am currently looking for tutorials that I can follow with what I already know.

I came across the website The Programming Historian, which currently (as of January 2020) offers 80 lessons (link: https://programminghistorian.org/en/lessons/). They are aimed at programming historians, but they should be just as interesting for a programming designer like me. I have chosen Taylor Arnold's lesson Basic Text Processing in R to evaluate Thomas Wolfe's text "Look Homeward, Angel". The transcript and my notes on this exercise will be available in the next episode.

But before I do that, I will first work through Taryn Dewar's lesson R Basics and play around with the data and functions mentioned there to understand them better.

This working note deals among other things with these functions of the programming language R:

setwd(), dir(), paste(), write.table(), data(), mean(), median(), min(), max(), quantile(), summary(), cbind(), rbind(), rownames(), colnames(), t()

Simple statistical functions

mean() and median()

The tutorial contains some basics for handling statistical values, such as the mean value, the median value, maximum values and minimum values.

As you may have noticed, my blog, and especially this series of articles, is a kind of public notebook for myself.

So here is a short log of my experiments with R and the functions mean() and median(), using the passenger list that ships with R and that is also used in the above-mentioned article of The Programming Historian.

In the first step I open R and set up my working directory.

setwd("C:/R/r-wolfe")

Set up working directory.

The function mean() returns the arithmetic mean of a data set. The function median() returns the so-called central value: the value that lies exactly in the middle of the sorted data (for an even number of values, the average of the two middle values). It does not necessarily equal the mean.
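To see for myself that the two values can differ, here is a minimal sketch. The numbers in x are made up by me purely for illustration and have nothing to do with the passenger data.

```r
# Hypothetical sample values with one large outlier
x <- c(1, 2, 3, 4, 100)

mean(x)    # sum of the values divided by their count: 22
median(x)  # middle value of the sorted data: 3
```

The single outlier pulls the mean far upwards, while the median stays where most of the values are.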

The passenger data in R is loaded with data(AirPassengers). As you can see, it prints as a simple table with the number of passengers (in thousands) who flew each month between January 1949 and December 1960. Please pay attention to upper and lower case, since R is case-sensitive.

data(AirPassengers)
AirPassengers
     Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1949 112 118 132 129 121 135 148 148 136 119 104 118
1950 115 126 141 135 125 149 170 170 158 133 114 140
1951 145 150 178 163 172 178 199 199 184 162 146 166
1952 171 180 193 181 183 218 230 242 209 191 172 194
1953 196 196 236 235 229 243 264 272 237 211 180 201
1954 204 188 235 227 234 264 302 293 259 229 203 229
1955 242 233 267 269 270 315 364 347 312 274 237 278
1956 284 277 317 313 318 374 413 405 355 306 271 306
1957 315 301 356 348 355 422 465 467 404 347 305 336
1958 340 318 362 348 363 435 491 505 404 359 310 337
1959 360 342 406 396 420 472 548 559 463 407 362 405
1960 417 391 419 461 472 535 622 606 508 461 390 432

The AirPassengers table is included in the installation of R
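Although the data prints like a table, a quick check suggests that R stores it as a monthly time series. The following sketch only uses base R functions to inspect the object; nothing here changes the data.

```r
data(AirPassengers)

class(AirPassengers)      # "ts", a time series object
length(AirPassengers)     # 144 monthly values (12 years x 12 months)
start(AirPassengers)      # 1949 1  (January 1949)
end(AirPassengers)        # 1960 12 (December 1960)
frequency(AirPassengers)  # 12 observations per year
```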

The average value of all months is obtained with

mean(AirPassengers)
[1] 280.2986

mean()

and the median with

median(AirPassengers)
[1] 265.5

median()

We get the smallest value with the function min().

min(AirPassengers)
[1] 104

min()

We get the highest value from the AirPassengers table with the function max().

max(AirPassengers)
[1] 622

max()

Quantiles as measures of location

quantile() and summary()

We can also display quantiles with the function quantile(). Quantiles are important measures of location in statistics, especially in population statistics and ergonomics. The 25% quantile is the value below which 25% of all values lie.

quantile(AirPassengers)
   0%   25%   50%   75%  100% 
104.0 180.0 265.5 360.5 622.0

The function quantile() shows the values below which a given share of the data falls.
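To check my understanding of the 25% quantile, I tried the probs argument of quantile(), which lets you ask for specific quantiles. This is only a small sketch; the variable names x, q25 and share_below are my own.

```r
data(AirPassengers)

# Work with the plain numbers instead of the time series object
x <- as.numeric(AirPassengers)

# Ask only for the 25% quantile
q25 <- quantile(x, probs = 0.25)
q25

# Share of all values that are strictly smaller than the 25% quantile
share_below <- mean(x < q25)
share_below

# Any other quantiles work the same way, e.g. the 10% and 90% quantiles
quantile(x, probs = c(0.1, 0.9))
```

The share of values below q25 is at most a quarter, which matches the definition above.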

Conveniently, R has a function that summarizes the most important statistical figures: summary(). From left to right it shows the minimum, the 25% quantile, the median, the mean, the 75% quantile and the maximum.

summary(AirPassengers)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  104.0   180.0   265.5   280.3   360.5   622.0

Use summary() to briefly evaluate a table.
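The result of summary() can also be stored and its individual values picked out by name. A small sketch, assuming the names are exactly the ones printed in the header row above; the variable s is my own.

```r
data(AirPassengers)

s <- summary(AirPassengers)

names(s)       # "Min." "1st Qu." "Median" "Mean" "3rd Qu." "Max."
s[["Median"]]  # 265.5

# Range between the smallest and the largest value
s[["Max."]] - s[["Min."]]  # 518
```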

Simple table functions

Create your own tables

How can we now create tables ourselves? This is amazingly easy in R and is really fast.

Let's assume we want to write this data into a table:

data1 <- c(2,30,38,13)
data2 <- c(7,20,36,3)
data1
[1]  2 30 38 13
data2
[1]  7 20 36  3

A few sample data.

To create a matrix from this data, we can use the cbind() function, short for column bind. Of course we can change the order of the objects in the brackets. R uses our object names as column headers.

table <- cbind(data1,data2)
table
     data1 data2
[1,]     2     7
[2,]    30    20
[3,]    38    36
[4,]    13     3

The function cbind() binds the data into a table column by column.
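Here is a short sketch of what happens when the order in the brackets changes, and how column headers can be set directly in the call. The names swapped, named, first and second are only my own examples.

```r
data1 <- c(2, 30, 38, 13)
data2 <- c(7, 20, 36, 3)

# Changing the order of the arguments changes the order of the columns
swapped <- cbind(data2, data1)
colnames(swapped)  # "data2" "data1"

# Column headers can also be given directly as argument names
named <- cbind(first = data1, second = data2)
colnames(named)    # "first" "second"
```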

And it would look like this with the function rbind() - row bind. The object names are used as row labels.

table <- rbind(data1,data2)
table
      [,1] [,2] [,3] [,4]
data1    2   30   38   13
data2    7   20   36    3

With rbind() the table is put together row by row.
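The difference between the two functions is easy to check with dim(), which returns the number of rows and columns. A minimal sketch with the same sample data; by_col and by_row are names I made up.

```r
data1 <- c(2, 30, 38, 13)
data2 <- c(7, 20, 36, 3)

by_col <- cbind(data1, data2)  # each vector becomes a column
by_row <- rbind(data1, data2)  # each vector becomes a row

dim(by_col)  # 4 2 (4 rows, 2 columns)
dim(by_row)  # 2 4 (2 rows, 4 columns)
```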

And we can also append rows, in this case the values from data3:

data3 <- c(1,2,3,5)

table <- rbind(table,data3)
table
      [,1] [,2] [,3] [,4]
data1    2   30   38   13
data2    7   20   36    3
data3    1    2    3    5

Append rows with rbind().

Of course, the names for the rows and columns don't look that sparkling yet. This can be changed relatively quickly. We simply rename the columns as well as the row headers by using rownames() for the rows and colnames() for the columns:

rownames(table) <- c("Januar","Februar","März")
table
        [,1] [,2] [,3] [,4]
Januar     2   30   38   13
Februar    7   20   36    3
März       1    2    3    5

colnames(table) <- c("Woche 1","Woche 2","Woche 3","Woche 4")
table
        Woche 1 Woche 2 Woche 3 Woche 4
Januar        2      30      38      13
Februar       7      20      36       3
März          1       2       3       5

rownames(), colnames()
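Once rows and columns have names, single values can be picked out by name instead of by position. This is a small sketch that rebuilds the same table under the name m (my own choice, to keep the example self-contained).

```r
# Rebuild the example table
m <- rbind(c(2, 30, 38, 13), c(7, 20, 36, 3), c(1, 2, 3, 5))
rownames(m) <- c("Januar", "Februar", "März")
colnames(m) <- c("Woche 1", "Woche 2", "Woche 3", "Woche 4")

# Row name before the comma, column name after the comma
m["Februar", "Woche 3"]  # 36

# Leaving one side empty returns the whole row (or column)
m["Januar", ]
```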

If we do not like this arrangement, we simply flip the table with t(). The t stands for transpose, which means that rows and columns are swapped.

t(table)
        Januar Februar März
Woche 1      2       7    1
Woche 2     30      20    2
Woche 3     38      36    3
Woche 4     13       3    5

Transpose tables (swap rows and columns) with t().
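One thing I noticed: t() only returns a transposed copy and leaves the original table as it is. To keep the result, it has to be assigned to a name. A minimal sketch; m and flipped are my own names.

```r
m <- rbind(data1 = c(2, 30, 38, 13), data2 = c(7, 20, 36, 3))

dim(m)   # 2 4
t(m)     # prints the transposed table, but does not store it
dim(m)   # still 2 4

# Assign the result to keep the transposed version
flipped <- t(m)
dim(flipped)  # 4 2

# Transposing twice gives back the original values
all(t(flipped) == m)  # TRUE
```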

And of course we can also carry out evaluations for the table again:

summary(table)
    Woche 1         Woche 2         Woche 3         Woche 4  
 Min.   :1.000   Min.   : 2.00   Min.   : 3.00   Min.   : 3  
 1st Qu.:1.500   1st Qu.:11.00   1st Qu.:19.50   1st Qu.: 4  
 Median :2.000   Median :20.00   Median :36.00   Median : 5  
 Mean   :3.333   Mean   :17.33   Mean   :25.67   Mean   : 7  
 3rd Qu.:4.500   3rd Qu.:25.00   3rd Qu.:37.00   3rd Qu.: 9  
 Max.   :7.000   Max.   :30.00   Max.   :38.00   Max.   :13

The summary() function applied to the current table, evaluated column by column.

The same works for individual rows or columns: in the square brackets, put the row number before the comma or the column number after the comma.

summary(table[1,])
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   2.00   10.25   21.50   20.75   32.00   38.00 
summary(table[,1])
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.000   1.500   2.000   3.333   4.500   7.000
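The square-bracket notation on its own, without summary(), can be sketched like this. The table is rebuilt here under the name m (my own choice) so that the example runs by itself.

```r
m <- rbind(c(2, 30, 38, 13), c(7, 20, 36, 3), c(1, 2, 3, 5))

m[1, ]   # first row:    2 30 38 13
m[, 1]   # first column: 2 7 1
m[2, 3]  # single value in row 2, column 3: 36

# Several rows and columns at once: rows 1 to 2, columns 1 and 4
m[1:2, c(1, 4)]
```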

Or simply calculate an average value for the third column:

mean(table[,3])
[1] 25.66667
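Going one small step beyond the lesson: base R also has colMeans() and rowMeans(), which calculate the averages of all columns or all rows in one go. This is just a sketch with the same numbers, stored under my own name m.

```r
m <- rbind(c(2, 30, 38, 13), c(7, 20, 36, 3), c(1, 2, 3, 5))

# Average of every column; the third entry equals mean(m[, 3])
colMeans(m)

# Average of every row
rowMeans(m)  # 20.75 16.50 2.75
```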

That was just a quick flyover of some very simple table and statistics functions in the programming language R.

Suggestions and criticism are welcome.

tl;dr

Simple statistical functions in R and creating and working with tables in this programming language.

Cronhill