First steps with R / Part II
Statistics and tables
This part of my small open series about the R programming language is about simple statistical functions and working with and creating tables.
I'm a beginner in R, and since I don't have the time for a full day's training, I'm searching the net for instructions that explain it as simply to me as I'm trying to explain it to you. The exercises described here will hopefully help me create presentable tables that are easy to evaluate.
In the first part of my small open series I dealt with simple string functions.
As a total beginner in R, I am currently looking for tutorials that I can follow with the knowledge I have.
I came across the website The Programming Historian, which currently (as of January 2020) offers 80 lessons (link: https://programminghistorian.org/en/lessons/). They are aimed at programming historians, but they should be just as interesting for a programming designer like me. I have chosen to use Taylor Arnold's Basic Text Processing in R lesson to evaluate Thomas Wolfe's text "Look Homeward, Angel". The transcript and my notes on this exercise will be available in the next episode.
But before I do that, I will first work through Taryn Dewar's R Basics lesson and play around with the data and functions mentioned there for a better understanding.
Among other things, this working note deals with the following functions of the R programming language:
setwd(), dir(), paste(), write.table(), data(), mean(), median(), min(), max(), quantile(), summary(), cbind(), rbind(), rownames(), colnames(), t()
Simple statistical functions
mean() and median()
The tutorial covers some basics for handling statistical values, such as the mean, the median, the maximum and the minimum.
As you may have noticed, my blog, and especially this series of articles, is a kind of public notebook for myself.
So here is a short log of my experiments with R and the functions mean() and median(), using the passenger data set that ships with R and is also used in the above-mentioned lesson of The Programming Historian.
In the first step I open R and set up my working directory.
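This first step might look roughly like the following sketch. The folder is an assumption: tempdir() stands in for whatever directory you actually work in, and getwd() is only used to check the result.

```r
# Hypothetical working directory: tempdir() stands in for your own folder
# (the original post does not name a path).
work_dir <- tempdir()

old_dir <- setwd(work_dir)  # setwd() invisibly returns the previous directory
getwd()                     # check where we are now
dir()                       # list the files in the working directory

setwd(old_dir)              # switch back when done
```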
The function mean() returns the mean value of a data set. The function median() returns the so-called central value, that is, the value that lies exactly in the middle of the sorted data. This does not necessarily have to be the mean value.
The passenger data in R is loaded with data(AirPassengers). It is a simple table with the number of passengers (in thousands) who flew between January 1949 and December 1960. Please pay attention to upper and lower case, because R is case-sensitive.
The average value of all months is obtained with mean(AirPassengers), and the median with median(AirPassengers).
We get the smallest value with the function min().
We get the highest value from the AirPassengers data set with the function max().
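Put together, the four calls can be sketched like this. AirPassengers is the data set that ships with R; the values in the comments are what these calls return for it.

```r
data(AirPassengers)    # load the built-in monthly passenger series

mean(AirPassengers)    # arithmetic mean, about 280.3
median(AirPassengers)  # middle value: 265.5
min(AirPassengers)     # smallest monthly value: 104
max(AirPassengers)     # largest monthly value: 622
```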
Quantiles as measures of location
quantile() and summary()
We can also display quantiles with the function quantile(). Quantiles are important measures of location in statistics, especially in population statistics and ergonomics. The 25% quantile is the value below which 25% of all values lie.
Conveniently, R has a function that summarizes the most important statistical figures: summary(). From left to right it shows the minimum, the 25% quantile, the median, the mean, the 75% quantile and the maximum.
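As a small sketch with the same built-in data set, both functions can be called like this. The second call, with an explicit probability, is just one way to ask for a single quantile.

```r
data(AirPassengers)

quantile(AirPassengers)        # 0%, 25%, 50%, 75% and 100% by default
quantile(AirPassengers, 0.25)  # only the 25% quantile
summary(AirPassengers)         # Min., 1st Qu., Median, Mean, 3rd Qu., Max.
```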
Simple table functions
Create your own tables
How can we now create tables ourselves? This is amazingly easy in R and is really fast.
Let's assume we want to write this data into a table:
To create a matrix from this data, we can use the cbind() function, roughly translated as column bind. Of course we can change the order of the objects inside the brackets. R uses our object names as column headers.
With the function rbind() (row bind), the same data is combined row by row, and the object names are used as row labels.
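Here is a minimal sketch of cbind(). The vectors data1 and data2 and their values are made up for illustration; they are not the numbers from the original exercise.

```r
# Hypothetical example values (not the data from the original post)
data1 <- c(10, 20, 30, 40)
data2 <- c(15, 25, 35, 45)

table1 <- cbind(data1, data2)  # each vector becomes one column
table1
colnames(table1)               # the object names serve as column headers

cbind(data2, data1)            # same data, columns in the opposite order
```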
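The row-wise variant could be sketched as follows, again with made-up values for data1 and data2.

```r
# Hypothetical example values (not the data from the original post)
data1 <- c(10, 20, 30, 40)
data2 <- c(15, 25, 35, 45)

table2 <- rbind(data1, data2)  # each vector becomes one row
table2
rownames(table2)               # the object names serve as row labels
```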
And we can also append rows, in this case the values from data3:
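Appending a row is simply another rbind() call on the existing matrix. In this sketch the values of data1, data2 and data3 are assumptions chosen for illustration.

```r
# Hypothetical example values (not the data from the original post)
data1 <- c(10, 20, 30, 40)
data2 <- c(15, 25, 35, 45)
data3 <- c(12, 22, 32, 42)

table2 <- rbind(data1, data2)   # start with two rows
table2 <- rbind(table2, data3)  # append data3 as a third row
table2
```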
Of course, the names for the rows and columns don't look very polished yet. This can be changed quickly. We simply rename the row and column headers by using rownames() for the rows and colnames() for the columns:
If we do not like this arrangement, we simply flip the table with t(). The t stands for transpose, which means that rows and columns are swapped.
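A sketch of the renaming, assuming a 3 x 4 matrix built from made-up vectors. The labels "Group A" to "Group C" and "Q1" to "Q4" are invented for this example.

```r
# Hypothetical example values and labels (not from the original post)
data1 <- c(10, 20, 30, 40)
data2 <- c(15, 25, 35, 45)
data3 <- c(12, 22, 32, 42)
table2 <- rbind(data1, data2, data3)

rownames(table2) <- c("Group A", "Group B", "Group C")  # one label per row
colnames(table2) <- c("Q1", "Q2", "Q3", "Q4")           # one label per column
table2
```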
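The effect of t() can be sketched like this, again with assumed example values: a matrix with 3 rows and 4 columns becomes one with 4 rows and 3 columns.

```r
# Hypothetical example values (not the data from the original post)
data1 <- c(10, 20, 30, 40)
data2 <- c(15, 25, 35, 45)
data3 <- c(12, 22, 32, 42)
table2 <- rbind(data1, data2, data3)

dim(table2)            # 3 rows, 4 columns
flipped <- t(table2)   # rows become columns and vice versa
dim(flipped)           # 4 rows, 3 columns
flipped
```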
And of course we can also carry out evaluations for the table again:
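For example, the statistical functions from the first half work on the whole matrix. This is a sketch with assumed values, not the original exercise data.

```r
# Hypothetical example values (not the data from the original post)
data1 <- c(10, 20, 30, 40)
data2 <- c(15, 25, 35, 45)
data3 <- c(12, 22, 32, 42)
table2 <- rbind(data1, data2, data3)

summary(table2)  # for a matrix, summary() reports each column separately
mean(table2)     # mean over all twelve values
max(table2)      # largest value in the whole table
```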
The same works for individual rows or columns: inside the square brackets, the row number goes before the comma and the column number after it.
Or simply calculate an average value for the third column:
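The indexing described above can be sketched as follows. The matrix and its values are assumptions made for this example.

```r
# Hypothetical example values (not the data from the original post)
data1 <- c(10, 20, 30, 40)
data2 <- c(15, 25, 35, 45)
data3 <- c(12, 22, 32, 42)
table2 <- rbind(data1, data2, data3)

table2[1, ]           # first row: index before the comma
table2[, 3]           # third column: index after the comma
summary(table2[1, ])  # evaluation for the first row only
mean(table2[, 3])     # average of the third column: (30 + 35 + 32) / 3
```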
That was only a brief overview of some very simple table and statistics functions in the R programming language.
I welcome suggestions and criticism.
tl;dr
Simple statistical functions in R and creating and working with tables in this programming language.