Using R Markdown and ggplot for data visualisation

R Markdown is a file format for creating dynamic documents with R by writing in Markdown. An R Markdown document contains chunks of embedded R code alongside content blocks. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. It can be used to document the results of data processing and analysis, including visualisations, and to share them with others.

ggplot (the ggplot2 package) is a data visualisation package for the statistical programming language R. It implements a Grammar of Graphics: a scheme for data visualisation that breaks graphs up into semantic components such as scales and layers.

In this blog post I will explore the use of R Markdown with ggplot to produce visualisations and communicate data insights.

To get started with R Markdown (Rmd), create an R Markdown or R Notebook file. This is typically done in an R integrated development environment such as RStudio, a tool that most data scientists are familiar with. Once you have created the Rmd file, you are ready to start writing code to perform the usual data preparation and exploration activities.

All the code used to prepare this blog post is published as an R Markdown document on RPubs at this link. The process I followed is summarised in the table of contents of that R Markdown file.

The data set used for this post is the same NSW Roads Offences and Penalties data that I used in my previous posts exploring Power BI and Tableau for building data visualisations and stories. I am using the tidyverse package to load and manipulate the data set. Some basic data preparation was required to correct the data type of the OFFENCE_MONTH field and to select the subset of fields required for this analysis.

```r
library(tidyverse)
library(lubridate)  # provides dmy()

# load the data set while setting the timezone locale to "Australia/Sydney"
# (replace "…" with the path to the downloaded CSV file)
dfraw <- read_csv("…", locale = locale(tz = "Australia/Sydney"))
dfraw %>% head()
dfraw %>% spec()

# changing the data type for OFFENCE_MONTH to date
dfraw$OFFENCE_MONTH <- dfraw$OFFENCE_MONTH %>% dmy()

# selecting a subset of columns
df <- dfraw %>%
  select(OFFENCE_FINYEAR, OFFENCE_MONTH, OFFENCE_CODE, OFFENCE_DESC,
         FACE_VALUE, TOTAL_NUMBER, TOTAL_VALUE)
```

Producing summaries of the data set is another data manipulation task that is commonly performed at this stage of the analysis. For this post, a summary data set was created to inspect the date range of the offences, the total number and value of offences, the minimum and maximum offence face value, and the total number of records in the data set.

This basic summary already produces some insights: the minimum offence face value is $20, whilst the maximum offence face value is a whopping $18,455.

At this stage, the user may want to find out what other offences attract large fines. To produce such a result in R, the user must query the data, produce a data frame with the required results and then produce a visualisation that meets their requirements, such as the stacked bar chart below.

The user often needs to keep transforming the data set to make it suitable for producing the different visualisations required. For example, they may first perform aggregate summaries on numerical fields (such as the total value of offences) by financial year or offence description, and then call the appropriate plotting functions to display the results. These tasks are achieved using the code below.

```r
df %>%
  group_by(OFFENCE_DESC) %>%
  summarise(Value = first(FACE_VALUE)) %>%
  top_n(5, Value) %>%
  # specify x and y axes
  ggplot(aes(x = reorder(OFFENCE_DESC, Value), y = Value)) +
  # choose a bar chart
  geom_bar(stat = "identity") +
  # geom_label(aes(label = Value)) +
  # apply a theme
  theme_light() +
  # set x axis label
  labs(x = "Offence Description") +
  # flip coordinates (x axis becomes y axis)
  coord_flip()
```

The resulting visualisation is a bar chart. To produce its aesthetics, I applied a theme to control the borders, grid lines and general style elements, specified the text label for the axis, and flipped the plot coordinates.

Reproduce and Reuse Plotting Code

One of the key benefits of using a coded approach to generating visualisations is the ability to reproduce and reuse the code with different data sets. In this example, the R function de-clutters a plot by setting the plot's theme and then applying an additional aesthetic to remove the panel grid.

```r
# numbers metric formatter
format_si <- function(…

…)

# display the list of plots
print(pl)
```

The output of this code is a series of visualisations, similar to the one shown below, one for each year from 2013 to 2019.

So far, we've seen how R and ggplot can be used for data exploration activities. R also provides a great platform for machine learning and advanced data analytics.
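The summary data set described in this post is not accompanied by its code. As a minimal sketch, assuming `df` holds the columns selected during data preparation (OFFENCE_MONTH as a date; FACE_VALUE, TOTAL_NUMBER and TOTAL_VALUE as numbers), a one-row summary could be built with dplyr's `summarise()`. The function name `summarise_offences` and the output column names are hypothetical.

```r
library(dplyr)

# Hypothetical helper: a one-row summary of the prepared offences data.
# Assumes OFFENCE_MONTH is a Date and FACE_VALUE, TOTAL_NUMBER and
# TOTAL_VALUE are numeric, as selected during data preparation.
summarise_offences <- function(df) {
  df %>%
    summarise(
      first_month  = min(OFFENCE_MONTH, na.rm = TRUE),  # start of the date range
      last_month   = max(OFFENCE_MONTH, na.rm = TRUE),  # end of the date range
      total_number = sum(TOTAL_NUMBER, na.rm = TRUE),   # total number of offences
      total_value  = sum(TOTAL_VALUE, na.rm = TRUE),    # total value of offences
      min_face     = min(FACE_VALUE, na.rm = TRUE),     # smallest face value
      max_face     = max(FACE_VALUE, na.rm = TRUE),     # largest face value
      records      = n()                                # number of rows
    )
}

# Usage, once df has been prepared:
# summarise_offences(df)
```

On the full data set, `min_face` and `max_face` are the two figures quoted above ($20 and $18,455).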
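The "aggregate first, then plot" pattern described for financial-year summaries might look like the sketch below. It assumes the OFFENCE_FINYEAR and TOTAL_VALUE columns of the prepared `df`; the helper names `value_by_finyear` and `plot_value_by_finyear`, the use of `geom_col()` and the axis labels are hypothetical choices, not the code from the original R Markdown file.

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper: total offence value per financial year.
value_by_finyear <- function(df) {
  df %>%
    group_by(OFFENCE_FINYEAR) %>%
    summarise(Value = sum(TOTAL_VALUE, na.rm = TRUE), .groups = "drop")
}

# Hypothetical helper: aggregate first, then hand the result to ggplot.
plot_value_by_finyear <- function(df) {
  ggplot(value_by_finyear(df), aes(x = OFFENCE_FINYEAR, y = Value)) +
    geom_col() +          # equivalent to geom_bar(stat = "identity")
    theme_light() +
    labs(x = "Financial Year", y = "Total Value")
}
```

Keeping the aggregation in its own function means the summary table can be inspected or reused independently of the chart.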
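The de-clutter function is described but not reproduced in this post. As a hedged sketch of the same idea (set a theme, then remove the panel grid), a small wrapper can take any ggplot object and return the tidied version. The name `declutter` and the choice of `theme_light()` are assumptions.

```r
library(ggplot2)

# Hypothetical helper: apply a light theme, then remove the panel grid.
declutter <- function(p) {
  p +
    theme_light() +
    theme(
      panel.grid.major = element_blank(),  # drop major grid lines
      panel.grid.minor = element_blank()   # drop minor grid lines
    )
}

# Usage with a built-in data set:
p <- declutter(ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point())
```

Because the styling lives in one function, the same look can be applied to every chart in the R Markdown document.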
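The series of plots, one for each year from 2013 to 2019, comes from reusing the same plotting code for each subset of the data. The sketch below assumes `df` has OFFENCE_MONTH as a date plus OFFENCE_DESC and TOTAL_VALUE. It splits on the calendar year taken from OFFENCE_MONTH, which may differ from how the original split the years, and the helper name `plots_by_year` is hypothetical. Printing the resulting list with `print(pl)` draws each plot in turn.

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper: one top-5 offences plot per calendar year,
# reusing the same plotting code for each subset of the data.
plots_by_year <- function(df, years = 2013:2019) {
  lapply(years, function(yr) {
    df %>%
      filter(as.integer(format(OFFENCE_MONTH, "%Y")) == yr) %>%
      group_by(OFFENCE_DESC) %>%
      summarise(Value = sum(TOTAL_VALUE, na.rm = TRUE), .groups = "drop") %>%
      slice_max(Value, n = 5) %>%
      ggplot(aes(x = reorder(OFFENCE_DESC, Value), y = Value)) +
      geom_col() +
      theme_light() +
      labs(title = as.character(yr), x = "Offence Description") +
      coord_flip()
  })
}

# Usage, once df has been prepared:
# pl <- plots_by_year(df)
# print(pl)
```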
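The body of `format_si`, the numbers metric formatter, is not shown above. As a hypothetical sketch of what such a formatter can do, the function below returns a labelling function that shortens large numbers with SI prefixes (k, M, G), which suits dollar values such as $18,455. The `digits` parameter and the set of prefixes are assumptions.

```r
# Hypothetical sketch of a numbers metric formatter: returns a function
# that rewrites numbers with SI prefixes, for use as axis labels.
format_si <- function(digits = 1) {
  limits <- c(1, 1e3, 1e6, 1e9)     # lower bound of each band
  prefix <- c("", "k", "M", "G")    # prefix used for each band
  function(x) {
    band <- pmax(findInterval(abs(x), limits), 1)  # values below 1 get no prefix
    out <- paste0(round(x / limits[band], digits), prefix[band])
    out[is.na(x)] <- NA                             # keep missing values missing
    out
  }
}

fmt <- format_si()
fmt(c(20, 18455, 2.5e6))  # "20" "18.5k" "2.5M"
```

Passing `labels = format_si()` to `scale_y_continuous()` would label an axis with values such as 18.5k instead of 18455.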