Before building any models, we always look at the descriptive statistics of the data first. Usually there are a lot of variables, and if we print or plot them all in one file, it’s hard for people to search and read. A simple R Shiny app handles this easily. I’m currently working on the Kaggle competition “Allstate Claims Severity” (https://www.kaggle.com/c/allstate-claims-severity). The dataset has more than 100 variables. Here I will show you how I built an R Shiny app that lets you select the variable and the plot you want. You can adapt it to any data set and any plot.

### Preprocessing data

After downloading the data, we have two files: the training set and the test set. First we combine the two sets. Since the test set doesn’t have the target variable “loss”, we add a column called “loss” filled with NA. We also add a variable to both sets that indicates whether a row comes from the training set or the test set. Then we row-bind the two sets into one data frame. Since the CSV files are large, we save the result as a smaller RDS file so the Shiny app can read it faster.
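Before running these steps on the full files, they can be tried on toy data. The following is only a minimal sketch: the column names and values are made up, and the RDS file goes to a temporary path rather than a real location.

```r
# Toy stand-ins for the real files (hypothetical columns and values).
training <- data.frame(id = 1:3, cat1 = c("A", "B", "A"),
                       cont1 = c(0.2, 0.5, 0.9), loss = c(100, 250, 80))
test <- data.frame(id = 4:5, cat1 = c("B", "B"), cont1 = c(0.4, 0.7))

# Give the test set the missing target column, then label both sets.
test$loss <- NA
test$type <- "test"
training$type <- "training"

# rbind() matches data frame columns by name, so both sets need the same columns.
alldata <- rbind(training, test)

# Round-trip through a temporary RDS file to check that nothing is lost.
rds_path <- tempfile(fileext = ".rds")
saveRDS(alldata, file = rds_path)
restored <- readRDS(rds_path)
```

Without the added “loss” column, `rbind()` would stop with an error because the two data frames would have different columns.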

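# Read both files (the training file is assumed to be named "train.csv")
training <- read.csv("train.csv")
test <- read.csv("test.csv")

# There is no loss variable in the test file.
test$loss <- NA
test$type <- "test"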
training$type <- "training"
alldata <- rbind(training, test)

# saveRDS() needs a file path; "alldata.rds" is a name chosen here
saveRDS(alldata, file = "alldata.rds")

### shinyUI

Now we begin to build our Shiny app. A Shiny app has two parts: the “shinyUI” part, which defines the inputs and the layout, and the “shinyServer” part, which produces the output. We use the “selectInput” function to choose the data set, the variable, and the type of plot. All of these selections are available in the list-like variable called “input”.

library(shiny)
library(ggplot2)

# Load the combined data saved in the preprocessing step
alldata <- readRDS("alldata.rds")

ui <- shinyUI(fluidPage(

  # Application title
  titlePanel("Allstate data Descriptive statistics"),

  sidebarLayout(
    sidebarPanel(

      # select dataset
      selectInput("Dataset", "Training/Test",
                  choices = unique(alldata$type)),

      # select variable
      selectInput("Variable", "Column",
                  choices = colnames(alldata)[-1]),

      # select plot type
      selectInput("Plot", "Plot Type",
                  choices = c("Histogram", "QQplot"))
    ),

    mainPanel(

      # The plot is called Descriptive and will be created in the shinyServer part
      plotOutput("Descriptive")
    )
  )
))
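To see what the first two menus will offer, the `choices` expressions can be evaluated on their own. This is a small sketch on a made-up data frame, not the real Allstate data; `dataset_choices` and `variable_choices` are names introduced here for illustration.

```r
# Toy stand-in for the combined data (hypothetical values).
alldata <- data.frame(
  id    = 1:4,
  cat1  = c("A", "B", "A", "B"),
  cont1 = c(0.2, 0.5, 0.9, 0.4),
  loss  = c(100, 250, NA, NA),
  type  = c("training", "training", "test", "test"),
  stringsAsFactors = FALSE
)

dataset_choices  <- unique(alldata$type)    # options for the "Dataset" menu
variable_choices <- colnames(alldata)[-1]   # every column except the first (id)

print(dataset_choices)
print(variable_choices)
```

Note that the variable menu built this way also lists “loss” and “type”. For the test set, “loss” is entirely NA, so its plot has nothing to show.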

### shinyServer

The “shinyServer” part controls the output of the Shiny app. First we subset the data to the selected data set and variable. There are two types of variable in this data set, continuous and categorical, and each needs different plotting code. Continuous variables have names starting with “cont”, so the code checks the name prefix to pick the right plot.

server <- shinyServer(function(input, output) {

  output$Descriptive <- renderPlot({

    # subset of data
    plotdata <- alldata[alldata$type == input$Dataset, input$Variable]

    # choose the type of plot
    if (input$Plot == "Histogram") {

      # whether the variable is continuous or not
      if (substr(input$Variable, 1, 4) == "cont") {

        # histogram for continuous variable
        ggplot(data.frame(plotdata), aes(x = plotdata)) +
          geom_histogram()

      } else {

        # barplot for categorical variable
        ggplot(data.frame(plotdata), aes(x = plotdata)) +
          geom_bar()

      }

    # if select the QQplot
    } else if (input$Plot == "QQplot") {

      # whether the variable is continuous or not
      if (substr(input$Variable, 1, 4) == "cont") {

        # QQplot for continuous variables
        ggplot(data.frame(plotdata), aes(sample = plotdata)) +
          stat_qq()

      }
    }
  })
})

# Run the application
shinyApp(ui = ui, server = server)
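
The server code decides between continuous and categorical by looking at the variable name, which works for the “cont” columns but treats everything else, including “loss”, as categorical. For other data sets, one possible alternative is to decide from the column's values instead. The sketch below uses a hypothetical helper, `plot_kind()`, that is not part of the app above; it only returns a label saying which plot to draw.

```r
# Hypothetical helper: decide which plot to draw from the column's values
# rather than from its name.
plot_kind <- function(values, plot_type) {
  continuous <- is.numeric(values)
  if (plot_type == "Histogram") {
    if (continuous) "histogram" else "bar"
  } else if (plot_type == "QQplot" && continuous) {
    "qq"
  } else {
    "none"  # e.g. a QQ plot was requested for a categorical column
  }
}

# Example calls on made-up values
kind_numeric_hist <- plot_kind(c(0.2, 0.5, 0.9), "Histogram")
kind_text_hist    <- plot_kind(c("A", "B", "A"), "Histogram")
kind_text_qq      <- plot_kind(c("A", "B", "A"), "QQplot")
```

Inside `renderPlot()`, the returned label could then select the matching ggplot2 call. One caveat: categorical variables stored as integer codes would count as numeric under this rule, so they would need to be converted to factors first.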