Visualization of Model Predictions
The as_visualizer() function automatically creates appropriate visualizers for model predictions on tasks with one or two features. By default, it uses ggplot2 for 1D and 2D visualizations. For 2D tasks, you can optionally specify type = "surface" to get interactive plotly surface plots.
Let’s start with the california_housing data set. The goal is to predict the median house value for California districts. We subset the data set to use only the features median_income and housing_median_age, and sample 2000 observations for faster rendering. The median_income feature is the median income in a block group, and housing_median_age is the median age of a house within a block.
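library(mlr3)          # tsk(), lrn()
library(mlr3learners)  # provides the regr.svm and classif.svm learners
library(vistool)       # as_visualizer(); package name assumed from the API used here
task = tsk("california_housing")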
task$select(c("median_income", "housing_median_age"))
task$filter(rows = sample(task$nrow, 2000))
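Because sample() draws a different subset on every run, the rendered plots will vary slightly between runs. The following is a minimal base R sketch of fixing the random seed before sampling. The seed value and n_rows are arbitrary, hypothetical stand-ins (n_rows plays the role of task$nrow), not values taken from the original setup.

```r
# Hypothetical stand-in for task$nrow; the seed value is arbitrary.
n_rows = 20640

set.seed(123)
rows_a = sample(n_rows, 2000)

set.seed(123)
rows_b = sample(n_rows, 2000)

# The same seed yields the same subset of row indices.
same_subset = identical(rows_a, rows_b)

# sample() draws without replacement by default, so no row appears twice.
no_duplicates = anyDuplicated(rows_a) == 0
```

Calling set.seed() once before task$filter() would make the sampled subset, and therefore the plots, repeatable.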
We load the support vector machine learner for regression.
learner = lrn("regr.svm")
Now we create a visualizer object, using the plotly backend (type = "surface").
vis = as_visualizer(task, learner = learner, type = "surface")
First, the learner is trained on the entire task. Then a grid is created over the two features, and the model's predictions are computed at each grid point. Finally, the predictions are visualized using an interactive surface plot.
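The train, grid, predict sequence can be sketched in base R. This is an illustration under stated assumptions, not the visualizer's actual implementation: the data are synthetic, lm() stands in for the support vector machine, and n_points is a hypothetical grid resolution.

```r
# Synthetic stand-in data with two features (not the housing data).
set.seed(1)
train = data.frame(x1 = runif(200, 0, 10), x2 = runif(200, 0, 50))
train$y = 2 * train$x1 + 0.1 * train$x2 + rnorm(200, sd = 0.5)

# 1. Train the model on all rows (lm() is a stand-in for the SVM).
fit = lm(y ~ x1 + x2, data = train)

# 2. Build a regular grid spanning the observed range of both features.
n_points = 50  # hypothetical grid resolution
x1_seq = seq(min(train$x1), max(train$x1), length.out = n_points)
x2_seq = seq(min(train$x2), max(train$x2), length.out = n_points)
grid = expand.grid(x1 = x1_seq, x2 = x2_seq)

# 3. Predict at every grid point and reshape into a matrix.
#    expand.grid() varies x1 fastest, so rows follow x1_seq
#    and columns follow x2_seq.
z = matrix(predict(fit, newdata = grid), nrow = n_points, ncol = n_points)
```

A matrix in this layout can be passed to base graphics functions such as persp(x1_seq, x2_seq, z) for a surface or contour(x1_seq, x2_seq, z) for a flat contour plot.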
vis$plot()
The surface can also be drawn with contour lines along the z dimension.
vis = as_visualizer(task, learner = learner, type = "surface")
vis$add_contours()$plot()
We can add the training points to the plot using method chaining.
vis$add_training_data()$plot()
We can also flatten the surface to arrive at a 2D contour plot by using the flatten = TRUE parameter.
vis$plot(flatten = TRUE)
To switch back to the surface plot, simply use flatten = FALSE (or omit the parameter since it’s the default).
It is also possible to visualize classification tasks. We use the pima data set and impute the missing values with the feature means. We select the features insulin and mass and train a support vector machine for classification.
library(mlr3pipelines)  # provides po()
task = tsk("pima")
task = po("imputemean")$train(list(task))[[1]]
task$select(c("insulin", "mass"))
learner = lrn("classif.svm", predict_type = "prob")
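To make the imputation step concrete, here is a minimal base R sketch of mean imputation on a single column. It assumes that po("imputemean") replaces each missing numeric value with the mean of the observed values of that feature. The toy vector is hypothetical and not taken from the pima data.

```r
# Hypothetical toy column with missing values.
insulin = c(94, NA, 168, NA, 88)

# Mean of the observed (non-missing) values only.
observed_mean = mean(insulin, na.rm = TRUE)

# Replace each missing entry with that mean; observed values stay unchanged.
insulin_imputed = ifelse(is.na(insulin), observed_mean, insulin)
```

Mean imputation keeps every row in the task, but it shrinks the variance of the imputed feature, which is worth keeping in mind when reading the resulting plots.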
We create a visualizer object, using the default ggplot2 backend, and plot the predictions.
vis = as_visualizer(task, learner = learner)
vis$plot()
We can add (potential) decision boundaries to the plot using method chaining.
vis$add_boundary(values = c(0.3, 0.5, 0.7))$plot()
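One way to think about these boundaries is as contour lines of the predicted probability surface. The base R sketch below assumes that the values are thresholds on the predicted probability of one class. It uses synthetic data and glm() logistic regression as a stand-in for the probabilistic SVM, so it illustrates the idea rather than how add_boundary() is implemented.

```r
# Synthetic two-feature binary classification data (not the pima data).
set.seed(1)
n = 300
train = data.frame(x1 = rnorm(n), x2 = rnorm(n))
train$y = rbinom(n, 1, plogis(1.5 * train$x1 - train$x2))

# Logistic regression as a stand-in for the probabilistic SVM.
fit = glm(y ~ x1 + x2, data = train, family = binomial())

# Predicted probabilities on a regular grid, reshaped so that
# rows follow x1_seq and columns follow x2_seq.
x1_seq = seq(-3, 3, length.out = 60)
x2_seq = seq(-3, 3, length.out = 60)
grid = expand.grid(x1 = x1_seq, x2 = x2_seq)
prob = matrix(predict(fit, newdata = grid, type = "response"), nrow = 60)

# Each boundary is the set of points where the predicted probability
# equals one of the requested values.
boundaries = contourLines(x1_seq, x2_seq, prob, levels = c(0.3, 0.5, 0.7))
boundary_levels = sort(unique(sapply(boundaries, function(b) b$level)))
```

Each element of boundaries holds the level and the x and y coordinates of one line, so the 0.5 line corresponds to the usual decision boundary and the other two show how it would shift under stricter or looser thresholds.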
For classification tasks, add_training_data() supports setting different colors and shapes for the different classes. For surface plots, the same class-specific styling is supported.
vis_surface = as_visualizer(task, learner = learner, type = "surface")
vis_surface$add_training_data(
  color = c(pos = "red", neg = "blue"),
  shape = c(pos = 0, neg = 1)
)$plot()