Exercise 1 – ML Basics

Introduction to Machine Learning

Exercise 1: HRO in coding frameworks

Throughout the lecture, we will frequently use the R package mlr3, or alternatively the Python package sklearn, together with their extension packages, which provide an integrated ecosystem for all common machine learning tasks. Let's recap the HRO principle (hypothesis space, risk, optimization) and see how it is reflected in either mlr3 or sklearn. An overview of the most important objects and their usage, illustrated with numerous examples, can be found in the mlr3 book and the scikit-learn documentation.

  1. How are the key concepts you learned about in the lecture videos (i.e., hypothesis space, risk, and optimization) implemented in these frameworks?
Solution
# Load the mlr3 ecosystem (base package plus learners)
library(mlr3)
library(mlr3learners)

# You initialize your `learner` with its properties defined by the parameters
model <- lrn("regr.lm")
print(model)
# Before training it on actual data, the learner just contains information on the
# functional form of f. Once a learner has been trained, we can examine the
# parameters of the resulting model.
x <- seq(0, 8, by = 0.01)
set.seed(42)
y <- -1 + 3 * x + rnorm(mean = 0, sd = 4, n = length(x))
dt <- data.frame(x = x, y = y)
task <- TaskRegr$new(id = "mytask", backend = dt, target = "y")
# Optimization happens rather implicitly, as mlr3 only acts as a wrapper for
# existing implementations (here, stats::lm) and calls package-specific
# optimization procedures within `$train()`:
model$train(task)
# `$score()` evaluates the default regression measure, the mean squared error (MSE)
sprintf("Model MSE: %.2f", model$predict_newdata(dt)$score())
<LearnerRegrLM:regr.lm>
* Model: -
* Parameters: list()
* Packages: mlr3, mlr3learners, stats
* Predict Types:  [response], se
* Feature Types: logical, integer, numeric, character, factor
* Properties: loglik, weights
'Model MSE: 15.10'
# Required imports for the sklearn solution
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn import metrics

# You initialize your "learner" or model with its properties defined by the
# parameters, e.g.,:
model = LinearRegression(fit_intercept=True)
# Before training them on actual data, they just contain information on the
# functional form of f. Once a learner has been trained we can examine the
# parameters of the resulting model.
print(model)
x = np.arange(0, 8, 0.01)
np.random.seed(42)
y = -1 + 3 * x + np.random.normal(loc=0.0, scale=4, size=len(x))
# Optimization happens rather implicitly as sklearn only acts as a wrapper for
# existing implementations and calls package-specific optimization procedures
# within the function `model.fit()`:
model.fit(x.reshape(-1, 1), y)  # reshape for a single-feature design matrix
print(
    'Model MSE: ', metrics.mean_squared_error(y, model.predict(x.reshape(-1, 1)))
)
LinearRegression()
Model MSE:  15.461825608784347
  2. Have a look at mlr3::tsk("iris") / sklearn.datasets.load_iris. What attributes does this object store?
Solution
tsk("iris")
<TaskClassif:iris> (150 x 5): Iris Flowers
* Target: Species
* Properties: multiclass
* Features (4):
  - dbl (4): Petal.Length, Petal.Width, Sepal.Length, Sepal.Width
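The mlr3 task stores analogous information; a brief sketch of some accessors (fields as documented for mlr3 Task objects):
task_iris <- tsk("iris")
# Dimensions, feature and target names stored in the task
task_iris$nrow
task_iris$ncol
task_iris$feature_names
task_iris$target_names
# Peek at the first rows of the underlying data backend
task_iris$head()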
# Load the iris dataset bundled with sklearn
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data
y = iris.target
feature_names = iris.feature_names
target_names = iris.target_names
print("Type of object iris:", type(iris))
print("Feature names:", feature_names)
print("Target names:", target_names)
print("\nShape of X and y\n", X.shape, y.shape)
print("\nType of X and y\n", type(X), type(y))
Type of object iris: <class 'sklearn.utils._bunch.Bunch'>
Feature names: ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
Target names: ['setosa' 'versicolor' 'virginica']

Shape of X and y
 (150, 4) (150,)

Type of X and y
 <class 'numpy.ndarray'> <class 'numpy.ndarray'>
  3. Instantiate a regression tree learner. What are the different settings for this learner?
  • R Hint: use lrn("regr.rpart") (mlr3::mlr_learners$keys() shows all available learners).
  • Python Hint: use the DecisionTreeRegressor class from sklearn.tree and call get_params() to see all available settings.
Solution
# List available learners in base mlr3 package
mlr_learners$keys()

# Inspect regression tree learner
lrn("regr.rpart")

# List configurable hyperparameters
lrn("regr.rpart")$param_set
  1. 'classif.cv_glmnet'
  2. 'classif.debug'
  3. 'classif.featureless'
  4. 'classif.glmnet'
  5. 'classif.kknn'
  6. 'classif.lda'
  7. 'classif.log_reg'
  8. 'classif.multinom'
  9. 'classif.naive_bayes'
  10. 'classif.nnet'
  11. 'classif.qda'
  12. 'classif.ranger'
  13. 'classif.rpart'
  14. 'classif.svm'
  15. 'classif.xgboost'
  16. 'clust.agnes'
  17. 'clust.ap'
  18. 'clust.cmeans'
  19. 'clust.cobweb'
  20. 'clust.dbscan'
  21. 'clust.diana'
  22. 'clust.em'
  23. 'clust.fanny'
  24. 'clust.featureless'
  25. 'clust.ff'
  26. 'clust.hclust'
  27. 'clust.kkmeans'
  28. 'clust.kmeans'
  29. 'clust.MBatchKMeans'
  30. 'clust.mclust'
  31. 'clust.meanshift'
  32. 'clust.pam'
  33. 'clust.SimpleKMeans'
  34. 'clust.xmeans'
  35. 'regr.cv_glmnet'
  36. 'regr.debug'
  37. 'regr.featureless'
  38. 'regr.glmnet'
  39. 'regr.kknn'
  40. 'regr.km'
  41. 'regr.lm'
  42. 'regr.nnet'
  43. 'regr.ranger'
  44. 'regr.rpart'
  45. 'regr.svm'
  46. 'regr.xgboost'
<LearnerRegrRpart:regr.rpart>: Regression Tree
* Model: -
* Parameters: xval=0
* Packages: mlr3, rpart
* Predict Types:  [response]
* Feature Types: logical, integer, numeric, factor, ordered
* Properties: importance, missings, selected_features, weights
<ParamSet>
                id    class lower upper nlevels        default value
 1:             cp ParamDbl     0     1     Inf           0.01      
 2:     keep_model ParamLgl    NA    NA       2          FALSE      
 3:     maxcompete ParamInt     0   Inf     Inf              4      
 4:       maxdepth ParamInt     1    30      30             30      
 5:   maxsurrogate ParamInt     0   Inf     Inf              5      
 6:      minbucket ParamInt     1   Inf     Inf <NoDefault[3]>      
 7:       minsplit ParamInt     1   Inf     Inf             20      
 8: surrogatestyle ParamInt     0     1       2              0      
 9:   usesurrogate ParamInt     0     2       3              2      
10:           xval ParamInt     0   Inf     Inf             10     0
# Import the regression tree learner from sklearn
from sklearn.tree import DecisionTreeRegressor

# help(DecisionTreeRegressor) shows the full documentation
rtree = DecisionTreeRegressor()  # default settings
print(rtree.get_params())
{'ccp_alpha': 0.0, 'criterion': 'squared_error', 'max_depth': None, 'max_features': None, 'max_leaf_nodes': None, 'min_impurity_decrease': 0.0, 'min_samples_leaf': 1, 'min_samples_split': 2, 'min_weight_fraction_leaf': 0.0, 'random_state': None, 'splitter': 'best'}
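To see how these settings are configured in practice, a brief sketch for the mlr3 learner (the chosen values are arbitrary and only for illustration):
# Set hyperparameters at construction time ...
rtree_r <- lrn("regr.rpart", maxdepth = 5L, minsplit = 10L)
# ... or modify them afterwards via the parameter set
rtree_r$param_set$values$cp <- 0.05
rtree_r$param_set$values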

Exercise 2: Loss functions for regression tasks

In this exercise, we will examine loss functions for regression tasks in somewhat more depth.

# Simulate 20 points from a noisy linear relationship y = 0.2 + 3x + eps
set.seed(1L)
x <- runif(20L, min = 0L, max = 10L)
y <- 0.2 + 3 * x
y <- y + rnorm(length(x), sd = 0.8)

# Scatter plot with an additional outlier point at (10, 1), shown in orange
ggplot2::ggplot(data.frame(x = x, y = y), ggplot2::aes(x = x, y = y)) +
  ggplot2::geom_point() + 
  ggplot2::theme_bw() + 
  # ggplot2::geom_smooth(formula = y ~ x, method = "lm", se = FALSE) +
  ggplot2::annotate("point", x = 10L, y = 1L, color = "orange", size = 2)

  1. Consider the above linear regression task. How will the model parameters be affected by adding the new outlier point (orange) if you use
     1. \(L1\) loss,
     2. \(L2\) loss
     in the empirical risk? (You do not need to actually compute the parameter values; the sketch below merely shows how one could check numerically.)
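A minimal sketch for such a numerical check, comparing the \(L2\) fit (lm) with an \(L1\) fit (least absolute deviations, here via median regression with quantreg::rq, assuming that package is installed), each with and without the outlier:
dt_clean <- data.frame(x = x, y = y)
dt_out <- rbind(dt_clean, data.frame(x = 10, y = 1))  # append the orange outlier

# L2 loss: ordinary least squares
coef(lm(y ~ x, data = dt_clean))
coef(lm(y ~ x, data = dt_out))

# L1 loss: least absolute deviations (median regression)
coef(quantreg::rq(y ~ x, tau = 0.5, data = dt_clean))
coef(quantreg::rq(y ~ x, tau = 0.5, data = dt_out))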
# Huber loss of a single residual `res`, with threshold parameter `delta`
huber_loss <- function(res, delta = 0.5) {
  if (abs(res) <= delta) {
    0.5 * (res^2)
  } else {
    delta * abs(res) - 0.5 * (delta^2)
  }
}
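In formula terms, the piecewise function implemented by huber_loss() above reads \[ L_\delta(r) = \begin{cases} \frac{1}{2} r^2 & \text{if } |r| \leq \delta, \\ \delta \, |r| - \frac{1}{2} \delta^2 & \text{otherwise,} \end{cases} \] where \(r\) denotes the residual.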

# Evaluate the Huber loss on a grid of residuals, using delta = 5
x <- seq(-10L, 10L, length.out = 1000L)
y <- sapply(x, huber_loss, delta = 5L)

ggplot2::ggplot(data.frame(x = x, y = y), ggplot2::aes(x = x, y = y)) +
  ggplot2::geom_line() + 
  ggplot2::theme_bw()

  2. The second plot visualizes another loss function popular in regression tasks, the so-called Huber loss (depending on \(\epsilon > 0\); here: \(\epsilon = 5\), corresponding to delta in the code above). Describe how the Huber loss deals with residuals as compared to \(L1\) and \(L2\) loss. Can you guess its definition?

Exercise 3: Polynomial regression

Assume the following (noisy) data-generating process from which we have observed 50 realizations: \[y = -3 + 5 \cdot \sin(0.4 \pi x) + \epsilon\] with \(\epsilon \, \sim \mathcal{N}(0, 1)\).
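For intuition, a minimal sketch simulating such data (the range of \(x\) is not given in the exercise, so it is chosen arbitrarily here):
# Simulate 50 realizations of y = -3 + 5 * sin(0.4 * pi * x) + eps, eps ~ N(0, 1)
set.seed(123)
x <- runif(50L, min = 0L, max = 10L)
y <- -3 + 5 * sin(0.4 * pi * x) + rnorm(length(x), mean = 0, sd = 1)
plot(x, y)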

  1. We decide to model the data with a cubic polynomial (including intercept term). State the corresponding hypothesis space.

  2. State the empirical risk w.r.t. \(\boldsymbol{\theta}\) for a member of the hypothesis space. Use \(L2\) loss and be as explicit as possible.

  3. We can minimize this risk using gradient descent. Derive the gradient of the empirical risk w.r.t. \(\boldsymbol{\theta}\).

  4. Using the result for the gradient, state the calculation to update the current parameter \(\boldsymbol{\theta}^{[t]}\) (a numerical sketch of such an update loop follows after this list).

  5. You will not be able to fit the data perfectly with a cubic polynomial. Describe the advantages and disadvantages that a more flexible model class would have. Would you opt for a more flexible learner?
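As a hedged numerical illustration of items 3 and 4, using the simulated data from above (design matrix, starting values, learning rate, and iteration count are arbitrary choices, not part of the exercise):
# Gradient descent for a cubic polynomial under L2 loss
X <- cbind(1, x, x^2, x^3)                # design matrix including intercept column
theta <- rep(0, 4)                        # initial parameter vector theta^[0]
alpha <- 1e-8                             # small learning rate; convergence is slow since
                                          # the polynomial features are not standardized
for (iter in seq_len(10000L)) {
  res <- y - X %*% theta                  # residuals under the current parameters
  grad <- -2 * t(X) %*% res               # gradient of the empirical risk (sum of squared errors)
  theta <- theta - alpha * grad           # update: theta^[t+1] = theta^[t] - alpha * gradient
}
theta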

Exercise 4: Predicting abalone
