ML07: What is “Robust”?

Robust statistics / robust model / robustness

Yu-Cheng Kuo
Analytics Vidhya

--

Robust/robustness is a commonly used but rarely elaborated concept in statistics and machine learning. Let’s start with a few examples:
1. Robust: median, IQR, trimmed mean, Winsorized mean
2. Non-robust: mean, SD, range
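
To make the contrast concrete, here is a minimal sketch (assuming NumPy is available) showing how a single outlier shifts the mean and SD while barely moving the median and IQR; the numbers are invented for illustration:

```python
import numpy as np

clean = np.array([12.0, 13.5, 14.1, 14.8, 15.2, 15.9, 16.4])
with_outlier = np.append(clean, 95.0)   # one gross error / outlier

for name, x in [("clean", clean), ("with outlier", with_outlier)]:
    q1, q3 = np.percentile(x, [25, 75])
    print(f"{name:>12}: mean={x.mean():6.2f}  SD={x.std(ddof=1):6.2f}  "
          f"median={np.median(x):6.2f}  IQR={q3 - q1:6.2f}")
```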

Outline
(1) Definition of “Robust”
(2) Dealing with Errors and Outliers
1. Treating outliers as errors, then removing them
2. Using domain knowledge to judge whether outliers are possible
3. Using robust methods
(3) Another Instance
(4) Parametric, Non-parametric and Robust Approaches
1. Parametric approach
2. Robust approach
3. Non-parametric approach
(5) Another Robust Method: Resampling
1. Resampling
2. Jack-knifing
3. Bootstrap
(6) References

“All models are wrong, but some are useful” — G. E. P. Box

Figure 1: Fitting data with and without outliers [1]

(1) Definition of “Robust”

Let’s take a close look at the definitions of “robust / robustness” from a variety of sources:

1. Robust statistics are statistics with good performance for data drawn from a wide range of probability distributions, especially for distributions that are not normal. [2]

2. A robust concept will operate without failure and produce positive results under a variety of conditions. For statistics, a test is robust if it still provides insight into a problem despite having its assumptions altered or violated. In economics, robustness is attributed to financial markets that continue to perform despite alterations in market conditions. In general, a system is robust if it can handle variability and remain effective. [3]

3. Robust statistics, therefore, are any statistics that yield good performance when data is drawn from a wide range of probability distributions that are largely unaffected by outliers or small departures from model assumptions in a given dataset. In other words, a robust statistic is resistant to errors in the results. [4]

Then we turn to a classic statistics book, Problem Solving: A Statistician’s Guide, published in 1988:

4. A statistical procedure which is not much affected by minor departures is said to be robust, and it is fortunate that many procedures have this property. For example, the t-test is robust to departures from normality. [5]

(2) Dealing with Errors and Outliers

Now that we have gone through several definitions, which vary only slightly, let’s see why we need robust statistics / robust models.

When handling outliers, we have a few approaches at hand:

1. Treating outliers as errors, then removing them

This straightforward method is often adopted without full consideration. Here, we simply remove the outliers, leaving those data points as missing values.

2. Using domain knowledge to judge whether outliers are possible

It may be sensible to treat an outlier as a missing observation, but this may be improper if the distribution is heavy-tailed.

Extreme observations which may, or may not, be errors are more difficult to handle. There are tests for deciding which outliers are ‘significant’, but they are less important than advice from people ‘in the field’ as to which suspect values are obviously silly or impossible and should be viewed with caution. [5]

3. Using robust methods

An alternative approach is to use robust methods of estimation, which automatically downweight extreme observations. For example, one possibility for univariate data is to use Winsorization, by which extreme observations are adjusted toward the overall mean, perhaps to the value of the second or third most extreme observation (either large or small, as appropriate). However, many analysts prefer a diagnostic approach which isolates unusual observations for further study. [5]
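
As an illustration of the idea (not of Chatfield’s exact procedure), here is a hedged sketch using SciPy’s winsorize and trim_mean; the 12.5% limits are arbitrary choices that pull in exactly one point on each side of this invented 8-point sample:

```python
import numpy as np
from scipy.stats import trim_mean
from scipy.stats.mstats import winsorize

x = np.array([9.8, 10.1, 10.3, 10.4, 10.6, 10.9, 11.0, 42.0])  # one suspect value

# Winsorize: replace the most extreme value on each side with its nearest neighbour.
x_w = winsorize(x, limits=[0.125, 0.125])

print("raw mean:       ", x.mean())
print("winsorized mean:", x_w.mean())
print("trimmed mean:   ", trim_mean(x, 0.125))  # drop 12.5% from each tail instead
```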

My recommended procedure for dealing with outlying observations, when there is no evidence that they are errors, is to repeat the analysis with and without the suspect values. If the conclusions are similar, then the suspect values “don’t matter”. If the conclusions differ substantially, then one should be wary of making judgements which depend so crucially on just one or two observations (so-called influential observations). [5]
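
A minimal sketch of this “with and without” advice, assuming a simple straight-line fit with NumPy; the data and the suspect point are invented for illustration:

```python
import numpy as np

x = np.arange(10, dtype=float)
y = 2.0 * x + 1.0 + np.random.default_rng(0).normal(0, 0.5, 10)
y[7] = 40.0                      # a single suspect (possibly influential) observation

slope_all, _ = np.polyfit(x, y, 1)
mask = np.arange(10) != 7        # repeat the fit without the suspect point
slope_wo, _ = np.polyfit(x[mask], y[mask], 1)

print(f"slope with suspect point:    {slope_all:.2f}")
print(f"slope without suspect point: {slope_wo:.2f}")
# If the two answers differ substantially, the conclusion hinges on that one point.
```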

(3) Another Instance

The assumptions of LDA (linear discriminant analysis) are that the features are independent, continuous, and normally distributed. If these assumptions are violated, LDA performs badly; in this case, regression is more robust than LDA, and a neural network is more robust than regression. [6]
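
One could probe this claim with a quick experiment. The sketch below (assuming scikit-learn is available) compares LDA with logistic regression on deliberately skewed, log-normal features; the data and settings are invented for illustration, and which model wins depends on the data, not on this sketch:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 500
y = rng.integers(0, 2, n)
# Heavily skewed, non-normal features whose scale depends on the class.
X = rng.lognormal(mean=y, sigma=1.0, size=(2, n)).T

for name, model in [("LDA", LinearDiscriminantAnalysis()),
                    ("Logistic regression", LogisticRegression(max_iter=1000))]:
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: mean CV accuracy = {score:.3f}")
```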

After understanding this instance, we shall move on to a larger scope.

(4) Parametric, Non-parametric and Robust Approaches [5]

Now we know that a model is only an approximation to reality. The model can be spoiled by:

(a) Occasional gross errors. (Gross errors are caused by experimenter carelessness or equipment failure. These “outliers” are so far above or below the true value that they are usually discarded when assessing data. The “Q-Test” is a systematic way to determine if a data point should be discarded. [7])

(b) Departures from the secondary assumptions, i.e. distributional assumptions, e.g. the data are not normal or are not independent.

(c) Departures from the primary assumptions.

“Traditional” statisticians usually get around (a) with diagnostic checks, where unusual observations are isolated or ‘flagged’ for further study. This can be regarded as a step towards robustness.
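
The Q-Test mentioned in (a) is one such diagnostic check. Here is a minimal sketch; the critical value is an approximate figure from commonly tabulated 95% values for n = 10 (an assumption on my part) and should be verified against a proper table:

```python
import numpy as np

x = np.sort(np.array([10.1, 10.2, 10.3, 10.3, 10.4,
                      10.5, 10.6, 10.7, 10.8, 12.9]))

gap = x[-1] - x[-2]      # distance from the suspect point to its nearest neighbour
spread = x[-1] - x[0]    # full range of the data
q = gap / spread
q_crit = 0.466           # approx. tabulated 95% critical value for n = 10 (verify!)

print(f"Q = {q:.3f}; discard the suspect value" if q > q_crit
      else f"Q = {q:.3f}; keep the suspect value")
```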

Here are 3 approaches we can adopt to tackle the issues above:

1. Parametric approach

A classical parametric model-fitting approach comes to mind first. There are 4 main assumptions of the parametric approach [8]; the first two can be checked quickly, as sketched after the list:

{1} Normal distribution of data
{2} Homogeneity of variance
{3} Interval data
{4} Independence
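
A hedged sketch of those quick checks with SciPy: Shapiro-Wilk for normality and Levene’s test for homogeneity of variance. The data are simulated for illustration; interval measurement and independence come from the study design rather than from a test.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
group_a = rng.normal(10, 2, 40)
group_b = rng.normal(11, 2, 40)

print("Shapiro-Wilk (group A):  ", stats.shapiro(group_a))
print("Shapiro-Wilk (group B):  ", stats.shapiro(group_b))
print("Levene (equal variances):", stats.levene(group_a, group_b))
# Small p-values would suggest the normality / homogeneity assumptions are doubtful.
```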

2. Robust approach

Robust methods may involve fitting a parametric model but employ procedures which do not depend critically on the assumptions implicit in the model. In particular, outlying observations are usually automatically downweighted. Robust methods can therefore be seen as lying somewhere in between classical and non-parametric methods.

Some statisticians prefer a robust approach to most problems on the grounds that little is lost when no outliers are present, but much is gained if there are. Outliers may spoil the analysis completely, and thus some robust procedures may become routine.

3. Non-parametric approach

A non-parametric (or distribution-free) approach makes as few assumptions about the distribution of the data as possible. It is widely used for analyzing social science data, which are often not normally distributed but rather may be severely skewed.

Non-parametric methods get around problem (b) above and perhaps (a) to some extent. Their attractions are that (by definition) they are valid under minimal assumptions and generally have satisfactory efficiency and robustness properties. Some of the methods are computationally tedious, although this is not a problem when a computer is available. However, non-parametric results are not always as readily interpretable as those from a parametric analysis. Non-parametric analysis should thus be reserved for special types of data, notably ordinal data or data from a severely skewed or otherwise non-normal distribution.

Let’s probe further into “Nonparametric Tests vs. Parametric Tests” [9], featuring the advantages of each:

Advantages of Parametric Tests

1. Parametric tests can provide trustworthy results with distributions that are skewed and non-normal
2. Parametric tests can provide trustworthy results when the groups have different amounts of variability
3. Parametric tests have greater statistical power

Advantages of Nonparametric Tests

1. Nonparametric tests assess the median, which can be better for some study areas
2. Nonparametric tests are valid when the sample size is small and the data are potentially non-normal
3. Nonparametric tests can analyze ordinal data, ranked data, and outliers

Initial data analysis may help indicate which approach to adopt. However, if still unsure, it may be worth trying more than one method. If, for example, parametric and non-parametric tests both indicate that an effect is significant, then one can have confidence in the result. If, however, the conclusions differ, then more attention must be paid to the truth of secondary assumptions.
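
A minimal sketch of “trying more than one method”, assuming SciPy: run the parametric two-sample t-test and the non-parametric Mann-Whitney U test on the same (simulated, skewed) samples and see whether they agree.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
treatment = rng.lognormal(0.4, 0.8, 30)   # skewed, non-normal data
control = rng.lognormal(0.0, 0.8, 30)

t_res = stats.ttest_ind(treatment, control, equal_var=False)  # Welch's t-test
u_res = stats.mannwhitneyu(treatment, control)

print(f"t-test p-value:       {t_res.pvalue:.4f}")
print(f"Mann-Whitney p-value: {u_res.pvalue:.4f}")
# If both are small (or both large), the conclusion is comfortable;
# if they disagree, revisit the secondary (distributional) assumptions.
```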

(5) Another Robust Method: Resampling [5]

1. Resampling

There are a number of estimation techniques which rely on resampling the observed data to assess the properties of a given estimator. They are useful for providing non-parametric estimators of the bias and standard error of the estimator when its sampling distribution is difficult to find or when parametric assumptions are difficult to justify.

2. Jack-knifing

The usual form of jack-knifing is an extension of resampling. Given a sample of n observations, the observations are dropped one at a time giving n (overlapping) groups of (n-1) observations. (cf. Leave-One-Out Cross-Validation, LOOCV) The estimator is calculated for each group and these values provide estimates of the bias and standard error of the overall estimator.
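
A minimal sketch of the jack-knife in NumPy, using the sample mean as the estimator; the bias and standard-error formulas are the standard ones, and the data are simulated for illustration:

```python
import numpy as np

def jackknife(x, estimator=np.mean):
    n = len(x)
    theta_hat = estimator(x)
    # Leave each observation out in turn (cf. LOOCV) and re-estimate.
    leave_one_out = np.array([estimator(np.delete(x, i)) for i in range(n)])
    bias = (n - 1) * (leave_one_out.mean() - theta_hat)
    se = np.sqrt((n - 1) / n * np.sum((leave_one_out - leave_one_out.mean()) ** 2))
    return theta_hat - bias, se   # bias-corrected estimate and its standard error

x = np.random.default_rng(0).exponential(2.0, 50)
est, se = jackknife(x)
print(f"jack-knife estimate = {est:.3f}, standard error = {se:.3f}")
```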

3. Bootstrap

A promising alternative way of re-using the sample is bootstrapping. The idea is to simulate the properties of a given estimator by taking repeated samples of size n with replacement from the observed empirical distribution, in which X1, X2, …, Xn are each given probability mass 1/n. (cf. jack-knifing, which takes samples of size (n-1) without replacement.) Each sample gives an estimate of the unknown population parameter.

The average of these values is called the bootstrap estimator, and their variance is called the bootstrap variance. A close relative of jack-knifing, called cross-validation (CV), is not primarily concerned with estimation, but rather with assessing the prediction error of different models. Leaving out one (or more) observations at a time (i.e. Leave-One-Out Cross-Validation, LOOCV), a model is fitted to the remaining points and used to predict the deleted points.
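
A minimal sketch of the bootstrap in NumPy, using the sample median as the estimator; B = 2000 replicates is an arbitrary illustrative choice, and the data are simulated:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(2.0, 50)      # observed sample of size n
B = 2000                          # number of bootstrap replicates

# Resample n points with replacement from the empirical distribution, B times.
boot_medians = np.array([np.median(rng.choice(x, size=len(x), replace=True))
                         for _ in range(B)])

print(f"bootstrap estimator (mean of replicates): {boot_medians.mean():.3f}")
print(f"bootstrap standard error:                 {boot_medians.std(ddof=1):.3f}")
```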

(6) References

[1] The University of Adelaide (n.d.). Robust Statistics. Retrieved from

[2] Wikipedia (n.d.). Robust statistics. Retrieved from

[3] Kenton, W. (2020). Robust. Retrieved from

[4] Taylor, C. (2019). Robustness in Statistics. Retrieved from

[5] Chatfield, C. (1988). Problem Solving: A Statistician’s Guide. London, UK: Chapman & Hall.

[6] Lewis, N. D. (2016). Learning from Data Made Easy with R: A Gentle Introduction for Data Science. [Place of publication not identified]: CreateSpace Independent Publishing Platform.

[7] University of California (n.d.). Analysis of Errors. Retrieved from
http://faculty.sites.uci.edu/chem1l/files/2013/11/RDGerroranal.pdf

[8] Klopper, J. H. (n.d.). Assumptions for parametric tests. Retrieved from

[9] Frost, J. (n.d.). Nonparametric Tests vs. Parametric Tests. Retrieved from

--

Yu-Cheng Kuo
Analytics Vidhya

CS/DS blog with C/C++/Embedded Systems/Python. Embedded Software Engineer. Email: yc.kuo.28@gmail.com