- Missing values
- Date and time formats
- De-duplication
- Outliers
- Normalization

When normalizing a dataset, one maps the original data range into another data range. This is a form of scaling. The generalized steps for normalization of a dataset field is to identify the minimum and maximum values in the orginal dataset field, identify the minimum and maximum values of the new normalized scale, then calculate the new normalized field value of any number x in the original dataset using an equation similar in form to newvalue = (max’-min’)/(max-min)*(value-max)+max’.

Many statistics can be heavily influenced by outliers. One strategy is to set all outliers to a specified percentile of the data. This is called Winsorization of Winsorizing. In Winsorization, extreme values are limited in the statistical data to reduce the effect of these spurious outliers. For example, a 90\% Winsorisation would see all data below the 5th percentile set to the 5th percentile, and data above the 95th percentile set to the 95th percentile. Winsorised estimators are usually more robust to outliers than their more standard forms, although there are alternatives, such as trimming, that will achieve a similar effect. Note that Winsorising is not equivalent to trimming or truncation. In a trimmed estimator, the extreme values are discarded; in a Winsorised estimator, the extreme values are instead replaced by certain percentiles (the trimmed minimum and maximum). Therefore, statistics derived from the two sets would not be equivalent (e.g. a Winsorised mean is not equal to a truncated mean).

Smoothing a dataset refers to the creation an approximating function that attempts to capture important patterns in the data while leaving out noise and other unimportant patterns. There are many forms of smoothing.