Aaron Schroeder

Uniqueness

Data uniqueness can be defined as the number of unique valid values that have been entered in a record field, or in a combination of record fields, within a dataset. Uniqueness is not usually discussed as a dimension of data quality, but for the purposes of answering research questions, the variety and richness of the data are of paramount importance. Most notably, a record field with very little value uniqueness has low utility and can be regarded as low quality in terms of the research question at hand. For example, in an analysis of housing within a single county, every entry in the field ‘State’ would be the same state.

A basic frequency distribution plot of birth year values can be produced with R and SQL:

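library(DBI)  # provides dbGetQuery(); 'con' is assumed to be an open DBI connection

# pull the birth_year field from the student mobility table
values_birth_year <- dbGetQuery(con, "SELECT birth_year
                                      FROM student_mobility_fields_2005_2015")

# frequency distribution plot of birth_year values
birth_year_frequencies <- table(values_birth_year$birth_year)
barplot(birth_year_frequencies, main = "Birth Year Value Distribution", horiz = TRUE)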
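
Uniqueness as defined in this section can also be summarized numerically. The following is a minimal sketch in base R, not the author's method. The data frame `records`, its values, and the helper functions `count_unique` and `unique_ratio` are hypothetical and introduced only for illustration. The sketch also assumes that "valid" simply means non-missing; a real profile would likely apply field-specific validity rules first.

```r
# Hypothetical toy data standing in for a real table (illustrative values only)
records <- data.frame(
  state      = c("VA", "VA", "VA", "VA", "VA"),
  birth_year = c(1998, 1999, 1998, NA, 2000),
  stringsAsFactors = FALSE
)

# Number of distinct non-missing values in a single field
# (assumption: "valid" is treated as "not NA")
count_unique <- function(x) length(unique(x[!is.na(x)]))

# Share of non-missing entries that are distinct; 0 when the field is empty
unique_ratio <- function(x) {
  n_valid <- sum(!is.na(x))
  if (n_valid == 0) return(0)
  count_unique(x) / n_valid
}

unique_state      <- count_unique(records$state)
unique_birth_year <- count_unique(records$birth_year)

# Distinct combinations of several fields, counted over complete rows only
complete_rows <- records[complete.cases(records), ]
unique_combos <- nrow(unique(complete_rows[, c("state", "birth_year")]))
```

In this toy data the `state` field has a single distinct value, which mirrors the ‘State’ example above: a field that never varies contributes nothing to the analysis. The ratio is offered as one possible way to compare fields of different lengths; other normalizations are equally reasonable.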
