
1 Introduction

Abstract

Principal Component Analysis (PCA) is a widely used dimensionality reduction technique in Machine Learning and Statistics. It is an unsupervised method that reduces the complexity of data by transforming it into a lower-dimensional space, making patterns and relationships within the data easier to understand while preserving the most relevant information.

Keywords: Principal Component Analysis, PCA, Machine Learning, ML, Education, Multimodal, Modalities, Teaching, Learning, Computer Science, CS
Updated: 13 May 2026

Machine Learning (ML) is a field of study that focuses on designing algorithms that learn patterns directly from data. Rather than relying on explicitly programmed rules, machine learning systems use data to automatically identify relationships, detect structure, and make predictions or decisions.

At the core of machine learning lies data.

A dataset consists of multiple objects (also called data points or samples). To use an object in a machine learning algorithm, we must represent it numerically. But how do we represent a real-world object so that a mathematical function can process it?

We do this by measuring relevant properties of the object. These measurements are called features. All features describing an object are combined into a vector, called a feature vector.
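As a minimal illustration of this idea (the object and its measurements here are hypothetical, not taken from the text), a flower could be described by four measured properties collected into a single NumPy array:

```python
import numpy as np

# Hypothetical example: describing one flower by four measurements.
# Each measurement is a feature; together they form the feature vector.
sepal_length = 5.1  # cm
sepal_width = 3.5   # cm
petal_length = 1.4  # cm
petal_width = 0.2   # cm

feature_vector = np.array([sepal_length, sepal_width, petal_length, petal_width])

print(feature_vector)        # the numerical representation of the object
print(feature_vector.shape)  # (4,) -- one data point with four features
```

A dataset is then simply a collection of such vectors, conventionally stacked into a matrix with one row per object.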

Each feature represents one measurable dimension of the data. If a dataset contains:

- one feature per object, the data is one-dimensional;
- two features per object, the data is two-dimensional;
- three features per object, the data is three-dimensional;

and so on for any number of features.

A visual representation of one-dimensional, two-dimensional and three-dimensional data.

Figure 1: One-dimensional, two-dimensional and three-dimensional data. Source: Gleeson (2017)

The Curse of Dimensionality

As the number of features (dimensions) in a dataset increases, the data becomes harder to work with and understand.

In high-dimensional spaces:

- data points become sparse, so many more samples are needed to cover the space;
- distances between points become less informative, because most pairs of points end up roughly equally far apart;
- models become harder to fit reliably, and computation and storage costs grow.

These issues are known as the curse of dimensionality.
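The weakening of distances can be demonstrated numerically. The following sketch (function name and parameters are our own, for illustration) draws random points in a d-dimensional cube and measures how much the farthest pairwise distance exceeds the nearest one; as d grows, that contrast shrinks:

```python
import numpy as np

rng = np.random.default_rng(0)

def distance_contrast(d, n=100):
    """Relative gap between the farthest and nearest pairwise distances
    for n points drawn uniformly from the d-dimensional unit cube."""
    points = rng.random((n, d))
    # all pairwise Euclidean distances (upper triangle, excluding diagonal)
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    dists = dists[np.triu_indices(n, k=1)]
    return (dists.max() - dists.min()) / dists.min()

for d in (2, 10, 100, 1000):
    # the printed contrast drops steadily as the dimension increases
    print(d, round(distance_contrast(d), 3))
```

In low dimensions the nearest and farthest neighbours differ by orders of magnitude; in very high dimensions they become nearly indistinguishable, which is exactly why distance-based reasoning degrades.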

Enter: PCA

To address the curse of dimensionality, we need a way to reduce the number of features while keeping the most important information in the data.

Principal Component Analysis (PCA) achieves this by constructing a smaller set of new features that summarize the original data, allowing us to represent it in fewer dimensions with minimal loss of information.
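As a preview of what the later chapters derive in detail, here is a minimal sketch of this idea on synthetic data (the dataset and the choice of two components are our own assumptions): three correlated features are centred and projected onto their top two principal directions, obtained here from the singular value decomposition.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic data: 200 samples, 3 features; the third feature is nearly
# a copy of the first, so the data effectively lives in 2 dimensions.
X = rng.normal(size=(200, 3))
X[:, 2] = X[:, 0] + 0.1 * rng.normal(size=200)

X_centered = X - X.mean(axis=0)            # PCA works on mean-centred data
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

k = 2                                      # number of components to keep
X_reduced = X_centered @ Vt[:k].T          # project onto the top-k directions

explained = (S ** 2) / (S ** 2).sum()      # variance captured per component
print(X_reduced.shape)                     # (200, 2)
print(explained[:k].sum())                 # close to 1: little information lost
```

The rows of `Vt` are the principal components: the new features mentioned above, each a weighted combination of the original ones.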

Figure of PCA

Figure 2: Dimensionality reduction with PCA. Source: Vutukuri (2025)

Before we can understand how PCA works, we first need to review some essential mathematical concepts from linear algebra and statistics. These foundations are covered in the next chapter.

References
  1. Gleeson, P. (2017). Escaping the Curse of Dimensionality. FreeCodeCamp. https://www.freecodecamp.org/news/the-curse-of-dimensionality-how-we-can-save-big-data-from-itself-d9fa0f872335/
  2. Vutukuri, K. (2025). Principal Component Analysis (PCA) & Dimensionality Reduction. Medium. https://medium.com/@kiranvutukuri/27-principal-component-analysis-pca-dimensionality-reduction-b7ed1b724a02