How To Find Outliers in Data Using Python
…And How To Handle Them
This article was originally written for CareerFoundry.com. Check them out if you’re interested in becoming a UX designer, UI designer, web developer, or data analyst from scratch.
Outliers are important
Identifying and dealing with outliers can be tough, but it is an essential part of the data analytics process and of feature engineering for machine learning. So how do we find outliers? Luckily, several methods for identifying them take only a few lines of Python. Before diving into those methods, let’s first review the definition of an outlier and load a dataset. By the end of the article, you will have a better understanding of how to find outliers and know how to work with them when preparing your data for machine learning.
We’ll cover all of this using the following headings:
- What is an outlier?
- How do you find outliers in your dataset?
- Finding outliers using statistical methods
- Working with outliers using statistical methods
- Wrapping up and next steps