
6 Common Mistakes Machine Learning Beginners Make and How to Avoid Them


https://ift.tt/3mStaSS

Mistakes I’ve made on my journey and how you can avoid being like me when starting out

Photo from Unsplash by Lala Azizli

Machine learning is a hot topic that has been growing rapidly in popularity. It’s easy to understand why: AI and machine learning are taking over!

However, it can be overwhelming for those who are just starting out; there’s so much information available on the subject.

I’ve made some mistakes myself when first getting started with machine learning, but I’m here to tell you how to avoid them.

In this blog post, I will discuss 6 common mistakes that beginners make with machine learning and how you can avoid them!

1. Not Cleaning Your Data First

Cleaning up your data before getting started is extremely important. If you skip this step, the noisy and inconsistent features in the dataset make it harder to train a model and to trust anything it tells you.

For example, if a column that should contain only numeric values also holds a stray string like “red”, that feature needs to be fixed before a model can use it.

Also, you want to encode categorical variables as numbers (for example with one-hot encoding), since most machine learning algorithms expect numeric input.

The same goes for missing data: don’t just delete rows where some of the features have missing entries; instead, try imputing them with the column’s mean or median for numeric features and its mode for categorical ones (or something similar).
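Here is a minimal sketch of those two steps (imputing and encoding) using pandas. The toy dataset and its column names are made up for illustration, and mean/mode imputation is only one reasonable choice among several.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset: column names and values are invented for this example.
df = pd.DataFrame({
    "age": [25, np.nan, 47, 31],
    "income": [40000.0, 52000.0, np.nan, 61000.0],
    "color": ["red", "blue", None, "red"],
})

def clean(frame: pd.DataFrame) -> pd.DataFrame:
    """Impute missing values, then one-hot encode non-numeric columns."""
    out = frame.copy()
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            # Numeric column: fill gaps with the column mean.
            out[col] = out[col].fillna(out[col].mean())
        else:
            # Categorical column: fill gaps with the most frequent value.
            out[col] = out[col].fillna(out[col].mode().iloc[0])
    # Turn each category into its own 0/1 indicator column.
    return pd.get_dummies(out, dtype=int)

cleaned = clean(df)
print(cleaned)
```

On real data you would compute the fill values on the training split only and reuse them on the test split, so that no information leaks from the test set.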

Cleaning the data allows you to make more accurate predictions — thus helping you avoid those pesky mistakes!

To learn how you can clean your data you can check out the post below:

The complete beginner’s guide to data cleaning and preprocessing

2. Ignoring Outliers

Outliers can have a huge impact on your machine learning models, so it’s important that you don’t ignore them.

Sometimes they are simply due to noise in the data, but other times they could be indicative of something more serious (like fraud). If you’re not careful, these outliers can completely skew your results and give you inaccurate predictions.

There are a few ways to deal with outliers:

  • Remove them from the dataset
  • Transform them using methods like the Box-Cox transformation (which requires strictly positive values) or median filtering
  • Use robust estimators like median or trimmed mean instead of the regular mean

How you choose to handle outliers really depends on your data and what type of analysis you’re trying to perform. But no matter what, you should always be aware of them and take them into account!
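As a rough illustration, the sketch below flags outliers with the common 1.5 × IQR rule and compares the mean with the median. The sample values are invented, and the IQR rule is just one detection heuristic that I am assuming here; it is not the only option.

```python
import numpy as np

# Made-up sample with one obvious outlier (250.0).
values = np.array([13.0, 14.0, 13.5, 15.0, 14.2, 13.8, 250.0])

def iqr_outlier_mask(x: np.ndarray, k: float = 1.5) -> np.ndarray:
    """Return True where a value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - k * iqr) | (x > q3 + k * iqr)

mask = iqr_outlier_mask(values)

print("flagged outliers:", values[mask])
# The mean is dragged towards the outlier; the median barely moves.
print("mean:", values.mean(), "median:", np.median(values))
print("mean after removing outliers:", values[~mask].mean())
```

Whether you then drop, transform, or keep the flagged points is still a judgment call that depends on what they represent.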

To learn how to detect and treat outliers check out the post below:

Detecting and Treating Outliers | How to Handle Outliers

3. Starting with Huge Datasets

It’s always tempting to start with a huge dataset when you’re first getting started with machine learning. After all, the more data you have, the better your models will be, right?

Well… not necessarily.

In fact, starting with too much data can hold you back. Training on a large dataset takes time and compute, so every experiment is slow, and when the model predicts poorly it is hard to tell which of the many features actually matter.

So instead of starting with the full dataset, try working with a smaller, representative sample and training a few candidate models on it. Once you’ve found a model that performs well, you can scale up by increasing the size of the dataset.

This approach keeps your experiments fast. It does not prevent overfitting by itself (a small sample is, if anything, easier to overfit), so check the final model on held-out data once you scale up.
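One way to try the "start small, then scale up" idea is sketched below with scikit-learn. The synthetic dataset, the sample sizes, and the choice of logistic regression are all assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a large dataset (an assumption for illustration).
X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

rng = np.random.default_rng(0)
scores = {}
for n in (200, 1000, len(X_train)):
    # Draw a random subset of the training rows, without replacement.
    idx = rng.choice(len(X_train), size=n, replace=False)
    model = LogisticRegression(max_iter=1000).fit(X_train[idx], y_train[idx])
    # Score on the same held-out test set every time so results are comparable.
    scores[n] = model.score(X_test, y_test)

for n, acc in scores.items():
    print(f"{n:>5} training rows -> test accuracy {acc:.3f}")
```

If the score stops improving as the sample grows, more data is unlikely to help much and your time is better spent on features or model choice.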

To learn how to deal with different sizes of data check out the post below:

17 Strategies for Dealing with Data, Big Data, and Even Bigger Data

4. Overfitting

Overfitting is a huge problem that beginners face when training machine learning models. It happens when your model is too specific to the data it was trained on. For example, if you train on a small dataset with lots of features and outliers, the model can memorize the quirks of that data and then perform poorly in real-life situations where those quirks don’t appear.

To catch overfitting, try using cross-validation instead of a single train/test split. Cross-validation splits the data into several folds, and each fold takes a turn as the held-out validation set while the model trains on the rest. A large gap between the training score and the validation score is a sign of overfitting. This approach has worked wonders for me.

If you’re still having trouble with overfitting, try regularization, a simpler model, or ensemble methods such as bagging and boosting; Bayesian inference with sensible priors can also help. These methods tend to produce models that generalize better, although none of them is immune to overfitting.
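Here is a small sketch of cross-validation with scikit-learn's `cross_val_score`. The Iris dataset and the unpruned decision tree are my own choices for illustration, picked because such a tree tends to fit its training data almost perfectly.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
model = DecisionTreeClassifier(random_state=0)

# Accuracy on the same data the model was trained on (an optimistic number).
train_acc = model.fit(X, y).score(X, y)

# 5-fold cross-validation: each fold is held out once as a validation set
# while a fresh copy of the model is trained on the other four folds.
cv_scores = cross_val_score(model, X, y, cv=5)

print(f"training accuracy:        {train_acc:.3f}")
print(f"cross-validated accuracy: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")
```

The gap between the two numbers is the useful part: the bigger it is, the more the model is leaning on details of its training data.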

To learn how to deal with overfitting check out the post below:

8 Simple Techniques to Prevent Overfitting

5. Not Understanding the Basic Math

This one’s pretty self-explanatory — if you don’t understand the basic math behind machine learning, then you’re going to have a tough time implementing it correctly.

Luckily, this is something that can be easily fixed by taking some online courses or reading up on the subject matter. Trust me: understanding the basics of linear regression and matrix operations will make your life so much easier!
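To show how linear regression and matrix operations fit together, here is a tiny sketch that fits a line by least squares with NumPy. The data is invented (it comes from y = 2x + 1 with no noise), so the fitted values should land very close to those coefficients.

```python
import numpy as np

# Made-up data generated from y = 2x + 1 with no noise.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Design matrix: a column of ones for the intercept, then the feature itself.
X = np.column_stack([np.ones_like(x), x])

# Find w that minimizes ||X w - y||^2. Using lstsq is numerically safer
# than explicitly inverting X^T X.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = w

print(f"intercept={intercept:.3f}, slope={slope:.3f}")
```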

Once you’ve got a good grasp of the mathematical concepts, try applying them to some real world problems. This is where you’ll really start to learn how everything works.

To learn the mathematics of data science check out the post below:

Mathematics for Data Science

6. Sticking With Just One Model

When you first start out with machine learning, it can be tempting to try and build one model that does everything. However, this is usually a recipe for failure, since different models are good at some problems and poor at others.

For example, decision trees handle categorical features and non-linear relationships well, and they work for both classification and regression. But a single tree predicts in steps rather than smooth curves, cannot extrapolate beyond the range of its training data, and overfits easily.

Logistic regression, despite its name, is a classifier. It works well when the classes can be separated by a roughly linear boundary in numeric features, but categorical features must be encoded first and it misses non-linear patterns unless you engineer them. And these are just two examples of how different algorithms behave! So if you want the best chance at an accurate model, try several types of model on each problem instead of just sticking with one.

This approach will also help you spot overfitting, since you’ll have several models to compare on the same validation data.
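A minimal sketch of comparing several models on the same problem is shown below. The dataset (scikit-learn's bundled breast cancer data) and the three candidate models are assumptions picked for the example, not a recommendation.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Candidate models; logistic regression gets scaled inputs via a pipeline.
candidates = {
    "logistic regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Mean 5-fold cross-validated accuracy for every candidate.
results = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in candidates.items()
}

for name, acc in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<20} {acc:.3f}")

best = max(results, key=results.get)
print("best on this data:", best)
```

Because every model is scored with the same folds and the same metric, the comparison is fair, and a different dataset may well produce a different winner.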

To learn different models you can use check out the post below:

6 Predictive Models Every Beginner Data Scientist should Master

Start Practicing Today

So there you have it: six mistakes beginners make when starting out with machine learning and how you can avoid them! Keep these tips in mind and you’ll be on your way to becoming a machine learning pro in no time. :)


6 Common Mistakes Machine Learning Beginners Make and How to Avoid Them was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.



