Identifying the signals of a failing data science project
Attack every problem with tenacity, but don’t be blind to apparent failure.
Data science is fraught with left turns and dead ends. It’s a byproduct of the curiosity we have to follow in order to answer questions about poorly understood problems. I find it’s in the nature of most data scientists to get excited when they encounter a problem they don’t know the answer to. Often, finding that answer leads to something extraordinary or at least teaches you something you didn’t know before. The trouble is knowing the difference between navigating by machete through the deep wilds and admitting you’re lost and it’s time to call for an airlift out of the forest.
The point isn’t to seek failure — it’s to fail fast
We never take on a project assuming it won’t work. You may take on something challenging, something unknown, something new that you don’t quite yet know how to solve; but your trek always starts uphill with the intention to find an answer. These are often the most fruitful undertakings that lead to breathtaking results. It’s just that sometimes we find, after many attempts, that the answer is simply out of reach… and that isn’t always obvious.
Stick with me as we continue the hiking analogy. When you’re on the trail, you can compare your surroundings to your expected route on the map. It takes skill to triangulate where you are, and some are more versed than others, but you have a general guide to know when you are heading off course. That doesn’t mean you abandon the hike when you find yourself in unexpected territory; it just means you need to readjust to ensure you arrive at your final destination. Taking time to explore is part of the experience. However, if you find yourself unable to find a route back, then it’s time to stop, assess the situation, and consider bailing out instead of roaming around aimlessly.
So what we need is a map for our data science projects. How can we determine that we are too far off course for any further correction? How do we overcome our own stubbornness of “no, I got it, it’s all good”? Well, the same way we do when we get physically lost: take a moment to assess the situation, ask yourself a few key questions, and be honest with yourself.
Assessing The Situation
Your answers to these 5 questions will allow you to separate temporary frustrations from a smoldering dumpster (or maybe it’s already on fire and you’ve just been enjoying the warmth).
1. Do I understand the original problem better now than when I started?
If you are unable to articulate the issue in greater detail or don’t have a better grasp of the minutiae behind the scenes, then it’s likely you need to explore more. Even the vaguest projects impart some clarity the more you work on them, so don’t chalk it up to a bad request just yet. Maybe the answers aren’t in the dataset you’re currently working with, maybe it’s time to interview other SMEs, or maybe you could experience the process first-hand as a user. Regardless, this is a signal not to jump ship without first becoming more knowledgeable about the problem.
To be clear, this decision doesn’t come during the problem discovery phase. Of course, you will meet with SMEs and take time to understand the request. But if you haven’t performed any exploratory analysis or at least attempted some form of modeling, then you haven’t gone deep enough. As important as interviews are, we should never make the call to bail based on them alone. Regardless of whether the problem seems tough from their point of view, we still need to give it a respectable shot.
2. Do I know why it’s not working?
Okay, so you know more. Awesome. You’ve run multiple experiments. Cool. The output is still trash. Alright, easy there. Is it not working because you don’t have X feature? Are your training records ‘completed’ data while production runs on ‘in-progress’ records? This may seem obvious, but honestly ask yourself if you know why it sucks and isolate any barriers. It’s extremely uncommon for those affected by the project not to be interested in helping you find ways to overcome them. Just ask.
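One way to check the ‘completed’ versus ‘in-progress’ mismatch is to compare how often each field is missing in training records versus production records. The sketch below is only an illustration under assumptions: the record layout, the field names (`quoted_cost`, `final_cost`), the helper names, and the 0.2 threshold are all hypothetical and not from any particular project.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[float]]


def missing_rate(records: List[Record], field: str) -> float:
    """Fraction of records where `field` is absent or None."""
    if not records:
        return 0.0
    missing = sum(1 for r in records if r.get(field) is None)
    return missing / len(records)


def availability_gaps(
    train: List[Record], prod: List[Record], threshold: float = 0.2
) -> Dict[str, float]:
    """Fields that are missing far more often in production than in training.

    `threshold` is an arbitrary illustrative cutoff, not a recommended value.
    """
    fields = sorted({key for r in train + prod for key in r})
    gaps = {}
    for field in fields:
        gap = missing_rate(prod, field) - missing_rate(train, field)
        if gap > threshold:
            gaps[field] = gap
    return gaps


# Hypothetical data: completed records know the final cost,
# in-progress records do not have it yet.
train = [
    {"quoted_cost": 100.0, "final_cost": 120.0},
    {"quoted_cost": 80.0, "final_cost": 75.0},
]
prod = [
    {"quoted_cost": 90.0, "final_cost": None},
    {"quoted_cost": 110.0},
]

gaps = availability_gaps(train, prod)
print(gaps)
```

A field that shows up here is one the model may have leaned on in training but will rarely see at prediction time, which is a concrete barrier you can take back to the people affected by the project.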
Here are some common questions to help you probe deeper:
- How can I better visualize my features or results?
- How fast does the performance stabilize during training?
- Are any features not normalized (if relevant to the model)?
- How old is my training/test data compared to production data?
- Do certain segments perform better than others?
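The last question in the list, whether certain segments perform better than others, is cheap to answer once you have predictions in hand. Here is a minimal sketch, assuming each evaluation row carries a segment label, the true label, and the predicted label. The segment names and the `accuracy_by_segment` helper are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def accuracy_by_segment(rows: Iterable[Tuple[str, int, int]]) -> Dict[str, float]:
    """Compute accuracy separately for each segment.

    Each row is (segment, y_true, y_pred).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for segment, y_true, y_pred in rows:
        total[segment] += 1
        correct[segment] += int(y_true == y_pred)
    return {segment: correct[segment] / total[segment] for segment in total}


# Hypothetical evaluation rows for two customer segments.
rows = [
    ("new_customer", 1, 0),
    ("new_customer", 0, 1),
    ("new_customer", 1, 1),
    ("new_customer", 0, 0),
    ("returning", 1, 1),
    ("returning", 0, 0),
    ("returning", 1, 1),
    ("returning", 0, 0),
]

scores = accuracy_by_segment(rows)
for segment, score in sorted(scores.items()):
    print(f"{segment}: {score:.2f}")
```

A large gap between segments suggests the overall metric is hiding where the model actually fails, and it tells you where to look next.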
3. What ideas do I have left to try?
Now the head-shaped dent in the wall seems to be deepening. Make sure the ideas you are entertaining are not born out of desperation. If you’ve been going round and round for weeks or months and you’re throwing stuff at the wall just to see if it sticks, then you may not have real ideas left to try. I’m not saying that throwing crazy at the project doesn’t sometimes lead to fun success. I’m alluding to the fact that you are reading this article and are 3 questions deep so far. Truly evaluate whether you have legitimate ideas. If so, write them down; once you reach the end of the list with no further success, it may be worth taking the hint.
There are many techniques depending on the model type, the volume of data, the balance of classes, and more; this is likely worth a post all on its own. However, I want to leave you with some guidance. Below are the techniques that, in my experience, provide the biggest benefits.
- Apply a weak learner first, or segment the data based on a key feature and train separate models. This could either lead to deploying an ensemble model or simply learning more about a segment of your data and enhancing your previous model.
- Try more advanced methods to address class imbalance, beyond basic up- or downsampling. This assumes you addressed the class imbalance from the start… which you did already, right?
- Use fewer features. It might sound counterintuitive to some, but we often inject our own bias into how we think the prediction should work. This step is ridiculously simple to try, and sometimes the result might surprise us.
- Simplify the model. Complex problems don’t always require complex solutions. One great way to experience this is by interacting with Google’s TensorFlow Playground: more layers and neurons != smarter model.
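As one illustration of the class imbalance point, an alternative to resampling is to weight each class inversely to its frequency and pass those weights to the model’s loss. The sketch below uses the common heuristic `n_samples / (n_classes * class_count)`, which is the formula scikit-learn documents for its "balanced" class weights. The helper name and the 90/10 label split are assumptions for illustration, and this is only one of several possible approaches.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def balanced_class_weights(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes get weights above 1, common classes get weights below 1.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical labels: 90 negatives and 10 positives.
labels = [0] * 90 + [1] * 10

weights = balanced_class_weights(labels)
for cls, weight in sorted(weights.items()):
    print(f"class {cls}: weight {weight:.3f}")
```

How the weights are consumed depends on the model; many libraries accept per-class or per-sample weights at training time, so check what yours supports before assuming.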
4. What am I not working on because of this?
Whether you are working on a personal or professional project, there is a high likelihood that something else is waiting in the wings. If not, then there may not be much harm in continuing to spin your wheels. I expect that’s not the case, though. It may not even be another data science project; maybe this is robbing you of the time to take that new class. You can (usually) return to this project and try again if it’s really needed. This is a big question that really asks whether your time would be better spent elsewhere. Take a moment to truly reflect on that.
5. Have I learned something new?
I save this as the last question for those who, like me, just hate feeling like they have failed. You might have poured everything into this and you just don’t want to let go. It could be that you were unable to solve this problem, but in trying you gained a deeper understanding of something you were only a beginner at. Or maybe you learned about a method that you had never heard of before. This is the epitome of growth. This attempt, although not completely successful, has made a meaningful contribution to your future capabilities. Don’t discount how valuable that is. I don’t doubt you started your data science journey working on nonsense projects just to learn how to stand up a neural network. If you found value in that, then why wouldn’t you find similar value in this attempt?
What Moving On Means
You may have noticed I have not mentioned anything about “what are the consequences of not completing the project”. I find answering this question is often irrelevant. You wouldn’t have begun this project if it wasn’t originally deemed worthwhile. Why would the importance change from when you started to now? Answering why you started the hike in the backcountry provides no insights into determining how lost you are.
If the stakes are high and jumping ship still makes you uncomfortable, then take what you have learned and reframe the problem. Maybe you can’t solve the entire issue, but there could be components that are addressable. Alternatively, you could have uncovered a related problem that is more appropriate for a data science solution. Don’t be afraid to recognize that your time may be better spent elsewhere.
“When we give ourselves permission to fail, we, at the same time, give ourselves permission to excel.” ― Eloise Ristad
When Giving Up Is Productive was originally published in Towards Data Science on Medium.