
11 Ways to Learn More Data Science


Use these methods to take your expertise to the next level!

Photo by Aaron Burden on Unsplash (https://unsplash.com/photos/xG8IQMqMITM)

0. But first, a note on reading about stuff that you “already know”

I’m passionate about education. I’ve been a teacher at many grade levels, and I own a tutoring center that serves kids from age 4 to 18. I’ve tutored hundreds of students myself over 10 years. I’ve spent a lot of time trying to teach concepts to students, peers, friends, direct reports, you name it. I say this because there is one thing that I beg you to listen to, and it’s the number one issue I’ve seen in students at all levels:

Our self-assessment almost always overestimates our mastery of a topic.

We just don’t know what we don’t know. People aren’t great at seeing where their own understanding has small gaps. For any topic, we have a few lines of knowledge that we can spout, but we just aren’t aware of the edge cases that exist until we see them. We don’t have all the knowledge of how every topic intersects with every related one, and many times, those answers are not easy to figure out. Therein lies expertise. Therein lies why experience is valuable.

There is so much about even the basic Data Science topics that we haven’t yet come across.

For that reason, I find value in reading about topics that I kind of already know. There are always new angles, new ways of explaining a concept, new ways to shed more light. Statistics is hard. Coding is hard. We each have a collection of knowledge nuggets, but there exist statistics questions and coding situations that will completely stump you. And frankly, it’s because our understanding after a couple years of study is still shallow.

In addition, even if a read gives you no new knowledge nuggets, being repeatedly exposed to a topic deepens your familiarity. And when you’re more familiar with a topic, when you’ve spent more time thinking about it (ideally from multiple perspectives) and strengthened your brain’s neural connections around that subject, you’ll be a quicker and clearer thinker about that topic.

So be proactive about reading everything you can about Data Science, even if you think you already know it. Read pages on common situations you’d find in data science, business analytics, and coding.

Sure, if you know the basics, sometimes you can figure out all the corollaries on your own, if you think about the right problem for long enough. But reading other people’s life experience, perspectives, and explanations is a shortcut to mastery.

That being said, let’s dive into where we can get that knowledge.

I. Buy Used Books

There are a million books relevant to Data Scientists. From classics like the O’Reilly series that many consider a must-read, to the seminal textbooks of statistics and artificial intelligence, to the little-known explorations of an experienced analyst who decided to write down a career’s worth of insights, the sum total of these books holds more data science knowledge than you’ll ever actually need.

Honestly, I just search Amazon for topics that I’m working on and buy a bunch of relevant used books. Sure, some turn out to be not the right fit. But at $5 a book, and considering the massive impact that even one good Data Science book can have on your career, I figure my time is better spent reading them than researching which ones to buy.

So don’t be shy: go grab 15 DS/Stats books for 5 bucks each. Sure, you might end up not reading half of them because they’re not exactly relevant or worth reading right now. But I promise that you’ll end up getting countless helpful knowledge nuggets from what you do read.

Not to mention the context and confidence that come from wrapping your head around the full scope of what there is to know. Just reading the tables of contents of a dozen books can round out your awareness of the field.

II. Get Recommendations from Medium

Well, you’re already here on Medium, so you may already be aware of the sheer volume of Data Science articles on this platform. There are great people to follow and great publications full of great articles: in addition to the mountain of articles that is Towards Data Science, there are publications and individual writers who focus on statistics, causal inference, deep learning, time series, business analytics and strategy, product management — whatever you need to know.

Conference talks, white papers and academic publications are of course still the formal channel for research developments. But if you aren’t beating a benchmark, they aren’t really a place for the details of your data science methodology.

Medium is a place where entry, mid, senior, and even executive level people can discuss their Data Science methods and process. It may even be the central channel through which the majority of career data scientists are communicating their ideas and methods these days. The libraries and tools they use, how they think about problems, what their code looks like, and common points of confusion — it’s all there. Many of these articles walk you through a whole methodology step by step, with code and explanations interspersed. It’s like having a million colleagues, in that you can learn from their every step and every thought, every day. Because even if you understand a topic like regularization, there is still plenty of discussion to have around what happens in practice.

Of course Medium is hit and miss — anyone can publish. The popular titles can be clickbait. But it’s more concise than a whitepaper, more skimmable than listening to a talk, and the best articles are probably more relevant to your work than the new Transformers paper.

So use it well. Follow the writers and publications who cover topics related to your work (or desired work). Set it up so that you get email updates when relevant articles are published. On my Android phone, I’ve got the Medium widget always displaying ~10 relevant articles, and I read those new 10 every day.

III. Subscribe to Email Newsletters

There are a ton of Data Science email newsletters that are well written and provide weekly or monthly updates on hot topics in the field. Subscribing to The Batch or Import A.I. keeps me updated on applications of Data Science, dilemmas arising in the field, and interesting new developments.

To find these, there are a number of Medium articles where people have kindly compiled a list of recommendations. If a newsletter is free, why not subscribe?

There’s also Substack, where you can pay to subscribe to an email list. Like with used books, there’s a high ROI when you invest in learning for your career. Don’t be too shy about spending some burrito money if it’s the cost of leveling up your career. The financial dividends alone will be well worth it — let alone the meaning and purpose that we can get from thriving in our field.

So, subscribe to all the good Data Science emails that you can find. While you may ignore some of those emails, the ones you do read will offer valuable insights into Data Science. They tend to be pretty fun too! Enjoy!

IV. Read the docs, the guides, and yes, even the code.

There’s a ton of value in actually reading all the available information on a Python library or Data Science tool that you are using, or considering using. Yes, the entire documentation guide.

Whether it’s the user guide for Databricks or the entire scikit-learn API, there’s a ton to learn about what each tool can do, how it works, and perhaps even why it works that way.

Don’t get too comfortable just using one functionality of a tool and never learning what else it can do. You might be surprised by the utilities offered by a common tool.

That note includes Python itself. Don’t live in ignorance of all of Python’s wonderful functionality. Read its documentation and API. There are a ton of features that will come in handy, like memoization, serialization, I/O, and concurrency, plus the collections, itertools, functools, operator, and pathlib modules. Don’t forget about pandas, numpy and scipy too.
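
For a taste, here’s a minimal sampler of a few of those standard-library goodies (my own illustration, not taken from any particular docs page):

```python
from collections import Counter
from functools import lru_cache
from itertools import combinations
from pathlib import Path

# Memoization via functools: repeated subproblems are computed only once.
@lru_cache(maxsize=None)
def fib(n: int) -> int:
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(100))  # instant, thanks to the cache

# collections.Counter: frequency counts in one line.
words = "the quick brown fox jumps over the lazy dog the end the".split()
print(Counter(words).most_common(2))  # [('the', 3), ...]

# itertools: e.g. every pair of features, handy for interaction terms.
print(list(combinations(["age", "income", "tenure"], 2)))

# pathlib: object-oriented filesystem paths.
for csv_file in Path(".").glob("*.csv"):
    print(csv_file.stem)
```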

You can even learn general Data Science that’s valuable beyond the tools. The scikit-learn API is a nice survey of models and includes a couple of charts comparing them. Similarly, many causal inference libraries have guides that contain notes on the math of causal inference, charts and flow diagrams discussing model pros and cons, and in-depth discussions of business problems where causal inference can be applied.
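
As a small illustration of why that API is worth browsing: every scikit-learn estimator shares one fit/predict interface, so comparing models is nearly free. A sketch (the dataset and model choices are just examples):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Interchangeable estimators: swap models without rewriting the evaluation loop.
models = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean CV accuracy {scores.mean():.3f}")
```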

There’s another layer of depth that seems a bit daunting to entry-level Data Scientists: reading the entire codebase of a great Python library. I’ve learned a ton from reading the code for spaCy. You’ll become aware of every class, method and attribute, gain a deeper understanding of how they actually work, and learn how to write good code!

Of course, the more you read, the more you learn. Reading the whole API or the entire source code of a tool is very educational. But of course, consider when it is and isn’t worthwhile. A repo with a ton of stars and contributions is probably a good lesson in how to write code, but an unknown repo might teach you anti-patterns. Or if you’re doing a spike and considering a bunch of different libraries for your project, reading the entire API of each one is probably a bit more in-depth than necessary (but it is a darn good way to understand the differences between libraries!). As always, do a quick cost-benefit estimate to decide what level of depth is appropriate for you at this time.

But if the only barrier is our laziness, then perhaps it’s time to read the docs :)

V. Data Science Courses

I think we’ve all heard of online data science courses.

For one, there are a hundred startups that tout the ability to get you a job in Data Science if you just complete their mini-degree. Sure, these courses do tend to focus on the basics, but truth be told, there’s plenty of good information in them. If you can take some mini-courses for free or cheaply from these companies, then don’t pass up the free knowledge.

MOOC platforms like Coursera, Udacity, and EdX have their own share of Data Science courses, often taught by professors in conjunction with a university. Perhaps they promise a certificate that will make your resume stand out. Now, I can’t promise that the certificate has value, but I do think the knowledge in the course certainly does.

Udemy has a number of courses as well, which can be a succinct, 20-hour walkthrough of a tool or topic that’s important to your career. Courses are only ~$10, so perhaps that’s more valuable than getting guac on your burritos this week. Again, think of the long-term ROI for your career.

Many top universities have also posted their Data Science courses online. You can find the syllabus, including the lecture slides, book and paper recommendations, and assignments, and essentially take the course on your own.

And don’t just look for the course titled “Data Science.” Searching for Artificial Intelligence, Causal Inference, Bayesian Probability, Statistics, and Decision Making will all yield courses that teach Data Science. And let’s not neglect the coding courses either. From basic Python to mastering event buses and concurrency, there are always plenty of courses that can take your knowledge to the next level.

Extra note: spaCy has its own course for learning NLP with spaCy. Hopefully more top-notch libraries will follow this trend, or at least offer a Loom walkthrough.

VI. Watch YouTube

If Medium is a mountain of words about Data Science, then YouTube is an ocean of images and sounds.

Like Medium writers, there are tons of YouTube creators making Data Science content. People screen-share their code or make nice slides as they discuss core concepts. If you dig a bit, you can find people who give excellent explanations, and you’ll level up for sure. There are videos from Kaggle employees working through problems, cutesy walkthroughs of how every ML model works, and hot new papers summarized in 5 minutes.

Some of the resources from previous sections, like user guides and online courses, exist on YouTube as well. Companies like Snowflake post instructional videos, which you can watch to gain mastery of the tool. As for courses, you can watch every lecture of an entire university course on Data Science, Linear Algebra, Optimization, Causal Inference or Programming.

There are also tons of recorded conference talks relevant to your Data Science career. There are talks on papers, talks from professors, talks about a method like survival analysis — they’ll all show up if you search for them.

In addition, you can search for a company’s data science talks (maybe look up key team members). Be the Data Scientist on your team who watches the talks on ML architectures at the FAANG companies, or on how data science solves a business problem at a company focused on that problem.

From Google’s recently popular Making Friends with ML, to the ever-charming StatQuest, to talks from the professors and companies who are inventing causal inference every day, there’s surely a career’s worth of knowledge on YouTube alone.

VII. Read all the Google Results

Googling is a rabbit hole, but it’s so, so valuable.

Whenever I’m curious about or working on a topic, let’s say “regularization on categorical variables”, I google that phrase. Then I open at least the first 10 links, read those pages, and take notes.

The results can be Wikipedia, Stack Exchange or other forums, blog posts and Medium articles, research papers and white papers, a professor’s website, an excerpt from a book, and plenty more.

As I read those, I tend to find insights that deepen my understanding. I also arrive at deeper questions, which I then Google. As this Googling branches into many related questions, it helps to copy all of those search results into a doc and keep the notes and questions organized.

Or rather than searching by the data science concept, sometimes I search by the business problem we are solving. How have people solved this before? What considerations are relevant? What are the related problems and related solutions?

I suppose this category is a catchall for a number of different resources. But indeed, there’s no substitute for vigorously Googling what you’re working on. A good Data Scientist doesn’t reinvent the wheel; they obtain and apply the knowledge of all the statistics and data science projects that have already been done.

VIII. Interview prep content

There’s a ton of interview prep content out there, and it can provide plenty of helpful knowledge for your data science career. I love reading articles titled something like “50 tricky Data Science interview questions,” because I always learn something I didn’t know about a model like CatBoost, or a goal like dimensionality reduction or explainability. The get-you-a-data-science-job startups have their own share of interview prep content, with similar compilations of key data science information.

Interview Query has a ton of questions around every facet of real Data Science work. From product release questions about causality to SQL practice questions, there’s endless content to deepen your understanding. They also write a ton of content providing frameworks, tips, and insights for approaching data science problems and choosing the right models.

I’ll also mention Brilliant.org, where I’ve found some lovely educational pieces. Their article on linearity of expectation taught me plenty. Beyond being ever-prepared for interviews, it’s fun to learn some tricky math, and it deepens our understanding of probability and statistics.
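
To give a taste of that topic (my own summary, not Brilliant’s): linearity of expectation says the expectation of a sum is the sum of the expectations, with no independence required. A classic worked example: if n people grab hats at random, the expected number who get their own hat back is exactly 1, for any n.

```latex
% X_i indicates that person i gets their own hat; P(X_i = 1) = 1/n.
% Linearity holds even though the X_i are dependent:
\mathbb{E}\left[\sum_{i=1}^{n} X_i\right]
  = \sum_{i=1}^{n} \mathbb{E}[X_i]
  = n \cdot \frac{1}{n}
  = 1
```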

You can also watch videos of practice interviews. People post examples of FAANG interview questions being answered. Watching these always leaves me with a new idea, whether it’s how to address non-normal data distributions or a framework for asking the right questions about a business problem.

IX. Competitions

Finally, where there are competitions, there is knowledge.

Reading through all the historical Kaggle competitions provides endless examples of problem formulations — the datasets and objectives that data science can be used for. If it’s your role to come up with or reorient a data science project, there’s no substitute for reading through examples of sensible inputs & outputs.

You can also read through Kaggle notebooks, where people have written the code to clean datasets and run ML models. Not only can those provide examples of good code or methods you didn’t know about, they can also provide a starting point if you need to clean a similar dataset or run a similar ML model.
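
To make that concrete, here’s a minimal sketch of the clean-then-model pattern you’ll see in countless Kaggle notebooks (the toy dataset and column names are invented for illustration; a real notebook would start from pd.read_csv):

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Tiny synthetic stand-in for a competition CSV.
df = pd.DataFrame({
    "age": [22, 38, None, 35, 54, None, 2, 27, 14, 58] * 20,
    "fare": [7.3, 71.3, 7.9, 53.1, 51.9, 8.1, 21.1, 11.1, 30.1, 26.6] * 20,
    "sex": ["m", "f", "f", "f", "m", "m", "m", "f", "f", "m"] * 20,
    "survived": [0, 1, 1, 1, 0, 0, 0, 1, 1, 0] * 20,
})

# Notebook-style cleaning: impute missing values, one-hot encode categoricals.
df["age"] = df["age"].fillna(df["age"].median())
df = pd.get_dummies(df, columns=["sex"], drop_first=True)

X, y = df.drop(columns=["survived"]), df["survived"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = GradientBoostingClassifier(random_state=0)
model.fit(X_train, y_train)
print(f"held-out accuracy: {model.score(X_test, y_test):.3f}")
```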

Similarly, Leetcode questions are a great learning opportunity. It may not be Data Science, but mastering Python makes our jobs easier. Reading through Python solutions to Leetcode questions has deepened my familiarity with some Python methods, attributes, and techniques, which perhaps I had seen but wouldn’t have readily used on my own.
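
A small example of what I mean: the one-pass hash-map solution to the classic two-sum problem is exactly the kind of idiom that reading other people’s solutions drills into you:

```python
def two_sum(nums: list[int], target: int) -> tuple[int, int] | None:
    """Return indices of two numbers that sum to target, or None."""
    seen = {}  # value -> index of values visited so far
    for i, x in enumerate(nums):
        # One pass: look up the complement before storing the current value.
        if target - x in seen:
            return seen[target - x], i
        seen[x] = i
    return None

print(two_sum([2, 7, 11, 15], 9))  # (0, 1)
```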

Doing the Kaggle and Leetcode competitions will surely be helpful. As will doing the challenges on your own time, outside the competition. But just reading through the solutions or submissions of other people can teach us a lot. Some argue that there’s no substitute for doing it without help, perhaps because it forces you to practice a process. But you can read through far more solutions if you go straight to them — and as for process, you should probably learn a good one from an expert rather than coming up with your own.

X. Data Science Twitter

Make a Twitter account just for Data Science, and follow only Data Science posters. It’s really helpful to be able to open the Twitter app and see nothing but posts on recent papers and insightful questions about where the field is heading. Follow professors, data science companies, and data science content creators. Some accounts are awesome and drop one-sentence summaries of new papers with surprisingly high frequency.

Don’t water that great content down by filling your feed with distracting content. Save that for another account. More generally, get these data science email subscriptions, YouTube recommendations, and Medium widgets front and center, and reduce the other content. It’s your career, isn’t it? I promise, being consumed by learning is ultimately very rewarding.

XI. Work on More Business Problems

This list probably should’ve been 5 or 7 ways, and then I decided on 10, but here we are at 11, because we can’t leave this one out.

I’ve learned a ton by working with Data Scientists, as well as with other stakeholders like Product Managers, to create end products that move the needle for a company.

Certainly there’s the obvious reason, that you can learn directly from other Data Scientists. They might teach you math, models, or causal concepts that you didn’t know. You might learn new libraries, methods and syntax from how they code. But there are many other ways that you’ll be forced to learn by the dynamics of working toward a real business goal.

Producing the analytics to inform major product initiatives requires nuanced discussion, and can even be counterintuitive.

Causal inference almost always rears its head when it comes to making business decisions with data. When Analysts and PMs notice correlations in the data, the company should only take action if those correlations are actually aligned with causality; acting on a spurious correlation may be exactly the wrong business move. If we are comparing groups where other key covariates are unbalanced, we need to correct for that before deriving business decisions. We might need to create quasi-experiments, which push us to reason about all of the causal forces in the business and to use the proper causal inference methods.
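
Here’s a minimal synthetic sketch of that covariate problem, using simple regression adjustment (one of many possible corrections; the variable names, effect sizes, and data are invented):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 5_000

# Synthetic confounding: `engagement` drives both treatment uptake and the outcome.
engagement = rng.normal(size=n)
treated = (engagement + rng.normal(size=n) > 0).astype(int)
outcome = 2.0 * engagement + 0.5 * treated + rng.normal(size=n)
df = pd.DataFrame({"treated": treated, "engagement": engagement, "outcome": outcome})

# The naive group comparison is badly biased by the unbalanced covariate...
naive = smf.ols("outcome ~ treated", data=df).fit()
# ...while adjusting for the confounder recovers the true effect (0.5).
adjusted = smf.ols("outcome ~ treated + engagement", data=df).fit()

print(f"naive estimate:    {naive.params['treated']:.2f}")
print(f"adjusted estimate: {adjusted.params['treated']:.2f}")
```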

In addition, calculating the uncertainty that matters to the business, and making decisions with it in mind, can be really tricky. While PMs know to look at statistical significance for their A/B tests, we need to ensure that correlation-driven ideas are also grounded in statistical significance, using the proper error analysis to answer the question actually being asked.
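
For instance, a quick two-proportion significance check on A/B-test-style conversion counts might look like the following (the numbers are made up, and statsmodels is just one of several libraries offering this test):

```python
from statsmodels.stats.proportion import proportions_ztest

# Illustrative counts: conversions and sample sizes for control vs. variant.
conversions = [430, 502]
samples = [10_000, 10_000]

z_stat, p_value = proportions_ztest(conversions, samples)
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")
# A small p-value suggests the lift is unlikely to be noise. It says nothing,
# however, about whether the idea behind the variant was causal to begin with.
```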

Any new prediction or causal inference problem can make us think about data representation and feature engineering in new ways. All of the predictors or covariates for a measured treatment and/or outcome can be represented in so many ways, but which representation is relevant to the outcome? Especially in causal inference, where we aren’t just driving up accuracy, we have to ask which representation really captures the confounding and doesn’t control for the wrong aspects (mediators). These problems can help us find connections in data science that we might not have found otherwise.

Creating the right product features can involve the same issues. For example, if you show a user a heat-map that they’ll use as suggestions for their actions, then that heat-map should be a map of true treatment effects, and the color differences should be chosen carefully to reflect the uncertainty in those treatment effects. The product need forces us to take on data science questions that we may not have thought of.
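
A sketch of that idea (invented data; matplotlib’s per-pixel alpha is one simple way to encode uncertainty): fade uncertain cells toward transparent so that noisy estimates don’t read as strong signals.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)

# Invented example: estimated treatment effects over a 10x10 grid of segments,
# with a standard error for each estimate.
effects = rng.normal(loc=0.2, scale=0.5, size=(10, 10))
std_errs = rng.uniform(0.05, 0.6, size=(10, 10))

# Opacity encodes confidence: precise cells are vivid, noisy cells fade out.
confidence = np.clip(1.0 - std_errs, 0.0, 1.0)

fig, ax = plt.subplots()
im = ax.imshow(effects, cmap="RdBu_r", vmin=-1, vmax=1, alpha=confidence)
fig.colorbar(im, ax=ax, label="estimated treatment effect")
ax.set_title("Effect heat-map; opacity reflects confidence")
plt.show()
```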

Further, doing data science in a business means optimizing or working with finite resources for development, training, prediction, memory, storage, and more. You may need to explore new ways to sample data to meet those constraints, but you’ll still have to address those questions around causality and errors! That’s way beyond blindly running a scikit-learn model!

Each new business problem will push you into a new set of questions, so embrace the opportunity to work on many of them.

Thanks for reading! I hope something in this list can be incorporated into your learning process.
