
How to Explore Machine Learning and Natural Language Processing as a High School Student

https://ift.tt/LVewE5O

A simple, realistic guide from one HS gal to another

Photo by fabio on Unsplash

Hey there! Do you love deciphering the nuance of languages, mulling over creative writing pieces, and churning out breaking news stories for your school newspaper? But are you, perhaps at the same time, fascinated by all the funky things you can do with strings after that AP Computer Science Exam in May?

Man, you remind me a lot of myself! And if you’re here because you just discovered that, WHOA, there’s an interdisciplinary field that combines Computer Science and language (*ahem* the computational analysis of language), but every guide online is either geared towards adults or too intimidating for a bit of casual exploration, well, this guide is perfect for ya!

By the end, you will have a good general view of machine learning and natural language processing, including their key concepts and why they matter so much today, all with only a high schooler’s background. As for a tangible product, you’ll walk away with a mini project whose lessons you can apply to your future endeavors!

P.S. I promise that the further you get into this guide, the more fun it gets (with puzzles at the end!). The beginning may be an onslaught of *familiarize yourself with this, familiarize yourself with that* droning (I hope I don’t sound too much like Google Translate reading dry junk mail haha), so I won’t fault you if you fall asleep, but you got this! And it’ll be worth it!


Introduction

Now, before I begin, I do want to make a disclaimer: if you’re here hoping to become an expert in Machine Learning (ML) and Natural Language Processing (NLP), this guide is not for you. It is meant for high schoolers, or any other curious soul, who are genuinely intrigued by this fascinating field and would love a starting point with some hands-on, approachable action items, even without a research or university/industry mentor to guide them. (In fact, if you do complete this guide, I think it’s a great way to demonstrate interest in ML and NLP when you later apply to summer research programs that require background knowledge in this field!)

TL;DR: I have constructed this guide around the usual barriers high schoolers face in terms of education level and time (including no prior research or project experience), so I have made certain generalizations that applied to me and many people I know. Of course, there are many exceptions (as a matter of fact, I know so many amazingly advanced, talented, humble classmates), but I hope this guide is accessible to all, regardless of starting level.

With that said, let’s explore!

Prerequisites

Although it would be ideal (from any professional standpoint) to have a working knowledge of Calculus, Linear Algebra, and other deep fundamentals, anyone who is going through or has gone through the American school system may note that:

  • Math Limitations: Most students don’t learn Calculus until junior or senior year (freshman or sophomore year for the especially ambitious), which means it may realistically be difficult for a curious 14-year-old to explore the field by buying a linear algebra textbook and trying to understand the fundamentals before they have even finished Algebra 2 Honors (the prerequisite to Pre-Calc) at school.
  • Time & Priorities: Students have lots of things on their plate, whether it be music, sports, art, dance, clubs, local tutoring, or other interests, so time is of utmost importance.

As a result, these prerequisites are designed with the two caveats above in mind, while also keeping exploration super fun if you are truly fascinated by the field. (Feel free to skip ahead if you already know basic Python!)

1. Learn Basic Programming…

…wherever convenient, whether at school or online (APCS, summer community college, Udemy, edX, etc.). This part is unavoidable, but not impossible!

If you’re someone like me, who found out about ML after taking the AP Computer Science A exam or a summer community college Java I course, perfect! With a good knowledge of classes, functions, arrays, and basic programming, you’re all set for this prerequisite and can jump straight to Step 2!

Even if you haven’t, don’t worry; it’s not a huge stretch, considering many students finish APCS by sophomore year. If you need somewhere to start, Isaac Lyman’s hilariously written When You Finish Reading this You’ll Learn How to Code is a good entry point. From there, find a simple Python course online or enroll at your local community college for a semester to get down the basics.

I will continue on with the assumption that you have learned Java in Intro to CS or APCS at your high school, a very common foundational point for students.

2. Learn Python For Data Science

Now with programming (likely Java) as a foundation, Python, an essential language for creating ML projects, will be quite a bit easier to pick up!

This website, which MIT uses for its Beaver Works Summer Institute (BWSI) for high schoolers and which specifically focuses on “STEM applications [of Python like] data analysis, machine learning, numerical work etc,” is a perfect resource. Just reading through the first two modules will give you a solid-enough understanding of the language in an easy-to-comprehend format, without forcing you to learn concepts irrelevant to ML.
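To give you a feel for why Python is friendlier than Java for text work, here is a tiny sketch of my own (not taken from the Beaver Works material) that counts word frequencies in a headline. It uses only the standard library’s `collections.Counter`; the function name `word_frequencies` and the sample headline are made up for illustration.

```python
from collections import Counter

def word_frequencies(text: str) -> Counter:
    """Count how often each lowercase word appears in `text`."""
    # str.split() with no arguments splits on any run of whitespace,
    # so there are no indices to manage as in a typical Java loop.
    words = [word.strip(".,!?;:\"'").lower() for word in text.split()]
    # Drop empty strings left behind by tokens that were pure punctuation.
    return Counter(word for word in words if word)

headline = "Students love NLP. Students love puzzles, too!"
counts = word_frequencies(headline)
print(counts.most_common(2))  # [('students', 2), ('love', 2)]
```

Notice there are no type declarations on local variables, no `for (int i = 0; ...)` loops, and no `HashMap` boilerplate: a list comprehension and `Counter` do the heavy lifting.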

Extra: You can also enroll in their corresponding course through the BWSI website for multiple-choice and practice problems if you’d like to go further, as well as for the chance to attend their summer program for junior year! (The course was a great learning opportunity for me, although I ended up not going to their CogWorks summer camp because of a conflict with another program; if you’re interested, check it out!)

3. Familiarize Yourself with These Specific Libraries

For ML and NLP, it is extremely helpful to understand the pandas library, which lets you easily manipulate data, beforehand. This YouTube series sums it up quite nicely (although if you are truly short on time, you can skip this step and spend more time exploring later). Reading a bit about graphing libraries (matplotlib, seaborn), which you can easily find with a simple Google search, will also make the experience much more fun.
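Here is a minimal pandas sketch of the kind of data wrangling that series covers, assuming pandas is installed (`pip install pandas`). The headline dataset is invented for illustration, and the column names are my own choices.

```python
import pandas as pd

# A tiny, made-up dataset of school-newspaper headlines (hypothetical data).
df = pd.DataFrame({
    "headline": [
        "Robotics team wins regional title",
        "Cafeteria adds new vegan options",
        "Debate club heads to state finals",
        "New library hours announced",
    ],
    "section": ["sports", "news", "clubs", "news"],
})

# Vectorized string methods via the .str accessor: split each headline
# into words, then take the length of each resulting list.
df["word_count"] = df["headline"].str.split().str.len()

# Boolean filtering: keep only the rows in the "news" section.
news = df[df["section"] == "news"]

# Group rows by section and average the word counts within each group.
avg_len = df.groupby("section")["word_count"].mean()
print(avg_len)  # average words per headline, by section
```

Using the `.str` accessor and `groupby` instead of hand-written loops is what makes pandas so handy once your datasets grow to thousands of rows.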

4. Stay Curious!

Arguably the most important prerequisite from here on out is a curious mindset; nothing is more important than a simple willingness to dig and learn. Assuming you already have an APCS-equivalent background, you can finish the steps above in one week if you decide to be ambitious; I have intentionally designed it that way because I totally empathize with the constraints of a busy high school lifestyle!

Thus, as I will elaborate below, a lot of what you may eventually discover as a high schooler comes NOT from courses, but from your own digging on Stack Overflow, YouTube, and Google. You truly learn so much more deeply that way. Ready? Read on!

Quick Readings & Software Installations

There are many great courses, like Andrew Ng’s famous Stanford ML course, so if that’s your learning method, go ahead! But since I want this particular article to be an exploration guide you can complete relatively quickly to evaluate your interest in the field and perhaps finish a mini project, rather than an official dive into the field, I advocate for a more hands-on, throw-you-in-the-middle-of-the-sea kind of approach. If you find the resources below interesting and would like to spend more time, then by all means, take all the amazing open-source classes online! But here’s a way to get an idea of the cool things you can do with little background knowledge.

1. READ about the impact of ML and NLP Today:

A bit out-of-order? I know, I know. We all want to get directly into project time, but in my opinion, the most important factor for starting something is determining not just “what” you will do, but “why” you are doing it. As such, exploring a bit of the following sites will give you a deeper understanding of the real-world impact. (And hey, it’s a motivation boost, so why not?)

  • SpeechTek Magazine: Oh my gosh, I literally can’t contain my excitement! There is no better way to learn about all the epic things natural language processing can do in today’s world, from helping doctors to increasing the efficacy of marketing strategies and giving access to people with disabilities around the world, than this up-to-date compilation of the latest breakthrough NLP technologies. The minute I stumbled across it, I couldn’t stop reading for hours, given my journalist mindset (we sure do a lot of poking around), and I hope it has the same effect on you.
  • Google: Is this self-explanatory and completely unnecessary to write down? Yes. Do I still promote it here? Yes! If SpeechTek Mag is not the motivator for you, go out there and do a bit of your own research on why NLP is so important! It’ll be a wild journey, but totally worth it.
Arguably the most important prerequisite from here on out is a curious mindset; nothing is more important than a simple willingness to dig and learn.

2. Broad Overview of Key Concepts in ML and NLP

Schoolgirl excitement aside, now that you are pumped up and ready to go, here are some readings that will give you an easy-to-digest overview of ML and NLP, including technical terms, pipelines, and other essential tools. (30 Min Max, No Coding Yet!)

A) Overview of ML:

  • Machine Learning, Explained by MIT Sloan, which gives a big-picture idea of what ML is, the three subcategories of ML (supervised, unsupervised, reinforcement learning), other AI subfields, and its applicability.
  • ML For Dummies, which explains how ML works under the hood, with a more detailed explanation of the three subcategories, key terms (e.g., training/validation/testing), and potential biases.
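To make the training/validation/testing idea concrete, here is a minimal plain-Python sketch of splitting a dataset three ways. The function name, the 70/15/15 proportions, and the fixed seed are my own illustrative choices; libraries such as scikit-learn provide `train_test_split` for the same job.

```python
import random

def train_val_test_split(examples, val_frac=0.15, test_frac=0.15, seed=42):
    """Shuffle a copy of `examples` and cut it into train/validation/test lists."""
    if val_frac < 0 or test_frac < 0 or val_frac + test_frac >= 1:
        raise ValueError("fractions must be non-negative and sum to less than 1")
    shuffled = list(examples)              # copy, so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed -> same split every run
    n_test = round(len(shuffled) * test_frac)
    n_val = round(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]      # everything left over is for training
    return train, val, test

data = list(range(100))  # stand-in for 100 labeled examples
train, val, test = train_val_test_split(data)
print(len(train), len(val), len(test))  # 70 15 15
```

The model learns from `train`, you tune your choices (like which features to use) by checking `val`, and you look at `test` only once at the very end, so the final score reflects data the model has never seen.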

B) Overview of NLP:

3. Software Installations

Now with most of the conceptual big-picture readings out of the way, it’s time to install the actual environments you will be using.

  • Option 1: Install Jupyter Notebook, a web application for writing and running code interactively (much more fun than the usual boring IDE). You can run it locally whenever you want, which means you don’t have to be connected to the internet! To install and learn how to use Jupyter Notebook, refer to this, this, and this.
  • Option 2: Use Google Colab, a cloud-based version of Jupyter Notebook from Google, which makes it super easy to share your code with others, just like sharing a Google Doc. To learn how to use Google Colab, refer to this.

Project Time! (NLP & ML)

Still with me? Go you and your epic attention span!

Arguably the most exciting part of this exploration guide is the opportunity to get our hands dirty with some fun starting code and wait-for-it…mini projects! And the wait is done. Here we go whooo!

1. Baseline Coding Guides to Work Through

Ventsislav Yordanov has created an absolutely phenomenal, easy-to-follow series of articles that walks you through both the concepts behind various stages of the NLP pipeline, from exploratory analysis to pre-processing, and the actual code. These simple, slick guides are ones I highly recommend you work through yourself, in the following order:
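To give a taste of what the pre-processing stage looks like in code, here is a minimal standard-library sketch: lowercase, strip punctuation, split into tokens, and drop stop words. The tiny stop-word list is hand-picked for illustration (real projects usually take a fuller list from a library like NLTK or spaCy), and this is not necessarily how Yordanov’s articles implement it.

```python
import re

# A tiny, hand-picked stop-word list, for illustration only.
STOP_WORDS = {"a", "an", "the", "is", "are", "and", "of", "to", "in", "for"}

def preprocess(text: str) -> list[str]:
    """Lowercase, strip punctuation, tokenize on whitespace, and drop stop words."""
    text = text.lower()
    # Replace anything that is not a letter, digit, or whitespace with a space.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    tokens = text.split()
    return [tok for tok in tokens if tok not in STOP_WORDS]

print(preprocess("The Robotics Team is heading to the State Finals!"))
# ['robotics', 'team', 'heading', 'state', 'finals']
```

One design caveat: the regex keeps only unaccented letters and digits, so a contraction like “don’t” becomes `don` and `t`, and accented words get split apart. That’s fine for a first pass on English text, but it’s exactly the kind of choice worth revisiting once you start exploring your own data.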

2. A Comprehensive NLP Project

Almost no series can get more praise from me than this fantastic series of five video lectures from Women Who Code, along with their corresponding open-source project code, slideshows, and resources. It walks you through the ENTIRE process of creating an NLP project, from a basic intro to NLP concepts and workflow to a detailed, step-by-step process of exploratory analysis, pre-processing, various methods of encoding, and building and evaluating a model.
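The series goes much deeper, but as a rough preview of the encode, build, and evaluate steps, here is a from-scratch sketch of one classic pairing: bag-of-words encoding plus a multinomial Naive Bayes classifier. This is my own illustrative choice, not necessarily the encoding or model the Women Who Code series uses, and the six training headlines are invented toy data. In a real project you would more likely reach for scikit-learn’s `CountVectorizer` and `MultinomialNB`.

```python
import math
import re

def tokenize(text):
    """Lowercase the text and keep runs of letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())

def build_vocab(texts):
    """Map every word seen in the training texts to a column index."""
    words = sorted({tok for text in texts for tok in tokenize(text)})
    return {word: i for i, word in enumerate(words)}

def encode(text, vocab):
    """Bag-of-words: one count per vocabulary word; unknown words are dropped."""
    vec = [0] * len(vocab)
    for tok in tokenize(text):
        if tok in vocab:
            vec[vocab[tok]] += 1
    return vec

class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, vectors, labels):
        n_features = len(vectors[0])
        self.log_priors, self.log_likelihoods = {}, {}
        for label in set(labels):
            rows = [v for v, y in zip(vectors, labels) if y == label]
            # log P(label): fraction of training examples with this label.
            self.log_priors[label] = math.log(len(rows) / len(vectors))
            # Total count of each vocabulary word across this label's examples.
            totals = [sum(column) for column in zip(*rows)]
            denom = sum(totals) + n_features  # +1 per word for smoothing
            self.log_likelihoods[label] = [math.log((c + 1) / denom) for c in totals]
        return self

    def predict(self, vec):
        def score(label):
            # log P(label) + sum over words of count * log P(word | label)
            return self.log_priors[label] + sum(
                count * ll for count, ll in zip(vec, self.log_likelihoods[label])
            )
        return max(self.log_priors, key=score)

# Made-up toy data: sports vs. science headlines.
train_texts = [
    "team wins the championship game",
    "coach praises players after the game",
    "players train for the big match",
    "scientists discover a new planet",
    "lab experiment tests a new theory",
    "researchers publish a study on planets",
]
train_labels = ["sports"] * 3 + ["science"] * 3
test_texts = ["the team plays a big game", "a new study by scientists"]
test_labels = ["sports", "science"]

vocab = build_vocab(train_texts)
model = NaiveBayes().fit([encode(t, vocab) for t in train_texts], train_labels)

predictions = [model.predict(encode(t, vocab)) for t in test_texts]
accuracy = sum(p == y for p, y in zip(predictions, test_labels)) / len(test_labels)
print(predictions, accuracy)  # ['sports', 'science'] 1.0
```

Scoring with log-probabilities instead of multiplying raw probabilities keeps the numbers from underflowing to zero on longer texts, and the add-one smoothing stops a single unseen word from zeroing out a whole class. Two test sentences are obviously far too few to trust that accuracy number, which is exactly why proper train/validation/test splits matter.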

Whaat? Yes, you heard me right.

Finding this on YouTube was like being a leprechaun who hit the jackpot on St. Patrick’s Day and realized it wasn’t just a pot of gold coins but a huge pot of Spicy Hot Pot that never runs out. (Whaaat? If you’re stranded on an island, hot pot is so much better.) But I digress… there’s almost no better deal than this bad boy if you want to create a mini project with guidance.

TL;DR? Let me link the videos and code again to emphasize how awesome I found this resource to be! (Passive aggressiveness for the win >:))

3. Further ML Guides

If you’re still hungry for another taste after #2, here, here, and here are some additional guides and resources where you can tackle projects with datasets available from open repositories. There are many more online, so by all means, go on a treasure hunt! Remind you of a recurring motif?

Arguably the most important prerequisite from here on out is a curious mindset; nothing is more important than a simple willingness to dig and learn.

North American Computational Linguistics Open (Formerly Olympiad)

Yay, you’ve completed your first project! Now, as an exploratory guide aimed mainly at high schoolers, this wouldn’t be complete without mentioning the North American Computational Linguistics Open (NACLO), one of the official olympiads in the USA (along with AMC for math, USACO for CS, etc.), except that this olympiad specifically focuses on linguistics puzzles with an emphasis on computational linguistics!

The stellar thing about NACLO is that, like the other olympiads, it is specifically targeted at high schoolers, so it assumes you know absolutely nothing about CS or linguistics. (As a NACLO semifinalist, I can attest to that; I did their puzzles with essentially no background knowledge.) In other words, you can solve their puzzles purely with logic while discovering fascinating things about NLP at the same time. Here are a few of my favorite ones pertaining to natural language processing and machine learning:

P.S. If you’re interested, NACLO also has many puzzles about linguistics itself, and it hosts a competition every year in late January that is free for high schoolers to register for. The top 10% of competitors in the USA and Canada qualify for the invitational round, and if you do exceptionally well there? You get to participate in the International Linguistics Olympiad (IOL)!

Photo by Brett Garwood on Unsplash

Other Resources

Congrats on making it to the end!

Before I conclude this exploratory guide, if you would like to go deeper, I’d totally recommend this and this guide, written by other insanely talented high schoolers who have some really good tips on how to get started with AI/ML while still in secondary school. (I don’t know any of them personally, but I have referred to their articles in the past while trying to learn more about the field myself.)

Conclusion

I found all the resources above online while trying out my own projects, both by myself and in conjunction with programs like the University of California, Santa Cruz’s Science Internship Program (UCSC SIP, highly recommend!), because I love exploring and digging, and it’s brought me to unimaginably awesome places. From YouTube industry deep-dives to amazing resources created by other high schoolers on how to learn, the possibilities are endless, and it’s totally approachable for teensy teens! (See the alliteration there?)

So what are you waiting for? Go explore and let me know how it goes! If you’d like to learn a bit more about who I am, read a little about me here or connect with me here. And lastly, if you have any feedback or suggestions, I’d love to hear them in the comments! ❤

One last thing: the resources I provide in this guide are all open-source, available resources on the web that have helped me tremendously. I am not affiliated with any of them, nor do I claim to take credit for their work. Keep exploring!

Arguably the most important prerequisite from here on out is a curious mindset; nothing is more important than a simple willingness to dig and learn.

Hehe I really do love that motto, eh?


How to Explore Machine Learning and Natural Language Processing as a High School Student was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.


