Introduction to Applied Linear Algebra: Norms & Distances

https://ift.tt/3ot1yVT
Photo of Yan Krukov from Pexels

Goal: This article gives an introduction to vector norms, vector distances, and their application in the field of data science.

Why you should learn it: Vector norms and distances are used to describe attributes of vectors and the relationship of different vectors to each other. They are widely used in machine learning techniques such as clustering.

Table of Contents

  • What is a norm?
  • Distance
  • Examples using Distance
  • Clustering

What is a Norm?

To understand what the norm of a vector is, let us recall that a vector is an ordered, finite list of numbers, like this:
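
The original illustration is not reproduced here. As a stand-in, and purely as an assumption chosen to be consistent with the 53-degree angle quoted later in the article, a two-element vector could be written as:

```latex
x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 3 \\ 4 \end{pmatrix}
```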

The vector x in this example has two elements, therefore we can easily plot the vector in a 2D-Plane, as follows:

In the plot above, the first element of the vector corresponds to the x-value and the second element corresponds to the y-value. That tells us what the elements of the vector correspond to, but what other attributes does a vector have?

As you can see in the plot, a vector is further characterized by its norm, which is the distance of the vector's tip from the origin at x, y = 0, and by its angle. The norm is calculated like this:
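
For a vector x with n elements, the Euclidean norm is the square root of the sum of the squared elements. Written out below, with the values (3, 4) used only as an assumed example that is consistent with the 53-degree angle discussed in this article:

```latex
\|x\| = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2},
\qquad
\left\| \begin{pmatrix} 3 \\ 4 \end{pmatrix} \right\| = \sqrt{3^2 + 4^2} = 5
```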

In Python you can calculate the norm like this:
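
The original snippet is not reproduced here, so the following is a minimal sketch using NumPy. The example vector (3, 4) and the variable names are assumptions for illustration, not taken from the article:

```python
import numpy as np

# Assumed example vector (not from the original article)
x = np.array([3.0, 4.0])

# Euclidean norm via NumPy's built-in routine
norm_builtin = np.linalg.norm(x)

# The same value computed directly from the definition:
# square root of the sum of squared elements
norm_manual = np.sqrt(np.sum(x ** 2))

print(norm_builtin)  # 5.0
```

Both approaches give the same number; `np.linalg.norm` is simply the more convenient way to write it.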

A vector is small if its norm is a small number, and it is large if its norm is a large number. (The numerical values of the norm that qualify for small or large depend on the particular application and context.)

For completeness, the angle θ is calculated as:
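
For a two-element vector, the angle relative to the positive x-axis is θ = arctan(x₂ / x₁). A short sketch, assuming the example vector is (3, 4) (an assumption that is consistent with the 53 degrees quoted in the text) and using `atan2` so that the quadrant is handled correctly:

```python
import math

# Assumed example vector (not from the original article)
x = (3.0, 4.0)

# atan2(y, x) returns the angle in radians, measured from the positive x-axis
theta_rad = math.atan2(x[1], x[0])
theta_deg = math.degrees(theta_rad)

print(round(theta_deg, 2))  # 53.13
```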

Looking at the plot above, 53 degrees makes sense, as the vector lies slightly above the bisector (which is at 45 degrees). Nevertheless, we will not focus on the angle of a vector, as this article is about norms and distances.

Distance

We can use the norm to define the Euclidean distance between two vectors a and b as the norm of their difference:
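
Written out for two n-element vectors a and b, this definition reads:

```latex
\operatorname{dist}(a, b) = \|a - b\|
  = \sqrt{(a_1 - b_1)^2 + (a_2 - b_2)^2 + \cdots + (a_n - b_n)^2}
```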

Data Science examples using distances:

  • Feature distance: If vectors represent features of two objects, we can calculate the distance, as defined above, to get the feature distance, which is a measure of how different the objects are. For a concrete example, suppose we are in the midst of a pandemic (say Covid-19), and we have vectors associated with patients in a hospital, with entries such as weight, age, presence of chest pain, difficulty breathing, and the results of virus tests. We can use the feature vector distance to tell whether one patient case is similar to another one (at least in terms of their features).
  • Document dissimilarity: Suppose we have two vectors with the histograms (frequencies) of word occurrences for two documents. Then the distance of the two vectors represents a measure of the dissimilarity of the two documents. We might expect less distance if we look at documents from the same genre or author, and more distance if we are comparing different authors and genres.
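
The document dissimilarity idea can be sketched in a few lines. The three toy documents, the helper `word_histogram`, and the whitespace-based tokenization are all hypothetical choices made for illustration; real text would need more careful preprocessing:

```python
from collections import Counter

import numpy as np


def word_histogram(text, vocabulary):
    """Return relative word frequencies of `text` over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return np.array([counts[word] / total for word in vocabulary])


# Hypothetical toy documents
doc_a = "the cat sat on the mat"
doc_b = "the cat sat on the sofa"
doc_c = "stock prices fell sharply on monday"

# Shared vocabulary so that all histograms have the same length and ordering
vocabulary = sorted(set((doc_a + " " + doc_b + " " + doc_c).split()))

h_a, h_b, h_c = (word_histogram(d, vocabulary) for d in (doc_a, doc_b, doc_c))

# Euclidean distance = norm of the difference of the two histograms
dist_ab = np.linalg.norm(h_a - h_b)
dist_ac = np.linalg.norm(h_a - h_c)

print(dist_ab < dist_ac)  # True
```

The two documents that share most of their words end up closer together than the two on unrelated topics, which is the behaviour described above.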

Clustering

Calculating vector distances is crucial if you, as a data scientist, want to solve a clustering problem. Here you are given data, which in this case would be multiple vectors corresponding to features of an entity of interest (e.g. patients in a hospital). If you have a dataset of vectors with two features, you can visualise your data as a scatterplot where each data point corresponds to a vector:

In clustering, the goal is to partition the vectors into groups such that the vectors within a group are close to each other, that is, the distances between them are small, like this:

The clustering algorithm used in this example is called k-means clustering and is fundamentally based on the calculations of vector distances as you learned in this article.
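
The article does not show its implementation, so the following is only a bare-bones NumPy sketch of the standard k-means iteration (alternating between assigning points to the nearest centroid and moving each centroid to the mean of its points). The function names, the toy data, and the fixed number of iterations are assumptions made for illustration:

```python
import numpy as np


def assign_to_nearest(points, centroids):
    """Return, for each point, the index of the closest centroid."""
    # Pairwise Euclidean distances, shape (n_points, n_centroids)
    distances = np.linalg.norm(
        points[:, None, :] - centroids[None, :, :], axis=2
    )
    return distances.argmin(axis=1)


def k_means(points, k, n_iter=20, seed=0):
    """Minimal k-means sketch: returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random
    centroids = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = assign_to_nearest(points, centroids)
        # Move each centroid to the mean of the points assigned to it
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return assign_to_nearest(points, centroids), centroids


# Hypothetical toy data: two visibly separated groups of 2D vectors
points = np.array([
    [1.0, 1.0], [1.5, 2.0], [1.0, 1.5],
    [8.0, 8.0], [8.5, 9.0], [9.0, 8.0],
])

labels, centroids = k_means(points, k=2)
print(labels)
print(centroids)
```

Which group receives label 0 and which receives label 1 depends on the random starting centroids, so only the grouping itself is meaningful. For real work, a tested library implementation is usually the better choice over a hand-written loop like this one.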

This is it for today, thank you very much for reading! Follow me if you want to be in the loop for future articles and leave a clap if you enjoyed the article!


Introduction to Applied Linear Algebra: Norms & Distances was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.



from Towards Data Science - Medium https://ift.tt/3rEfkae
via RiYo Analytics
