
Implementing Part of Speech Tagging for English Words Using Viterbi Algorithm from Scratch


Parts of speech are useful clues to sentence structure and meaning. Here’s how we can go about identifying them.

The complete code for this article can be found HERE

Part of speech tagging is the process of assigning a part of speech to each word in a text. This is a disambiguation task: words can have more than one possible part of speech, and our goal is to find the correct tag for the situation.

We will use a classic sequence labeling algorithm, the Hidden Markov Model, to demonstrate. Sequence labeling is a task in which we assign to each word x_i in an input word sequence a label y_i, so the output sequence Y has the same length as the input sequence X. An HMM is a probabilistic sequence model based on augmenting the Markov chain. A Markov chain makes a very strong assumption: if we want to predict the future in the sequence, all that matters is the current state. For example, the probability that I write an article next week depends only on whether I write an article this week and nothing more. You can watch the video HERE for more examples and, I would say, a more detailed explanation.

The Hidden Markov Model allows us to talk about both observed events (the words that we see in the input) and hidden events (the part of speech tags that we think of as causal factors in our probabilistic model).

We will be implementing our tagger using the Brown Corpus, where each file contains sentences of tokenized words paired with their POS tags and each line contains one sentence. You can find the manual describing the tags for our data HERE. Note that we will implement our POS tagger using a bigram HMM.

We can see a sample of our data below:

[('RB', 'manifestly'), ('VB', 'admit')]

First, let's create a function for generating n-grams.

def ngrams(self, text, n):
    # Slide a window of size n over the token list and collect each n-gram as a tuple
    n_grams = []
    for i in range(len(text) - n + 1):
        n_grams.append(tuple(text[i: i + n]))
    return n_grams
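As a quick illustration, a standalone version of the same logic (without the surrounding class) produces the tag bigrams we will need later:

def ngrams_standalone(text, n):
    # Slide a window of size n over the sequence and collect the tuples
    return [tuple(text[i: i + n]) for i in range(len(text) - n + 1)]

print(ngrams_standalone(["START", "PPSS", "MD", "VB"], 2))
# [('START', 'PPSS'), ('PPSS', 'MD'), ('MD', 'VB')]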

Transition probabilities

Image by Author

An HMM has two components: the transition probabilities A and the emission probabilities B.

Image by Author
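The equation in the image is the standard maximum likelihood estimate for a bigram transition probability: the count of the tag bigram divided by the count of the previous tag.

P(t_i | t_{i-1}) = C(t_{i-1}, t_i) / C(t_{i-1})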

The transition probabilities are the probabilities of a tag occurring given the previous tag. For example, a modal verb like will is most likely to be followed by a verb in its base form, like dance, so that transition will have a high probability. We can calculate this probability using the equation above, implemented below:
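The embedded snippet from the original post is not reproduced here, so below is a minimal sketch of this step, assuming the tag bigrams and unigrams have already been counted into dictionaries named bigram_counts and unigram_counts (both names are assumptions, not the article's exact code):

def build_transition_probabilities(bigram_counts, unigram_counts):
    # P(tag | previous_tag) = count(previous_tag, tag) / count(previous_tag)
    transition_probabilities = {}
    for (prev_tag, tag), count in bigram_counts.items():
        transition_probabilities[(prev_tag, tag)] = count / unigram_counts[prev_tag]
    return transition_probabilities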

Here, for each bigram we create, we divide the count of the bigram by the unigram count of its first tag and store the result in a transition_probabilities dictionary.

Emission probabilities

Image by Author
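In plain notation, the equation in the image is the maximum likelihood estimate: the count of the tag paired with the word, divided by the count of the tag.

P(w_i | t_i) = C(t_i, w_i) / C(t_i)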

The emission probabilities are the probabilities that a given tag will be associated with a given word. We can calculate this probability using the equation above, implemented below:
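As above, this is a minimal sketch of the step, assuming tag-word pair counts and tag counts are stored in dictionaries named tag_word_counts and tag_counts (hypothetical names):

def build_emission_probabilities(tag_word_counts, tag_counts):
    # P(word | tag) = count(tag, word) / count(tag)
    emission_probabilities = {}
    for (tag, word), count in tag_word_counts.items():
        emission_probabilities[(tag, word)] = count / tag_counts[tag]
    return emission_probabilities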

Here we divide the count of the tag paired with that word by the count of the same tag and store the result in an emission_probabilities dictionary.

HMM taggers make two further simplifying assumptions. The first is that the probability of a word appearing depends only on its own tag and is independent of neighboring words and tags. The second, the bigram assumption, is that the probability of a tag depends only on the previous tag rather than on the entire tag sequence. Plugging in these two assumptions results in the following equation for the most probable tag sequence from our bigram tagger:

Image by Author
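In plain notation, this is the standard bigram HMM tagging equation: the estimated tag sequence is the one that maximizes the product of the emission and transition probabilities over the sentence.

estimated t_1...t_n = argmax over t_1...t_n of the product over i of P(w_i | t_i) * P(t_i | t_{i-1})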

Decoding HMM

For any model that contains hidden variables, such as an HMM where the hidden variables are the parts of speech, the task of determining the sequence of hidden variables corresponding to the sequence of observations is called decoding, and this is done using the Viterbi algorithm.

Viterbi algorithm

The Viterbi algorithm is a dynamic programming algorithm for obtaining the maximum a posteriori probability estimate of the most likely sequence of hidden states — called the Viterbi path — that results in a sequence of observed events, especially in the context of Markov information sources and hidden Markov models (HMMs). A good explanatory video can also be found HERE.

Viterbi decoding efficiently determines the most probable path out of the exponentially many possibilities. For each word, it looks up the transition and emission probabilities for every one of our tags, multiplies the probabilities along each candidate path, and keeps the maximum. We will define a default value of 0.0000000000001 for unknown probabilities.

We will start by calculating the initial probabilities, the start probability for each state. This is the probability that a tag starts the sentence; in our case we use a "START" token.

def initial_probabilities(self, tag):
    # Probability that a tag starts the sentence: the transition from the "START" token
    return self.transition_probabilities["START", tag]
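The complete Viterbi implementation is in the linked code; what follows is a minimal standalone sketch of the decoding loop, assuming the transition and emission dictionaries built above, a list of all tags, and the small default probability for unseen pairs (all function and variable names here are assumptions for illustration, not the article's exact code):

DEFAULT_PROB = 0.0000000000001

def viterbi(words, all_tags, transition_probabilities, emission_probabilities):
    # best_prob[i][tag]: probability of the best tag path for words[:i + 1] ending in tag
    # back_pointer[i][tag]: the previous tag on that best path
    best_prob = [{} for _ in words]
    back_pointer = [{} for _ in words]

    # Initialization: transition from the START token times the emission of the first word
    for tag in all_tags:
        start_p = transition_probabilities.get(("START", tag), DEFAULT_PROB)
        emit_p = emission_probabilities.get((tag, words[0]), DEFAULT_PROB)
        best_prob[0][tag] = start_p * emit_p
        back_pointer[0][tag] = None

    # Recursion: extend each path by one word, keeping only the best predecessor per tag
    for i in range(1, len(words)):
        for tag in all_tags:
            emit_p = emission_probabilities.get((tag, words[i]), DEFAULT_PROB)
            best_prev, best_score = None, 0.0
            for prev_tag in all_tags:
                score = (best_prob[i - 1][prev_tag]
                         * transition_probabilities.get((prev_tag, tag), DEFAULT_PROB)
                         * emit_p)
                if score > best_score:
                    best_prev, best_score = prev_tag, score
            best_prob[i][tag] = best_score
            back_pointer[i][tag] = best_prev

    # Termination: pick the best final tag and walk the back pointers to recover the path
    last_tag = max(best_prob[-1], key=best_prob[-1].get)
    path = [last_tag]
    for i in range(len(words) - 1, 0, -1):
        path.append(back_pointer[i][path[-1]])
    path.reverse()
    return list(zip(words, path))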

Testing

To test our solution we will use a sentence already broken into words, as below:

test_sent = ["We",
"have",
"learned",
"much",
"about",
"interstellar",
"drives",
"since",
"a",
"hundred",
"years",
"ago",
"that",
"is",
"all",
"I",
"can",
"tell",
"you",
"about",
"them",
]
cleaned_test_sent = [self.clean(w) for w in test_sent]
print(self.vertibi(cleaned_test_sent, all_tags))

Our result is:

we,PPSS
have,HV-HL
learned,VBN
much,AP-TL
about,RB
interstellar,JJ-HL
drives,NNS
since,IN
a,AT
hundred,CD
years,NNS
ago,RB
that,CS
is,BEZ-NC
all,QL
i,PPSS
can,MD
tell,VB-NC
you,PPO-NC
about,RP
them,DTS

This is correct based on the Brown Corpus tag documentation linked above.

I look forward to hearing feedback or questions.

