Wednesday, November 4, 2020

Components of NLP

Hello everyone!!

In this post I am going to explain what NLU and NLG are, how they work together with NLP, and the techniques used in NLP.



1. NLU - Natural Language Understanding is a subfield of natural language processing. Once the language has been broken down, it’s time for the program to understand it, find meaning, and even perform sentiment analysis. It mainly includes the following steps:

  • Mapping the given input in natural language into some useful representation.
  • Analyzing different aspects of language.

NLU tries to understand the meaning behind sentence formation. It is quite possible that the same text has different meanings, while different words have the same meaning. Also, most of the time the meaning behind a text changes according to the situation; sentiment detection is used in such cases.


 2. NLG - Natural Language Generation is what happens when computers write language. The NLG process turns structured data into text.
  It mainly includes the following steps:

  • Text planning - retrieving the relevant content from the knowledge base.
  • Sentence planning - choosing the required words, forming meaningful phrases and setting the tone of the sentence.
  • Text realization - mapping the sentence plan into an actual sentence structure.
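The NLG steps above can be sketched as a toy template-based generator. The weather record, its field names and the template are all made up for illustration; real NLG systems are far more sophisticated:

```python
# Minimal template-based NLG sketch: structured data in, a sentence out.
# The record fields and the template are illustrative, not a real NLG system.
record = {"city": "Pune", "temp_c": 31, "sky": "clear"}

def realize(rec):
    # Sentence planning: pick words and a tone; text realization: fill the template.
    feel = "warm" if rec["temp_c"] >= 25 else "cool"
    return f"It is a {feel} {rec['sky']} day in {rec['city']} at {rec['temp_c']} degrees Celsius."

print(realize(record))
```

Even this toy version shows the division of labour: text planning selects the data, sentence planning chooses words ("warm" vs "cool"), and text realization produces the final sentence.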

How do NLP, NLU and NLG work hand in hand?


As you can see in the figure, the input is given first; with the help of automatic speech recognition software the speech is converted into text and passed to NLU. NLU understands the text and converts it into structured data. This is passed as input to NLG, which turns the structured data back into text, writing the information in human language, and we get the output.


Techniques in NLP


1. Bag of words - It counts the number of occurrences of each word in a text by creating an occurrence matrix for the sentences.

2. Tokenization - The process of dividing text into a set of meaningful pieces called tokens; it also strips punctuation.

3. Stop words removal - Getting rid of very common words such as articles, pronouns and prepositions, e.g. "and", "the", "or" in English.

4. Stemming - The process of removing common affixes, i.e. cutting off the end or beginning of a word.

5. Lemmatization - It aims to reduce a word to its base (root) form, and groups together the different inflected forms of a word.

6. Part of speech tagging - The grammatical type of a word is referred to as its POS tag. It indicates how a word functions in meaning as well as grammatically within the sentence.

7. Named entity recognition - A word can have more than one POS, so NER helps remove such ambiguity. It includes noun phrase identification, phrase identification and entity disambiguation.
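Several of the techniques above can be sketched together in plain Python. The toy stop-word list and suffix-stripping "stemmer" below stand in for real library implementations (e.g. NLTK's PorterStemmer); they are simplifications for illustration only:

```python
# Sketch: tokenization, stop-word removal, a crude stemmer, and a bag-of-words count.
import re
from collections import Counter

STOP_WORDS = {"and", "the", "or", "is", "a", "an", "of", "in"}

def tokenize(text):
    # Split into lowercase word tokens, dropping punctuation.
    return re.findall(r"[a-z]+", text.lower())

def remove_stop_words(tokens):
    return [t for t in tokens if t not in STOP_WORDS]

def stem(token):
    # Toy stemmer: strip a few common suffixes (a real one, e.g. Porter, is subtler).
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def bag_of_words(text):
    return Counter(stem(t) for t in remove_stop_words(tokenize(text)))

print(bag_of_words("The cats played and the dog plays in the garden"))
```

Here "played" and "plays" both reduce to "play", so the bag of words counts them as one term, which is exactly why stemming helps before counting.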

Thanks for reading my article.
Stay tuned for next aspects of NLP.
By Ashwini Ghode

Tuesday, April 28, 2020

Text Summarizer and its Types!

Hello everyone!!

In this post I am going to explain how we can summarize lengthy news, comprehensive reports and study notes using a text summarizer.

In our busy schedules we don’t get time to read the full newspaper; we often collect news by hearing summaries from other people. But it is not always possible to manually summarize news or other text. There is an enormous amount of textual material, and it is only growing every single day.

      

Solution – Implementation of Text Summarizer using NLP.

This is where the awesome concept of text summarization using machine learning helps us out. It solves the one issue which kept bothering us before: now our model can understand the context of the entire text. It is a dream come true for those who can't live without reading the newspaper, and for students who study just before exams.

Let's first understand what text summarization is before looking at how it works. Here is a definition:

"The ideal of automatic summarization work is to develop techniques by which a machine can generate summaries that successfully imitate summaries generated by human beings."
                          — Page 2, Innovative Document Summarization Techniques: Revolutionizing Knowledge Understanding, 2014.

Types of summarizer -
Text summarizers can be classified based on many factors, but here are two kinds based on the output of the summarization.

Extraction based Summarization
The name gives away what this approach does: we identify which sentences are the most important and carry most of the information. Those extracted sentences form our summary. The diagram below illustrates extractive summarization:

Abstraction-based summarization
In this approach both NLU and NLG are at work. We try to generate new sentences from the original text by understanding it. The generated sentences are not actually present in the original document; they are a summarized version of the whole text. That is why it is often preferred over extraction-based summarization.


Thank you guys for reading !!
Stay tuned for my next article on how to implement this text summarizer.
If you have any doubts you can comment below.
                                                                                                         - By Ashwini Ghode

Saturday, April 25, 2020

Implementation of Text summarizer using NLP !

Hello everyone !!

In this post I will explain how I implemented a text summarizer using NLP techniques.
But before the implementation process, let's learn its core algorithm, the TextRank algorithm, on which this technique is based.

Pagerank algorithm
 Before getting started with the TextRank algorithm, there is another algorithm which we should know: the PageRank algorithm, which in fact inspired TextRank. Here are its fundamentals.





Here the probability of a user moving from one page to another is calculated and stored in a matrix. This probability is called the PageRank score; a page distributes its score equally among its outgoing links, so each link carries 1/(number of links the page contains) of it.
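The idea can be sketched as a small power-iteration PageRank over a made-up four-page link graph (the pages, links and damping value are illustrative):

```python
# Power-iteration PageRank on a tiny 4-page link graph (illustrative values).
links = {
    "A": ["B", "C"],   # page A links to B and C
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}
damping = 0.85
pages = list(links)
rank = {p: 1.0 / len(pages) for p in pages}  # start with equal scores

for _ in range(50):  # iterate until the scores stabilise
    new = {}
    for p in pages:
        # Each page q that links to p contributes rank(q) / (number of links on q).
        incoming = sum(rank[q] / len(links[q]) for q in pages if p in links[q])
        new[p] = (1 - damping) / len(pages) + damping * incoming
    rank = new

print({p: round(s, 3) for p, s in sorted(rank.items())})
```

Page C ends up with the highest score because three pages link to it, while D, which nobody links to, gets the lowest; the scores still sum to 1 because they form a probability distribution.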

From Pagerank to Textrank:

  • The first step would be to concatenate all the text contained in the articles
  • Then split the text into individual sentences
  • In the next step, we will find vector representation (word embedding) for each and every sentence
  • Similarities between sentence vectors are then calculated and stored in a matrix
  • The similarity matrix is then converted into a graph, with sentences as vertices and similarity scores as edges, for sentence rank calculation
  • Finally, a certain number of top-ranked sentences form the final summary.
Actual implementation of text summarizer:
1. Import required libraries.
2. Read the data.
3. Split Text into Sentences
 We can use the sent_tokenize() function of the NLTK library to do this.
4. Use GloVe word embeddings or Bag of Words.
GloVe word embeddings are vector representations of words. These word embeddings will be used to create vectors for our sentences. We could instead use the Bag-of-Words or TF-IDF approaches to create features for our sentences, but these methods ignore the order of the words (and the number of features is usually pretty large).
5. Text Preprocessing
 Some basic text cleaning is required: get rid of the stop words (commonly used words of a language – is, am, the, of, in, etc.) present in the sentences.
6. Vector Representation of Sentences
7. Similarity Matrix Preparation
The next step is to find similarities between the sentences, and we can use the cosine similarity approach for this challenge.
8. Applying Text Rank Algorithm.
9. Summary Extraction
Finally, your summarized report is ready.
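The whole pipeline can be sketched in plain Python. As a simplification, word-count vectors stand in for GloVe embeddings and a short power iteration replaces a library PageRank call; the sample sentences are made up:

```python
# A minimal extractive TextRank sketch (simplified: count vectors, no GloVe).
import math
import re

STOP_WORDS = {"is", "am", "the", "of", "in", "a", "an", "and", "to", "for"}

def vectorize(sentence, vocab):
    words = [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOP_WORDS]
    return [words.count(v) for v in vocab]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def summarize(sentences, top_n=1):
    vocab = sorted({w for s in sentences for w in re.findall(r"[a-z]+", s.lower())} - STOP_WORDS)
    vectors = [vectorize(s, vocab) for s in sentences]
    n = len(sentences)
    # Similarity matrix: cosine similarity between every pair of sentences.
    sim = [[cosine(vectors[i], vectors[j]) if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    # Power iteration over the similarity graph (TextRank scoring).
    scores = [1.0 / n] * n
    for _ in range(30):
        scores = [
            0.15 / n + 0.85 * sum(
                sim[j][i] / sum(sim[j]) * scores[j]
                for j in range(n) if sum(sim[j]) > 0
            )
            for i in range(n)
        ]
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:top_n])]

docs = [
    "Text summarization shortens long documents automatically.",
    "Automatic summarization of text saves reading time for documents.",
    "My cat sleeps all day.",
]
print(summarize(docs, top_n=1))
```

The off-topic cat sentence shares no content words with the others, so its node in the graph receives no similarity edges and it is ranked last, which is exactly the behaviour we want from an extractive summarizer.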

Conclusion:

As a result, we are able to summarize text automatically, with no need for manual summarization. People can keep themselves updated in a short time and manage to read important news articles despite busy schedules. Inshorts is an innovative news app that converts news articles into a 60-word summary. Most students have the habit of studying just before exams, but of course a vast syllabus cannot be covered in one night manually; a text summarizer is the best option for such students, who will be able to get meaningful study notes. Our research papers and project reports can be summarized too.

Text summarizers can be used in fields like media monitoring, newsletters, social media marketing, question answering and bots, and programming languages.

Thanks for reading !!
If you have any doubts, comment below.
                                                                                                             By Ashwini Ghode

Friday, April 10, 2020

Open Problems in NLP

The study of Natural Language Processing (NLP) started somewhere in the 1950s. Ever since, there has been a lot of advancement in this field. It has been in a cycle of regular evolution, which has made it a very powerful weapon in the world of Artificial Intelligence.

Despite so much achievement so far, NLP still has a lot of limitations and problems. Natural Language Understanding is one of the areas in which NLP lags behind the most: despite so much advancement, there is no NLP model that can excel like a human being. In this post, we are going to discuss some of these problems.

1. Ambiguity : 

One of the biggest challenges is to understand the meaning of an ambiguous statement (a statement open to different interpretations). NLP often does not handle ambiguous statements well.

For eg: "John went to the bank" – here the word 'bank' can either refer to where money is kept, or to a river bank.

2. Synonymy: 

The same idea can often be expressed in sentences in which one word could be replaced by its synonym without really changing the meaning of the text. But NLP sometimes fails to understand that in some cases synonyms can also change the meaning of the whole text.

For eg: 'large' and 'big' are synonyms, yet "He is my big brother" cannot be replaced by "He is my large brother".

3. Intention/Sarcasm :

I am honestly quite skeptical about NLP models understanding the level of sarcasm which is quite common among humans. It is one of the limitations of NLP. People might sarcastically criticize a product, and the model might interpret it in a different way.

4. Language-Resource unavailability:

Even though there are thousands of languages spoken worldwide, there are hardly any data resources available in languages other than English and Chinese.

So, because of this problem, it might be fair to say that NLP is truly powerful only for languages like English and Chinese.

Now, we have come to the end of the article. Feel free to ask any questions in the comment section below.

Thursday, April 9, 2020

Lexical Analysis & Syntactic Analysis(Parsing)

In this post, I'm going to discuss two of the most important steps which need to be implemented whenever we deal with NLP. They are Lexical Analysis and Syntactic Analysis.




Lexical Analysis:

It is basically a stage in Natural Language Processing (NLP) where the given raw text is divided/segmented into chunks of words or other units like sentences or paragraphs.


It involves identifying and analyzing the structure of words. Lexicon of a language basically means the collection of words and phrases in a language. It is also known as Morphological Analysis.



For eg:
" Tom owns an iPhone" could be tokenized into :
Tom
owns
an 
iPhone.
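A minimal tokenizer sketch for this example (real systems would typically use something like NLTK's word_tokenize instead of a bare regex):

```python
# Toy tokenizer: keep runs of word characters, drop punctuation and whitespace.
import re

def tokenize(text):
    return re.findall(r"\w+", text)

print(tokenize("Tom owns an iPhone"))  # ['Tom', 'owns', 'an', 'iPhone']
```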

Syntactic Analysis(Parsing):

In simple words, this step acts as a grammar checker. It analyses the words in the given sentence for grammar, and checks whether the arrangement of the words is in an order that satisfies the relationships among them.

What it basically does is it checks the basic grammar of the sentence.

For eg:
"The College goes to boy"
This given sentence is Syntactically wrong.
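As a toy illustration, assume a hypothetical ten-word lexicon and a single hand-written rule (a common noun must directly follow a determiner); even this tiny sketch can flag the example above, though real parsers use full grammars:

```python
# Toy syntactic check with a hypothetical mini-lexicon (illustrative only).
POS = {
    "the": "DET", "a": "DET", "an": "DET",
    "college": "NOUN", "boy": "NOUN",
    "goes": "VERB", "owns": "VERB",
    "to": "PREP",
    "tom": "PROPN", "iphone": "PROPN",
}

def check(sentence):
    tags = [POS.get(w, "UNK") for w in sentence.lower().split()]
    for i, tag in enumerate(tags):
        # Rule: every common noun must directly follow a determiner.
        if tag == "NOUN" and (i == 0 or tags[i - 1] != "DET"):
            return False
    return True

print(check("The boy goes to the college"))
print(check("The College goes to boy"))
```

"The College goes to boy" fails because "boy" has no determiner before it, while proper nouns like "Tom" and "iPhone" are exempt from the rule.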




Discourse integration and pragmatic analysis

In this blog,I am going to explain about discourse integration and pragmatic analysis.
Discourse integration
Discourse integration considers the larger context surrounding any smaller part of an NL structure.

NL is so complex that, most of the time, sequences of text depend on the prior discourse.

This concept occurs often in pragmatic ambiguity. This analysis deals with how the immediately preceding sentence can affect the meaning and interpretation of the next sentence.

 The concept of discourse integration is often used in NLG (Natural Language Generation) applications such as chatbots, which are developed to deliver generalized AI. In this kind of application, deep learning is used.



Pragmatic Analysis
Pragmatics is the study of how words are used, or the study of signs and symbols.

An example of pragmatics is how the same word can have different meanings in different settings.

Pragmatic Analysis is part of the process of extracting information from text.

Specifically, it's the portion that focuses on taking a structured set of text and figuring out the actual meaning.

Pragmatic analysis refers to a set of linguistic and logical tools with which analysts develop systematic accounts of discursive political interactions.




Sentiment Analysis


Sentiment analysis is the interpretation and classification of emotions (positive, negative and neutral) within text data using text analysis techniques. It allows businesses to identify customer sentiment toward products, brands or services in online conversations and feedback.
With the recent advances in deep learning, the ability of algorithms to analyse text has improved considerably, and creative use of advanced computing techniques can be a good tool for in-depth research. We believe it is important to classify incoming customer conversations about a brand based on the following:
1. Key aspects of a brand’s product and service that customers care about.
2. Users’ underlying intentions and reactions concerning those aspects.
Sentiment analysis is also the most common text classification tool for analysing incoming messages, social posts, forum comments, etc.; closely related tasks are known as intent analysis and profanity analysis.

What does it do ?

A sentiment analysis model detects the polarity within a text (positive or negative). Understanding people's emotions is incredibly important for any business, since users can express themselves in reviews more freely than ever.
For example: the owner of a business used sentiment analysis on the reviews given by customers and found that the majority of the customers were happy with his product, as you can see in the image below.


Now look at the figure below; as you can see, it can also tell the emotion attached to comments by analyzing the sentences.


Types of Sentiment Analysis

If the polarity is very important to the owner, the emotions can be graded like:
1. Very Good
2. Good
3. Neutral
4. Bad
5. Very Bad.
Here, Very Good = 10 points and Very Bad = 1 point.
This same kind of scale is used in one of our college feedback forms, which is pretty good, as it directly tells the guardians and faculty members whether we are happy or not...

How does this work?

So you might be wondering by now: how does this work?
The process is pretty simple:
1. Break each text document down into its component parts (sentences, phrases, tokens and parts of speech) (Bag of Words)
2. Identify each sentiment-bearing phrase and component (Dictionary meaning and in context meaning) (Lemmatisation)
3. Assign a sentiment score to every phrase and component (-1 to +1)(Can be of any range as an example above is 0 to 10)
4. Optional: Combine scores for multi-layered sentiment analysis (for systems which have multiple outputs like "Very Good" and "Good").

Based on these points, train the model. There are 3 types of systems:
1. Rule-based systems that perform sentiment analysis based on a set of manually crafted rules.
2. Automatic systems that rely on machine learning techniques to learn from data.
3. Hybrid systems that combine both rule-based and automatic approaches.
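The rule-based flavour can be sketched with a tiny made-up lexicon; the words and weights below are invented for illustration, and real systems use trained models or curated lexicons such as VADER:

```python
# A tiny rule-based sentiment scorer, following the steps described above.
LEXICON = {"good": 1.0, "great": 1.5, "happy": 1.0, "bad": -1.0, "terrible": -1.5}
NEGATIONS = {"not", "never", "no"}

def sentiment(text):
    tokens = text.lower().replace(".", "").split()
    score, flip = 0.0, 1
    for tok in tokens:
        if tok in NEGATIONS:
            flip = -1           # negation flips the next sentiment word
        elif tok in LEXICON:
            score += flip * LEXICON[tok]
            flip = 1
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("The product is great"))
print(sentiment("The service was not good"))
```

Note how the negation rule turns "not good" into a negative signal; hand-crafting rules like this is exactly why pure rule-based systems are brittle and why automatic and hybrid systems exist.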

Where do we use it?

1. We can use it to analyse customer reviews in any business.
2. Most major social sites like Facebook and Twitter use it to analyse user behaviour on the site.


There are many datasets available online to which you can add some of your own words and train a model. There are also ready-made pre-trained open-source models on GitHub which you can use for your project.


Thank you for reading this.
If you have any doubts, please feel free to ask in the comments below. We will get back to you asap.
By Kapil Kadadas





