Saturday, April 25, 2020

Implementation of a Text Summarizer using NLP!

Hello everyone !!

In this post I will explain how I implemented a text summarizer using NLP techniques.
But before going through the implementation, let's learn the core algorithm this technique is based on, i.e. the TextRank algorithm.

PageRank algorithm
Before getting started with the TextRank algorithm, there is another algorithm we should know: the PageRank algorithm. In fact, it is what inspired TextRank. Here are its fundamentals.





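Here the probability of a user going from one page to another is calculated and stored in a matrix. Each probability is initialized as 1/(number of links on the page the user is leaving), so a page with 4 links sends the user to each linked page with probability 1/4. The PageRank score of every page is then computed from this matrix by updating the scores iteratively.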
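A minimal sketch of this idea, assuming a tiny made-up web of four pages. The function name `pagerank`, the page names and the damping factor of 0.85 (a conventional choice) are illustrative assumptions, not details from the original PageRank description above.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Score pages by repeatedly passing rank along links.

    `links` maps each page to the list of pages it links to.
    Every linked page is assumed to also appear as a key.
    """
    pages = sorted(links)
    n = len(pages)
    # Start with the same score for every page.
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Small base score that every page receives.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = set(links[page])  # unique outgoing links
            if targets:
                # Each linked page gets 1/(number of links) of this page's score.
                share = rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += damping * share
            else:
                # A page without links spreads its score over all pages.
                for other in pages:
                    new_rank[other] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link structure: w3 has no outgoing links.
web = {"w1": ["w2", "w4"], "w2": ["w1", "w3"], "w3": [], "w4": ["w1"]}
scores = pagerank(web)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

The scores stay a probability distribution (they add up to 1), and the page that receives the most link weight ends up on top.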

From PageRank to TextRank:

  • The first step would be to concatenate all the text contained in the articles
  • Then split the text into individual sentences
  • In the next step, we will find a vector representation (built from word embeddings) for each sentence
  • Similarities between sentence vectors are then calculated and stored in a matrix
  • The similarity matrix is then converted into a graph, with sentences as vertices and similarity scores as edge weights, for sentence rank calculation
  • Finally, a certain number of top-ranked sentences form the final summary.
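
One common way to write the ranking step on this sentence graph is the weighted PageRank update below. The symbols are assumptions chosen for illustration: S_i is the score of sentence i, w_ji is the similarity between sentences j and i, N is the number of sentences, and d is a damping factor (0.85 is a conventional choice).

```latex
S_i = \frac{1 - d}{N} + d \sum_{j \neq i} \frac{w_{ji}}{\sum_{k \neq j} w_{jk}} \, S_j
```

The update is repeated until the scores stop changing. A sentence ranks high when it is similar to other high-ranking sentences.
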
Actual implementation of the text summarizer:
1. Import required libraries.
2. Read the data.
3. Split Text into Sentences
We can use the sent_tokenize() function of the NLTK library to do this.
4. Use GloVe word embeddings or Bag of Words.
GloVe word embeddings are vector representations of words. These word embeddings will be used to create vectors for our sentences. We could instead use the Bag-of-Words or TF-IDF approaches to create features for our sentences, but these methods ignore the order of the words (and the number of features is usually pretty large).
5. Text Preprocessing
Some basic text cleaning is required. Get rid of the stop words (commonly used words of a language such as is, am, the, of, in, etc.) present in the sentences.
6. Vector Representation of Sentences
7. Similarity Matrix Preparation
The next step is to find similarities between the sentences, and we can use the cosine similarity approach for this.
8. Applying the TextRank Algorithm.
9. Summary Extraction
Finally, your summary is ready.
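
The steps above can be sketched end to end as follows. This is only a rough illustration that runs with the Python standard library, so it swaps in simpler pieces than the ones named in the steps: a naive regex splitter instead of NLTK's sent_tokenize(), word-count (Bag-of-Words) vectors instead of GloVe embeddings, a tiny hand-picked stop word list, and a hand-written weighted PageRank loop. All function names and the sample text are hypothetical.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop word list (an assumption, not a complete list).
STOP_WORDS = {"a", "an", "the", "is", "am", "are", "of", "in", "on",
              "by", "from", "to", "and", "my", "all"}


def split_sentences(text):
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def clean(sentence):
    # Lowercase, keep alphabetic words only, drop stop words.
    words = re.findall(r"[a-z]+", sentence.lower())
    return [w for w in words if w not in STOP_WORDS]


def cosine(a, b):
    # Cosine similarity between two word-count vectors.
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def rank_sentences(sim, damping=0.85, iterations=50):
    # Weighted PageRank over the similarity matrix.
    n = len(sim)
    scores = [1.0 / n] * n
    row_sums = [sum(row) for row in sim]
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            incoming = sum(sim[j][i] / row_sums[j] * scores[j]
                           for j in range(n) if row_sums[j] > 0)
            new_scores.append((1.0 - damping) / n + damping * incoming)
        scores = new_scores
    return scores


def summarize(text, top_n=2):
    sentences = split_sentences(text)
    if not sentences:
        return []
    vectors = [Counter(clean(s)) for s in sentences]
    n = len(sentences)
    # Similarity matrix; a sentence is not compared with itself.
    sim = [[cosine(vectors[i], vectors[j]) if i != j else 0.0
            for j in range(n)] for i in range(n)]
    scores = rank_sentences(sim)
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:top_n]
    # Return the chosen sentences in their original order.
    return [sentences[i] for i in sorted(top)]


sample_text = ("Text summarization shortens long articles. "
               "TextRank ranks sentences in articles by similarity. "
               "My cat sleeps all day. "
               "Ranked sentences from articles form the summary.")
summary = summarize(sample_text, top_n=2)
print(" ".join(summary))
```

In the sample text the sentence about the cat shares no words with the others, so it gets the lowest score and is left out of the summary. With GloVe vectors, sentences could be similar even without sharing exact words, which plain word counts cannot capture.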

Conclusion:

As a result, we are able to summarize text automatically, with no manual work needed. People who are busy can keep themselves updated in a short time, and we can manage to read important news articles in our busy schedule. Inshorts is an innovative news app that converts news articles into a 60-word summary. Most students have a habit of studying just before exams. But of course a vast syllabus cannot be covered manually in one night. A text summarizer is the best option for those students, who will be able to get meaningful study notes. Our research papers and project reports can be summarized as well.

A text summarizer can be used in fields like media monitoring, newsletters, social media marketing, question answering and bots, and programming languages.

Thanks for reading !!
If you have any doubts, leave a comment below.
By Ashwini Ghode
