Thursday, April 9, 2020

Topic Modelling

Hello guys, here we are back again with a new topic known as Topic Modelling.

Topic Modelling

Basically, topic modelling is an unsupervised machine learning technique that scans a collection of documents, detects word and phrase patterns within them, and automatically groups the words and expressions that best characterize that collection of documents.
Topic models are also called probabilistic topic models, which refers to statistical algorithms for discovering the latent semantic structure of a large body of text. In the age of information, the amount of text we encounter every day is simply beyond our processing capacity. Topic models can help us organize large collections of unstructured text and offer insights into them.


How does it work?

Topic modelling involves counting words and grouping similar word patterns to infer topics within unstructured data. Let's say you are a software company and you would like to know what customers are saying about particular features of your product. Rather than spending hours going through piles of feedback, trying to work out which texts talk about your topics of interest, you can analyse them with a topic modelling algorithm.
By detecting patterns in these words, such as how frequently they occur and which ones appear together, the topic model clusters feedback that is similar, along with the words and phrases that appear the most. Also, because this is an unsupervised technique, no training data is required.
For example:
"The nice thing about Eventbrite is that it's free to use as long as you're not charging for the event. There is a fee if you're charging for the event – 2.5% plus a $0.99 transaction fee."
By taking the above sentence and identifying words and phrases like "free to use", "charging", "fees" and "$0.99", the topic model can group this review with many other reviews which may or may not talk about pricing.
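
To make this a bit more concrete, here is a minimal sketch of the idea in Python using scikit-learn (which is not one of the toolkits listed later in this post; the reviews below are made up for illustration, loosely based on the example above, and the parameter values are placeholders). It simply counts the words in a few short reviews and lets LDA group them into topics, with no labels or training data:

# Minimal sketch: count words, then let an LDA topic model group the reviews.
# scikit-learn and the toy reviews are assumptions for illustration only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

reviews = [
    "Eventbrite is free to use as long as you are not charging for the event",
    "There is a fee of 2.5% plus a $0.99 transaction fee for paid events",
    "The ticket scanning app made check-in at the venue very fast",
    "Check-in with the mobile app was smooth and the venue staff loved it",
]

# Step 1: count words to build a document-term matrix (no labels needed).
vectorizer = CountVectorizer(stop_words="english")
doc_term_matrix = vectorizer.fit_transform(reviews)

# Step 2: fit an unsupervised LDA topic model on the word counts.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(doc_term_matrix)

# Step 3: print the top words for each discovered topic.
words = vectorizer.get_feature_names_out()
for topic_idx, weights in enumerate(lda.components_):
    top_words = [words[i] for i in weights.argsort()[::-1][:5]]
    print(f"Topic {topic_idx}: {', '.join(top_words)}")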


Where is Topic Modelling used?

It is mostly used in NLP, for example:
1. To sort and filter reviews about a product according to user preferences (see the sketch after this list).
2. To highlight the important points in a document.
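
As a small illustration of the first use case, the sketch below (again with scikit-learn and made-up reviews, both assumptions for illustration) takes the topic proportions that a fitted model assigns to each review and uses the dominant topic to group or filter the feedback:

# Sketch: group reviews by their dominant topic so that, for example, only
# one cluster of feedback is shown to a user. Reviews and settings are toy
# placeholders, not a real pipeline.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

reviews = [
    "free to use unless you charge for the event",
    "the transaction fee is 2.5% plus $0.99 per ticket",
    "check-in with the mobile app was quick",
    "venue staff scanned tickets with the app in seconds",
]

counts = CountVectorizer(stop_words="english").fit_transform(reviews)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)

# transform() gives each review's topic proportions; argmax picks the
# dominant topic, which can then be used to sort or filter the feedback.
dominant_topic = lda.transform(counts).argmax(axis=1)
for topic in range(2):
    grouped = [r for r, t in zip(reviews, dominant_topic) if t == topic]
    print(f"Topic {topic}:", grouped)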

Algorithms:

In practice, researchers attempt to fit appropriate model parameters to the data corpus using one of several heuristics for a maximum-likelihood fit. A survey by Blei describes this suite of algorithms. Several groups of researchers, starting with Papadimitriou et al., have attempted to design algorithms with provable guarantees.
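
As a rough illustration of what "fitting model parameters to the corpus" looks like in practice, here is a small sketch using gensim (which, like scikit-learn above, is not one of the toolkits listed below; the tiny tokenised documents and settings are placeholders). gensim's LdaModel approximates the maximum-likelihood fit with an online variational Bayes heuristic:

# Sketch: fit LDA parameters to a tiny corpus with gensim.
# The documents and settings below are made-up placeholders.
from gensim import corpora, models

documents = [
    ["free", "event", "ticket", "fee"],
    ["transaction", "fee", "charge", "event"],
    ["app", "check", "venue", "staff"],
]

# Map each word to an integer id and build bag-of-words vectors.
dictionary = corpora.Dictionary(documents)
corpus = [dictionary.doc2bow(doc) for doc in documents]

# Fit a 2-topic model; 'passes' controls how many sweeps over the corpus
# the variational updates make.
lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary, passes=10)

# Inspect the learned parameters: top words per topic and the topic
# mixture inferred for the first document.
print(lda.print_topics(num_words=4))
print(lda.get_document_topics(corpus[0]))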

Libraries which we can use in our code:

There are many libraries and software packages which the user can use; here I will mention a few of them.
1. BigARTM. Site: https://github.com/bigartm/bigartm
2. Mallet. Site: http://mallet.cs.umass.edu/
3. Stanford Topic Modelling Toolkit. Site: http://nlp.stanford.edu/software/tmt/tmt-0.4/
etc.

Thank you for reading this.
If you have any doubts, please feel free to ask in the comments below. We will get back to you ASAP.
By Kapil Kadadas

