
Capstone  

Automatic Text Summarization

(A Generic Text Summarizer)


Problem Statement

Do you have too much CONTENT to consume but too LITTLE TIME?

Why not just get a SUMMARY of the important points from the content?


Modeling

Three models based on different techniques were created.

1. Graph-based summarization (TextRank with sentence embeddings)

2. Centroid-based summarization (TF-IDF)

3. Pre-trained BERT summarization
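
As an illustration of the idea behind model 2, here is a minimal sketch of centroid-based extractive summarization. It is not the prototype's actual code: the function names (split_sentences, tfidf_vectors, summarize), the naive sentence splitter, and the smoothed IDF formula are all assumptions made for this example, and it uses only the Python standard library so that it runs without extra dependencies. Each sentence becomes a TF-IDF vector, the vectors are summed into a centroid, and the sentences closest to the centroid are kept in their original order.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase the sentence and keep alphanumeric tokens only."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def tfidf_vectors(token_lists):
    """Return one sparse TF-IDF vector (a dict) per sentence."""
    n = len(token_lists)
    # Document frequency: in how many sentences does each term appear?
    df = Counter(term for tokens in token_lists for term in set(tokens))
    # Smoothed IDF, chosen for this sketch so that no weight is zero.
    idf = {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}
    return [
        {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        for tokens in token_lists
    ]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def summarize(text, num_sentences=2):
    """Pick the sentences most similar to the document centroid."""
    sentences = split_sentences(text)
    vectors = tfidf_vectors([tokenize(s) for s in sentences])

    # The centroid is the sum of all sentence vectors; dividing by the
    # sentence count is unnecessary because cosine ignores vector length.
    centroid = {}
    for vec in vectors:
        for term, weight in vec.items():
            centroid[term] = centroid.get(term, 0.0) + weight

    # Rank sentence indices by similarity to the centroid, best first.
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: cosine(vectors[i], centroid),
        reverse=True,
    )
    # Keep the top sentences, restored to their original order.
    return [sentences[i] for i in sorted(ranked[:num_sentences])]
```

Calling summarize(article_text, num_sentences=3) returns a list of three sentences taken verbatim from the input. A real implementation would likely add stop-word removal and a more robust sentence tokenizer.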

Evaluation

The generated summaries were scored using:

1. The BBC News dataset

2. User scoring through polls in a Slack channel

Model 2 (Centroid-based summarization, TF-IDF) performed the best in terms of both summary quality and speed.
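
The page does not say which metric was used to score summaries against the BBC News dataset. Purely as an assumption, the sketch below shows one common way to compare a generated summary with a reference summary: a ROUGE-1 style unigram overlap. The function names (unigrams, rouge1) are hypothetical and the tokenization is deliberately simple.

```python
import re
from collections import Counter


def unigrams(text):
    """Count lowercase alphanumeric tokens in the text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def rouge1(candidate, reference):
    """ROUGE-1 style precision, recall and F1 (assumed metric, not the author's)."""
    cand, ref = unigrams(candidate), unigrams(reference)
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


scores = rouge1("the cat sat", "the cat sat on the mat")
print(scores["precision"], scores["recall"])  # 1.0 0.5
```

Averaging such scores over every article in a dataset gives one number per model, which could then be set alongside the poll results.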

Deployment

The prototype was built in Python using scikit-learn, natural language processing techniques, and Flask, and was deployed on Amazon EC2 and a Google Kubernetes cluster.

Note: Model 3 (Pre-trained BERT summarization) was excluded from the deployment due to resource limitations on the EC2 free tier.


Sample

(Screenshots: getasummary_sample1.png, getasummary_sample2.png)

Something fun: I have also created a Telegram summarizer bot! See this video and try it out yourself!

There is also a Slack slash command, /summarizer!


© 2019 by Guat Hwa Proudly created with Wix.com
