Introduction to Fuzzy Logic and its Application to Text Summarization

--

Authors: Anukarsh Singh and Divyanshu Daiya.

In the last blog post, we talked about the basics of text summarization. So if you find yourself unsure about what text summarization is, you might want to head back and give that a read. :)

In this blog post, I will try to shed some light on a particular method of extracting a summary from text: using Type-1 fuzzy logic for text summarization. Before we dive into the workings and implementation of this method, let us first understand the basics.

General Workflow of a Fuzzy Logic Model (Pic Courtesy: Tutorialspoint)

I’m sure most of us know how computers work under the hood. They all use binary logic (yes, that 0 and 1 thing!) for various types of tasks and computations. Contrary to that, we humans do not use such logic for making decisions in our day-to-day lives. Let me try to elaborate on what I am trying to convey using an analogy.

Imagine you have a cup of coffee on the table in front of you. The coffee might be hot or cold for your liking. But we are not limited to just these two ‘levels’ when describing its hotness or coldness. We might very well say that the coffee is “too hot”, “too cold”, “mildly hot”, “mildly cold”, or “just perfect”. So, there you have it! We humans are not limited to just two levels of logic for understanding the world around us. There are many! So, what do we call this “multi-level” logic? It is aptly named “Fuzzy Logic”.

This cartoon perfectly describes the fuzzy nature of our day-to-day decisions regarding the hotness/coldness of tea! (Courtesy: me.me)


What exactly is Fuzzy Logic?

According to Wikipedia:

Fuzzy logic is a form of many-valued logic in which the truth values of variables may be any real number between 0 and 1. It is employed to handle the concept of partial truth, where the truth value may range between completely true and completely false.

In the above definition, many-valued logic is exactly what the different levels of hotness or coldness of a coffee convey. And instead of using “words” as in our analogy, we use real numbers between 0 and 1 to describe and differentiate the different levels. Now, before you blame me for not telling you what “partial truth” means in the above definition, just wait for the right moment. Trust me, I will come back to it!

Why is it called “Fuzzy Logic”?

As I said before, “fuzzy logic” is no misnomer: the name aptly describes the fuzzy nature of this kind of logic. Sounds confusing? Let me explain.

So far, we have seen that fuzzy logic is a multi-valued logic. But we should be aware that there is no clear distinction, i.e. boundary, between the different levels of fuzzy logic. For instance, everyone has their own perception of “hot” and “very hot” when describing the hotness of a cup of coffee. A “hot” coffee for one person might be “very hot” for another, and vice versa. This is where the “fuzziness” of fuzzy logic comes into play!

So, one might ask: if we cannot clearly demarcate the different levels of fuzzy logic, how are we supposed to represent it and draw conclusions from it? This is where the membership function of fuzzy logic comes into the picture.

But before we move on to explain what a membership function is, let us understand what a “fuzzy set” is all about.

Fuzzy Set vs Classical Set

We are all very familiar with classical sets (or just ‘sets’), aren’t we? For a classical set, an element is either in the set or it isn’t. For example, consider the set {1, 2, 0}. We can confidently say that the element ‘1’ exists within the set. What about the element ‘10’? We can just as clearly say that it does not exist in the set. So, as you can see, there is a pretty clear principle behind the membership of an element in a classical set. Simply put, if an element is part of a classical set, then the “membership” of that element is said to be 1. For an element not in the classical set, the “membership” is said to be zero.

The “membership” of an element, in formal terms, can be represented by a function, called (unsurprisingly) the membership function. In other words, we can say that the membership function of a classical set can take only two values, 1 and 0.
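To make this concrete, crisp membership can be sketched in a couple of lines of Python (the set and the probed elements are just the illustrative ones from above):

```python
# A classical (crisp) set: an element is either in it or not.
classical_set = {1, 2, 0}

def crisp_membership(element, s):
    """Membership function of a classical set: returns 1 or 0, nothing in between."""
    return 1 if element in s else 0

print(crisp_membership(1, classical_set))   # 1: '1' is in the set
print(crisp_membership(10, classical_set))  # 0: '10' is not in the set
```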

What about a fuzzy set? Well, things aren’t that straightforward there. (Is fuzzy logic doomed, then? Just hear me out before you conclude that!)

A fuzzy set can be described as a set whose elements do not have the simple property of being either in the set or out of it. An element can also be partially in the set! (This is what the “partial truth” in the Wikipedia definition meant. See, I told you I would come back to it.) So, the membership function in the case of fuzzy logic is not limited to the two values 1 and 0.

Diagram juxtaposing traditional (classical) logic and fuzzy logic. (Pic courtesy: ResearchGate)

Membership Function of Fuzzy Logic

For fuzzy logic, the membership function is continuous between 0 and 1 (i.e. it can take any real value in that range). The former denotes that the element is not part of the fuzzy set, whereas the latter denotes that the element completely belongs to the fuzzy set. Any value in between denotes that the element is partially in the set.

The membership function, in the case of fuzzy logic, represents the degree of truth.

Diagram depicting membership function for temperature (Pic Courtesy: cds.caltech.edu)
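As a sketch of such a continuous membership function, here is a Gaussian-shaped one for a hypothetical fuzzy set “hot coffee” (the 80 °C centre and the spread of 10 are invented for illustration; Gaussian membership functions also reappear later in our model):

```python
import math

def gaussian_membership(x, mean, sigma):
    """Degree of membership in a Gaussian-shaped fuzzy set: a real number in [0, 1]."""
    return math.exp(-((x - mean) ** 2) / (2 * sigma ** 2))

# Degree to which each temperature belongs to the fuzzy set "hot coffee".
for temp in (80, 70, 40):
    print(temp, round(gaussian_membership(temp, mean=80, sigma=10), 3))
```

At 80 °C the coffee is fully “hot” (membership 1), at 70 °C it is only partially “hot”, and at 40 °C its membership is close to 0.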

Types of Fuzzy Logic

So far, we have discussed fuzzy logic and its membership function in detail. Now, let us discuss the types of fuzzy logic. There are basically two types:-

  1. Type-1 Fuzzy Logic(T1 FL)
  2. Type-2 Fuzzy Logic(T2 FL)

Since in this article we are only concerned with using T1 FL, we will skip over the details of T2 FL. Note that the membership function we have been talking about till now was for T1 FL.

Extractive Text Summarization Using Fuzzy Logic

After covering the prerequisites, let us now discuss an important application of Fuzzy Logic: Text Summarization.

As discussed in my previous article, automatic summarization is the process of shortening a text document with software in order to create a summary containing the major points of the original document. One way of achieving this is by using Type-1 fuzzy logic. Let’s begin!

Dataset Used

We used the ever-so-popular DUC-2002 dataset for our summarizer. We chose it over the later versions of DUC because most summarizers that use fuzzy logic have been benchmarked on this dataset, making it appropriate for comparing the performance of our summarizer with other implementations. We used 125 test documents to test our implementation.

Preprocessing Steps

The data needs to go through some preprocessing steps in order to extract the important features. (I have discussed these steps in my previous article.) The following preprocessing steps were used:-

  1. Sentence Segmentation
  2. Tokenization
  3. Removing stop words
  4. Word Stemming
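A simplified, self-contained stand-in for these four steps might look like the following (in practice we used proper NLP tooling; the regex-based segmentation, the tiny stop-word list, and the crude suffix-stripping “stemmer” here are purely illustrative):

```python
import re

# A tiny illustrative stop-word list; a real pipeline would use a full one.
STOP_WORDS = {"the", "is", "a", "an", "of", "in", "to", "and", "are"}

def preprocess(document):
    # 1. Sentence segmentation (naive split on sentence-ending punctuation).
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    processed = []
    for sent in sentences:
        # 2. Tokenization (lowercase alphabetic tokens only).
        tokens = re.findall(r"[a-z']+", sent.lower())
        # 3. Stop-word removal.
        tokens = [t for t in tokens if t not in STOP_WORDS]
        # 4. Crude suffix stripping, standing in for a real stemmer.
        tokens = [re.sub(r"(ing|ed|s)$", "", t) for t in tokens]
        processed.append(tokens)
    return processed

print(preprocess("The cats are running. A dog barked in the park."))
```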

Features

The features extracted from the document were used as inputs to our Type-1 Fuzzy System. They were:-

  1. TF/IDF scores: We establish the relevance of a given word in the document by comparing its frequency in the document with its frequency in the whole corpus. Words unique to the document receive higher scores. Using the scores of the words in a sentence, we generate a sentence score (by averaging the scores of all words in the sentence) and use that as a feature.
  2. Pivot Distance: Using GloVe, we vectorize a sentence by taking the average of all GloVe vectors corresponding to the words in that sentence. We then take the mean of all sentence vectors as the pivot for the document, and use the distance of each sentence vector from the pivot as the pivot distance for that sentence, which serves as a feature.
  3. Sentence Localization: We assign scores to sentences based on their location in the document. Sentences at the beginning or towards the end of the document tend to carry more useful information.
  4. Sentence Length: The longer a sentence, the higher its score.
  5. Nouns: The more nouns a sentence contains, the higher its score.
  6. Cosine Similarity: We calculate the cosine similarity of every sentence pair, and then define the score of a sentence as the sum of its similarities over all pairs containing it, normalized by the maximum such sum over all sentences: Score(s) = sum(all pairs including s) / max over all sentences s' of sum(all pairs including s').
  7. Pivot Distance_1: Using the word scores generated with TF/IDF, we build a sentence vector for each sentence from the scores of its words. We then proceed in the same way as for Pivot Distance.
  8. Numbers: Sentences containing numerical data are assigned higher score.
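To illustrate feature 1, here is a rough sketch of TF/IDF-based sentence scoring (the smoothing in the IDF term and the toy corpus are my own choices for the example, not necessarily what our implementation used):

```python
import math
from collections import Counter

def tfidf_sentence_scores(doc_sentences, corpus):
    """Score each sentence as the mean TF-IDF of its words.

    doc_sentences: list of token lists (the sentences of one document).
    corpus: list of documents, each a flat list of tokens.
    """
    n_docs = len(corpus)
    doc_tokens = [tok for sent in doc_sentences for tok in sent]
    tf = Counter(doc_tokens)

    def tfidf(word):
        df = sum(1 for doc in corpus if word in doc)  # document frequency
        idf = math.log(n_docs / (1 + df))             # smoothed inverse document frequency
        return (tf[word] / len(doc_tokens)) * idf

    return [sum(tfidf(w) for w in sent) / len(sent) if sent else 0.0
            for sent in doc_sentences]

# Words unique to the document score higher than words common across the corpus.
corpus = [["apple", "banana"], ["banana", "cherry"], ["apple", "quince"]]
scores = tfidf_sentence_scores([["apple"], ["quince"]], corpus)
print(scores)
```

Here “apple” appears in two of the three corpus documents and so contributes nothing, while the document-specific “quince” pushes its sentence’s score up.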

We experimented with the following parameters:-

  1. Choosing a set of 6 relevant features for every experiment.
  2. Thresholds of the membership functions (for the features, we mostly used Gaussian membership functions).
  3. Hand-crafting rules to better capture relations among the features.
  4. Using a triangular membership function for the output.
  5. Generating the final sentence score (in [0, 1]) using centroid defuzzification.
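Points 4 and 5 can be sketched as follows: the fired rules clip triangular output sets at their strengths, the clipped sets are aggregated, and the crisp score is the centroid of the aggregated curve. The rule strengths (0.3 and 0.8) and the shapes of the two output sets below are invented purely for illustration:

```python
def triangular_mf(x, a, b, c):
    """Triangular membership function: rises from a to a peak at b, falls to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def centroid_defuzzify(xs, mus):
    """Centroid (centre of gravity) of the aggregated output membership curve."""
    return sum(x * m for x, m in zip(xs, mus)) / sum(mus)

# Output universe: sentence score in [0, 1], discretized.
xs = [i / 1000 for i in range(1001)]
# Suppose two fuzzy rules fired, clipping their output sets at their strengths.
low = [min(triangular_mf(x, 0.0, 0.25, 0.5), 0.3) for x in xs]   # "low importance", strength 0.3
high = [min(triangular_mf(x, 0.5, 0.75, 1.0), 0.8) for x in xs]  # "high importance", strength 0.8
aggregated = [max(l, h) for l, h in zip(low, high)]

print(round(centroid_defuzzify(xs, aggregated), 3))  # a single crisp score in [0, 1]
```

Because the “high importance” rule fired more strongly, the centroid lands closer to the upper end of the score range.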

Results Using Some Standard Similarity Measures

Before discussing the performance of our model, let’s get a basic idea of the performance of some common similarity measures. The following table states the ROUGE-2 F-scores for the same.

ROUGE-2 scores of existing state-of-the-art summarization models

Performance of our model

Our model did relatively well compared to all of the above-mentioned similarity measures. The following table states the ROUGE-2 F-scores of our model for different combinations of feature vectors. We have numbered the features in the same order in which they appear in the “Features” subsection.

Table depicting the performance of our model for different combinations of feature vectors. For example, 1 denotes the 1st feature in the “Features” subsection of this article, which is TF/IDF.

As can be inferred from the above table, our model does worse only in comparison to TextRank and Facebook’s InferSent models (which are state-of-the-art models for a reason!).

Some Shortcomings of Our Model

Although our model does well when compared to the common similarity measures, the following are some of its shortcomings:-

  1. Our model compares well against most Type-1 fuzzy models, but it lags by quite a margin behind other, non-fuzzy state-of-the-art summarization models.
  2. The features used in our model mostly capture statistical information. We miss the semantic view, which accounts for part of the gap in performance.
  3. Lately, many researchers have followed the practice of dividing the test set into training and cross-validation sets to derive better rules; see, for example, the derivative used by Jefferson (Fuzzy Approach for Sentiment Analysis). Building on hand-wired rules alone has a detrimental effect on performance.

Possible Improvements and extensions

Keeping the shortcomings in mind, the following are some ways that may improve the performance of our model:-

  1. Using Type-2 fuzzy logic to better capture the uncertainties and ambiguities of text.
  2. A consensus-based approach may give better performance than a standalone fuzzy logic model.
  3. Hand-wired rules can be further fine-tuned by training and cross-validation. We might talk about the how-to’s for the same in a future post.
  4. More semantic features can be included to better incorporate sense from both views.

Further Reading

If you are interested in further improving and broadening your understanding of the topics covered in this article, you can go through the following:-

  • Fuzzy Logic Concepts
  1. Wikipedia!
  2. Flirtation, a very fuzzy Prospect: Flirtation Advisor
  3. Uncertain Rule-Based Fuzzy Systems
  4. Fuzzy Logic and its uses
  5. Uncertainty in Fuzzy Logic Systems
  6. Introduction To Type-2 Fuzzy Logic Control: Theory and Applications
  7. Membership Function

  • Preprocessing Methods
  1. Using NLTK for Preprocessing
  2. Preprocessing Text-KDnuggets blog

  • Research Articles
  1. Optimizing Text Summarization Based on Fuzzy Logic: Farshad Kyoomarsi et al.
  2. Is there a need of fuzzy logic?: Lotfi A. Zadeh
  3. Fuzzy Logic Based Method for Improving Text Summarization: Suanmali et al.
  4. Fuzzy logic for linguistic summarization of databases: J. Kacprzyk
  5. An overview of methods for linguistic summarization with fuzzy sets: Fatih Emre Boran et al.
