Introduction to Fuzzy Logic and its Application to Text Summarization

--

Authors: Anukarsh Singh and Divyanshu Daiya.

In the last blog post, we talked about the basics of text summarization. So if you find yourself unsure about what text summarization is, you might want to head back and give that a read. :)

In this blog post, I will try to shed some light on a particular method of extracting a summary from text: using Type-1 fuzzy logic for text summarization. Before we dive into the workings and implementation of this method, let us first understand the basics.

General Workflow of a Fuzzy Logic Model (Pic Courtesy: Tutorialspoint)

I’m sure most of us know how computers work under the hood. They all use binary logic (yes, that 0 and 1 thing!) for various types of tasks and computations. Contrary to that, we humans do not use such logic for making decisions in our day-to-day lives. Let me try to elaborate on what I am trying to convey using an analogy.

Imagine you have a cup of coffee on the table in front of you. The coffee might be hot or cold for your liking. But we are not limited to just these two ‘levels’ when describing its hotness or coldness. We might very well say that the coffee is “too hot”, “too cold”, “mildly hot”, “mildly cold”, or “just perfect”. So, there you have it! We humans are not limited to just two levels of logic for understanding the world around us. There are many! So, what do we call this “multi-level” logic? It is aptly named “Fuzzy Logic”.

This cartoon perfectly describes the fuzzy nature of our day-to-day decisions regarding the hotness/coldness of tea! (Courtesy: me.me)


What exactly is Fuzzy Logic?

According to Wikipedia:

Fuzzy logic is a form of many-valued logic in which the truth values of variables may be any real number between 0 and 1. It is employed to handle the concept of partial truth, where the truth value may range between completely true and completely false.

In the above definition, many-valued logic is exactly what the different levels of hotness or coldness of a coffee convey. And instead of using “words” as in our analogy, we use real numbers between 0 and 1 to describe and differentiate the different levels. Now, before you blame me for not telling you what “partial truth” means in the above definition, just wait for the right moment. Trust me, I will come back to it!

Why is it called “Fuzzy Logic”?

As I said before, “fuzzy logic” is no misnomer: the name aptly describes the fuzzy nature of this kind of logic. Sounds confusing? Let me explain.

So far, we have seen that fuzzy logic is a multi-valued logic. But we should be aware that there is no clear distinction, i.e. boundary, between the different levels of fuzzy logic. For instance, everyone has their own perception of “hot” and “very hot” when describing the hotness of a cup of coffee. A “hot” coffee for one person might be “very hot” for another, and vice versa. This is where the “fuzziness” of fuzzy logic comes into play!

So, one might ask: if we cannot clearly demarcate the different levels of fuzzy logic, how are we supposed to represent it and draw conclusions from it? This is where the membership function of fuzzy logic comes into the picture.

But before we move on to explain what a membership function is, let us understand what a “fuzzy set” is all about.

Fuzzy Set vs Classical Set

We are all very familiar with classical sets (or just ‘sets’), aren’t we? For a classical set, an element is either in the set or it isn’t. For example, consider the set {1, 2, 0}. We can confidently say that the element ‘1’ exists within the set. What about the element ‘10’? We can just as clearly say that it does not exist in the set. So, as you can see, there is a pretty clear principle behind the membership of an element in a classical set. Simply put, if an element is part of a classical set, then the “membership” of that element is said to be 1. For an element not in the classical set, the “membership” is said to be zero.

The “membership” of an element, in formal terms, can be represented by a function, called (unsurprisingly) the membership function. In other words, we can say that the membership function of a classical set can take only two values, 1 and 0.
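To make this concrete, crisp membership can be sketched in a couple of lines of Python (the set and the probed elements are just the illustrative ones from above):

```python
# A classical (crisp) set: an element is either in it or not.
classical_set = {1, 2, 0}

def crisp_membership(element, s):
    """Membership function of a classical set: returns 1 or 0, nothing in between."""
    return 1 if element in s else 0

print(crisp_membership(1, classical_set))   # 1: '1' is in the set
print(crisp_membership(10, classical_set))  # 0: '10' is not in the set
```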

What about a fuzzy set? Well, things aren’t that straightforward there. (Is fuzzy logic doomed, then? Just hear me out before you conclude that!)

A fuzzy set can be described as a set whose elements do not have the simple property of being either in the set or out of it. An element can also be partially in the set! (This is what the “partial truth” in the Wikipedia definition meant. See, I told you I would come back to it.) So, the membership function in the case of fuzzy logic is not limited to the two values 1 and 0.

Diagram juxtaposing traditional (classical) logic and fuzzy logic. (Pic courtesy: ResearchGate)

Membership Function of Fuzzy Logic

For fuzzy logic, the membership function is continuous between 0 and 1 (i.e. it can take any real value in that range). The former denotes that the element is not part of the fuzzy set, whereas the latter denotes that the element completely belongs to the fuzzy set. Any value in between denotes that the element is partially in the set.

The membership function, in the case of fuzzy logic, represents the degree of truth.

Diagram depicting membership function for temperature (Pic Courtesy: cds.caltech.edu)
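As a sketch of such a continuous membership function, here is a Gaussian-shaped one for a hypothetical fuzzy set “hot coffee” (the 80 °C centre and the spread of 10 are invented for illustration; Gaussian membership functions also reappear later in our model):

```python
import math

def gaussian_membership(x, mean, sigma):
    """Degree of membership in a Gaussian-shaped fuzzy set: a real number in [0, 1]."""
    return math.exp(-((x - mean) ** 2) / (2 * sigma ** 2))

# Degree to which each temperature belongs to the fuzzy set "hot coffee".
for temp in (80, 70, 40):
    print(temp, round(gaussian_membership(temp, mean=80, sigma=10), 3))
```

At 80 °C the coffee is fully “hot” (membership 1), at 70 °C it is only partially “hot”, and at 40 °C its membership is close to 0.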

Types of Fuzzy Logic

So far, we have discussed fuzzy logic and its membership function in detail. Now, let us discuss the types of fuzzy logic. There are basically two types:-

  1. Type-1 Fuzzy Logic(T1 FL)
  2. Type-2 Fuzzy Logic(T2 FL)

Since in this article we are only concerned with using T1 FL, we will skip over the details of T2 FL. Note that the membership function we have been talking about till now was for T1 FL.

Extractive Text Summarization Using Fuzzy Logic

After covering the prerequisites, let us now discuss an important application of Fuzzy Logic: Text Summarization.

As discussed in my previous article, automatic summarization is the process of shortening a text document with software in order to create a summary containing the major points of the original document. One way of achieving this is by using Type-1 fuzzy logic. Let’s begin!

Dataset Used

We used the ever-so-popular DUC-2002 dataset for our summarizer. We chose it over the later versions of DUC because most summarizers that use fuzzy logic have been benchmarked on this dataset, making it appropriate for comparing the performance of our summarizer with other implementations. We used 125 test documents to test our implementation.

Preprocessing Steps

The data needs to go through some preprocessing steps in order to extract the important features. (I have discussed these steps in my previous article.) The following preprocessing steps were used:-

  1. Sentence Segmentation
  2. Tokenization
  3. Removing stop words
  4. Word Stemming
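A simplified, self-contained stand-in for these four steps might look like the following (in practice we used proper NLP tooling; the regex-based segmentation, the tiny stop-word list, and the crude suffix-stripping “stemmer” here are purely illustrative):

```python
import re

# A tiny illustrative stop-word list; a real pipeline would use a full one.
STOP_WORDS = {"the", "is", "a", "an", "of", "in", "to", "and", "are"}

def preprocess(document):
    # 1. Sentence segmentation (naive split on sentence-ending punctuation).
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    processed = []
    for sent in sentences:
        # 2. Tokenization (lowercase alphabetic tokens only).
        tokens = re.findall(r"[a-z']+", sent.lower())
        # 3. Stop-word removal.
        tokens = [t for t in tokens if t not in STOP_WORDS]
        # 4. Crude suffix stripping, standing in for a real stemmer.
        tokens = [re.sub(r"(ing|ed|s)$", "", t) for t in tokens]
        processed.append(tokens)
    return processed

print(preprocess("The cats are running. A dog barked in the park."))
```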

Features

The features extracted from the document were used as inputs to our Type-1 Fuzzy System. They were:-

  1. TF/IDF scores: We establish the relevance of a given word in the document by comparing its frequency in the document with its frequency in the whole corpus. Words unique to the document receive higher scores. Using the scores of the words in a sentence, we generate a sentence score (by averaging the scores of all words in the sentence) and use that as a feature.
  2. Pivot Distance: Using GloVe, we vectorize a sentence by taking the average of all GloVe vectors corresponding to the words in that sentence. We then take the mean of all sentence vectors as the pivot for the document, and use the distance of each sentence vector from the pivot as the pivot distance for that sentence, which serves as a feature.
  3. Sentence Localization: We assign scores to sentences based on their location in the document. Sentences at the beginning or towards the end of the document tend to carry more useful information.
  4. Sentence Length: The longer a sentence, the higher its score.
  5. Nouns: The more nouns a sentence contains, the higher its score.
  6. Cosine Similarity: We calculate the cosine similarity of every sentence pair, and then define the score of a sentence as the sum of its similarities over all pairs containing it, normalized by the maximum such sum over all sentences: Score(s) = sum(all pairs including s) / max over all sentences s' of sum(all pairs including s').
  7. Pivot Distance_1: Using the word scores generated with TF/IDF, we build a sentence vector for each sentence from the scores of its words. We then proceed in the same way as for Pivot Distance.
  8. Numbers: Sentences containing numerical data are assigned higher score.
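To illustrate feature 1, here is a rough sketch of TF/IDF-based sentence scoring (the smoothing in the IDF term and the toy corpus are my own choices for the example, not necessarily what our implementation used):

```python
import math
from collections import Counter

def tfidf_sentence_scores(doc_sentences, corpus):
    """Score each sentence as the mean TF-IDF of its words.

    doc_sentences: list of token lists (the sentences of one document).
    corpus: list of documents, each a flat list of tokens.
    """
    n_docs = len(corpus)
    doc_tokens = [tok for sent in doc_sentences for tok in sent]
    tf = Counter(doc_tokens)

    def tfidf(word):
        df = sum(1 for doc in corpus if word in doc)  # document frequency
        idf = math.log(n_docs / (1 + df))             # smoothed inverse document frequency
        return (tf[word] / len(doc_tokens)) * idf

    return [sum(tfidf(w) for w in sent) / len(sent) if sent else 0.0
            for sent in doc_sentences]

# Words unique to the document score higher than words common across the corpus.
corpus = [["apple", "banana"], ["banana", "cherry"], ["apple", "quince"]]
scores = tfidf_sentence_scores([["apple"], ["quince"]], corpus)
print(scores)
```

Here “apple” appears in two of the three corpus documents and so contributes nothing, while the document-specific “quince” pushes its sentence’s score up.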

We experimented with the following parameters:-

  1. Choosing a set of 6 relevant features for every experiment.
  2. Thresholds of the membership functions (for the features, we mostly used Gaussian membership functions).
  3. Hand-crafting rules to better capture relations among the features.
  4. Using a triangular membership function for the output.
  5. Generating the final sentence score (in [0, 1]) using centroid defuzzification.
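Points 4 and 5 can be sketched as follows: the fired rules clip triangular output sets at their strengths, the clipped sets are aggregated, and the crisp score is the centroid of the aggregated curve. The rule strengths (0.3 and 0.8) and the shapes of the two output sets below are invented purely for illustration:

```python
def triangular_mf(x, a, b, c):
    """Triangular membership function: rises from a to a peak at b, falls to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def centroid_defuzzify(xs, mus):
    """Centroid (centre of gravity) of the aggregated output membership curve."""
    return sum(x * m for x, m in zip(xs, mus)) / sum(mus)

# Output universe: sentence score in [0, 1], discretized.
xs = [i / 1000 for i in range(1001)]
# Suppose two fuzzy rules fired, clipping their output sets at their strengths.
low = [min(triangular_mf(x, 0.0, 0.25, 0.5), 0.3) for x in xs]   # "low importance", strength 0.3
high = [min(triangular_mf(x, 0.5, 0.75, 1.0), 0.8) for x in xs]  # "high importance", strength 0.8
aggregated = [max(l, h) for l, h in zip(low, high)]

print(round(centroid_defuzzify(xs, aggregated), 3))  # a single crisp score in [0, 1]
```

Because the “high importance” rule fired more strongly, the centroid lands closer to the upper end of the score range.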

Results Using Some Standard Similarity Measures

Before discussing the performance of our model, let’s get a basic idea of the performance of some common similarity measures. The following table states the ROUGE-2 F-scores for the same.

ROUGE-2 scores of existing state-of-the-art summarization models

Performance of our model

Our model did relatively well compared to all of the above-mentioned similarity measures. The following table states the ROUGE-2 F-scores of our model for different combinations of feature vectors. We have numbered the features in the same order in which they appear in the “Features” subsection.

Table depicting the performance of our model for different combinations of feature vectors. For example, 1 denotes the 1st feature in the “Features” subsection of this article, which is TF/IDF.

As can be inferred from the above table, our model does worse only in comparison to TextRank and Facebook’s InferSent models (which are state-of-the-art models for a reason!).

Some Shortcomings of Our Model

Although our model does well when compared to the common similarity measures, the following are some of its shortcomings:-

  1. Our model compares well against most Type-1 fuzzy models, but it lags by quite a margin behind other, non-fuzzy state-of-the-art summarization models.
  2. The features used in our model mostly capture statistical information. We miss the semantic view, which accounts for part of the gap in performance.
  3. Lately, many researchers have followed the practice of dividing the test set into training and cross-validation sets to derive better rules; see, for example, the derivative used by Jefferson (Fuzzy Approach for Sentiment Analysis). Building on hand-wired rules alone has a detrimental effect on performance.

Possible Improvements and extensions

Keeping the shortcomings in mind, the following are some ways that may improve the performance of our model:-

  1. Using Type-2 fuzzy logic to better capture the uncertainties and ambiguities of text.
  2. A consensus-based approach may give better performance than a standalone fuzzy logic model.
  3. Hand-wired rules can be further fine-tuned by training and cross-validation. We might talk about the how-to’s for the same in a future post.
  4. More semantic features can be included to better incorporate sense from both views.

Further Reading

If you are interested in further improving and broadening your understanding of the topics covered in this article, you can go through the following:-

  • Fuzzy Logic Concepts
  1. Wikipedia!
  2. Flirtation, a very fuzzy Prospect: Flirtation Advisor
  3. Uncertain Rule-Based Fuzzy Systems
  4. Fuzzy Logic and its uses
  5. Uncertainty in Fuzzy Logic Systems
  6. Introduction To Type-2 Fuzzy Logic Control: Theory and Applications
  7. Membership Function

  • Preprocessing Methods
  1. Using NLTK for Preprocessing
  2. Preprocessing Text-KDnuggets blog

  • Research Articles
  1. Optimizing Text Summarization Based on Fuzzy Logic: Farshad Kyoomarsi et al.
  2. Is there a need of fuzzy logic?: Lotfi A. Zadeh
  3. Fuzzy Logic Based Method for Improving Text Summarization: Suanmali et al.
  4. Fuzzy logic for linguistic summarization of databases: J. Kacprzyk
  5. An overview of methods for linguistic summarization with fuzzy sets: Fatih Emre Boran et al.
