Posted Jul 17 by Gareth Hutchins.
Updated Jul 18.

Magellan has a number of different microservices available, including the ability to perform text analytics on content. I decided to create a Chrome extension that would use Magellan's Natural Language Processing and sentiment analysis capabilities to summarize websites for me.


Overview

The purpose of this post is really two-fold.

  1. It's a relatively simple use case for Magellan's NLP capabilities and shows how easy they are to consume.
  2. The end tool is actually pretty useful if you just want to skim-read different sites.

The full source code for the extension can be found here:
https://github.com/garethhutchins/Magellan-Chrome-Summarizer

The end artifact is an extension that can be added to Chrome and that, when clicked, will summarize the current page down to the number of sentences you specify, as well as summarize the overall tone & sentiment.

Here's what it looks like:
The Chrome Extension Working

Magellan's Text Mining Engine

I came over to Opentext as part of the Dell ECD acquisition, where I was the Global Functional Lead for Capture Solutions. Whilst at ECD, I became a little obsessed with how NLP technologies could complement traditional Capture processes - more on this in later blog posts. The problem was, ECD didn't have an NLP engine - so after the acquisition, I went straight to the Text Mining Engine to see how we could implement it in some of our customer use cases. I've been very impressed with its capabilities. Here's what I think sets it apart:

  1. The micro services are very easy to consume. I managed to integrate them into a traditional capture process before lunch one day.
  2. The engine is not a black box: you can use machine learning techniques to enhance its vocabulary, or you can use a user interface to make an immediate change and browse relationships.
    TME Studio
  3. You are not confined to the usual entities of People, Locations & Organisations like a lot of other NLP engines. You can create new entities for specific verticals like healthcare - or '90s alternative rock bands, if you like.
  4. It works for a number of different languages, not just English.

Understanding The TME Service

The Text Mining service is easy to consume and understand. The service accepts an XML post which contains the text you want to analyse as well as what types of things you want to look for. I won't talk about all of the features here, just some highlights I often use.

To start with, if you wanted to return the tone and sentiment of a piece of text, your request would look like this:

<?xml version="1.0" encoding="UTF-8" ?>
 <Nserver>
  <NSTEIN_Text>[Your text goes here]</NSTEIN_Text> 
  <Methods>
   <NSentiment/>
  </Methods>
 </Nserver>

This will return the sentiment and tone of each sentence in the text, as well as of the whole document, as specified by the NSentiment command in the Methods section of the request.

So, if you took the following command:

<?xml version="1.0" encoding="UTF-8" ?>
 <Nserver>
  <NSTEIN_Text>
   This is a story about Gareth Hutchins. He lives in Farnham in Surrey and works for Opentext in Reading. He's pretty cool.
  </NSTEIN_Text>
  <Methods>
   <NSentiment/>
  </Methods>
 </Nserver>

You would get the following back:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
 <Nserver Version="3.0">
  <ErrorID Type="Status">0</ErrorID>
  <ErrorDescription>Success</ErrorDescription>
  <Results>
    <NSentiment>
        <SentenceLevel>
            <Sentence>
                <Text begin="6" end="44">This is a story about Gareth Hutchins.</Text>
                <Subjectivity score="10.0075">fact</Subjectivity>
                <Tone>neutral</Tone>
            </Sentence>
            <Sentence>
                <Text begin="45" end="109">He lives in Farnham in Surrey and works for Opentext in Reading.</Text>
                <Subjectivity score="9.7272">fact</Subjectivity>
                <Tone>neutral</Tone>
            </Sentence>
            <Sentence>
                <Text begin="110" end="127">He's pretty cool.</Text>
                <Subjectivity score="79.8701">opinion</Subjectivity>
                <PositiveTone score="38.471"/>
                <NegativeTone score="24.893"/>
                <Tone>positive</Tone>
            </Sentence>
        </SentenceLevel>
        <DocumentLevel>
            <Subjectivity score="75.0036" distribution="17.2043">opinion</Subjectivity>
            <PositiveTone score="25.3561" distribution="17.2043"/>
            <NegativeTone score="16.4069" distribution="0.0"/>
            <Tone>positive</Tone>
        </DocumentLevel>
    </NSentiment>
  </Results>
 </Nserver>

You can also summarize the text to a percentage, by category, or to a number of sentences; I've chosen a number of sentences. To do that, you just need to add another command to the Methods section, including a KBid command, which specifies the taxonomy base to use. I'm using IPTC, the standard for the news industry. The engine ships with a number of other taxonomies, or you can create your own.

 <Methods>
  <nsummarizer>
   <NbSentences>1</NbSentences>
   <KBid>IPTC</KBid>
  </nsummarizer>
 </Methods>
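Combined with the sentiment method from earlier, a complete request asking for both a one-sentence summary and sentiment might look like this (a sketch assembled from the fragments above):

```xml
<?xml version="1.0" encoding="UTF-8" ?>
 <Nserver>
  <NSTEIN_Text>[Your text goes here]</NSTEIN_Text>
  <Methods>
   <nsummarizer>
    <NbSentences>1</NbSentences>
    <KBid>IPTC</KBid>
   </nsummarizer>
   <NSentiment/>
  </Methods>
 </Nserver>
```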

You will then get a structure like this returned back in the Results section:

<Results>
    <nsummarizer>
        <Summary>[A summary of your text]</Summary>
    </nsummarizer>
</Results>

Again, for the purpose of this extension, I'm only using some of the features of the Text Mining Engine. If you wanted to return entities, then you would add the following methods to the call:

  <Methods>
    <nfinder>
        <nfExtract>
            <Cartridges>
                <Cartridge>ON</Cartridge> 
                <Cartridge>PN</Cartridge> 
                <Cartridge>GL</Cartridge> 
            </Cartridges>
            <Hierarchy /> 
        </nfExtract>
    </nfinder>
</Methods>

This would return all Organisations, People and Locations from the text. However, as I say, you are not limited to just these entities with the Text Mining Engine: Events, Drugs, Symptoms & Date Times are also provided out of the box, plus you can create your own. The call would also return related entities - such as a town's borough, county, country and continent - as well as stock symbols of organisations.

Creating a Popup

I decided to use a Chrome Extension Popup to display the summarization results and allow the user to specify the number of sentences they want returned. Here's what the popup looks like:

popup

The popup is just a simple HTML page. I've added elements for the tone, subjectivity & summarization, plus a slide container to specify the number of sentences. These elements are then referenced from the popup's JavaScript.
I've excluded the style sheet from this section, but you can see it in the full GitHub project.

 <body>
    <img src="logo.png" width="100%">
    <p>Drag the slider to change the number of sentences</p>

    <div class="slidecontainer">
      <input type="range" min="1" max="20" value="2" class="slider" id="myRange">
      <p>Number of Sentences: <span id="demo"></span></p>
      <p><font size = "3">Subjectivity: <span id ="subjectivity"></span></font></p>
      <p><font size = "3">Tone: <span id ="tone"></span></font></p>
      <p><font size ="3"><span id="summary"></span></font></p>
    </div>
  </body>


Cleaning the HTML

When you access the Document Object of a webpage, you'll notice that it's full of junk you didn't know was there. You get all sorts of nonsense about cookies that will spoil the results of the Text Mining Engine. Therefore, before I send the text from a page to the service, I do some cleaning. First, I only look for text that is displayed; I do this by looping through all of the sections of a document and removing them from the body if they're hidden, like so:

// Snapshot the live collection first, so removing nodes doesn't skip elements
var divs = Array.from(document.getElementsByTagName("div"));
for (var divx of divs) {
  if (divx.style.display === 'none' && divx.parentNode !== null) {
    divx.parentNode.removeChild(divx);
  }
}

I then loop through what's left and look for any Paragraph elements, again checking to see if they're visible. If so, I take the text and add it to the text I'm going to pass to the service like so:

var allPs = document.getElementsByTagName("p");
var rText = "";
for (var val of allPs) {
    // Both checks must pass (&&): skip anything hidden either way
    if (val.style.display != 'none' && val.hidden != true) {
        rText += val.innerText + '. ';
    }
}
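The same logic can be factored into a small helper that's easy to test outside the browser - a sketch (these function names are mine, not part of the extension) that takes any array of element-like objects with `style.display`, `hidden` and `innerText`:

```javascript
// True when an element should contribute text to the request.
// Both conditions must hold, hence && rather than ||.
function isVisible(el) {
  return el.style.display !== 'none' && el.hidden !== true;
}

// Concatenate the text of every visible paragraph,
// separating each one with '. ' as in the loop above.
function collectParagraphText(paragraphs) {
  var rText = "";
  for (var p of paragraphs) {
    if (isVisible(p)) {
      rText += p.innerText + '. ';
    }
  }
  return rText;
}
```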

Finally, I then replace some characters that can cause the service some issues:

text = text.replace(/[\n\r]+/g, ' ');
text = text.replace(/&/g, "&amp;");
text = text.replace(/</g, "&lt;");
text = text.replace(/>/g, "&gt;");
text = text.replace(/"/g, "&quot;");
text = text.replace(/'/g, "&apos;");
text = text.replace(/\[\d*\]/g, ' ');
text = text.replace(/\.(?=[^\d])[ ]*/g, '. ');
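Wrapped up as one function (my naming; the replacements are the same as above):

```javascript
// Clean page text before embedding it in the XML request:
// collapse line breaks, XML-escape special characters, drop
// footnote markers like [12], and normalise sentence spacing.
function cleanForTme(text) {
  return text
    .replace(/[\n\r]+/g, ' ')
    .replace(/&/g, '&amp;')   // escape & first, before introducing other entities
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;')
    .replace(/\[\d*\]/g, ' ')
    .replace(/\.(?=[^\d])[ ]*/g, '. ');
}
```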

Calling The Service From JavaScript

Once you've cleaned up all of the text from the page, you can call the service from the Chrome extension using JavaScript. I used the following code to pass the text, call the service and then populate the results back to the popup:

var command = "<?xml version=\"1.0\" encoding=\"UTF-8\" ?><Nserver><NSTEIN_Text>";
command = command + text;
//Say we're looking for # sentences
command = command + "</NSTEIN_Text><Methods><nsummarizer><NbSentences>" + ranger.value + "</NbSentences><KBid>IPTC</KBid></nsummarizer><NSentiment></NSentiment></Methods></Nserver>";
//now do the post
var URL = 'http://[Your TME URL]:[port]/rs/';
//var result = "";
fetch(URL, {
  method: "POST",
  body : command,
  headers : {"Content-Type" : "application/xml"},
})
.then(function(res) {
  if (res.ok) { // ok if status is 2xx
    console.log('OK ' + res.statusText);
  } else {
    console.log('Request failed.  Returned status of ' + res.status);
  }
  return res.text()
})
.then(function(text) {
  var parser = new DOMParser();
  var xmlDoc = parser.parseFromString(text, "text/xml");
  var result = xmlDoc.getElementsByTagName("Summary")[0].textContent;
  console.log(result);
  summary.innerText = result;
  var DocLevels = xmlDoc.getElementsByTagName("DocumentLevel")[0];
  Subjectivity.innerText = DocLevels.getElementsByTagName("Subjectivity")[0].textContent;
  tone.innerText = DocLevels.getElementsByTagName("Tone")[0].textContent;
  console.log('Subjectivity: ' + Subjectivity.innerText);
  console.log('Tone: ' + tone.innerText);
  return result;
})
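The request construction from the top of this snippet can also be pulled into a small pure function (a sketch; `buildTmeRequest` is my name, and it assumes the text has already been cleaned):

```javascript
// Build the TME request body: summarise to numSentences using the
// IPTC taxonomy, and run sentiment analysis in the same call.
function buildTmeRequest(text, numSentences) {
  return '<?xml version="1.0" encoding="UTF-8" ?>' +
    '<Nserver>' +
      '<NSTEIN_Text>' + text + '</NSTEIN_Text>' +
      '<Methods>' +
        '<nsummarizer>' +
          '<NbSentences>' + numSentences + '</NbSentences>' +
          '<KBid>IPTC</KBid>' +
        '</nsummarizer>' +
        '<NSentiment></NSentiment>' +
      '</Methods>' +
    '</Nserver>';
}
```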

Loading the Extension in Chrome

Before you're able to load the extension in Chrome, you need to create a manifest file that includes version information, extension icon and permissions. Mine looks like this:

{
"name": "Magellan",
"version": "1.0",
"manifest_version": 2,
"description": "Summarize pages for a popup",
"browser_action": {
  "default_icon": "icon.png",
  "default_popup": "popup.html"
},
"permissions": ["tabs", "<all_urls>"]
}

You'll then need to change the URL in popup.js to point to your instance of Text Mining Engine Service.
To load your plugin, enter the following in the Chrome address bar:

chrome://extensions/

That will bring up a screen like the following:
Chrome Load
Select the Load Unpacked option from the top and browse to the location of the extension files on your local machine.
And there you have it, the extension is loaded.

Summary

So, that's it. This was a walk-through on how to call Magellan's Microservice for Text Mining from a Chrome Extension to summarize web pages. Please feel free to contact me for any comments or questions.

Ironically, this post has been longer than I thought it would be…

7 Comments

Hello,
Is it possible for OpenText employee to have that extension available (and working) for internal business use?
regards,
Thomas


Yes, I can provide details of an environment I have.


Awesome article and application of the Magellan Text Mining service, Gareth!
Please get back to me if you have any suggestions or comments to improve any aspect of it.
Starting with 16.5, a revamped REST API will start being published to be compatible with OT2.

I really look forward to seeing other initiatives and applications of our own services like yours!

Martin Brousseau
Product Manager | Magellan Text Mining | aka InfoFusion


Very cool, nicely done! I would like to try this out.


Nifty plugin!

Just for my own info - as I manage Azure demos, VCloud and a bunch of lab and TME servers… Which one is set as the hardcoded default in the GitHub code?


Hi Gareth,

Great article. I would also like to try this out! I'm in Marketing, and this could help us in reviewing our websites to make sure we get the "right message" across! Since I am not a very technical person, could you help me get this installed?
Thanks in advance.
Cheers from Germany
Petra :-)


Really good work Gareth!

