


Marker Generator: g-MIL


Principle

g-MIL, the Marker Generator Independent of Language, automatically identifies the most significant passages in a body of text. The marker it produces is a key element of the document signature.

Without additional dictionaries or manual intervention, g-MIL analyses text and produces a list of the principal Noun Phrases found, i.e. the word groups that really carry the meaning of the text. Noun Phrases are weighted according to their importance.

Like all AMI™ components, g-MIL works in English, French, German, Spanish, Portuguese, Italian and Dutch.


Applications

g-MIL can be used to:

Optimise the relevance of a search engine
A full-text indexer generally creates a reverse index, allowing the user to find all documents containing the relevant words and phrases. It optimises the relevance of the answers by taking into account factors such as the frequency with which a term appears or the proximity of the request's words to one another in the document.

By indexing a document on the Noun Phrases that it contains, it is possible to find documents that really "speak about" the subject, rather than those that simply "contain" the words.

Noun Phrases are weighted, according to their importance in the original text.
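
As a rough illustration of indexing on weighted Noun Phrases, the Python sketch below builds a small reverse index keyed on phrases rather than single words. The function names and the input structure (a mapping from document id to phrase weights) are assumptions made for this example and are not g-MIL's actual interface; the sample documents are invented.

```python
from collections import defaultdict

def build_phrase_index(docs):
    """Map each noun phrase to the documents containing it, with its weight.

    `docs` is a hypothetical structure: {doc_id: {noun_phrase: weight}}.
    """
    index = defaultdict(dict)
    for doc_id, phrases in docs.items():
        for phrase, weight in phrases.items():
            index[phrase.lower()][doc_id] = weight
    return index

def search(index, phrase):
    """Return the ids of documents containing the phrase, heaviest first."""
    postings = index.get(phrase.lower(), {})
    return sorted(postings, key=lambda doc_id: (-postings[doc_id], doc_id))

# Invented sample data for illustration only.
docs = {
    "article-1": {"peace deal": 245, "Government of Sudan": 173},
    "article-2": {"peace deal": 100, "trade agreement": 210},
}
index = build_phrase_index(docs)
print(search(index, "peace deal"))  # ['article-1', 'article-2']
```

Ranking by phrase weight means a document in which "peace deal" is a principal subject comes before one that merely mentions it.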


Create a summary

Combining the sentences that contain the most heavily weighted Noun Phrases makes it easy to create an extract of the document that summarises its meaning.

One can also "skew" the summary by asking g-MIL to favour the sentences containing Noun Phrases that include specific terms. This provides a "Summary in Context" of the request, sometimes called a "Query Biased Summary". Creating a summary in context makes it possible, for example, to automatically produce a summary tailored to a particular request.
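
The two kinds of summary described here can be sketched as follows. This is a minimal illustration under stated assumptions, not g-MIL's algorithm: sentences are split with a simple regular expression, a sentence's score is the sum of the weights of the Noun Phrases it contains, and the hypothetical `boost` parameter multiplies the weight of any phrase that includes a query term.

```python
import re

def summarise(text, phrase_weights, max_sentences=2, query_terms=(), boost=2.0):
    """Keep the highest-scoring sentences, in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    terms = [t.lower() for t in query_terms]

    def score(sentence):
        lowered = sentence.lower()
        total = 0.0
        for phrase, weight in phrase_weights.items():
            p = phrase.lower()
            if p in lowered:
                # Query-biased summary: favour phrases containing a query term.
                factor = boost if any(t in p for t in terms) else 1.0
                total += weight * factor
        return total

    ranked = sorted(range(len(sentences)), key=lambda i: -score(sentences[i]))
    keep = sorted(ranked[:max_sentences])  # restore document order
    return " ".join(sentences[i] for i in keep)

# Invented sample text and weights for illustration only.
text = ("The government of Sudan agreed to sign a peace deal. "
        "Two smaller rebel groups rejected the deal. "
        "Talks were held in a hotel.")
weights = {"Sudan": 245, "peace deal": 245, "smaller rebel groups": 100}

print(summarise(text, weights, max_sentences=1))
print(summarise(text, weights, max_sentences=1, query_terms=["rebel"], boost=10.0))
```

Without query terms the first sentence wins; with the query term "rebel" and a strong boost, the second sentence is selected instead.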


Classify a document

g-MIL identifies the principal subjects covered by a document so that it can then be categorised, automatically or semi-automatically.


De-duplicate information

In the same way, knowledge of what a document “speaks” about makes it possible to check if it resembles another document within the same corpus.
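
One possible way to compare two documents by what they "speak" about is a weighted Jaccard similarity over their Noun Phrases, sketched below. The choice of measure and the input structure ({noun_phrase: weight}) are assumptions for this example; the source does not say how g-MIL or its companion components measure resemblance.

```python
def phrase_similarity(a, b):
    """Weighted Jaccard similarity between two {noun_phrase: weight} maps.

    Returns a value between 0.0 (nothing shared) and 1.0 (identical).
    """
    keys = set(a) | set(b)
    shared = sum(min(a.get(k, 0), b.get(k, 0)) for k in keys)
    total = sum(max(a.get(k, 0), b.get(k, 0)) for k in keys)
    return shared / total if total else 0.0

# Invented sample data for illustration only.
doc_a = {"peace deal": 245, "Sudan": 245}
doc_b = {"peace deal": 245}
doc_c = {"trade agreement": 210}

print(phrase_similarity(doc_a, doc_a))  # 1.0
print(phrase_similarity(doc_a, doc_b))  # 0.5
print(phrase_similarity(doc_a, doc_c))  # 0.0
```

A corpus could then flag pairs whose similarity exceeds a chosen threshold as probable duplicates.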


Manage CLIR
CLIR stands for Cross Language Information Retrieval. The principle makes it possible to find information whatever language it is written in, using a request in only one language.

The simplest solutions rely on translating the request. Translating whole documents is often time-consuming and expensive, all the more so as users are often content with the answers received in their own language and do not investigate results that have not been translated. In addition, it is often difficult for a user to express a request in a foreign language.
These techniques can be enriched considerably by translating the principal Noun Phrases that each document contains. Translating expressions is far more powerful than "word-for-word" translation.


Highlight and Navigate within text
A simple and extremely useful application is the electronic highlighter which, by marking all the Noun Phrases within a text, gives the reader the equivalent of speed reading it. Highlighted terms can be turned into hyperlinks, each of which submits a new request to the search engine with that term as its query criterion.


Offer "more like this"
g-MIL can be used to extract the meaning of an item of text and build a request from the Noun Phrases extracted. The request, submitted to a search engine such as g-MIR, makes it possible to find documents closely related to the initial text. This function is most complete when combined with g-REQ (Requests Generator).
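
Building a request from the extracted Noun Phrases might look like the sketch below. The function name is hypothetical, and the query syntax (quoted phrases with a Lucene-style `^` boost, joined by OR) is chosen purely for illustration; it is not the request format of g-MIR or g-REQ. The weights are taken from the example further down this page.

```python
def build_query(phrase_weights, top_n=3):
    """Build a boosted OR-query from the heaviest noun phrases."""
    top = sorted(phrase_weights.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
    return " OR ".join(f'"{phrase}"^{weight}' for phrase, weight in top)

phrases = {
    "Sudan": 245,
    "peace deal": 245,
    "Government of Sudan": 173,
    "main rebel groups": 141,
}
print(build_query(phrases))
# "Sudan"^245 OR "peace deal"^245 OR "Government of Sudan"^173
```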


Compress Data
Storing textual information takes up space, with all the disadvantages that brings regarding disk space, volume of indexes and reduced speed of response.

Furthermore, some databases are not optimised to store unstructured textual information: certain fields must contain only generic information.

In both cases, g-MIL can be used to automate, either completely or partially, the extraction of the information to be stored.


Message Routing
By analysing messages such as e-mails, g-MIL makes it possible to route them automatically to recipients according to the subject detected, disregarding the "noise" generated by differences in writing style and other incidental factors.
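
A minimal routing sketch, assuming each recipient is registered with a list of topics and each message has already been reduced to weighted Noun Phrases. The function name, the routing table and the addresses are hypothetical; the message goes to the recipient whose topics carry the most weight in it.

```python
def route_message(phrase_weights, routes, default="unassigned"):
    """Pick the recipient whose topics carry the most weight in the message."""
    best, best_score = default, 0
    for recipient, topics in routes.items():
        wanted = {t.lower() for t in topics}
        score = sum(w for p, w in phrase_weights.items() if p.lower() in wanted)
        if score > best_score:
            best, best_score = recipient, score
    return best

# Invented routing table and message phrases for illustration only.
routes = {
    "billing@example.com": ["invoice", "late payment"],
    "support@example.com": ["login error", "password reset"],
}
message_phrases = {"password reset": 180, "invoice": 60}

print(route_message(message_phrases, routes))  # support@example.com
```

Because the decision rests on weighted phrases rather than raw wording, two messages written in very different styles about the same subject reach the same recipient.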


Example

Below is an example of Noun Phrases calculated by g-MIL on an item of text from an article taken at random from a section of the UK press. g-MIL’s default settings were used for this illustration.

Analysed text

The government of Sudan and the biggest Darfur rebel group agreed to sign a peace deal today, signalling a possible breakthrough to end the civil war that has killed tens of thousands.

Robert Zoellick, US Deputy Secretary of State, said the deal was between the two main players, but two smaller rebel groups have rejected signing and will be bypassed.
"Today the largest group, Minni Minnawi’s, has agreed to sign and the government of Sudan have agreed to sign as well," said Mr Zoellick, indicating that he believed the smaller factions had made a tactical error.
Rebel groups and the Sudanese government have been at war for three years. Past efforts to negotiate a settlement and ceasefire have ended in failure. The most recent example was in 2004 when a deal was struck only to be left in tatters shortly afterwards.

Despite a history of failed peace initiatives, the Sudanese government remained upbeat about the chances this time.


Proposed Titles

biggest Darfur rebel group agreed; main rebel groups

Sudan to sign peace deal with Darfur rebels

Proposed Summary

The Government of Sudan and the biggest Darfur rebel group agreed to sign a peace deal today, signalling a possible breakthrough to end the civil war that has killed tens of thousands. Robert Zoellick, US Deputy Secretary of State, said the deal was between the two main players, but two smaller rebel groups have rejected signing and will be bypassed. Despite a history of failed peace initiatives, the Sudanese Government remained upbeat about the chances this...

Noun Phrases

Sudan [ 245 ]
peace deal [ 245 ]
Government of Sudan [ 173 ]
main rebel groups [ 141 ]
smaller rebel groups [ 100 ]
main rebel group [ 100 ]
biggest Darfur rebel group agreed [ 100 ]
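
For readers who want to reuse a listing like the one above, the sketch below parses lines of the form `phrase [ weight ]` into pairs. It assumes only the layout shown on this page; the function name is hypothetical and the real g-MIL output format may differ.

```python
import re

# One noun phrase per line, followed by its weight in square brackets.
LINE = re.compile(r"^(?P<phrase>.+?)\s*\[\s*(?P<weight>\d+)\s*\]$")

def parse_noun_phrases(listing):
    """Return (phrase, weight) pairs, skipping lines that do not match."""
    result = []
    for line in listing.splitlines():
        match = LINE.match(line.strip())
        if match:
            result.append((match.group("phrase"), int(match.group("weight"))))
    return result

listing = """Sudan [ 245 ]
peace deal [ 245 ]
Government of Sudan [ 173 ]"""

print(parse_noun_phrases(listing))
# [('Sudan', 245), ('peace deal', 245), ('Government of Sudan', 173)]
```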

Statistics

ami:language ENGLISH
ami:documentsize 3599
ami:paragraphs 39
ami:meanparagraphs 17
ami:meanwords 602
ami:noun phrases 95
ami:sorted noun phrases 18
ami:final noun phrases 16
ami:maxweight 12

