B. L. Prabhu, M. Parveentaj
Abstract: Topic Summarization and Content Anatomy (TSCAN) is an anatomy-based summarization method for summarizing the content of a temporal topic. TSCAN models the documents as a symmetric block association matrix, in which each block is a portion of a document, and treats each eigenvector of the matrix as a theme embedded in the topic. A temporal similarity (TS) function is applied to generate event dependencies and context similarity, which together form an evolution graph of the topic. A distinctive feature of TSCAN is its event segmentation process, which extracts the semantic construct event before summarization. An ontology database is used to analyze the main topics of an article with a natural language processing (NLP) tool and the Protégé tool. Protégé can be customized to provide domain-friendly support for creating knowledge models and entering data. Natural language processing is the process by which a computer extracts meaningful information from natural language input and/or produces natural language output. After identifying the main topics and determining their relative significance, we rank the paragraphs by the relevance between the main topics and each individual paragraph. Based on these ranks, we choose the desired proportion of paragraphs as the summary.
Keywords: Coherence, text mining, topic anatomy, TSCAN
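
The final step described in the abstract, ranking paragraphs by their relevance to the weighted main topics and keeping a chosen proportion of them, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names (`tokenize`, `relevance`, `summarize`), the relevance score (weighted topic-term frequency normalized by paragraph length), the topic weights, and the sample paragraphs are all assumptions made for illustration. In the method described above, the main topics and their significance would come from the ontology-based analysis rather than from a hand-written dictionary.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def relevance(paragraph, topic_weights):
    """Hypothetical relevance score: weighted count of topic terms,
    normalized by the number of tokens in the paragraph."""
    counts = Counter(tokenize(paragraph))
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(w * counts[t] for t, w in topic_weights.items()) / total


def summarize(paragraphs, topic_weights, proportion=0.3):
    """Keep the top `proportion` of paragraphs by relevance,
    returned in their original document order."""
    k = max(1, math.ceil(proportion * len(paragraphs)))
    ranked = sorted(
        range(len(paragraphs)),
        key=lambda i: relevance(paragraphs[i], topic_weights),
        reverse=True,
    )
    chosen = sorted(ranked[:k])  # restore document order
    return [paragraphs[i] for i in chosen]


# Assumed input: main topics with relative significance weights.
topic_weights = {"election": 0.5, "tax": 0.3, "debate": 0.2}

# Assumed input: the paragraphs of one article.
paragraphs = [
    "The election campaign opened with a debate on tax policy.",
    "Weather was mild across the region that week.",
    "Voters in the election favored the candidate after the tax debate.",
    "A local bakery introduced a new bread.",
]

summary = summarize(paragraphs, topic_weights, proportion=0.5)
for line in summary:
    print(line)
```

With a proportion of 0.5, the sketch keeps the two paragraphs that mention the topic terms and drops the two that do not. Normalizing by paragraph length is one design choice among several; an unnormalized count would favor longer paragraphs.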