Akinbode K. B., Oguns Y. J., Fadiora B. O., Olalekan S. D.
Abstract: Over the last two decades, the rapid development of the Internet has led to a vast amount of data being accumulated and stored on the Web. However, most of this data is stored as natural language text, which complicates its further analysis. Information extraction is a technology that builds a structured representation of unstructured text by extracting relevant entities from it, thereby making data analysis feasible. Although information extraction is a comparatively new area of research, it is evolving rapidly, and a significant body of research has been done and continues to be conducted. This paper examines the information extraction field in detail. Definitions of information extraction and its place within the text mining framework are discussed. The general structure of an information extraction system, two approaches to building such a system, and its evaluation framework are analyzed. Several existing systems are compared. Finally, an outline of the information extraction project is given, stating its aim and objectives, research methods, the tools to be used, and the evaluation plan.
Keywords: Unstructured text, text mining, information extraction, natural language processing techniques