Integration and Analysis of Unstructured Data for Decision Making: Text Analytics Approach

Ise Anderson Orobor

Abstract


Relational database management systems (RDBMSs), on which organizations rely heavily for decision making, are limited by design in their ability to integrate and analyze data from unstructured sources. Research has shown that a large part of organizational information exists in unstructured sources, which may contain information needed for decision making. Integrating such data into an RDBMS for analysis is challenging because its structure is inconsistent and unorganized. This paper therefore aims to develop a system that automatically integrates unstructured data into an RDBMS. Given the valuable role that academic journals (themselves unstructured) play in the educational domain, the system uses a text analytics approach to extract relevant information from academic journals and build a structured database that can be further analyzed to support decision making.
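
The extract-then-load idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: simple regular expressions stand in for the text analytics step, SQLite stands in for the RDBMS, and the sample sentence, the `articles` table, and the function names are all hypothetical.

```python
import re
import sqlite3

# Hypothetical free-text description of an article (illustrative only).
SAMPLE = ("This article, titled 'Mining Campus Records', was written by "
          "Jane Doe and published in 2015.")

# Assumed patterns: a simple stand-in for a trained entity extractor.
TITLE = re.compile(r"titled '([^']+)'")
AUTHOR = re.compile(r"written by ([A-Z][a-z]+(?: [A-Z][a-z]+)+)")
YEAR = re.compile(r"\b((?:19|20)\d{2})\b")


def extract_record(text):
    """Pull title, author, and year out of free text; missing fields become None."""
    def first(pattern):
        match = pattern.search(text)
        return match.group(1) if match else None

    year = first(YEAR)
    return {
        "title": first(TITLE),
        "author": first(AUTHOR),
        "year": int(year) if year else None,
    }


def store(conn, records):
    """Load extracted records into a relational table so they can be queried."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS articles (title TEXT, author TEXT, year INTEGER)"
    )
    conn.executemany(
        "INSERT INTO articles (title, author, year) VALUES (:title, :author, :year)",
        records,
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    store(conn, [extract_record(SAMPLE)])
    # Once the data is structured, ordinary SQL supports the analysis step.
    for row in conn.execute("SELECT year, COUNT(*) FROM articles GROUP BY year"):
        print(row)
```

Once the records sit in a table, grouping or filtering by author, year, or title becomes a plain SQL query, which is the kind of analysis the unstructured source alone does not support.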






ISSN: 2307-8162