Thursday, March 3, 2016

Two Worlds of Data: Structured and Unstructured

Computing has evolved from relatively small, simple systems into the massively complex and far-reaching landscape we have today, and the data created along the way has grown with it. Analytics now uses system data not only to evaluate performance but also to predict and forecast. This has increased interest in the data held in systems, including the internet, and has led to that data being classified into two categories: structured and unstructured. The definition of data as 'values or information pertaining to real-world objects' stays the same; the classification concerns how the data is organized and stored.

1. Structured Data

Structured data, as the name suggests, is data that is easily organized and follows a fixed, agreed structure. It consists of values that reside in fixed fields within a record. Data in Excel spreadsheets or in a relational database are typical examples.

Examples of Structured Data
Machine Generated
  • Sensor Data - GPS data, manufacturing sensors, medical devices
  • Point-of-Sale Data - Credit card information, location of sale, product information
  • Call Detail Records - Time of call, caller and recipient information
  • Web Server Logs - Page requests, other server activity
Human Generated
  • Input Data - Data entered into a computer: age, zip code, gender, etc.

Structured data depends on a data model that defines what business data will be recorded and how it will be stored, processed and accessed. The data model specifies the characteristics of each field: its type (numeric, character, etc.), its size, and any restrictions on the values it may hold. This defined structure makes structured data easy to store, access and analyze. Structured data is usually managed using Structured Query Language (SQL), which was originally developed at IBM and is now supported by database systems from a wide range of vendors. Modern database systems offer a broad array of features for both storing and analyzing data and have grown immensely compared to their predecessors.
Though it may seem that the world loves structure and that most of the data available today would belong to this category, the reality is quite the opposite. By commonly cited estimates, structured data accounts for only about 20% of the data available.
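To make the idea of a data model concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name, field names and rules (a five-character zip code, a positive quantity) are assumptions made up for illustration; they are not taken from any particular system.

```python
import sqlite3


def build_sales_db():
    """Create an in-memory database with a fixed schema for point-of-sale rows."""
    conn = sqlite3.connect(":memory:")
    # The schema is the data model: each field has a type and restrictions.
    conn.execute(
        """
        CREATE TABLE sales (
            sale_id    INTEGER PRIMARY KEY,
            store_zip  TEXT    NOT NULL CHECK (length(store_zip) = 5),
            product    TEXT    NOT NULL,
            quantity   INTEGER NOT NULL CHECK (quantity > 0),
            unit_price REAL    NOT NULL CHECK (unit_price >= 0)
        )
        """
    )
    return conn


def try_insert(conn, row):
    """Insert one row; return True if accepted, False if it breaks the schema."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO sales (store_zip, product, quantity, unit_price) "
                "VALUES (?, ?, ?, ?)",
                row,
            )
        return True
    except sqlite3.IntegrityError:
        return False


def revenue_by_product(conn):
    """Because every row has the same fields, aggregation is a single query."""
    return conn.execute(
        "SELECT product, SUM(quantity * unit_price) "
        "FROM sales GROUP BY product ORDER BY product"
    ).fetchall()
```

A row such as ("94105", "coffee", 2, 3.5) is accepted, while a row with a quantity of 0 or a three-character zip code is rejected by the database itself. That enforcement is what makes later analysis straightforward.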

2. Unstructured Data

As the name implies, unstructured data has little or no predefined structure. Storing data without a predefined form may seem illogical, but some content, such as the body of an email message, does not fit well into the fixed fields that structured data requires.


Examples of Unstructured Data
  • Documents - Word documents, PDFs and other text files
  • Audio Files - Music and other recordings, such as customer service calls
  • Presentations - PowerPoint presentations and animations
  • Videos - Movies and other personal or educational videos
  • Images - Pictures etc.
  • Messaging - Instant messages, text messages
In all these instances, the data can provide compelling insights. With the right tools, unstructured data can add a depth to data analysis that could not be achieved otherwise. A surprising fact is that unstructured data is estimated to account for roughly 80% of all data. The reasons for this include the exponential growth of the internet, social media and various systems that produce unstructured content.
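As a rough illustration of what "the right tools" can mean at the smallest scale, the sketch below takes a raw email and turns it into a record that can be analyzed. It uses only Python's standard library. The sample message, the stop-word list and the function name are all invented for this example, and real text analysis would need far more than word counting.

```python
import re
from collections import Counter
from email import message_from_string

# A made-up message: the headers are regular, the body is free text.
RAW_EMAIL = """\
From: customer@example.com
To: support@example.com
Subject: Late delivery

My order arrived late and the box was damaged.
The late delivery is the second one this month.
"""

# A tiny, illustrative stop-word list (an assumption, not a standard one).
STOP_WORDS = {"the", "and", "was", "is", "my", "this", "one"}


def email_to_record(raw):
    """Turn one raw email into a dict with fixed fields plus term counts."""
    msg = message_from_string(raw)
    body = msg.get_payload()  # plain string for a simple, non-multipart message
    words = re.findall(r"[a-z]+", body.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return {
        "sender": msg["From"],
        "subject": msg["Subject"],
        "term_counts": dict(counts),
    }
```

Calling email_to_record(RAW_EMAIL) gives a record whose sender and subject come straight from the headers, while the term counts are derived from the free-text body. The second part is where the extra work, and the extra insight, lies.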

Structured and Unstructured Data in Deriving Business Insights, and the Future

Studies have suggested that the volume of corporate data doubles every year and that the public Web grows by over seven million pages per day. Although companies spend billions to manage information, it remains fragmented and poorly integrated. Data warehousing came to be widely adopted to bring coherence to this information and present it to business users in a meaningful format.
Data warehouses are currently used to derive business insights from huge volumes of structured data, but they still cannot give the business user a complete picture. Besides the structured and unstructured data inside an organization, a significant amount of information comes from outside it, such as news and analyst reports. This information is essential in formulating business strategy. Two problems make it necessary for an organization to integrate its structured and unstructured information: information glut and knowledge shortage. Information glut arises from the sheer volume of structured and unstructured data; to avoid it, users must be able to filter out what is relevant and important. Knowledge shortage is the lack of context needed to present information in a meaningful way.

Business goals of integrating structured and unstructured information:
  • Improve access to, integration of and management of information, and improve the quality of decision making
  • Improve productivity and workflow, reduce cost and achieve greater output with fewer resources
  • Build tighter relationships, reduce cycle time and improve customer service and support
  • Outsource non-core competencies to suppliers and partners, and collaborate with them on an ongoing basis


With the emergence of technologies such as Big Data and NoSQL, which make it easier to analyze unstructured data, the future of the data warehouse lies in deriving insights from the volumes of untapped information that lie dormant in unstructured data. Warehouses of the future will be able to integrate unstructured and structured data with little or no human intervention and then derive insights by combining a wide array of data sets. Such systems will be not only fast and efficient but also highly configurable and dynamic, unlike the systems widely used today.
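The idea of combining the two kinds of data can be sketched in a few lines of Python. Everything here is hypothetical: the order records, the support notes, the keyword list and the function names are invented for illustration, and a real system would use proper text analytics rather than keyword matching.

```python
from collections import defaultdict

# Structured data: every order has the same fixed fields.
ORDERS = [
    {"order_id": 1, "product": "kettle", "amount": 40.0},
    {"order_id": 2, "product": "kettle", "amount": 40.0},
    {"order_id": 3, "product": "toaster", "amount": 25.0},
]

# Unstructured data: free-text support notes, keyed by order id.
NOTES = {
    1: "Customer says the kettle leaks and wants a refund.",
    3: "Happy with the toaster, asked about a second one.",
}

# A naive keyword list standing in for real text analysis (an assumption).
COMPLAINT_TERMS = ("leak", "broken", "refund", "damaged")


def is_complaint(text):
    """Flag a note as a complaint if it contains any complaint keyword."""
    lowered = text.lower()
    return any(term in lowered for term in COMPLAINT_TERMS)


def complaint_rate_by_product(orders, notes):
    """Join orders with their notes and compute the complaint share per product."""
    totals = defaultdict(int)
    complaints = defaultdict(int)
    for order in orders:
        product = order["product"]
        totals[product] += 1
        if is_complaint(notes.get(order["order_id"], "")):
            complaints[product] += 1
    return {product: complaints[product] / totals[product] for product in totals}
```

With the sample data above, complaint_rate_by_product(ORDERS, NOTES) reports a complaint rate of 0.5 for the kettle and 0.0 for the toaster. Neither the order table nor the notes could show that on their own.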


