As computing has evolved from relatively small, unsophisticated systems into the
massively complex and far-reaching landscape of today, the data created along the
way has grown with it. With analytics increasingly used not only to evaluate
performance but also for prediction and forecasting, interest in the data held in
systems, including the internet, has increased, and this data has come to be
classified into two categories: structured and unstructured. While the definition
of data as values or information pertaining to real-world objects remains the
same, the classification focuses on the organization and structure used to store
the data.
1. Structured Data
Structured data, as the name suggests, is data that is easily organized and
follows a fixed, agreed structure. It comprises data that resides in fixed
fields within a record. Data in Excel files or in a relational database are
typical examples of structured data.
Examples of Structured Data
Machine Generated
Sensor Data - GPS data, manufacturing sensors, medical devices
Point-of-Sale Data - Credit card information, location of sale, product information
Call Detail Records - Time of call, caller and recipient information
Web Server Logs - Page requests, other server activity
Human Generated
Input Data - Data entered into a computer: age, zip code, gender, etc.
Structured data depends on creating a data model that defines what business data
will be recorded and how it will be stored, processed and accessed. The data
model specifies the characteristics of each data field, such as its type
(numeric, character, etc.), its size and any restrictions on the values that can
be stored in it. This defined structure makes structured data easy to store,
access and analyze. Structured data is usually managed using Structured Query
Language (SQL), which was originally developed at IBM and is now supported by
database systems from a wide range of vendors. Modern database systems offer a
wide array of features for both storing and analyzing data and have grown
immensely compared to their predecessors.
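To make the idea of a data model concrete, here is a minimal sketch using Python's built-in sqlite3 module. The customer table, its column names and the limits on each field are assumptions made up for illustration; the point is only that each field gets a type, a size limit and a restriction, and that SQL can then query the data or reject rows that break the model.

```python
import sqlite3

# Hypothetical "customer" table: names, types and limits are illustrative only.
SCHEMA = """
CREATE TABLE customer (
    id       INTEGER PRIMARY KEY,
    name     TEXT    NOT NULL CHECK (length(name) <= 50),
    age      INTEGER CHECK (age BETWEEN 0 AND 150),
    zip_code TEXT    CHECK (length(zip_code) = 5)
)
"""


def build_database():
    """Create an in-memory database and load two sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO customer (name, age, zip_code) VALUES (?, ?, ?)",
        [("Alice", 34, "10001"), ("Bob", 51, "94105")],
    )
    return conn


def try_insert(conn, name, age, zip_code):
    """Return True if the row satisfies the data model, False if it is rejected."""
    try:
        conn.execute(
            "INSERT INTO customer (name, age, zip_code) VALUES (?, ?, ?)",
            (name, age, zip_code),
        )
        return True
    except sqlite3.IntegrityError:
        # Raised when a NOT NULL or CHECK restriction is violated.
        return False


conn = build_database()

# Because every record has the same fields, querying is a one-line SQL statement.
older_than_40 = conn.execute(
    "SELECT name FROM customer WHERE age > 40 ORDER BY name"
).fetchall()
print(older_than_40)

# A negative age breaks the restriction on the age field, so the row is refused.
print(try_insert(conn, "Carol", -5, "60601"))
```

The restrictions live in the model itself, so every application that writes to the table is held to the same rules.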
Although it may seem that the world loves structure and that the majority of the
data available today would belong to this category, the reality is quite the
contrary. Structured data is estimated to account for only about 20% of the data
available.
2. Unstructured Data
As the name makes evident, unstructured data has little or no predefined
structure. While it may seem illogical to store data that has no predefined
form, some content, such as an email message, simply does not fit well into the
rules of structured data.
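The email case can be sketched with Python's standard email package. The message below is invented for illustration. Its headers behave like fixed fields, but the body is free text whose meaning cannot be read off from a field name.

```python
from email import message_from_string

# A made-up message used purely for illustration.
RAW_MESSAGE = """\
From: alice@example.com
To: support@example.com
Subject: Order arrived damaged

Hi, my order showed up yesterday with a cracked screen.
Could you send a replacement? Thanks, Alice
"""

msg = message_from_string(RAW_MESSAGE)

# Header fields look like structured data: known names, one value each.
structured_part = {
    "sender": msg["From"],
    "recipient": msg["To"],
    "subject": msg["Subject"],
}

# The body is free text with no fields; what it means has to be inferred.
unstructured_part = msg.get_payload()

print(structured_part)
print(len(unstructured_part.split()), "words of free text")
```

Storing the headers in a table is straightforward. The body, which carries the actual complaint, is the part that resists a fixed schema.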
Examples of Unstructured Data
Word documents, PDFs and other text files
Audio Files - Music and other recordings, such as customer service recordings
Presentations - PowerPoint presentations and animations
Videos - Movies and other personal or educational videos
Images - Pictures etc.
Messaging - Instant messages, text messages
In all these instances, the data can provide compelling insights. With the right
tools, unstructured data can add a depth to data analysis that could not be
achieved otherwise. A surprising fact revealed when analyzing these data
categories is that unstructured data accounts for close to 80% of all data. The
reasons for this include the exponential growth of the internet, social media
and other systems that produce unstructured content.
Structured and Unstructured Data in Deriving Business Insights, and the Future
Studies have suggested that the volume of corporate data doubles every year and
that the public web grows by over seven million pages per day. Although
companies spend billions managing information, it remains fragmented and poorly
integrated. To bring coherence to this information and make it available to
business users in a meaningful format, data warehousing began to be widely
adopted.
Data warehouses are currently used to derive business insights from huge volumes
of structured data, but they still lack the capability to give the business user
a clear picture. In addition to the structured and unstructured data inside an
organization, a significant amount of information comes from outside it, such as
news and analyst reports. This information is essential and relevant to
formulating business strategy. Information glut and knowledge shortage make it
essential for an organization to integrate its structured and unstructured
information. Information glut stems from the sheer volume of structured and
unstructured data; to avoid it, users must be able to filter out the relevant
and important information. Knowledge shortage, the other drawback of this
volume, refers to the lack of context needed to present information in a
meaningful way.
Business goals of integrating structured and unstructured information:
- Improve access to, integration of and management of information, and improve the quality of decision making
- Improve productivity and workflow, reduce cost and achieve greater output with fewer resources
- Build tighter relationships, reduce cycle time and improve customer service and support
- Outsource non-core competencies to suppliers and partners, and collaborate with them on an ongoing basis
With the emergence of technologies such as big data platforms and NoSQL
databases, which make it easier to analyze unstructured data, the future of the
data warehouse is headed towards deriving insights from the volumes of untapped
information lying dormant in unstructured data. Warehouses of the future will be
able to integrate unstructured and structured data with little or no human
intervention and then derive insights by combining a wide array of data sets.
Such systems will not only be fast and efficient but also highly configurable
and dynamic, in contrast to the systems widely used today.
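As a rough sketch of what combining the two kinds of data can look like, the Python example below joins order records (structured) with free-text customer feedback (unstructured). The records, the feedback text and the keyword lists are all hypothetical, and simple keyword counting stands in for the far richer text analytics a real warehouse would use.

```python
import re
from collections import Counter

# Structured side: hypothetical order records with fixed fields.
orders = [
    {"customer_id": 1, "product": "phone", "amount": 499.0},
    {"customer_id": 2, "product": "laptop", "amount": 1200.0},
    {"customer_id": 1, "product": "case", "amount": 19.0},
]

# Unstructured side: hypothetical free-text feedback keyed by customer.
feedback = {
    1: "The phone screen cracked within a week. Very disappointed, want a refund.",
    2: "Great laptop, fast delivery, very happy with the purchase.",
}

# Illustrative keyword lists, not a real sentiment model.
NEGATIVE = {"cracked", "disappointed", "refund", "broken"}
POSITIVE = {"great", "happy", "fast"}


def sentiment_score(text):
    """Count positive keywords minus negative keywords in a piece of free text."""
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    positive = sum(words[w] for w in POSITIVE)
    negative = sum(words[w] for w in NEGATIVE)
    return positive - negative


def customer_view(orders, feedback):
    """Combine total spend (structured) with a feedback score (unstructured)."""
    view = {}
    for order in orders:
        row = view.setdefault(
            order["customer_id"], {"total_spent": 0.0, "score": 0}
        )
        row["total_spent"] += order["amount"]
    for customer_id, text in feedback.items():
        if customer_id in view:
            view[customer_id]["score"] = sentiment_score(text)
    return view


combined = customer_view(orders, feedback)
for customer_id, row in sorted(combined.items()):
    print(customer_id, row)
```

Neither source alone shows that a customer who has spent a fair amount is also unhappy; the combined view does, which is the kind of insight the integration is meant to surface.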