Thursday, March 31, 2016

Presentation and Visualization Methods

A visualization method is a systematic, rule-based, external, permanent and graphical representation that depicts information in a way that is conducive to acquiring insights, developing an elaborate understanding or communicating experiences. Visualization is an emerging discipline and currently represents a highly unstructured domain of research. A recently presented research paper proposes a simple structure, inspired by the look, feel and logic of the periodic table, that gives a descriptive overview of the domain and can serve as an inventory or repository, much like a structured toolbox. A successful communicator not only conveys the message but also tailors it to the recipient, who should be able to understand the knowledge and use it for meaningful action.
The methodology can be divided into three steps.
1.       Identify potential candidates for visualization.
2.       Select the methods best suited for the job.
3.       Compile the selected methods in a logical and accessible way.
The visualization methods can be further classified based on the challenges and requirements as follows.
1.       Complexity of visualization
2.       Main application or content area
3.       Detail, overview, or detail and overview
4.       Reducing complexity or increasing it
5.       Process and structure
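The five dimensions above can be sketched as fields of a small catalog, in the spirit of the inventory or repository the paper describes. This is only an illustrative sketch: the class names, the low/medium/high complexity scale, and the way each sample method is classified are assumptions made for the example, not taken from the paper.

```python
from dataclasses import dataclass
from enum import Enum


class View(Enum):
    DETAIL = "detail"
    OVERVIEW = "overview"
    BOTH = "detail and overview"


class Focus(Enum):
    PROCESS = "process"
    STRUCTURE = "structure"


@dataclass(frozen=True)
class Method:
    name: str
    complexity: str           # assumed scale: "low", "medium", "high"
    area: str                 # main application or content area (free text)
    view: View                # detail, overview, or both
    reduces_complexity: bool  # True = reduces complexity, False = increases it
    focus: Focus              # process or structure


# Hypothetical entries; the classifications are illustrative guesses.
CATALOG = [
    Method("pie chart", "low", "quantitative data", View.OVERVIEW, True, Focus.STRUCTURE),
    Method("line chart", "low", "quantitative data", View.BOTH, True, Focus.PROCESS),
    Method("heat map", "medium", "quantitative data", View.OVERVIEW, True, Focus.STRUCTURE),
]


def find_methods(catalog, **criteria):
    """Return the methods whose attributes match every given criterion."""
    return [
        m for m in catalog
        if all(getattr(m, key) == value for key, value in criteria.items())
    ]


overview_methods = find_methods(CATALOG, view=View.OVERVIEW)
print([m.name for m in overview_methods])
```

Filtering by any combination of fields, for example `find_methods(CATALOG, focus=Focus.PROCESS, complexity="low")`, is one way such a catalog could help narrow down candidate methods.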




A data visualization should be not only precise, versatile and aesthetically pleasing but also effective and efficient.

Healthcare
The healthcare industry might not seem like one that makes much use of visualization, but with the changing dynamics of the industry the demand for it is on the rise. Healthcare facility managers must constantly keep track of the types of patients being treated in order to manage their facilities better and provide superior service. When it is important to highlight the majority share held by a particular type of patient, a pie chart is the most effective visualization method. Similarly, researchers working on healthcare topics often need to present numerical findings using a visualization method.
Education
As in the healthcare domain, visualizations in the education domain vary according to the purpose and the kind of data. The distribution of students across grades is best represented as a heat map, while a bar chart can be used to show rising trends in costs as well as in student engagement.

Financial Services
Managers in the financial services domain often use visualization to present data related to assets, liabilities, or trends in Forex and other factors. Depending on the kind of data being presented, a manager could effectively use something as simple as a pivot table to present the bank's status across parameters such as account, location and customer type. When the manager aims to show how the value of a particular stock has varied over time, a trend line is more appropriate.
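To make the pivot table idea concrete, here is a minimal sketch that summarizes account balances by location and customer type using only the Python standard library. The records, the field names and the `pivot_sum` helper are hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical account records: (location, customer_type, balance)
accounts = [
    ("North", "retail", 1200.0),
    ("North", "business", 8500.0),
    ("South", "retail", 640.0),
    ("South", "retail", 310.0),
    ("South", "business", 4100.0),
]


def pivot_sum(rows):
    """Sum balances into a {location: {customer_type: total}} table."""
    table = defaultdict(lambda: defaultdict(float))
    for location, customer_type, balance in rows:
        table[location][customer_type] += balance
    return {loc: dict(cols) for loc, cols in table.items()}


pivot = pivot_sum(accounts)
columns = sorted({ctype for _, ctype, _ in accounts})

# Print one row per location and one column per customer type.
print(f"{'location':<10}" + "".join(f"{c:>12}" for c in columns))
for location in sorted(pivot):
    cells = "".join(f"{pivot[location].get(c, 0.0):>12.2f}" for c in columns)
    print(f"{location:<10}{cells}")
```

Rows are locations and columns are customer types, which is the same cross-tabulation a spreadsheet pivot table would produce for this data.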


In conclusion, various visualization methods help present data effectively, and certain guidelines help in choosing among them; however, no particular method can be tied exclusively to a particular industry.


Thursday, March 3, 2016

Two Worlds of Data: Structured and Unstructured

As the world of computing continues to evolve from relatively small, unsophisticated systems to the massively complex and far-reaching systems of today, the data created in the process continues to grow alongside it. With the growing use of analytics, which draws on system data not only to evaluate performance but also for prediction and forecasting, interest in the data available in systems, including the internet, has increased, and this data has come to be classified into two categories: structured and unstructured. While the definition of data as ‘values or information pertaining to real-world objects’ remains the same, the classification focuses on the organization and structure used to store the data.

1.       Structured Data

Structured data, as the name suggests, is data that is easily organized and follows a fixed, agreed structure. It consists of data that resides in a fixed field within a data record. Data from Excel files or from a relational database are examples of structured data.

Examples of Structured Data
Machine Generated
Sensor Data - GPS data, manufacturing sensors, medical devices
Point-of-Sale Data - Credit card information, location of sale, product information
Call Detail Records - Time of call, caller and recipient information
Web Server Logs - Page requests, other server activity
Human Generated
Input Data - Data entered into a computer: age, zip code, gender, etc.

Structured data depends on a data model that defines what business data will be recorded and how it will be stored, processed and accessed. The data model specifies characteristics of each data field, such as its type (numeric, character, etc.), its size, and any restrictions on the values that can be stored in it. This defined structure makes structured data easy to store, access and analyze. Structured data is usually managed using Structured Query Language (SQL), originally developed at IBM and now supported by database systems from a wide range of vendors. Modern database systems offer a sophisticated array of features for both storing and analyzing data and have grown immensely compared to their predecessors.
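As a rough illustration of such a data model, the sketch below uses Python's built-in `sqlite3` module to declare field types, sizes and restrictions, and then queries the data with SQL. The `customer` table, its columns and the sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The data model: each field has a type, and some have size or value restrictions.
conn.execute(
    """
    CREATE TABLE customer (
        id       INTEGER PRIMARY KEY,
        name     TEXT    NOT NULL CHECK (length(name) <= 50),
        age      INTEGER CHECK (age BETWEEN 0 AND 130),
        zip_code TEXT    CHECK (length(zip_code) = 5)
    )
    """
)
conn.executemany(
    "INSERT INTO customer (name, age, zip_code) VALUES (?, ?, ?)",
    [("Ana", 34, "60601"), ("Ben", 51, "94105"), ("Cy", 27, "60601")],
)

# A row that violates a restriction is rejected by the database.
try:
    conn.execute(
        "INSERT INTO customer (name, age, zip_code) VALUES (?, ?, ?)",
        ("Dee", -5, "10001"),
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Because the structure is known in advance, analysis is a short query.
rows = conn.execute(
    "SELECT zip_code, COUNT(*), AVG(age) FROM customer "
    "GROUP BY zip_code ORDER BY zip_code"
).fetchall()
print(rejected, rows)
```

The constraints live in the schema rather than in application code, which is part of what makes structured data straightforward to store and analyze.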
Though it may seem that the world loves structure and that the majority of the data available today would belong to this category, the reality is quite the contrary. Structured data accounts for only 20% of the data available.

2.       Unstructured Data

As the name makes evident, unstructured data has little or no predefined structure. While it may seem illogical to store data without a predefined form, certain content, such as an email message, does not fit well into the rules of structured data.
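The email case can be sketched with Python's standard `email` module: the headers behave like fixed fields, while the body is free text that has to be analyzed before it yields anything usable. The message content and the simple word-count analysis below are hypothetical and are only one of many possible approaches.

```python
from collections import Counter
from email import message_from_string
import re

# A made-up message: headers first, then a blank line, then the free-text body.
raw = """\
From: customer@example.com
To: support@example.com
Subject: Late delivery

Hello, my order arrived late and the box was damaged.
Please refund the delivery fee. The delivery took two weeks.
"""

msg = message_from_string(raw)

# Headers can be read like named fields in a record.
sender = msg["From"]
subject = msg["Subject"]

# The body has no fields, so we fall back to text analysis:
# count the longer words to get a rough idea of the topic.
body = msg.get_payload()
words = re.findall(r"[a-z']+", body.lower())
top_words = Counter(w for w in words if len(w) > 4).most_common(3)

print(sender, subject, top_words)
```

Even this crude word count hints at what the message is about, which no fixed-field schema would capture on its own.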


Examples of Unstructured Data
Word documents, PDFs and other text files
Audio Files - Music and other recordings, such as customer service calls
Presentations - PowerPoint presentations and animations
Videos - Movies and other personal/educational videos
Images - Pictures, etc.
Messaging - Instant messages, text messages
In all these instances, the data can provide compelling insights. With the right tools, unstructured data can add a depth to data analysis that could not be achieved otherwise. A surprising fact revealed by analysis of these data categories is that unstructured data accounts for more than 79% of all data. The reasons include the exponential growth of the internet, social media and various unstructured systems.

Structured and Unstructured Data in Deriving Business Insights, and the Future

Studies have shown that the volume of corporate data doubles every year and that the public Web grows by over seven million pages per day. Although companies spend billions to manage information, it remains fragmented and poorly integrated. To bring coherence to this information and make it available to business users in a meaningful format, data warehousing began to be widely adopted.
While data warehouses are currently used to derive business insights from huge volumes of structured data, they still lack the capability to give the business user a clear picture. In addition to the structured and unstructured data present within an organization, a significant amount of information comes from outside it, such as news and analyst reports. This information is essential and relevant in formulating business strategies. Information glut and knowledge shortage make it essential for an organization to integrate its structured and unstructured information. Because of the sheer volume of structured and unstructured data, users should be able to filter out the relevant and important information in order to avoid information glut. The other drawback of this volume is knowledge shortage, which refers to the lack of context needed to present information in a meaningful way.
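A very small sketch of this kind of integration follows: free-text documents are filtered for relevance and then joined to structured customer records, so each surviving note arrives with its business context. The customer data, the documents, the keyword list and the function names are all hypothetical, and a real system would use far more capable text analysis than keyword matching.

```python
# Structured side: customer records keyed by id (hypothetical data).
customers = {
    101: {"name": "Acme Corp", "segment": "enterprise", "annual_revenue": 250_000},
    102: {"name": "Birch LLC", "segment": "small business", "annual_revenue": 18_000},
}

# Unstructured side: free-text documents, each tagged with a customer id.
documents = [
    (101, "Analyst report: Acme Corp plans to expand into new markets."),
    (102, "Support call: customer is unhappy and is considering cancelling."),
    (101, "Newsletter: office closed on Friday."),
]

# Assumed relevance rule: keep only text that mentions a churn-risk term.
RISK_TERMS = ("cancel", "unhappy", "complaint")


def relevant(text, terms=RISK_TERMS):
    """Filter step: reduce information glut by keeping only matching text."""
    lowered = text.lower()
    return any(term in lowered for term in terms)


def integrate(customers, documents):
    """Join step: attach structured context to each relevant document."""
    combined = []
    for customer_id, text in documents:
        if relevant(text) and customer_id in customers:
            record = dict(customers[customer_id])
            record["note"] = text
            combined.append(record)
    return combined


at_risk = integrate(customers, documents)
for record in at_risk:
    print(record["name"], record["segment"], "->", record["note"])
```

The filter addresses the glut by discarding irrelevant text, and the join supplies the context that the text alone lacks.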

Business goals of integrating structured and unstructured information:
  • Improve access to, integration of and management of information, and improve the quality of decision making
  • Improve productivity and workflow, reduce cost and achieve greater output with fewer resources
  • Build tighter relationships, reduce cycle time and improve customer service and support
  • Outsource non-core competencies to suppliers and partners, and collaborate with them on an ongoing basis


With the emergence of technologies such as big data and NoSQL, which make it easier to analyze unstructured data, the future of the data warehouse lies in deriving insights from the volumes of untapped information lying dormant in unstructured data. Warehouses of the future will be able to integrate unstructured and structured data with little or no human intervention and then derive insights by combining a wide array of data sets. Such systems will not only be fast and efficient but also highly configurable and dynamic, unlike the systems widely used today.

