Thursday, March 31, 2016

Presentation and Visualization Methods

A visualization method is a systematic, rule-based, external, permanent and graphical representation that depicts information in a way that is conducive to acquiring insights, developing an elaborate understanding or communicating experiences. Visualization is an emerging discipline and currently represents a highly unstructured domain of research. A recently presented research paper provides a simple structure, inspired by the look, feel and logic of the periodic table, that gives a descriptive overview of the domain and can serve as an inventory or repository, much like a structured toolbox. To be successful, a communicator needs not only to convey the message but also to tailor it to the recipient, who should be able to understand the knowledge and use it for meaningful action.
The methodology can be divided into three steps:
1.       Identifying potential candidates for visualization.
2.       Selecting the methods best suited to the job.
3.       Compiling the selected methods logically and accessibly.
The visualization methods can be further classified according to the following challenges and requirements:
1.       Complexity of the visualization
2.       Main application or content area
3.       Detail, overview, or detail and overview
4.       Reducing complexity or increasing it
5.       Process and structure
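The three steps above, combined with the classification criteria, can be sketched as filtering a small catalog of methods. This is a minimal illustration under stated assumptions: the catalog entries, their tags and the helper names (`identify_candidates`, `select_methods`, `compile_selection`) are hypothetical and are not taken from the periodic table paper.

```python
from dataclasses import dataclass

# Hypothetical catalog; the tags below are illustrative assumptions only.
@dataclass(frozen=True)
class Method:
    name: str
    complexity: str    # "low", "medium" or "high"
    content_area: str  # main application area, e.g. "data" or "strategy"
    view: str          # "detail", "overview" or "detail and overview"
    kind: str          # "process" or "structure"

CATALOG = [
    Method("pie chart", "low", "data", "overview", "structure"),
    Method("bar chart", "low", "data", "overview", "structure"),
    Method("heat map", "medium", "data", "detail and overview", "structure"),
    Method("line chart", "low", "data", "overview", "process"),
    Method("pivot table", "medium", "data", "detail", "structure"),
]

def identify_candidates(catalog, content_area):
    """Step 1: keep methods whose main application area matches the content."""
    return [m for m in catalog if m.content_area == content_area]

def select_methods(candidates, *, max_complexity, kind):
    """Step 2: keep candidates within the complexity budget that show the right kind."""
    order = {"low": 0, "medium": 1, "high": 2}
    return [m for m in candidates
            if order[m.complexity] <= order[max_complexity] and m.kind == kind]

def compile_selection(methods):
    """Step 3: present the selection in a predictable, accessible order."""
    return sorted(m.name for m in methods)

chosen = compile_selection(
    select_methods(identify_candidates(CATALOG, "data"), max_complexity="low", kind="structure")
)
print(chosen)  # ['bar chart', 'pie chart']
```

Keeping each step as its own function mirrors the methodology: the candidate list, the selection rules and the final compilation can each be changed independently.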




A data visualization should be not only precise, versatile and aesthetically pleasing but also effective and efficient.

Healthcare
The healthcare industry might not be considered one that makes heavy use of visualization, but with the industry's changing dynamics the demand for visualization is on the rise. Healthcare facility managers have to keep constant track of the types of patients being treated in order to manage their facilities better and provide superior service. Where it is important to highlight the majority share belonging to a particular type of patient, a pie chart is the most effective visualization method. Similarly, researchers working on healthcare topics need a visualization method to present findings based on numerical data.
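The numbers behind such a pie chart are simply each patient type's share of the total. A minimal sketch, assuming a hypothetical admissions log with made-up patient types and counts:

```python
from collections import Counter

# Hypothetical admissions log: one patient type per admitted patient.
admissions = ["outpatient"] * 62 + ["inpatient"] * 25 + ["emergency"] * 13

def pie_shares(labels):
    """Return each category's percentage of the whole (the slices of a pie chart), largest first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: 100 * n / total for label, n in counts.most_common()}

shares = pie_shares(admissions)
majority_type, majority_share = next(iter(shares.items()))
print(shares)         # {'outpatient': 62.0, 'inpatient': 25.0, 'emergency': 13.0}
print(majority_type)  # outpatient
```

If matplotlib is available, `plt.pie(list(shares.values()), labels=list(shares))` would draw the chart from these shares.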
Education
Similar to the healthcare domain, visualizations in the education domain vary according to the purpose and the kind of data. The distribution of students across grades is best represented as a heat map, while a bar chart can be used to show increasing trends in costs as well as in student engagement.
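A heat map encodes each cell of a count matrix as a color intensity. As a rough, dependency-free sketch, the snippet below uses a text shading scale in place of colors; the classes, grades and counts are all hypothetical.

```python
# Hypothetical counts of students per (class, letter grade).
grades = ["A", "B", "C", "D", "F"]
counts = {
    "Math":    [12, 30, 25, 8, 5],
    "Science": [18, 28, 20, 10, 4],
    "History": [25, 27, 15, 6, 2],
}

SHADES = " .:-=+*#"  # light to dark: a text stand-in for a color scale

def text_heat_map(rows, columns):
    """Render a count matrix with darker characters for larger counts."""
    peak = max(max(values) for values in rows.values())
    lines = [" " * 8 + " ".join(columns)]
    for label, values in rows.items():
        # Scale each count to an index from 0 to len(SHADES) - 1.
        cells = [SHADES[round(v / peak * (len(SHADES) - 1))] for v in values]
        lines.append(f"{label:<8}" + " ".join(cells))
    return "\n".join(lines)

print(text_heat_map(counts, grades))
```

With a plotting library the same matrix would be passed to an image or heat map function, and the shading scale would become a color bar.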

Financial Services
Managers in the financial services domain often use visualization to present data related to assets, liabilities, foreign exchange (Forex) trends and other factors. Depending on the kind of data being presented, a manager could effectively use something as simple as a pivot table to present the bank's status across parameters such as account, location and customer type. When the manager aims to present how the value of a particular stock has varied over time, a trend line is more appropriate.
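Both techniques mentioned above reduce to short computations: a pivot table sums a value for each combination of two fields, and a trend line is commonly an ordinary least-squares fit. A minimal sketch; the account records and stock prices are hypothetical.

```python
from collections import defaultdict

# Hypothetical account records: (account type, location, customer type, balance).
records = [
    ("savings",  "Chicago", "retail",   1200.0),
    ("checking", "Chicago", "retail",    800.0),
    ("savings",  "Boston",  "business", 5000.0),
    ("savings",  "Chicago", "business", 3000.0),
    ("checking", "Boston",  "retail",    400.0),
]

def pivot(rows, row_field, col_field, value_field):
    """Sum value_field for every (row_field, col_field) pair, like a spreadsheet pivot table."""
    table = defaultdict(float)
    for r in rows:
        table[(r[row_field], r[col_field])] += r[value_field]
    return dict(table)

# Rows = account type, columns = location, values = total balance.
funds = pivot(records, row_field=0, col_field=1, value_field=3)
print(funds[("savings", "Chicago")])  # 4200.0

def trend_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical closing prices of one stock over five trading days.
days = [1, 2, 3, 4, 5]
closing_prices = [10.0, 10.5, 10.9, 11.6, 12.0]
slope, intercept = trend_line(days, closing_prices)
print(f"price ~ {slope:.2f} * day + {intercept:.2f}")  # price ~ 0.51 * day + 9.47
```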


In conclusion, various visualization methods help present data effectively, and certain guidelines help in choosing among them; however, no particular method can be associated exclusively with a particular industry.


Thursday, March 3, 2016

Two Worlds of Data: Structured and Unstructured

As computing has evolved from relatively small, unsophisticated systems to the massively complex and immensely broad landscape we have today, the data created along the way has grown alongside it. Analytics is increasingly used not only to evaluate performance but also for prediction and forecasting, and the resulting interest in the data held in systems, including the internet, has led to this data being classified into two categories: structured and unstructured. While the definition of data as ‘values or information pertaining to real-world objects’ remains the same, the classification focuses on the organization and structure used to store it.

1.       Structured Data

Structured data, as the name suggests, refers to data that is easily organized and follows a fixed, agreed-upon structure. It comprises data that resides in fixed fields within a data record. Data from Excel files or from a relational database are examples of structured data.

Examples of Structured Data
Machine Generated
Sensor Data - GPS data, manufacturing sensors, medical devices
Point-of-Sale Data - Credit card information, location of sale, product information
Call Detail Records - Time of call, caller and recipient information
Web Server Logs - Page requests, other server activity
Human Generated
Input Data – Data entered into a computer: age, zip code, gender, etc.

Structured data depends on creating a data model that defines what business data will be recorded and how it will be stored, processed and accessed. The data model includes the characteristics of each data field, such as its type (e.g. numeric or character), its size and any restrictions on the data that may be stored in it. Such a defined structure makes structured data easy to store, access and analyze. Structured data is usually managed using Structured Query Language (SQL), originally developed at IBM and now supported by database systems from a wide range of vendors. Modern database systems offer a sophisticated array of features for both storing and analyzing data and have grown immensely compared to their predecessors.
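A data model of this kind can be expressed directly in SQL. The sketch below uses Python's built-in `sqlite3` module with a hypothetical `customer` table built from the human-generated input fields listed earlier (age, zip code, gender); the column types and CHECK constraints play the role of the field restrictions, and the specific limits are assumptions for illustration.

```python
import sqlite3

# Hypothetical data model: each field has a declared type and restrictions.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        age         INTEGER NOT NULL CHECK (age BETWEEN 0 AND 130),
        zip_code    TEXT    NOT NULL CHECK (length(zip_code) = 5),
        gender      TEXT    CHECK (gender IN ('F', 'M', 'X'))
    )
""")
conn.executemany(
    "INSERT INTO customer (age, zip_code, gender) VALUES (?, ?, ?)",
    [(34, "60601", "F"), (51, "02110", "M"), (27, "60601", "X")],
)

# A row that violates the model (a four-digit zip code) is rejected, not stored.
try:
    conn.execute("INSERT INTO customer (age, zip_code, gender) VALUES (?, ?, ?)",
                 (40, "6060", "F"))
except sqlite3.IntegrityError as err:
    print("rejected:", err)

# Because every value sits in a known field, analysis is a short query.
avg_age_by_zip = conn.execute(
    "SELECT zip_code, AVG(age) FROM customer GROUP BY zip_code ORDER BY zip_code"
).fetchall()
print(avg_age_by_zip)  # [('02110', 51.0), ('60601', 30.5)]
```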
Though it may seem that the world loves structure and that the majority of data available today belongs to this category, the reality is quite the contrary: structured data accounts for only 20% of the available data.

2.       Unstructured Data

As the name suggests, unstructured data has little or no predefined structure. Although storing data without a predefined form may seem illogical, certain kinds of content, such as an email message, do not fit well into the rules of structured data.
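An email shows both worlds in one item: its headers follow a fixed format that maps onto fields of a record, while its body is free text. A minimal sketch using Python's standard `email` package; the message itself is hypothetical.

```python
from email import message_from_string
from email.policy import default

# A hypothetical message: the headers follow a fixed format, the body does not.
raw = """\
From: customer@example.com
To: support@example.com
Subject: Problem with my last statement
Date: Thu, 03 Mar 2016 09:15:00 -0500

Hi, the fee on my March statement looks wrong and I was also charged twice
for the wire transfer. Could someone call me back this week?
"""

msg = message_from_string(raw, policy=default)

# Header fields map cleanly onto columns of a structured record...
record = {"sender": msg["From"], "subject": msg["Subject"], "sent": msg["Date"].datetime}
# ...but the body is free text with no predefined fields.
body = msg.get_content()

print(record["subject"])  # Problem with my last statement
print(len(body.split()))  # word count of the free-text body
```

Extracting meaning from `body` (that this is a complaint about a duplicate fee) is exactly the part that needs the additional tools discussed below.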


Examples of Unstructured Data
Word Documents, PDFs and Other Text Files
Audio Files – Music and other recordings, such as customer service calls
Presentations – PowerPoint presentation and animations
Videos – Movies and other personal/ educational videos
Images – Pictures etc.
Messaging - Instant messages, text messages
In all these instances, the data can provide compelling insights. With the right tools, unstructured data can add a depth to data analysis that could not be achieved otherwise. A surprising finding from the analysis of these data categories is that unstructured data accounts for more than 79% of all data. The reasons include the exponential growth of the internet, social media and various other unstructured systems.

Structured and Unstructured Data in Deriving Business Insights, and the Future

Studies have shown that the volume of corporate data doubles every year and that the public Web grows by over seven million pages per day. Although companies spend billions to manage information, it remains fragmented and poorly integrated. To add coherence and make this information available to business users in a meaningful format, organizations began to adopt data warehousing widely.
Data warehouses are currently used to derive business insights from huge volumes of structured data, but they still lack the capability to give the business user a clear picture. In addition to the structured and unstructured data within an organization, a significant amount of information comes from outside it, such as news and analyst reports. This information is essential and relevant when formulating business strategy. Information glut and knowledge shortage make it essential for an organization to integrate its structured and unstructured information. Because of the sheer volume of both kinds of data, users must be able to filter for relevant and important information to avoid information glut. The other drawback of this volume is knowledge shortage, which refers to the lack of context needed to present information in a meaningful way.
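One very simple way to combine the two kinds of data and filter out the glut is to link each unstructured note to the structured record it mentions and keep only notes that match terms of interest. This is a minimal sketch: the records, notes and keywords are hypothetical, and plain keyword matching stands in for the richer text analytics a real system would use.

```python
# Hypothetical structured records, e.g. rows from a warehouse table...
customers = {
    101: {"name": "Acme Corp", "segment": "business", "balance": 250000.0},
    102: {"name": "J. Smith", "segment": "retail", "balance": 4200.0},
}
# ...and hypothetical unstructured notes, each tagged with a customer id.
notes = [
    (101, "Call center: customer asked about raising the credit line."),
    (102, "Email: complaint about a duplicate wire transfer fee."),
    (101, "Analyst report: sector outlook is stable."),
]

def relevant_view(customers, notes, keywords):
    """Join each note to its structured record, keeping only notes that mention a keyword."""
    view = []
    for customer_id, text in notes:
        if any(k in text.lower() for k in keywords):
            view.append({**customers[customer_id], "id": customer_id, "note": text})
    return view

for row in relevant_view(customers, notes, keywords=("complaint", "fee")):
    print(row["id"], row["segment"], row["note"])
```

The joined rows carry the structured context (segment, balance) alongside the note, which addresses the lack of context described above.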

Business goals of integrating structured and unstructured information:
  • Improved access to, integration of and management of information, and better-quality decision making
  • Improved productivity and workflow, reduced costs and greater output with fewer resources
  • Tighter relationships, reduced cycle times and improved customer service and support
  • Outsourcing and leveraging of non-core competencies through suppliers and partners, with ongoing collaboration


With the emergence of technologies such as big data and NoSQL, which make unstructured data easier to analyze, the future of data warehousing is headed towards deriving insights from the volumes of untapped information that lie dormant in unstructured data. Warehouses of the future will be able to integrate unstructured and structured data easily, with little or no human intervention, and derive insights by combining a wide array of data sets. Such systems will be not only fast and efficient but also highly configurable and dynamic, unlike the systems widely used today.



Thursday, February 18, 2016

Dimensional Modelling in the Retail Banking Industry


Retail banking, also known as consumer banking, provides services such as savings and checking accounts, mortgages, personal loans and debit/credit cards to individual customers through local branches. Banks facilitate transfers of funds between accounts, currency conversion, automatic transfers between accounts, bill payments and more. The major source of income for banks is the difference between the interest customers pay on loans and the interest banks pay customers on funds held in savings accounts. Banks also earn revenue from monthly maintenance fees, conversion charges and fees for banking activities such as SWIFT transfers. Banks are regulated by a federal agency and must abide by the rules and regulations it lays down, which include maintaining a certain percentage of liquid funds at all times and reporting fraudulent transactions and customers.

As discussed above, banks earn their bread and butter by lending money to borrowers and paying part of the proceeds to customers who maintain funds in savings accounts. One of the important business metrics a CEO would look at when evaluating a bank's performance is the total funds per product in a financial quarter. The CEO would also be interested in the number of accounts under each product and the number of active customers for the quarter. Such data can be used to analyze how funds are distributed across products and to help implement a strategy for maintaining liquidity. Comparing the number of accounts under each product and the number of customers across quarters provides quarter-over-quarter (QoQ) statistics that can help business teams design products and promotions to grow the customer base for a particular product or service.
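The metrics described above can be computed from quarter-end account rows. A minimal sketch under stated assumptions: the rows are hypothetical, and an "active customer" is taken here to mean any customer holding at least one account in that quarter's data.

```python
from collections import defaultdict

# Hypothetical quarter-end rows: (quarter, product, account_id, customer_id, balance).
rows = [
    ("2015Q3", "savings",  1, "c1", 1000.0),
    ("2015Q3", "savings",  2, "c2", 3000.0),
    ("2015Q3", "checking", 3, "c1",  500.0),
    ("2015Q4", "savings",  1, "c1", 1500.0),
    ("2015Q4", "savings",  2, "c2", 2500.0),
    ("2015Q4", "savings",  4, "c3", 2000.0),
    ("2015Q4", "checking", 3, "c1",  700.0),
]

def quarterly_metrics(rows):
    """Total funds and account count per (quarter, product); active customers per quarter."""
    funds, accounts, customers = defaultdict(float), defaultdict(set), defaultdict(set)
    for quarter, product, account_id, customer_id, balance in rows:
        funds[(quarter, product)] += balance
        accounts[(quarter, product)].add(account_id)
        customers[quarter].add(customer_id)
    n_accounts = {key: len(ids) for key, ids in accounts.items()}
    n_customers = {quarter: len(ids) for quarter, ids in customers.items()}
    return dict(funds), n_accounts, n_customers

def qoq_change(current, previous):
    """Quarter-over-quarter change as a percentage of the previous quarter."""
    return 100 * (current - previous) / previous

funds, n_accounts, n_customers = quarterly_metrics(rows)
print(funds[("2015Q4", "savings")])  # 6000.0
print(qoq_change(n_accounts[("2015Q4", "savings")],
                 n_accounts[("2015Q3", "savings")]))  # 50.0
print(n_customers)  # {'2015Q3': 2, '2015Q4': 3}
```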

The banking industry is becoming increasingly dependent on information technology to retain its competitive edge and adapt to changing market conditions. Because of the sheer volume of transactions that take place in a bank every day, enormous amounts of data are produced. Yet most of this data, which could be used to gather strategic information, remains locked in archival systems. Dimensional modelling of such information can be used to generate reports that corporate heads can use when making strategic decisions, as well as reports for compliance purposes. The lack of consistent data has restricted the use of model-based decision making.

Dimensional modelling can help business processes in the following ways:
1.       Collate data from multiple sources and create a single consistent view.
2.       Support quick ad hoc queries that answer real business questions.
3.       Help maintain flexibility and scalability.
4.       Optimize the end-to-end user experience by encapsulating the underlying model.

Considering the metrics described earlier, namely quarterly information on funds across products and customer segments, a periodic snapshot fact table would be the appropriate choice. A snapshot of the balance of every account, across products and customers, is loaded at the end of each quarter. This information can then be used to generate reports of the total funds under each product or customer segment for a quarter.

A sample dimensional model is shown below.




The dimensional model shown above provides one way in which a model can be created to provide quick statistics for decision making.
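As a text-only companion to the model, the periodic snapshot fact table can be sketched as a small star schema in SQLite. All table names, column names and rows below are hypothetical and are not taken from the model pictured above; the fact table's grain is one row per account per quarter-end.

```python
import sqlite3

# Hypothetical star schema for a quarterly periodic snapshot.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_quarter  (quarter_key INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE dim_product  (product_key INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, segment TEXT);
    -- Grain: one row per account per quarter-end.
    CREATE TABLE fact_account_snapshot (
        quarter_key  INTEGER REFERENCES dim_quarter,
        product_key  INTEGER REFERENCES dim_product,
        customer_key INTEGER REFERENCES dim_customer,
        account_id   INTEGER,
        balance      REAL,
        PRIMARY KEY (quarter_key, account_id)
    );
    INSERT INTO dim_quarter  VALUES (1, '2015Q4'), (2, '2016Q1');
    INSERT INTO dim_product  VALUES (1, 'savings'), (2, 'checking');
    INSERT INTO dim_customer VALUES (1, 'retail'), (2, 'business');
    INSERT INTO fact_account_snapshot VALUES
        (1, 1, 1, 10, 1000.0), (1, 2, 1, 11, 300.0), (1, 1, 2, 12, 5000.0),
        (2, 1, 1, 10, 1200.0), (2, 2, 1, 11, 250.0), (2, 1, 2, 12, 5500.0);
""")

# Total funds and number of accounts per product for each quarter.
report = conn.execute("""
    SELECT q.label, p.name, SUM(f.balance), COUNT(*)
    FROM fact_account_snapshot AS f
    JOIN dim_quarter AS q ON q.quarter_key = f.quarter_key
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY q.label, p.name
    ORDER BY q.label, p.name
""").fetchall()
for row in report:
    print(row)
```

Snapshot balances are semi-additive: summing them across accounts within one quarter is meaningful, but summing the same account's balance across quarters is not, which is why the report groups by quarter. Replacing `dim_product` with `dim_customer` in the join gives the per-segment report.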

Thursday, February 4, 2016

Business Intelligence & Analysis Products Scan & Evaluation


Over the last decade, as businesses have moved from traditional bookkeeping methods to more sophisticated digital media, enterprises have become flooded with data. Data about customers, suppliers, partners, competitors and more, spanning long periods of time, is readily available to modern decision makers. However, scanning through such vast amounts of data is a mammoth, time-consuming task. To turn this data into actionable information, enterprises are turning to BI and analytics tools. Business Intelligence (BI) is a technology-driven process that turns data into information that can help managers make critical business decisions. BI encompasses a variety of tools, applications and methodologies that help organizations collect data from various internal and external sources in a variety of formats, transform it, run queries, and prepare dashboards and visualizations that support managers' decision making.
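The collect, transform and query cycle described above can be sketched with the Python standard library. This is a minimal illustration, assuming two hypothetical sources in different formats (an internal CSV export and an external JSON feed) whose field names are invented for the example.

```python
import csv
import io
import json
import sqlite3

# Two hypothetical sources in different formats.
internal_csv = "region,sales\nNorth,1200\nSouth,800\n"
external_json = '[{"area": "north", "competitor_sales": 950}, {"area": "south", "competitor_sales": 1100}]'

# Collect (extract) from each source.
internal = list(csv.DictReader(io.StringIO(internal_csv)))
external = json.loads(external_json)

# Transform: agree on one key format and on numeric types.
rows = {r["region"].lower(): {"sales": int(r["sales"])} for r in internal}
for e in external:
    rows.setdefault(e["area"].lower(), {})["competitor_sales"] = int(e["competitor_sales"])

# Load into a queryable store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE region_sales (region TEXT PRIMARY KEY, sales INTEGER, competitor_sales INTEGER)")
conn.executemany(
    "INSERT INTO region_sales VALUES (?, ?, ?)",
    [(region, v.get("sales"), v.get("competitor_sales")) for region, v in rows.items()],
)

# Query: the kind of figure a dashboard tile might show.
behind = conn.execute(
    "SELECT region FROM region_sales WHERE sales < competitor_sales ORDER BY region"
).fetchall()
print(behind)  # [('south',)]
```

BI tools automate and scale each of these stages; the sketch only shows why the transform step (reconciling "North" with "north") matters before any query is meaningful.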

While the BI tools market can be considered mature, it is constantly evolving to satisfy the changing analytics needs of today’s enterprises. Over the past ten years, BI needs have shifted from IT-authored production reports pushed out to users towards interactive analytics and insights from advanced analytics that do not require IT or data science skills. Vendors are working hard to meet these requirements, which has resulted in a wide array of products offering a wide variety of features. Unfortunately, no single product fits the requirements of every industry, and the choice of a BI tool should be based not on the features offered but on the analytics that users require and that the enterprise will actually use.

Based on the capabilities provided, BI tools can be grouped into three broad categories.
1.        Guided Analysis and Reporting: This category includes tools that have traditionally been used to perform recurring analysis on specific data. It was earlier limited to static reports but has evolved to include functionality that lets users filter, compare, visualize and analyze data. The defining characteristic of this group is that the analysis may vary with the customer's needs while the data set and metrics remain predefined. The IT team generally creates the tools and reports for end users and is responsible for managing the underlying data and tools on an ongoing basis.

2.       Self-service BI and analysis: This group consists mainly of BI tools that business users use to perform ad hoc analysis. The analysis is usually either one-time or recurring and shared with other users. The users of these tools are both consumers and producers of analytics. These tools let users add data while performing analysis without IT intervention. Though most data sources can be consumed by these tools, a few sources may not be supported. Users must also understand the data source to use the tool effectively.

3.        Advanced Analytics: These tools are used by data scientists to create predictive and prescriptive analytical models. Predictive analytics, statistical modeling, data mining and big data analytics software fall into this category. The majority of the time is spent on data ingestion, integration and cleansing.

BI Category and style
The success of a BI project depends immensely on selecting the right BI tools for your enterprise needs. Key data or analytical characteristics like data sources, performance measures, recurring vs one-time analysis, visual analysis, spreadsheet usage, business knowledge of data and business analytical skills can be used to create use cases that can help select the appropriate BI tool for an enterprise.
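As an illustration of turning such characteristics into a decision, the sketch below maps a use case to one of the three categories described above. The rules are a simplified, hypothetical distillation of those category descriptions, not a standard selection method, and the field and case names are invented for the example.

```python
from dataclasses import dataclass

@dataclass
class UseCase:
    # A subset of the characteristics listed above; field names are hypothetical.
    name: str
    recurring: bool           # recurring vs one-time analysis
    predefined_metrics: bool  # data set and metrics fixed in advance
    users_add_data: bool      # business users bring their own data sources
    predictive: bool          # predictive or prescriptive modelling required

def suggest_category(uc: UseCase) -> str:
    """Simplified rules distilled from the three category descriptions."""
    if uc.predictive:
        return "Advanced Analytics"
    if uc.users_add_data or not uc.recurring:
        return "Self-service BI and analysis"
    if uc.predefined_metrics:
        return "Guided Analysis and Reporting"
    return "Undecided: gather more requirements"

cases = [
    UseCase("Monthly branch performance pack", True, True, False, False),
    UseCase("One-off campaign deep dive", False, False, True, False),
    UseCase("Churn prediction", True, False, False, True),
]
for uc in cases:
    print(f"{uc.name}: {suggest_category(uc)}")
```

The order of the checks encodes a priority: a need for predictive models dominates, and any need for users to bring their own data points away from IT-managed guided reporting.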

For the purpose of comparison, I have selected the following BI tools, which are among the leaders in Gartner's Magic Quadrant for 2015.
IBM Cognos: A web-based integrated business intelligence suite from IBM that provides a rich toolset for ad hoc querying, report and dashboard authoring and consumption, OLAP, scorecarding, production reporting, scheduling, alerting, data discovery and mobile. IBM has displayed a compelling vision for the future with innovations such as Watson Analytics, making Cognos a sought-after product.

Microsoft BI: Microsoft Power BI is a collection of online services and features that enables users to find and visualize data, share discoveries and collaborate in intuitive new ways. Developed by Microsoft, it can seamlessly combine existing enterprise data, external data and unstructured big data. It supports a diverse range of centralized and decentralized BI use cases and analytic needs for its large customer base.

Microstrategy: MicroStrategy, Inc. is a provider of business intelligence (BI), mobile software, and cloud-based services. The company is based in the Washington, D.C. area and serves companies and organizations worldwide. Founded in 1989 by Michael J. Saylor and Sanju Bansal, the firm develops software to analyze internal and external data in order to make business decisions and to develop mobile apps. The software can be deployed in companies' data centers, or as cloud services.

Oracle BI: Oracle BI offers a modern analytics platform powered by advanced analytics and exceptional visualization capabilities. Its products range from hardware to software platforms and include Oracle BI Foundation Suite, more than 80 prebuilt BI applications, Oracle Endeca Information Discovery and Oracle Essbase, most of which are available on the Oracle Exalytics Engineered System.

Tableau: Tableau Software is an American company headquartered in Seattle, Washington, and a provider of rich data visualization tools. Tableau Software helps people see and understand data. Offering a revolutionary new approach to business intelligence, Tableau allows users to quickly connect, visualize, and share data with a seamless experience from the PC to the iPad.

The parameters used for rating were as follows:
1.        Capabilities: The functionalities and features offered by the product.
2.       Performance: The hardware and environment requirements of the product.
3.       Scalability: The measure of how well a product scales.
4.      Productivity: Support provided by the platform for productive work.
5.       Value benefit: The value offered by the product in comparison with the price at which it is offered to the customer.

Weighted Analysis of the products

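| Criterion      | Weight | IBM Cognos | Microsoft BI | MicroStrategy | Oracle BI | Tableau |
|----------------|--------|------------|--------------|---------------|-----------|---------|
| Capability     | 40%    | 5          | 4            | 4.5           | 4.5       | 3.5     |
| Performance    | 20%    | 4          | 4            | 3.5           | 4         | 4       |
| Scalability    | 15%    | 4          | 4            | 4             | 3.5       | 3       |
| Productivity   | 15%    | 3.5        | 3.5          | 3.5           | 4         | 4.5     |
| Value benefit  | 10%    | 3.5        | 4.5          | 4             | 3         | 3.5     |
| Weighted score | 100%   | 4.275      | 3.975        | 4.025         | 4.025     | 3.675   |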
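Each weighted score in the analysis is the sum of weight times rating over the five criteria; for IBM Cognos, 0.40 × 5 + 0.20 × 4 + 0.15 × 4 + 0.15 × 3.5 + 0.10 × 3.5 = 4.275. The sketch below reproduces all five scores from the weights and ratings given in the analysis; only the variable and function names are new.

```python
import math

# Criteria weights and ratings as given in the weighted analysis.
WEIGHTS = {"Capability": 0.40, "Performance": 0.20, "Scalability": 0.15,
           "Productivity": 0.15, "Value benefit": 0.10}
RATINGS = {  # ratings listed in the same order as WEIGHTS
    "IBM Cognos":    [5.0, 4.0, 4.0, 3.5, 3.5],
    "Microsoft BI":  [4.0, 4.0, 4.0, 3.5, 4.5],
    "MicroStrategy": [4.5, 3.5, 4.0, 3.5, 4.0],
    "Oracle BI":     [4.5, 4.0, 3.5, 4.0, 3.0],
    "Tableau":       [3.5, 4.0, 3.0, 4.5, 3.5],
}

def weighted_score(ratings, weights=WEIGHTS):
    """Sum of weight * rating over all criteria; the weights must add up to 100%."""
    if not math.isclose(sum(weights.values()), 1.0):
        raise ValueError("weights must sum to 1.0")
    return sum(weights[c] * r for c, r in zip(weights, ratings))

scores = {product: round(weighted_score(r), 3) for product, r in RATINGS.items()}
for product, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{product:<14} {score:.3f}")
```

Keeping the weights in one place makes it easy to check how sensitive the ranking is to them; under the current weights MicroStrategy and Oracle BI are tied at 4.025.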