Master Information Systems

An information system is a group of related components that work together to achieve a specified goal. The components gather, process and distribute data and information. Examples of information systems include the following:

  • Systems used by insurance companies to store and process information about their customers.
  • ATMs, and the point-of-sale (POS) terminals used by shop assistants in grocery stores.
  • Systems used by a company’s Human Resources department to store and process information about its employees.


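An information system performs the following activities:

  • Input: Capturing raw data.
  • Processing: Transforming raw data into useful information.
  • Output: Delivering the processed information to the people or activities that use it.
  • Feedback/Control: Returning output to the system so that the processing or inputs can be evaluated and improved.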

Data can be either qualitative or quantitative. Processing data involves classifying, sorting and aggregating it, performing calculations on it, and then selecting the data that is required.
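A minimal Python sketch of these processing steps follows. It runs on a small set of hypothetical sales records. The `raw` data, its field layout and the `process` function are illustrative assumptions and do not come from any particular system.

```python
from collections import defaultdict

# Hypothetical raw data: (region, product, sale amount)
raw = [
    ("North", "Laptop", 1200.0),
    ("South", "Phone", 650.0),
    ("North", "Phone", 700.0),
    ("South", "Laptop", 1100.0),
    ("North", "Tablet", 300.0),
]

def process(records, min_total):
    # Classify: group each sale amount under its region
    by_region = defaultdict(list)
    for region, _product, amount in records:
        by_region[region].append(amount)
    # Aggregate and calculate: total and average sale per region
    summary = {
        region: {"total": sum(amounts), "average": sum(amounts) / len(amounts)}
        for region, amounts in by_region.items()
    }
    # Sort: regions by total sales, largest first
    ranked = sorted(summary.items(), key=lambda item: item[1]["total"], reverse=True)
    # Select: keep only the regions whose total meets the threshold
    return [(region, stats) for region, stats in ranked if stats["total"] >= min_total]

for region, stats in process(raw, min_total=1000):
    print(region, stats)
```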

Information is the final output of the process: data that has meaning in the context in which it was processed.

Good information should have the following qualities:

  • Relevant
  • Complete
  • Accurate
  • Clear
  • Consistent
  • Reliable
  • Communicable to the right person
  • Of a manageable volume
  • Timely
  • Provided at a cost lower than the value of its benefits

Information is considered to have three dimensions:

  • Time – timeliness, currency, frequency, time period
  • Content – accuracy, relevance, completeness, conciseness, scope
  • Form – clarity, detail, order, presentation, media

 

Information systems can support decision making. Diagrams can be used to present a decision in a structured way and to ensure that the decision rules are followed. One such diagram is the decision tree shown below.

[Figure: decision tree]
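The original figure is not reproduced here, so the sketch below uses a hypothetical loan-approval decision tree instead. It shows how such a tree maps onto code: each condition is a decision node and each returned value is a leaf. The thresholds and outcomes are illustrative assumptions. Because every rule is explicit, this kind of tree suits the structured decisions described below.

```python
def loan_decision(income: float, credit_score: int, has_collateral: bool) -> str:
    """Hypothetical decision tree: each if/else is a decision node, each return a leaf."""
    # Root node: is the applicant's credit score acceptable?
    if credit_score >= 650:
        # Sufficient income leads straight to approval
        if income >= 30_000:
            return "approve"
        # Lower income: the outcome depends on collateral
        return "approve" if has_collateral else "refer to manager"
    # Low credit score: only collateral keeps the application alive
    if has_collateral:
        return "refer to manager"
    return "reject"

print(loan_decision(income=25_000, credit_score=700, has_collateral=True))
```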

Decision behaviour describes how people make decisions and the factors that influence them. Decisions are commonly divided into two types:

  • Structured Decisions
  • Unstructured Decisions

A structured decision tends to involve situations where the rules and constraints influencing the decision are known. Such decisions are usually routine and uncomplicated.

An unstructured decision tends to involve more complex situations, where the rules influencing the decision are complicated or unknown. They usually occur infrequently and rely on the experience of the decision maker.

In information systems, diagrams are used to present the decision in a structured way and to ensure that its rules are defined correctly.

 

 

 

 

Business Intelligence

Business intelligence (BI) is the consolidated information about a business’s customers, competitors, partners, competitive environment and internal operations that enables the business to make efficient and significant tactical and strategic decisions.

Big data refers to very large volumes of unstructured and semi-structured data from the web, sensors, the stock market, social media and similar sources. Big data attracts great interest because it can reveal more patterns and interesting anomalies than smaller volumes of data. It has the potential to provide new insights into areas such as financial market activity, weather patterns, consumer behaviour and tidal movements. To obtain value from big data, organisations need new tools that can work with non-traditional data alongside traditional data.

These tools include the following:

  • Data warehouses
  • Data marts
  • Hadoop
  • In-memory computing
  • Analytical platforms


Data warehouse: Stores current and historical data in a standardised form, and provides analysis and reporting tools.

Data marts: A subset of the data warehouse’s data that focuses on a single subject or line of business.
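Below is a minimal sketch using Python’s built-in `sqlite3` module. It assumes a hypothetical `sales_fact` warehouse table and models the data mart as a view restricted to one line of business. In practice a data mart is often a separate physical database, so the view is a simplification.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical warehouse table holding several lines of business
conn.execute(
    "CREATE TABLE sales_fact (sale_date TEXT, business_line TEXT, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?, ?)",
    [
        ("2023-01-15", "Insurance", "North", 500.0),
        ("2023-01-20", "Banking", "North", 300.0),
        ("2023-02-03", "Insurance", "South", 450.0),
        ("2023-02-10", "Banking", "South", 700.0),
    ],
)
# Data mart: a subset of the warehouse focused on one line of business
conn.execute(
    "CREATE VIEW insurance_mart AS "
    "SELECT sale_date, region, amount FROM sales_fact WHERE business_line = 'Insurance'"
)

rows = conn.execute(
    "SELECT region, SUM(amount) FROM insurance_mart GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('North', 500.0), ('South', 450.0)]
```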

Hadoop: Provides parallel processing of big data across clusters of inexpensive computers. Its main components are:

  • The Hadoop Distributed File System (HDFS), which stores data across the cluster.
  • MapReduce, which breaks a job into smaller tasks that are processed in parallel.
  • HBase, a NoSQL database.
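The MapReduce idea can be sketched in plain Python with a word count. This is not the Hadoop API, and the function names are hypothetical. Everything here runs sequentially in one process, whereas Hadoop would run the map and reduce tasks in parallel on different nodes.

```python
from collections import defaultdict
from itertools import chain

def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(pairs):
    """Shuffle: group together all values emitted for the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the grouped values into one result per key."""
    return key, sum(values)

# Each string stands in for a block of a file stored on a different node
chunks = ["big data needs big tools", "data tools for data"]
mapped = chain.from_iterable(map_phase(c) for c in chunks)
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```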

In-memory computing: Stores data in main memory (RAM) rather than on disk so that it can be retrieved much faster. This can cut processing times from hours or days to seconds.

Analytical platforms: High-speed platforms that use both relational and non-relational technology to analyse large data sets. One analytical tool is OLAP (Online Analytical Processing), which has the following capabilities:

  • Supports multidimensional analysis of data.
  • Lets users view the same data along multiple dimensions (for example product, region and time).
  • Provides fast online answers to ad hoc queries.
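The sketch below shows multidimensional analysis over hypothetical sales facts with three dimensions (product, region, quarter). The `aggregate` helper is an illustrative assumption, not an OLAP product’s API. Grouping by fewer dimensions rolls the data up, and filtering on one dimension takes a slice of the cube.

```python
from collections import defaultdict

DIMS = ("product", "region", "quarter")
# Hypothetical fact rows: product, region, quarter, units sold
facts = [
    ("Nuts", "East", "Q1", 50),
    ("Nuts", "West", "Q1", 60),
    ("Bolts", "East", "Q1", 40),
    ("Bolts", "West", "Q2", 70),
    ("Nuts", "East", "Q2", 30),
]

def aggregate(rows, by, where=None):
    """Sum units grouped by the dimensions in `by`, after filtering on `where`."""
    where = where or {}
    totals = defaultdict(int)
    for *coords, units in rows:
        record = dict(zip(DIMS, coords))
        # Slice: skip rows that do not match the fixed dimension values
        if all(record[dim] == value for dim, value in where.items()):
            totals[tuple(record[dim] for dim in by)] += units
    return dict(totals)

print(aggregate(facts, by=("product",)))                          # roll-up to product
print(aggregate(facts, by=("product", "region")))                 # product x region view
print(aggregate(facts, by=("region",), where={"quarter": "Q1"}))  # Q1 slice by region
```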

Another analytical tool is data mining, which performs the following functions:

  • Looks for hidden patterns in sets of data.
  • Generates rules to predict behaviour.
  • Identifies associations, sequences, clusters and forecasts in the data.

Text mining is another common analytical tool. It extracts key elements of information, such as facts, opinions and dates, from large volumes of unstructured text.
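The following is a deliberately simple sketch of the idea. A regular expression picks out dates, and a tiny word list flags opinions. Both the patterns and the `POSITIVE`/`NEGATIVE` lexicons are hypothetical; real text-mining tools use far richer language processing.

```python
import re

# Hypothetical patterns: ISO dates (2023-05-14) or day/month/year (20/05/2023)
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b|\b\d{1,2}/\d{1,2}/\d{4}\b")
# Tiny hypothetical opinion lexicons
POSITIVE = {"great", "excellent", "helpful"}
NEGATIVE = {"slow", "rude", "broken"}

def mine(text):
    """Extract dates and opinion words from a piece of free text."""
    words = re.findall(r"[a-z]+", text.lower())
    return {
        "dates": DATE_RE.findall(text),
        "positive": [w for w in words if w in POSITIVE],
        "negative": [w for w in words if w in NEGATIVE],
    }

review = "Ordered on 2023-05-14, delivered 20/05/2023. Staff were helpful but delivery was slow."
print(mine(review))
```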


An effective business intelligence environment has six key elements:

  • Data from the commercial domain
  • The business intelligence infrastructure
  • Business intelligence analytics
  • Managerial users and functions
  • The delivery platform – Management Information System (MIS), Decision Support System (DSS), Executive Support System (ESS)
  • The user interface

The main objective of business intelligence and analytics is to produce the following outputs accurately and in real time:

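  • Production reports – predefined reports for routine decisions, e.g. in marketing, human resources and financial accounting
  • Parameterized reports – reports in which the user supplies parameters, such as a region or date range
  • Dashboards that present performance data visually
  • Ad hoc search and report creation by users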
  • Forecasts and scenarios

Predictive analytics

Predictive analytics uses the following to forecast future trends and behaviour:

  • Statistical analysis
  • Data mining
  • Historical data

Predictive analytics has numerous BI applications for sales, financial markets and fraud detection to name but a few.

Operational and middle managers use MIS, which draws its data from transaction processing systems (TPS), for routine production reports.

Super users and business analysts use DSS for more sophisticated analysis, custom reports and semi-structured decisions.

What-if analysis, sensitivity analysis, multidimensional analysis (OLAP) and pivot tables are all examples of analytical techniques supported by DSS.
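The sketch below applies what-if and sensitivity analysis to a hypothetical profit model. The `profit` formula and the base figures are illustrative assumptions. What-if analysis changes one assumption and recomputes the result. Sensitivity analysis varies each input in turn to show which one the result depends on most.

```python
def profit(price: float, units: float, unit_cost: float, fixed_costs: float) -> float:
    # Hypothetical model: revenue minus variable costs minus fixed costs
    return price * units - unit_cost * units - fixed_costs

base = {"price": 20.0, "units": 1000.0, "unit_cost": 12.0, "fixed_costs": 5000.0}

# What-if: what happens to profit if the price rises to 22?
what_if = profit(**{**base, "price": 22.0})

def sensitivity(model, inputs, change=0.10):
    """Change in the model's output when each input alone is raised by `change`."""
    baseline = model(**inputs)
    return {
        name: model(**{**inputs, name: value * (1 + change)}) - baseline
        for name, value in inputs.items()
    }

print(profit(**base), what_if)
for name, delta in sensitivity(profit, base).items():
    print(f"{name:12s} +10% -> profit changes by {delta:+.0f}")
```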

 

 

 

 

Data Mining and Lift and Chi Squared Analysis

Data Mining

Data mining is an analytical process designed to explore big data to detect consistent patterns or systematic relationships between variables. The findings are then validated by applying the detected patterns to new subsets of data. The statistical measures lift and chi-squared can be used to gauge how interesting (that is, how strongly correlated) a pattern found in big data is. This is one approach to data mining.

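Lift measures the dependence (correlation) between the occurrence of two itemsets A and B in a set of transactions:

$$\text{Lift}(A, B) = \frac{\text{Sup}(A \cup B)}{\text{Sup}(A) \times \text{Sup}(B)}$$

Here Sup is the support function: the fraction of transactions that contain the given itemset, which estimates the probability of that itemset occurring in the data set. Sup(A ∪ B) is the support of the combined itemset, i.e. the fraction of transactions that contain both A and B.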

  • Lift(A, B) = 1: A and B are independent
  • Lift(A, B) > 1: A and B are positively correlated
  • Lift(A, B) < 1: A and B are negatively correlated

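An additional measure for testing whether events are correlated is chi-squared (χ², written X^2 in the examples below):

$$\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$$

The sum runs over every cell of the contingency table. The expected count for a cell is the count that would occur if the two items were independent: (row total × column total) / grand total.

General rules:

X^2 = 0 => independent

X^2 > 0 => correlated, either positively or negatively. To find the direction, compare each observed count with its expected count: observed > expected indicates a positive correlation, and observed < expected a negative one.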

Please see below an example of a Lift and Chi squared calculation.

Lift Analysis

                Chips   ^Chips   Row total
Burgers           600      400        1000
^Burgers          200      200         400
Column total      800      600        1400

Sup = Support.

Burger = B, Chips = C. A ^ prefix means "not": for example, ^C counts transactions that do not contain chips.

Lift(B, C) = Sup(B ∪ C) / (Sup(B) × Sup(C)) = (600/1400) / ((1000/1400) × (800/1400)) = 1.05. This indicates a positive correlation between Burger and Chips.

Lift(B, ^C) = Sup(B ∪ ^C) / (Sup(B) × Sup(^C)) = (400/1400) / ((1000/1400) × (600/1400)) ≈ 0.933. This indicates a negative correlation between Burger and ^Chips.

Lift(^B, C) = Sup(^B ∪ C) / (Sup(^B) × Sup(C)) = (200/1400) / ((400/1400) × (800/1400)) = 0.875. This indicates a negative correlation between ^Burger and Chips.

Lift(^B, ^C) = Sup(^B ∪ ^C) / (Sup(^B) × Sup(^C)) = (200/1400) / ((400/1400) × (600/1400)) ≈ 1.167. This indicates a positive correlation between ^Burger and ^Chips.

 

                Shampoo   ^Shampoo   Row total
Ketchup             100        200         300
^Ketchup            200        400         600
Column total        300        600         900

K = Ketchup, S = Shampoo.

Lift(K, S) = Sup(K ∪ S) / (Sup(K) × Sup(S)) = (100/900) / ((300/900) × (300/900)) = 1.0: no correlation between K and S.

Lift(K, ^S) = Sup(K ∪ ^S) / (Sup(K) × Sup(^S)) = (200/900) / ((300/900) × (600/900)) = 1.0: no correlation between K and ^S.

Lift(^K, S) = Sup(^K ∪ S) / (Sup(^K) × Sup(S)) = (200/900) / ((600/900) × (300/900)) = 1.0: no correlation between ^K and S.

Lift(^K, ^S) = Sup(^K ∪ ^S) / (Sup(^K) × Sup(^S)) = (400/900) / ((600/900) × (600/900)) = 1.0: no correlation between ^K and ^S.
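The lift values above can be reproduced with a short Python function. It takes the four cell counts of a 2x2 contingency table (both items, first only, second only, neither) and returns the lift of each combination. The function name and argument order are illustrative choices.

```python
def lifts(both, first_only, second_only, neither):
    """Lift for every cell of a 2x2 table of transaction counts.

    Keys use the document's notation: "A"/"^A" for the first item present/absent,
    "B"/"^B" for the second.
    """
    total = both + first_only + second_only + neither
    counts = {
        ("A", "B"): both, ("A", "^B"): first_only,
        ("^A", "B"): second_only, ("^A", "^B"): neither,
    }
    # Marginal supports: fraction of transactions with / without each item
    sup_a = {"A": (both + first_only) / total, "^A": (second_only + neither) / total}
    sup_b = {"B": (both + second_only) / total, "^B": (first_only + neither) / total}
    # Lift = Sup(x and y together) / (Sup(x) * Sup(y))
    return {(x, y): (n / total) / (sup_a[x] * sup_b[y]) for (x, y), n in counts.items()}

burgers_chips = lifts(600, 400, 200, 200)      # A = Burgers, B = Chips
ketchup_shampoo = lifts(100, 200, 200, 400)    # A = Ketchup, B = Shampoo
for table in (burgers_chips, ketchup_shampoo):
    print({cell: round(value, 3) for cell, value in table.items()})
```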

 

Chi Squared Analysis

Observed counts, with the expected counts (assuming independence) in parentheses:

                  Chips       ^Chips      Row total
Burgers         900 (800)   100 (200)        1000
^Burgers        300 (400)   200 (100)         500
Column total       1200         300          1500

X^2 = Chi Squared.

X^2 = Σ (Observed – Expected)^2/Expected

^2 = Power of 2.

O = Observed; E = Expected.

B = Burger; C = Chips.

X^2(B,C) = (900 - 800)^2/800 = 12.5. As observed > expected, there is a positive correlation between B and C.

X^2(B,^C) = (100 - 200)^2/200 = 50.0. As observed < expected, there is a negative correlation between B and ^C.

X^2(^B,C) = (300 - 400)^2/400 = 25.0. As observed < expected, there is a negative correlation between ^B and C.

X^2(^B,^C) = (200 - 100)^2/100 = 100.0. As observed > expected, there is a positive correlation between ^B and ^C.

The chi-squared result is the sum of the four values above: 12.5 + 50 + 25 + 100 = 187.5. Because the result is greater than 0, B and C are correlated. Because the observed count for (B, C) exceeds its expected count, the correlation is positive.

 

Observed counts, with expected counts in parentheses:

                  Sausages    ^Sausages   Row total
Burgers         800 (800)   200 (200)        1000
^Burgers        400 (400)   100 (100)         500
Column total       1200         300          1500

B = Burger; S = Sausages.

X^2(B,S) = (800 - 800)^2/800 = 0: no correlation between B and S; they are independent of each other.

X^2(B,^S) = (200 - 200)^2/200 = 0: no correlation between B and ^S; they are independent of each other.

X^2(^B,S) = (400 - 400)^2/400 = 0: no correlation between ^B and S; they are independent of each other.

X^2(^B,^S) = (100 - 100)^2/100 = 0: no correlation between ^B and ^S; they are independent of each other.

The chi-squared result is the sum of the four values above: 0 + 0 + 0 + 0 = 0. As the result is 0, B and S are independent.
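Below is a minimal Python sketch of the chi-squared calculation for a contingency table. It computes each expected count as row total × column total / grand total and sums the per-cell contributions. For the burger/chips table it gives contributions of 12.5, 50, 25 and 100 (total 187.5); for the sausage table it gives 0. If SciPy is available, `scipy.stats.chi2_contingency` performs the same test and also returns a p-value. For 2x2 tables it applies Yates' continuity correction by default, so pass `correction=False` to match these hand calculations.

```python
def chi_squared(observed):
    """Pearson chi-squared statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    contributions = []
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count if the row and column items were independent
            expected = row_totals[i] * col_totals[j] / grand_total
            contributions.append((obs - expected) ** 2 / expected)
    return sum(contributions), contributions

# Rows: Burgers, ^Burgers; columns: Chips, ^Chips (and likewise for Sausages)
chips_table = [[900, 100], [300, 200]]
sausages_table = [[800, 200], [400, 100]]
for name, table in (("chips", chips_table), ("sausages", sausages_table)):
    total, parts = chi_squared(table)
    print(name, parts, total)
```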

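Lift and X^2 can be misleading when the data set contains a large number of null transactions, i.e. transactions that contain neither of the items being analysed. Both measures depend on the total number of transactions, so adding or removing null transactions changes their values.

A null-invariant measure such as the Kulczynski measure avoids this problem.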
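The sketch below illustrates the null-transaction problem using hypothetical counts and the standard definition of the Kulczynski measure, Kulc(A, B) = ½ × (P(A | B) + P(B | A)), which ranges from 0 to 1. Only the number of null transactions changes between the two runs. Lift swings from 0.2 (apparently negative) to about 10.19 (apparently strongly positive). Kulc stays at 0.1 because the total transaction count never enters its formula.

```python
def lift(n_ab, n_a, n_b, total):
    """Lift from raw counts: transactions with both, with A, with B, and in total."""
    return (n_ab / total) / ((n_a / total) * (n_b / total))

def kulczynski(n_ab, n_a, n_b):
    """Average of P(B|A) and P(A|B); the total transaction count never appears."""
    return 0.5 * (n_ab / n_a + n_ab / n_b)

# Hypothetical counts: 100 transactions contain both items, 1,000 contain A, 1,000 contain B
n_ab, n_a, n_b = 100, 1_000, 1_000
for nulls in (100, 100_000):
    # Transactions with A or B, plus the null transactions containing neither
    total = n_a + n_b - n_ab + nulls
    print(f"nulls={nulls}: lift={lift(n_ab, n_a, n_b, total):.2f}, "
          f"Kulc={kulczynski(n_ab, n_a, n_b):.2f}")
```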