Blog II | Module 1 - Data Warehouse Design

This blog post is for Module 1 (Data Warehouse Design) which will discuss the different lectures and reading materials that fall under this module. 

The blog will cover the below topics:

  1. Data Warehouse Design Cycle
  2. Balanced Scorecard
  3. Performance Management
  4. Star Schema Design 
  5. Data Quality Analysis
  6. Dashboard Design and Analysis. 

Week 2 Lecture Summary: 

In the "Data Warehouse Design Cycle" lecture I learned that the two major organizational informations are Online Transaction Processing or OLTP and Online Analytical Processing or OLAP. OLTP is for business operation databases and OLAP contain current and historical data. Also OLTP use relational tables to structure and store data and OLAP which uses star schema or a snowflake schema which contain facts and dimensions. It also uses relational tables and data cubes which are multidimensional structures for storing a lot of historical data. 

The "Introduction to Balanced Scorecard" lecture discussed how business intelligence is about performance measurement and management and how Balanced Scorecard is a performance management framework. It maps the goals to series of actions needed to achieve these goals. The Balanced Scorecard also provides systematic technique to develop strategy maps and derive key performance indictors or KPIs. The Balanced Scorecard also has four perspectives and combined together make performance management framework. They are "financial perspective", "customer perspective", "internal business process perspective" and "learning and growth perspective". 

Lastly the "Dimensional Modeling Star Schema Design" lecture discussed the four step process for designing a dimensional model. The steps are "selecting the business process to model", "declaring the grain of the business process", "choosing dimensions that apply to each fact row" and "identify the numeric facts". Also the star schema has facts and dimensions. The facts represent the business measures and the dimensions represent the grain of the fact. Fact tables are tall and thin and dimension tables are short and fat and they are also un-normalized or denormalized. 

Week 2 Reading Analysis:

The "Using the Balanced Scorecard as a Strategic Management System" reading discussed the four perspectives financial, customer, internal business process and learning and growth, but the reading also discussed something new, the managing strategy: four processes. The four processes are business planning you do this by setting targets, aligning strategic initiatives, allocating resources and establishing milestones. The next process is the feedback and learning process, and in this process you articulate the shared vision, supply strategic feedback and facilitate strategy review and learning. The next process is the translating the vision which is clarifying the vision and gaining consensus. The last process is the communicating and linking process which is communicating and educating, setting goals, and linking rewards to performance measures. I did not know about these processes before, it helped me to better understand because it was explained in great detail. 

The "The Balanced Scorecard - Measures That Drive Performance" reading discussed how many companies used a daily financial report to improve quality. This then give the operators great feedback and motivation and pushed their quality and productivity efforts. It also motivated the operators to make decisions could improve the quality, increase how productivity one can be and reduce wasteful consumption of energy and materials. Although many companies have been using the daily financial report, I feel like still there are a lot of companies that are not utilizing such things. 

The "How Southwest Airlines Developed its Balanced Scorecard Analysis" reading discussed how Southwest used the four perspectives of the balanced scorecard to improve how they do their business. Their Financial Perspective was to bring more customers but use fewer planes. Their Customer Perspective was to target certain customers who appreciate good price and on-time arrivals. Their Internal Perspective was to have fast aircraft turnaround time and lastly their learning perspective was to educate and compensate the crew and to use the employee stockholder program. While I was reading this, I thought about other airlines and why they are not utilizing the balanced scorecard the way Southwest is, especially since Southwest had such a success with it.  

The "Date Warehousing - Chapter 11" discussed the definition of data warehouse which is a subject-oriented, integrated, time-variant, the data cannot be updated and is used to help management make decisions. Subject-oriented refers to the customers, integrated means to have consistent naming conventions. Time-variant means can study trends and changes and non-updatable means it can only be read-only and it periodically refreshed. It also mentioned a data mart which is a data warehouse that is very limited in scope. This additional reading supplemented the lecture really well, it went in more detail regarding definitions and throughly explained the data warehouse and data mart which helped me because I was still having some confusion regarding data marts. 

Chapter 1 of the book discussed fact tables and dimension tables and how they are joined in a star schema. Business process is represented by dimensional models has fact tables and dimension tables. It is important to make the dimensional schema simple because business users will better understand it and it is easier to navigate. The chapter also discussed a few myths regarding dimensional modeling such as dimensional models are only for summary data and this is the reason for poor designed dimensional models. The summary data should complement the detailed data but it should not be used to replace the detailed data. I referenced chapter 1 a lot while doing the assignment and preparing for the quizzes. This chapter discussed Fact Tables and Dimension Tables in great length and for someone who did not know much about Fact Tables and Dimension Tables this chapter really helped with understanding them.

Chapter 2 of the book discussed the four key decisions made when designing a dimensional model. They are selecting the business process, declaring the grain, identifying the dimensions, and identifying the facts. The business process is the activities that are performed by the business, and choosing the process is very important for targeting a design. Declaring the grain is very key. It shows what a single fact table row represents and it must be declared before choosing dimensions or facts. The dimension tables contain the descriptive attributes used by the BI applications for filtering and grouping the facts. Lastly, facts are the measurements that result from a business process event and it is almost always numeric. Chapter 2 helped a lot while doing the assignment because it goes in such detail explaining the business process. 

Week 2 Personal Opinion:

I really enjoyed learning about OLTP and OLAP, I didn't know about them before this week and learning about it and then applying that knowledge to assignment number one really helped me better understand them. It also made doing the assignment a lot easier. I also really found the Balanced Scorecard really interesting, I wish more companies would utilize it. The readings did discuss a few companies that have applied and it how it has really helped them because overall, the customers were happier and so were the staff. 

Week 2 Additional Materials:

Here is an interesting reading that I found online that talks more about OLTP and OLAP. The article compares the two and states the benefits and defines both really well. 

https://techdifferences.com/difference-between-oltp-and-olap.html 

Week 3 Lecture Summary: 

In the "Advanced Star Schema Design" lecture I learned that surrogate keys act as primary keys, it is a sequential number and it is used to join dimension tables with the fact table and that they never change. Also the lecture discussed that degenerate dimension helps to do many different kinds of analyses such as market basket analysis. Additional the lecture discussed the three types of slowly changing dimensions. In type 1 when one thing changes you make the update to show the new change and forget about the fact that the old item was there. In type 2, preserving the history is important so you insert another tuple with a new surrogate key. Lastly in type 3, you add a new column with a current information and then rename the old column so you have both information with the same surrogate key. 

In the "Business Intelligence - Data Quality Analysis" I learned that good quality data is extremely important for decision making. In addition, there are many reasons why someone would want to do data profiling for example if they want to repurpose something other than the original purpose for which it was gathered or if they want to understand and assess the risk involved in using data for a new application. However, data profiling can be complicated if it is done manually, there are software tools that help automate the work that is needed for understanding and profiling data. These tools generate statistical summaries and frequencies distributions for the unique values found within the data set. 

Week 3 Reading Analysis:

The main thing that stood out to me in chapter three were the four step dimensional design process and the examples for how to create dimension and fact tables. The steps are first to select the business process and this is done by listening to the organization and identifying the process. Next step is to declare the grain which means to specify what an individual fact table row represents. It shows the level of detail is associated with the fact table. Then the step is to identify the dimensions, and this is easily done if the grain is clearly. The dimensions represent the "who", "what" "where" "when" "why" and "how" associated with the event. The last step is to identify the facts, and this is done by answering "What is the process measuring?". Knowing what the dimensions represent helped a lot while creating the dimension tables for the assignment. 

Chapter four discussed the fact table types which are transaction fact tables, periodic snapshot fact tables and accumulating snapshot fact tables. Transaction Fact Tables represent an event that occurred at a certain point. These tables also fit easily into a dimensional framework. Periodic Snapshot Fact Tables shows the cumulative performance of the business at a regular time intervals. It is also the only place to easily retrieve a regular view of longitudinal performance trends. Lastly Accumulating Snapshot Fact Tables represents processes that have certain beginning and ending with a standard set of intermediate process steps. They also have multiple date foreign keys that represent the major events or process milestones. Chapter 4 went into great length explaining the different types of Fact Tables and the importances of each one of them.

The last chapter (chapter five) also discussed the slowly changing dimension basics similarly to the lecture. However there were just more types mentioned in this chapter such as adding a new attribute or adding a mini dimension. The new attribute would allow to see the fact data as if the change never happened. Also adding a mini-dimension is needed when a dimension is too big which would impact browsing and querying filtering performance. This type allows to break off frequently analyzed or changing attributes into separate dimension. It was interesting to know that a big dimension can be broken down to smaller ones when needed and that having such big dimensions can cause issues while querying. 

Week 3 Personal Opinion:

I really enjoyed this weeks materials learning more about the Star Schema Design was really interesting and helped a lot with the assignment. I had never created a Star Schema Design before this class and learning about its benefits and how to create one was helpful. The steps for creating one was a little hard to understand at first but understanding it well made doing the assignment a lot easier. I also did not know that dimension tables have a surrogate keys and primary keys. I also knew that data quality is important but knowing that having good quality data allowed for repurposing something other than the original purpose really showed the importances of having good data. 

Week 3 Additional Materials:

This article helped me while I was trying to do the assignment. It is very detailed and goes over Fact Tables and Dimension Tables in more detail. It also talks about surrogate keys degenerate dimensions. 

https://docs.microsoft.com/en-us/power-bi/guidance/star-schema 

Week 4 Lecture Summary: 

In this lecture I learned about "Dashboard Design and its use for analysis". The dashboard is a visual representation of a company's performance, the dashboard is also in one screen. Enterprise Information system and portals are not dashboards. The dashboard could exist once you enter a portal but it is not the portal it self. Dashboards can also contain quantitative measures, if a dashboard has a baseline you can look at the performance and see how well you are doing with respect to a baseline. Also if you have a target, you can look at the target and see how close or far you are from that target. Dashboards also have widgets but it is recommended to keep the dashboard simple to avoid clutter. The dashboard should be able to provide the big picture of your performance so the focus has to be on important things. Don't have too many widgets, and don't use useless color and decoration on a dashboard. 

The TED talk video was very interesting. He looked at the data from different countries around the world and he looked at child mortality for example, and the software was able to display it in a way that is easily understandable. The widgets and colors that were used were necessary and made looking at the visual a lot easier. He also discussed how we as a world need a good search function for the publicly funded data, a search function where we can copy the data to a searchable format and get it out to the world, this way it is searchable for people. Not only will it be searchable but that people can use the design tools to animate it. 

Week 4 Reading Analysis:

"The 5 Most Influential Date Visualizations of All Time" reading discussed the importances of having data visualization and how without it we are more likely to have misunderstandings and missed opportunities. The reading then discussed five visualizations that people use and find helpful. All five were very interesting, however "March on Moscow" stood out to me because the visualization showed how the cold weather was the reason behind the defeat of Napoleon's army. I also enjoyed learning about the nurse who used data to show the number of deaths due to a disease that they could prevent. This then showed how important it is to always sanitize. I did not know there were so many different types of visualizations, it was helpful to know that there are and the uses for each one of them.

"Dashboard Design: Beyond Meters, Gauges, and Traffic Lights" this reading discussed the importances of having a dashboard and how it is the means to communicate information. The reading talks about how the terms EIS (Executive Information Systems) and dashboard are not the same thing. Dashboards can be used to achieve what EIS promises but that does not mean that they are the same thing. Dashboards are not a certain time of data for a group of people and EIS is like this. Dashboard is defined as a visual display of information that is important and the information is shown on a screen so it can be discussed while it is presented. This is done so that the information is taken in very quickly. It needs to let the audience know that this is what is important and needs attention, so it is necessary for the dashboard to be easy to ready to get the necessary information as soon as possible. It was important to know that EIS and portals are not the same thing as dashboards. This is a common misunderstanding in a lot of places. 

However, dashboards also present challenges, such as not knowing what is necessary to include in the dashboard. The design of the dashboard is also very important. Two important things to consider when using widgets: it needs to display the information in the best way and it needs to serve its purpose. 

"Common Pitfalls in Dashboard Design" discussed how a dashboard's entire purpose is to show important information clearly, accurately and efficiently. Creating a dashboard is not easy because trying to get that information on a single screen in a way that doesn't make the screen look messy is not easy. One of the common pitfalls in a dashboard design that stood out to me is "displaying excessive detail or precision" the details on the dashboard need to be of only the things that are necessary. Don't have irrelevant information that would distract the viewer or waste the readers time. This reading was very helpful while working on assignment one. It goes in great detail explaining each pitfall. 

Week 4 Personal Opinion:

While working on assignment two I kept looking through the lecture notes and readings especially "Common Pitfalls in Dashboard Design" reading. It helped with designing the visualizations and then the dashboard at the end. Without the materials it would have been very difficult to do the assignment. It is so easy to do a visualization incorrectly, a few times I made mistakes and when I went through the lecture I caught the mistake and was able to make the change. 

Week 4 Additional Materials:
I mostly used the tableau site for reference but this article also helped me better understand tableau. I hope these materials assist anyone who needs additional information. I am excited to read through everyone else's blogs and see how they can assist me with this weeks assignment.  

https://www.tutorialspoint.com/tableau/index.htm 

References:

Says, B., Korir, B., Says, G., Gh, Says, A., Arnold, . . . Hadi. (2019, December 26). Difference between OLTP and Olap (with COMPARISON CHART). Retrieved February 04, 2021, from https://techdifferences.com/difference-between-oltp-and-olap.html

Peter-Myers. “Understand Star Schema and the Importance for Power BI - Power BI.” Power BI | Microsoft Docs, docs.microsoft.com/en-us/power-bi/guidance/star-schema.

Tableau Tutorial. (n.d.). Retrieved from https://www.tutorialspoint.com/tableau/index.htm





Comments

  1. Hello Leeda , I am happy to see that another peer found the "Common Pitfalls in Dashboard Design" to be informative. While doing my assignment , I refered to this almost 40 times :) Haha..

    ReplyDelete
    Replies
    1. Yes, it was a great article for the assignment. Super helpful!

      Delete
  2. Great summary of Module -1 , I like the fact that you have seggregated your personal learnings which was a great idea !

    ReplyDelete
  3. I've encountered several dashboard which had the excessive precision/detail issue. It really made it difficult to process quickly what the dashboard was monitoring. Having just the right amount of detail is definitely key to getting users to readily and frequently use your dashboard.

    ReplyDelete

Post a Comment