Blog II | Module 1 - Data Warehouse Design
This blog post is for Module 1 (Data Warehouse Design) which will discuss the different lectures and reading materials that fall under this module.
The blog will cover the below topics:
- Data Warehouse Design Cycle
- Balanced Scorecard
- Performance Management
- Star Schema Design
- Data Quality Analysis
- Dashboard Design and Analysis.
Week 2 Lecture Summary:
In the "Data Warehouse Design Cycle" lecture I learned that the two major organizational informations are Online Transaction Processing or OLTP and Online Analytical Processing or OLAP. OLTP is for business operation databases and OLAP contain current and historical data. Also OLTP use relational tables to structure and store data and OLAP which uses star schema or a snowflake schema which contain facts and dimensions. It also uses relational tables and data cubes which are multidimensional structures for storing a lot of historical data.
The "Introduction to Balanced Scorecard" lecture discussed how business intelligence is about performance measurement and management and how Balanced Scorecard is a performance management framework. It maps the goals to series of actions needed to achieve these goals. The Balanced Scorecard also provides systematic technique to develop strategy maps and derive key performance indictors or KPIs. The Balanced Scorecard also has four perspectives and combined together make performance management framework. They are "financial perspective", "customer perspective", "internal business process perspective" and "learning and growth perspective".
Lastly the "Dimensional Modeling Star Schema Design" lecture discussed the four step process for designing a dimensional model. The steps are "selecting the business process to model", "declaring the grain of the business process", "choosing dimensions that apply to each fact row" and "identify the numeric facts". Also the star schema has facts and dimensions. The facts represent the business measures and the dimensions represent the grain of the fact. Fact tables are tall and thin and dimension tables are short and fat and they are also un-normalized or denormalized.
Week 2 Reading Analysis:
The "Using the Balanced Scorecard as a Strategic Management System" reading discussed the four perspectives financial, customer, internal business process and learning and growth, but the reading also discussed something new, the managing strategy: four processes. The four processes are business planning you do this by setting targets, aligning strategic initiatives, allocating resources and establishing milestones. The next process is the feedback and learning process, and in this process you articulate the shared vision, supply strategic feedback and facilitate strategy review and learning. The next process is the translating the vision which is clarifying the vision and gaining consensus. The last process is the communicating and linking process which is communicating and educating, setting goals, and linking rewards to performance measures. I did not know about these processes before, it helped me to better understand because it was explained in great detail.
The "The Balanced Scorecard - Measures That Drive Performance" reading discussed how many companies used a daily financial report to improve quality. This then give the operators great feedback and motivation and pushed their quality and productivity efforts. It also motivated the operators to make decisions could improve the quality, increase how productivity one can be and reduce wasteful consumption of energy and materials. Although many companies have been using the daily financial report, I feel like still there are a lot of companies that are not utilizing such things.
The "How Southwest Airlines Developed its Balanced Scorecard Analysis" reading discussed how Southwest used the four perspectives of the balanced scorecard to improve how they do their business. Their Financial Perspective was to bring more customers but use fewer planes. Their Customer Perspective was to target certain customers who appreciate good price and on-time arrivals. Their Internal Perspective was to have fast aircraft turnaround time and lastly their learning perspective was to educate and compensate the crew and to use the employee stockholder program. While I was reading this, I thought about other airlines and why they are not utilizing the balanced scorecard the way Southwest is, especially since Southwest had such a success with it.
The "Date Warehousing - Chapter 11" discussed the definition of data warehouse which is a subject-oriented, integrated, time-variant, the data cannot be updated and is used to help management make decisions. Subject-oriented refers to the customers, integrated means to have consistent naming conventions. Time-variant means can study trends and changes and non-updatable means it can only be read-only and it periodically refreshed. It also mentioned a data mart which is a data warehouse that is very limited in scope. This additional reading supplemented the lecture really well, it went in more detail regarding definitions and throughly explained the data warehouse and data mart which helped me because I was still having some confusion regarding data marts.
Chapter 1 of the book discussed fact tables and dimension tables and how they are joined in a star schema. Business process is represented by dimensional models has fact tables and dimension tables. It is important to make the dimensional schema simple because business users will better understand it and it is easier to navigate. The chapter also discussed a few myths regarding dimensional modeling such as dimensional models are only for summary data and this is the reason for poor designed dimensional models. The summary data should complement the detailed data but it should not be used to replace the detailed data. I referenced chapter 1 a lot while doing the assignment and preparing for the quizzes. This chapter discussed Fact Tables and Dimension Tables in great length and for someone who did not know much about Fact Tables and Dimension Tables this chapter really helped with understanding them.
Chapter 2 of the book discussed the four key decisions made when designing a dimensional model. They are selecting the business process, declaring the grain, identifying the dimensions, and identifying the facts. The business process is the activities that are performed by the business, and choosing the process is very important for targeting a design. Declaring the grain is very key. It shows what a single fact table row represents and it must be declared before choosing dimensions or facts. The dimension tables contain the descriptive attributes used by the BI applications for filtering and grouping the facts. Lastly, facts are the measurements that result from a business process event and it is almost always numeric. Chapter 2 helped a lot while doing the assignment because it goes in such detail explaining the business process.
Week 2 Personal Opinion:
I really enjoyed learning about OLTP and OLAP, I didn't know about them before this week and learning about it and then applying that knowledge to assignment number one really helped me better understand them. It also made doing the assignment a lot easier. I also really found the Balanced Scorecard really interesting, I wish more companies would utilize it. The readings did discuss a few companies that have applied and it how it has really helped them because overall, the customers were happier and so were the staff.
Here is an interesting reading that I found online that talks more about OLTP and OLAP. The article compares the two and states the benefits and defines both really well.
https://techdifferences.com/difference-between-oltp-and-olap.html
Week 3 Lecture Summary:
In the "Advanced Star Schema Design" lecture I learned that surrogate keys act as primary keys, it is a sequential number and it is used to join dimension tables with the fact table and that they never change. Also the lecture discussed that degenerate dimension helps to do many different kinds of analyses such as market basket analysis. Additional the lecture discussed the three types of slowly changing dimensions. In type 1 when one thing changes you make the update to show the new change and forget about the fact that the old item was there. In type 2, preserving the history is important so you insert another tuple with a new surrogate key. Lastly in type 3, you add a new column with a current information and then rename the old column so you have both information with the same surrogate key.
In the "Business Intelligence - Data Quality Analysis" I learned that good quality data is extremely important for decision making. In addition, there are many reasons why someone would want to do data profiling for example if they want to repurpose something other than the original purpose for which it was gathered or if they want to understand and assess the risk involved in using data for a new application. However, data profiling can be complicated if it is done manually, there are software tools that help automate the work that is needed for understanding and profiling data. These tools generate statistical summaries and frequencies distributions for the unique values found within the data set.
Week 3 Reading Analysis:
The main thing that stood out to me in chapter three were the four step dimensional design process and the examples for how to create dimension and fact tables. The steps are first to select the business process and this is done by listening to the organization and identifying the process. Next step is to declare the grain which means to specify what an individual fact table row represents. It shows the level of detail is associated with the fact table. Then the step is to identify the dimensions, and this is easily done if the grain is clearly. The dimensions represent the "who", "what" "where" "when" "why" and "how" associated with the event. The last step is to identify the facts, and this is done by answering "What is the process measuring?". Knowing what the dimensions represent helped a lot while creating the dimension tables for the assignment.
Chapter four discussed the fact table types which are transaction fact tables, periodic snapshot fact tables and accumulating snapshot fact tables. Transaction Fact Tables represent an event that occurred at a certain point. These tables also fit easily into a dimensional framework. Periodic Snapshot Fact Tables shows the cumulative performance of the business at a regular time intervals. It is also the only place to easily retrieve a regular view of longitudinal performance trends. Lastly Accumulating Snapshot Fact Tables represents processes that have certain beginning and ending with a standard set of intermediate process steps. They also have multiple date foreign keys that represent the major events or process milestones. Chapter 4 went into great length explaining the different types of Fact Tables and the importances of each one of them.
The last chapter (chapter five) also discussed the slowly changing dimension basics similarly to the lecture. However there were just more types mentioned in this chapter such as adding a new attribute or adding a mini dimension. The new attribute would allow to see the fact data as if the change never happened. Also adding a mini-dimension is needed when a dimension is too big which would impact browsing and querying filtering performance. This type allows to break off frequently analyzed or changing attributes into separate dimension. It was interesting to know that a big dimension can be broken down to smaller ones when needed and that having such big dimensions can cause issues while querying.
Week 3 Personal Opinion:
I really enjoyed this weeks materials learning more about the Star Schema Design was really interesting and helped a lot with the assignment. I had never created a Star Schema Design before this class and learning about its benefits and how to create one was helpful. The steps for creating one was a little hard to understand at first but understanding it well made doing the assignment a lot easier. I also did not know that dimension tables have a surrogate keys and primary keys. I also knew that data quality is important but knowing that having good quality data allowed for repurposing something other than the original purpose really showed the importances of having good data.
This article helped me while I was trying to do the assignment. It is very detailed and goes over Fact Tables and Dimension Tables in more detail. It also talks about surrogate keys degenerate dimensions.
https://docs.microsoft.com/en-us/power-bi/guidance/star-schema
Week 4 Lecture Summary:
Week 4 Reading Analysis:
Week 4 Personal Opinion:
Says, B., Korir, B., Says, G., Gh, Says, A., Arnold, . . . Hadi. (2019, December 26). Difference between OLTP and Olap (with COMPARISON CHART). Retrieved February 04, 2021, from https://techdifferences.com/difference-between-oltp-and-olap.html
Peter-Myers. “Understand Star Schema and the Importance for Power BI - Power BI.” Power BI | Microsoft Docs, docs.microsoft.com/en-us/power-bi/guidance/star-schema.
Tableau Tutorial. (n.d.). Retrieved from https://www.tutorialspoint.com/tableau/index.htm
Hello Leeda , I am happy to see that another peer found the "Common Pitfalls in Dashboard Design" to be informative. While doing my assignment , I refered to this almost 40 times :) Haha..
ReplyDeleteYes, it was a great article for the assignment. Super helpful!
DeleteGreat summary of Module -1 , I like the fact that you have seggregated your personal learnings which was a great idea !
ReplyDeleteI've encountered several dashboard which had the excessive precision/detail issue. It really made it difficult to process quickly what the dashboard was monitoring. Having just the right amount of detail is definitely key to getting users to readily and frequently use your dashboard.
ReplyDelete