Data Warehouse

Introduction

Most in-house and cloud-based systems are transaction processing systems. The user enters something and a response is expected. This is largely what is taught in schools and universities. These types of systems are random access, as it is hard to predict what the user will want next. Large AI (artificial intelligence) systems are increasingly used to predict user interaction.

The Data Warehouse, on the other hand, most resembles the batch processing systems of old. The traffic is largely sequential and predictable. The skills to optimize these sorts of systems are largely overlooked in schools and universities. Much of the practical theory has been lost to time over the past 50 years, as batch processing systems fell out of favor.

However, Random Access and Sequential data are very different and required very different skill sets and even different computer hardware. A transaction processing system typically visits 10% of the data daily. A data warehouse system on the other hand typically visits 100% of the data. As a result, you need to build the data warehouse 10 times bigger and 10 times faster.

Easy’s skill and talent in Data Warehousing are a result of our years of experience with very large batch-oriented systems. We designed, built, and operated 2 of the largest banking systems in BC along with 2 of the largest data warehouses. We know how to optimize sequential data to avoid having to build the data warehouse 10 times bigger and 10 times faster.

Data Warehouse

Data Warehouse
Data Warehouse 3

Easy specializes in developing large, high-performance data warehouse systems in record time, with a minimum of cost. We primarily use MS SQL/SSIS because of the low cost and the ability to grow to meet demand.

What is a Data Warehouse

A Data Warehouse (DW) can be thought of as a black box that sits between your production computer systems and your reporting system. The function of this black box is to regularly take a copy or snapshot of the production systems, similar to a backup. Unlike a backup, the DW then restructures the data to best meet the reporting needs of the business.

The company’s own business and data rules are used to perform this restructuring, to create business information (BI), and provide insight that was not previously available. The reporting system can then use this information to optimize the business. As such the DW is often described as an Information Warehouse.

Production Reporting

Why not simply do your reporting from the production system? Four reasons:

  • Optimization
  • Integration
  • Security
  • Recovery

Optimization

Data Warehouses are typically first installed to optimize reporting. To provide analytics that is not provided by the “bread and butter” production systems. This frees up the production systems to concentrate on the day-to-day operational requirements of the business.

Integration

In addition, the DW allows for cross-platform reporting. Typically it can be very hard to find a single vendor that provides a “best in class” solution for all your business system needs. Rather, for example, a vendor may provide a great payroll system, but a poor sales system.

As a result, a company can end up with a payroll system from one vendor and a sale system from another vendor. The data warehouse provides a means, for example, to integrate commissions earned in sales with commissions paid on the payroll.

Master Data Management

As Data Warehouses have evolved the concept of Master Data Management (MDM) has been introduced as businesses recognize the need for a “single version” of the truth. While the DW is based on Transactional and Reference Data, MDM is based on Master Data.

For example, a customer may exist in different systems, such as sales and accounts. The customer may have Transactional Data in both systems such as sales and payments. And both systems may have Reference Data such as customer names and addresses.

Master data combines the transactions and reference data into a single view of the customer. This provides two significant benefits:

  • Everything known about a customer is provided by a single source.
  • A change in one business system will update all business systems.

Master Data, Reference Data, and Transactional Data are all part of an Integrated Data Warehouse (IDW). As designs have evolved, the Extract Load Transform (ETL) process has been simplified and expanded to handle increasing volumes of fast-changing data.

This has allowed for the introduction of low-cost MDM development as compared to traditional DW design. In ETL design, the majority of the cost is T (Transform), and is often the bottleneck in design, implementation, maintenance, and support.

MDM recognizes this problem and minimizes T through the use of timestamp and source identification.

Security

Natural disasters and power outages are not the only serious challenges facing production systems. With the rise of ransomware and cryptocurrency, it is possible for bad actors to shut down systems with virtually no fear of law enforcement tracing or preventing their activities. It is quite possible many attacks are state-sponsored, with past and present employees the targets to gain access.

These events are much more common than reports in the press would indicate. No matter how good your firewall, no matter how good your malware detection, human engineering remains the weakest link in any security system. Companies will often pay the ransom quietly rather than risk their public image.

Data Recovery

The DW and MDM can play a key role in disaster recovery. As mentioned previously, the operation of the data warehouse is very similar to a data backup. In fact, the DW and MDM can use backups as their data source, or with good design, the DW and MDM can supplement or replace backups.

Virtualization

For these reasons, it is not advisable to run the data warehouse on the same hardware as the production system via virtualization. A breach of security at the hardware level could lock out your production system, backups, and data warehouse. Recovery is then almost impossible without paying the ransom. And having paid the ransom, there is no guarantee of recovery.

An alternative is to place your data warehouse offsite and use transaction or log shipping to backup the production system. The data warehouse can then use a copy of the transaction log as its data source. In this case, the hackers must break security into two different systems rather than a single system.

With data warehousing and backup on a separate system, hackers will look for a weaker target. Thieves do not target two strong men walking down the street. They pick on the little old lady clutching her purse.

When swimming with sharks you don’t need to swim faster than the sharks, only faster than the person beside you. Good security well designed and implemented will defeat perfect security never realized.

The excuse, “it will never happen to us” is always true, until it does.