
Data warehouse and Enterprise Resource Planning | DWH and ERP

CENTRAL DATA ARCHIVE: HISTORY AND EVOLUTION

The two dominant themes of corporate technology in the 1990s were the data warehouse and ERP. For a long time these two powerful currents ran through corporate IT without ever intersecting, almost as if they were matter and anti-matter. But the growth of both phenomena inevitably led to their intersection. Companies today face the problem of what to do with ERP and the data warehouse. This article outlines what the problems are and how companies address them.

AT THE BEGINNING…

In the beginning there was the data warehouse. The data warehouse was created as a counterpoint to transaction processing application systems. In the early days, storing data was meant to be nothing more than that counterpoint. But today there are far more sophisticated visions of what can be done with a data warehouse. In today's world the data warehouse is part of a structure that can be called the Corporate Information Factory.

THE CORPORATE INFORMATION FACTORY (CIF)

The Corporate Information Factory has standard architectural components:

  • a transformation and integration layer that integrates the data as it moves from the application environment to the enterprise data warehouse;
  • the enterprise data warehouse, where detailed, integrated historical data resides. The enterprise data warehouse serves as the foundation upon which all other parts of the data warehouse environment can be built;
  • an operational data store (ODS), a hybrid structure that contains some aspects of the data warehouse and other aspects of an OLTP environment;
  • data marts, where individual departments can have their own version of the data warehouse;
  • an exploration data warehouse, where the company's "thinkers" can submit their 72-hour queries with no detrimental effect on the data warehouse;
  • near-line storage, in which old data and bulk detail data can be stored inexpensively.

WHERE ERP COMBINES WITH THE CORPORATE INFORMATION FACTORY

ERP merges with the Corporate Information Factory in two places. First, as a baseline application that supplies application data to the data warehouse. In this case the data, generated as a by-product of transaction processing, is integrated and loaded into the enterprise data warehouse. The second point of union between ERP and the CIF is the ODS. Indeed, in many environments ERP is used as a classic ODS.

Where ERP is used as the baseline application, the same ERP can also be used in the CIF as the ODS. In any case, if ERP is to be used in both roles, there must be a clear distinction between the two entities. In other words, when ERP plays the role of both baseline application and ODS, the two architectural entities must remain distinct. If a single ERP implementation tries to perform both roles at the same time, there will inevitably be problems in the design and implementation of that structure.

SEPARATING THE ODS AND THE BASELINE APPLICATIONS

There are many reasons that lead to separating the architectural components. Perhaps the most telling argument for separating the different components of an architecture is that each component has its own view of the world. The baseline application serves a different purpose than the ODS. Trying to overlay a baseline application's view of the world on an ODS, or vice versa, is not a workable way to operate.

Consequently, the first problem of an ERP in the CIF is to verify whether there is a distinction between the baseline applications and the ODS.

DATA MODELS IN THE CORPORATE INFORMATION FACTORY

To achieve cohesion between the different components of the CIF architecture, there must be a data model. The data models serve as the link between the various components of the architecture, such as the baseline applications and the ODS. The data models become the "intellectual road map" for deriving the right meaning from the different architectural components of the CIF.

Going hand in hand with this notion is the idea of a single, overarching data model. At the same time, there must be a data model for each of the components, and there must be a sensible path connecting the different models. Each component of the architecture - the ODS, the baseline applications, the enterprise data warehouse, and so on - needs its own data model. And so there must be a precise definition of how these data models interface with each other.

MOVING ERP DATA INTO THE DATA WAREHOUSE

Whether the source of the data is a baseline application and/or an ODS, when ERP data enters the data warehouse, the insertion must take place at the lowest level of granularity. Simply summarizing or aggregating the data as it comes out of the ERP baseline application or the ERP ODS is not the right thing to do. Detailed data is needed in the data warehouse to form the basis of the DSS process. Such data will be reshaped in many ways by the data marts and by the explorations of the data warehouse.

The movement of data from the ERP baseline application environment to the enterprise data warehouse is done in a reasonably relaxed manner. The shift occurs roughly 24 hours after an update or creation in the ERP. This "lazy" movement of data into the enterprise data warehouse allows the data coming from the ERP to "settle". Once the data has settled in the baseline application, it can safely be moved into the enterprise data warehouse. Another goal achieved by the lazy movement of data is a clear demarcation between operational processes and DSS. With a "fast" movement of data, the line between DSS and operational processing remains blurred.

The movement of data from the ERP's ODS to the enterprise data warehouse is done periodically, usually weekly or monthly. In this case the movement of data is driven by the need to "clean out" old historical data. Of course, the ODS contains data that is much fresher than the historical data found in the data warehouse.

Moving data into the data warehouse is almost never done "wholesale". Copying an entire table from the ERP environment to the data warehouse makes no sense. A far more realistic approach is to move selected units of data: only data that has changed since the last update of the data warehouse should be moved there. One way to know which data has changed since the last update is to look at the timestamps found in the ERP environment; the designer selects all the changes that have occurred since the last update. Another approach is to use change data capture techniques, in which logs and journal tapes are analyzed to determine which data must be moved from the ERP environment to the data warehouse. These techniques are preferable, since logs and journal tapes can be read from the ERP files without further affecting the other ERP resources.
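As a minimal sketch of the timestamp-based approach described above, an incremental extraction job might look like the following. The table and column names (erp_orders, last_modified, etl_watermark, stage_orders) are illustrative assumptions, not the schema of any particular ERP.

```python
from datetime import datetime

# Minimal sketch of timestamp-based change data capture from an ERP source
# into a warehouse staging table. All table and column names are hypothetical.

def extract_changed_rows(erp_conn, dw_conn):
    # Read the high-water mark left by the previous extraction run.
    row = dw_conn.execute(
        "SELECT last_extracted FROM etl_watermark WHERE source = 'erp_orders'"
    ).fetchone()
    watermark = row[0] if row else "1970-01-01 00:00:00"

    # Select only the rows changed since the last run.
    changed = erp_conn.execute(
        "SELECT order_id, customer_id, amount, last_modified "
        "FROM erp_orders WHERE last_modified > ?",
        (watermark,),
    ).fetchall()

    # Load the changed rows into the warehouse staging table at full detail.
    dw_conn.executemany(
        "INSERT INTO stage_orders (order_id, customer_id, amount, last_modified) "
        "VALUES (?, ?, ?, ?)",
        changed,
    )

    # Advance the watermark so the next run picks up where this one stopped.
    dw_conn.execute(
        "UPDATE etl_watermark SET last_extracted = ? WHERE source = 'erp_orders'",
        (datetime.now().isoformat(sep=" "),),
    )
    dw_conn.commit()
    return len(changed)
```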

OTHER COMPLICATIONS

One of the problems with ERP in the CIF is what happens to other application sources or ODS data that must contribute to the data warehouse but are not part of the ERP environment. Given the closed nature of ERP, especially SAP, integrating keys from external data sources with data coming from the ERP at the time the data is moved into the data warehouse is a great challenge. And how likely is it that application or ODS data outside the ERP environment will need to be integrated into the data warehouse? The odds are actually very high.

SOURCING HISTORICAL DATA FROM THE ERP

Another problem with ERP data arises from the need to have historical data within the data warehouse. Usually the data warehouse needs historical data, and usually ERP technology does not store that history, at least not to the extent needed in the data warehouse. When a large amount of historical data begins to accumulate in the ERP environment, that environment needs to be cleaned out. For example, suppose a data warehouse must be loaded with five years of historical data while the ERP holds at most six months of it. As long as the company is content to collect historical data as time passes, there is no problem in using the ERP as a source for the data warehouse. But when the data warehouse has to go back in time and pick up historical data that has not previously been collected and saved by the ERP, the ERP environment becomes inadequate.

ERP AND METADATA

Another consideration about ERP and the data warehouse concerns the metadata that exists in the ERP environment. Just as data passes from the ERP environment to the data warehouse, the metadata must move as well. In addition, the metadata must be transformed into the format and structure required by the data warehouse infrastructure. There is a big difference between operational metadata and DSS metadata. Operational metadata is primarily for the developer and the programmer. DSS metadata is primarily for the end user. Existing metadata in ERP applications or the ODS needs to be converted, and this conversion is not always easy and straightforward.

SOURCING THE ERP DATA

If the ERP is used as a data provider for the data warehouse, there must be a solid interface that moves the data from the ERP environment to the data warehouse environment. The interface must:

  • be easy to use
  • allow access to the ERP data
  • capture the meaning of the data that is about to be moved into the data warehouse
  • know the limitations of the ERP that could arise when the ERP data is accessed:
    • referential integrity
    • hierarchical relationships
    • implicit logical relationships
    • application conventions
    • all the data structures supported by the ERP, and so on
  • be efficient in accessing the data, by providing:
    • direct movement of data
    • change data capture
    • support for timely access to the data
  • understand the format of the data, and so on.

INTERFACE WITH SAP

The interface can be of two types, homegrown or commercial. Some of the major commercial interfaces include:
  • SAS
  • Prims Solutions
  • D2k, and so on.

MULTIPLE ERP TECHNOLOGIES

Treating the ERP environment as if it were a single technology is a big mistake. There are many ERP technologies, each with its own strengths. The best-known vendors in the market are:
  • SAP
  • Oracle Financials
  • PeopleSoft
  • JD Edwards
  • Baan

SAP

SAP is the largest and most complete ERP software. SAP applications encompass many types of applications in many areas. SAP has a reputation for being:
  • very large
  • very difficult and expensive to implement
  • needing many people and consultants for implementation
  • needing specialized people for implementation
  • taking a long time to implement.

SAP also has a reputation for guarding its data very carefully, making it difficult for anyone outside the SAP area to access it. The strength of SAP is that it is capable of capturing and storing a large amount of data. SAP recently announced its intention to extend its applications into data warehousing. There are many pros and cons to using SAP as a data warehouse. One advantage is that SAP is already installed and that most consultants are already familiar with it.

The disadvantages of having SAP as the data warehouse supplier are many: SAP has no experience in the data warehouse world. If SAP is the data warehouse supplier, the data must be "taken out" of SAP and into the data warehouse. Given SAP's track record as a closed system, it is unlikely to be easy to get data out of SAP and into the warehouse. There are many legacy environments that feed SAP, such as IMS, VSAM, ADABAS, ORACLE, DB2, and so on. SAP insists on a "not invented here" approach. SAP does not want to work with other vendors to use or create the data warehouse; SAP insists on generating all of its software itself.

Although SAP is a large and powerful company, attempting to rewrite the technology of ETL, OLAP, system administration, and even the core DBMS code is simply misguided. Instead of taking a cooperative attitude toward long-standing data warehouse suppliers, SAP has followed a "we know best" approach. This attitude holds back the success SAP could have in the data warehouse arena.
SAP refuses to allow external suppliers prompt and graceful access to its data. Yet the very essence of using a data warehouse is easy access to data, and SAP's whole history is built on making data access difficult.
SAP lacks experience in dealing with large volumes of data; the data warehouse field involves volumes of data never seen by SAP, and handling these large amounts of data requires suitable technology. SAP is apparently not aware of this technological barrier to entering the data warehouse field.
SAP's corporate culture: SAP has built its business on getting data into the system. But data warehousing requires the opposite mentality. Traditionally, software companies that were good at getting data into an environment have not been good at getting data to go the other way. If SAP manages to make this kind of switch, it will be the first company to do so.

In short, it is questionable whether a company should select SAP as its data warehouse supplier. There are very serious risks on the one hand and very few rewards on the other. But there is another reason that discourages choosing SAP as a data warehouse supplier. Why should every company have the same data warehouse as all the other companies? The data warehouse is the heart of competitive advantage. If every company adopted the same data warehouse, it would be difficult, if not impossible, to achieve a competitive advantage. SAP seems to think of a data warehouse as a cookie-cutter product, and this is a further sign of its "get the data in" application mentality.

No other ERP vendor is as dominant as SAP. Undoubtedly there will be companies that follow the SAP path for their data warehouses, but presumably these SAP data warehouses will be large, expensive, and time-consuming to create.

PEOPLESOFT

Classic transaction environments include such activities as bank teller processing, airline reservation processing, insurance claims processing, and so on. The more powerful the transaction system, the more obvious the need for separation between the operational process and the DSS (Decision Support System). With human resources and personnel systems, however, one is never faced with large volumes of transactions. Of course, when a person is hired or leaves the company, a transaction record is created. But relative to other systems, human resources and personnel systems simply do not have many transactions. Therefore, for human resources and personnel systems it is not entirely obvious that a data warehouse is needed. In many ways these systems represent a consolidation of DSS systems.

But there is another factor to consider when dealing with the data warehouse and PeopleSoft. In many environments, human resources and personnel data is secondary to the primary business of the company. Most companies do manufacturing, selling, providing services, and so on. Human resources and personnel systems are usually secondary to (or supportive of) the company's main line of business. Therefore, a separate data warehouse for human resources and personnel support is ambiguous and inconvenient.

PeopleSoft is very different from SAP in this respect. With SAP, it is mandatory that there be a data warehouse. With PeopleSoft, it's not all that clear. A data warehouse is optional with PeopleSoft.

The best case that can be made for PeopleSoft data is that the data warehouse can be used to archive old human resources and personnel data. A second reason why a company might want a data warehouse alongside the PeopleSoft environment is to allow analysis tools free and open access to PeopleSoft data. But beyond these reasons, there may be cases where it is preferable not to have a data warehouse for PeopleSoft data.

IN SUMMARY

There are many considerations in deciding whether to build a data warehouse within ERP software. Some of them are:

  • Does it make sense to have a data warehouse that looks like every other one in the industry?
  • How flexible is an ERP data warehouse software package?
  • Can an ERP data warehouse package handle the volumes of data found in a "data warehouse arena"?
  • What is the ERP vendor's track record on delivering data that is inexpensive, on time, and easy to access?
  • What is the ERP vendor's understanding of the DSS architecture and the Corporate Information Factory?
  • ERP vendors understand how to get data into their environment, but do they understand how to export it?
  • How open is the ERP vendor to data warehousing tools?

All of these considerations must be weighed in deciding where to put the data warehouse that will host the ERP data and other data. In general, unless there is a compelling reason to do otherwise, building the data warehouse outside the ERP vendor's environment is recommended.

CHAPTER 1

Overview of the BI Organization

Key points:
  • Information repositories alone work contrary to the business intelligence (BI) architecture.
  • Corporate culture and IT can limit the success of building BI organizations.

Technology is no longer the limiting factor for BI organizations. The problem for architects and project planners is not whether the technology exists, but whether they can effectively implement the available technology.

For many companies a data warehouse is little more than a passive repository that distributes data to the users who need it. The data is extracted from the source systems and populated into the target data warehouse structures. With any luck, the data may even be cleansed. However, no additional value is added to the data, or harvested from it, during this process.

Essentially, a passive DW, at best, provides only clean, operational data to user communities. Information creation and analytical understanding are left entirely up to the users. Judging whether the DW (data warehouse) is a success is subjective. If we judge success on the ability to efficiently collect, integrate and cleanse corporate data on a predictable basis, then yes, the DW is a success. On the other hand, if we look at the gathering, consolidation and exploitation of information by the organization as a whole, then the DW is a failure. A passive DW provides little or no information value. As a result, users are forced to make do, and information silos are created. This chapter presents a comprehensive overview of the organization's business intelligence (BI) architecture. We start with a description of BI and then move on to discussions of designing and developing information, as opposed to simply providing data to users. Discussions then focus on calculating the value of your BI efforts. We conclude by describing how IBM addresses your organization's BI architectural requirements.

Description of the BI organization architecture

Powerful transaction-oriented information systems are now commonplace in every large enterprise, effectively leveling the playing field for corporations around the world.

Remaining competitive, however, now requires analytically oriented systems that can revolutionize a company's ability to rediscover and use the information it already possesses. These analytic systems derive their value from understanding the wealth of data available. BI can improve performance across all enterprise information. Companies can improve relationships between customers and suppliers, improve the profitability of products and services, generate new and better offers, control risk and, among many other gains, cut spending drastically. With BI your company finally begins to use customer information as a competitive asset through applications that have market objectives.

Having the right business tools means having definitive answers to key questions such as:

  • Which of our customers make us the most money, and which cost us money?
  • Where do our best customers live in relation to the stores/warehouses they frequent?
  • Which of our products and services can be sold most effectively, and to whom?
  • Which sales campaign was the most successful, and why?
  • Which sales channels are most effective for which products?
  • How can we improve relationships with our best customers?

Most companies have the raw data to answer these questions. Operational systems generate large quantities of product, customer and market data from points of sale, reservations, customer service and technical support systems. The challenge is to extract and exploit this information. Many companies take advantage of only small fractions of their data for strategic analysis. The remaining data, often joined with data from external sources such as government reports and other purchased information, is a gold mine just waiting to be explored, and that data need only be refined within the informational context of your organization.

This knowledge can be applied in several ways, ranging from designing an overall corporate strategy to personal communication with suppliers, through call centers, invoicing, the Internet and other points. Today's business environment dictates that the DW and related BI solutions evolve beyond traditional data structures such as atomic-level normalized data and "star/cube farms".

What is needed to remain competitive is a fusion of traditional and advanced technologies in an effort to support a broad analytical landscape.
In conclusion, the overall environment must improve the knowledge of the company as a whole, ensuring that the actions taken as a result of the analyses benefit everyone.

For example, suppose you classify your customers into high- or low-risk categories.
If this information is generated by a mining model or other means, it must be placed in the DW and made accessible to anyone through any access tool, such as static reports, spreadsheets, tables, or online analytical processing (OLAP).

Currently, however, much of this type of information remains in the data silos of the individuals or departments that generate the analysis. The organization as a whole has little or no visibility into the insight. Only by blending this type of information content into your enterprise DW can you eliminate information silos and elevate your DW environment.
There are two major obstacles to developing a BI organization.
First, we have the problem of the organization itself and its discipline.
While we cannot help with organizational policy changes, we can help you understand the components of a BI organization, its architecture, and how IBM technology facilitates its development.
The second barrier to overcome is the lack of integrated technology and of awareness of an approach that addresses the entire BI space, as opposed to only a small component of it.

IBM is responding to the need for integrated technology. It is your responsibility to provide the deliberate planning. This architecture must be developed with technology chosen for unconstrained integration, or at the very least with technology that adheres to open standards. Additionally, your company management must ensure that the BI effort is carried out on schedule and does not allow the development of information silos that result from self-serving agendas or goals.
This is not to say that the BI environment should be insensitive to the different needs and requirements of different users; rather, it means that the implementation of those individual needs and requirements is done for the benefit of the entire BI organization.
A description of the BI organization architecture can be found on page 9 in Figure 1.1. The architecture demonstrates a rich blend of technologies and techniques.
From the traditional point of view, the architecture includes the following warehouse components:

Atomic Layer.

This is the foundation, the heart of the entire DW and therefore of strategic reporting.
The data stored here retains historical integrity and the relationships between data, includes derived metrics, and is cleansed, integrated, and stored using mining models.
All subsequent use of this data and related information is derived from this structure. It is an excellent source for data mining and for reports with structured SQL queries.
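As a minimal illustration of a structured SQL report against an atomic layer, a strategic report might be a query like the one below. The schema (sales_fact, customer_dim, date_dim) is an assumption made for this sketch, not a schema defined in the text.

```python
# Illustrative strategic report over a hypothetical atomic-layer schema.
strategic_report_sql = """
SELECT d.year,
       c.customer_segment,
       SUM(f.sale_amount)            AS total_sales,
       COUNT(DISTINCT f.customer_id) AS active_customers
FROM   sales_fact   f
JOIN   customer_dim c ON c.customer_id = f.customer_id
JOIN   date_dim     d ON d.date_id     = f.date_id
GROUP  BY d.year, c.customer_segment
ORDER  BY d.year, total_sales DESC
"""
```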

Operational data store (ODS) or reporting database.

This is a data structure designed specifically for technical reporting.

The data stored and reported in these structures can eventually propagate into the warehouse through the staging area, where it can be used for strategic reporting.

Staging area.

The first stop for most data destined for the warehouse environment is the staging area.
Here the data is integrated, cleansed and transformed into useful data that will populate the warehouse structure.

Data marts.

This part of the architecture represents the data structures used specifically for OLAP. Whether the data marts are stored as star schemas laying out multidimensional data in a relational environment, or in proprietary data files used by a specific OLAP technology, such as DB2 OLAP Server, is not relevant.

The only constraint is that the architecture facilitates the use of multidimensional data.
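A minimal sketch of what a star schema for such a data mart could look like appears below. All table and column names are assumptions made for illustration, not structures taken from the text.

```python
# Minimal star-schema sketch for an OLAP data mart: one fact table surrounded
# by dimension tables. Names are illustrative assumptions only.
import sqlite3

ddl = """
CREATE TABLE product_dim (product_id INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE store_dim   (store_id   INTEGER PRIMARY KEY, store_name   TEXT, region   TEXT);
CREATE TABLE date_dim    (date_id    INTEGER PRIMARY KEY, day DATE, month INTEGER, year INTEGER);

CREATE TABLE sales_fact (
    product_id  INTEGER REFERENCES product_dim(product_id),
    store_id    INTEGER REFERENCES store_dim(store_id),
    date_id     INTEGER REFERENCES date_dim(date_id),
    units_sold  INTEGER,
    sale_amount REAL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(ddl)  # the mart is now ready to be loaded from the staging area
```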
The architecture also includes critical BI technologies and techniques such as:

Spatial analysis

Space is a windfall of information for the analyst and is critical to a complete solution. Space can represent information about the people living in a certain location, as well as information about where that location physically sits relative to the rest of the world.

To perform this analysis, you must begin by binding your information to latitude and longitude coordinates. This is referred to as “geocoding” and must be part of the extraction, transformation, and loading process (ETL) at the atomic level of your warehouse.
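A minimal sketch of how a geocoding step might be slotted into an ETL flow is shown below. The geocode() function and the record layout are hypothetical placeholders standing in for whatever geocoding service or library is actually used; they are not a real API.

```python
# Hypothetical geocoding step inside an ETL pipeline.
from typing import Optional, Tuple

def geocode(street: str, city: str, postal_code: str) -> Optional[Tuple[float, float]]:
    """Placeholder: return (latitude, longitude) for a cleansed address, or None."""
    raise NotImplementedError("plug in the geocoding service of your choice")

def transform_customer_record(record: dict) -> dict:
    """Enrich a staged customer record with coordinates during ETL."""
    coords = geocode(record["street"], record["city"], record["postal_code"])
    record["latitude"], record["longitude"] = coords if coords else (None, None)
    return record
```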

Data mining.

Data mining allows our companies to grow the number of customers, to predict sales trends and to enable customer relationship management (CRM), among other BI initiatives.

Data mining must therefore be integrated with the data structures of the DWHouse and supported by warehouse processes to ensure both the effective and efficient use of the technology and related techniques.

As indicated in the BI architecture, the atomic level of the DWHouse, as well as the data marts, is an excellent source of data for mining. Those same facilities must also be recipients of the mining results, to ensure availability to the broadest audience.
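As a small sketch of this round trip, assuming scikit-learn is available and using a hypothetical customer-feature table, mining results can be written back into the warehouse so that every user sees them:

```python
# Sketch: pull features from the atomic layer, run a simple mining model,
# and write the results back so they reach the broadest audience.
# Table/column names are illustrative; scikit-learn is an assumed dependency.
import sqlite3
from sklearn.cluster import KMeans

def score_customer_segments(dw_conn: sqlite3.Connection, n_segments: int = 4) -> None:
    rows = dw_conn.execute(
        "SELECT customer_id, total_spend, order_count FROM customer_features"
    ).fetchall()
    ids = [r[0] for r in rows]
    features = [[r[1], r[2]] for r in rows]

    segments = KMeans(n_clusters=n_segments, n_init=10).fit_predict(features)

    # Publish the mining result back into the warehouse, not into a private silo.
    dw_conn.executemany(
        "UPDATE customer_features SET segment = ? WHERE customer_id = ?",
        list(zip(map(int, segments), ids)),
    )
    dw_conn.commit()
```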

Agents.

Various agents exist to examine the customer at each touch point, such as the company's operational systems and the DW itself. These agents can be advanced neural networks trained to learn the trends of each point, such as future product demand based on sales promotions; rules-based engines that react to a given set of circumstances; or even simple agents that report exceptions to top executives. These processes generally occur in real time and must therefore be closely coupled with the movement of the data itself. All of these data structures, technologies and techniques make it clear that you will not build your BI organization overnight.
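A minimal sketch of the simplest kind of agent mentioned above, a rules-based exception reporter, follows. The threshold and the notification mechanism are assumptions made for illustration.

```python
# Simple rules-based agent: watch a metric stream and report exceptions.
def notify_executives(message: str) -> None:
    print(f"ALERT: {message}")  # stand-in for an e-mail, portal, or messaging hook

def inventory_exception_agent(readings, reorder_threshold: int = 100) -> None:
    """Scan (product_id, stock_level) readings and flag products running low."""
    for product_id, stock_level in readings:
        if stock_level < reorder_threshold:
            notify_executives(
                f"Product {product_id} stock at {stock_level}, below {reorder_threshold}"
            )

# Example usage with in-memory readings:
inventory_exception_agent([("A-100", 240), ("B-200", 35)])
```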

This activity will be developed in incremental steps, in small increments.
Each step is an independent project effort and is referred to as an iteration of your DW or BI initiative. Iterations may include implementing new technologies, starting with new techniques, adding new data structures, loading additional data, or expanding the analytics of your environment. This topic is discussed in more detail in Chapter 3.

In addition to the traditional DW structures and BI tools, there are other functions of your BI organization that you need to design for, such as:

Customer touch points.

As with any modern organization, there are a number of customer touch points that determine how your customers have a positive experience. There are traditional channels such as merchants, switchboard operators, direct mail, multimedia and print advertising, as well as more current channels such as email and the web. Data produced at any point of contact must be acquired, transported, cleansed, transformed and then populated into the BI data facilities.

Operational databases and user communities.

At the other end of the customer touch points are the company's operational application databases and user communities. The existing data is traditional data that must be brought together and merged with the data flowing from the touch points to fulfill the necessary information.

Analysts.

The primary beneficiary of the BI environment is the analyst. It is the analyst who benefits from the ongoing extraction of operational data, integrated with different data sources, augmented with features such as geographic analysis (geocoding), and presented in BI technologies that enable mining, OLAP, advanced SQL reporting and geographic analysis. The analyst's primary interface to the reporting environment is the BI portal.

However, the analyst is not the only one who benefits from the BI architecture.
Executives, broad user communities, and even partners, suppliers and customers should find benefits in enterprise BI.

Back feed loop.

The BI architecture is a learning environment. A characteristic principle of its development is to allow persistent data structures to be updated by the BI technology in use and by user actions. An example is customer scoring.

If the sales department builds a mining model that scores customers on the uptake of a new service, the sales department should not be the only group benefiting from it.

Instead, the mining model should be run as a natural part of the data flow within the company, and the customer scores should become an integrated part of the warehouse's information context, visible to all users. IBM's BI-centric suite, including DB2 UDB and DB2 OLAP Server, covers most of the major technology components defined in Figure 1.1.

We use the architecture as it appears in this figure throughout the book to give us a level of continuity and to demonstrate how each IBM product fits into the overall BI framework.

Providing the information content

Designing, developing, and implementing your BI environment is a daunting task. The design must embrace both current and future business requirements. The architectural design must be complete enough to include all the conclusions reached during the design phase. Execution must remain committed to a single purpose: developing the BI architecture as formally presented in the design and grounded in the business requirements.

It is particularly difficult to argue that discipline alone will ensure relative success.
This is simply because you don't develop a BI environment all at once, but in small steps over time.

However, identifying the BI components of your architecture is important for two reasons: it will guide all subsequent technical architectural decisions, and it will let you consciously design for a particular use of technology even though you may not encounter an iteration that needs that technology for several months.

Understanding your business requirements sufficiently will affect the type of products you acquire for your architecture.
The design and development of your architecture ensure that your warehouse is not a random event, but rather a well-thought-out, carefully constructed work of art: a mosaic of blended technologies.

Design the information content

All initial design must focus on identifying the core components of BI that will be needed by the overall environment now and in the future.
Knowing the business requirements is important.

Even before formal planning begins, the project planner can often identify a component or two right away.
The balance of components that may be needed for your architecture, however, cannot be found so easily. During the design phase, the main part of the architecture work ties the joint application development (JAD) session to research aimed at identifying the business requirements.

Sometimes these requirements can be met with query and reporting tools.
For example, users state that they want to automate a report they currently generate manually by integrating two existing reports and adding calculations derived from the combination of the data.
While this requirement is simple, it defines certain functionality that you must include when purchasing reporting tools for your organization.

The designer must also pursue additional requirements to obtain a complete picture. Do users want to subscribe to this report?
Should subsets of the report be generated and emailed to various users? Do they want to see this report on the company portal? All of these requirements are part of the simple need to replace a manual report, as requested by users. The benefit of these types of requirements is that everyone, users and designers alike, has an understanding of the concept of reports.

There are other types of business requirements, however, that we need to plan for. When business requirements are stated in the form of strategic business questions, it is easy for the experienced designer to discern the measure/fact and dimensional requirements.

If JAD users do not know how to state their requirements in the form of a business problem, the designer will often provide examples to jump-start the requirements-gathering session.
The expert designer can help users understand not only strategic business questions, but also how to formulate them.
The requirements gathering approach is discussed in chapter 3; for now we just want to point out the need to design for all kinds of BI requirements.

A strategic business problem is not only a business requirement, but also a design clue. If you have to answer a multidimensional question, then you have to store and present the data dimensionally, and if you need to store multidimensional data, you have to decide what kind of technology or technique you are going to employ.

Do you implement a star schema, a proprietary cube, or both? As you can see, even a simple business problem can significantly affect the design. But these types of business requirements are commonplace and well understood, at least by experienced project planners and designers.

There has been sufficient debate about OLAP technologies and their support, and a wide range of solutions is available. So far we have mentioned the need to combine simple reporting with dimensional business requirements, and how these requirements influence technical architectural decisions.

But what about the requirements that are not readily understood by users or by the DW team? Will you ever need spatial analysis?
Will data mining models be a necessary part of your future? Who knows?

It is important to note that these types of technologies are not well known by general user communities and DW team members, in part because they are typically handled by internal or third-party technical experts. This is an extreme case of the problem these types of technologies create: if users cannot describe business requirements, or frame them in a way that provides guidance to designers, the requirements can go unnoticed or, worse, simply be ignored.

It becomes even more problematic when the designer and developer cannot recognize the application of one of these advanced but critical technologies.
As we have often heard designers say, "Well, why don't we put it aside until we get this other thing done?" Are they really interested in priorities, or are they simply avoiding requirements they don't understand? It is most likely the latter. Let's say your sales team has communicated a business requirement, as stated in Figure 1.3; as you can see, the requirement is framed in the form of a business problem. The difference between this problem and the typical dimensional problem is distance. In this case, the sales group wants to know, on a monthly basis, the total sales by product, warehouse and customer, for the customers who live within 5 miles of the warehouse where they shop.

Sadly, designers or architects may simply ignore the spatial component, saying, "We have the customer, the product and the warehouse data. Let's keep the distance out until another iteration."

Wrong answer. This type of business problem is what BI is all about. It represents a deeper understanding of our business and a robust analytical space for our analysts. BI goes beyond simple querying or standard reporting, or even OLAP. This is not to say that these technologies are not important to your BI, but by themselves they do not represent the BI environment.

Design for the information content

Now that we have identified the business requirements that call for the various fundamental components, they must be included in an overall architectural design. Some of the BI components are part of our initial efforts, while some will not be implemented for several months.

However, all known requirements are reflected in the design so that when we need to implement a particular technology, we are prepared to do so. Part of the design will reflect traditional thinking.

This set of data is used to support later dimensional uses of the data, driven by the business issues we have identified. As additional documents are generated, such as the data development design, we will begin to formalize how the data propagates through the environment. We have established the need to represent the data dimensionally, dividing it (according to specific needs) into data marts.

The next question to answer is: how will these data marts be built?
Do you build stars to support cubes, or just cubes, or just stars? Do you generate an architecture of dependent data marts that require an atomic layer for all the data acquired? Do you allow independent data marts to acquire data directly from the operational systems?

What cube technology will you try to standardize on?

Do you have massive quantities of data that require dimensional analysis, or do you need cubes for your national sales force on a weekly basis, or both? Do you build a powerful object in DB2 OLAP Server for finance, or Cognos PowerPlay cubes for your sales organization, or both? These are the big architectural design decisions that will impact your BI environment from here on out. Yes, you have identified a need for OLAP. Now how are you going to implement that kind of technique and technology?

How do some of the more advanced technologies affect your design? Let's assume you have identified a spatial need in your organization. You must now address the architectural design issues even if you don't plan to build spatial components for several months. The architect must design today based on what will be needed: anticipate the need for spatial analysis that generates, stores, and provides access to spatial data. This in turn should serve as a constraint on the type of software technology and platform specifications you can currently consider. For example, the relational database management system (RDBMS) that you use for your atomic layer must have a robust spatial extension available. This would ensure maximum performance when using geometry and spatial objects in your analytic applications. If your RDBMS cannot handle spatial-centric data internally, you will have to set up an external spatial-centric database. This complicates issue management and impairs your overall performance, not to mention the additional problems generated for your DBAs, since they probably have only a minimal understanding of spatial databases to begin with. On the other hand, if your RDBMS engine handles all the spatial components and its optimizer is aware of the special needs (for example, indexing) of spatial objects, then your DBAs can handle those issues readily and you can maximize performance.

In addition, you need to adjust the staging area and the atomic-layer environment to include address cleansing (a key element of spatial analysis), as well as the subsequent storage of spatial objects. The succession of design issues continues now that we have introduced the notion of address cleansing. For one thing, this requirement will dictate the type of software needed for your ETL effort.

Do you need products like Trillium to provide a clean address, or an ETL provider of your choice to provide that functionality?
For now it is important that you appreciate the level of design that must be completed before you begin building your environment (the warehouse). The examples above should demonstrate the multitude of design decisions that must follow the identification of any particular business requirement. When done correctly, these design decisions promote the interdependence between the physical structures of your environment, the selection of the technology used, and the flow of information content. Without this conventional BI architecture, your organization will be subject to a chaotic mix of existing technologies, at best loosely combined to provide apparent stability.

Maintain information content

Bringing the value of information to your organization is a very difficult task. Without sufficient understanding and experience, or proper planning and design, even the best teams will fail. On the other hand, if you have great intuition and detailed planning but no discipline in execution, you have just wasted your money and time, because your endeavor is doomed to fail. The message should be clear: if you lack any one of these skills - understanding/experience, planning/design, or implementation discipline - it will cripple or destroy the building of the BI organization.

Is your team prepared enough? Is there anyone on your BI team who understands the vast analytical landscape available in BI environments, and the techniques and technologies required to implement that landscape? Is there anyone on your team who can recognize the application difference between advanced static reporting and OLAP, or the differences between ROLAP and OLAP? Does one of your team members clearly recognize how to do mining and how it might impact the warehouse, or how the warehouse can support mining performance? Does a team member understand the value of spatial data or agent-based technology? Do you have someone who appreciates the unique application of ETL tools versus message broker technology? If you don't, get one. BI is much larger than a normalized atomic layer, OLAP, star schemas, and an ODS.

Having the understanding and experience to recognize BI requirements and their solutions is essential to your ability to properly formalize user needs and to design and implement their solutions. If your user community has difficulty describing requirements, it is up to the warehouse team to provide that understanding. But if the warehouse team does not recognize the specific application of BI - for example, data mining - then it is no wonder that BI environments are often limited to being passive repositories. However, ignoring these technologies does not diminish their importance, nor the effect they have on the emergence of your organization's business intelligence capabilities, as well as on the information set-up you plan to promote.

Design must include the notion of planning, and both require a competent individual. In addition, designing requires a warehouse-team philosophy and an observance of standards. For example, if your company has established a standard platform or identified a particular RDBMS to standardize on across the platform, it is incumbent on everyone on the team to adhere to those standards. Often a team preaches the need for standardization (to user communities), but the team itself is unwilling to adhere to the standards established in other areas of the company, or perhaps even in similar companies. Not only is this hypocritical, it guarantees that the firm will be unable to exploit existing resources and investments. This does not mean there are no situations that warrant a non-standard platform or technology; however, the warehouse efforts should jealously protect the firm's standards until business requirements dictate otherwise.

The third key component needed to build a BI organization is discipline.
It depends equally on individuals and on the enterprise. Project planners, sponsors, architects, and users must appreciate the discipline necessary to build the company's information structure. Designers must direct their project efforts in such a way as to complement the other necessary efforts within the company.

For example, suppose your company builds an ERP application that has a warehouse component.
It is then the responsibility of the ERP designers to collaborate with the warehouse environment team so as not to compete with, or duplicate, the work already under way.

Discipline is also a subject that must be taken up by the entire organization, and it is usually established and entrusted at an executive level.
Are executives willing to adhere to a designed approach? An approach that promises to create information content that will ultimately bring value to all areas of the business, but that perhaps compromises individual or departmental agendas? Remember the saying "Thinking about the whole is more important than thinking about a single part." This saying holds true for BI organizations.

Unfortunately, many warehouse efforts focus on targeting and bringing value to a particular department or to specific users, with little regard for the organization at large. Suppose an executive requests assistance from the warehouse team. The team responds with a 90-day effort that includes not only delivering the notification requirements defined by the executive, but also ensuring that all the underlying data is blended at the atomic level before being introduced into the proposed cube technology.
This engineering addition ensures that the enterprise will benefit from the data needed by the executive.
However, the executive has spoken to external consultants who proposed a similar application with delivery in less than four weeks.

Assuming the internal warehouse team is competent, the executive has a choice: support the additional engineering discipline required to nurture the enterprise information asset, or choose to implement a private solution quickly. The latter seems to be chosen far too often, and it only serves to create containers of information from which only a few, or a single person, benefit.

Short and long term goals

Architects and project designers must formalize a long-term vision of the overall architecture and plans for growing into a BI organization. This combination of short-term gain and long-term planning represents the two faces of BI endeavors. Short-term gain is the facet of BI associated with the iterations of your warehouse.

This is where designers, architects and sponsors focus on meeting specific commercial requirements. It is at this level that physical structures are built, technology is purchased and techniques are implemented. Everything is done in order to address specific requirements defined by particular user communities.
Long-range planning, however, is the other facet of BI. This is where the plans and designs ensure that any physical structure built, any technology selected and any technique implemented is done with an eye toward the enterprise. It is the long-term planning that provides the cohesion needed to ensure that business benefits flow from any short-term gains found.

Justify your BI effort

A data warehouse on its own has no inherent value. In other words, there is no inherent value in warehouse technologies and implementation techniques.

The value of any warehouse effort is found in the actions performed as a result of the warehouse environment and in the information content cultivated over time. This is a critical point to understand before you ever attempt to estimate the value of any warehouse initiative.

Too often, architects and designers attempt to attach value to the physical and technical warehouse components, when the value is in fact founded on the business processes that are positively impacted by the warehouse and by the information thus acquired.

Herein lies the challenge of founding BI: how do you justify the investment? If the warehouse itself has no intrinsic value, project designers must investigate, define and formalize the benefits achieved by those individuals who will use the warehouse to improve specific business processes, or the value of the protected information, or both.

To complicate matters, any business process affected by warehouse efforts could provide "considerable" or "slight" benefits. Considerable benefits provide a tangible metric for measuring return on investment (ROI) - for example, turning inventory over one additional time during a specific period, or a lower transportation cost per shipment. Slight benefits, such as improved access to information, are more difficult to define in terms of tangible value.

Connect your project to known business requirements

Too often, project designers attempt to link warehouse value to amorphous company goals. Declaring that "the value of a warehouse is based on our ability to satisfy strategic requests" is a pleasant way to open the discussion. But that alone is not enough to determine whether investing in the warehouse makes sense. It is best to link warehouse iterations to specific, known business requirements.

Measuring the ROI

Calculating ROI in a warehouse setting can be particularly difficult. It is especially difficult if the principal advantage of a particular iteration is something intangible or not easy to measure. One study found that users perceive the two main benefits of BI initiatives to be:

  • Creating the ability to make decisions
  • Creating access to information

These are soft benefits. It is easy to see how we could calculate an ROI based on a hard benefit such as reduced transportation costs, but how do we measure the ability to make better decisions?
This is certainly a challenge for project designers when they are trying to convince the company to invest in a particular warehouse effort. Rising sales or falling costs are no longer the central themes driving the BI environment.
Instead, you are looking at better access to information in business requirements, so that a particular department can make faster decisions. These are strategic drivers that happen to be equally important to the business but are more ambiguous and more difficult to characterize in a tangible metric. In this case, calculating ROI can be misleading, if not irrelevant.
Project designers must be able to demonstrate tangible value for executives to decide whether the investment in a particular iteration is worthwhile. However, we will not propose a new method for calculating ROI, nor will we make any arguments for or against it.
There are many articles and books available that discuss the fundamentals of calculating ROI. There are also special value propositions, such as value on investment (VOI) offered by groups like Gartner, that you can research. Instead, we will focus on the core aspects of any ROI, or other value propositions, that you need to consider.

Applying ROI

In addition to the argument about the "hard" versus "soft" benefits associated with BI efforts, there are other issues to consider when applying ROI. For instance:

Attributing too many savings to DW efforts that would have come anyway.
Let's say your company moved from a mainframe architecture to a distributed UNIX environment. Any savings that may (or may not) be realized by that effort should not be attributed solely, if at all, to the warehouse.

Not accounting for all costs. And there are many things to take into account. Consider the following list:

  • Cost of start-up, including feasibility.
  • Cost of dedicated hardware with related storage and communications.
  • Cost of software, including data management and client/server extensions, ETL software, DSS technologies, visualization tools, programming and workflow applications, and monitoring software.
  • Cost of data structure design, realization and optimization.
  • Cost of software development directly associated with the BI effort.
  • Cost of in-house support, including performance optimization, software version control and help-desk operations.

Applying "Big-Bang" ROI. Building the warehouse as a single gigantic effort is doomed to fail, and so is calculating the ROI for an enterprise-wide initiative in one shot. It is surprising that designers keep making feeble attempts to estimate the value of the entire effort. Why do planners try to put a monetary value on the whole business initiative when it is widely known and accepted that estimating even specific iterations is difficult? How is it possible? With few exceptions, it is not. Don't do it. Now that we have established what not to do when calculating ROI, here are a few points that will help us define a reliable process for estimating the value of your BI efforts.

Obtain consensus on the ROI. Regardless of the technique you choose for estimating the value of your BI efforts, it must be agreed upon by all parties, including project planners, sponsors, and business executives.

Reduce the ROI to identifiable parts. A necessary step toward a reasonable ROI calculation is to focus that calculation on a specific project. This then allows you to estimate a value based on the specific business requirements being met.

Define the costs. As mentioned, numerous costs must be considered. Furthermore, the costs must include not only those associated with the single iteration but also the costs associated with ensuring compliance with enterprise standards.

Define the benefits. By clearly linking ROI to specific business requirements, we should be able to identify the benefits that will result from meeting those requirements.

Reduce costs and benefits to present value. It is best to base your valuations on net present value (NPV) rather than trying to project value into future earnings.

Keep your ROI timeframe to a minimum. The longer the period used in your ROI calculation, the harder the estimate is to document and defend.

Use more than one ROI formula. There are numerous methods for predicting ROI, and you should decide whether to use one or more of them, including net present value (NPV), internal rate of return (IRR), and payback.

Define the repeatable process. This is crucial in calculating any long-term value. A single repeatable process should be documented for all subsequent project iterations.

The problems listed are the most common ones identified by warehouse experts. Management's insistence on a “Big-Bang” ROI is very misleading. If you start all your ROI calculations by breaking them down into identifiable, tangible parts, you have a good chance of producing an accurate ROI estimate.
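As a rough illustration of the formulas named above, the following minimal Python sketch computes NPV, IRR (by bisection), and payback for an invented stream of cash flows. The figures are hypothetical and the sketch is illustrative only; it is not a valuation method endorsed by the text.

```python
# Minimal sketch: the ROI formulas mentioned above (NPV, IRR, payback)
# applied to a hypothetical BI iteration. All cash flows are invented.

def npv(rate, cash_flows):
    """Net present value of a series of yearly cash flows (year 0 first)."""
    return sum(cf / (1 + rate) ** year for year, cf in enumerate(cash_flows))

def irr(cash_flows, lo=-0.99, hi=10.0, tol=1e-6):
    """Internal rate of return found by bisection on the NPV function."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if npv(mid, cash_flows) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def payback_period(cash_flows):
    """Year in which the cumulative cash flow turns positive (None if never)."""
    total = 0.0
    for year, cf in enumerate(cash_flows):
        total += cf
        if total >= 0:
            return year
    return None

# Year 0: initial investment; years 1-3: estimated benefits net of running costs.
flows = [-500_000, 220_000, 260_000, 300_000]
print(f"NPV @ 10%: {npv(0.10, flows):,.0f}")
print(f"IRR:       {irr(flows):.1%}")
print(f"Payback:   year {payback_period(flows)}")
```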

Questions about the benefits of ROI

Whatever your benefits, soft or hard, you can use some fundamental questions to determine their value. For example, using a simple scale from 1 to 10, you can measure the impact of any effort with the following questions:

  • How would you rate your company's understanding of data as a result of this project?
  • How would you rate the process improvements resulting from this project?
  • How would you measure the impact of the new insights and inferences now made available by this iteration?
  • What has been the impact of new, high-performance computing environments as a result of what was learned?

If the answers to these questions score low, it is possible that the effort is not worth the investment. Questions with a high score point to significant gains in value and should serve as guides for further investigation. For example, a high process-improvement score should lead designers to examine how processes have been improved. You may find that some or all of the gains made are tangible, and a monetary value can therefore be readily applied.

Getting the most out of the first iteration of the warehouse

The biggest payoff from your enterprise effort often comes in the first few iterations. These early efforts traditionally establish the most useful information content for the audience and establish the technology foundation for subsequent BI applications. Usually, each subsequent iteration of data warehouse projects brings less and less additional value to the business as a whole. This is especially true if the iteration does not add new subjects or does not meet the needs of a new user community.

This characteristic of warehousing also applies to the growing stock of historical data. As subsequent efforts require more data, and as more data is poured into the warehouse over time, most of the data becomes less relevant to current analysis. This data is often called dormant data, and it is expensive to keep because it is hardly ever used.

What does this mean for project sponsors? Essentially, the earliest sponsors shoulder more than their share of the investment costs. This is primarily because they provide the impetus for founding the warehouse's broad technological environment and resource layer, including staffing.

But these first steps carry the highest value and therefore project designers often have to justify the investment.
Later projects in your BI initiative may have lower and more direct costs than the first one, but they bring less value to the business.

And organization owners must start considering discarding the accumulated data and less relevant technologies.

Data Mining

Numerous architectural components require variations in data mining technologies and techniques: for example, different "agents" for examining customer points of interest, the company's operational systems, and the dw itself. These agents can be advanced neural networks trained on point-of-sale trends, such as future product demand based on sales promotions; rule-based engines that react to a given set of circumstances, for example, medical diagnosis and treatment recommendations; or even simple agents whose role is to report exceptions to top executives. Generally, these data mining processes occur in real time; therefore, they must be fully integrated with the movement of the data itself.
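As a loose illustration of the simplest kind of agent mentioned above, the sketch below applies a couple of invented business rules to invented sales records and reports the exceptions; the record layout, thresholds, and rules are all hypothetical and do not describe any particular IBM agent.

```python
# Minimal sketch of a rule-based exception-reporting agent: scan operational
# records and flag the ones that violate simple business rules.

from dataclasses import dataclass

@dataclass
class SalesRecord:
    region: str
    product: str
    units_sold: int
    returns: int

RULES = [
    ("High return rate",
     lambda r: r.units_sold > 0 and r.returns / r.units_sold > 0.15),
    ("Unexpectedly low volume",
     lambda r: r.units_sold < 50),
]

def exception_report(records):
    """Apply each rule to each record and collect the exceptions found."""
    findings = []
    for rec in records:
        for name, rule in RULES:
            if rule(rec):
                findings.append((name, rec))
    return findings

records = [
    SalesRecord("EMEA", "router-x", units_sold=1200, returns=250),
    SalesRecord("APAC", "router-x", units_sold=30, returns=1),
]
for name, rec in exception_report(records):
    print(f"{name}: {rec.region}/{rec.product}")
```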

Online Analytic Processing

The ability to slice and dice, roll up, drill down, and perform what-if analysis is within the scope of the IBM technology suite. For example, online analytic processing (OLAP) functions exist for DB2 that bring dimensional analysis into the database engine itself.

These functions add dimensional utility to SQL while taking full advantage of being a natural part of DB2. Another example of OLAP integration is the extraction tool, DB2 OLAP Server Analyzer. This technology allows DB2 OLAP Server cubes to be quickly and automatically scanned to locate and report unusual or unexpected data values throughout the cube to the business analyst. Finally, the DW Center features provide a means for architects to control, among other things, the profile of a DB2 OLAP Server cube as a natural part of ETL processes.
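To make the dimensional operations concrete, the following plain-Python sketch computes, over an invented sales table, the kind of result a ROLLUP aggregation and a RANK function produce inside the engine; it mirrors the concepts only and is not DB2 code.

```python
# Conceptual sketch of ROLLUP-style aggregation and RANK, computed in plain
# Python over an invented sales table.

from collections import defaultdict

sales = [  # (region, product, amount) -- hypothetical rows
    ("East", "A", 100), ("East", "B", 150),
    ("West", "A", 200), ("West", "B", 50),
]

# ROLLUP(region, product): detail rows, per-region subtotals, grand total.
by_region_product = defaultdict(float)
by_region = defaultdict(float)
grand_total = 0.0
for region, product, amount in sales:
    by_region_product[(region, product)] += amount
    by_region[region] += amount
    grand_total += amount

for (region, product), amt in sorted(by_region_product.items()):
    print(f"{region:5} {product:3} {amt:8.2f}")
for region, amt in sorted(by_region.items()):
    print(f"{region:5} ALL {amt:8.2f}")
print(f"ALL   ALL {grand_total:8.2f}")

# RANK(): regions ordered by total sales, ties sharing the same rank.
ordered = sorted(by_region.items(), key=lambda kv: kv[1], reverse=True)
rank, prev = 0, None
for pos, (region, amt) in enumerate(ordered, start=1):
    if amt != prev:
        rank, prev = pos, amt
    print(f"rank {rank}: {region} ({amt:.2f})")
```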

Spatial Analysis

Space represents one half of the analytical anchors needed for a broad analytical view (time represents the other half). The atomic level of the warehouse, represented in Figure 1.1, includes the fundamentals of both time and space. Timestamps anchor the analysis by time, and address information anchors the analysis by space. The diagram shows geocoding, the process of converting addresses to points on a map, or points in space, so that concepts such as distance and inside/outside can be used in the analysis, conducted at the atomic level, and the resulting spatial analysis made available to the analyst. IBM provides spatial extensions, developed with the Environmental Systems Research Institute (ESRI), for DB2 so that spatial objects can be stored as a normal part of the relational database.

DB2 Spatial Extenders also provide the SQL extensions needed to take advantage of spatial analysis. For example, SQL extensions for querying the distance between addresses, or whether a point lies inside or outside a defined polygonal area, are an analytical standard with the Spatial Extender. See chapter 16 for more information.
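As a conceptual illustration of the two spatial predicates just mentioned, the sketch below computes a great-circle distance and an inside/outside-polygon test in plain Python over invented coordinates; a Spatial Extender exposes equivalent tests as SQL functions, so the helper names and data here are hypothetical.

```python
# Conceptual sketch of two spatial predicates: distance between two geocoded
# points and an inside/outside-polygon test. Coordinates are invented.

from math import radians, sin, cos, asin, sqrt

def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

def point_in_polygon(x, y, polygon):
    """Ray-casting test: is point (x, y) inside the polygon (list of vertices)?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            if x < (x2 - x1) * (y - y1) / (y2 - y1) + x1:
                inside = not inside
    return inside

store = (45.46, 9.19)       # hypothetical geocoded store address
customer = (45.07, 7.69)    # hypothetical geocoded customer address
print(f"distance: {distance_km(*store, *customer):.1f} km")

sales_territory = [(45.0, 7.0), (45.0, 10.0), (46.0, 10.0), (46.0, 7.0)]
print("customer in territory:", point_in_polygon(*customer, sales_territory))
```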

Database-Resident Tools

DB2 has many BI-resident SQL features that assist in analysis. These include:

  • Recursion functions to perform analyses such as "find all possible flight paths from San Francisco to New York" (a plain-language sketch of such a recursive search follows this list).
  • Analytic functions for ranking, cumulative functions, cube, and rollup, which facilitate tasks that normally occur only with OLAP technology and are now a natural part of the database engine.
  • The ability to create tables that contain results.
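As referenced in the first bullet above, the following plain-Python sketch performs the kind of traversal a recursive SQL query would carry out over a route table; the flight legs, airport codes, and helper name are invented for illustration.

```python
# Sketch of the recursive traversal behind "find all possible flight paths
# from San Francisco to New York", over an invented set of flight legs.

FLIGHTS = {  # origin -> list of destinations (hypothetical route table)
    "SFO": ["DEN", "ORD", "JFK"],
    "DEN": ["ORD", "JFK"],
    "ORD": ["JFK"],
}

def all_paths(origin, destination, visited=()):
    """Recursively build every loop-free path from origin to destination."""
    if origin == destination:
        return [list(visited) + [destination]]
    paths = []
    for nxt in FLIGHTS.get(origin, []):
        if nxt not in visited:
            paths.extend(all_paths(nxt, destination, visited + (origin,)))
    return paths

for path in all_paths("SFO", "JFK"):
    print(" -> ".join(path))
```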
The major database vendors are blending more BI capability into the database engine itself. This provides the best performance and the most execution options for BI solutions.

The features and functions of DB2 V8 are discussed in detail in the following chapters:

  • Technical Architecture and Data Management Foundations (Chapter 5)
  • DB2 BI Fundamentals (Chapter 6)
  • DB2 Materialized Query Tables (Chapter 7)
  • DB2 OLAP Functions (Chapter 13)
  • DB2 Enhanced BI Features and Functions (Chapter 15)

Simplified Data Delivery System

The architecture depicted in Figure 1.1 includes numerous physical data structures. One is the operational data store (ODS). Generally, an ODS is subject oriented, integrated and current. You would build an ODS to support, for example, the sales office. A sales ODS would integrate data coming from numerous different systems but would keep only, say, today's transactions. The ODS can even be updated several times a day. At the same time, processes push the integrated data on into other applications. This structure is specifically designed to integrate current, dynamic data and is a likely candidate for real-time analytics, such as giving customer service agents a customer's current sales information by pulling sales trend information from the warehouse itself. Another structure shown in Figure 1.1 is a formal staging area for the dw. Not only is this the place where the necessary integration, data quality work, and transformation of incoming warehouse data are performed, it is also a reliable, temporary storage area for replicated data that could be used in real-time analysis. If you decide to use an ODS or staging area, one of the best tools for populating these data structures from different operational sources is DB2's heterogeneous distributed query capability. This capability is delivered by the optional DB2 feature called DB2 Relational Connect (query only) and through DB2 DataJoiner (a separate product that delivers query, insert, update, and delete capability against heterogeneous distributed RDBMSs).

This technology allows data architects to tie production data to analytical processes. Not only can the technology adapt to virtually any replication demand that might arise with real-time analytics, it can also connect to a wide variety of the most popular databases, including DB2, Oracle, Sybase, SQL Server, Informix, and others. DB2 DataJoiner can be used to populate a formal data structure such as an ODS, or even a permanent table in the warehouse designed for quick recovery of instant updates or for sale. Of course, these same data structures can be populated using

another major technology designed for data replication, IBM DataPropagator Relational. (DataPropagator is a separate product for mainframe systems; DB2 for UNIX, Linux, Windows, and OS/2 includes data replication services as a standard feature.)
Another method for moving operational data around the enterprise is an enterprise application integrator, otherwise known as a message broker. This unique technology allows unparalleled control for targeting and moving data around the company. IBM has the most widely used message broker, MQSeries, and a variation of the product that includes e-commerce requirements, IBM WebSphere MQ.
For more discussion on how to leverage MQ to support a warehouse and BI environment, visit the book's website. For now, suffice it to say that this technology is an excellent means of capturing and transforming (using MQSeries Integrator) targeted operational data recruited for BI solutions. MQ technology has been integrated and packaged into UDB V8, which means that message queues can now be managed as if they were DB2 tables. The concept of welding queued messages to the relational database universe leads to a powerful data delivery environment.
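As a loose illustration of the ODS idea described in this section, the sketch below merges today's transactions from two invented source extracts into one consistent, current structure; the source layouts, field names, and the refresh_ods helper are hypothetical stand-ins for what federation, DataJoiner, or replication tooling would do at scale.

```python
# Minimal sketch of populating a hypothetical sales ODS: today's transactions
# from two operational sources are converted to one consistent format and
# merged into a single, current, subject-oriented store.

orders_system = [  # amounts in euro cents
    {"cust": "C001", "amount_cents": 12500, "day": "2024-05-02"},
]
web_shop = [       # amounts in euros
    {"customer_id": "C001", "total_eur": 49.90, "order_date": "2024-05-02"},
]

def refresh_ods(today):
    """Rebuild the current-day slice of the ODS from both source extracts."""
    ods = []
    for row in orders_system:
        if row["day"] == today:
            ods.append({"customer_id": row["cust"],
                        "amount_eur": row["amount_cents"] / 100.0,
                        "order_date": row["day"],
                        "source": "orders"})
    for row in web_shop:
        if row["order_date"] == today:
            ods.append({"customer_id": row["customer_id"],
                        "amount_eur": row["total_eur"],
                        "order_date": row["order_date"],
                        "source": "web"})
    return ods

for rec in refresh_ods("2024-05-02"):
    print(rec)
```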

Zero Latency

The ultimate strategic goal for IBM is zero-latency analysis. As defined by Gartner, a BI system must be able to infer, assimilate, and provide information to analysts on demand. The challenge, of course, lies in how to mix current, real-time data with the necessary historical information, such as pattern/trend data, or extracted understanding, such as customer profiling.

Such information includes, for example, the identification of high- or low-risk customers, or which products customers will most likely purchase if they already have cheese in their shopping carts.

Achieving zero latency effectively depends on two fundamental mechanisms:

  • complete union of the data being analyzed with the established BI techniques and tools, and
  • an efficient data delivery system to ensure that real-time analytics are truly available.

These zero-latency prerequisites are no different from the two goals set by IBM and described above. The close coupling of the data is part of IBM's seamless integration program, and creating an efficient data delivery system is completely dependent on technology that simplifies the data delivery process. Consequently, two of IBM's three goals are critical to achieving the third. IBM is consciously developing its technology to ensure that zero latency is a reality for warehouse efforts.

Summary

The BI organization provides a road map for realizing your environment iteratively. It must be adjusted to reflect the needs of your business, both now and in the future. Without a broad architectural vision, warehouse iterations are little more than haphazard central warehouse implementations that do little to create a large, informative enterprise. The first hurdle for project managers is how to justify the investments needed to develop the BI organization. While ROI calculation has remained a primary prop for warehouse accomplishments, it is becoming more difficult to predict exactly. This has led to other methods of determining whether you are getting your money's worth. Value on investment (VOI), for example, is proclaimed as one solution. The onus is on data architects and project planners to deliberately generate and provide information to user communities, and not simply to deliver a service on data. There is a huge difference between the two. Information is something that makes a difference in decision making and effectiveness; data, by contrast, are the building blocks for deriving that information.

Although it is critical to source the data that drives business inquiries, the BI environment should serve a larger role in creating information content. We need to take the extra steps to clean, integrate, and transform data, or otherwise create information content upon which users can act, and then we need to make sure those actions and decisions, where reasonable, are reflected back in the BI environment. If we relegate the warehouse to serving only data, it is assured that user communities will create on their own the information content needed to act. This ensures that their community will be able to make better decisions, but the enterprise suffers from the lack of the knowledge they have used. Given that architects and project planners initiate specific projects in the BI environment, they remain accountable to the enterprise as a whole. A simple example of this two-sided characteristic of BI iterations is found in the source data. All the data received for specific commercial requirements must be populated in the first atomic layer. This guarantees the development of the corporate information asset, as well as addressing the specific user requirements defined in the iteration.

What is a Data Warehouse?

The data warehouse has been at the heart of information systems architecture since 1990, supporting information processes by offering a solid, integrated platform of historical data taken as the basis for subsequent analyses. Data warehouses offer ease of integration in a world of incompatible application systems. Data warehousing has evolved into a trend. A data warehouse organizes and stores the data needed for informational and analytical processes over a long historical time perspective. All this involves a considerable and constant commitment in the construction and maintenance of the data warehouse.

So what is a data warehouse? A data warehouse is:

  • subject oriented
  • integrated
  • time variant
  • non-volatile (not erased)

a collection of data used to support managerial decision-making processes.
The data inserted in the data warehouse derives in most cases from operational environments. The data warehouse is realized as a storage unit, physically separate from the rest of the system, which contains data previously transformed by applications that operate on information derived from the operational environment.

The literal definition of a data warehouse deserves an in-depth explanation as there are important underlying reasons and meanings that describe the characteristics of a warehouse.

SUBJECT ORIENTATION

The first feature of a data warehouse is that it is oriented around the major subjects of a company. Driving the analysis of processes through the data contrasts with the more classic method, which orients applications towards processes and functions, an approach shared by most older management systems.

The operational world is designed around applications and functions such as loans, savings, bank cards, and trusts for a financial institution. The world of the dw is organized around major subjects such as the customer, the vendor, the product, and the activity. Alignment around subjects affects the design and implementation of the data found in the dw. More importantly, the major subject affects the most important part of the key structure.

The world of the application is influenced by both database design and process design. The world of the dw is focused exclusively on data modeling and database design. Process design (in its classic form) is not part of the dw environment.

The differences between the process/function orientation of applications and the subject orientation also show up as differences in the content of the data at the detailed level. Dw data excludes data that will not be used by the DSS process, while operational application data contains data that immediately satisfies functional/processing requirements and that may or may not be of any use to the DSS analyst.
Another important way in which operational data differs from dw data is in the relationships among the data. Operational data maintains a continuous relationship between two or more tables based on a business rule that is currently active. Dw data spans a spectrum of time, and the relationships found in the dw are many. Many business rules (and, correspondingly, many data relationships) are represented in the data warehouse between two or more tables.

(For a detailed explanation of how relationships between data are managed in the DW, we refer to the Tech Topic on that subject.)
In no respect is there a greater difference between operational systems and their data and the DW than in the fundamental difference between a functional/process application orientation and a subject orientation.

INTEGRATION

The most important aspect of the dw environment is that the data found within the dw is integrated. ALWAYS. WITHOUT EXCEPTION. The very essence of the dw environment is that the data contained within the bounds of the warehouse is integrated.

Integration reveals itself in many different ways: in consistent naming conventions, in consistent units of measurement for variables, in consistent encoding structures, in consistent physical attributes of the data, and so on.

Over the years, designers of different applications have made many decisions about how an application should be developed. The style and individualized design decisions of designers' applications reveal themselves in a hundred ways: in the differences in coding, key structure, physical characteristics, identification conventions, and so on. The collective ability of many application designers to create inconsistent applications is legendary. Figure 3 exposes some of the most important differences in the ways applications are designed.

Encoding:

Application designers have encoded the gender field in several ways. One designer represents gender as "m" and "f". Another represents it as "1" and "0". Another represents it as "x" and "y". Yet another represents it as "male" and "female". It does not really matter how gender arrives in the DW; "M" and "F" are probably as good as any representation.

What matters is that whatever origin the gender field comes from, it arrives in the DW in a consistent, integrated state. Consequently, when the field is loaded into the DW from an application where it has been represented in another format, the data must be converted to the DW format.
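A minimal sketch of this conversion step, assuming the four source encodings listed above; the source-system names (app_a through app_d) and the to_dw_gender helper are hypothetical.

```python
# Minimal sketch: whatever encoding a source application uses for gender,
# the value arrives in the DW in a single agreed format ("M"/"F").

GENDER_MAPPINGS = {
    "app_a": {"m": "M", "f": "F"},
    "app_b": {"1": "M", "0": "F"},
    "app_c": {"x": "M", "y": "F"},
    "app_d": {"male": "M", "female": "F"},
}

def to_dw_gender(source_system, raw_value):
    """Convert a source-specific gender code into the DW standard."""
    mapping = GENDER_MAPPINGS[source_system]
    return mapping[str(raw_value).strip().lower()]

print(to_dw_gender("app_b", 1))        # -> "M"
print(to_dw_gender("app_d", "Female")) # -> "F"
```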

Measurement of Attributes:

Application designers have chosen to measure pipelines in a variety of ways over the years. One designer stores pipeline data in centimetres. Another stores it in inches. Another stores it in millions of cubic feet per second. And another stores pipeline information in yards. Whatever the source, when pipeline information arrives in the DW it must be measured in the same way.
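The same idea applies to units of measure; the sketch below converts the length units mentioned above to a single DW unit (centimetres, chosen arbitrarily for the example), leaving aside the flow-rate measure, which is not a length. The conversion table and helper are hypothetical.

```python
# Minimal sketch: normalize pipeline length measurements to one DW unit.

TO_CENTIMETRES = {
    "cm": 1.0,
    "inches": 2.54,
    "yards": 91.44,
}

def to_dw_length_cm(value, source_unit):
    """Convert a source measurement into the DW's standard unit."""
    return value * TO_CENTIMETRES[source_unit]

print(to_dw_length_cm(100, "inches"))  # -> 254.0
print(to_dw_length_cm(3, "yards"))     # -> 274.32
```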

As Figure 3 indicates, integration issues affect almost every aspect of the project: the physical characteristics of the data, the dilemma of having more than one source of data, the issue of inconsistent naming standards, inconsistent data formats, and so on.

Whatever the design issue, the result is the same: the data must be stored in the DW in a single, globally acceptable manner even when the underlying operational systems store it differently.

When the DSS analyst looks at the DW, the analyst's focus should be on exploiting the data that is in the warehouse rather than wondering about its credibility or consistency.

TIME VARIANCY

All data in the DW is accurate as of some moment in time. This basic characteristic of DW data is very different from data found in the operational environment. Data in the operational environment is accurate as of the moment of access: when a unit of data is accessed in the operational environment, it is expected to reflect values that are accurate at the moment of access. Because data in the DW is accurate as of some moment in time (that is, not "right now"), the data found in the DW is said to be "time variant".
The time variancy of DW data shows up in numerous ways.
The simplest is that the data in a DW represents data over a long time horizon, five to ten years. The time horizon of the operational environment is much shorter, from today's current values back to sixty or ninety days.
Applications that must perform well and be available for transaction processing must carry the minimum amount of data if they are to retain any degree of flexibility. So operational applications have a short time horizon as a matter of application design.
The second way that time variancy appears in the DW is in the key structure. Every key structure in the DW contains, implicitly or explicitly, a time element, such as day, week, or month. The time element is almost always at the bottom of the concatenated key found in the DW. On occasion the time element exists implicitly, as when an entire file is duplicated at the end of the month or quarter.
The third way time variancy shows up is that DW data, once correctly recorded, cannot be updated. DW data is, for all practical purposes, a long series of snapshots. Of course, if a snapshot was taken incorrectly, it can be corrected. But assuming the snapshots were taken correctly, they are not changed once made. In some cases it can be unethical, or even invalid, for the snapshots in the DW to be modified. Operational data, being accurate as of the moment of access, can be updated as the need arises.
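A minimal sketch of the key-structure point above: each snapshot is stored under a compound key that includes the time element, so existing snapshots are never updated. The record layout and figures are invented.

```python
# Minimal sketch: monthly snapshots keyed by (customer_id, snapshot_month),
# append-only, so no record is ever overwritten.

snapshots = {}  # (customer_id, snapshot_month) -> record

def take_snapshot(customer_id, snapshot_month, balance):
    """Insert a new snapshot; an existing snapshot is never updated."""
    key = (customer_id, snapshot_month)
    if key in snapshots:
        raise ValueError(f"snapshot {key} already recorded")
    snapshots[key] = {"balance": balance}

take_snapshot("C001", "2024-04", 1850.00)
take_snapshot("C001", "2024-05", 1720.50)

# The same customer now has one row per point in time.
for (cust, month), rec in sorted(snapshots.items()):
    print(cust, month, rec["balance"])
```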

NON-VOLATILITY

The fourth important characteristic of the DW is that it is non-volatile.
Updates, insertions, deletions, and changes are made regularly in operational environments on a record-by-record basis. But the basic manipulation of data in the DW is much simpler. There are only two kinds of operations that occur in the DW: the initial loading of the data and access to the data. There is no updating of data (in the general sense of updating) in the DW as a normal processing operation. There are some very powerful consequences of this basic difference between operational processing and DW processing. At the design level, the need to be cautious about update anomalies is not a factor in the DW, since updating of the data is not carried out. This means that at the physical level of design, liberties can be taken to optimize data access, in particular in dealing with the issues of normalization and physical denormalization. Another consequence of the simplicity of DW operations lies in the underlying technology used to run the DW environment. Having to support record-by-record online updates (as is often the case with operational processing) requires the technology to have a very complex foundation beneath an apparent simplicity.
The technology that supports backup and recovery, transactions and data integrity, and the detection and remedy of deadlock conditions is quite complex and not necessary for DW processing. The characteristics of a DW, subject orientation, integration of data within the DW, time variancy, and simplicity of data management, all lead to an environment that is very, very different from the classic operational environment. The source of almost all DW data is the operational environment. It is tempting to think that there is massive data redundancy between the two environments.
In fact, the first impression many people have is one of great data redundancy between the operational environment and the DW. Such an interpretation is superficial and demonstrates a lack of understanding of what happens in the DW.
In fact there is a minimum of data redundancy between the operational environment and the DW. Consider the following: data is filtered as it passes from the operational environment to the DW environment. Much data never leaves the operational environment; only the data required for DSS processing finds its way into the DW environment.

▪ The time horizon of the data is very different from one environment to the other. Data in the operational environment is very fresh; data in the DW is much older. From the perspective of the time horizon alone, there is very little overlap between the operational environment and the DW.

▪ The DW contains summary data that is never found in the operational environment.

▪ Data undergoes a fundamental transformation as it moves into the DW. Figure 3 illustrates that most of the data is significantly altered as it is selected and moved to the DW. Put another way, most of the data is changed physically and radically as it is moved into the DW. From the point of view of integration, it is not the same data that resides in the operational environment.

In light of these factors, data redundancy between the two environments is a rare event, resulting in less than 1% redundancy between the two environments.

THE STRUCTURE OF THE WAREHOUSE

DWs have a distinct structure. There are various summary and detail levels that demarcate the DW.
The various components of a DW are:

  • Metadata
  • Current detail data
  • Older detail data
  • Lightly summarized data
  • Highly summarized data

By far the main concern is the current detail data. It is the main concern because:

  • current detail data reflects the most recent events, which are always of great interest;
  • current detail data is voluminous because it is stored at the lowest level of granularity; and
  • current detail data is almost always stored on disk, which is fast to access but expensive and complex to manage.

Older detail data is data stored on some form of mass storage. It is accessed sporadically and is stored at a level of detail compatible with the current detail data. While it is not mandatory to store it on an alternative storage medium, because of the large volume of data combined with its sporadic access, the storage medium for older detail data is usually not disk.

Lightly summarized data is data distilled from the low level of detail found at the current level of detail. This level of the DW is almost always stored on disk. The design problems facing the data architect in building this level of the DW are:
  • the unit of time over which the summarization is done, and
  • the contents, that is, which attributes the lightly summarized data will contain.

The next level of data found in the DW is the highly summarized data. Highly summarized data is compact and easily accessible. It is sometimes found in the DW environment, and in other cases it is found outside the immediate walls of the technology housing the DW (in any case, highly summarized data is part of the DW regardless of where the data is physically housed).

The final component of the DW is metadata. In many ways metadata sits in a different dimension from the other DW data, because metadata contains no data taken directly from the operational environment. Metadata plays a special and very important role in the DW. Metadata is used as:
  • a directory to help the DSS analyst locate the contents of the DW,
  • a guide to mapping how the data has been transformed from the operational environment to the DW environment, and
  • a guide to the algorithms used for summarization between the current detail data and the lightly summarized and highly summarized data.

Metadata plays a much more important role in the DW environment than it ever did in the operational environment.

OLD DETAIL STORAGE MEDIUM

Magnetic tape can be used to store that kind of data. In fact, there is a wide variety of storage media that should be considered for storing older detail data. Depending on the volume of data, the frequency of access, the cost of the tools and the type of access required, it is entirely possible that other tools will serve the old level of detail in the DW.

DATA FLOW

There is a normal and predictable flow of data within the DW.
Data enters the DW from the operational environment. (NOTE: there are some very interesting exceptions to this rule; however, almost all data enters the DW from the operational environment.) As data enters the DW from the operational environment, it is transformed as described above. Upon entering the DW, data goes to the current level of detail, as shown. It resides there and is used until one of three events occurs:

Summarization uses the detail data to calculate the lightly summarized and highly summarized levels of data. There are some exceptions to the flow shown (to be discussed later). Usually, however, for the vast majority of data found within a DW, the flow of data is as depicted.

USING THE DATA WAREHOUSE

Not surprisingly, the various levels of data within the DW receive different levels of use. As a rule, the higher the level of summarization, the more the data is used.
Much usage occurs on the highly summarized data, while the old detail data is almost never used. There is good reason for moving the organization to a resource-utilization paradigm: the more summarized the data, the quicker and more efficient it is to get to. If a shop finds that it does much of its processing at the detail level of the DW, a correspondingly large amount of machine resources is consumed. It is in everyone's best interest to process at as high a level of summarization as possible.

For many shops, the DSS analyst in a pre-DW environment used data at the detail level. In many respects, going to detailed data is like a security blanket, even when other levels of summarization are available. One of the activities of the data architect is to wean the DSS user off constant use of data at the lowest level of detail. The data architect has two means available:

  • installing a chargeback system, where the end user pays for the resources consumed, and
  • pointing out that very good response times can be obtained when working with data at a high level of summarization, while poor response times result from working with data at a low level of detail.

OTHER CONSIDERATIONS

There are some other considerations in building and managing the DW.
The first is that of indexes. Data at the higher levels of summarization can be indexed freely, while data at the lower levels of detail is so voluminous that it can be indexed only frugally. By the same token, data at the higher levels of summarization can be restructured relatively easily, while the volume of data at the lower levels is so large that it cannot be easily restructured. Consequently, the data model and the formal design work lay a foundation for the DW that applies almost exclusively to the current level of detail. In other words, data modeling activities do not apply to the summarization levels in almost every case. Another structural consideration is the partitioning of the DW data.

Partitioning can be done at two levels: at the dbms level and at the application level. In dbms-level partitioning, the dbms is aware of the partitions and manages them accordingly. In application-level partitioning, only the programmer is aware of the partitions, and responsibility for their administration is left to him.

With dbms-level partitioning, much of the work is done automatically, but there is a great deal of inflexibility associated with the automatic administration of partitions. In the case of application-level partitioning of the data warehouse data, much of the work falls on the programmer, but the end result is flexibility in the administration of the data in the data warehouse.
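A minimal sketch of application-level partitioning, in which the program itself routes each record to a partition derived from its month; the partition naming scheme and record layout are hypothetical.

```python
# Minimal sketch of application-level partitioning: the program, not the dbms,
# decides which monthly partition a record belongs to.

from collections import defaultdict

partitions = defaultdict(list)  # partition name -> rows

def partition_name(record):
    """Route a record to a partition derived from its transaction month."""
    year, month, _day = record["tx_date"].split("-")
    return f"sales_{year}_{month}"

def load(record):
    partitions[partition_name(record)].append(record)

load({"tx_date": "2024-04-28", "customer_id": "C001", "amount": 120.0})
load({"tx_date": "2024-05-02", "customer_id": "C002", "amount": 75.5})

for name, rows in sorted(partitions.items()):
    print(name, len(rows), "row(s)")
```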

OTHER ANOMALIES

While the components of the data warehouse work as described for almost all data, there are some useful exceptions that need to be discussed. One exception is public summary data. This is summary data that has been calculated outside the data warehouse but is used throughout the company. Public summary data is stored and managed in the data warehouse, although, as mentioned, it is calculated elsewhere. Accountants work to produce such quarterly figures as income, quarterly expenses, quarterly profit, and so on. The work done by the accountants is external to the data warehouse. However, the data is used "internally" within the company, by marketing, sales, and so on. Another anomaly, which will not be discussed here, is that of external data.

Another important kind of data that can be found in a data warehouse is permanent detail data. This arises from the need to store data permanently at a detailed level for ethical or legal reasons. If a company exposes its workers to hazardous substances, there is a need for detailed, permanent data. If a company manufactures a product that involves public safety, such as airplane parts, there is a need for permanent detail data, as there is if a company enters into hazardous contracts.

The company cannot afford to overlook the details because, over the next few years, in the event of a lawsuit, a recall, a disputed construction defect, and so on, the company's exposure could be great. Consequently there is a unique type of data known as permanent detail data.

SUMMARY

A data warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data in support of the decision-making needs of management. Each of the salient characteristics of a data warehouse carries its own implications. In addition, there are four levels of data warehouse data:

  • Old detail
  • Current detail
  • Lightly summarized data
  • Highly summarized data

Metadata is also an important part of the data warehouse.

ABSTRACT

The concept of data warehousing has recently received a great deal of attention and became a trend of the 90s. This is due to the ability of a data warehouse to overcome the limitations of management support systems such as decision support systems (DSS) and executive information systems (EIS). Although the concept of the data warehouse looks promising, implementing data warehouses can be problematic because of the large scale of the warehousing processes. Despite the complexity of data warehousing projects, many data warehousing vendors and consultants claim that building current data warehouses poses no problems. However, at the beginning of this research project, almost no independent, rigorous and systematic research had been carried out. Consequently, it is difficult to say what actually happens in industry when data warehouses are built. This study explored contemporary data warehousing practice, aiming to develop a richer understanding of Australian practice. The literature review provided the context and foundation for the empirical study. There are a number of results from this research. First, this study revealed the activities that occur during the development of a data warehouse. In many areas, the data gathered confirmed the practice reported in the literature. Second, the issues and problems that may impact the development of a data warehouse were identified by this study. Finally, the benefits obtained by Australian organizations associated with the use of data warehouses were revealed.

Chapter 1

Research context

The concept of data warehousing received widespread exposure and developed into an emerging trend in the 90s (McFadden 1996, TDWI 1996, Shah and Milstein 1997, Shanks et al. 1997, Eckerson 1998, Adelman and Oates 2000). This can be seen from the growing number of articles on data warehousing in commercial publications (Little and Gibson 1999). Many articles (see, for example, Fisher 1995, Hackathorn 1995, Morris 1995a, Bramblett and re 1996, Graham et al. 1996, Sakaguchi and Frolick 1996, Alvarez 1997, Brousell 1997, Clarke 1997, McCarthy 1997, O'Donnell 1997, Edwards 1998, TDWI 1999) reported significant benefits for organizations implementing data warehouses. They backed their claims with anecdotal evidence of successful implementations, high return on investment (ROI) figures and, also, by providing guidelines or methodologies for developing data warehouses

(Shanks et al. 1997, Seddon and Benjamin 1998, Little and Gibson 1999). In an extreme case, Graham et al. (1996) reported an average three-year return on investment of 401%.

Much of the current literature, however, has overlooked the complexities involved in undertaking such projects. Data warehouse projects are normally complex and large-scale and therefore imply a high probability of failure if not carefully controlled (Shah and Milstein 1997, Eckerson 1997, Foley 1997b, Zimmer 1997, Bort 1998, Gibbs and Clymer 1998, Rao 1998). They require vast amounts of both human and financial resources, and time and effort, to build (Hill 1998, Crofts 1998). The typical time and financial means required are approximately two years and two to three million dollars, respectively (Braly 1995, Foley 1997b, Bort 1998, Humphries et al. 1999). This time and these financial means are required to control and consolidate many different aspects of data warehousing (Cafasso 1995, Hill 1998). Beside hardware and software considerations, other functions, which vary from data extraction to data loading processes, from storage capacity for managing updates and metadata to user training, must be considered.

At the time of this research project, there was very little academic research conducted in the field of data warehousing, especially in Australia. This was evident from the shortage of articles published on data warehousing in journals or other academic writings of the time. Many of the available academic writings described the US experience. The lack of academic research in the data warehousing area has prompted calls for rigorous research and empirical studies (McFadden 1996, Shanks et al. 1997, Little and Gibson 1999). In particular, research studies on the implementation process of data warehouses need to be carried out to extend general knowledge about data warehouse implementation and to serve as the basis for future research (Shanks et al. 1997, Little and Gibson 1999).

The purpose of this study, therefore, is to investigate what actually happens when organizations implement and use data warehouses in Australia. Specifically, this study involves an analysis of the entire development process of a data warehouse, from initiation and planning through design and implementation to subsequent use within Australian organizations. In addition, the study will also contribute to current practice by identifying areas where practice can be further improved and where inefficiencies and risks can be minimized or avoided. Furthermore, it will serve as the basis for other studies on data warehouses in Australia and will fill the gap that currently exists in the literature.

Research questions

The goal of this research is to study the activities involved in the implementation of data warehouses and their use by Australian organizations. In particular, the elements concerning project planning, development, operation, use and the risks involved are studied. So the question of this research is:

“What is the current practice of data warehousing in Australia?”

To answer this question effectively, a number of subsidiary research questions are required. In particular, three sub-questions have been identified from the literature, which is presented in chapter 2, to guide this research project: How are data warehouses implemented by Australian organizations? What problems are encountered?

What are the benefits experienced?
In answering these questions, an exploratory research design employing a survey was used. As an exploratory study, the answers to the above questions are not complete (Shanks et al. 1993, Denscombe 1998). In this case, triangulation is required to improve the answers to these questions. However, the survey will provide a solid foundation for future work examining these questions. A detailed discussion of the justification of the research method and design is presented in chapter 3.

Structure of the research project

This research project is divided into two parts: the contextual study of the concept of data warehousing and the empirical research (see figure 1.1), each of which is discussed below.

Part I: Contextual study

The first part of the research consisted of reviewing the current literature on various types of data warehousing, including decision support systems (DSS), executive information systems (EIS), data warehouse case studies and data warehouse concepts. Furthermore, the results of forums on data warehousing and of meeting groups of experts and practitioners led by the Monash DSS research group contributed to this phase of the study, which was intended to obtain information on data warehouse practice and to identify the risks involved in its adoption. During this contextual study period, an understanding of the problem area was established to provide the background knowledge for the subsequent empirical investigations. However, this was an ongoing process throughout the research study.

Part II: Empirical Research

The relatively new concept of data warehousing, especially in Australia, created the need for a survey to obtain a broad picture of user experience. This part was carried out once the problem domain had been established through the extensive literature review. The data warehousing concept formed during the contextual study phase was used as input for the initial questionnaire of this study. After this, the questionnaire was reviewed. Six data warehouse experts took part in the test. The purpose of the initial questionnaire test was to check the completeness and accuracy of the questions. Based on the test results, the questionnaire was modified and the modified version was sent to the survey participants. The returned questionnaires were then analyzed and the data presented in tables, diagrams and other formats. The results of the data analysis form a snapshot of the practice of data warehousing in Australia.

DATA WAREHOUSING OVERVIEW

The concept of data warehousing has evolved with the improvements in computer technology.
It is aimed at overcoming the problems faced by application support groups such as Decision Support System (DSS) and Executive Information System (EIS).

In the past, the biggest obstacle for these applications has been their inability to provide the data needed for analysis.
This is mainly caused by the nature of management's work. The interests of a company's management vary constantly depending on the area covered. Therefore, the data fundamental to these applications must be able to change rapidly depending on the area to be treated.
This means that the data must be available in the appropriate form for the required analyses. In fact, the application support groups found it very difficult in the past to collect and integrate data from complex and diverse sources.

The remainder of this section presents an overview of the concept of data warehousing and discusses how the data warehouse can overcome the problems of application support groups.
The term "Data WarehouseWas released by William Inmon in 1990. His often cited definition is the Data Warehouse as a collection of data collected subject-oriented, integrated, non-volatile, and variable over time, to support management decisions.

Using this definition, Inmon points out that the data residing in a data warehouse must have the following four characteristics:

  • Subject oriented
  • Integrated
  • Non-volatile
  • Time variant

By subject oriented Inmon means that the data in the data warehouse is organized around the major subject areas of the organization that have been defined in the data model. For example, all data concerning customers is contained in the CUSTOMERS subject area. Likewise, all data relating to products is contained in the PRODUCTS subject area.

By integrated Inmon means that data coming from different platforms, systems and locations is combined and stored in one place. Consequently, similar data held in different formats must be transformed into consistent formats so that it can easily be added and compared.
For example, male and female gender are represented by the letters M and F in one system, and by 1 and 0 in another. To integrate them properly, one or both formats must be transformed so that the two formats are the same. In this case we could change M to 1 and F to 0, or vice versa. Subject oriented and integrated indicate that the data warehouse is designed to provide a functional, cross-company view of the organization's data.

Non-volatile means that the data in the data warehouse remains consistent and that updating the data is not necessary. Instead, every change in the original data is added to the database of the data warehouse. This means that the history of the data is contained in the data warehouse.

By time variant Inmon means that the data in the data warehouse always contains time markers and normally spans a certain time horizon. For example, a data warehouse may contain five years of historical customer values from 1993 to 1997. The availability of history and of a time series of data allows trends to be analyzed.

A data warehouse can collect its data from OLTP systems, from data sources external to the organization, and/or from other special-purpose data capture systems.
The extracted data may go through a cleaning process, in which case it is transformed and integrated before being stored in the database of the data warehouse. Then, the data residing inside the database of the data warehouse is made available to end-user access and retrieval tools. Using these tools, the end user can access the integrated view of the organization's data.

The data residing inside the database of the data warehouse is stored in both detailed and summary formats.
The level of summarization may depend on the nature of the data. Detailed data may consist of current data and historical data.
Real-time data is not included in the data warehouse until the data in the data warehouse is refreshed.
In addition to storing the data itself, a data warehouse can also store a different type of data, called METADATA, which describes the data residing in its database.
There are two types of metadata: development metadata and analysis metadata.
Development metadata is used to manage and automate the processes of extracting, cleaning, mapping and loading the data into the data warehouse.
The information contained in the development metadata can include details of the source operational systems, details of the elements to be extracted, the data model of the data warehouse, and the business rules for converting the data.

The second type of metadata, known as analysis metadata, enables the end user to explore the content of the data warehouse to find the available data and its meaning in clear, non-technical terms.

Analysis metadata therefore functions as a bridge between the data warehouse and the end-user applications. It can contain the business model, descriptions of the data corresponding to the business model, pre-defined queries and reports, information for user access, and indexes.

Analysis and development metadata must be combined into a single integrated metadata repository to function properly.

Unfortunately, many of the existing tools have their own metadata, and there are currently no standards that allow data warehousing tools to integrate this metadata. To remedy this situation, many vendors of the main data warehousing tools formed the Meta Data Council, which later became the Meta Data Coalition.

The purpose of this coalition is to build a standard metadata set that allows different data warehousing tools to convert the metadata.
Their efforts resulted in the birth of the Meta Data Interchange Specification (MDIS) which will allow the exchange of information between Microsoft archives and related MDIS files.

The existence of both summarized/indexed and detailed data gives the user the possibility to perform a DRILL DOWN from summarized data to detailed data and vice versa. The existence of detailed historical data allows trend analyses over time. In addition, the analysis metadata can be used as a directory of the database of the data warehouse to help end users locate the data they need.

In comparison to OLTP systems, with its ability to support data analysis and reporting, the data warehouse is seen as a more appropriate system for informational processes such as making and answering queries and producing reports. The next section highlights the differences between the two systems in detail.

DATA WAREHOUSES VERSUS OLTP SYSTEMS

Many of the information systems within organizations are meant to support day-to-day operations. These systems, known as OLTP SYSTEMS, capture continuously updated daily transactions.

The data within these systems is often modified, added or deleted. For example, a customer's address changes as he moves from one place to another. In this case the new address is registered by modifying the address field of the database. The main objective of these systems is to reduce transaction costs and at the same time reduce processing times. Examples of OLTP systems include critical operations such as order entry, payroll, invoicing, manufacturing, and customer service.

Unlike OLTP systems, which were built for transaction- and event-based processes, data warehouses were created to support data-analysis-based processes and decision-making processes.

This is normally achieved by integrating the data from various OLTP and external systems into a single "container" of data, as discussed in the previous section.

Monash Data Warehousing Process Model

The Monash data warehouse process model was developed by researchers of the Monash DSS Research Group. It is based on the data warehouse literature, on experience in supporting the development of fielded systems, on discussions with vendors of applications for use with data warehouses, and on a group of experts in the use of data warehouses.

The phases are: Initiation, Planning, Development, Operations and Explanations. The diagram shows the iterative or evolutionary nature of the development of a data warehouse by means of two-way arrows placed between the different phases. "Iterative" and "evolutionary" mean that, at each step of the process, implementation activities can always propagate backwards to the previous phase. This is due to the nature of a data warehouse project, in which additional requests from the end user can arise at any time. For example, if during the development phase of a data warehouse a new dimension or subject area is requested by the end user that was not part of the original plan, it must be added to the system. This causes a change in the design. The result is that the design team must change the requirements documents created so far during the design phase. In many cases, the current state of the project must go back to the design stage, where the new request is added and documented. The end user must be able to see the revised specific documentation and the changes that have been made in the development phase. At the end of this development cycle the project must obtain excellent feedback from both the development and user teams. The feedback is then reused to improve a future project.

Capacity planning
DWs tend to be very large in size and to grow very fast (Best 1995, Rudin 1997a) because of the amount of historical data they retain over their lifetime. Growth can also be caused by additional data requested by users to increase the value of the data they already have. Consequently, the storage requirements for data can increase significantly (Eckerson 1997). Thus, it is essential to ensure, by conducting capacity planning, that the system being built can grow as needs grow (Best 1995, LaPlante 1996, Lang 1997, Eckerson 1997, Rudin 1997a, Foley 1997a).
In planning for dw scalability, one must know the expected growth in warehouse size, the types of questions likely to be asked, and the number of end users supported (Best 1995, Rudin 1997b, Foley 1997a). Building scalable applications requires a combination of scalable server technologies and scalable application design techniques (Best 1995, Rudin 1997b); both are required in building a highly scalable application. Scalable server technologies can make it easy and cost-effective to add storage, memory and CPU without degrading performance (Lang 1997, Telephony 1997).

There are two main scalable server technologies: symmetric multiprocessing (SMP) and massively parallel processing (MPP) (IDC 1997, Humphries et al. 1999). An SMP server typically has multiple processors that share memory, the system bus and other resources (IDC 1997, Humphries et al. 1999). Additional processors can be added to increase its computational power. Another method of increasing the computational power of an SMP server is to combine numerous SMP machines; this technique is known as clustering (Humphries et al. 1999). An MPP server, on the other hand, has multiple processors, each with its own memory, bus system and other resources (IDC 1997, Humphries et al. 1999). Each processor is called a node. An increase in computational power can be achieved by adding additional nodes to MPP servers (Humphries et al. 1999).

A weakness of SMP servers is that too many input/output (I/O) operations can congest the system bus (IDC 1997). This problem does not occur in MPP servers, since each processor has its own bus system. However, the interconnections between nodes are generally much slower than an SMP bus system. In addition, MPP servers can add an extra level of complexity for application developers (IDC 1997). Thus, the choice between SMP and MPP servers can be influenced by many factors, including the complexity of the applications, the price/performance ratio, the required processing capacity, the intended dw applications, and the expected growth in the size of the dw database and in the number of end users.

Numerous scalable application design techniques can be employed in capacity planning. One is to use different time periods such as days, weeks, months and years; with different time periods, the database can be divided into manageable pieces (Inmon et al. 1997). Another technique is to use summary tables, which are built by summarizing detailed data. The summarized data are more compact than the detailed data and require less storage space, and the detailed data can then be stored on a cheaper storage device, saving even more space. Although summary tables can save storage space, they require considerable effort to keep up to date and in line with business needs. However, this technique is widely used and is often applied in conjunction with the previous one (Best 1995, Inmon 1996a, Chaudhuri and Dayal 1997).
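As an illustration of the summary-table technique, the following is a minimal sketch using Python's built-in sqlite3 module; the table layout and the monthly grain are assumptions made purely for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_detail (sale_date TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales_detail VALUES (?, ?, ?)",
    [("2024-01-03", "widget", 10.0),
     ("2024-01-17", "widget", 5.0),
     ("2024-02-02", "gadget", 8.0)],
)

# Monthly summary table: far fewer rows than the detail table, so it can stay
# on fast storage while the detailed data are archived on cheaper media.
conn.execute("""
    CREATE TABLE sales_monthly AS
    SELECT substr(sale_date, 1, 7) AS month, product,
           SUM(amount) AS total_amount, COUNT(*) AS n_sales
    FROM sales_detail
    GROUP BY month, product
""")
print(conn.execute("SELECT * FROM sales_monthly ORDER BY month").fetchall())
```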

Defining Data Warehouse Technical Architectures

Early adopters of data warehousing mainly envisaged a centralized dw implementation in which all data, including external data, were integrated into a single physical repository (Inmon 1996a, Bresnahan 1996, Peacock 1998).

The main advantage of this approach is that end users are able to access an enterprise-wide view of the organizational data (Ovum 1998). Another advantage is that it offers standardization of the data across the organization, which means that there is only one version or definition for each term used in the metadata repository of the dw (Flanagan and Safdie 1997, Ovum 1998). The disadvantage of this approach, on the other hand, is that it is expensive and difficult to build (Flanagan and Safdie 1997, Ovum 1998, Inmon et al. 1998). Not long after the centralized data storage architecture became popular, the concept of extracting smaller subsets of data to support the needs of specific applications evolved (Varney 1996, IDC 1997, Berson and Smith 1997, Peacock 1998). These smaller systems are derived from the larger centralized data warehouse. They are called departmental data warehouses or dependent data marts. The dependent data mart architecture is known as a three-tier architecture, in which the first tier consists of the centralized data warehouse, the second consists of the departmental data repositories and the third consists of the data access and analysis tools (Demarest 1994, Inmon et al. 1997).

Data marts are normally built after the centralized data warehouse has been built, to meet the needs of specific units (White 1995, Varney 1996).
Data marts store the data relevant to particular units (Inmon et al. 1997, Inmon et al. 1998, IA 1998).

The advantage of this method is that there will be no non-integrated data and that the data within the data marts will be less redundant, since all the data come from an integrated data repository. Another advantage is that there will be few links between each data mart and its data sources, because each data mart has only one data source. Moreover, with this architecture in place, end users can still access the overview of the organization's business data. This method is known as the top-down method, in which the data marts are built after the data warehouse (Peacock 1998, Goff 1998).
Driven by the need to show results early, some organizations have begun to build independent data marts (Flanagan and Safdie 1997, White 2000). In this case, the data marts take their data directly from the OLTP databases rather than from a centralized and integrated repository, thus eliminating the need for a central repository.

Each data mart requires at least one link to its data sources. A disadvantage of having multiple connections for each data mart is that, compared with the two previous architectures, data redundancy increases significantly.

Each data mart must store all the data it requires locally so as not to affect the OLTP systems. As a result, the same data are stored in different data marts (Inmon et al. 1997). Another disadvantage of this architecture is that it leads to complex interconnections between the data marts and their data sources, which are difficult to implement and control (Inmon et al. 1997).

Another disadvantage is that end users cannot access an overview of corporate information, since the data in the different data marts are not integrated (Ovum 1998).
Yet another disadvantage is that there may be more than one definition for each term used in the data marts, which generates data inconsistencies across the organization (Ovum 1998).
Despite the disadvantages discussed above, independent data marts still attract the interest of many organizations (IDC 1997). One factor that makes them attractive is that they are quicker to develop and require less time and fewer resources (Bresnahan 1996, Berson and Smith 1997, Ovum 1998). Consequently, they mainly serve as proof-of-concept designs that can be used to identify quickly the benefits and/or shortcomings of the design (Parsaye 1995, Braly 1995, Newing 1996). In this case, the part to be implemented in the pilot project must be small but important for the organization (Newing 1996, Mansell-Lewis 1996).

By examining the prototype, end users and the administration can decide whether to continue or stop the project (Flanagan and Safdie 1997).
If the decision is to continue, the data marts for the other sectors should be built one at a time. There are two options for end users, based on their needs, in building independent data marts: integrated/federated and unintegrated (Ovum 1998).

In the first approach, any new data marts should be built based on the current data marts and on the data model used by the firm (Varney 1996, Berson and Smith 1997, Peacock 1998). The need to use the firm's data model ensures that there is only one definition for each term used across the data marts, and also that different data marts can be merged to give an overview of corporate information (Bresnahan 1996). This method is called bottom-up and is best when there are constraints on financial means and time (Flanagan and Safdie 1997, Ovum 1998, Peacock 1998, Goff 1998). In the second approach, the data marts built can satisfy only the needs of a specific unit. A variant of the federated data mart is the distributed data warehouse, in which a hub middleware server is used to merge many data marts into a single distributed data repository (White 1995). In this case, the corporate data are distributed across several data marts. End-user requests are forwarded to the hub middleware server, which extracts all the data required from the data marts and returns the results to the end-user applications. This method provides business information to end users. However, the problems of independent data marts are still not eliminated. There is another architecture that can be used, called the virtual data warehouse (White 1995). However, this architecture, described in Figure 2.9, is not a real data storage architecture, since it does not shift the load from the OLTP systems to the data warehouse (Demarest 1994).

In fact, end users' data requests are passed on to the OLTP systems, which return results after processing the requests. While this architecture allows end users to generate reports and make requests, it cannot provide historical data or an overview of company information, since the data from the different OLTP systems are not integrated. Hence, this architecture cannot support complex data analyses such as forecasting.

Selection of data access and retrieval applications

Since the purpose of building a data warehouse is to convey information to end users (Inmon et al 1997, Poe 1996, McFadden 1996, Shanks et al 1997, Hammergren 1998), one or more data access and retrieval applications must be provided. To date, there is a wide variety of such applications for the user to choose from (Hammergren 1998, Humphries et al 1999). The applications selected determine the success of the data warehousing effort in an organization, because the applications are the most visible part of the data warehouse to the end user (Inmon et al 1997, Poe 1996). To be successful, a data warehouse must be able to support the data analysis activities of the end user (Poe 1996, Seddon and Benjamin 1998, Eckerson 1999). Thus the "level" of what the end user wants must be identified (Poe 1996, Mattison 1996, Inmon et al 1997, Humphries et al 1999).

In general, end users can be grouped into three categories: executive users, business analysts and power users (Poe 1996, Humphries et al 1999). Executive users need easy access to predefined sets of reports (Humphries et al 1999). These reports can easily be reached by navigating menus (Poe 1996). In addition, reports should present information using graphical representations such as tables and charts so that information is conveyed quickly (Humphries et al 1999). Business analysts, who may not have the technical ability to develop reports from scratch on their own, need to be able to modify existing reports to meet their specific needs (Poe 1996, Humphries et al. 1999). Power users, on the other hand, are the type of end user with the ability to generate and write queries and reports from scratch (Poe 1996, Humphries et al. 1999). They are the ones who develop reports for the other types of users (Poe 1996, Humphries et al 1999).

Once the requirements of the end users have been determined, a selection must be made from all the available data access and retrieval applications (Poe 1996, Inmon et al 1997).
Data access and retrieval tools can be classified into four types: OLAP tools, EIS/DSS tools, query and reporting tools and data mining tools.

OLAP tools allow users to create ad hoc queries as well as queries run against the data warehouse database. Additionally, these products allow users to drill down from general data to detailed data (a small sketch of such a drill-down is given after the list of tool types below).

EIS / DSS tools provide executive reporting such as “what if” analysis and access to menu-based reports. Reports should be predefined and merged with menus for easier navigation.
The query and reporting tools allow users to produce predefined and specific reports.

Data mining tools are used to identify relationships that could shed new light on overlooked patterns in the data of the data warehouse.
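To make the drill-down idea concrete, here is a small, hypothetical sketch in Python (sqlite3); the regions, cities and figures are invented for the example and do not describe any particular OLAP product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, city TEXT, month TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    ("North", "Leeds", "2024-01", 120.0),
    ("North", "York",  "2024-01", 80.0),
    ("South", "Dover", "2024-01", 60.0),
])

# Top level: totals per region (the "general" data).
for row in conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region"):
    print(row)

# Drill down: the analyst expands one region to the city level (the "detail").
for row in conn.execute(
    "SELECT city, SUM(amount) FROM sales WHERE region = ? GROUP BY city", ("North",)
):
    print(row)
```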

In addition to meeting the requirements of each type of user, the selected tools must be intuitive, efficient and easy to use. They also need to be compatible with the other parts of the architecture and able to work with existing systems. It is also suggested that data access and retrieval tools be chosen with reasonable prices and performance. Other criteria to consider include the tool vendor's commitment to supporting its product and the developments the product will have in future releases. To ensure user engagement with the data warehouse, the development team should involve users in the tool selection process. In this case, a practical user evaluation should be carried out.

To enhance the value of the data warehouse, the development team can also provide web access to the data warehouse. A web-enabled data warehouse allows users to access the data from remote locations or while travelling. The information can also be provided at lower cost, thanks to reduced training costs.

2.4.3 Data Warehouse Operation Phase

This phase consists of three activities: definition of data refresh strategies, control of data warehouse activities and management of data warehouse security.

Definition of data refresh strategies

After the initial load, the data in the data warehouse database must be refreshed periodically to reflect the changes made to the original data. It is therefore necessary to decide when to refresh, how often the refresh should be scheduled and how to refresh the data. It is suggested that the data be refreshed when the system can be taken offline. The refresh rate is determined by the development team based on user requirements. There are two approaches to refreshing the data warehouse: the complete refresh and the continuous loading of changes.

The first approach, the full refresh, requires reloading all the data from scratch. This means that all the data required must be extracted, cleaned, transformed and integrated at each refresh. This approach should be avoided as far as possible, as it requires a great deal of time and resources.

An alternative approach is to load changes continuously. This adds the data that have changed since the last refresh cycle of the data warehouse. Identifying new or changed records significantly reduces the amount of data that must be propagated to the data warehouse at each update, since only these data will be added to the data warehouse database.

There are at least five approaches that can be used to extract new or modified data. To obtain an efficient data refresh strategy, a mixture of these approaches that picks up all the changes in the system may be useful.

The first approach, which uses timestamps, assumes that a timestamp is assigned to all modified and updated data, so that all new and changed data can be easily identified (a minimal sketch of this approach is given after the list). This approach, however, has not been widely used in today's operational systems.
The second approach is to use a delta file generated by an application, which contains only the changes made to the data; using such a file also streamlines the update cycle. However, this method too has not been used in many applications.
The third approach is to scan a log file, which basically contains information similar to the delta file. The only difference is that a log file is created for the recovery process and can be difficult to understand.
The fourth approach is to modify the application code. However, most application code is old and fragile; this technique should therefore be avoided.
The last approach is to compare the source data with the master data file.
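The following is a minimal sketch of the first (timestamp-based) approach in Python, assuming the source rows carry a last_modified column and that the time of the previous refresh is kept in a control table; all names and values are illustrative.

```python
import sqlite3

source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE customer (id INTEGER, name TEXT, last_modified TEXT)")
source.executemany("INSERT INTO customer VALUES (?, ?, ?)", [
    (1, "A. Rossi",   "2024-03-01T10:00:00"),
    (2, "B. Bianchi", "2024-03-05T09:30:00"),
])

# Timestamp of the previous refresh, e.g. read from a control table.
last_refresh = "2024-03-02T00:00:00"

# Only rows changed since the last refresh are extracted and propagated to the
# warehouse, instead of reloading every row from scratch.
changed = source.execute(
    "SELECT id, name, last_modified FROM customer WHERE last_modified > ?",
    (last_refresh,),
).fetchall()
print(changed)  # only customer 2 is picked up
```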

Control of the data warehouse activities

Once the data warehouse has been released to users, it needs to be monitored over time. In this case, the data warehouse administrator can employ one or more management and control tools to monitor the use of the data warehouse. In particular, information about who accesses the data warehouse and when can be collected. From the data collected, a profile of the work performed can be created, which can be used as input for implementing user chargeback. Chargeback allows users to be informed of the cost of the data warehouse processing they consume.

In addition, data warehouse monitoring can be used to identify the types of queries, their size, the number of queries per day, query response times, the sectors reached and the amount of data processed. Another purpose of monitoring the data warehouse is to identify data that are not being used. These data can be removed from the data warehouse to improve query response times and to control the growth of the data residing in the data warehouse database.
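One way to picture the chargeback calculation described above is the following sketch; the log format and the rate are invented for the example and do not correspond to any specific monitoring tool.

```python
from collections import defaultdict

# Hypothetical usage log: (user, seconds of query time, rows processed).
usage_log = [
    ("alice", 12.0, 50_000),
    ("bob",    3.5,  2_000),
    ("alice",  8.0, 10_000),
]

RATE_PER_SECOND = 0.05  # illustrative chargeback rate per second of processing

cost_per_user = defaultdict(float)
for user, seconds, _rows in usage_log:
    cost_per_user[user] += seconds * RATE_PER_SECOND

for user, cost in sorted(cost_per_user.items()):
    print(f"{user}: {cost:.2f} charged for data warehouse usage")
```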

Security management of the data warehouse

A data warehouse contains integrated, critical and sensitive data that can be reached easily. For this reason it should be protected from unauthorized users. One way to implement security is to use the DBMS functions to assign different privileges to different types of users; in this way, an access profile is maintained for each type of user. Another way to secure the data warehouse is to encrypt the data as they are written to the data warehouse database. The data access and retrieval tools must then decrypt the data before presenting the results to users.
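As a hedged illustration of the encrypt-on-write, decrypt-on-read idea, the sketch below uses the third-party Python package cryptography; key management is deliberately simplified, and in practice the key would be held by the DBMS or a key management service rather than generated inline.

```python
from cryptography.fernet import Fernet  # third-party package: cryptography

key = Fernet.generate_key()   # simplified key handling, for illustration only
cipher = Fernet(key)

# Encrypt a sensitive value before it is written to the warehouse database ...
stored_value = cipher.encrypt(b"salary=75000")

# ... and decrypt it in the access/retrieval tool before showing the result.
print(cipher.decrypt(stored_value).decode())
```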

2.4.4 Data Warehouse Deployment Phase

It is the last phase in the data warehouse implementation cycle. The activities to be carried out in this phase include training users to use the data warehouse and carrying out reviews of the data warehouse.

User training

User training should take place before the data in the data warehouse are accessed and the retrieval tools are used. Generally, sessions should begin with an introduction to the concept of data warehousing, the content of the data warehouse, the metadata and the basic features of the tools. More advanced users could then also study the physical tables and the user features of the data access and retrieval tools.

There are many approaches to training users. One of these involves selecting a number of users or analysts from a larger set of users, based on their leadership and communication skills. These people are trained individually on everything they need to know to become familiar with the system. Once the training is finished, they return to their jobs and start teaching other users how to use the system. Based on what they have learned, the other users can then start exploring the data warehouse.
Another approach is to train many users at the same time, as if taking a classroom course. This method is suitable when there are many users who need to be trained at the same time. Yet another method is to train each user individually, one by one. This method is suitable when there are few users.

The purpose of user training is to familiarize users with the data access and retrieval tools as well as with the contents of the data warehouse. However, some users may be overwhelmed by the amount of information provided during the training session. A number of refresher sessions must therefore be held for ongoing support and to answer specific questions. In some cases, a user group is formed to provide this type of support.

Gather feedback

Once the data warehouse has been rolled out, users can use the data residing in the data warehouse for various purposes. Mainly, analysts or users use the data in the data warehouse to:

  1. Identify company trends
  2. Analyze customers' purchasing profiles
  3. Segment clients
  4. Provide better services to clients (customized services)
  5. Formulate marketing strategies
  6. Make competitive estimates for cost analysis and control
  7. Support strategic decision making
  8. Identify emerging opportunities
  9. Improve the quality of current business processes
  10. Monitor profits

Following the deployment of the data warehouse, a series of system reviews can be conducted to obtain feedback from both the development team and the end-user community.
The results obtained can be taken into account in the next development cycle.

Since the data warehouse takes an incremental approach, it is essential to learn from the successes and mistakes of previous developments.

2.5 Summary

This chapter has discussed the approaches found in the literature. Section 1 discussed the concept of the data warehouse and its role in decision support. Section 2 described the main differences between data warehouses and OLTP systems. Section 3 discussed the Monash data warehousing model, which was then used in section 4 to describe the activities involved in the process of developing a data warehouse; these claims were not based on rigorous research. What happens in practice can be very different from what the literature reports; however, these results can be used to create a background that underpins the concept of data warehousing for this research.

Chapter 3

Research and design methods

This chapter deals with the research and design methods for this study. The first part gives an overview of the research methods available for information systems research and discusses the criteria for selecting the most suitable method for a particular study. Section 2 then discusses two methods selected according to those criteria; one of them is chosen and adopted, with the reasons set out in section 3, where the reasons for excluding the other method are also given. Section 4 presents the research design and section 5 the conclusions.

3.1 Research in information systems

Research in information systems is not limited to technology alone but must also be extended to include behavioral and organizational concerns.
It draws on the theories of various disciplines ranging from the social to the natural sciences; this leads to the need for a spectrum of research methods, involving both quantitative and qualitative approaches, to be used for information systems.
All available research methods are important, in fact several researchers such as Jenkins (1985), Nunamaker et al. (1991), and Galliers (1992) argue that there is no universal specific method for conducting research in the various fields of information systems; in fact, a method may be suitable for a particular research but not for others. This leads us to the need to select a method that is suitable for our particular research project: for this choice Benbasat et al. (1987) state that the nature and purpose of research must be considered.

3.1.1 Nature of the research

Based on their nature, research methods can be classified into three traditions that are widely recognized in information systems research: positivist, interpretive and critical research.

3.1.1.1 Positivist research

Positivist research is also known as scientific or empirical study. It seeks to: “explain and predict what will happen in the social world by looking at the regularities and cause-effect relationships between the elements that constitute it” (Shanks et al 1993).

Positivist research is also characterized by repeatability, simplifications and refutations. Furthermore, positivist research admits the existence of a priori relationships between the phenomena studied.
In Galliers's (1992) taxonomy, the research methods falling within the positivist paradigm are not limited to a single approach; they include laboratory experiments, field experiments, case studies, theorem proving, forecasting and simulations. Using these methods, researchers assume that the phenomena studied can be observed objectively and rigorously.

3.1.1.2 Interpretative research

Interpretive research, which is often called phenomenology or anti-positivism, is described by Neuman (1994) as "the systematic analysis of the social meaning of action through the direct and detailed observation of people in natural situations, in order to arrive at an understanding and interpretation of how people create and maintain their social world". Interpretive studies reject the assumption that the phenomena observed can be objectively observed; they are instead based on subjective interpretations. Furthermore, interpretive researchers do not impose a priori meanings on the phenomena they study.

This approach includes subjective/argumentative studies, action research, descriptive/interpretive studies, futures research and role playing. In addition, surveys and case studies can be included in this approach insofar as they concern studies of individuals or organizations within complex real-world situations.

3.1.1.3 Critical research

Critical research is the least known approach in the social sciences but has recently received attention from researchers in information systems. Its philosophical assumption is that social reality is historically produced and reproduced by people, as are social systems with their actions and interactions. People's ability to act, however, is mediated by a number of social, cultural and political considerations.

Like interpretive research, critical research argues that positivist research takes no account of the social context and ignores its influence on human actions.
Critical research, on the other hand, criticizes interpretive research for being too subjective and because it does not set out to help people improve their lives. The biggest difference between critical research and the other two approaches is its evaluative dimension. While the objective of the positivist and interpretive traditions is to predict or explain the status quo or social reality, critical research aims to critically evaluate and transform the social reality under study.

Critical researchers usually oppose the status quo in order to remove social differences and improve social conditions. Critical research has a commitment to a processual view of the phenomena of interest and is therefore normally longitudinal. Examples of research methods are long-term historical studies and ethnographic studies. Critical research, however, has not been widely used in information systems research.

3.1.2 Purpose of the research

Together with the nature of the research, its purpose can be used to guide the researcher in selecting a particular research method. The purpose of a research project is closely related to the position of the research within the research cycle, which consists of three phases: theory building, theory testing and theory refinement. Thus, depending on its position in the research cycle, a research project can have an explanatory, descriptive, exploratory or predictive purpose.

3.1.2.1 Exploratory research

Exploratory research is aimed at investigating a totally new topic and formulating questions and hypotheses for future research. This type of research is used in theory building to obtain initial insights into a new area. Normally, qualitative research methods are used, such as case studies or phenomenological studies.

However, it is also possible to employ quantitative techniques such as exploratory investigations or experiments.

3.1.2.2 Descriptive research

Descriptive research is aimed at analyzing and describing in great detail a particular situation or organizational practice. It is appropriate for theory building and can also be used to confirm or contest hypotheses. Descriptive research usually involves the use of measurements and samples. The most suitable research methods include surveys and the analysis of antecedents.

3.1.2.3 Explanatory research

Explanatory research tries to explain why things happen. It is built on facts that have already been studied and tries to find the reasons for these facts.
Explanatory research is therefore normally built on exploratory or descriptive research and serves to test and refine theories. Explanatory research typically employs case studies or survey-based research methods.

3.1.2.4 Predictive research

Predictive research aims to predict the events and behaviors under study (Marshall and Rossman 1995). Prediction is regarded as the standard scientific test of truth. This type of research generally employs surveys or the analysis of historical data (Yin 1989).

The above discussion demonstrates that there are a number of possible research methods that can be used in a particular study. However, there must be one specific method that is more suitable than the others for a particular type of research project (Galliers 1987, Yin 1989, De Vaus 1991). Every researcher therefore needs to evaluate carefully the strengths and weaknesses of the various methods in order to adopt the research method that is most suitable for, and compatible with, the research project (Jenkins 1985, Pervan and Klass 1992, Bonoma 1985, Yin 1989, Hamilton and Ives 1992).

3.2. Possible research methods

The aim of this project was to study the experience of Australian organizations with data warehousing development. Given that there is currently a lack of research in the data warehousing area in Australia, this research project is still in the theory-building phase of the research cycle and has an exploratory purpose. Exploring the experience of Australian organizations adopting data warehousing requires the interpretation of a real social setting. Consequently, the philosophical assumption underlying the research project follows the interpretive tradition.

After a rigorous examination of the available methods, two possible research methods were identified: surveys and case studies, both of which can be used for exploratory research (Shanks et al. 1993). Galliers (1992) argues for the suitability of these two methods for this particular study in his revised taxonomy, stating that they are suitable for theory building. The following two subsections discuss each method in detail.

3.2.1 Survey research method

The survey research method derives from the ancient census method. A census involves collecting information from an entire population. This method is expensive and impractical, particularly if the population is large. Thus, compared with a census, a survey usually focuses on collecting information from a small number, or sample, of representatives of the population (Fowler 1988, Neuman 1994). A sample reflects the population from which it is drawn, with varying levels of accuracy, depending on the sample structure, its size and the selection method used (Fowler 1988, Babbie 1982, Neuman 1994).

The survey method is defined as "snapshots of practices, situations or views at a particular point in time, undertaken using questionnaires or interviews, from which inferences may be made" (Galliers 1992: 153). Surveys are concerned with gathering information about certain aspects of the study from a number of participants by asking questions (Fowler 1988). Questionnaires and interviews, which include face-to-face interviews, telephone interviews and structured interviews, are the data collection techniques most commonly used in surveys (Blalock 1970, Nachmias and Nachmias 1976, Fowler 1988), although observations and analyses can also be used (Gable 1994). Of all these data collection methods, the questionnaire is the most popular technique, as it ensures that the data collected are structured and formatted, and therefore facilitates the classification of information (Hwang 1987, de Vaus 1991).

In analyzing the data, a survey strategy often employs quantitative techniques, such as statistical analysis, but qualitative techniques can also be employed (Galliers 1992, Pervan and Klass 1992, Gable 1994). Normally, the data collected are used to analyze distributions, associations and patterns (Fowler 1988).

Although surveys are generally appropriate for research dealing with the 'what' question, or questions derived from it such as 'how much' and 'how many', 'why' questions can also be asked (Sonquist and Dunkelberg 1977, Yin 1989). According to Sonquist and Dunkelberg (1977), survey research is suited to testing hypotheses, evaluating programs, describing the population and developing models of human behavior. Furthermore, surveys can be used to study the opinions, conditions, characteristics, expectations and even the past or present behaviors of a population (Neuman 1994).

Surveys allow the researcher to discover relationships within the population, and the results are usually more generalizable than those obtained with other methods (Sonquist and Dunkelberg 1977, Gable 1994). Surveys also allow researchers to cover a wider geographical area and to reach a large number of respondents (Blalock 1970, Sonquist and Dunkelberg 1977, Hwang and Lin 1987, Gable 1994, Neuman 1994). Finally, surveys can provide information that is not available elsewhere or in the form required for analysis (Fowler 1988).

There are, however, some limitations in carrying out a survey. One disadvantage is that the researcher cannot obtain much in-depth information about the object being studied. This is because surveys are performed only at a particular point in time and, therefore, there is a limited number of variables and people that the researcher can study (Yin 1989, de Vaus 1991, Gable 1994, Denscombe 1998). Another drawback is that carrying out a survey can be very costly in terms of time and resources, particularly if it involves face-to-face interviews (Fowler 1988).

3.2.2 Case study research method

The case study research method involves the in-depth study of a particular situation within its real-life context over a defined period of time, without any intervention by the researcher (Shanks et al. 1993, Eisenhardt 1989, Jenkins 1985). This method is mainly used to describe the relationships between the variables being studied in a particular situation (Galliers 1992). Case studies can involve single or multiple cases, depending on the phenomenon being analyzed (Franz and Robey 1987, Eisenhardt 1989, Yin 1989).

The case study research method is defined as "an empirical inquiry that studies a contemporary phenomenon within its real context, using multiple sources gathered from one or more entities such as people, groups, or organizations" (Yin 1989). There is no clear separation between the phenomenon and its context, and there is no experimental control or manipulation of variables (Yin 1989, Benbasat et al 1987).

A variety of data collection techniques can be employed in the case study method, including direct observation, reviews of archival records, questionnaires, documentation review and structured interviews. With such a diverse range of data collection techniques, case studies allow researchers to deal with both qualitative and quantitative data at the same time (Bonoma 1985, Eisenhardt 1989, Yin 1989, Gable 1994). As with the survey method, the case study researcher acts as an observer or researcher and not as an active participant in the organization being studied.

Benbasat et al. (1987) assert that the case study method is particularly suitable for theory building research, which begins with a research question and continues with the formation of a theory during the data collection process. As well as being suitable for the theory building stage, Franz and Robey (1987) suggest that the case study method can also be used for the theory testing stage. In this case, based on the evidence gathered, a given theory or hypothesis is verified or refuted. In addition, the case study is also suitable for research dealing with 'how' or 'why' questions (Yin 1989).

Compared with other methods, case studies allow the researcher to capture essential information in greater detail (Galliers 1992, Shanks et al. 1993). Furthermore, case studies allow the researcher to understand the nature and complexity of the processes being studied (Benbasat et al 1987).

There are four main disadvantages associated with the case study method. The first is the lack of controlled deduction: the subjectivity of the researcher can alter the results and conclusions of the study (Yin 1989). The second drawback is the lack of controlled observation: unlike experimental methods, the case study researcher cannot control the phenomena studied, since they are examined in their natural context (Gable 1994). The third drawback is the lack of replicability: the researcher is unlikely to observe the same events and cannot verify the results of a particular study (Lee 1989). Finally, as a consequence of non-replicability, it is difficult to generalize the results obtained from one or a few case studies (Galliers 1992, Shanks et al. 1993). All these problems, however, are not insurmountable and can in fact be minimized by the researcher by applying appropriate countermeasures (Lee 1989).

3.3. Justify the research methodology adopted

Of the two possible research methods for this study, the survey method is considered the most suitable. The case study method was discarded after careful consideration of their relative merits and weaknesses. The suitability or unsuitability of each method for this study is discussed below.

3.3.1 Unsuitability of the case study method

The case study method requires the in-depth study of a particular situation within one or more organizations over a period of time (Eisenhardt 1989). In this case, the period could exceed the time frame available for this study. Another reason for not adopting the case study method is that the results may suffer from a lack of rigor (Yin 1989); the subjectivity of the researcher can influence the results and conclusions. A further reason is that this method is more suitable for research on 'how' or 'why' questions (Yin 1989), whereas the research question for this study is of the 'what' type. Last but not least, it is difficult to generalize results from just one or a few case studies (Galliers 1992, Shanks et al 1993). On the basis of this rationale, the case study research method was not chosen, as it was unsuitable for this study.

3.3.2 Suitability of the survey research method

When this research was conducted, the practice of data warehousing had not been widely adopted by Australian organizations. Thus, there was not much information regarding its implementation within Australian organizations; the available information came from organizations that had implemented or used a data warehouse. In this case, the survey research method is the most suitable, since it allows information to be obtained that is not available elsewhere or in the form required for analysis (Fowler 1988). In addition, the survey research method allows the researcher to gain a good insight into practices, situations or views at a given point in time (Galliers 1992, Denscombe 1998). Such an overview was required to increase knowledge about the Australian data warehousing experience.

Furthermore, Sonquist and Dunkelberg (1977) state that the results of survey research are more generalizable than those of other methods.

3.4. Survey Research Design

The survey of data warehousing practice was carried out in 1999. The target population consisted of Australian organizations interested in data warehousing, as they were probably already informed about the data they store and could therefore provide useful information for this study. The target population was identified through an initial survey of all Australian members of The Data Warehousing Institute (TDWI-AAP). This section discusses the design of the empirical research phase of this study.

3.4.1 Data collection technique

Of the three techniques commonly used in survey research (i.e. mail questionnaire, telephone interview and personal interview) (Nachmias 1976, Fowler 1988, de Vaus 1991), the mail questionnaire was adopted for this study. The first reason for adopting it is that it can reach a geographically dispersed population (Blalock 1970, Nachmias and Nachmias 1976, Hwang and Lin 1987, de Vaus 1991, Gable 1994). Secondly, the mail questionnaire is suitable for highly educated participants (Fowler 1988); the mail questionnaire for this study was addressed to data warehousing project sponsors, directors and/or project managers. Thirdly, mail questionnaires are suitable when a reliable list of addresses is available (Salant and Dilman 1994); in this case TDWI, a trusted data warehousing association, provided the mailing list of its Australian members. Another advantage of the mail questionnaire over the telephone questionnaire or personal interviews is that it allows respondents to answer more accurately, particularly when they need to consult records or discuss questions with other people (Fowler 1988).

A potential drawback is the time required to conduct questionnaires by mail. Normally, a mail questionnaire is conducted in this sequence: mailing the letters, waiting for replies and sending reminders (Fowler 1988, Bainbridge 1989). Hence, the total time may be longer than that required for personal interviews or telephone interviews. However, the total time can be known in advance (Fowler 1988, Denscombe 1998), whereas the time spent conducting personal interviews cannot be known in advance, as it varies from interview to interview (Fowler 1988). Telephone interviews can be faster than mail questionnaires and personal interviews but can suffer from a high non-response rate due to the unavailability of some people (Fowler 1988). In addition, telephone interviews are generally limited to relatively short lists of questions (Bainbridge 1989).

Another weakness of the mail questionnaire is its high non-response rate (Fowler 1988, Bainbridge 1989, Neuman 1994). However, countermeasures were taken by associating this study with a trusted institution in the field of data warehousing (i.e. TDWI) (Bainbridge 1989, Neuman 1994), by sending two reminder letters to non-respondents (Fowler 1988, Neuman 1994) and by including a covering letter explaining the purpose of the study (Neuman 1994).

3.4.2 Unit of analysis

The aim of this study is to obtain information on the implementation of data warehousing and its use within Australian organizations. The target population consists of all Australian organizations that have implemented, or are implementing, a data warehouse; the individual organization is therefore the unit of analysis. The questionnaire was mailed to organizations interested in adopting data warehousing. This method ensures that the information collected comes from the most appropriate sources within each participating organization.

3.4.3. Survey sample

The survey respondents' mailing list was obtained from TDWI. From this list, 3000 Australian organizations were selected as the basis for sampling. A letter explaining the design and purpose of the survey, along with an answer sheet and a prepaid envelope for returning the completed questionnaire, was sent to the sample. Of the 3000 organizations, 198 agreed to participate in the study. Such a small number of responses was to be expected, given the limited number of Australian organizations that had embraced, or were then embracing, a data warehousing strategy. Thus, the target population for this study consists of only 198 organizations.

3.4.4. Contents of the questionnaire

The structure of the questionnaire was based on the Monash data warehousing model (discussed previously in section 2.3). The content of the questionnaire was based on the literature review presented in chapter 2. A copy of the questionnaire sent to the survey participants can be found in Appendix B. The questionnaire consists of six sections, which follow the phases of the model discussed. The following six paragraphs briefly summarize the content of each section.

Section A: Basic information about the organization
This section contains questions related to the profile of participating organizations. In addition, some of the questions are related to the condition of the participant's data warehousing project. Confidential information such as the organization name was not disclosed in the survey analysis.

Section B: Initiation
The questions in this section relate to the initiation activities of data warehousing. Questions were asked about project initiators, sponsors, the required skills and knowledge, the goals of data warehousing development and end-user expectations.

Section C: Planning
This section contains questions related to the planning activities of the data warehouse. In particular, the questions concerned the scope of implementation, the duration of the project, the cost of the project and the cost/benefit analysis.

Section D: Development
The development section contains questions related to the development activities of the data warehouse: the collection of end-user requirements, the data sources, the logical data model, prototyping, capacity planning, technical architectures and the selection of data warehousing development tools.

Section E: Operation
This section contains questions relating to the operation and extensibility of the data warehouse as it evolves in the next stage of development. Data quality, data refresh strategies, data granularity, data warehouse scalability and data warehouse security issues were among the topics covered.

Section F: Deployment
This section contains questions related to using the data warehouse by end users. The researcher was interested in the purpose and usefulness of the data warehouse, the review and training strategies adopted and the control strategy of the data warehouse adopted.

3.4.5. Response rate

While mail surveys are criticized for having a low response rate, steps were taken to increase the rate of return (as discussed earlier in section 3.4.1). The term 'response rate' refers to the percentage of people in a particular survey sample who respond to the questionnaire (Denscombe 1998). The following formula was used to calculate the response rate for this study:

Response rate = (number of people who responded / total number of questionnaires sent) x 100
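Expressed as a small Python function (with purely illustrative figures, not the study's actual counts), the calculation is:

```python
def response_rate(responses: int, questionnaires_sent: int) -> float:
    """Percentage of questionnaires sent that received a response."""
    return responses / questionnaires_sent * 100

print(round(response_rate(120, 300), 1))  # illustrative figures -> 40.0
```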

3.4.6. Pilot test

Before the questionnaire was sent to the sample, the questions were examined by conducting pilot tests, as suggested by Luck and Rubin (1987), Jackson (1988) and de Vaus (1991). The purpose of the pilot tests was to reveal any awkward or ambiguous wording and any questions that are difficult to interpret, to clarify the definitions and terms used, and to identify the approximate time required to complete the questionnaire (Warwick and Lininger 1975, Jackson 1988, Salant and Dilman 1994). The pilot tests were carried out by selecting subjects with characteristics similar to those of the final subjects, as suggested by Davis and Cosenza (1993). In this study, six data warehousing professionals were selected as pilot subjects. After each pilot test, the necessary corrections were made. Through the pilot tests, the participants contributed to reshaping and refining the final version of the questionnaire.

3.4.7 Data analysis methods

The survey data collected from the closed-question questionnaires were analyzed using the SPSS statistical package. Many of the responses were analyzed using descriptive statistics. A number of questionnaires were returned incomplete; these were examined more carefully to ensure that the missing data were not a consequence of data-entry errors but rather that the questions were not applicable to the respondent, or that the respondent had decided not to answer one or more specific questions. The missing answers were ignored during the data analysis and were coded as '-9' to ensure their exclusion from the analysis process.

In preparing the questionnaire, the closed questions were precoded by assigning a number to each option. The number was then used in preparing the data for analysis (Denscombe 1998, Sapsford and Jupp 1996). For example, there were six options listed in question 1 of section B: board of directors, senior executive, IT department, business unit, consultants and other. In the SPSS data file, a variable was generated to indicate 'project initiator', with six value labels: '1' for 'board of directors', '2' for 'senior executive' and so on. The use of the Likert scale in some of the closed questions also made coding straightforward, since the corresponding numerical values were entered directly in SPSS. For multiple-response questions whose options were not mutually exclusive, each option was treated as a separate variable with two value labels: '1' for 'marked' and '2' for 'unmarked'.
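The same pre-coding scheme can be sketched outside SPSS, for instance in Python; the value labels below follow the option order quoted above but are otherwise an assumption, not the study's actual codebook.

```python
# Illustrative recoding of the 'project initiator' closed question.
initiator_labels = {
    1: "board of directors",
    2: "senior executive",
    3: "IT department",
    4: "business unit",
    5: "consultants",
    6: "other",
}

responses = [2, 1, 6, -9, 3]          # -9 marks a missing answer
valid = [r for r in responses if r != -9]
print([initiator_labels[r] for r in valid])

# A multi-response option treated as its own variable: 1 = marked, 2 = unmarked.
olap_option = [1, 2, 1, 1, 2]
print(olap_option.count(1), "respondents marked the OLAP option")
```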

Open questions were treated differently from closed questions. The answers to these questions were not entered into SPSS; instead, they were analyzed by hand. This type of question yields information about the freely expressed ideas and personal experiences of the respondents (Bainbridge 1989, Denscombe 1998). Where possible, the responses were categorized.

For the data analysis, simple statistical methods were used, such as response frequencies, the mean, the standard deviation and the median (Argyrous 1996, Denscombe 1998).
The Gamma test was used to obtain quantitative measures of the associations between ordinal data (Norusis 1983, Argyrous 1996). These tests were appropriate because the ordinal scales used did not have many categories and could be shown in a table (Norusis 1983).
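For readers unfamiliar with the Gamma statistic, the following sketch shows the descriptive statistics mentioned above and a straightforward pairwise implementation of Goodman and Kruskal's gamma; the ordinal values are invented for the example, and in practice a statistical package would be used instead.

```python
from statistics import mean, median, stdev

# Illustrative ordinal responses, e.g. two Likert-scale questions.
x = [1, 2, 2, 3, 4, 5, 5]
y = [1, 1, 2, 3, 3, 4, 5]

print(mean(x), median(x), round(stdev(x), 2))

def goodman_kruskal_gamma(a, b):
    """Gamma = (concordant - discordant) / (concordant + discordant); ties ignored."""
    concordant = discordant = 0
    for i in range(len(a)):
        for j in range(i + 1, len(a)):
            product = (a[i] - a[j]) * (b[i] - b[j])
            if product > 0:
                concordant += 1
            elif product < 0:
                discordant += 1
    return (concordant - discordant) / (concordant + discordant)

print(round(goodman_kruskal_gamma(x, y), 3))
```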

3.5 Summary

In this chapter, the research methodology and design adopted for this study were discussed.

Selecting the most appropriate research method for a particular study requires taking into consideration a number of criteria, including the nature and type of the research, as well as the merits and weaknesses of each possible method (Jenkins 1985, Benbasat et al. 1987, Galliers and Land 1987, Yin 1989, Hamilton and Ives 1992, Galliers 1992, Neuman 1994). Given the lack of existing knowledge and theory regarding data warehousing adoption in Australia, this research study requires an interpretive research method with an exploratory capability to explore the experiences of Australian organizations. The research method was selected to gather information regarding the adoption of the data warehousing concept by Australian organizations, and a postal questionnaire was chosen as the data collection technique. The justifications for the research method and the data collection technique selected were provided in this chapter. In addition, a discussion was presented of the unit of analysis, the sample used, the response rates, the content of the questionnaire, the pre-testing of the questionnaire and the method of data analysis.

Designing a Data Warehouse:

Combining Entity Relationship and Dimensional Modeling

ABSTRACT
Data warehousing is a major current issue for many organizations. A key problem in the development of a data warehouse is its design. The design must support the mapping of data warehouse concepts to legacy systems and other data sources, as well as easy understanding and efficient implementation of the data warehouse. Much of the data warehousing literature recommends the use of entity relationship modeling or dimensional modeling to represent the data warehouse design. In this paper we show how both representations can be combined in a data warehouse design approach. The approach is systematically examined in a case study, and a number of important implications for practitioners are identified.

DATA WAREHOUSING

A data warehouse is usually defined as a "subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management's decisions" (Inmon and Hackathorn, 1994). Subject-oriented and integrated indicate that the data warehouse is designed to cross the functional boundaries of legacy systems to offer an integrated perspective of the data.
Time-variant refers to the historical or time-series nature of the data in a data warehouse, which enables trends to be analyzed. Nonvolatile indicates that the data warehouse is not continuously updated like an OLTP database; rather, it is updated periodically with data from internal and external sources. The data warehouse is designed specifically for querying rather than for update integrity and operational performance.
The idea of storing data in this way is not new: it has been one of the purposes of data management since the early 1980s (Martin, 1982).
Data warehouses offer the data infrastructure for management support systems. Management support systems include decision support systems (DSS) and executive information systems (EIS). A DSS is a computer-based information system designed to improve the human decision-making process. An EIS is typically a data delivery system that enables business executives to easily access a view of the data.
The general architecture of a data warehouse highlights the role of the data warehouse in management support. In addition to offering the data infrastructure for EIS and DSS, the data warehouse can be accessed directly through queries. The data included in a data warehouse are based on an analysis of management information requirements and are obtained from three sources: internal legacy systems, special-purpose data capture systems and external data sources. The data in internal legacy systems are frequently redundant, inconsistent, of low quality and stored in different formats, so they must be reconciled and cleaned before they can be loaded into the data warehouse (Inmon, 1992; McFadden, 1996). The data from ad hoc data capture systems and from external data sources are often used to augment (update, replace) the data from the legacy systems.

There are many compelling reasons for developing a data warehouse, including better decision making through the effective use of more information (Ives 1995), support for a comprehensive business focus (Graham 1996), and a reduction in the cost of providing data for EIS and DSS (Graham 1996, McFadden 1996).

A recent empirical study found, on average, a 401% return on investment for data warehouses after three years (Graham, 1996). However, other empirical studies of data warehouses have found significant problems, including difficulty in measuring and assigning benefits, lack of a clear purpose, and underestimation of the purpose and complexity of the data warehousing process, in particular with regard to data sources and data cleansing. Data warehousing can be considered a solution to the data management problem within organizations. The management of data as a corporate resource has remained one of the key problems in information systems management worldwide for many years (Brancheau et al. 1996, Galliers et al. 1994, Niederman et al. 1990, Pervan 1993).

A popular approach to data management in the eighties was the development of a corporate data model. The corporate data model was designed to offer a stable basis for the development of new application systems and for the rebuilding and integration of legacy systems (Brancheau et al. 1989, Goodhue et al. 1988, 1992, Kim and Everest 1994). However, there are many problems with this approach, in particular the complexity and cost of the task, and the long time required before tangible results are obtained (Beynon-Davies 1994, Earl 1993, Goodhue et al. 1992, Periasamy 1994, Shanks 1997).

The data warehouse is a separate database that coexists with legacy databases rather than replacing them. It therefore makes it possible to redirect data management and to avoid the costly rebuilding of legacy systems.

EXISTING APPROACHES TO DATA WAREHOUSE DESIGN

The process of building and refining a data warehouse should be understood as an evolutionary process rather than as a traditional systems development lifecycle (Desio 1995, Shanks, O'Donnell and Arnott 1997a). A data warehouse project involves many processes, such as initiation and planning; eliciting information requirements from business managers; sourcing, transforming, cleaning and synchronizing data from legacy systems and other data sources; developing delivery systems; monitoring the data warehouse; and managing the evolution and growth of the data warehouse (Shanks, O'Donnell and Arnott 1997b). In this paper, we focus on how to design the stored data in the context of these other processes. A number of approaches to data warehouse architecture have been proposed in the literature (Inmon 1994, Ives 1995, Kimball 1994, McFadden 1996). Each of these methodologies is briefly reviewed below, with an analysis of its strengths and weaknesses.

Inmon's (1994) Approach to Data Warehouse Design

Inmon (1994) proposed four iterative steps for designing a data warehouse (see Figure 2). The first step is to design a corporate data model to understand how data can be integrated across functional areas within the organization, by subdividing the stored data into subject areas. The data model is built to store data relevant to decision making, including historical data as well as derived and aggregated data. The second step is to identify subject areas for implementation, based on priorities determined by the particular organization. The third step involves drawing a data model for the subject area, paying particular attention to including appropriate levels of granularity. Inmon recommends using the entity-relationship model. The fourth step is to identify the required source data systems and develop transformation processes to acquire, clean and format the data. A hypothetical sketch of these steps is given below.
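The following Python fragment is an invented illustration, not taken from Inmon: it shows a corporate data model subdivided into subject areas, an implementation order based on priorities, and a subject-area design that keeps both detailed (fine-grained) and aggregated (coarse-grained) data. All entity and attribute names are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date

# Steps 1-2: subject areas carved out of the corporate data model,
# with an implementation order reflecting organizational priorities.
corporate_data_model = {
    "customer": ["Customer", "Account"],
    "sales":    ["Order", "OrderLine", "Product"],
    "shipping": ["Shipment", "Carrier"],
}
implementation_order = ["sales", "customer", "shipping"]

# Step 3: entities for the "sales" subject area at two levels of granularity.
@dataclass
class OrderLineDetail:
    """Fine grain: one row per order line, kept as history."""
    order_id: str
    product_id: str
    order_date: date
    quantity: int
    amount: float

@dataclass
class MonthlyProductSummary:
    """Coarse grain: aggregated data for faster analysis."""
    product_id: str
    month: str          # e.g. "2024-01"
    total_quantity: int
    total_amount: float
```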

The strengths of Inmon's approach are that the corporate data model provides the basis for integrating data across the organization and supports planning the iterative development of the data warehouse. Its drawbacks are the difficulty and cost of designing the corporate data model, the difficulty of understanding the entity-relationship models used in both the corporate data model and the subject-area data models, and the fact that the resulting design suits a relational implementation of the data warehouse but not a multi-dimensional one.

Ives' (1995) Approach to Data Warehouse Design

Ives (1995) proposes a four-step approach to designing an information system that he believes can also be applied to the design of a data warehouse (see Figure 3). The approach is heavily based on Information Engineering (Martin 1990). The first step is to determine goals, critical success factors and key performance indicators. Key business processes and the information they require are modeled, leading to a corporate data model. The second step involves developing a defining architecture: the data stores by area, the data warehouses, the technology components that are required, and the organizational support needed to implement and operate the data warehouse. The third step includes selecting the required software packages and tools. The fourth step is the detailed design and construction of the data warehouse. Ives notes that data warehousing is a constrained iterative process.

The strengths of Ives' approach are the use of specific techniques for determining information requirements, the use of a structured process to support the integration of data warehouses, the appropriate selection of hardware and software, and the use of multiple representation techniques for the data warehouse. Its weaknesses are its inherent complexity and the difficulty of developing the many architectural levels within the data warehouse in a reasonable time and at reasonable cost.

Kimball's (1994) Approach to Data Warehouse Design

Kimball (1994) proposed five iterative steps for designing a data warehouse (see Figure 4). His approach is particularly dedicated to the design of a single data warehouse and to the use of dimensional models in preference to entity-relationship models. Kimball advocates dimensional models because they are easier for business leaders to understand, they are more efficient for complex queries, and physical database design is more efficient (Kimball 1994). Kimball acknowledges that the development of a data warehouse is iterative, and that separate data warehouses can be integrated by sharing tables of common dimensions.

The first step is to identify the particular subject area to be addressed. The second and third steps concern dimensional modeling. In the second step, the measures identify the things of interest in the subject area and are grouped into a fact table. For example, in a sales subject area, the measures of interest could include the quantity of items sold and the dollar value of sales. The third step involves identifying the dimensions, which are the ways in which facts can be grouped. In a sales subject area, relevant dimensions could include item, location and time period. The fact table has a multi-part key to link it to each of the dimension tables and typically contains a very large number of facts. In contrast, dimension tables contain descriptive information about the dimensions and other attributes that can be used to group the facts. Together, the associated fact and dimension tables form what is called a star schema because of its shape, sketched below. The fourth step involves building a multidimensional database to refine the star schema. The final step is to identify the required source data systems and develop transformation processes to acquire, clean and format the data.
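The fragment below is a hypothetical illustration of such a star schema in Python, using the standard sqlite3 module; the table and column names (fact_sales, dim_item, dim_location, dim_time and so on) are assumptions made for the sales example, not definitions from Kimball:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_item     (item_key     INTEGER PRIMARY KEY, item_name TEXT, category TEXT);
CREATE TABLE dim_location (location_key INTEGER PRIMARY KEY, city TEXT, region TEXT);
CREATE TABLE dim_time     (time_key     INTEGER PRIMARY KEY, day TEXT, month TEXT, year INTEGER);

-- The fact table's multi-part key links it to each dimension table.
CREATE TABLE fact_sales (
    item_key      INTEGER REFERENCES dim_item(item_key),
    location_key  INTEGER REFERENCES dim_location(location_key),
    time_key      INTEGER REFERENCES dim_time(time_key),
    quantity_sold INTEGER,
    dollar_amount REAL,
    PRIMARY KEY (item_key, location_key, time_key)
);
""")

conn.executemany("INSERT INTO dim_item VALUES (?, ?, ?)",
                 [(1, "Widget", "Hardware"), (2, "Gadget", "Hardware")])
conn.executemany("INSERT INTO dim_location VALUES (?, ?, ?)",
                 [(1, "Milan", "North"), (2, "Rome", "Centre")])
conn.executemany("INSERT INTO dim_time VALUES (?, ?, ?, ?)",
                 [(1, "2024-01-15", "2024-01", 2024), (2, "2024-02-03", "2024-02", 2024)])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)",
                 [(1, 1, 1, 10, 250.0), (2, 2, 2, 4, 90.0)])

# Dimensions are the ways facts can be grouped, e.g. dollar sales by region.
for row in conn.execute("""
    SELECT l.region, SUM(f.dollar_amount)
    FROM fact_sales f JOIN dim_location l ON f.location_key = l.location_key
    GROUP BY l.region"""):
    print(row)
```

In this sketch the same dimension tables could be shared by several fact tables, which is consistent with Kimball's point that separate data warehouses can be integrated through tables of common dimensions.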

The strengths of Kimball's approach include the use of dimensional models to represent the stored data, which are easy to understand and lead to efficient physical design. A dimensional model can also be readily implemented on either relational or multidimensional database systems. Its drawbacks include the lack of techniques to facilitate the planning or integration of many star schemas within a single data warehouse, and the difficulty of designing the extremely denormalized structure of a dimensional model from the data held in legacy systems.

McFadden's (1996) Approach to Data Warehouse Design

McFadden (1996) proposes a five-step approach to designing a data warehouse (see Figure 5).
His approach is based on a synthesis of ideas from the literature and is focused on the design of a single data warehouse. The first step involves a requirements analysis. Although specific techniques are not prescribed, McFadden notes the identification of specific data entities and their attributes, and refers readers to Watson and Frolick (1993) for requirements elicitation.
In the second step, an entity-relationship model for the data warehouse is designed and then validated by business executives. The third step involves determining the mapping from legacy systems and external sources into the data warehouse. The fourth step involves the processes for developing, deploying and synchronizing the data in the data warehouse. In the final step, system delivery is developed, with particular emphasis on the user interface. McFadden points out that the design process is generally iterative.

The strengths of McFadden's approach are the participation of business leaders in determining requirements and the importance given to data resources, their cleaning and collection. Its weaknesses relate to the lack of a process for breaking a large data warehouse project down into many integrated stages, and the difficulty of understanding the entity-relationship models used in the design of the data warehouse.
