
CENTRAL DATA ARCHIVE: HISTORY AND EVOLUTION


The two dominant themes of corporate technology in the 1990s were the data warehouse and ERP. For a long time these two powerful currents were parts of corporate IT that never intersected, almost as if they were matter and anti-matter. But the growth of both phenomena inevitably led to their intersection. Today, companies face the problem of what to do with ERP and the data warehouse. This article illustrates what the problems are and how companies address them.
AT THE BEGINNING…
In the beginning there was the data warehouse, born as a counterpoint to transaction-processing application systems. In the early days, storing data was intended simply to offset transaction-processing applications. Today there are far more sophisticated views of what a data warehouse can be: it now sits inside a structure that can be called the Corporate Information Factory.
THE CORPORATE INFORMATION FACTORY (CIF)
The Corporate Information Factory has standard architectural components:
▪ a transformation and integration layer that integrates the data as it moves from the application environment into the enterprise data warehouse;
▪ an enterprise data warehouse, in which detailed, integrated historical data is stored; it serves as the foundation on which all other parts of the data warehouse environment are built;
▪ an operational data store (ODS), a hybrid structure that combines aspects of the data warehouse with aspects of an OLTP environment;
▪ data marts, in which different departments can have their own version of the data warehouse;
▪ an exploration data warehouse, in which the company's "thinkers" can run 72-hour queries without harmful effect on the main data warehouse;
▪ near-line storage, in which old data and bulk detail data can be stored cheaply.
WHERE ERP COMBINES WITH THE CORPORATE INFORMATION FACTORY
ERP merges with the Corporate Information Factory in two places. First, as a baseline application that supplies application data to the data warehouse. In this case data generated as a by-product of transaction processing is integrated and loaded into the enterprise data warehouse. The second point of union between ERP and the CIF is the ODS: in many environments the ERP is used as a classic ODS.
When the ERP is used as a baseline application, the same ERP can also be used in the CIF as an ODS. If the ERP is to be used in both roles, however, there must be a clear distinction between the two entities. In other words, when the ERP plays the role of both baseline application and ODS, the two architectural entities must remain distinct. If a single ERP implementation tries to play both roles at the same time, there will inevitably be problems in the design and implementation of that structure.
SEPARATE ODS AND BASELINE APPLICATIONS
There are many reasons for splitting the architectural components. Perhaps the most telling reason for separating the different components of an architecture is that each component has its own view of the world. The baseline application serves a different purpose than the ODS. Trying to overlay a baseline application view onto the world of an ODS, or vice versa, is not a sound way to work.
Consequently, the first problem of an ERP in the CIF is to verify that there is a distinction between the baseline applications and the ODS.
DATA MODELS IN THE CORPORATE INFORMATION FACTORY
To achieve cohesion between the different components of the CIF architecture, there must be a data model. The data models serve as the link between the various architectural components, such as the baseline applications and the ODS. The data models become the "intellectual road map" for deriving the right meaning from the different architectural components of the CIF.
Hand in hand with this notion goes the idea that there should not be one single, all-encompassing data model. Obviously there must be a data model for each of the components, and there must also be a sensible path connecting the different models. Each component of the architecture - ODS, baseline applications, enterprise data warehouse, and so on - needs its own data model. So there must also be a precise definition of how these data models interface with one another.
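By way of illustration, here is a minimal Python sketch of how two component data models might be tied together through an explicit field mapping. Every table and field name below is invented for the example and is not taken from any real ERP, ODS, or warehouse model.

```python
# Minimal sketch of a bridge between two component data models.
# All field names are hypothetical, purely for illustration.

ODS_TO_WAREHOUSE_MAP = {
    # ODS (baseline application) field -> enterprise data warehouse field
    "cust_no":     "customer_id",
    "cust_nm":     "customer_name",
    "ord_dt":      "order_date",
    "ord_amt_lcl": "order_amount",   # amount held in local currency in the ODS
}

def to_warehouse_record(ods_row: dict) -> dict:
    """Translate one ODS record into the warehouse model's vocabulary."""
    return {wh_field: ods_row[ods_field]
            for ods_field, wh_field in ODS_TO_WAREHOUSE_MAP.items()}

# Example: a single ODS row expressed in the warehouse's terms.
print(to_warehouse_record(
    {"cust_no": 42, "cust_nm": "ACME", "ord_dt": "2024-01-05", "ord_amt_lcl": 99.0}))
```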
MOVING ERP DATA INTO THE DATA WAREHOUSE
If the source of the data is a baseline application and/or an ODS, when the ERP feeds data into the data warehouse, the load must occur at the lowest level of granularity. Simply summarizing or aggregating the data as it comes out of the ERP baseline application or the ERP ODS is not the right thing to do. Detailed data is needed in the data warehouse to form the basis of the DSS process; that data will be reshaped in many ways by the data marts and the exploration data warehouses.
The movement of data from the ERP baseline application environment to the enterprise data warehouse is done in a reasonably relaxed manner, typically about 24 hours after the data is updated or created in the ERP. This "lazy" movement of data into the enterprise data warehouse allows the data coming from the ERP to settle. Once the data has settled in the baseline application, it can safely be moved into the enterprise data warehouse. Another goal achieved by the "lazy" movement of data is a clear delimitation between operational processes and DSS; with a "fast" movement of data, the dividing line between DSS and operational remains blurred.
The movement of data from the ERP ODS to the enterprise data warehouse is done periodically, usually weekly or monthly. In this case the movement is driven by the need to purge old historical data; of course, the ODS contains data that is much more recent than the historical data found in the data warehouse.
The movement of data into the data warehouse is almost never done "wholesale". Copying a table from the ERP environment into the data warehouse makes no sense. A far more realistic approach is to move selected units of data: only the data that has changed since the last update of the data warehouse should be moved. One way to know which data has changed since the last update is to look at the timestamps of the data found in the ERP environment; the designer selects all the changes that have occurred since the last refresh. Another approach is to use change data capture techniques, in which logs and journal tapes are analyzed to determine which data must be moved from the ERP environment to the data warehouse environment. These techniques are preferable insofar as logs and journal tapes can be read from the ERP files without further impact on the other ERP resources.
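As a rough illustration of the timestamp-based approach described above, the Python sketch below selects only the rows changed since the last warehouse refresh and applies a 24-hour "settling" filter that mirrors the lazy movement discussed earlier. Table and column names (erp_orders, last_updated) are invented for the example, not taken from any real ERP.

```python
import sqlite3
from datetime import datetime, timedelta

# Minimal sketch of timestamp-based incremental extraction.
# Table and column names (erp_orders, last_updated) are hypothetical.

def extract_changed_rows(conn: sqlite3.Connection, last_refresh: datetime):
    """Return only the ERP rows modified since the last warehouse refresh."""
    cur = conn.execute(
        "SELECT order_id, customer_id, amount, last_updated "
        "FROM erp_orders WHERE last_updated > ?",
        (last_refresh.isoformat(),),
    )
    return cur.fetchall()

def extract_settled_rows(conn, last_refresh, settle_hours=24):
    """'Lazy' movement: only load rows at least 24 hours old, so the ERP
    data has had time to settle before it reaches the warehouse."""
    cutoff = datetime.now() - timedelta(hours=settle_hours)
    return [row for row in extract_changed_rows(conn, last_refresh)
            if datetime.fromisoformat(row[3]) <= cutoff]
```

The change-data-capture alternative mentioned above would replace the timestamp query with a read of the ERP's logs or journals, leaving the rest of the flow unchanged.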
OTHER COMPLICATIONS
One of the problems with ERP in the CIF is what happens to the other application sources, or to ODS data, that must contribute to the data warehouse but are not part of the ERP environment. Given the closed nature of ERP, especially SAP, trying to integrate keys from external data sources with data coming from the ERP at the moment the data is moved into the data warehouse is a great challenge. And what exactly are the chances that data from applications or ODSs outside the ERP environment will have to be integrated into the data warehouse? The odds are actually very high.
FINDING HISTORICAL DATA IN THE ERP
Another problem with ERP data arises from the need to hold historical data inside the data warehouse. The data warehouse usually needs historical data, and ERP technology usually does not store that history, at least not to the extent needed in the data warehouse. When a large amount of historical data starts to accumulate in the ERP environment, that environment has to be purged. For example, suppose a data warehouse should be loaded with five years of history while the ERP holds at most six months of that data. As long as the company is content to accumulate historical data as time passes, there is no problem in using the ERP as a source for the data warehouse. But when the data warehouse must go back in time and pick up historical data that was never collected and saved by the ERP, the ERP environment becomes inadequate.
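A minimal sketch of how the history problem is commonly handled: snapshot the ERP extract into a warehouse history table on a regular cycle, before the ERP purges it. Table and column names below are invented for the example.

```python
import sqlite3
from datetime import date

# Minimal sketch: append a monthly snapshot of the ERP extract to a warehouse
# history table before the ERP purges it. Table names are hypothetical.

def archive_monthly_snapshot(erp: sqlite3.Connection, dw: sqlite3.Connection):
    snapshot_month = date.today().strftime("%Y-%m")
    rows = erp.execute(
        "SELECT order_id, customer_id, amount FROM erp_orders").fetchall()
    dw.executemany(
        "INSERT INTO order_history (snapshot_month, order_id, customer_id, amount) "
        "VALUES (?, ?, ?, ?)",
        [(snapshot_month, *row) for row in rows],
    )
    dw.commit()
```

Run every month, a job like this gradually builds the five years of history the warehouse needs even though the ERP itself only ever holds six months.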
ERP AND METADATA
Another consideration regarding ERP and the data warehouse concerns the metadata that exists in the ERP environment. Just as data passes from the ERP environment to the data warehouse, the metadata must be moved as well. Furthermore, the metadata must be transformed into the format and structure required by the data warehouse infrastructure. There is a big difference between operational metadata and DSS metadata: operational metadata is mainly for the developer and the programmer, while DSS metadata is mainly for the end user. The metadata that exists in ERP applications or ODSs must be converted, and that conversion is not always easy and direct.
SOURCING THE ERP DATA
If the ERP is used as a data provider for the data warehouse, there must be a solid interface that moves the data from the ERP environment to the data warehouse environment. The interface must:
▪ be easy to use
▪ allow access to the ERP data
▪ capture the meaning of the data that is about to be moved into the data warehouse
▪ know the limitations of the ERP that may arise when the ERP data is accessed:
  ▪ referential integrity
  ▪ hierarchical relationships
  ▪ implicit logical relationships
  ▪ application conventions
  ▪ all the data structures supported by the ERP, and so on
▪ be efficient in accessing the data, providing:
  ▪ direct movement of data
  ▪ change data capture
  ▪ support for timely access to the data
▪ understand the format of the data, and so on
INTERFACE WITH SAP
The interface can be of two types: homegrown or commercial. Some of the major commercial interfaces include:
▪ SAS
▪ Prime Solutions
▪ D2K, and so on
MULTIPLE ERP TECHNOLOGIES
Treating the ERP environment as if it were a single technology is a big mistake. There are many ERP technologies, each with its own strengths. The best-known vendors on the market are:
▪ SAP
▪ Oracle Financials
▪ PeopleSoft
▪ JD Edwards
▪ Baan
SAP
SAP is the largest and most complete of the ERP packages. SAP's applications encompass many types of applications in many areas. SAP has the reputation of being:
▪ very large
▪ very difficult and expensive to implement
▪ requiring many people and consultants for implementation
▪ requiring specialized people for implementation
▪ taking a long time to implement
In addition, SAP has a reputation for guarding its data very carefully, making it difficult for anyone outside the SAP area to access it. The strength of SAP is its ability to capture and store a large amount of data.
SAP recently announced its intention to extend its applications to the data warehouse. There are many pros and cons to using SAP as a data warehouse supplier. One advantage is that SAP is already installed and that most consultants already know SAP.
The disadvantages of having SAP as a data warehouse supplier are many. SAP has no experience in the data warehouse world. If SAP is the data warehouse supplier, the data has to be "taken out" of SAP and into the data warehouse; given SAP's track record as a closed system, it is unlikely that getting data out of SAP will be easy. There are also many legacy environments that feed SAP, such as IMS, VSAM, ADABAS, Oracle, DB2, and so on.
SAP insists on a "not invented here" approach. SAP does not want to collaborate with other vendors to use or create the data warehouse; it insists on generating all of its software on its own. Although SAP is a large and powerful company, attempting to rewrite ETL, OLAP, systems administration and even the core DBMS code is simply folly. Instead of adopting a cooperative attitude with long-standing data warehouse suppliers, SAP has followed a "we know best" approach. This attitude holds back the success SAP might have in the data warehouse arena.
SAP refuses to allow external suppliers prompt and graceful access to its data. Yet the very essence of using a data warehouse is easy access to data, and SAP's whole history is built on making data hard to access.
SAP also lacks experience in dealing with large volumes of data: data warehouses hold volumes of data never seen by SAP, and handling those large amounts of data requires a suitable technology. SAP is apparently unaware of this technological barrier to entering the data warehouse field.
Finally, there is SAP's corporate culture: SAP built its business on getting data into the system. But getting data out requires a different mentality. Traditionally, software companies that were good at getting data into an environment have not been good at getting data to flow the other way. If SAP can make this kind of switch, it will be the first company to do so.
In short, it is questionable whether a company should select SAP as its data warehouse supplier. There are very serious risks on the one hand and very few rewards on the other. But there is another reason to discourage choosing SAP as the data warehouse supplier: why should every company have the same data warehouse as every other company? The data warehouse is at the heart of competitive advantage. If every company adopted the same data warehouse, it would be difficult, if not impossible, to achieve a competitive advantage. SAP seems to think that a data warehouse can be stamped out like a cookie, and that is a further sign of its "get the data in" mentality.
No other ERP vendor is as dominant as SAP. Undoubtedly there will be companies that follow the SAP path for their data warehouses, but presumably these SAP data warehouses will be large, expensive and very time-consuming to build.
PEOPLESOFT
These environments include such activities as bank teller processing, airline reservation processing, insurance claims processing, and so on. The more demanding the transaction system, the more obvious the need for separation between the operational process and the DSS (Decision Support System). With human resources and personnel systems, however, you are never faced with large volumes of transactions. Of course, when a person is hired or leaves the company, a transaction record is created; but relative to other systems, human resources and personnel systems simply do not have many transactions. Therefore, in human resources and personnel systems it is not entirely obvious that a data warehouse is needed. In many ways these systems represent an amalgamation of DSS systems.
But there is another factor to consider when dealing with data warehouses and PeopleSoft. In many environments, human resources and personnel data is secondary to the company's primary business. Most companies do manufacturing, sales, provide services and so on; human resources and personnel systems are usually secondary to (or supportive of) the company's core line of business. Therefore, a separate data warehouse for human resources and personnel support is equivocal and inconvenient.
PeopleSoft is very different from SAP in this respect. With SAP, it is mandatory that there be a data warehouse. With PeopleSoft, it is not so clear-cut: a data warehouse is optional with PeopleSoft.
The best thing that can be said for PeopleSoft data is that the data warehouse can be used to store old human resources and personnel data. A second reason a company might want a data warehouse alongside the PeopleSoft environment is to give analysis tools free and open access to PeopleSoft data. Beyond these reasons, however, there may be cases where it is preferable not to have a data warehouse for PeopleSoft data.
IN SHORT
There are many considerations around building a data warehouse inside ERP software. Some of these are:
▪ Does it make sense to have a data warehouse that looks like everyone else's in the industry?
▪ How flexible is ERP data warehouse software?
▪ Can ERP data warehouse software handle the volume of data found in a data warehouse arena?
▪ What is the ERP vendor's track record on delivering inexpensive, timely, easy-to-access data?
▪ What is the ERP vendor's understanding of the DSS architecture and the "corporate information factory"?
▪ ERP vendors understand how to get data into their environment, but do they also understand how to export it?
▪ How open is the ERP vendor to data warehousing tools?
All these considerations must be weighed when deciding where to put the data warehouse that will host the ERP data and other data. In general, unless there is a compelling reason to do otherwise, it is recommended to build the data warehouse outside the ERP vendor's environment.
CHAPTER 1
Overview of the BI Organization
Key points:
▪ Information repositories work in the opposite way to business intelligence (BI) architecture.
▪ Corporate culture and IT can limit success in building BI organizations.
▪ Technology is no longer the limiting factor for BI organizations. The problem for architects and project planners is not whether the technology exists, but whether they can effectively implement the available technology.
For many companies, a data warehouse is little more than a passive repository distributing data to the users who need it. The data is extracted from the source systems and populated into target data warehouse structures. The data may even be cleaned, with any luck. However, no extra value is added to, or harvested from, the data during this process.
Essentially, a passive DW, at best, delivers only clean, operational data to the user communities. The creation of information and analytical understanding depends entirely on the users. Judging whether the DW (data warehouse) is a success is subjective. If we judge success on the ability to collect, integrate and clean corporate data efficiently and on a predictable basis, then yes, the DW is a success. If, on the other hand, we look at the collection, consolidation and exploitation of information by the organization as a whole, then the DW is a failure: a passive DW provides little or no information value. As a result, users are forced to make do, creating information silos. This chapter presents a complete vision for summarizing the BI (Business Intelligence) architecture of companies. We start with a description of BI and then move on to discussions of designing and developing information, as opposed to simply providing data to users. The discussion then focuses on calculating the value of your BI efforts. We conclude by defining how IBM addresses the architectural BI requirements of your organization.
Description of the architecture of the BI organization
Powerful transaction-oriented information systems are now commonplace in every large enterprise, effectively leveling the playing field for companies around the world. Staying competitive, however, now requires analytically oriented systems that can revolutionize the company's ability to rediscover and use the information it already has. These analytical systems derive understanding from the wealth of available data. BI can improve performance across all of the company's information. Companies can improve relationships with customers and suppliers, improve the profitability of products and services, generate new and better offers, control risk and, among many other gains, cut spending drastically. With BI your company finally begins to use customer information as a competitive asset, through applications with market objectives.
Having the right business tools means having definitive answers to key questions such as:
▪ Which of our customers make us the most money, and which cost us money?
▪ Where do our best customers live in relation to the shop/warehouse they frequent?
▪ Which of our products and services can be sold most effectively, and to whom?
▪ Which sales campaign was the most successful, and why?
▪ Which sales channels are most effective for which products?
▪ How can we improve relationships with our best customers?
Most companies have the raw data to answer these questions.
Operational systems generate large quantities of product, customer and market data from points of sale, reservations, customer service and technical support systems. The challenge is to extract and exploit this information.
Many companies take advantage of only small fractions of their data for strategic analysis.
The remaining data, often joined with data from external sources such as government reports and other purchased information, is a gold mine just waiting to be explored, and that data only needs to be refined within the informational context of your organization.
This knowledge can be applied in several ways, ranging from designing an overall corporate strategy to personal communication with suppliers, through call centers, invoicing, the Internet and other touch points. Today's business environment dictates that the DW and related BI solutions evolve beyond the implementation of traditional data structures such as atomic-level normalized data and "star/cube farms".
What is needed to stay competitive is a fusion of traditional and advanced technology in an effort to support a vast analytical landscape.
Finally, the overall environment must improve the knowledge of the enterprise as a whole, ensuring that the actions taken as a consequence of the analyses are useful to everyone.
For example, let's say you classify your customers into high-risk or low-risk categories.
Whether this information is generated by a mining model or by other means, it must be put into the DW and made accessible to anyone, through any means of access, such as static reports, spreadsheets, tables, or on-line analytical processing (OLAP).
Currently, however, much of this type of information remains in the data silos of the individuals or departments that generate the analysis. The organization as a whole has little or no visibility into this understanding. Only by blending this type of information content into your enterprise DW can you eliminate the information silos and elevate your DW environment.
There are two major obstacles to developing a BI organization.
First, there is the problem of the organization itself and of its politics. Although we cannot help with organizational policy changes, we can help you understand the components of a BI organization, its architecture, and how IBM technology facilitates its development.
The second barrier to overcome is the lack of integrated technology and of knowledge of a method that covers the entire BI space, as opposed to only a small component of it.
IBM is responding to the changes in integration technology. It is your responsibility to provide a deliberate design. This architecture must be developed with technology chosen for unconstrained integration, or at least with technology that adheres to open standards. Furthermore, your company's management must ensure that the BI undertaking is carried out according to schedule and must not allow the development of information silos deriving from self-serving agendas or goals.
This is not to say that the BI environment is insensitive to the different needs and requirements of different users; rather, it means that the implementation of those individual needs and requirements is done for the benefit of the entire BI organization.
A description of the architecture of the BI organization can be found in Figure 1.1 on page 9. The architecture demonstrates a rich blend of technologies and techniques. From the traditional view, the architecture includes the following warehouse components:

Atomic layer.
This is the foundation, the heart of the whole DW and therefore of strategic reporting. The data stored here retains historical integrity and data relationships, and includes derived metrics, as well as being cleaned, integrated and stored using extraction templates. All subsequent use of this data and the related information is derived from this structure. This is an excellent source for data mining and for reports with structured SQL queries.

Operational data store (ODS) or reporting database.
This is a data structure specifically designed for technical reporting. The data stored in and reported from these structures can eventually propagate into the warehouse through the staging area, where it can be used for strategic reporting.

Staging area.
The first stop for most data destined for the warehouse environment is the staging area. Here the data is integrated, cleaned and transformed into useful data that will populate the warehouse structure.

Data marts.
This part of the architecture represents the data structures used specifically for OLAP. Whether the data marts store data in star schemas that overlay multidimensional data onto a relational environment, or in proprietary data files used by a specific OLAP technology, such as the DB2 OLAP Server, is not relevant. The only constraint is that the architecture facilitates the use of multidimensional data.
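As a rough illustration of the star-schema option mentioned above, the sketch below builds a tiny fact table with two dimensions and runs a typical dimensional query. It uses Python with an in-memory SQLite database, and every table and column name is invented for the example.

```python
import sqlite3

# A toy star schema: one fact table plus two dimension tables.
# All names are invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE dim_product  (product_id  INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE fact_sales   (customer_id INTEGER, product_id INTEGER,
                           sale_date TEXT, amount REAL);
""")
conn.executemany("INSERT INTO dim_customer VALUES (?, ?)",
                 [(1, "North"), (2, "South")])
conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                 [(10, "Hardware"), (11, "Services")])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                 [(1, 10, "2024-01-05", 100.0), (2, 11, "2024-01-06", 250.0)])

# A typical dimensional query: sales by region and category.
for row in conn.execute("""
    SELECT c.region, p.category, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_customer c ON f.customer_id = c.customer_id
    JOIN dim_product  p ON f.product_id  = p.product_id
    GROUP BY c.region, p.category"""):
    print(row)
```

The same dimensional content could equally be held in a proprietary cube; the point of the architecture is only that multidimensional analysis is made easy, not which storage form is used.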
The architecture also encompasses BI's critical technologies and techniques, which include:

Spatial analysis.
Space is a windfall of information for the analyst and is critical to a complete solution. Space can represent information about the people living in a certain location, as well as information about where that location physically sits relative to the rest of the world. To perform this analysis, you must begin by tying your information to latitude and longitude coordinates. This is referred to as "geocoding" and must be part of the extraction, transformation and loading (ETL) process at the atomic level of your warehouse.
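A minimal sketch of geocoding as an ETL step follows. The coordinate lookup table is a hypothetical stand-in for what would, in practice, be a geocoding service or reference database; the address and field names are invented.

```python
# Minimal geocoding sketch for the ETL stage. The lookup dictionary is a
# hypothetical stand-in for a real geocoding service or reference database.
GEOCODE_LOOKUP = {
    "123 MAIN ST, SPRINGFIELD": (39.7817, -89.6501),
}

def geocode_record(record: dict) -> dict:
    """Attach latitude/longitude to a customer record during ETL."""
    key = record["address"].upper().strip()
    lat, lon = GEOCODE_LOOKUP.get(key, (None, None))
    return {**record, "latitude": lat, "longitude": lon}

print(geocode_record({"customer_id": 1, "address": "123 Main St, Springfield"}))
```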
Data mining.
Data mining allows companies to grow their customer base, to predict sales trends and to enable customer relationship management (CRM), among other BI initiatives. Data mining must therefore be integrated with the DW data structures and supported by warehouse processes that ensure both the effective and the efficient use of the technology and of the related techniques. As indicated in the BI architecture, the atomic level of the DW, like the data marts, is an excellent source of data for mining. Those same structures must also be recipients of the mining results, to ensure availability to the broadest audience.
Agents.
There are various agents for examining the customer at each touch point, the company's operational systems and the DW itself. These agents can be advanced neural networks trained to learn the trends of each point, such as future product demand based on sales promotions; rules-based engines that react to a given set of circumstances; or even simple agents that report exceptions to top executives. These processes generally act in real time and therefore must be closely coupled with the movement of the data itself.
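A minimal sketch of the simplest kind of agent mentioned above: a rules-based check that flags exceptions for executives as records flow through the pipeline. The threshold and field names are invented for the example.

```python
# Minimal rules-based agent sketch: flag exceptional transactions as they move
# through the data pipeline. Field names and the threshold are invented.
EXCEPTION_THRESHOLD = 100_000.0

def exception_agent(transactions):
    """Yield an alert for every transaction that breaches the threshold."""
    for txn in transactions:
        if txn["amount"] > EXCEPTION_THRESHOLD:
            yield f"ALERT: order {txn['order_id']} amount {txn['amount']:.2f}"

stream = [{"order_id": 1, "amount": 12_500.0},
          {"order_id": 2, "amount": 150_000.0}]
for alert in exception_agent(stream):
    print(alert)
```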
All these data structures, technologies and techniques guarantee that you will not be generating your BI organization overnight. This activity will develop in incremental steps, in small increments. Each step is an independent project effort, and is referred to as an iteration of your DW or BI initiative. Iterations may include implementing new technologies, starting new techniques, adding new data structures, loading additional data, or expanding the analysis of your environment. This topic is discussed in more depth in chapter 3.
In addition to the traditional DW structures and BI tools, there are other functions of your BI organization that you must design, such as:

Customer touch points.
As with any modern organization, there are a number of customer touch points that determine how your customers have a positive experience. There are traditional channels such as dealers, switchboard operators, direct mail, multimedia and print advertising, as well as more current channels such as email and the web. Data produced at each point of contact must be acquired, transported, cleaned, transformed and then populated into the BI data structures.

Operational databases and user communities.
Behind the customer touch points you find the company's application databases and the user communities. The existing data is traditional data that must be brought together and merged with the data flowing from the touch points to satisfy the necessary information needs.

Analysts.
The primary beneficiary of the BI environment is the analyst. It is the analyst who benefits from the timely extraction of operational data, integrated with different data sources, augmented with features such as geographical analysis (geocoding) and presented in BI technologies that allow mining, OLAP, advanced SQL reporting and geographic analysis. The analyst's primary interface to the reporting environment is the BI portal.
However, the analyst is not the only one who benefits from the BI architecture. Executives, large user communities, and even partners, suppliers and customers should find benefits in enterprise BI.
Back feed loop.
The BI architecture is a learning environment. A principal characteristic of its development is that it allows persistent data structures to be updated by the BI technology in use and by the actions users undertake. An example is customer scoring. If the sales department builds a mining model of customer scores in order to use a new service, the sales department should not be the only group benefiting from that service. Instead, the mining model should become a natural part of the data flow within the company, and the customer scores should become an integrated part of the warehouse's information environment, visible to all users.
The IBM BI-centric suite, including DB2 UDB and DB2 OLAP Server, covers most of the important technology components defined in Figure 1.1. We use the architecture as it appears in that figure throughout this book, to give us a level of continuity and to show how each IBM product fits into the general BI scheme.
Providing the Information Content
Designing, developing and implementing your BI environment is an arduous task. The design must embrace both current and future business requirements. The architectural design must be complete enough to include all the conclusions found during the design phase. The execution must remain committed to a single purpose: developing the BI architecture as formally presented in the design and grounded in the business requirements.
It is particularly difficult to argue that discipline alone will ensure success, simply because you do not develop an entire BI environment all at once; it takes shape in small steps over time.
However, identifying the BI components of your architecture is important for two reasons: it will guide all subsequent architectural and technical decisions, and it lets you consciously design for a particular use of technology even though you may not encounter an iteration that needs that technology for several months.
Understanding your business requirements sufficiently will affect the type of products you acquire for your architecture. Designing and developing your architecture ensures that your warehouse is not a random event, but rather a well-thought-out, carefully constructed work of art: a mosaic of blended technology.
Design the information content
All initial planning must focus on identifying the major BI components that will be needed by the overall environment, now and in the future. Knowing the business requirements is important. Even before any conventional planning begins, the project planner can often identify one or two components right away. The balance of components that may be needed for your architecture, however, cannot be found so easily. During the design phase, the main part of the architectural effort ties the joint application development (JAD) sessions to the search for business requirements.
Sometimes these requirements can be entrusted to query and reporting tools. For example, users state that they want to automate a report they currently have to generate manually, by integrating two existing reports and adding the calculations derived from the combination of the data. While this requirement is simple, it defines a certain functionality that you must include when buying reporting tools for your organization.
The designer must also pursue additional requirements to get a complete picture. Do users want to subscribe to this report? Should subsets of the report be generated and emailed to various users? Do they want to see this report on the company portal? All of these requirements are part of the simple need to replace a manual report, as requested by the users. The benefit of these types of requirements is that everyone, users and designers alike, has an understanding of the concept of reports.
There are other types of business requirements, however, that we need to plan for. When business requirements are stated in the form of strategic business questions, it is easy for the seasoned designer to discern the measure/fact and dimensional requirements. Figure 1.2 illustrates the measure and dimension components of a business problem.
If JAD users do not know how to state their requirements in the form of a business problem, the designer will often provide examples to jump-start the requirements gathering session. The expert designer can help users understand not only the strategic business question, but also how to frame it. The requirements gathering approach is discussed in chapter 3; for now we only wish to point out the need to design for all types of BI requirements.
A strategic business problem is not only a business requirement, but also a design clue. If you have to answer a multidimensional question, then you have to store and present dimensional data; and if you need to store multidimensional data, you have to decide what kind of technology or technique you are going to employ. Do you implement a star schema, a proprietary cube, or both?
As you can see, even a simple business problem can have a considerable influence on the design. However, these types of business requirements are ordinary and familiar, at least to experienced designers and project planners.
There has been sufficient debate on OLAP technologies and support, and a wide range of solutions is available. So far we have mentioned the need to combine simple reporting with dimensional business requirements, and how those requirements influence technical architectural decisions.
But what about the requirements that are not readily understood by users or by the DW team? Will you never need spatial analysis? Will data mining models be a necessary part of your future? Who knows?
It is important to note that these types of technologies are not widely known by the general user communities or by DW team members; in part this may be because they are typically handled by internal or third-party technical experts. This is an extreme case of the problems such technologies generate: if users cannot describe business requirements or frame them so as to provide guidelines to designers, the requirements may go unnoticed or, worse, simply be ignored.
It becomes even more problematic when the designer and the developer cannot recognize the application of one of these advanced but critical technologies. As we have often heard designers say, "well, why don't we put it aside until we get this other thing done?" Are they really interested in priorities, or are they simply avoiding requirements they do not understand? The latter is the most likely hypothesis.
Let's say your sales team has communicated a business requirement, as stated in Figure 1.3; as you can see, the requirement is framed in the form of a business problem. The difference between this problem and the typical dimensional problem is distance. In this case, the sales group wants to know, on a monthly basis, total sales by product, by warehouse and by customers who live within 5 miles of the warehouse where they buy.
Sadly, designers or architects could simply ignore the spatial component by saying, "we have the customer, the product and the warehouse data. Let's keep distance out until another iteration."
Wrong answer. This type of business problem is entirely about BI. It represents a deeper understanding of our business and a robust analytical space for our analysts. BI goes beyond simple querying, standard reporting, or even OLAP. This is not to say that those technologies are not important to your BI, but by themselves they do not represent the BI environment.
Design for Information Content
Now that we have identified the business requirements, the various fundamental components they call for must be included in an overall architectural design. Some of the BI components are part of our initial efforts, while some will not be implemented for several months. However, all known requirements are reflected in the design, so that when we have to implement a particular technology we are prepared to do it. Part of the design will reflect traditional thinking. For example, Figure 1.1, at the beginning of the chapter, shows a data mart that holds dimensional data. This set of data is used to support later uses of the dimensional data, driven by the business problems we have identified. As additional documents are generated, such as the data design deliverables, we begin to formalize how the data propagates through the environment. We have established the need to represent data dimensionally, dividing it (according to specific, determined needs) across data marts.
The next question to answer is: how will these data marts be built? Do you build stars to support cubes, or just cubes, or just stars? Do you generate the architecture for dependent data marts that require an atomic layer for all acquired data? Do you allow independent data marts to acquire data directly from the operational systems?
What cube technology will you try to standardize on? Do you have massive amounts of data required for dimensional analysis, or do you need cubes for your national sales force on a weekly basis, or both? Do you build a powerful tool such as DB2 OLAP Server for finance, or Cognos PowerPlay cubes for your sales organization, or both?
These are the big architectural design decisions that will affect your BI environment from here on out. Yes, you have identified a need for OLAP. Now how will you carry out that type of technique and technology?
How do some of the more advanced technologies affect your design? Let's assume that you have identified a spatial need in your organization. You must now produce the architectural design revisions even though you do not plan to implement the spatial components for several months. The architect must design today based on what is needed: anticipate the need for spatial analysis that generates, stores, processes and provides access to spatial data. This in turn should serve as a constraint on the type of technology and platform software you can currently consider. For example, the relational database management system (RDBMS) that you use for your atomic layer must have a robust spatial extension available. This would ensure maximum performance when using geometry and spatial objects in your analytical applications. If your RDBMS cannot handle spatial data internally, you will have to establish an external, spatial-centric database. This complicates change management and compromises your overall performance, not to mention the additional problems created for your DBAs, since they probably have minimal understanding of spatial databases as well. On the other hand, if your RDBMS handles all the spatial components and its optimizer is aware of the special needs (for example, indexing) of spatial objects, then your DBAs can handle changes promptly and you can maximize performance.
Furthermore, you need to adjust the staging area and the atomic layer of your environment to include address cleansing (a key element of spatial analysis), as well as the subsequent storage of spatial objects. The succession of design revisions continues now that we have introduced the notion of address cleansing. For one thing, this application will dictate the type of software needed for your ETL effort. Do you need products like Trillium to provide a clean address, or will an ETL provider you have chosen provide that functionality?
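A minimal sketch of what an address-cleansing step might do before geocoding follows; the normalization rules below are invented and far from complete, and a real project would rely on a dedicated product or postal reference data for this.

```python
import re

# Minimal address-cleansing sketch for the staging area. The abbreviation
# rules below are invented and far from complete; real cleansing would rely
# on a dedicated product or postal reference data.
ABBREVIATIONS = {"STREET": "ST", "AVENUE": "AVE", "ROAD": "RD"}

def clean_address(raw: str) -> str:
    """Normalize case, whitespace, and a few common street-type spellings."""
    addr = re.sub(r"\s+", " ", raw.strip().upper())
    for long_form, short_form in ABBREVIATIONS.items():
        addr = re.sub(rf"\b{long_form}\b", short_form, addr)
    return addr

print(clean_address("  123  Main Street , Springfield "))  # -> "123 MAIN ST , SPRINGFIELD"
```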
For now, it is important that you appreciate the level of design that must be completed before you begin building your warehouse environment. The examples above should demonstrate the multitude of design decisions that must follow the identification of any particular business requirement. Made correctly, these design decisions promote the interdependence between the physical structures of your environment, the selection of the technology used and the flow of propagation of information content. Without this conventional BI architecture, your organization will be subject to a chaotic mixture of existing technologies, at best loosely stitched together to provide an appearance of stability.
Maintain the information content
Bringing the value of information to your organization is a very difficult task. Without sufficient understanding and experience, or without proper planning and design, even the best teams would fail. On the other hand, if you have great intuition and a detailed design but no discipline in execution, you have simply wasted your money and your time, because your effort is doomed to fail. The message should be clear: if you are missing any of these skills - understanding/experience, planning/design, or implementation discipline - the construction of the BI organization will be crippled or destroyed.
Is your team sufficiently prepared? Is there someone on your BI team who understands the vast analytical landscape available in BI environments and the techniques and technologies needed to realize that landscape? Is there someone on your team who can recognize the application difference between advanced static reporting and OLAP, or the differences between ROLAP and OLAP? Does one of your team members clearly recognize how mining works and how it might affect the warehouse, or how the warehouse can support mining performance? Does a team member understand the value of spatial data or of agent-based technology? Do you have someone who appreciates the unique application of ETL tools versus message broker technology? If you do not, get one. BI is much bigger than a normalized atomic layer, OLAP, star schemas and an ODS.
Having the understanding and experience to recognize BI requirements and their solutions is essential to your ability to properly formalize user needs and to design and implement their solutions. If your user community has difficulty describing the requirements, it is the warehouse team's job to provide that understanding. But if the warehouse team does not recognize the specific application of BI - for example, data mining - then it is no surprise that BI environments are often limited to being passive repositories. Ignoring these technologies, however, does not diminish their importance, nor the effect they have on the emergence of your organization's business intelligence possibilities and on the information structure you design to promote them.
Planning must include the notion of design, and both require a competent individual. In addition, design requires a warehouse-team philosophy and the observance of standards. For example, if your company has established a standard platform or has identified a particular RDBMS that it wants to standardize on across the platform, it is incumbent on everyone on the team to adhere to those standards. Generally a team preaches the need for standardization to the user communities, yet the team itself is unwilling to adhere to standards established in other areas of the company, or perhaps even in similar companies. Not only is this hypocritical, but it ensures that the firm is unable to exploit existing resources and investments. This does not mean there are no situations that warrant a non-standard platform or technology; however, warehouse efforts should jealously protect the enterprise standards until business requirements dictate otherwise.
The third key component needed to build a BI organization is discipline.
It depends equally on individuals and on the environment. Project planners, sponsors, architects and users must appreciate the discipline necessary to build the company's information structure. Designers must direct their design efforts so as to complement the other necessary efforts in the company.
For example, suppose your company builds an ERP application that has a warehouse component. It is then the responsibility of the ERP designers to collaborate with the warehouse environment team so as not to compete with, or duplicate, work already begun.
Discipline is also a subject that must be owned by the entire organization, and it is usually established and entrusted at an executive level. Are executives willing to adhere to a designed approach? An approach that promises to create information content that will in the end bring value to all areas of the enterprise, but that perhaps compromises individual or departmental agendas? Remember the saying "thinking about the whole is more important than thinking about one part". This saying holds true for BI organizations.
Unfortunately, many warehouses focus their efforts on addressing and bringing value to a particular department or to specific users, with little regard for the organization in general. Suppose an executive requests assistance from the warehouse team. The team responds with 90 days of work that not only delivers the notification requirements defined by the executive but also ensures that all the base data is blended into the atomic level before being introduced into the proposed cube technology. This engineering addition ensures that the firm's warehouse will benefit from the data the manager needs. However, the executive has spoken to outside consulting firms that have proposed a similar application with delivery in less than 4 weeks.
Assuming the in-house warehouse team is competent, the executive has a choice: support the extra engineering discipline needed to cultivate the enterprise information asset, or choose to carry out his own solution quickly. The latter seems to be chosen all too often, and it only serves to create information containers that benefit only a few, or a single individual.
Short- and long-term goals
Architects and project planners must formalize a long-term view of the overall architecture and plans for growing into a BI organization. This combination of short-term gain and long-term planning represents the two faces of BI efforts. Short-term gain is the facet of BI associated with the iterations of your warehouse. This is where planners, architects and sponsors focus on meeting specific commercial requirements: it is at this level that physical structures are built, technology is purchased and techniques are implemented. Everything is done in order to address specific requirements as defined by a particular user community.
Long-range planning, however, is the other facet of BI. This is where the plans and projects ensure that any physical structure is built, any technology selected and any technique implemented with an eye towards the enterprise. It is long-term planning that provides the cohesion necessary to ensure that business benefits accrue from all of the short-term gains.
Justify your BI effort
A data warehouse by itself has no inherent value. In other words, there is no inherent value in warehouse technologies and implementation techniques.
The value of any warehouse effort is found in the actions performed as a result of the warehouse environment and in the information content accumulated over time. This is a critical point to understand before you ever attempt to estimate the value of any warehouse initiative.
Too often, architects and planners try to attach value to the physical and technical components of the warehouse, when in fact the value lies in the business processes that are positively impacted by the warehouse and by the information thus acquired.
Here lies the challenge of founding BI: how do you justify the investment? If the warehouse itself has no intrinsic value, project designers must investigate, define and formalize the benefits achieved by those individuals who will use the warehouse to improve specific business processes, or the value of the resulting information, or both.
To complicate matters, any business process affected by warehouse efforts may provide benefits that are "considerable" or "slight". Considerable benefits provide a tangible metric for measuring the return on investment (ROI) - for example, turning the inventory over an additional time during a specific period, or a lower cost of transport per shipment. It is more difficult to define the slight benefits, such as improved access to information, in terms of tangible value.
Connect your project to known business requests
Too often, project designers try to link the value of the warehouse to amorphous corporate objectives. Stating that "the value of a warehouse is based on our ability to satisfy strategic requests" opens the discussion, but on its own it is not enough to determine whether investing in the warehouse makes sense. It is better to connect warehouse iterations to specific, known commercial requests.
Measuring the ROI
Calculating ROI in a warehouse setting can be particularly difficult. It is especially difficult if the principal benefit of a particular iteration is intangible or hard to measure. One study found that users perceive two main benefits from BI initiatives:
▪ creating the ability to make decisions
▪ creating access to information
These benefits are soft (or slight) benefits. It is easy to see how an ROI can be calculated on a hard (or considerable) benefit such as reduced transport costs, but how do we measure the ability to make better decisions?
This is definitely a challenge for project designers when they are trying to persuade the company to invest in a particular warehouse effort. Increasing sales or cutting costs are no longer the central themes driving the BI environment. Instead, business requests look for better access to information so that a particular department can make faster decisions. These are strategic drivers that happen to be equally important to the business, but they are more ambiguous and harder to characterize in a tangible metric. In this case, calculating ROI can be misleading, if not irrelevant.
Project designers must be able to demonstrate tangible value for executives to decide whether the investment in a particular iteration is worthwhile. However, we will not propose a new method for calculating ROI, nor will we make any argument for or against doing so.
There are many articles and books available that discuss the fundamentals of calculating ROI. There are also special value propositions such as value on investment (VOI), offered by groups like Gartner, that you can research. Instead, we will focus on the core aspects of any ROI or other value proposition that you need to consider.
Applying ROI
In addition to the argument about "hard" versus "soft" benefits associated with BI efforts, there are other issues to consider when applying ROI. For example:

Attributing too many savings to DW efforts that would have come anyway.
Let's say your company moved from a mainframe architecture to a distributed UNIX environment. Any savings that may (or may not) be realized by that effort should not be attributed exclusively, if at all, to the warehouse.

Not accounting for all the costs. And there are many things to take into account. Consider the following list:
▪ start-up costs, including feasibility
▪ cost of dedicated hardware with related storage and communication
▪ cost of software, including data management and client/server extensions, ETL software, DSS technologies, visualization tools, scheduling and workflow applications, and monitoring software
▪ cost of data structure design, implementation and optimization
▪ cost of software development directly associated with the BI effort
▪ cost of in-house support, including performance optimization, software version control and help-desk operations

Applying "Big-Bang" ROI.
Building the warehouse as a single, gigantic effort is doomed to fail, and so is calculating the ROI for one enormous enterprise-wide initiative. It is surprising that designers continue to make feeble attempts to estimate the value of the whole effort. Why do designers try to put a monetary value on the enterprise initiative when it is widely known and accepted that even specific iterations are difficult to estimate? How could it be possible? It is not possible, with few exceptions. Don't do it.
Now that we have established what not to do when calculating ROI, here are some points that will help you define a reliable process for estimating the value of your BI efforts.

Obtain consensus on the ROI approach. Whatever technique you choose to estimate the value of your BI efforts, it must be agreed upon by all parties, including project planners, sponsors and corporate executives.

Reduce the ROI to identifiable parts. A necessary step toward a reasonable ROI calculation is to focus that calculation on a specific project. This then allows you to estimate a value based on the specific business requirements being met.

Define the costs. As mentioned, numerous costs must be considered. Moreover, the costs must include not only those associated with the single iteration but also the costs of ensuring compliance with company standards.

Define the benefits. By clearly linking ROI to business requirements, we should be able to identify the benefits that will lead to meeting those requirements.

Reduce costs and benefits to present values. It is best to base your valuations on net present value (NPV) rather than trying to predict future value in future earnings.

Keep your ROI time frame to a minimum. Problems with long time horizons in ROI calculations are well documented.

Use more than one ROI formula. There are numerous methods for forecasting ROI, and you should plan to use one or more of them, including net present value (NPV), internal rate of return (IRR) and payback (a minimal sketch follows this list).

Define a repeatable process. This is crucial for calculating any long-term value. A single repeatable process should be documented for all subsequent project iterations to follow.
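To make those three valuation formulas concrete, the following Python sketch computes net present value, a simple internal rate of return by bisection, and the payback period for a hypothetical series of yearly cash flows. The figures and the discount rate are illustrative assumptions, not data from any real warehouse project.

    # Illustrative only: hypothetical cash flows for one BI iteration.
    # Year 0 is the up-front investment; later years are net benefits.
    cash_flows = [-500_000, 150_000, 200_000, 250_000, 250_000]

    def npv(rate, flows):
        """Net present value of a series of yearly cash flows."""
        return sum(cf / (1 + rate) ** year for year, cf in enumerate(flows))

    def irr(flows, low=0.0, high=1.0, tol=1e-6):
        """Internal rate of return found by bisection between low and high."""
        for _ in range(100):
            mid = (low + high) / 2
            if npv(mid, flows) > 0:
                low = mid
            else:
                high = mid
            if high - low < tol:
                break
        return (low + high) / 2

    def payback_years(flows):
        """Number of years until the cumulative cash flow turns positive."""
        total = 0
        for year, cf in enumerate(flows):
            total += cf
            if total >= 0:
                return year
        return None  # never recovered within the horizon

    print("NPV at 10%:", round(npv(0.10, cash_flows)))
    print("IRR:", round(irr(cash_flows), 4))
    print("Payback:", payback_years(cash_flows), "years")

Running all three formulas against the same iteration, as the list above recommends, keeps any single optimistic estimate from dominating the decision.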
The problems listed above are the ones most commonly reported by experts in the warehouse environment. The insistence of management on having a "Big-Bang" ROI delivered is very confusing. If you start all your ROI calculations by reducing them to identifiable, tangible parts, you have a good chance of producing an accurate ROI estimate.
Questions about the benefits of ROI
Whatever your benefits, soft or hard, you can use some fundamental questions to determine their value. For example, using a simple scale from 1 to 10, you can gauge the impact of any effort with the following questions (a small scoring sketch follows the list):
▪ How would you rate your company's understanding of its data as a result of this project?
▪ How would you rate the process improvements that resulted from this project?
▪ How would you measure the impact of the new insights and inferences made available by this iteration?
▪ What has been the impact of new, better-performing computing environments as a result of what was learned?
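As a purely illustrative sketch (the scores, weights and area names below are assumptions, not part of any standard method), the four questions above could be scored and combined like this:

    # Hypothetical 1-to-10 scores gathered from stakeholders for one iteration.
    scores = {
        "data understanding":      7,
        "process improvement":     9,
        "new insights/inferences": 6,
        "computing environment":   4,
    }

    # Optional weights if some benefit areas matter more to this business.
    weights = {area: 1.0 for area in scores}     # equal weighting by default
    weights["process improvement"] = 1.5         # assumed emphasis, for illustration

    weighted_total = sum(scores[a] * weights[a] for a in scores)
    max_total = sum(10 * weights[a] for a in weights)
    print(f"Benefit score: {weighted_total:.1f} / {max_total:.1f}")

    # High-scoring areas are the ones worth investigating for tangible,
    # monetary value, as discussed below.
    for area, value in scores.items():
        if value >= 8:
            print("Investigate further:", area)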
If the answers to these questions are low, it is possible that the investment is not worthwhile for the company. Questions with high scores point to significant value gains and should serve as guides for further investigation. For example, a high score for process improvements should lead the designers to examine how the processes have been improved. You may find that some or all of the gains obtained are tangible, and that a monetary value can therefore be readily applied.
Getting the most out of the first iteration of the warehouse
The greatest return on your warehouse effort often comes in the first few iterations. These early efforts traditionally establish the most useful information content for the audience and lay the technology foundation for subsequent BI applications.
Usually each subsequent iteration of the warehouse project brings less and less additional value to the enterprise as a whole. This is especially true if the iteration does not add new subject areas or does not meet the needs of a new user community.
This characteristic of warehousing also applies to the growing stacks of historical data. As subsequent efforts require more data, and as more data is poured into the warehouse over time, most of that data becomes less relevant to the analysis in use. This data is often called dormant data, and it is always expensive to keep because it is almost never used.
What does this mean for project sponsors? Essentially, the first sponsors bear more than their share of the investment costs. This is primarily because they provide the impetus to establish the broad technological layer and the warehouse environment and resources, including the organizational ones.
But these first steps also carry the highest value, and therefore designers often find it easier to justify the investment. Projects undertaken after your initial BI effort may have lower and more direct costs (compared with the first), but they bring less value to the enterprise.
Organization owners, in turn, need to start considering discarding accumulated data and technologies that are no longer relevant.
Data Mining
Numerous architectural components require variations of data mining technologies and techniques, for example the various "agents" that examine customer points of interest, the company's operational systems and the dw itself. These agents can be advanced neural networks trained on trends, such as future product demand based on sales promotions; rule-based engines that react to a given set of circumstances, for example a medical diagnosis and treatment recommendations; or even simple agents whose role is to report exceptions to senior managers (top executives). Generally these data extraction processes occur in real time; therefore they must be completely united with the movement of the data itself.
Online Analytical Processing (OLAP)
The ability to slice, dice, roll up, drill down and perform what-if analysis is within the scope of the IBM technology suite. For example, online analytical processing (OLAP) functions exist for DB2, bringing dimensional analysis into the database engine itself.
The functions add dimensional utility to SQL while taking advantage of all the benefits of being a natural part of DB2. Another example of OLAP integration is the extraction tool, DB2 OLAP Server Analyzer. This technology allows the cubes of the DB2 OLAP Server to be quickly and automatically analyzed, identifying and reporting unusual or unexpected data values throughout the cube to the business analyst. Finally, the DW Center functions provide a means for architects to control, among other things, the profile of a DB2 OLAP Server cube as a natural part of the ETL processes.
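To illustrate what a dimensional rollup means in practice, here is a small Python sketch with made-up sales figures; it only mimics the idea and is not DB2's actual OLAP SQL syntax.

    from itertools import groupby

    # Hypothetical fact rows: (region, product, sales).
    facts = [
        ("East", "Widget", 100), ("East", "Gadget", 150),
        ("West", "Widget", 120), ("West", "Gadget",  80),
    ]

    # Roll up along the product dimension: total sales per region.
    facts.sort(key=lambda row: row[0])
    for region, rows in groupby(facts, key=lambda row: row[0]):
        print(region, sum(sales for _, _, sales in rows))

    # Grand total, the top of the rollup hierarchy.
    print("ALL", sum(sales for _, _, sales in facts))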
Spatial Analysis
Space provides one half of the analytical anchors needed for a broad analytical panorama (time provides the other half). The atomic level of the warehouse, represented in Figure 1.1, includes the foundations of both time and space. Timestamps anchor analysis by time, and address information anchors analysis by space. The diagram shows geocoding, the process of converting addresses into points on a map or points in space so that concepts such as distance and inside/outside can be used in analysis, conducted at the atomic level, together with the spatial analysis that is made available to the analyst. IBM provides spatial extensions to DB2, developed with the Environmental Systems Research Institute (ESRI), so that spatial objects can be stored as a normal part of the relational database. DB2 Spatial Extenders also provide all the SQL extensions needed to take advantage of spatial analysis. For example, SQL extensions for querying the distance between addresses, or whether a point lies inside or outside a defined polygonal area, are an analytical standard with the Spatial Extender. Refer to Chapter 16 for more information.
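The two spatial questions mentioned above, distance between geocoded points and point-in-polygon containment, can be sketched in plain Python as follows. The coordinates and the polygon are invented; a production system would rely on the Spatial Extender's SQL functions rather than hand-written code.

    import math

    def haversine_km(lat1, lon1, lat2, lon2):
        """Great-circle distance in kilometres between two geocoded points."""
        r = 6371.0
        p1, p2 = math.radians(lat1), math.radians(lat2)
        dlat = math.radians(lat2 - lat1)
        dlon = math.radians(lon2 - lon1)
        a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
        return 2 * r * math.asin(math.sqrt(a))

    def point_in_polygon(x, y, polygon):
        """Ray-casting test: is the point (x, y) inside the polygon?"""
        inside = False
        n = len(polygon)
        for i in range(n):
            x1, y1 = polygon[i]
            x2, y2 = polygon[(i + 1) % n]
            if (y1 > y) != (y2 > y):
                x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
                if x < x_cross:
                    inside = not inside
        return inside

    # Invented coordinates for two customer addresses and a sales territory.
    print(round(haversine_km(45.46, 9.19, 41.90, 12.50)), "km apart")
    print(point_in_polygon(0.5, 0.5, [(0, 0), (1, 0), (1, 1), (0, 1)]))  # True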
Database-Resident Tools
DB2 has many BI-resident SQL features that assist in the analysis. These include:
▪ Recursion functions to perform analyses such as "find all possible flight paths from San Francisco to New York" (a sketch follows this list).
▪ Analytical functions for ranking, cumulative functions, cube and rollup, which facilitate tasks that formerly required OLAP technology and are now a natural part of the database engine.
▪ The ability to create tables that contain results.
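The recursion example in the first bullet can be sketched outside SQL as a simple path search. The flight legs below are invented; in DB2 this kind of query would typically be written as a recursive common table expression.

    # Hypothetical flight legs between cities.
    legs = {
        "San Francisco": ["Denver", "Chicago"],
        "Denver":        ["Chicago", "New York"],
        "Chicago":       ["New York"],
    }

    def all_paths(origin, destination, path=None):
        """Recursively enumerate every route from origin to destination."""
        path = (path or []) + [origin]
        if origin == destination:
            return [path]
        routes = []
        for next_city in legs.get(origin, []):
            if next_city not in path:          # avoid cycles
                routes.extend(all_paths(next_city, destination, path))
        return routes

    for route in all_paths("San Francisco", "New York"):
        print(" -> ".join(route))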
The leading database vendors are mixing more and more BI functionality into the database engine itself. This provides better performance and more execution options for BI solutions.
The DB2 V8 features and functions are discussed in detail in the following chapters:
▪ Technical Architecture and Data Management Foundations (Chapter 5)
▪ DB2 BI Fundamentals (Chapter 6)
▪ DB2 Materialized Query Tables (Chapter 7)
▪ DB2 OLAP Functions (Chapter 13)
▪ DB2 Enhanced BI Features and Functions (Chapter 15)
Simplified Data Delivery System
The architecture represented in Figure 1.1 includes several physical data structures. One is the operational data store. Generally, an ODS is subject oriented, integrated and current. You would build an ODS to support, for example, the sales office. The sales ODS would integrate data coming from numerous different systems but would hold only, for example, today's transactions. The ODS may be updated many times a day. At the same time, processes push the integrated data out to other applications. This structure is specifically designed to integrate current, dynamic data and is a likely candidate for real-time analytics, such as providing customer service agents with a customer's current sales information by extracting sales trend information from the warehouse itself.
Another structure shown in Figure 1.1 is a formal staging area for the dw. Not only is this the place where the necessary integration, data quality work and transformation of incoming data are performed, but it is also a reliable, temporary storage area for replicated data that can be used in real-time analytics. Whether you decide to use an ODS or a staging area, one of the best tools for populating these data structures from different operational sources is DB2's heterogeneous distributed query capability. This capability is delivered by the optional DB2 feature called DB2 Relational Connect (query only) and through DB2 DataJoiner (a separate product that delivers query, insert, update and delete capability to heterogeneous distributed RDBMSs).
This technology allows data architects to tie production data to analytical processes. Not only can the technology adapt to virtually any replication requirement that might arise with real-time analytics, it can also connect to a wide variety of the most popular databases, including DB2, Oracle, Sybase, SQL Server, Informix and others. DB2 DataJoiner can be used to populate a formal data structure such as an ODS, or even a permanent table in the warehouse designed for quick restoration of instant updates or for sales. Of course, these same data structures can also be populated using another major technology designed for data replication, IBM DataPropagator Relational. (DataPropagator is a separate product for mainframe systems. DB2 for UNIX, Linux, Windows and OS/2 includes data replication services as a standard feature.)
Another method for moving operational data around the enterprise is an enterprise application integrator, otherwise known as a message broker. This unique technology allows unmatched control for targeting and moving data around the company. IBM has the most widely used message broker, MQSeries, as well as a variation of the product that includes e-commerce requirements, IBM WebSphere MQ.
For more discussion of how to leverage MQ to support a warehouse and a BI environment, visit the book's website. For now, suffice it to say that this technology is an excellent medium for capturing and transforming (using MQSeries Integrator) targeted operational data recruited for BI solutions. The MQ technology has been integrated and packaged into UDB V8, which means that message queues can now be managed as if they were DB2 tables. The concept of welding queued messages and the relational universe together leads toward a powerful data delivery environment.
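As a purely hypothetical sketch of the capture-and-transform idea, the snippet below pretends to consume operational messages and reshape them into warehouse rows. The queue interface and message format are invented placeholders, not the MQSeries or DB2 MQ APIs.

    import json

    def read_queue(messages):
        """Stand-in for a message-queue consumer; yields one message at a time."""
        for raw in messages:
            yield json.loads(raw)

    def transform(message):
        """The kind of light transformation a broker might apply before loading."""
        return {
            "customer_id": int(message["cust"]),
            "amount_usd":  round(float(message["amt"]), 2),
            "source":      "order-entry",
        }

    # Invented sample messages that an operational system might publish.
    incoming = ['{"cust": "101", "amt": "19.990"}', '{"cust": "102", "amt": "5.5"}']

    warehouse_rows = [transform(m) for m in read_queue(incoming)]
    for row in warehouse_rows:
        print(row)   # in practice these rows would be inserted into a DW table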
Zero Latency
IBM's ultimate strategic goal is zero-latency analysis. As defined by Gartner, a BI system must be able to deduce, assimilate and provide information to analysts on demand. The challenge, of course, lies in how to mix current, real-time data with the necessary historical information, such as pattern or trend data, or extracted understanding such as customer profiling.
Such information includes, for example, the identification of high- or low-risk customers, or which products customers will most probably buy if they already have cheese in their shopping carts.
Achieving zero latency actually depends on two fundamental mechanisms:
▪ Complete union of the data being analyzed with the established techniques and tools developed by BI
▪ An efficient data delivery system to ensure that real-time analysis is really available
These zero-latency prerequisites are no different from the goals set by IBM and described above. The close coupling of data is part of the seamless integration arranged by IBM, and creating an efficient data delivery system is completely dependent on available technology that simplifies the delivery process. Consequently, two of IBM's three goals are critical to realizing the third. IBM is consciously developing its technology to ensure that zero latency is a reality for warehouse efforts.
Summary
The BI organization provides a road map for creating your environment iteratively. It must be adjusted to reflect the needs of your business, both current and future. Without a broad architectural vision, warehouse iterations are little more than haphazard central warehouse implementations that do little to create a large, informative enterprise.
The first hurdle for project managers is how to justify the investments necessary to develop the BI organization. Although ROI calculation has remained a major prop for warehouse achievements, it is becoming more difficult to predict exactly. This has led to other methods for determining whether you are getting your money's worth. Value on investment (VOI), for example, is promoted as one solution.
It falls to data architects and project planners to deliberately generate and provide information to user communities, and not simply to provide a service on data. There is a huge difference between the two. Information is something that makes a difference in decision making and effectiveness; data, by contrast, are the building blocks from which that information is derived.
Although critical as a source of data for addressing business requests, the BI environment should serve a larger role in creating information content. We have to take additional measures to clean, integrate and transform data, or otherwise create information content on which users can act, and we must then make sure that those actions and decisions, where reasonable, are reflected back into the BI environment. If we relegate the warehouse to serving only data, it is certain that user communities will create the information content needed to take action themselves. This ensures that their community will be able to make better decisions, but the enterprise suffers from the lack of the knowledge they have used.
Given that architects and project planners initiate projects specific to the BI environment, they remain accountable to the enterprise at large. A simple example of this two-faced characteristic of BI iterations is found in the source data. All data received for specific business requests must be populated in the first atomic layer. This guarantees the development of the corporate information asset, as well as serving and addressing the specific user requests defined in the iteration.

What is a Data Warehouse?
The data warehouse has been the heart of information systems architecture since 1990 and supports information processes by offering a solid, integrated platform of historical data taken as the basis for subsequent analysis. Data warehouses offer ease of integration in a world of incompatible application systems. Data warehousing has evolved to become a trend. The data warehouse organizes and stores the data needed for informational and analytical processes on the basis of a long historical time perspective. All of this involves a considerable and constant commitment to building and maintaining the data warehouse.
So what is a data warehouse? A data warehouse is:
▪ subject oriented
▪ integrated
▪ time variant
▪ non-volatile (data is not deleted)
a collection of data used to support managerial decision-making processes.
The data inserted into a data warehouse arises in most cases from operational environments. The data warehouse is made up of a storage unit, physically separate from the rest of the system, which contains data previously transformed by applications that operate on information deriving from the operational environment.
This literal definition of a data warehouse deserves an in-depth explanation, as there are important underlying motivations and meanings that describe the characteristics of a warehouse.
SUBJECT ORIENTATION
The first feature of a data warehouse is that it is oriented to the major subjects of a company. Driving the processes through the data contrasts with the more classic method, which orients applications toward processes and functions, the approach shared by most older operational systems.
The operational world is designed around applications and functions such as loans, savings, bankcards and trusts for a financial institution. The world of the dw is organized around subjects such as the customer, the vendor, the product and the activity. Alignment around subjects affects the design and the realization of the data found in the dw. Most notably, the main subject affects the most important part of the key structure.
The application world is influenced by both database design and process design. The dw world focuses solely on data modeling and database design. Process design (in its classic form) is not part of the dw environment.
The differences between a process/function application orientation and a subject orientation also reveal themselves as differences in the content of the data at the detail level. Dw data does not include data that will not be used by the DSS process, while operational, data-oriented applications contain the data that immediately satisfies the functional/processing requirements, data which may or may not be of any use to the DSS analyst.
Another important way in which operational, data-oriented applications differ from dw data is in data relationships. Operational data maintains a continuous relationship between two or more tables based on a business rule that is active. Dw data spans a spectrum of time, and the relationships found in the dw are many. Many business rules (and, correspondingly, many data relationships) are represented in the data warehouse between two or more tables.
(For a detailed explanation of how relationships between data are managed in the DW, we refer to the Tech Topic on that question.)
In no respect do operational systems and data differ more from the DW than in this fundamental difference between a functional/process orientation and a subject orientation.
INTEGRATION
The most important aspect of the dw environment is that the data found within the dw is integrated. ALWAYS. WITHOUT EXCEPTION. The very essence of the dw environment is that the data contained within the boundaries of the warehouse is integrated.
Integration reveals itself in many different ways: in consistent naming conventions, in consistent units of measure for variables, in consistent encoded structures, in consistent physical attributes of the data, and so on.
Over the years the designers of different applications have made many decisions about how an application should be developed. The individualized style and design decisions of application designers reveal themselves in a hundred ways: in differences in encoding, key structure, physical characteristics, naming conventions, and so on. The collective capacity of many application designers to create inconsistent applications is legendary. Figure 3 exposes some of the most important differences in the ways applications are designed.
Encoding:
Application designers have chosen to encode a field such as sex in several ways. One designer represents sex as "m" and "f". Another represents it as "1" and "0". Another represents it as "x" and "y". Yet another represents it as "male" and "female". It does not really matter how sex arrives in the DW; "M" and "F" are probably as good as any other representation.
What matters is that, from whatever origin the sex field comes, the field arrives in the DW in a consistent, integrated state. Consequently, when the field is loaded into the DW from an application where it has been represented in another format, the data must be converted to the DW format.
Measurement of Attributes:
Application designers have chosen to measure a pipeline in a variety of ways over the years. One designer stores the pipeline data in centimeters. Another application designer stores the pipeline data in inches. Another stores it in millions of cubic feet per second. And another designer stores the pipeline information in yards. Whatever the source, when the pipeline information arrives in the DW it must be measured in the same way.
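A small illustrative sketch of the same idea for the length-based pipeline measurements, converting everything to a single warehouse unit, here centimeters. The conversion factors are standard; the source unit names are assumptions.

    # Conversion factors from each assumed source unit to the warehouse unit (cm).
    TO_CENTIMETERS = {
        "centimeters": 1.0,
        "inches":      2.54,
        "yards":       91.44,
    }

    def integrate_length(value, source_unit):
        """Convert a pipeline measurement from its source unit to centimeters."""
        return value * TO_CENTIMETERS[source_unit]

    print(integrate_length(10, "inches"))   # 25.4 cm
    print(integrate_length(2, "yards"))     # 182.88 cm
    # Rate-based measures such as million cubic feet per second are a different
    # quantity altogether and would need their own target unit in the warehouse.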
As indicated in Figure 3, the integration issues affect almost every aspect of the project: the physical characteristics of the data, the dilemma of having more than one data source, the problem of inconsistent naming conventions, inconsistent data formats, and so on.
Whatever the design topic, the result is the same: the data must be stored in the DW in a single, globally acceptable manner even when the underlying operational systems store it differently.
When the DSS analyst looks at the DW, the analyst's goal should be the exploitation of the data that is in the warehouse, rather than wondering about the credibility or consistency of that data.
TIME VARIANCY
The data in the DW is accurate as of some moment in time. This basic characteristic of DW data is very different from data found in the operational environment. Operational data is accurate as of the moment of access. In other words, when a unit of data is accessed in the operational environment, it is expected to reflect values that are accurate as of the moment of access. Because data in the DW is accurate as of some moment in time (that is, not "right now"), the data found in the DW is said to be "time variant".
The time variancy of DW data shows up in numerous ways. The simplest is that the data of a DW represents data over a long time horizon, five to ten years. The time horizon represented in the operational environment is much shorter, from today's current values up to sixty or ninety days. Applications that must perform well and must be available for transaction processing must carry the minimum amount of data if they are to retain any degree of flexibility. So operational applications have a short time horizon, as a matter of sound application design.
The second way time variancy appears in the DW is in the key structure. Every key structure in the DW contains, implicitly or explicitly, a time element, such as day, week or month. The time element is almost always at the bottom of the concatenated key found in the DW. On some occasions the time element exists only implicitly, as in the case where an entire file is duplicated at the end of the month or quarter.
The third way time variancy shows up is that DW data, once correctly recorded, cannot be updated. DW data is, for all practical purposes, a long series of snapshots. Of course, if a snapshot has been taken incorrectly, it can be corrected. But assuming the snapshots were taken correctly, they are not changed once they are made. In some cases it can be unethical, or even invalid, for the snapshots in the DW to be modified. Operational data, being accurate as of the moment of access, can be updated as the need arises.
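As an illustrative sketch of a time element in the key structure and of snapshot-style records, consider the following Python fragment; the table layout and values are invented.

    from datetime import date

    # Each warehouse record carries a snapshot date as the trailing key element.
    # The same customer appears many times, once per snapshot, and old rows are
    # never updated; new ones are simply appended.
    snapshots = [
        {"customer_id": 101, "snapshot_date": date(2003, 1, 31), "balance": 1200},
        {"customer_id": 101, "snapshot_date": date(2003, 2, 28), "balance": 1350},
        {"customer_id": 101, "snapshot_date": date(2003, 3, 31), "balance":  900},
    ]

    def balance_as_of(customer_id, as_of):
        """Balance recorded in the most recent snapshot on or before as_of."""
        rows = [r for r in snapshots
                if r["customer_id"] == customer_id and r["snapshot_date"] <= as_of]
        return max(rows, key=lambda r: r["snapshot_date"])["balance"] if rows else None

    print(balance_as_of(101, date(2003, 3, 1)))   # 1350, the February snapshot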
NON-VOLATILE
The fourth important characteristic of the DW is that it is non-volatile. Updates, insertions, deletions and changes are made regularly, record by record, in operational environments. But the basic manipulation of data needed in the DW is much simpler. There are only two kinds of operations that occur in the DW: the initial loading of the data and access to the data. There is no updating of the data (in the general sense of updating) in the DW as a normal processing operation.
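A minimal sketch of this load-and-access discipline, with invented records, is an append-only store that deliberately offers no update operation:

    class WarehouseTable:
        """Append-only store: the only operations are load and access."""

        def __init__(self):
            self._rows = []

        def load(self, rows):
            """Initial or incremental load; existing rows are never modified."""
            self._rows.extend(rows)

        def access(self, predicate):
            """Read-only access to the rows that satisfy a predicate."""
            return [row for row in self._rows if predicate(row)]

    sales = WarehouseTable()
    sales.load([{"day": "2003-03-01", "amount": 250},
                {"day": "2003-03-02", "amount": 410}])

    print(sales.access(lambda row: row["amount"] > 300))
    # There is intentionally no update() or delete(): record-by-record
    # maintenance belongs to the operational environment, not the DW.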
There are some very powerful consequences of this basic difference between operational processing and DW processing. At the design level, the need to be cautious about update anomalies is not a factor in the DW, since updating of the data is not carried out. This means that, at the physical level of design, liberties can be taken to optimize data access, particularly in dealing with the topics of normalization and physical denormalization. Another consequence of the simplicity of DW operations lies in the underlying technology used to run the DW environment. Having to support record-by-record updates online (as is often the case with operational processing) requires the technology to have very complex foundations beneath an apparent simplicity.
The technology that supports backup and recovery, transactions and data integrity, and the detection and remedy of deadlock conditions is quite complex and not necessary for DW processing.
The characteristics of a DW (subject orientation of the design, integration of the data within the DW, time variancy, and simplicity of data management) all lead to an environment that is very, very different from the classic operational environment. The source of almost all DW data is the operational environment. It is tempting to think that there is massive data redundancy between the two environments.
Indeed, the first impression many people get is one of great data redundancy between the operational environment and the DW. Such an interpretation is superficial and shows a lack of understanding of what happens in the DW.
In fact there is a minimum of data redundancy between the operational environment and the DW. Consider the following:
▪ The data is filtered as it passes from the operational environment to the DW environment. Much data never passes out of the operational environment at all; only the data needed for DSS processing finds its way into the DW environment.
▪ The time horizon of the data is very different from one environment to the other. The data in the operational environment is very fresh; the data in the DW is much older. From the perspective of the time horizon alone, there is very little overlap between the operational environment and the DW.
▪ The DW contains summary data that is never found in the operational environment.
▪ The data undergoes a fundamental transformation as it passes into the DW. Figure 3 illustrates that most of the data is significantly modified once it is selected and moved into the DW. Put another way, most data is physically and radically modified as it is moved into the DW. From the point of view of integration, it is not the same data that resides in the operational environment.
In light of these factors, data redundancy between the two environments is a rare event, leading to less than 1% redundancy between the two environments.
THE STRUCTURE OF THE WAREHOUSE
DWs have a distinct structure. There are various levels of summarization and detail that demarcate the DW.
The components of a DW are:
▪ Metadata
▪ Current detail data
▪ Older detail data
▪ Lightly summarized data
▪ Highly summarized data
By far the main concern is the current detail data. It is the main concern because:
▪ current detail data reflects the most recent events, which are always of great interest, and
▪ current detail data is voluminous because it is stored at the lowest level of granularity, and
▪ current detail data is almost always stored on disk, which is fast to access but expensive and complex to manage.
Older detail data is data stored on some form of mass storage. It is accessed sporadically and is stored at a level of detail compatible with the current detail data. While it is not mandatory to store it on an alternative medium, because of the large volume of data combined with its sporadic access, the storage medium for older detail data is usually not disk.
Lightly summarized data is data distilled from the low level of detail found at the current detail level. This level of the DW is almost always stored on disk. The design issues facing the data architect in building this level of the DW are:
▪ the unit of time over which the summarization is done, and
▪ which contents and attributes will lightly summarize the data.
The next level of data found in the DW is highly summarized data. Highly summarized data is compact and easily accessible. Highly summarized data is sometimes found in the DW environment; in other cases it is found outside the immediate walls of the technology hosting the DW (in any case, highly summarized data is part of the DW regardless of where it is physically housed).
The final component of the DW is metadata. In many respects metadata sits in a different dimension from the other DW data, because metadata does not contain any data taken directly from the operational environment. Metadata plays a special and very important role in the DW. Metadata is used as:
▪ a directory to help the DSS analyst locate the contents of the DW,
▪ a guide to the mapping of the data, showing how the data was transformed from the operational environment to the DW environment,
▪ a guide to the algorithms used for summarization between the current detail data, the lightly summarized data and the highly summarized data.
Metadata plays a much more important role in the DW environment than it ever had in the operational environment.
OLD DETAIL STORAGE MEDIUM
Magnetic tape can be used to store that type of data. In fact there is a wide variety of storage media that should be considered for preserving old detail data.
Depending on the volume of the data, the frequency of access, the cost of the media and the type of access required, it is entirely likely that media other than disk will be needed for the old level of detail in the DW.
FLOW OF DATA
There is a normal and predictable flow of data within the DW. Data enters the DW from the operational environment. (NOTE: there are some very interesting exceptions to this rule; however, almost all data enters the DW from the operational environment.) As the data enters the DW from the operational environment, it is transformed as described above. On entering the DW, the data goes into the current level of detail, as shown. It resides there and is used until one of three events occurs:
▪ it is purged,
▪ it is summarized, and/or
▪ it is moved to old detail.
The aging process inside a DW moves current detail data to old detail data, according to the age of the data. The summarization process uses the detail data to calculate the lightly summarized data and the highly summarized levels of data. There are some exceptions to the flow shown (they will be discussed later). Usually, however, for the vast majority of the data found within a DW, the flow of data is as pictured.
USING THE DATA WAREHOUSE
Not surprisingly, the various levels of data within the DW receive different levels of use. As a rule, the higher the level of summarization, the more the data is used.
Most use occurs on the highly summarized data, while the old detail data is almost never used. There is a good reason for moving the organization toward this resource-usage paradigm: the more summarized the data, the faster and more efficient it is to get to it. If a shop finds that it does much of its processing at the detail level of the DW, then a correspondingly large amount of machine resources is consumed. It is in everyone's best interest to process at as high a level of summarization as possible, as soon as possible.
For many shops, the DSS analyst in the pre-DW environment used data at the detail level. In many respects, getting at the detail data feels like a security blanket, even when other levels of summarization are available. One of the tasks of the data architect is to wean the DSS user off the constant use of data at the lowest level of detail. There are two motivations available to the data architect:
▪ installing a chargeback system, in which the end user pays for the resources consumed, and
▪ pointing out that very good response times can be obtained when working with data at a high level of summarization, while poor response times result from working with data at a low level of detail.
OTHER CONSIDERATIONS
There are some other considerations in the construction and management of the DW.
The first consideration is that of indexes. Data at the higher levels of summarization can be indexed freely, while data at the lower levels of detail is so bulky that it can only be indexed frugally. By the same token, data at the high levels of summarization can be restructured relatively easily, while the volume of data at the lower levels is so large that the data cannot easily be restructured. Consequently, the data model and the formal design work lay the foundation for the DW almost exclusively at the current detail level. In other words, data modeling activities do not apply to the summarization levels in almost any case.
Another structural consideration is that of the partitioning of DW data.
Partitioning can be done at two levels: at the dbms level and at the application level. With partitioning at the dbms level, the dbms is aware of the partitions and controls them accordingly. With partitioning at the application level, only the programmer is aware of the partitions, and responsibility for their administration is left to him.
With dbms-level partitioning, much of the work is done automatically, but there is considerable inflexibility connected with the automatic administration of partitions. In the case of application-level partitioning of the data warehouse data, much of the work falls on the programmer, but the end result is flexibility in the administration of the data in the data warehouse.
OTHER ANOMALIES
While the components of the data warehouse work as described for almost all data, there are some useful exceptions that must be discussed. One exception is that of public summary data. This is summary data that has been calculated outside the data warehouse but is used throughout the company. Public summary data is stored and managed in the data warehouse, even though, as mentioned, it is calculated elsewhere. Accountants, for example, work to produce such quarterly figures as quarterly income, quarterly expenses and quarterly profit. The work done by the accountants is external to the data warehouse; however, the data is used "internally" within the company, by marketing, sales and so on. Another anomaly, which will not be discussed here, is that of external data.
Another important kind of data that can be found in a data warehouse is permanent detail data. This arises from the need to store data permanently at a detailed level for ethical or legal reasons. If a company exposes its workers to hazardous substances, there is a need for detailed, permanent data. If a company produces a product that involves public safety, such as parts of an airplane, there is a need for permanent detail data, as there is if a company enters into dangerous contracts.
The company cannot afford to overlook the details, because during the next few years, in the case of a lawsuit, a recall, a disputed construction defect and so on, the company's exposure could be great. Consequently there is a unique type of data known as permanent detail data.
SUMMARY
A data warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data in support of management's decision needs. Each of the salient features of a data warehouse has its implications. In addition, there are four levels of data in the data warehouse:
▪ Old detail
▪ Current detail
▪ Lightly summarized data
▪ Highly summarized data
Metadata is also an important part of the data warehouse.
ABSTRACT
The concept of data warehousing has recently received a great deal of attention and has become a trend of the 90s. This is due to the ability of a data warehouse to overcome the limitations of management support systems such as decision support systems (DSS) and executive information systems (EIS).
Although the concept of the data warehouse looks promising, implementing a data warehouse can be problematic because of the large-scale nature of warehousing projects. Despite the complexity of data warehousing projects, many vendors and consultants in the field claim that warehousing current data presents no problems.
However, at the beginning of this research project, almost no independent, rigorous and systematic research had been carried out. Consequently it is difficult to say what really happens in industry when data warehouses are built.
This study explored contemporary data warehousing practice with the aim of developing a richer understanding of Australian practice. The literature analysis provided the context and foundation for the empirical study.
There are a number of results from this research. First, this study revealed the activities that occur during data warehouse development. In many areas, the data gathered confirmed the practice reported in the literature. Second, issues and problems that may impact data warehouse development were identified by this study. Finally, benefits derived by Australian organizations from the use of data warehouses were revealed.
Chapter 1
Research context
The concept of data warehousing received widespread exposure and grew into an emerging trend in the 90s (McFadden 1996, TDWI 1996, Shah and Milstein 1997, Shanks et al. 1997, Eckerson 1998, Adelman and Oates 2000). This can be seen from the growing number of articles on data warehousing in commercial publications (Little and Gibson 1999). Many articles (see, for example, Fisher 1995, Hackathorn 1995, Morris 1995a, Bramblett and King 1996, Graham et al. 1996, Sakaguchi and Frolick 1996, Alvarez 1997, Brousell 1997, Clarke 1997, McCarthy 1997, O'Donnell 1997, Edwards 1998, TDWI 1999) have reported significant benefits for organizations that implement data warehouses. They supported their theory with anecdotal evidence of successful implementations, high return on investment (ROI) figures and, in addition, by providing reference guidelines or methodologies for data warehouse development (Shanks et al. 1997, Seddon and Benjamin 1998, Little and Gibson 1999). In an extreme case, Graham et al. (1996) reported an average three-year return on investment of 401%.
Much of the current literature, however, has neglected the complexities involved in undertaking such projects. Data warehouse projects are normally complex and large-scale and therefore carry a high probability of failure if they are not carefully controlled (Shah and Milstein 1997, Eckerson 1997, Foley 1997b, Zimmer 1997, Bort 1998, Gibbs and Clymer 1998, Rao 1998). They require vast amounts of both human and financial resources, and of time and effort, to build (Hill 1998, Crofts 1998). The typical time and financial means required are approximately two years and two to three million dollars, respectively (Braly 1995, Foley 1997b, Bort 1998, Humphries et al. 1999). This time and money is needed to control and consolidate many different aspects of data warehousing (Cafasso 1995, Hill 1998). Alongside hardware and software considerations, other functions, ranging from data extraction and data loading processes, to the storage capacity needed to manage updates and metadata, to user training, must be considered.
At the time this research project began, very little academic research had been conducted in the field of data warehousing, especially in Australia. This was evident from the scarcity of articles on data warehousing published in journals or other academic writings of the time. Many of the academic writings available described the US experience. The lack of academic research in the data warehousing area has prompted calls for rigorous research and empirical studies (McFadden 1996, Shanks et al. 1997, Little and Gibson 1999). In particular, research studies on the implementation process of data warehouses need to be carried out to extend general knowledge about data warehouse implementation and to serve as the basis for future research (Shanks et al. 1997, Little and Gibson 1999).
The purpose of this study, therefore, is to investigate what actually happens when organizations implement, maintain and use data warehouses in Australia. Specifically, this study will involve an analysis of an entire data warehouse development process, from initiation and planning through design and implementation and subsequent use within Australian organizations. In addition, the study will contribute to existing practice by identifying areas where practice can be further improved and where inefficiencies and risks can be minimized or avoided. Furthermore, it will serve as the basis for other studies on data warehouses in Australia and will fill the gap that currently exists in the literature.
Research questions
The goal of this research is to study the activities involved in the implementation of data warehouses and their use by Australian organizations. In particular, the elements studied concern project planning, development, operation, use and the risks involved. Hence the question of this research is:
"What is the current practice of data warehousing in Australia?"
To answer this question effectively, a certain number of subsidiary research questions are required. In particular, three sub-questions were identified from the literature, presented in Chapter 2, to guide this research project:
How are data warehouses implemented by Australian organizations? What problems are encountered? What benefits are experienced?
In answering these questions, an exploratory research design employing a survey was used. As an exploratory study, the answers to the above questions are not complete (Shanks et al. 1993, Denscombe 1998). In this case, triangulation is required to improve the answers to these questions. However, the survey will provide a solid foundation for future work examining these questions. A detailed discussion of the justification of the research method and design is presented in Chapter 3.
Structure of the research project
This research project is divided into two parts: the contextual study of the data warehousing concept and the empirical research (see Figure 1.1), each of which is discussed below.
Part I: Contextual study
The first part of the research consisted of reviewing the current literature on various aspects of data warehousing, including decision support systems (DSS), executive information systems (EIS), data warehouse case studies and data warehouse concepts. In addition, the results of forums on data warehouses and of meeting groups for experts and practitioners led by the Monash DSS research group contributed to this phase of the study, which was intended to obtain information on data warehousing practice and to identify the risks involved in its adoption. During this contextual study period, an understanding of the problem area was established to provide the background knowledge for the subsequent empirical investigations. This was, however, an ongoing process while the research study was being conducted.
Part II: Empirical research
The relatively new concept of data warehousing, especially in Australia, created the need to conduct a survey to obtain a broad picture of the usage experience. This part was carried out once the problem domain had been established through the extensive literature review. The data warehousing concept formed during the contextual study phase was used as input for the initial questionnaire of this study. After this, the questionnaire was reviewed. Six data warehouse experts participated in the test. The purpose of testing the initial questionnaire was to check the completeness and accuracy of the questions. Based on the test results, the questionnaire was modified and the modified version was sent to the survey participants. The returned questionnaires were then analyzed and the data was presented in tables, diagrams and other formats. The results of the data analysis form a snapshot of data warehousing practice in Australia.
DATA WAREHOUSING OVERVIEW
The concept of data warehousing has evolved with improvements in computer technology.
It is aimed at overcoming the problems encountered by application support groups such as the Decision Support System (DSS) and the Executive Information System (EIS).
In the past, the biggest obstacle for these applications has been their inability to provide the database necessary for analysis.
This is mainly caused by the nature of management's work. The interests of a company's management vary constantly depending on the area being dealt with. Therefore the data underpinning these applications must be able to change quickly depending on the part to be treated. This means that the data must be available in a form adequate for the required analyses. In fact, application support groups found many difficulties in the past in collecting and integrating data from complex and diverse sources.
The rest of this section presents an overview of the data warehousing concept and discusses how the data warehouse can overcome the problems of application support groups.
The term "data warehouse" was popularized by William Inmon in 1990. His often-cited definition sees the data warehouse as a subject-oriented, integrated, non-volatile and time-variant collection of data in support of management decisions.
Using this definition, Inmon points out that the data residing in a data warehouse must possess the following four characteristics:
▪ Subject-oriented
▪ Integrated
▪ Non-volatile
▪ Time-variant
By subject-oriented Inmon means that the data in the data warehouse is organized around the major organizational areas defined in the data model. For example, all the data concerning customers is contained in the subject area CUSTOMERS. Likewise all the data related to products is contained in the subject area PRODUCTS.
By integrated Inmon means that the data coming from different platforms, systems and locations is combined and stored in one place. Consequently, similar data must be transformed into consistent formats so that it can be added and compared easily.
For example, the male and female gender may be represented by the letters M and F in one system and by 1 and 0 in another. To integrate them properly, one or both formats must be transformed so that the two formats are the same. In this case we could change M to 1 and F to 0, or vice versa. Subject-oriented and integrated indicate that the data warehouse is designed to provide a functional, cross-departmental view of the company's data.
Non-volatile means that the data in the data warehouse remains consistent and that updating the data is not necessary. Instead, each change in the original data is added to the database of the data warehouse. This means that the history of the data is contained in the data warehouse.
By time-variant Inmon indicates that the data in the data warehouse always contains time markers and that the data normally spans a certain time horizon. For example, a data warehouse may contain five years of historical customer values, from 1993 to 1997. The availability of the history and of a time series of the data allows trends to be analyzed.
A data warehouse can collect its data from OLTP systems, from data sources external to the organization, and/or from other specialized data capture projects.
The extracted data may go through a cleaning process, in which case the data is transformed and integrated before being stored in the database of the data warehouse. Then the data residing inside the database of the data warehouse is made available to end-user access and retrieval tools. Using these tools the end user can access the integrated view of the organization's data.
The data residing inside the database of the data warehouse is stored both in detailed and in summarized formats.
The level of summarization may depend on the nature of the data. The detailed data may consist of current data and historical data. Live operational data is not included in the data warehouse until the data in the data warehouse is refreshed.
In addition to storing the data itself, a data warehouse can also store a different type of data, called METADATA, which describes the data residing in its database.
There are two types of metadata: development metadata and analysis metadata.
Development metadata is used to manage and automate the processes of extracting, cleaning, mapping and loading the data into the data warehouse.
The information contained in the development metadata may include details of operational systems, details of the elements to be extracted, the data model of the data warehouse and the corporate rules for data conversion.
The second type of metadata, known as analysis metadata, enables the end user to explore the content of the data warehouse to find the available data and its meaning in clear, non-technical terms.
Analysis metadata therefore functions as a bridge between the data warehouse and the end-user applications. It can contain the business model, descriptions of the data corresponding to the business model, pre-defined queries and reports, information for user logins and the index.
The analysis and development metadata must be combined into a single integrated metadata repository to work properly.
Unfortunately, many of the existing tools have their own metadata, and currently there are no standards that allow data warehousing tools to integrate this metadata. To remedy this situation, many vendors of the main data warehousing tools formed the Meta Data Council, which later became the Meta Data Coalition.
The purpose of this coalition is to build a standard set of metadata that allows different data warehousing tools to convert metadata. Their efforts resulted in the birth of the Meta Data Interchange Specification (MDIS), which will allow the exchange of information between the Microsoft archives and the related MDIS files.
The existence of both summarized/indexed and detailed data gives the user the ability to perform a DRILL DOWN from the indexed data to the detailed data and vice versa. The existence of detailed historical data allows trend analyses over time to be made. In addition, the analysis metadata can be used as a directory of the database of the data warehouse to help end users locate the data they need.
Compared with OLTP systems, with its ability to support data analysis and reporting, the data warehouse is seen as a more appropriate system for information processes such as running and answering queries and producing reports. The next section will highlight the differences between the two systems in detail.
DATA WAREHOUSES VERSUS OLTP SYSTEMS
Many of the information systems within organizations are meant to support daily operations. These systems, known as OLTP SYSTEMS, capture continuously updated daily transactions.
The data within these systems is often modified, added or deleted. For example, a customer's address changes as he moves from place to place. In this case the new address is registered by changing the address field of the database. The main objective of these systems is to reduce transaction costs and, at the same time, reduce processing times. Examples of OLTP systems include critical operations such as order entry, payroll, invoicing, manufacturing and customer service.
Unlike OLTP systems, which were built for transaction- and event-based processes, data warehouses were created to support data-analysis-based processes and decision processes.
This is normally achieved by integrating the data from various OLTP and external systems into a single "container" of data, as discussed in the previous section.
Monash Data Warehousing Process Model
The Monash process model for data warehousing was developed by researchers of the Monash DSS Research Group and is based on the data warehouse literature, on experience in supporting the development of systems in the field, on discussions with vendors of data warehouse applications and on a group of experts in data warehouse use.
The phases are: Initiation, Planning, Development, Operation and Explanation. The diagram expresses the iterative or evolutionary nature of data warehouse development by means of two-way arrows placed between the different phases. In this context, "iterative" and "evolutionary" mean that, at each step of the process, implementation activities may always propagate back to the previous phase. This is due to the nature of a data warehouse project, in which additional requests from the end user may arrive at any time. For example, if during the development phase of a data warehouse process the end user requests a new dimension or subject area that was not part of the original plan, it must be added to the system. This causes a change in the project. The result is that the design must change the requirements of the documents created so far during the design phase. In many cases, the current state of the project must go back to the design stage, where the new request must be added and documented. The end user must be able to see the revised documentation and the changes that were made in the development phase. At the end of this development cycle the project must obtain thorough feedback from both teams, the development team and the user team. The feedback is then reused to improve a future project.
Capacity planning
Data warehouses tend to be very large and to grow very quickly (Best 1995, Rudin 1997a) because of the amount of historical data they retain over their lifetime. Growth can also be caused by additional data requested by users to increase the value of the data they already have. Consequently, the storage requirements for data can increase significantly (Eckerson 1997). It is therefore essential to ensure, by conducting capacity planning, that the system being built can grow with growing needs (Best 1995, LaPlante 1996, Lang 1997, Eckerson 1997, Rudin 1997a, Foley 1997a).
In planning for data warehouse scalability, one must know the expected growth in warehouse size, the types of queries likely to be run, and the number of end users to be supported (Best 1995, Rudin 1997b, Foley 1997a). Building scalable applications requires a combination of scalable server technologies and scalable application design techniques (Best 1995, Rudin 1997b). Both are required in creating an extremely scalable application. Scalable server technologies can make it easy and profitable to add storage, memory and CPUs without degrading performance (Lang 1997, Telephony 1997).
There are two main scalable server technologies: symmetric multiprocessing (SMP) and massively parallel processing (MPP) (IDC 1997, Humphries et al. 1999). An SMP server normally has multiple processors sharing a memory, bus system and other resources (IDC 1997, Humphries et al. 1999). Additional processors can be added to increase its computational power. Another method of increasing the computational power of an SMP server is to combine numerous SMP machines. This technique is known as clustering (Humphries et al. 1999). An MPP server, on the other hand, has multiple processors, each with its own memory, bus system and other resources (IDC 1997, Humphries et al. 1999). Each processor is called a node. An increase in computational power can be achieved by adding additional nodes to MPP servers (Humphries et al. 1999).
A weakness of SMP servers is that too many input/output (I/O) operations can congest the bus system (IDC 1997). This problem does not occur within MPP servers, since each processor has its own bus system. However, the interconnections between the nodes are generally much slower than the bus system of an SMP. Additionally, MPP servers can add a layer of complexity for application developers (IDC 1997). Thus, the choice between SMP and MPP servers can be influenced by many factors, including the complexity of the queries, the price/performance ratio, the required processing capacity, the intended data warehouse applications, and the growth in the size of the data warehouse and in the number of end users.
Numerous scalable application design techniques can be used in capacity planning. One uses various time periods such as days, weeks, months and years. By having various time periods, the data can be divided into manageably grouped pieces (Inmon et al. 1997). Another technique is to use summary tables, which are constructed by summarizing the detailed data. The summarized data are thus more compact than the detailed data, which requires less memory space. The detailed data can then be stored in a cheaper storage unit, which saves even more storage space. Although using summary tables can save memory space, they require considerable effort to keep them up to date and in line with business needs. Nevertheless, this technique is widely used and often applied in conjunction with the previous technique (Best 1995, Inmon 1996a, Chauduri and Dayal 1997).
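To make the summary-table technique concrete, the following Python sketch (pandas, with purely illustrative data and file names) precomputes a compact monthly summary from daily detail; the detailed rows could then be moved to cheaper storage while most queries run against the summary.

```python
import pandas as pd

# Hypothetical daily sales detail kept in the warehouse.
detail = pd.DataFrame({
    "date":   pd.to_datetime(["2024-01-03", "2024-01-17", "2024-02-02", "2024-02-20"]),
    "store":  ["S1", "S1", "S2", "S2"],
    "amount": [200.0, 350.0, 125.0, 90.0],
})

# Build a compact monthly summary table from the detailed data.
detail["month"] = detail["date"].dt.to_period("M")
monthly_summary = detail.groupby(["month", "store"], as_index=False)["amount"].sum()
print(monthly_summary)

# The detailed rows could now be moved to cheaper near-line storage,
# e.g. written out as a compressed file, while queries use the summary.
detail.to_csv("sales_detail_archive.csv.gz", index=False, compression="gzip")
```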
Defining Data Warehouse Technical Architectures
Early adopters of data warehousing primarily conceived of a centralized implementation of the data warehouse, in which all data, including external data, were integrated into a single physical repository (Inmon 1996a, Bresnahan 1996, Peacock 1998). The main advantage of this approach is that end users are able to access the enterprise-wide view of the organizational data (Ovum 1998). Another advantage is that it offers standardization of data across the organization, which means there is only one version or definition for each term used in the data warehouse metadata repository (Flanagan and Safdie 1997, Ovum 1998). The disadvantage of this approach, on the other hand, is that it is expensive and difficult to build (Flanagan and Safdie 1997, Ovum 1998, Inmon et al. 1998).
Not long after the centralized data storage architecture became popular, the concept evolved of extracting smaller subsets of the data to support the needs of specific applications (Varney 1996, IDC 1997, Berson and Smith 1997, Peacock 1998). These smaller systems are derived from the larger centralized data warehouse. They are called dependent departmental data warehouses or dependent data marts. The dependent data mart architecture is known as a three-tiered architecture, in which the first tier consists of the centralized data warehouse, the second consists of the departmental data warehouses and the third consists of the data access and analysis tools (Demarest 1994, Inmon et al. 1997).
Data marts are normally built after the centralized data warehouse has been built, in order to meet the needs of specific units (White 1995, Varney 1996). Data marts store the data most relevant to the specific units (Inmon et al. 1997, Inmon et al. 1998, IA 1998). The advantage of this method is that no non-integrated data will exist and the data will be less redundant within the data marts, since all data come from an integrated data repository. Another advantage is that there will be few links between each data mart and its data sources, because each data mart has only one data source. In addition, with this architecture in place, end users can still access the enterprise-wide overview of the organizational data. This method is known as the top-down method, in which the data marts are built after the data warehouse (Peacock 1998, Goff 1998).
With the increasing need to show results early, some organizations have begun to build independent data marts (Flanagan and Safdie 1997, White 2000). In this case, the data marts take their data directly from the OLTP data sources, and not from the centralized and integrated repository, thus eliminating the need to have the central warehouse in place.
Each data mart requires at least one link to its data sources. A disadvantage of having multiple links for each data mart is that, compared to the two previous architectures, the redundancy of the data increases significantly. Each data mart must store locally all the data it requires in order to have no effect on the OLTP systems. This causes the same data to be stored in different data marts (Inmon et al. 1997). Another disadvantage of this architecture is that it leads to the creation of complex interconnections between the data marts and their data sources, which are difficult to build and control (Inmon et al. 1997). A further drawback is that end users cannot access the enterprise-wide information overview, since the data of the different data marts are not integrated (Ovum 1998). Yet another downside is that there may be more than one definition for each term used in the data marts, which generates data inconsistencies in the organization (Ovum 1998).
Despite the disadvantages discussed above, independent data marts still attract the interest of many organizations (IDC 1997). One factor that makes them attractive is that they are quicker to develop and require less time and resources (Bresnahan 1996, Berson and Smith 1997, Ovum 1998). Consequently, they mainly serve as proof-of-concept projects that can be used to identify quickly the benefits and/or imperfections of the project (Parsaye 1995, Braly 1995, Newing 1996). In this case, the part to be implemented in the pilot project must be small but important for the organization (Newing 1996, Mansell-Lewis 1996). By examining the prototype, end users and management can decide whether to continue or stop the project (Flanagan and Safdie 1997).
If the decision is to continue, the data marts for the other sectors should be built one at a time. There are two options for end users, based on their needs, in building independent data marts: integrated/federated and non-integrated (Ovum 1998).
In the first method, each new data mart should be built based on the current data marts and on the data model used by the firm (Varney 1996, Berson and Smith 1997, Peacock 1998). The need to use the company data model makes it necessary to ensure that there is only one definition for each term used across the data marts; this is also to ensure that different data marts can be joined to give an overview of the corporate information (Bresnahan 1996). This method is referred to as bottom-up and is best when there are constraints on financial means and time (Flanagan and Safdie 1997, Ovum 1998, Peacock 1998, Goff 1998). In the second method, the data marts built can only meet the needs of a specific unit.
A variant of the federated data mart is the distributed data warehouse, in which hub server middleware is used to join many data marts into a single distributed data repository (White 1995). In this case, the company's data are distributed across several data marts. End user requests are forwarded to the hub server middleware, which extracts all the data required from the data marts and returns the results to the end user applications. This method provides business information to end users. However, the problems of independent data marts are still not eliminated. There is another architecture that can be used, called the virtual data warehouse (White 1995). However, this architecture, which is described in figure 2.9, is not a real data storage architecture, since it does not move the load from the OLTP systems to the data warehouse (Demarest 1994). In fact, end users' data requests are passed on to the OLTP systems, which return results after processing the user requests. Although this architecture allows users to generate reports and formulate requests, it cannot provide historical data or the overview of company information, since the data from the different OLTP systems are not integrated. Therefore this architecture cannot satisfy complex data analyses such as forecasting.
Selection of data access and retrieval applications
The purpose of building a data warehouse is to convey information to end users (Inmon et al. 1997, Poe 1996, McFadden 1996, Shanks et al. 1997, Hammergren 1998); one or more data access and retrieval applications must therefore be provided. Today there is a wide variety of these applications from which the user can choose (Hammergren 1998, Humphries et al. 1999). The applications selected determine the success of the data warehousing effort in an organization, because the applications are the most visible part of the data warehouse to the end user (Inmon et al. 1997, Poe 1996). For a data warehouse to be successful, it must be able to support the data analysis activities of the end user (Poe 1996, Seddon and Benjamin 1998, Eckerson 1999). Thus the 'level' of what the end user wants has to be identified (Poe 1996, Mattison 1996, Inmon et al. 1997, Humphries et al. 1999).
In general, end users can be grouped into three categories: executive users, business analysts and power users (Poe 1996, Humphries et al. 1999). Executive users need easy access to predefined sets of reports (Humphries et al. 1999). These reports can easily be reached with menu navigation (Poe 1996). In addition, reports should present information using graphical representations such as tables and templates to deliver information quickly (Humphries et al. 1999). Business analysts, who may not have the technical skills to build reports from scratch themselves, need to be able to modify the existing reports to meet their specific needs (Poe 1996, Humphries et al. 1999). Power users, on the other hand, are the kind of end users who have the ability to generate and write queries and reports from scratch (Poe 1996, Humphries et al. 1999). They are the ones who develop reports for the other types of users (Poe 1996, Humphries et al. 1999).
Once the end user requirements have been determined, a selection must be made of the data access and retrieval applications among all those available (Poe 1996, Inmon et al. 1997). Data access and retrieval tools can be classified into four types: OLAP tools, EIS/DSS tools, query and reporting tools, and data mining tools.
OLAP tools allow users to create ad hoc queries as well as queries run against the data warehouse database. In addition, these products allow users to drill down from general data to detailed data.
EIS/DSS tools provide executive reporting, such as 'what if' analysis, and access to reports organized in menus. Reports should be pre-defined and merged with menus for easier navigation.
Query and reporting tools allow users to produce predefined and specific reports.
Data mining tools are used to identify relationships that could shed new light on forgotten operations in the data of the data warehouse.
Besides meeting the requirements of each type of user, the selected tools must be intuitive, efficient and easy to use. They also need to be compatible with the other parts of the architecture and able to work with existing systems. It is also suggested to choose data access and retrieval tools with reasonable prices and performance. Other criteria to consider include the commitment of the tool vendor to supporting its product and the developments the product will have in future releases. To ensure user engagement in using the data warehouse, the development team should involve users in the tool selection process. In this case a practical user evaluation should be carried out.
To improve the value of the data warehouse, the development team can also provide web access to its data warehouses. A web-enabled data warehouse allows users to access the data from remote places or while traveling. The information can also be provided at lower costs through a reduction in training costs.
2.4.3 Data Warehouse Operation Phase
This phase consists of three activities: definition of the data refresh strategies, monitoring of the data warehouse activities, and management of data warehouse security.
Definition of data refresh strategies
After the initial load, the data in the data warehouse database must be refreshed periodically to reproduce the changes made to the original data. It is therefore necessary to decide when to refresh, how often the refresh should take place, and how to refresh the data. It is suggested that the data be refreshed when the system can be taken offline. The refresh rate is determined by the development team based on user requirements. There are two approaches to refreshing the data warehouse: complete refresh and continuous loading of the changes.
The first approach, full refresh, requires reloading all the data from scratch. This means that all the required data must be extracted, cleaned, transformed and integrated at each refresh. This approach should be avoided as far as possible because it requires a lot of time and resources.
An alternative approach is to load the changes continuously. This adds the data that have been changed since the last refresh cycle of the data warehouse. Identifying new or modified records significantly reduces the amount of data that must be propagated to the data warehouse at each update, since only these data will be added to the data warehouse database.
There are at least five approaches that can be used to extract the new or modified data. To obtain an efficient data refresh strategy, a mixture of these approaches may be needed in order to pick up all the changes in the system.
The first approach, which uses timestamps, assumes that a timestamp is assigned to all modified and updated data, so that all modified and new data can easily be identified. This approach, however, has not been widely used in most of today's operating systems.
The second approach is to use a delta file generated by an application, containing only the changes made to the data. Using this file can also shorten the update cycle. However, this method too has not been used in many applications.
The third approach is to scan a log file, which basically contains information similar to the delta file. The only difference is that a log file is created for the recovery process and can be difficult to understand.
The fourth approach is to modify the application code. However, most application code is old and fragile; therefore this technique should be avoided.
The last approach is to compare the source data with the main data file.
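As a minimal sketch of the first (timestamp-based) approach, the following Python fragment shows how new or modified records could be selected since the last refresh; the row structure and field names are purely illustrative assumptions.

```python
from datetime import datetime

# Hypothetical source rows, each carrying a last-modified timestamp.
source_rows = [
    {"id": 1, "value": "alpha", "updated_at": datetime(2024, 1, 5)},
    {"id": 2, "value": "beta",  "updated_at": datetime(2024, 2, 9)},
    {"id": 3, "value": "gamma", "updated_at": datetime(2024, 3, 1)},
]

# Timestamp of the previous refresh cycle of the data warehouse.
last_refresh = datetime(2024, 2, 1)

def incremental_changes(rows, since):
    """Return only rows added or modified after the last refresh."""
    return [row for row in rows if row["updated_at"] > since]

changes = incremental_changes(source_rows, last_refresh)
# Only these rows would be cleaned, transformed and loaded into the warehouse.
print(changes)  # rows with id 2 and 3
```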
Monitoring of the data warehouse activities
Once the data warehouse has been released to users, it must be monitored over time. In this case, the data warehouse administrator can use one or more management and control tools to monitor the use of the data warehouse. In particular, information can be gathered about the people who access the data warehouse and the times at which they do so. From the data collected, a profile of the work performed can be created, which can be used as input for implementing user chargeback. Chargeback allows users to be informed about the cost of processing on the data warehouse.
In addition, data warehouse monitoring can also be used to identify the types of queries, their size, the number of queries per day, query response times, the sectors reached and the quantity of data processed. A further purpose of monitoring the data warehouse is to identify data that are not in use. These data can be removed from the data warehouse to improve query response times and to control the growth of the data residing in the data warehouse database.
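To make the idea of a usage profile and chargeback concrete, the following Python sketch aggregates per-user query counts and processing time into a simple chargeback estimate. The log records, field names and cost rate are purely hypothetical, since no specific monitoring tool is prescribed here.

```python
from collections import defaultdict

# Hypothetical monitoring records: who ran a query, how long it took, how many rows.
query_log = [
    {"user": "analyst1", "seconds": 12.5, "rows": 10_000},
    {"user": "analyst1", "seconds": 3.2,  "rows": 1_200},
    {"user": "exec1",    "seconds": 0.8,  "rows": 40},
]

COST_PER_SECOND = 0.05  # illustrative chargeback rate

usage = defaultdict(lambda: {"queries": 0, "seconds": 0.0})
for record in query_log:
    profile = usage[record["user"]]
    profile["queries"] += 1
    profile["seconds"] += record["seconds"]

# A simple usage profile that could feed a chargeback report.
for user, profile in usage.items():
    cost = profile["seconds"] * COST_PER_SECOND
    print(f"{user}: {profile['queries']} queries, "
          f"{profile['seconds']:.1f}s, cost ~ {cost:.2f}")
```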
Security management of the data warehouse
A data warehouse contains integrated, critical and sensitive data that can be reached easily. For this reason it should be protected from unauthorized users. One way to implement security is to use the functions of the DBMS to assign different privileges to different types of users. In this way, an access profile is maintained for each type of user. Another way to secure the data warehouse is to encrypt the data as they are written into the data warehouse database. The data access and retrieval tools must then decrypt the data before presenting the results to users.
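As a minimal sketch of the second option, the following Python fragment uses the third-party cryptography package (an assumption; any comparable encryption facility could be used) to encrypt a record before it is stored and to decrypt it when results are presented. Key management and the database itself are deliberately left out.

```python
# Requires the third-party 'cryptography' package (pip install cryptography).
from cryptography.fernet import Fernet

# Key management is assumed to be handled elsewhere (e.g. a key vault).
key = Fernet.generate_key()
cipher = Fernet(key)

# Encrypt a sensitive record before it is written to the warehouse database.
record = b"customer=ACME;credit_limit=50000"
stored_value = cipher.encrypt(record)

# A data access tool would decrypt the value before presenting the results.
print(cipher.decrypt(stored_value).decode())
```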
2.4.4 Data Warehouse Deployment Phase
This is the last phase in the data warehouse implementation cycle. The activities to be carried out in this phase include training users to use the data warehouse and carrying out reviews of the data warehouse.
User training
User training should take place before users access the data of the data warehouse and use the retrieval tools. Generally, the sessions should start with an introduction to the concept of data warehousing, to the content of the data warehouse, to the metadata and to the basic features of the tools. Then, more advanced users could also study the physical tables and the features of the data access and retrieval tools.
There are many approaches to training users. One of these involves selecting a number of users or analysts from the set of users, relying on their leadership and communication skills. They are trained individually on everything they need to know to become familiar with the system. After the training, they return to their work and start teaching the other users how to use the system. Based on what they have learned, the other users can then start to explore the data warehouse.
Another approach is to train many users at the same time, as in a classroom course. This method is suitable when there are many users who need to be trained at the same time. Yet another method is to train each user individually, one by one. This method is suitable when there are few users.
The purpose of user training is for users to become familiar with access to the data, with the retrieval tools and with the contents of the data warehouse. However, some users may be overwhelmed by the amount of information provided during the training session. Therefore a number of follow-up sessions must be held to provide ongoing assistance and to answer specific questions. In some cases a user group is formed to provide this type of support.
Gather feedback
Once the data warehouse has been rolled out, users can use the data residing in the data warehouse for various purposes. Mainly, analysts or users use the data in the data warehouse to:
1 Identify company trends
2 Analyze customers' purchasing profiles
3 Segment customers and markets
4 Provide better services to customers - customize services
5 Formulate marketing strategies
6 Make competitive estimates for cost analysis and control
7 Support strategic decision-making
8 Identify emerging opportunities
9 Improve the quality of current business processes
10 Monitor profits
Following the development direction of the data warehouse, a series of reviews of the system can be conducted to obtain feedback from both the development team and the end user community. The results obtained can be taken into account for the next development cycle.
Since data warehouse development follows an incremental approach, it is essential to learn from the successes and mistakes of previous developments.
2.5 Summary
In this chapter the approaches found in the literature have been discussed. In section 1 the concept of the data warehouse and its role in decision science were presented. In section 2 the main differences between the data warehouse and OLTP systems were described. Section 3 discussed the Monash data warehouse model, which was then used in section 4 to describe the activities involved in the process of developing a data warehouse; these claims were not based on rigorous research. What happens in reality can be very different from what the literature reports; however, these results can be used to create a basic foundation that underlines the concept of the data warehouse for this research.
Chapter 3
Research and design methods
This chapter deals with the research and design methods for this study. The first part presents a general view of the methods available for information retrieval, and the criteria for selecting the best method for a particular study are discussed. Two methods selected with the criteria set out above are then discussed in section 2; one of these is chosen and adopted, with the reasons set out in section 3, where the reasons for the exclusion of the other method are also presented. Section 4 presents the research design and section 5 the conclusions.
3.1 Research in information systems
Research in information systems is not limited simply to the technological domain, but must also be extended to include purposes relating to behaviour and organization. This is due to contributions from various disciplines, ranging from the social to the natural sciences, and it leads to the need for a spectrum of research methods, involving quantitative and qualitative methods, to be used for information systems.
All available research methods are important; indeed, various researchers such as Jenkins (1985), Nunamaker et al. (1991) and Galliers (1992) argue that there is no universal specific method for conducting research in the various fields of information systems; a method may be suitable for a particular piece of research but not for others. This leads to the need to select a method that is suitable for our particular research project: for this choice, Benbasat et al. (1987) state that the nature and purpose of the research must be considered.
3.1.1 Nature of the research
The various methods can be classified, on the basis of the nature of the research, into three traditions that are widely known in information science: positivist, interpretive and critical research.
3.1.1.1 Positivist research
Positivist research is also known as scientific or empirical study. It seeks to: "explain and predict what will happen in the social world by looking at regularities and cause-effect relationships among its constituent elements" (Shanks et al. 1993). Positivist research is also characterized by repeatability, simplification and refutation. Furthermore, positivist research admits the existence of a priori relationships between the phenomena studied. According to Galliers (1992), taxonomy is a research method included in the positivist paradigm, which however is not limited to it; in fact there are also laboratory experiments, field experiments, case studies, theorem proofs, predictions and simulations. Using these methods, researchers admit that the phenomena studied can be observed objectively and rigorously.
3.1.1.2 Interpretive research
Interpretive research, which is often called phenomenology or anti-positivism, is described by Neuman (1994) as "the systematic analysis of the social meaning of action through direct and detailed observation of people in natural situations, in order to arrive at an understanding and interpretation of how people create and maintain their social world". Interpretive studies reject the assumption that the observed phenomena can be observed objectively. Indeed, they are based on subjective interpretations. Moreover, interpretive researchers do not impose a priori meanings on the phenomena they study.
This method includes subjective/argumentative studies, action research, descriptive/interpretive studies, futures research and role playing. In addition, surveys and case studies can be included in this approach, as they concern the study of individuals or organizations within complex real-world situations.
3.1.1.3 Critical research
Critical research is the least known approach in the social sciences, but it has recently received attention from researchers in the field of information systems. Its philosophical assumption is that social reality is historically produced and reproduced by people, as are social systems, through their actions and interactions. Their ability, however, is mediated by a number of social, cultural and political considerations.
Like interpretive research, critical research argues that positivist research has nothing to do with the social context and ignores its influence on human actions. Critical research, on the other hand, criticizes interpretive research for being too subjective and for not aiming to help people improve their lives. The biggest difference between critical research and the other two approaches is its evaluative dimension. While the objective of the positivist and interpretive traditions is to predict or explain the status quo or social reality, critical research aims to critically evaluate and transform the social reality under study.
Critical researchers usually oppose the status quo in order to remove social differences and improve social conditions. Critical research has a commitment to a processual view of the phenomena of interest and is therefore normally longitudinal. Examples of research methods are long-term historical studies and ethnographic studies. Critical research, however, has not been widely used in information systems research.
3.1.2 Purpose of the research
Together with the nature of the research, its purpose can be used to guide the researcher in selecting a particular research method. The purpose of a research project is closely related to the position of the research with respect to the research cycle, which consists of three phases: theory building, theory testing and theory refinement. Thus, based on the moment in the research cycle, a research project can have an exploratory, descriptive, explanatory or predictive purpose.
3.1.2.1 Exploratory research
Exploratory research is aimed at investigating a totally new topic and at formulating questions and hypotheses for future research. This type of research is used in theory building to obtain initial insights into a new area. Normally, qualitative research methods are used, such as case studies or phenomenological studies. However, it is also possible to employ quantitative techniques such as exploratory surveys or experiments.
3.1.2.2 Descriptive research
Descriptive research is aimed at analysing and describing in great detail a particular organizational situation or practice. It is suitable for building theories and can also be used to confirm or dispute hypotheses. Descriptive research usually includes the use of measurements and samples. The most suitable research methods include surveys and background analyses.
3.1.2.3 Explanatory research
Explanatory research tries to explain why things happen. It is built on facts that have already been studied and tries to find the reasons for these facts. Explanatory research is therefore normally built on exploratory or descriptive research, and is ancillary to it in testing and refining theories. Explanatory research typically employs case studies or survey-based research methods.
3.1.2.4 Predictive research
Predictive research aims to predict the events and behaviours under observation that are being studied (Marshall and Rossman 1995). Prediction is the standard scientific test of truth. This type of research generally employs surveys or the analysis of historical data (Yin 1989).
The above discussion shows that there are a number of possible research methods that can be used in a particular study. However, there must be one specific method that is more suitable than the others for a particular research project (Galliers 1987, Yin 1989, De Vaus 1991). Every researcher therefore needs to evaluate carefully the strengths and weaknesses of the various methods, in order to adopt the research method most suitable for and compatible with the research project (Jenkins 1985, Pervan and Klass 1992, Bonoma 1985, Yin 1989, Hamilton and Ives 1992).
3.2. Possible research methods
The goal of this project was to study the experience of Australian organizations with data warehousing development. Given that there is currently a lack of research in the area of data warehousing in Australia, this research project is still in the theory-building phase of the research cycle and has an exploratory purpose. Exploring the experience of Australian organizations adopting data warehousing requires the interpretation of social reality. Consequently, the philosophical assumption underlying the research project follows the traditional interpretive one.
After a rigorous review of the available methods, two possible research methods were identified: surveys and case studies, both of which can be used for exploratory research (Shanks et al. 1993). Galliers (1992) argues for the suitability of these two methods for this particular study in his revised taxonomy, stating that they are suitable for theory building. The following two subsections discuss each method in detail.
3.2.1 Survey research method
The survey research method derives from the ancient method of the census. A census consists of collecting information from an entire population. This method is expensive and impractical, particularly if the population is large. Thus, compared to the census, a survey normally focuses on collecting information from a small number, or sample, of representatives of the population (Fowler 1988, Neuman 1994). A sample reflects the population from which it is drawn, with different levels of accuracy, according to the sample structure, its size and the selection method used (Fowler 1988, Babbie 1982, Neuman 1994).
The survey method is defined as "snapshots of practices, situations or views at a particular point in time, undertaken using questionnaires or interviews, from which inferences may be made" (Galliers 1992: 153). Surveys deal with collecting information on certain aspects of the study, from a number of participants, by asking questions (Fowler 1988). Questionnaires and interviews, which include face-to-face and telephone interviews and structured interviews, are the most common data collection techniques used in surveys (Blalock 1970, Nachmias and Nachmias 1976, Fowler 1988), although observations and analyses can also be used (Gable 1994). Of all these data collection methods, the questionnaire is the most popular technique, since it ensures that the data collected are structured and formatted, and thus facilitates the classification of the information (Hwang 1987, de Vaus 1991).
In analysing the data, a survey strategy often employs quantitative techniques, such as statistical analysis, but qualitative techniques can also be employed (Galliers 1992, Pervan and Klass 1992, Gable 1994). Normally, the data collected are used to analyse distributions and patterns of association (Fowler 1988).
Although surveys are generally appropriate for research dealing with the question 'what?' or questions derived from it, such as 'how much' and 'how many', they can also be used for the question 'why' (Sonquist and Dunkelberg 1977, Yin 1989). According to Sonquist and Dunkelberg (1977), survey research is aimed at testing hypotheses, evaluating programs, describing the population and developing models of human behaviour. In addition, surveys can be used to study certain opinions of the population, conditions, past or present opinions, characteristics, expectations and behaviours (Neuman 1994).
Surveys allow the researcher to discover relationships within the population, and the results are usually more generalizable than those of other methods (Sonquist and Dunkelberg 1977, Gable 1994). Surveys allow researchers to cover a wider geographical area and to reach many respondents (Blalock 1970, Sonquist and Dunkelberg 1977, Hwang and Lin 1987, Gable 1994, Neuman 1994). Finally, surveys can provide information that is not available elsewhere or not in the form required for analysis (Fowler 1988).
There are, however, some limitations in carrying out a survey. One disadvantage is that the researcher cannot obtain much information about the object studied. This is due to the fact that surveys are carried out only at a particular point in time and, therefore, there is a limited number of variables and people that the researcher can study (Yin 1989, de Vaus 1991, Gable 1994, Denscombe 1998). Another drawback is that a survey can be very expensive in terms of time and resources, particularly if it involves face-to-face interviews (Fowler 1988).
3.2.2. Case study research method
The case study research method involves an in-depth study of a particular situation within its real context over a defined period of time, without any intervention by the researcher (Shanks et al. 1993, Eisenhardt 1989, Jenkins 1985). This method is mainly used to describe the relationships between the variables being studied in a particular situation (Galliers 1992). Case studies may involve single or multiple cases, depending on the phenomenon analysed (Franz and Robey 1987, Eisenhardt 1989, Yin 1989).
The case study research method is defined as "an empirical inquiry that studies a contemporary phenomenon within its real context, using multiple sources collected from one or more entities such as people, groups, or organizations" (Yin 1989). There is no clear separation between the phenomenon and its context, and there is no experimental control or manipulation of the variables (Yin 1989, Benbasat et al. 1987).
There is a variety of data collection techniques that can be employed in the case study method, including direct observation, reviews of archive records, questionnaires, documentation review and structured interviews. With such a diverse range of data collection techniques, case studies allow researchers to deal with both qualitative and quantitative data at the same time (Bonoma 1985, Eisenhardt 1989, Yin 1989, Gable 1994). As with the survey method, the case study researcher acts as an observer or researcher and not as an active participant in the organization under study.
Benbasat et al. (1987) assert that the case study method is particularly suitable for research theory building, which begins with a research question and continues with the formation of a theory during the data collection process. Besides being suitable for the theory-building stage, Franz and Robey (1987) suggest that the case study method can also be used for the more complex theory-testing stage. In this case, based on the evidence gathered, a given theory or hypothesis is verified or refuted. In addition, the case study is also suitable for research dealing with 'how' or 'why' questions (Yin 1989).
Compared to other methods, case studies allow the researcher to capture essential information in greater detail (Galliers 1992, Shanks et al. 1993). Furthermore, case studies allow the researcher to understand the nature and complexity of the processes studied (Benbasat et al. 1987).
There are four main drawbacks associated with the case study method. The first is the lack of controlled deduction. The researcher's subjectivity can skew the results and conclusions of the study (Yin 1989). The second drawback is the lack of controlled observation. Unlike experimental methods, the case study researcher cannot control the phenomena studied, since they are examined in their natural context (Gable 1994). The third disadvantage is the lack of replicability. This is due to the fact that the researcher is unlikely to observe the same events again and cannot verify the results of a particular study (Lee 1989). Finally, as a consequence of the lack of replicability, it is difficult to generalize the results obtained from one or a few case studies (Galliers 1992, Shanks et al. 1993). All these problems, however, are not insurmountable and can in fact be minimized by the researcher by applying appropriate actions (Lee 1989).
3.3. Justification of the research methodology adopted
Of the two possible research methods for this study, the survey is regarded as the more suitable. The case study was discarded following a careful consideration of the relative merits and weaknesses of the two methods. The suitability or unsuitability of each method for this study is discussed below.
3.3.1. Unsuitability of the case study research method
The case study method requires an in-depth study of a particular situation within one or more organizations over a period of time (Eisenhardt 1989). In this case, the period could exceed the time frame available for this study. Another reason for not adopting the case study method is that the results may suffer from a lack of rigour (Yin 1989). The researcher's subjectivity can influence the results and conclusions. A further reason is that this method is more suitable for research on 'how' or 'why' questions (Yin 1989), while the research question for this study is of the 'what' type. Last but not least, it is difficult to generalize the results from just one or a few case studies (Galliers 1992, Shanks et al. 1993). On the basis of this rationale, the case study research method was not chosen, as it was unsuitable for this study.
3.3.2. Suitability of the survey research method
When this research was conducted, the practice of data warehousing had not been widely adopted by Australian organizations. Thus, there was not much information regarding its implementation within Australian organizations. The available information came from organizations that had implemented or used a data warehouse. In this case, the survey research method is the most suitable because it allows information to be obtained that is not available elsewhere or not in the form required for analysis (Fowler 1988). In addition, the survey research method enables the researcher to gain a good insight into practices, situations, or views at a given point in time (Galliers 1992, Denscombe 1998). An overview was required in order to augment the knowledge about the Australian experience of data warehousing. Furthermore, Sonquist and Dunkelberg (1977) state that the results of survey research are more general than those of other methods.
3.4. Survey Research Design
The survey on data warehousing practice was carried out in 1999. The target population consisted of Australian organizations interested in data warehousing studies, since they were probably already informed about the data they store and could therefore provide useful information for this study. The target population was identified with an initial survey of all Australian members of 'The Data Warehousing Institute' (Tdwi-aap). This section discusses the design of the empirical research phase of this study.
3.4.1. Data collection technique
Of the three techniques commonly used in survey research (i.e. mail questionnaire, telephone interview and personal interview) (Nachmias 1976, Fowler 1988, de Vaus 1991), the mail questionnaire was adopted for this study. The first reason for adopting the latter is that it can reach a geographically dispersed population (Blalock 1970, Nachmias and Nachmias 1976, Hwang and Lin 1987, de Vaus 1991, Gable 1994). Secondly, the mail questionnaire is suitable for highly educated participants (Fowler 1988). The mail questionnaire for this study was addressed to data warehousing project sponsors, directors and/or project managers. Thirdly, mail questionnaires are suitable when a reliable list of addresses is available (Salant and Dilman 1994). TDWI, in this case a trusted data warehousing association, provided the mailing list of its Australian members. Another advantage of the mail questionnaire over the telephone questionnaire or personal interviews is that it allows respondents to answer with greater accuracy, particularly when respondents need to consult records or discuss questions with other people (Fowler 1988).
A potential downside is the time it takes to conduct mail questionnaires. Normally, a mail questionnaire is conducted in this sequence: sending the letters, waiting for the replies and sending reminders (Fowler 1988, Bainbridge 1989). Hence, the total time may be longer than the time required for personal interviews or telephone interviews. However, the total time can be known in advance (Fowler 1988, Denscombe 1998). The time spent conducting personal interviews cannot be known in advance, since it varies from one interview to another (Fowler 1988). Telephone interviews can be faster than mail questionnaires and personal interviews, but they can have a high non-response rate due to the unavailability of some people (Fowler 1988). In addition, telephone interviews are generally limited to relatively short lists of questions (Bainbridge 1989).
Another weakness of a mail questionnaire is its high non-response rate (Fowler 1988, Bainbridge 1989, Neuman 1994). However, countermeasures were taken by associating this study with a trusted institution in the data warehousing field (i.e. TDWI) (Bainbridge 1989, Neuman 1994), by sending two reminder letters to those who had not answered (Fowler 1988, Neuman 1994) and by also including a letter explaining the purpose of the study (Neuman 1994).
3.4.2. Unit of analysis
The purpose of this study is to obtain information about the implementation of data warehousing and its use within Australian organizations. The target population is made up of all Australian organizations that have implemented, or are implementing, data warehouses. The unit of analysis is therefore the individual organization. The questionnaire was mailed to the organizations interested in the adoption of data warehouses. This method ensures that the information collected comes from the most suitable resources of each participating organization.
3.4.3. Survey sample
The survey respondents' mailing list was obtained from TDWI. From this list, 3000 Australian organizations were selected as the basis for sampling. A covering letter explaining the project and the purpose of the survey, together with an answer card and a prepaid envelope for returning the completed questionnaire, was sent to the sample. Of the 3000 organizations, 198 agreed to participate in the study. Such a small number of responses was expected, given the limited number of Australian organizations that had then embraced, or were embracing, the data warehousing strategy within their organizations. Thus, the target population for this study consisted of only 198 organizations.
3.4.4. Contents of the questionnaire
The structure of the questionnaire was based on the Monash data warehousing model (discussed earlier in part 2.3). The content of the questionnaire was based on the analysis of the literature presented in chapter 2. A copy of the questionnaire mailed to the survey participants can be found in Appendix B. The questionnaire consists of six sections, which follow the phases of the model discussed. The following six paragraphs briefly summarize the content of each section.
Section A: Basic information about the organization
This section contains questions related to the profile of the participating organizations. In addition, some of the questions relate to the state of the participant's data warehousing project. Confidential information, such as the name of the organization, was not revealed in the survey analysis.
Section B: Initiation
The questions in this section relate to the activity of initiating data warehousing. Questions were asked about the project initiators, the sponsors, the skills and knowledge required, the goals of data warehousing development and the expectations of end users.
Section C: Planning
This section contains questions related to the data warehouse planning activities. Specifically, the questions concern the scope of implementation, the duration of the project, the cost of the project and the cost/benefit analysis.
Section D: Development
The development section contains questions related to the data warehouse development activities: the collection of end user requirements, the data sources, the logical data model, prototypes, capacity planning, technical architectures and the selection of data warehousing development tools.
Section E: Operation
The operation questions relate to the operation and extensibility of the data warehouse as it evolves into the next stage of development. Data quality, the data refresh strategies, the granularity of the data, the scalability of the data warehouse and the security issues of the data warehouse were among the types of questions asked.
Section F: Deployment
This section contains questions about the use of the data warehouse by end users. The researcher was interested in the purpose and usefulness of the data warehouse, the review and training strategies adopted, and the data warehouse control strategy adopted.
3.4.5. Response rate
Although mail surveys are criticized for their low response rates, measures were taken to increase the rate of return (as discussed earlier in part 3.4.1). The term 'response rate' refers to the percentage of people in a particular survey sample who answer the questionnaire (Denscombe 1998). The following formula was used to calculate the response rate for this study:
Response rate = (number of people who answered / total number of questionnaires sent) x 100
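For example, with purely illustrative figures (not taken from this study), if 200 questionnaires were mailed and 50 completed questionnaires were returned, the formula would give a response rate of 50 / 200 x 100 = 25%.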
3.4.6. Pilot test
Before the questionnaire is sent to the sample, the questions are
were tested by conducting pilot tests, as suggested by Luck
and Rubin (1987), Jackson (1988) and de Vaus (1991). The purpose of the
pilot tests is to reveal all uncomfortable, ambiguous and expressions
difficult questions to interpret, to clarify any
definitions and terms used and to identify approximate time
required to complete the questionnaire (Warwick and Lininger 1975,
Jackson 1988, Salant and Dilman 1994). The pilot tests were
carried out by selecting subjects with characteristics similar to those
of the final subjects, as suggested by Davis e Cosenza (1993). In
this study, six data warehousing professionals were
selected as the pilot subjects. After each pilot test, they are
necessary corrections have been made. From the pilot tests carried out, i
participants helped to remodel and reset the
final version of the questionnaire.
3.4.7. Data Analysis Methods
The survey data collected from the closed-question questionnaires were analysed using a statistical program package called SPSS. Many of the responses were analysed using descriptive statistics. A number of questionnaires were returned incomplete. These were treated with particular care to make sure that the missing data were not a consequence of data entry errors, but had been left out because the questions were not suitable for the respondent, or because the respondent decided not to answer one or more specific questions. These missing answers were ignored during the data analysis and were coded as '-9' to ensure their exclusion from the analysis process.
In preparing the questionnaire, the closed questions were precoded by assigning a number to each option. The number was then used to prepare the data during the analysis (Denscombe 1998, Sapsford and Jupp 1996). For example, there were six options listed in question 1 of section B: board of directors, senior executive, IT department, business unit, consultants and other. In the SPSS data file, a variable was generated to indicate 'the project initiator', with six value labels: '1' for 'board of directors', '2' for 'senior executive' and so on. The use of the Likert scale in some of the closed questions also allowed effortless identification, given the use of the corresponding numerical values entered in SPSS. For questions with non-exhaustive answers, which were not mutually exclusive, each option was treated as a single variable with two value labels: '1' for 'marked' and '2' for 'unmarked'.
Open questions were treated differently from closed questions. The answers to these questions were not entered into SPSS; instead, they were analysed by hand. The use of this type of question allows information to be gained about the freely expressed ideas and personal experiences of respondents (Bainbridge 1989, Denscombe 1998). Where possible, the responses were categorized.
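Although the analysis described here was carried out in SPSS, the same precoding scheme can be sketched in Python for illustration; the codes below mirror the value labels described above, and the response data are purely hypothetical.

```python
import pandas as pd

# Hypothetical pre-coded responses to question 1 of section B
# ('-9' marks a missing answer, as described above).
responses = pd.DataFrame({"initiator_code": [1, 2, -9, 1, 6]})

# Value labels mirroring the pre-coding scheme.
labels = {
    1: "board of directors",
    2: "senior executive",
    3: "IT department",
    4: "business unit",
    5: "consultants",
    6: "other",
}

# Treat '-9' as missing so it is excluded from the analysis.
responses["initiator"] = responses["initiator_code"].replace(-9, pd.NA).map(labels)

# Simple descriptive statistic: frequency of each answer.
print(responses["initiator"].value_counts(dropna=True))
```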
For the data analysis, simple statistical analysis methods were used, such as the frequency of the answers, the mean, the standard deviation and the median (Argyrous 1996, Denscombe 1998). The Gamma test was performed to obtain quantitative measurements of the associations between ordinal data (Norusis 1983, Argyrous 1996). These tests were appropriate because the ordinal scales used did not have many categories and could be shown in a table (Norusis 1983).
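For illustration, the Goodman-Kruskal gamma statistic can be computed directly from concordant and discordant pairs; the following Python sketch (with purely hypothetical ordinal responses) shows the calculation. This is a generic implementation, not the SPSS procedure used in the study.

```python
from itertools import combinations

def goodman_kruskal_gamma(x, y):
    """Compute Goodman-Kruskal gamma for two ordinal variables of equal length."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
        # tied pairs (dx * dy == 0) are ignored by the gamma statistic
    return (concordant - discordant) / (concordant + discordant)

# Hypothetical ordinal responses (e.g. two Likert-scale questions).
satisfaction = [1, 2, 2, 3, 4, 5]
perceived_benefit = [1, 1, 3, 3, 4, 4]
print(goodman_kruskal_gamma(satisfaction, perceived_benefit))
```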
3.5 Summary
In this chapter the research methodology and the design adopted for this study have been presented. Selecting the most appropriate research method for a particular study requires taking into consideration a number of factors, including the nature and type of the research, as well as the merits and weaknesses of each possible method (Jenkins 1985, Benbasat et al. 1987, Galliers and Land 1987, Yin 1989, Hamilton and Ives 1992, Galliers 1992, Neuman 1994). Given the lack of existing knowledge and theory on the adoption of data warehousing in Australia, this research study requires an interpretive research method with an exploratory capability to explore the experiences of Australian organizations. The research method was chosen to collect information regarding the adoption of the data warehousing concept by Australian organizations. A postal questionnaire was chosen as the data collection technique. The justifications for the research method and the data collection technique selected have been provided in this chapter. In addition, a discussion was presented on the unit of analysis, the sample used, the response rates, the content of the questionnaire, the pre-testing of the questionnaire and the data analysis method.

Designing a Data Warehouse:
Combining Entity Relationship and Dimensional Modeling
ABSTRACT
Data warehousing is a major current issue for many organizations. A key problem in the development of a data warehouse is its design. The design must support the detection of concepts in the data warehouse from legacy systems and other data sources, as well as easy understanding and efficiency in the implementation of the data warehouse. Much of the data warehousing literature recommends the use of entity relationship modeling or dimensional modeling to represent the design of the data warehouse. In this paper we show how both representations can be combined in one approach for the design of the data warehouse. The approach used is systematically examined in a case study, and a number of important implications for practitioners are identified.
DATA WAREHOUSING
A data warehouse is usually defined as a "subject-oriented, integrated, time-variant, and non-volatile collection of data in support of management's decisions" (Inmon and Hackathorn, 1994). Subject-oriented and integrated indicate that the data warehouse is designed to cross the functional boundaries of legacy systems to offer an integrated perspective of the data. Time-variant concerns the historical or time-series nature of the data in a data warehouse, which enables trends to be analysed. Non-volatile indicates that the data warehouse is not continuously updated like an OLTP database. Rather, it is updated periodically, with data from internal and external sources. The data warehouse is specifically designed for querying rather than for update integrity and operational performance.
The idea of storing data is not new; it has been one of the purposes of data management since the 1980s (Martin, 1982). Data warehouses offer the data infrastructure for management support systems. Management support systems include decision support systems (DSS) and executive information systems (EIS). A DSS is a computer-based information system designed to improve the process, and consequently the quality, of human decision making. An EIS is typically a data delivery system that enables business executives to easily access a view of the data.
The general architecture of a data warehouse highlights the role of the data warehouse in management support. In addition to offering the data infrastructure for EIS and DSS, the data warehouse can also be accessed directly through queries. The data included in a data warehouse are based on an analysis of management's information requirements and are obtained from three sources: internal legacy systems, special purpose data capture systems and external data sources. The data in internal legacy systems are frequently redundant, inconsistent, of low quality and stored in different formats, so they must be reconciled and cleaned before they can be loaded into the data warehouse (Inmon, 1992; McFadden, 1996). The data coming from ad hoc data capture systems and from external data sources are often used to augment (update, replace) the data from the legacy systems.
There are many compelling reasons to develop a data warehouse, including better decision making through the effective use of more information (Ives 1995), support for a focus on the whole business (Graham 1996), and a reduction in the cost of providing data for EIS and DSS (Graham 1996, McFadden 1996).
A recent empirical study found, on average, a return of
investments for i data warehouse 401% after three years (Graham,
1996). However, the other empirical studies of data warehouse have
found significant problems including difficulty in measuring and
assigning benefits, lack of a clear purpose, underestimating it
purpose and complexity of the storage process i data collectedin
particularly with regard to the sources and the cleanliness of the data collected.
Data warehousing can be viewed as a solution to the problem of data management in organizations. The management of data as an organizational resource has remained one of the key issues in information systems management around the world for many years (Brancheau et al. 1996, Galliers et al. 1994, Niederman et al. 1990, Pervan 1993).
A popular approach to data management in the eighties was the development of an enterprise data model. The enterprise data model was designed to offer a stable basis for the development of new application systems and for the rebuilding and integration of legacy systems (Brancheau et al. 1989, Goodhue et al. 1988, 1992, Kim and Everest 1994).
However, there are many problems with this approach, in particular the complexity and cost of the task and the long time required to achieve tangible results (Beynon-Davies 1994, Earl 1993, Goodhue et al. 1992, Periasamy 1994, Shanks 1997).
The data warehouse is a separate database that co-exists with legacy databases rather than replacing them. It therefore enables data management to be addressed without a costly rebuilding of legacy systems.
EXISTING APPROACHES TO DATA WAREHOUSE DESIGN
The process of building and refining a data warehouse should be understood as an evolutionary process rather than a traditional systems development lifecycle (Desio 1995, Shanks, O'Donnell and Arnott 1997a). A data warehouse project involves many processes, such as initiation and planning; gathering requirements from company managers; sourcing, transforming, cleaning, and synchronizing data from legacy systems and other data sources; developing delivery systems; monitoring the data warehouse; and managing the evolutionary process of building the data warehouse (Shanks, O'Donnell and Arnott 1997b). In this paper, we focus on how to design the stored data in the context of these other processes.
A number of approaches to data warehouse design have been proposed in the literature (Inmon 1994, Ives 1995, Kimball 1994, McFadden 1996). Each of them is briefly reviewed below, with an analysis of its strengths and weaknesses.
Inmon's (1994) Approach for Data Warehouse
Design
Inmon (1994) proposed four iterative steps for designing a data warehouse (see Figure 2). The first step is to design an enterprise data model to understand how data can be integrated across functional areas within the organization, dividing the stored data into subject areas. The data model is built to store data relating to decision making, including historical data, derived data, and aggregated data. The second step is to identify the subject areas to be implemented. These are based on priorities determined by the particular organization. The third step involves designing a data model for the subject area, paying particular attention to including appropriate levels of granularity. Inmon recommends using the entity relationship model. The fourth step is to identify the required source data systems and to develop transformation processes to acquire, clean, and format the data.
The strengths of Inmon's approach are that the enterprise data model provides the basis for integrating data across the organization and supports planning for the iterative development of the data warehouse. Its drawbacks are the difficulty and cost of designing the enterprise data model, the difficulty of understanding the entity relationship models used for both the enterprise data model and the subject-area data models, and the fact that the resulting design is suited to implementation on relational databases but not on multidimensional databases.
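As an illustration only of what a normalized, entity-relationship-style subject-area model looks like (the entities, attributes, and granularity below are hypothetical examples, not taken from Inmon's text), a sales subject area might be modeled as related entities with explicit foreign keys:

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical entities for a "sales" subject area, modeled in normalized,
# entity-relationship style.
@dataclass
class Customer:
    customer_id: int
    name: str
    region: str

@dataclass
class Product:
    product_id: int
    description: str

@dataclass
class Sale:
    # Granularity choice: one row per product, per customer, per day.
    customer_id: int   # foreign key to Customer
    product_id: int    # foreign key to Product
    sale_date: date
    quantity: int
    amount: float

customers = [Customer(1, "Acme Ltd", "North")]
products = [Product(10, "Widget")]
sales = [Sale(1, 10, date(2023, 3, 1), 5, 125.0)]
print(sales[0])
```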
Ives' (1995) Approach to Data Warehouse
Design
Ives (1995) proposes a four-step approach to designing an information system that he considers applicable to the design of a data warehouse (see Figure 3). The approach is strongly based on Information Engineering for the development of information systems (Martin 1990). The first step is to determine the objectives, the critical success factors, and the key performance indicators. The key business processes and the necessary information are then modeled to lead to an enterprise data model. The second step involves developing a defining architecture: the data stored by area, the data warehouse, the technology components required, and the organizational support needed to implement and operate the data warehouse. The third step includes selecting the required software packages and tools. The fourth step is the detailed design and construction of the data warehouse. Ives notes that data warehousing is a constrained, iterative process.
The strengths of Ives' approach are the use of specific techniques to determine information requirements, the use of a structured process to support the integration of data warehouses, the appropriate selection of hardware and software, and the use of multiple representation techniques for the data warehouse. Its flaws are inherent in its complexity; in particular, it is difficult to develop the many levels of design within the data warehouse in reasonable time and at reasonable cost.
Kimball's (1994) Approach to Data Warehouse
Design
Kimball (1994) proposed five iterative steps for designing a data warehouse (see Figure 4). His approach is particularly dedicated to the design of a single data warehouse and to the use of dimensional models in preference to entity relationship models. Kimball favors dimensional models because they are easier for business executives to understand, are more efficient when handling complex queries, and lead to more efficient physical database design (Kimball 1994). Kimball acknowledges that the development of a data warehouse is iterative, and that separate data warehouses can be integrated by sharing common dimension tables.
The first step is to identify the particular subject area to be developed. The second and third steps concern dimensional modeling. In the second step, the measures identify the things of interest in the subject area and are grouped into a fact table. For example, in a sales subject area the measures of interest could include the quantity of items sold and the dollar value of sales. The third step involves identifying the dimensions, which are the ways in which the facts can be grouped. In a sales subject area, relevant dimensions could include item, location, and time period. The fact table has a multi-part key that links it to each of the dimension tables and typically contains a very large number of facts. In contrast, dimension tables contain descriptive information about the dimensions and other attributes that can be used to group the facts. The fact table and its associated dimension tables form what is called a star schema because of its shape. The fourth step involves building a multidimensional database to support the star schema. The final step is to identify the required source data systems and to develop transformation processes to acquire, clean, and format the data.
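A minimal sketch of such a star schema (the table contents and column names here are hypothetical examples, not taken from Kimball's text) shows a sales fact table whose multi-part key points to item, location, and time dimension tables, and how the facts can be grouped by a dimension attribute:

```python
# Hypothetical dimension tables: descriptive attributes used to group facts.
item_dim = {1: {"item_name": "Widget", "category": "Hardware"}}
location_dim = {10: {"city": "Milan", "country": "IT"}}
time_dim = {100: {"day": "2023-03-01", "month": "2023-03"}}

# Hypothetical fact table: each row carries a multi-part key
# (item_id, location_id, time_id) plus the numeric measures.
sales_facts = [
    {"item_id": 1, "location_id": 10, "time_id": 100, "qty": 5, "dollars": 125.0},
    {"item_id": 1, "location_id": 10, "time_id": 100, "qty": 2, "dollars": 50.0},
]

# Group the facts by a dimension attribute, e.g. dollar sales per month.
dollars_per_month = {}
for fact in sales_facts:
    month = time_dim[fact["time_id"]]["month"]
    dollars_per_month[month] = dollars_per_month.get(month, 0.0) + fact["dollars"]

print(dollars_per_month)  # {'2023-03': 175.0}
```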
The strengths of Kimball's approach include the use of dimensional models to represent the stored data, which makes the design easy to understand and leads to efficient physical design. A dimensional model can also readily be implemented on either relational database systems or multidimensional database systems. Its flaws include the lack of techniques to support the planning or integration of many star schemas within one data warehouse, and the difficulty of designing the highly denormalized structure of a dimensional model from the data in legacy systems.
McFadden's (1996) Approach to Data
Warehouse Design
McFadden (1996) proposes a five-step approach to data warehouse design (see Figure 5).
His approach is based on a synthesis of ideas from the literature and is focused on the design of a single data warehouse. The first step involves a requirements analysis. Although specific techniques are not prescribed, McFadden identifies the data entities and their attributes, and refers readers to Watson and Frolick (1993) for requirements gathering.
In the second step, an entity relationship model is designed for the data warehouse and then validated by business executives. The third step includes determining the mapping from legacy systems and external sources into the data warehouse. The fourth step involves processes for developing, distributing, and synchronizing the data in the data warehouse. In the final step, system delivery is developed, with particular emphasis on the user interface. McFadden points out that the design process is generally iterative.
The strengths of McFadden's approach are the participation of business executives in determining requirements and the importance given to data resources, their cleaning, and their loading. Its flaws relate to the lack of a process for splitting a large data warehouse project into many integrated stages, and the difficulty of understanding the entity relationship models used in the design of the data warehouse.
