logo image

Chapter 8: Big Data, Data Warehouses, and Business Intelligence Systems

Big Data
The term for the enormous datasets generated by the Web applications such as search tools and social networks.
Business Intelligence (BI) systems
Information systems that assist managers and other professionals in the analysis of current and past activities and in the prediction of future events. Unlike transaction processing systems, they do not support operational activities, such as the recording and processing of orders. Instead they are used to support management assessment, analysis, planning, control and ultimately, decision-making.
operational systems
A database system in use for the operations of an enterprise, typically an OLTP system.
Online Transaction processing (OLTP) System
An operational database system available for, and dedicated to, transaction processing or the ongoing stream of businesses transactions. Also known as transactional system.
1. Read and process data existing in the operational database but they use the operational DBMS to obtain this data, but do not insert, modify or delete operational data
2. Process data extracted from operational databases, in this situation, they manage the extracted database using a BI DBMS which may be the same as or different from the operational DBMS
3. BI systems read data purchased from data vendors
From where do BI systems obtain data? And how do they process the obtained data?
reporting systems
This type of BI system sorts, filters, groups and makes elementary calculations on operational data. Summarizes current status to past or predicted status. Classify entities (customers, products, employees, etc.) and report delivery crucial.
data mining applications
This type of BI system performs sophisticated analyses on data, analyses that usually involve complex statistical and mathematical processing. They are used for: what-if analyses, predictions, decisions. The results are often incorporated into some other report or system.
Online Analytical Processing (OLAP)
A technique for analyzing data values, called measures, aganist characteristics associated with those data values, called dimensions. They are sometimes used to ease the task of report production.
1. Most data mining applications have only a few users and those users have sophisticated computer skills
2. The results of a data mining analysis are usually incorporated into some other report, analysis or information system
Why is report delivery not as important for data mining systems as it is for reporting systems? (2 main things)
1. Querying data for BI applications can place a substantial burden on the DBMS and unacceptably slow the performance of operational applications
2. The creation and maintenance of BI systems require application programs, facilities and expertise that are not normally available from operations
3. Operational data have problems that limit their use for BI applications
Why is operational data difficult to use?
data warehouse
A database system that has data, programs and personnel that specialize in the preparation of data for BI processing.
extract, transform and load (ETL) system
This system cleans and prepares data for BI processing. Problematic operational data that may have been cleaned in this system can also be used to update the operational system to fix the original data problems.
-“dirty data” problematic data such as “G” for a alue of gender, “213” for age, part of a color “rd”, etc.
-missing values
-inconsistent data (data that has changed, such as a customer’s phone number or address)
-nonintegrated data (data from two or more sources that need to be combined)
-incorrect format
-too much data
What are some of the problems of using operational data for BI processing?
data warehouse metadata database (or simply, “data warehouse”)
The place where metadata concerning the data’s source, format, assumptions, constraints and other facts are kept.
data warehouse
Can be thought of like a distributor in a supply chain. Takes data from the manufacturers (operational systems and purchased data), cleans and processes them and locates that data. The people who work in a data warehouse are experts at data mgmt, cleaning, transformation and the like. They are usually not experts at a given business function.
data mart
A collection of data that is smaller than the data warehouse and addresses a particularly component or functional area of the business. Users obtain data from the data warehouse that pertain to a particular business function. They do not have the expertise of data warehouse employees, but they are knowledgable analysts for a given business function.
enterprise data warehouse (EDW) architecture
The system design of a corporation’s BI data. In one configuration, a data warehouse maintains all enterprise BI data and acts as an authoritative source for data extracts provided to the data marts. Data marts receive all data from the data warehouse.
dimensional database
A database design that is used for data warehouses and is designed for efficient queries and analysis. It contains a central fact table connected to one or more dimension tables.
1. Used for unstructured analytical data processing
2. Current and historical data are used
3. Data are loaded and updated systematically, not by users
What are the 3 main characteristics of a dimensional database?
slowly changing dimension
Rare changes in data information (such as a customer moving to a different city or state, or changed address) that would affect the data in a data mart
star schema
Model for dimensional databases where fact table is the center with connected dimension tables, there is usually a date or time dimension to track changes ver time.
snowflake schema
A more complex version of the star schema. In this schema, each dimension of the table is normalized, which may created additional tables attached to the dimension table.
measures (of business activities)
Quantitative or factual data about the entity represented by the fact table.
RFM Analysis
A data collection and processing analyzes and ranks customers according to their purchasing patterns. It is a simple customer classification technique that considers how recently (R) a customer orders, how frequently (F) a customer orders, and how much money (M) the customer spends per order.
OLAP reports
Online Analytical Processing report, which usually contains the results of OLAP, which provides the ability to sum, count, average and perform other simple artithmetic operations on groups of data. Also called an OLAP cube.
drill down (into data)
To further divide data into more detail
server cluster
A group of associated servers
distributed database
A database that is stored and processed on more than one computer
partitioning
One way of distributing a database which means breaking down the database into pieces and storing the pieces on multiple computers.
replication
A means of distributing a database by storing copies of the database on multiple computers;
dirty read
A read of data that have been changed but not yet committed to a database. Such changes may later be rolled back and removed from the database.
object-oriented programming (OOP)
A technique for designing and writing computer programs. Today most new program development is done using languages Java, C++, C# and Visual Basic.NET
objects
Data structures that have both methods and properties
methods
Computer programs that perform some task
properties
Data items particular to an object
object persistence
When using an OOP, the properties of the object are created and stored in main memory. This is the term for storing the values of properties of an object.
object-relational database
Basically add-on features and functions for a DBMS product which facilitate object persistence
NoSQL (or Not only SQL) Database
A distributed, replicated database used where this type of DBMS is needed to support large datasets.
-Key-Value–Dynamo and MemcacheDB
-Document–Couchbase and MongoDB
-Column Family- Apache Cassandra and HBase
-Graph–Neo4J and AllegroGraph
What are the Four Categories of NoSQL Databases?
supercolumns
When columns are grouped into sets , i.e Customer Name ______ consists of a FirstName column and a LastName column and which stores the CustomerName “FirstName LastName”
column families
The combination of columns and supercolumns that result to form the database storage equivalent of RDBMS tables
keyspace
Where all the column families are contained; this provides the set of RowKey values that can be used in the data store.
Map Reduce Process
Process used to break a large analytical task into smaller tasks, assign each smaller task to a separate computer in the cluster, gather the results of each of those tasks and combine them in the final product of the original tasks.
Hadoop Distributed File System (HDFS)
An Apache software which provides standard file services to clustered servers so that their file systems can function as one distributed file system. It was originally part of Casandra but the project spun off a nonrelational datastore of its own called HBase and query language named Pig.

Need essay sample on "Chapter 8: Big Data, Data Warehouses, and Business Intelligence Systems"? We will write a custom essay sample specifically for you for only .90/page

Can’t wait to take that assignment burden offyour shoulders?

Let us know what it is and we will show you how it can be done!
×
Sorry, but copying text is forbidden on this website. If you need this or any other sample, please register
Signup & Access Essays

Already on Businessays? Login here

No, thanks. I prefer suffering on my own
Sorry, but copying text is forbidden on this website. If you need this or any other sample register now and get a free access to all papers, carefully proofread and edited by our experts.
Sign in / Sign up
No, thanks. I prefer suffering on my own
Not quite the topic you need?
We would be happy to write it
Join and witness the magic
Service Open At All Times
|
Complete Buyer Protection
|
Plagiarism-Free Writing

Emily from Businessays

Hi there, would you like to get such a paper? How about receiving a customized one? Check it out https://goo.gl/chNgQy