Sherry Li's BI Corner

Friday, April 9, 2010

I Moved!

I’ve found a better home for my Business Intelligence blog.

http://bisherryli.wordpress.com/

SSRS – Why Do We Need InScope Function in Matrix Subtotals

Here is a pretty good article about advanced matrix reporting technique. I’ll show you an example where I need to manipulate the Subtotals in a matrix to get a subtotal that is not the default sum of all the data in the group. (I need to check out if this is still true in SSRS 2008.)

By default, the subtotal in the matrix is the sum of all numbers in the defined group. But in many cases, we need to have custom aggregates on a matrix report. Examples are, you need average instead of sum, you need growth percentage instead of sum. In this example, I need to sum only the numbers in bold, and ignore the numbers that are not in bold.

matrix1_RowGroup2: this is the lowest level grouping Dates on the column.

matrix1_ColumnGroup2: this is the lowest level grouping on the row.

To get a custom aggregates, enter this in the Expression editor of the measure cell. (To be continued…)

=iif(inscope("matrix1_ColumnGroup2") and inscope("matrix1_RowGroup2"), IIF(SUM(Fields!ACCT_COUNTER.Value) = 0, "", SUM(Fields!ACCT_COUNTER.Value)), IIF(SUM(Fields!ACCT_COUNTER_SUBTOTAL.Value) = 0, "", SUM(Fields!ACCT_COUNTER_SUBTOTAL.Value)))

SSIS – Automate a Package in Development

In development, I have no ways to automate a SSIS package via AutoSys.

Here are some simple steps I take to automate execution of a SSIS package.

1) Import the package into SQL Integration Services.

2) Create a job to execute the package.

3) In the Data sources tab, check the appropriate data source(s) that need explicit login credentials, and enter appropriate user id and password (batch login IDs are the best).

Thursday, April 8, 2010

SQL – Index Clustered vs Non-Clustered

Indexes in SQL Server are similar to the indexes in books. They help SQL Server retrieve the data quicker.

In SQL server, indexes can be clustered indexes and non-clustered indexes.

When you create a clustered index on a table, all the rows in the
table are “physically” stored in the order of the clustered index key. So, there can be only one clustered index per table.

Non-clustered indexes have their own storage separate from the table data storage. Data are only “logically” stored in the order (with a row locator).

Appropriate indexes improve the query performance, as the query optimizer can choose from all the existing indexes to come up with an efficient execution plan.

However, too many indexes can slow down other data modification operations (such as INSERT, UPDATE, DELETE). This is because every time data changes in the table, all the indexes need to be updated. Another disadvantage of having too many or unnecessary indexes is that, indexes need disk space, the more indexes you have, more disk space is used.

SQL – Denormalization & Normalization

In the reporting business, we would reverse some of the normalization process.

It's the controlled introduction of redundancy in to
the database design. It helps improve the query performance as the number of joins could be reduced.

Normalization is the process of efficiently organizing data in a database. There are two goals of the normalization process: eliminating redundant data and ensuring data dependencies make sense (only storing related data in a table).

First Normal Form (1NF) – primary key

First normal form (1NF) sets the very basic rules for an organized database:

Eliminate duplicative columns from the same table.
Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).

Second Normal Form (2NF) – foreign key

Second normal form (2NF) further addresses the concept of removing duplicative data:

Meet all the requirements of the first normal form.
Remove subsets of data that apply to multiple rows of a table and place them in separate tables.
Create relationships between these new tables and their predecessors through the use of foreign keys.

Third Normal Form (3NF) – all columns depend on the PK

Third normal form (3NF) goes one large step further:

Meet all the requirements of the second normal form.
Remove columns that are not dependent upon the primary key.

Fourth Normal Form (4NF)

Finally, fourth normal form (4NF) has one additional requirement:

Meet all the requirements of the third normal form.
A relation is in 4NF if it has no multi-valued dependencies.

SQL – Confused by keys?

Define candidate key, alternate key, composite key.

A candidate key is one that can identify each row of a table uniquely.

Generally a candidate key becomes the primary key of the table. If the table has more than one candidate key, one of them will become the primary key, and the rest are called alternate keys.

A key formed by combining at least two or more columns is called
composite key.

In the end, it’s all about the uniqueness of each row in the table.

SSAS - AdventureWorksDW sample data warehouse for data mining and OLAP

The Adventure Works DW sample data warehouse is used to demonstrate the business intelligence features that are available in SQL Server. The data in the data warehouse has been trended to support both data mining and Online Analytical Processing (OLAP) scenarios.

The follow is from Microsoft site AdventureWorks Sample Data Warehouse: http://msdn.microsoft.com/en-us/library/ms124623.aspx

The following sections provide more information about the data warehouse itself and about the scenarios that it supports:

The Data Warehouse

Adventure Works DW contains a subset of the tables from the OLTP database, in addition to financial information that is pulled from a separate data source. The data is kept in synch with the OLTP database to support typical Integration Services scenarios such as loading and updating the data warehouse. The Readme_AWDataWarehouseRefresh Package Sample demonstrates how to use an Integration Services package to load data into Adventure Works DW from the Adventure Works OLTP.

Adventure Works DW contains two subject areas, finance and sales, which are described in the following sections. Additionally, the data warehouse contains views that are used to support the data mining scenarios that are described later in this topic.