Multisource Universe
Executive summary
Data Federation is considered a next generation data warehouse where the physical need of a warehouse is bypassed. It is a methodology that takes a layered approach for dealing with data from disparate sources. It provides on-demand alternatives to traditional EDWH. This paper shows how to implement this methodology using SAP BusinessObjects Data Federation, performance evaluation criteria considered, and lessons learned.
Scenario
The aim is to unify data from SAP BW and DB2 legacy on the fly, making remote data appear as if it resides in a single database and evaluate the performance of SAP BusinessObjects Data Federation query engine.
Approach
- Maintain AS-IS tables in SAP BW and DB2 legacy mainframes.
- Develop a Data Foundation/Business Layer using SAP BusinessObjects IDT.
- Run queries to evaluate the performance of Data Federation query engine.
- Trace query run times from Data Federation admin tool and respective databases.
Building a multisource universe
Step 1: Create connections
- Login to Repository through IDT
- Right click on “Connections” folder and select “Insert Relational Connection” to create a connection to DB2
- Provide a name for the connection
- Select “JDBC Divers” for DB2
- Provide the system details
- Create a relational connection for SAP BW
- Provide a name for the connection
- Select “SAP Java Connector (SAP JCo)”
- Provide server details
- Create connection shortcuts in local project folder
Step 2: Create Data Foundation
- Right click on Local projects → New → Data Foundation
- Provide a name for Data Foundation
- Select “Multisource-Enabled” option
- Click Next
- Select the connections for the Data Foundation
- Click Next
- Click Next
- Select the tables from respective connections
- Do the necessary joins and modeling
Intelligent queries for union data sets
Intelligent queries can be triggered to query a respective source system based on the input parameter values. For BusinessObjects to act intelligently, a derived table should be created to do a union of data from both source systems; a mandatory parameter should be created and the query should be hard-coded with the mandatory parameter LOVs in where condition.
In this scenario, the few facilities data is available in SAP BW and few facilities data is available in DB2. For query to be triggered intelligently, we have created a derived table (DB2 View) to union data sets from both the source systems, a mandatory parameter and have encoded the LOVs in the where condition of respective source system queries.
When the end user inputs a facility value, it checks both the queries and runs only the query which has that value in where condition. Below are the conditions that have been included as part of where condition for queries to behave intelligently.
DB2: Table__5."CUST_CD" in ('DCT','YRK','ATF','CRP','GTX')
BW: Table__9."Z1A4GSTO" in ('PTI','RTF','VRF','HDF','IVC')
Step 3: Create business layer
- Right click on Local projects → New → Business Layer
- Select Relational Data Foundation and click Next
- Provide a name for the Business Layer
- Select the Data Foundation by clicking the button highlighted below
- Select the Data Foundation
Performance test conditions
Below are the On Demand and Scheduling performance test conditions considered:
- All facilities data for one month duration (A1M)
- All facilities data for one week duration (A1W)
- High-volume facility data for one month duration (H1M)
- High-volume facility data for one week duration (H1W)
- Medium-volume facility data for one month duration (M1M)
- Medium-volume facility data for one week duration (M1W)
- Low-volume facility data for one month duration (L1M)
- Low-volume facility data for one week duration (L1W)
Testing results
Overall
- 49% of Time spent in BW
- 39% of Time spent in BOBJ
- 12% of Time spent in DB2
Overall Run-times
On-demand
- All facilities one month data (2.2 M): ~18.25 (BOBJ 12.5 mins)
- All facilities one week data (0.54 M): ~7.25 mins (BOBJ 3 mins)
On-Demand Run-times
Schedule
- All facilities one month data (2.2 M): ~ 16 mins (BOBJ 9 mins)
- All facilities one week data (0.54 M): ~ 8.25 mins (BOBJ 4.5 mins)
Schedule Run-times
Lessons learned
Union
- To trigger intelligent queries, a pre-defined list of values must be hard-coded in the derived table.
- Data types from each source system must match. Otherwise, workaround of “calculated column” must be implemented.
Data Modeling
- Optimize the data model in multisource universe so that only single query processing happens per source system to get better execution time.
Performance
- Sequential query processing on respective databases (BW, DB2 etc.).
- Only simple SQL queries with “where clause” are pushed to respective databases out of various complex queries generated by SAP BOBJ multisource universe.
- SQL queries with complex operations like “Group By,” “Having,” and “Case” etc. are processed in SAP BOBJ Data Federation query engine.
- Background scheduling of Web Intelligence reports should be leveraged for large data volume. On-demand report execution should be leveraged whenever smaller data volume is required.
- Optimization steps should be carried out on the respective source systems to have better query/report performance.
Appendix
System configuration
- Total of 7 VM Guest running Red Hat Linux RHEL5
- Web Tier — Tomcat — 2 VM Guest — 4 x Intel Xeon 5670 @2.93 GHz, 16 GB RAM
- Intelligence Tier — 2 Node CMS Cluster — 2 VM Guest — 4 x Intel Xeon 5670 @2.93 GHz, 16 GB RAM
- Processing Tier — 3 VM Guests — 4 x Intel Xeon 5670 @2.93 GHz, 24 GB RAM
- Disk Storage — 200 GB shared IFRS/OFRS
BOBJ Landscape
Data Volumes
System | Records (Millions) |
---|---|
SAP BW | 79.12 |
DB2 | 95.13 |
0 comments:
Post a Comment