What is the truth? How do I find the truth? Just the truth, the real truth.
This is a common demand of police detectives, lawyers, relationships, or every day life. But let us now forget that those of us working in the BI community demand the the singular truth for our solutions to be a success. Our business users are most certainly demanding that. Without, we are providing business solutions used for critical decisions that would be based on possibly false knowledge or obfuscated facts.
Continuing on the most recent DBCF postings around data quality and data integration, I want to stretch this out to the concepts of SSOT (single source of truth) or SVOT (single version of the truth). These are essentially common database design objectives and are critical in BI solutions. A business scenario which includes BI as a component can be solved by simply putting in a cube and a spreadsheet. The sometimes hard & tedious work of wading through the sources of similar but different data and battling the organizational politics can be 80% of your efforts in a complete BI solution including data integration.
What does a fact table not contain facts? In the world of dimensional modeling for data warehouses, the analyst and the modeller must perform the exercises of modelling the star schema, but cannot ignore a deep analysis of data sources. Data profiling in this context is the primary activity where you will look for duplicates, similarities and differences and determine, along with the owners of the data, which customers, products, LOBs or sales are to be treated as the database of record. And how can the datawarehouse bring disparate sources together for a single version or vision of the truth. These are just a few examples. I encourage further research and have a look at this article, which is one that I enjoyed reading on this topic: http://www.computerworld.com/databasetopics/data/story/0,10801,88349,00.html.
When we work on BI solutions specific to industry business scenarios, we build data marts so that we can have the proper data sources integrated to generate reports and analytics needed to solve business problems. There are a number of Microsoft partners that have tools and practices to make this a success in your project. Searching Microsoft.com for data integration partners will provide a good start for you. For our MEC (Microsoft Enterprise Cube) framework customers, we provide data quality and integration tools and expertise to work with our customers on-site to uncover the SSOT and SVOT.
As always, drop us a line at bidis@microsoft.com for questions regarding these BI solution concepts and other related topics.
Happy new year everyone! As we enter into 2008 I thought I would start us off on a few musings that I have around data quality in relation to business intelligence and data warehouse projects. A few recent projects that I have worked on and witnessed had me thinking about the topic and some thoughts to relate to you all.
First, why is data quality important in BI projects? A very basic question to ask, indeed, but a valid starting point because I hear this question too often. Doesn't matter if you are using SQL Server with SSIS, custom ETL, any data integration whether that be with Oracle or specialized tools like Informatica ... data quality is crucial to you BI application project. In the end, not ensuring data quality will likely result in poor decisions being made by the business from faulty statistics or incorrect data. When your end users are looking at BI reports, they should be looking at actionable knowledge. The analytics must be the result of vetted quality data.
Second, profile your data. This is very important and can help you identify common sources of data quality problems such as duplicate data, missing data and suspicious data. I would also recommend having a look at the new Excel data mining add-in cpabilities to examine source data files so that you can identify outliers in your data sources. Missing data and duplicate data are common sources of downstream business intelligence flaws. I promise to follow up on this post with a few words about finding proper entity sources and other data integration concepts including MDM very shortly to address those concerns.
A nice source of data quality strategies utilizing SQL Server is available here: http://msdn2.microsoft.com/en-us/library/aa964137.aspx.
That's all for now, folks. Until next time ... BR, Mark
Is dimensional modeling too difficult? I do hear this question when debating sources for cube data in business intelligence solutions. At Microsoft, our OLAP engine and data warehouse tools are all wrapped up into SQL Server and its components: SQL Server database engine, SQL Server Analysis Services (SSAS) and SQL Server Integration Services. But if the requirements for your BI application are real time or near real time, there may be a need to build your cube directly off an OLTP or transactional database system.
Data warehouse and BI purists will generally tend to steer away from that. In fact, in SSAS, there are many capabilities allowing you to have your cube built and refreshed upon partitions and measure groups and sensing source data changes so that portions of the cube are refreshed. Many of these mechanisms give you essentially the near real time capability of what is sometimes refered to as "operational BI".
But most certainly a common good practice is to build your BI solution upon a cube in the OLAP engine that is based on a data warehouse of some sort. In my role at Microsoft, my group generally looks to build functional BI solutions built all Microsoft and partner products that give you a snap-in solution for a particular business problem. Therefore, we do not generally provide an enterprise data warehouse. Instead, we call them data marts with entities and attributes that fit just what is needed for a particular business problem.
Indeed, we utilize dimensional modeling based upon the TDWI methodolgy devised by Ralph Kimball. We have found that SQL Server is very well suited for building what we call a Unified Dimensional Models (UDM) for your data warehouse. But it can be challenging for database administrators or developers who are comfortable and well versed in relational database design. A very good source of information on dimensional models can be found from the Microsoft reference architecture of Project Real: http://www.microsoft.com/sql/solutions/bi/projectreal.mspx. I recommend looking at the sample data warehouse that ships with SQL Server to see examples of a star schema for storing your measures & dimensions. Taking the time to learn these techniques to build a data mart or data warehouse will provide the best foundation for a successful Microsoft BI implementation. And this way, your decisions can be based on application requirements such as realtime, near realtime, quarterly reporting, predictive reports, etc. as opposed to the technical challenges of building a star schema.
I think that's it from here at the DBCF BI Solutions desk for the year. Happy Holidays and I look forward to blogging in the new year! BTW, be sure to keep an eye out for our MEC BI solution launch events coming in 2008 ...
- Mark
A very common scenario that we have run across from our customers when speaking with them about BI projects, is "Where do I start"? Many IT professional and business professionals have an understanding of business intelligence systems and the power that they can bring to your organization and the value that they can bring an organization. Yet these are complex systems which can include data warehouses, analytics, reporting, workflow, business process management, portals, on and on. In fact, the Microsoft BI stack includes all of these through SQL Server, MOSS, Visual Studio, Proclarity and PerformancePoint Server.
So where to begin ... What are my data sources? Do they support the problem scenarios that I am trying to provide to the business? What should my UDM look like? Star schema or snowflake? How should I surface my heirarchies, KPIs, analytics ... ? How long is this going to take me?
For those that have been reading my blog over the past several months, you may have captured a few bits of information that I have released early before our official launch regarding Microsoft Enterprise Cube (MEC). This is a way in which Microsoft can provide for you a beginning baseline out of the box to begin the process of gaining insights into your customers, providing immediate ROI on your BI investments and lowering operating income.
These are all the lofty goals and intent of any BI project. Here Microsoft will provide data models to use for your staging and star schemas, preditictive analytics algorithms, KPIs, reports, etc. all based on industry standards from your particular industry.
Here is an example: With an MEC solution for Churn Management in a telecommunications customer, you would receive the Microsoft BI stack with these elements included:
1. Physical ODS & star schema
2. Cubes to support the scenario
3. KPIs & reports to measure the business
4. UI integrated with Virtual Earth and Silverlight
5. ETL processes to bring your existing data into an existing data mart
6. High speed source data adapters
I'm going to post a few notes this month, I promise, on some techniques that can be used to map data sources to an existing data mart and how to address such issues. Without clean data properly summarized in your data mart, your business intelligence solution will not bring you proper value to your business users.
If we are to look at the time it takes to bring a successful, complete Business Intelligence solution on the Microsoft platform to production in your environment, what components of building this solution would need to be accounted for in such a project?
Some of the most comprehensive examinations of the lifecycle of BI projects are available in the Kimball Group's Datawarehouse Toolkit books. But in a nutshell, what we mean in such cases is that what really makes a BI project complete and successful is to perform the necessary upfront analysis, interviews with the business stakeholders, build your data warehouse using proven dimensional modeling techniques and to build your schemas based on data models to support your intended BI application solution. Without the solid foundation to support your KPIs, measures, reports, etc. you will not be successful.
The Microsoft BI platform products fully support you to build these solutions: SQL Server (SSIS, SSRS, SSAS), PPS and Proclarity. But if you do not have the in-house staff, time or funds to invest in an end-to-end time & materials project to implement these solutions from scratch, or wish to cut down on your time to production, Microsoft offers pre-built BI solutions called Microsoft Enterprise Cube (MEC) solutions.
MEC solutions allow you to implement the Microsoft BI solution platform based on popular scenarios such as Churn Management, Customer Segmentation and Profitability Analysis with Microsoft Services or Microsoft BI partners. The intent is to provide you with 70% or so of what you need to get started around these scenarios including the database schemas, data models, KPIs, reports and workflow. In this way, you can bring your solution to production in your environment in 16-18 calendar weeks by utlizing the MEC data models and analytics.
This is not to say that your project lifecycle with this approach to BI solutions does not include analysis and modeling. Instead, we look at the efforts involved around extending an existing data model, star schema and cube structure. Since these are based on industry best practices and standards, standing up such a complete scenario takes much less time, effort and risk.
There are no good shortcuts in the BI project lifecycle. But there are methods to solve your business problems with BI scenarios in a much quicker method using a prebuilt solution like that we are speaking of here with MEC.
Happy Thanksgiving and I will report back in December with some further musings around Microsoft BI solutions ... BR, Mark K
Seems like there has always been a question that Microsoft has received from the data warehouse, business intelligence and business community: Does Microsoft really *get* BI? In other words, I think that means are we serious about the technology, the capabilities, and the benefits it brings to business. Also, can Microsoft speak the business talk in its products and solutions that address business pain points that can be solved with BI.
Those that have been working with BI systems for many years have seen large software vendors like Oracle, SAP and SAS perform very well in this area when you look at Gartner's magic quadrant for business intelligence and analytics systems. The BI-specific niche companies make a nice showing there as well, such as Business Objects and Cognos. Then there is a separate category of consulting services-driven solutions from CapGemini, HP, IBM, etc. that leverage their business consulting to add value to a customer solution.
With the very recent release of Microsoft Office PerformancePoint Server, this is all possible and being delivered by Microsoft today. Yes, Microsoft does indeed get it! We are putting together solutions based on these BI & BPM tools that are being driven by what OLAP Report lists as the #1 OLAP engine on the market today, SQL Server 2005 Analysis Services and our ETL engine, SSIS, and updating those in 2008 with brand new versions. By working with industry experts in the fields of telcommunications, health services and retail, Microsoft is taking those tools and creating data models and cubes to solve business problems for business users. And all of that analysis and knowledge is then surfaced through MOSS 2007 so that it is available to the masses: BI for everyone in your organization! Empower, enlighten and participate in excellent business decisions is what Microsoft BI solutions position for.
So that's it from Redmond for me for now. Have a very happy Fall & Halloween! Best regards, Mark
bidis@microsoft.com
For those that read the DBCF database blogs which I post here from time to time, you will notice my move into business intelligence solutions full time since the beginning of this year (2007) under the BIDIS Microsoft alias (bidis@microsoft.com). I wanted to write a brief entry here to sort of wrap-up a few comments about the pervasiveness of BI in business today that we see and that I have encountered in working with our customers, business decision makers and industry leaders.
Saying that BI is pervasive in today's business structure can seem very much like marketing hyperbole. Personally, I think it really does not do justice to the fact that I believe we are trying to put across here. The takeaway for me that I wish to highlight is that business intelligence as a solution to every day business problems such as common pain points including customer profitability, revenue management, customer churn analysis and customer segmentation, touches nearly every business in every sector. BI, when done right, is truly a solution to the issues that are raised by those problem areas.
I'll just spend 1 paragraph here focusing on a single area within the problem space I mention above. Let's take Customer Profitability, which Microsoft Services packages as a ready-to-deploy module within the Microsoft Enterprise Cube solution. A deep, analytical 360 degree vision of your customer is the only true way to decipher his and her effect on your business operating income, revenue streams, margins and profits. To be able to analyze your customer base at this micro level provides strategic advantages over competitors and enables you to understand your customers better than ever before. Turning vast amounts of data into knowledge is not only pervasive, but necessary to compete and win in your business.
The way that this is done is to extract the data points needed to form our pre-built Profitability Anlysis cubes through our ETL layer and into the analytics layer provided by Microsoft PerformancePoint Server. SQL Server 2005 in the engine that drives this functionality and provides the ODS for the data storage while the analytical charts, KPIs and scorecards are surfaced through MOSS 2007 and ProClarity.
For more information on Microsoft packaged BI solutions, please send an email to this alias: bidis@microsoft.com.
Thanks and take care! Mark
In the Microsoft Industry Solutions Group, we are developing pre-built BI solutions for vertical business needs. Each of these solutions solve specific problems within a vertical business such as Churn Management for Telcos, patient segmentation for healthcare, and customer profitability for retail businesses.
Prebuilt BI solutions in this context represent a package of the Microsoft BI products (SQL Server & OBA product line) with the specific required set of ETL packages, database connectors, data models, cubes, KPIs, scorecards, reports and dashboard. The idea here is to get your business running in a production environment with a specific pain point solution based on Microsoft BI quickly without the need for months of analysis, architecting and building the building-blocks needed to accomplish the feat of turning your mass amounts of data into knowledge for customer segmentation, churn analysis, profitability analysis, etc.
There would be no need to hire data modelers, DBAs, developers, business analysts and project managers for a 12-16 month project to figure out not only how to make this all happen in business intellegence tools. But also to figure out what exactly it is that you need in regards to data models, data extraction, cubes and analytics in order to track, measure and forecast your business performance.
The old adage of one size does not fit all certainly pertains to an off-the-shelf BI solution such as this. But if 70% of the lengthy, expensive and complex job of building and implementing a pure BI or BPM solution is already completed for you, then the BI solution has been a wise investment.
And in the case of these Microsoft BI solutions I spoke of here, they include analytics and workflows built in PerformancePoint and SQL Server that are based on industry standards and best practices from some of the top industry consultants and standards forums for each industry such as TMF and ARTS.
What are some of the most common problems encountered when integrating data from multiple sources to retrieve the data points necessary for business analytics? Very frequently asked questions that we hear when starting business intelligence projects include:
How to connect to the data sources? How do I manage different keys for similar data? How to handle large data feeds? Should I use a temporary staging table or load dimensions and facts immediately? Must I use a UDM for data model?
These are complex questions with answers that depend on many factors. But they are frequently asked and I am putting together a list of answers for these based on Microsoft SQL Server best practices for BI and DW using SSAS and SSIS and various solutions through our partners including those that offer solutions for high-speed adapters.
Those answers and responses will be posted here on my blog over the next several weeks, so please check back frequently ... hang in there!! Mark
As we wrap-up our first ever BI conference at Microsoft here at the Washington State Convention Center, I just wanted to take a few minutes to type in a few blog notes.
Our team was here representing the Microsoft Services Industry Solutions Group (ISG). Hopefully, if you were here in Seattle this week, that you had the time to stop by our booth in the Microsoft pavillion. Here we demonstrated the Enterprise Cube solution framework based on the Microsoft BI stack. This an enabling technology for you to adopt BI solutions in your enterprise in a quick manner (4-5 months) that fits a specific business need (Churn Management, Customer Profit Analysis, etc.) and is based on the industry-leading TCO toolset for BI solutions.
The sessions and keynotes during the conference hopefully portrayed the Microsoft vision accurately for you regarding our message of BI for Everyone, Pervasive BI in the enterprise and our solutions and tools for Business Performance Management, such as PerformancePoint Server and our group's framework, Enterprise Cube, for solving industry-specific business problems.
If you have interest in industry-specific pre-built solutions based on industry standards and common practices, please contact the product group here in Redmond, WA for further information about demos which we have ready and showed here this work, or to set-up a proof of concept for you. Our email address is bidis@microsoft.com.
Thanks and I hope to see you all back here next year!
In the world of data warehouse and business intelligence terminology, there is quite a bit of ambiguity. When our group produces BI and business performance management solutions based on the complete Microsoft BI stack, we like to stick with solve particular business problems unique to a vertical industry. So we call the supporting database for the analytics, KPIs and BI a data mart as opposed to a data warehouse.
Our group is the Industry Solutions Group and the particular solution I am refering to is the Microsoft BI solution for vertical industries. The Microsoft BI stack is split-up into back-end database, ETL and analysis services (SQL Server product stack) and the front-end user interaction and analysis work through portals, which we call executive dashboards (Sharepoint, Proclarity and BSM, soon to be PerformancePoint Server).
Have a look on Microsoft.com for PerformancePoint Server which is currently in CTP 2. If you are interested in how such end-to-end Microsoft BI solutions really work and how to build a supporting data mart for such a solution, start with PPS. It will change the way that you look at business intelligence.
It is so easy to use as a tool that it seems impossible. And when you think about solving particular, easy to consume problems which BI can help solve, think data mart.
Friday was my final appearance with the joint HP / Microsoft roadshow showcasing SQL Server 2005 running on HP Integrity hardware and focused on customers migrating from SQL Server 2000 or competitive database platforms.
Here are a few notes of interest that I felt would be good for the blog-o-shpere in the hopes of providing insight, chuckles, or reference data for those of you interested in SQL Server 2005 and migrating to the SQL Server database platform:
1. Customers familiar with BI and data warehousing on SQL Server 2000 *LOVE* the improvements in SQL Server 2005 which include the new SSIS (replacing DTS), proactive caching in SSAS and table partitioning for ETL.
2. Those same customers expressed their pain when trying to *migrate* from DTS to SSIS. There were a few success stories around housing SQL Server 2000 DTS packages inside a DTS 2000 task in SSIS. But generally, the feeling is that rewritting ETL in SSIS is the best path, albeit not the quickest. Contact dbcf@microsoft.com and we'll point you to some partners and Microsoft resources to help you with those efforts.
3. Nearly everyone was in agreement to move to SQL Server 2005 64-bit. The expanded memory and direct addressing available beyond 32-bit SQL Server is well worth the investment. Memory management has been improved in 2005 over SQL 2000 and 64-bit is even better at it.
Please check the Microsoft.com SQL Server home page: http://www.microsoft.com/sql and look for other events in your area for SQL Server. We are always hosting events, webcasts and roadshows to showcase SQL Server 2005.
We've been talking quite a bit about database & application migrations, as well as interoperability with exiting infrastructure, applications and reporting tools with SQL Server. I suppose I should also mention that this blog contains a bit of information about Microsoft business intelligence using SQL Server as the database or just parts of the SQL Server system such as Analysis Services, Integration Services or Reporting Services.
Well, simply put, there is a lot there. A lot to consume and a lot to think about. But it would be somewhat incomplete to not talk about operating system migrations. So I'll spend today's post with a little bit of information and include public Microsoft links to further information regarding operating system migrations and interopability and how that all relates to database migrations and consolidations in the context of DBCF, database consolidation framework.
Let's say that you have an Oracle 9i database that is hosting a couple of internally-developed Java applications in your organization. The Oracle database is currently hosted on Sun Solaris, let's say, and you are looking to migrating these applications as-is to SQL Server 2005.
Well, the DBCF guidebook for migrations would point you toward SSMA for Oracle for the existing database schema and existing data migration from Oracle 9i to SQL Server 2005. Then you would need the SQL Server JDBC driver to replace the Oracle JDBC driver for database connectivity. The next step would be to evaluate the JDBC calls used by the developers to ensure that they are not using any Oracle-specific non-standard JDBC calls, which would require rewriting those lines of code if that is the case.
But for the operating system you may have scripts, jobs, etc. that you would like to port in-place over to Windows. The options you have include using server virtualization to host Solaris environments on your Windows 2003 servers, port to Windows scripting host, and in fact, you could even host Oracle in Windows 2003 Server natively, or through virtualization within Windows Server.
Here is the primary Windows migration link to start with when migration from Unix systems: http://www.microsoft.com/windowsserver/unix/default.mspx
For mainframe migrations, which you will find often when migrating to SQL Server from IBM DB2, start here: http://www.microsoft.com/windowsserver/mainframe/migration.mspx
Lastly, I mentioned Windows Scripting earlier. There is quite a bit that you can do in Windows through scripting similar to that of Unix shell scripting, if that is what you are comfortable doing as an administrator. Aside from porting Perl to Windows-based ActivePerl, have a look here at a great TechNet site on scripting:
http://www.microsoft.com/technet/scriptcenter/default.mspx
Happy migrating! Mark (DBCF)
It's Friday and I wanted to put a few words of wisdom into the DBCF blog to end the week. Whether you are in the middle of or thinking about a database migration to SQL Server, or are looking to leverage pieces of the SQL Server ecosystem like SSAS or SSRS, or maybe you like the Microsoft BI tools like ProClarity or BSM, just think of how great life is creating solutions on this platform!
Why do I say that? Well, because that's what the DBCF team does! And it's Friday, so I'm in a jolly mood! No doubt that migrating critical applications in your data center is an undertaking that takes time, money, resources, aspirin and close scrutiny. Microsoft understands that and speaking with us about those issues and how to make the conversion work for you can also lead you to a happy, prosperous weekend!
Creating a functioning, value-added, business-critical BI application has its own challenges. The DBCF offerings that are being trialed right now solve business problems and give you the 80% complete out-of-the-box functionality that you need to start solving complex business problems now!
If you have an Oracle database now, I suggest to download SQL Server from Microsoft.com today. Also download the SSMA migration tool for Oracle. Put your database into SQL and then play around with SQL Server Analysis Services. You can use the cube wizard to begin building business intelligence and browse your data with ProClarity.
Have fun and come back and see us with the results and advice!
DBCF@Microsoft.com
Getting back to the core foundation of DBCF solutions, we'll take the database schema and data migration up to the presentation layer of systems which require migration to SQL Server.
There are many reporting tools on the market and Crystal Reports, Business Objects, Hyperion, etc. can all interop perfectly well on SQL Server. Following the migration of your database to SQL Server from, say, Oracle, you would need to swap-out your database driver to a SQL Server database driver and then modify table, view or field names that may have changed during the migration. Keep in mind that the Microsoft free download product named SQL Server Migration Assitant (SSMA) will do a fantastic job of keeping your database objects named the same from source database (Oracle, Sybase or Access) to new target SQL Server database.
But on the front-end of your system, if you are ready to move to the .NET world, leverage the easy and quick deplyoment abilities found in the SQL Server Report Services, which comes with the SQL Server database software, there are roadmaps, partners and tools to assist you in migrating your existing reports from Crystal .rpt files to SQL Server's RDL XML-based reports.
Here is a list of report tool converter software options:
http://www.microsoft.com/sql/technologies/reporting/partners/crystal-migration.mspx#isv
And here is a list of report conversion partners that can help you with your migration to SQL Server Reporting Services from existing Crystal Reports:
http://www.microsoft.com/sql/technologies/reporting/partners/crystal-migration.mspx#services