Welcome to MSDN Blogs Sign in | Join | Help

PowerPivot Team Blog

Information, tips, news and announcements about PowerPivot (formerly known as Gemini) directly from the product team
PowerPivot Data Refresh

In this posting we will take a look at the data refresh facility in PowerPivot for SharePoint and show you how to use it for maximum benefit. To start off, let’s begin with a simple question: So . . . what is data refresh? And why should I care? Data refresh is all about keeping your data current. It ensures that your users are making decisions based on the most current and up-to-date information possible. This is done with as little investment of your time as possible. We want to make it easy and straightforward to have your data be as current as it can be. Set it up once and it runs until you tell it to stop.

Example: Suppose you have spent several hours, maybe days, on an important workbook. You have gather data from five sources: three corporate IT databases, a local spreadsheet provided by a colleague with market share information, and finally a local table embedded in a worksheet that you have typed in by-hand. The local table drives a market penetration forecast that is the whole point of the workbook. The business problem at hand is this forecast. And you publish the workbook to your local PowerPivot for SharePoint site --- and all of your fellow team members LOVE IT! Excellent work. Your boss walks in the door and says “Nice work. But we’ve just published this month’s data, can you update the numbers?” Aww, Oh. Now what do you do? Updating the data wasn’t in your original plan.

Sure you could copy down the whole workbook and update the data on your local machine and re-publish, but there should be a way for you to automatically request that the data be updated for you. After all, all of the data is on the server. Why not? What about the fact that some of the corporate databases are updated weekly, some are monthly – do you have to manage that process yourself also? Why couldn’t the system update the data and then it is just there when you came into work the next day?

Well, the answer is that the system can – and that is what the PowerPivot data refresh facility is all about.

Before we get too deep into it, let’s talk a bit about what the data refresh facility isn’t. The word ‘refresh’ and ‘data’ is used a lot in SharePoint; all over Excel and all over most IT systems. In our case, we are talking about a specific kind of data refresh. We are talking about the data that is stored inside the PowerPivot workbook. When you have the workbook open, it is the data stored in the PowerPivot Client Window. It is the thousands and thousands (possibly even millions) rows of data that you have imported into the workbook.

clip_image002

It is that data. . . To update that data on the server, the following two restrictions have to hold:

1. The data has to be available to the server. This seems obvious, but sometimes its not. Files located on your local hard drive, or within the workbook itself (such as the local table) cannot be refreshed on the server. The data has to be located somewhere the server can get to it.

2. You have to ask that the data be refreshed. You have to setup a refresh schedule. You have to indicate what data sources to be included. The system does not do the data refresh by itself – well, it does – but only if you request it.

This raises an interesting question. Just because you have a workbook, does it have to be refreshed? Well, the answer is NO. It is up to you and your business problem. It is quite reasonable that you might consider it more important that you (as the expert) review the workbook before the data is published. It might be that assumptions and business rules that applied in the past are not longer valid. And a person would only know that if they saw the report prior to others using it. If this is the case then by all means DO NOT SETUP DATA REFRESH! Only use the facility if it solves your business problem.

However, for many classes of problems, the automatic daily and weekly refreshing of the data is as important to its use. Data can only be acted upon if it is fresh and relevant to the business. Stale data is misleading and could harm the very business processes that are being developed. Only you the expert knows the difference.

How to setup a data refresh schedule

There are two ways of setting up a data refresh schedule. First, you can use the standard document library option menu: the item is called “Manage data refresh”. The other option is to use the Gallery Silverlight control and click on the “Manage data refresh” icon.

clip_image004

To setup the schedule, click on “Enable” and provide the following information:

  • When to run the schedule? Typically you would click on daily or weekly, but there are lots of options here. Many of the options are only displayed when the relevant time period is selected. For example if you click on “Monthly” then the following options appear.

clip_image006

  • When should the refresh schedule run during the day? Normally you should enter “During after business hours” – which the administrator establishes as properties of the PowerPivot service application. If you don’t want to run the schedule then, you can enter in the starting time for when the job will run. This has two interesting side-effects: (1) the time specified is the ‘starting’ time – meaning that given system usage and available resources, the actual job may run later, this is just the earliest that it could run; and (2) that after a successful run, the schedule is automatically disabled. To be totally honest, the second side-effect is done on purpose to discourage users from not using ‘after business hours’. We think that that the ‘after business hours’ approach makes the most effective use of resources.
  • To whom should failure email notifications be sent to? You can enter several email addresses/users if you wish. To be notified when workbooks are refreshed successfully, use the normal SharePoint alerting facility to send emails out when new content on a document library arrives. This is normal SharePoint. This option is for failures.

Now let’s take a look at the Windows credentials to run the schedule under:

clip_image008

  • Who should the user be? There is one Windows user regardless of how many schedules are specified in this job. In this field you specify “Who” the user is that will run at 2am in the morning. It must be a Windows user. You have several options here, the end result of each one is an NT account:
    • “Connect using the credentials already stored inside the workbook” – in this case, the system will use the unattended execution account that is specified for the PowerPivot for SharePoint service application. Typically this is a low permissions account that has no trusted access to the various data sources. In this case, the data refresh facility will use the non-Windows authentication (i.e. sql logins) that have been specified on the connection string in the data sources.
    • “Connect using the following Windows user credentials” – The user can enter in a Windows username and password. We will store these credentials in the SharePoint Secure Store facility for the user. At 2am the data refresh facility will pull the credentials out and do a Windows logon using them – and the data sources can then make trusted connections.
    • “Connect using the credentials saved in Secure Store Service” – if the user has the ability to enter in their own Secure Credentials the data refresh facility will use them at 2am for the logon. If using this approach, the application ID must be a Secure Store group that includes the service account used by the PowerPivot service application (so we can read the credentials).

NOTE: These options are important because they establish the Windows environment for the job. If there are any trusted connections used in the workbook’s data sources, then this is the Windows user that will be logged on for the job; the credentials in the data source are used for non-Windows authentication..

Finally let’s take a look at the data sources contained in the workbook.

clip_image010

  • Lastly you will see a list of data sources. You can select all data sources, or data source by data source within this job. For each data source you specify if you want it included in the data refresh schedule or if you would like to schedule it separately. For a data source you can specify:
    • If you want this data source to be included in the default schedule, or you can specify a custom schedule. All of the same options for the workbook can be ran on each data source, e g. the third Tuesday of the month.
    • For each data source you can specify if you want the data source included in the default schedule with the workbook or separately. If separately you can control which tables are updated and when. The system knows what tables have been imported from what data sources. When you select a specific data source, then only those tables that are built upon it are updated using this schedule. If you have two tables that you would like refreshed at different periods but they use the same data source, then you have to enter the same data source twice so you can schedule the two tables independently.
    • The username and passwords that are specified for the data source can either be (1) from the embedded workbook (i.e. you entered the credentials in the client when the workbook was created), (2) custom username and password that you enter in by-hand (and we will store for you in Secure Store), or (3) they come from a Secure Store Application ID that you have created. In any case, these credentials are for non-Windows authentication. They are not used to establish a trusted connection – that is what the other Windows credentials are for up above in the dialog box.
    • If you do not want the data source to be updated, such as if the data source is contained on your C:\ drive, then uncheck the data source and it will be ignored when the schedule runs.

Once you have established a data refresh schedule then you will find that the “Manage data refresh” option first points to the history page of the workbook. The history page shows you when the data refresh schedule(s) were run and what the outcome of the job was (i.e. success or failure).

clip_image012

Conclusion

In summary, use the data refresh facility to keep your data up-to-date. Just setup your schedule and the system runs in the background automatically. Your data just arrives when asked for. The facility is designed as a simple, straightforward way to have end users schedule their own data refresh requests. The system runs the schedule automatically (typically ‘after business hours’) and then posts the updated workbooks back into SharePoint in the same document library where the workbooks came from.

Finally, we wouldn’t be honest if we said that the data refresh facility was the right solution for everyone. Clearly it isn’t. It is oriented to you, the end user; not to your IT colleagues. As a scheduler, it lacks many of characteristics that IT systems typically have. For example, there is no real-time or on-demand access to data; or the ability to issue frequent updates, such as hourly or every 5-10 minutes. These capabilities are important in their own way, and some systems demand them, but in the managed self-service world they come with a management overhead and complexity that is just not appropriate for our users.

Sharing workbooks using PowerPivot Gallery

Now that you have downloaded the PowerPivot for Excel November Technology Preview at PowerPivot.com, you have probably built a really interesting BI application right there in the workbook.  Importing and mashing up data from different sources, building calculations with business logic using DAX, and then creating rich interactive views in Excel with pivots, charts and the awesome new slicers.  Surely there are others on your team who would get a lot of benefit from using this workbook in their daily jobs?  This post is about what happens when you move from doing self-service BI on your desktop, to sharing a team BI application using PowerPivot for SharePoint.

Make your workbook look great in SharePoint

It’s pretty easy to share a workbook with your colleagues using regular SharePoint document libraries.  But, after all the trouble you went to build a cool and useful BI application on your desktop, we wanted to make sure that the experience of the consumers of your work was really great.  When you install PowerPivot for SharePoint, you get a new kind of document library called the PowerPivot Gallery which is designed to highlight your workbook and let people use it for slicing and dicing right there in the browser without having to download it and open in Excel.

image

You can save your workbook to a PowerPivot Gallery using the regular Excel features for publishing to SharePoint, or just open a browser and choose Upload from the ribbon on the SharePoint site.  Once the workbook gets uploaded, PowerPivot for SharePoint will figure out the thumbnails and display them in the gallery view.

Using the workbook in the browser

As you move the mouse over the thumbnails, you can see a preview of the information.  If you click on a thumbnail, it will open the sheet in your browser using Excel Services, which is the SharePoint service that powers this experience.  Now you can start using the slicers to do analysis, which is really great for users who aren’t Excel power users but can figure out how to use web applications.

image

Other views

The PowerPivot Gallery shows the Gallery view by default, but like most SharePoint document libraries you can choose to see the information in different ways.  One way is to change back to the regular document library view.  To do this, use the Library ribbon tab and select All Documents from the Current View menu.

All Documents

You can also choose one of the two other Silverlight views that we include, Theatre and Carousel.  The default view that is displayed when users go to PowerPivot Gallery is the Gallery view, but you can change that as well as the sort sequence by choosing Modify View from the Library ribbon tab.

Carousel Theatre

Creating new reports based on your workbook

One of the best things about having PowerPivot for SharePoint installed is that the PowerPivot workbooks that you publish can be used as data sources for other reports and workbooks.  Just by sharing in SharePoint, you can point Report Builder or Excel to the URL of the workbook and build new reports in your favorite Analysis Services client tool.

image

If a consumer of your workbook decides that they need a different view of the data in your workbook, they can easily create a new report from that data source by choosing “New Excel Workbook” or “New Report Builder Report”.  This will add a new file to the gallery and launch Excel or Report Builder.  When you save the document, the thumbnail for the report or workbook will be calculated and users of the gallery will get the same cool experience as for the original workbook.

Troubleshooting the icons in the gallery

When we are calculating the thumbnails for the gallery, we rely on the fact that the workbook can actually be displayed by Excel Services.  If there is some problem with this process, then we show an error icon instead of a thumbnail as shown below.

image

The usual issue is that the workbook is not displaying properly in Excel Services, so the most common way to troubleshoot this is to click on the icon and see what error is being displayed in the Excel Services view.  If you fix the problem and republish the workbook, then the thumbnail will be recalculated.

If you are a system administrator and want to look at the details, there is a LOG\Gemini.log file in the SharePoint folder on the server with full details of what is happening.

What’s next

The other great feature that you get when you publish your workbook to SharePoint is the ability to schedule a regular data refresh.  Look for a follow-up post on this.

November Technology Preview of PowerPivot is here!

If you thought everything in the August Technology Preview was great (for a reminder of what was there see the Overview of Gemini Features posting from August) please take a look at what will be offered in the November Preview.


What's New in PowerPivot in SQL Server 2008 R2 November CTP

PowerPivot for Excel

PowerPivot Field List enhancements

Numerous enhancements were added to the Gemini field list. These include:

· Automatic detection of relationships when columns from unrelated tables are used in a PivotTable.

· Support for searching fields by name

· Support for named sets

PowerPivot Data Source editing

· Support for refreshing data for table, for all tables from a data source, or for all the tables Gemini.

· Support for editing data source information such as server name, database name and credentials.

· Support for editing table settings such as data source table or query used, columns imported and filters.

Support for more types of Data Sources

This CTP introduces targeted support for importing data from:

· Microsoft Access

· Microsoft SQL Server Analysis Services

· PowerPivot workbooks published to Analysis Services running in SharePoint Integrated    Mode

· Text files

· Excel workbooks

· Microsoft SQL Azure

· Oracle

· Teradata

· Sybase

· Informix

· DB2

Enhanced Data Analysis eXpressions (DAX)

This CTP significantly expands the DAX functionality previous available:

· User interface enhancements include better propagation of errors and easier creation of DAX formulas.

· Several functions were added, including a set of functions that operate on textual data as well as those that provide common Date and Time manipulations.

· Several performance and functional capabilities were introduced including automatic recalculation

· Introduction of support for handling referential integrity violations between related tables

· Automatic selection of data types for expressions.

PowerPivot for SharePoint

Management Dashboard

This CTP introduces a management dashboard for IT. This dashboard provides visibility into PowerPivot usage on a SharePoint farm that includes information about the following areas, as well as a rich PowerPivot workbook that can be used to build custom reports:

· Published PowerPivot workbooks, including number of queries executed, number of users and size per workbook

· Hardware resource utilization for the PowerPivot service including CPU and memory

· Data refresh activity

Enhanced Infrastructure

· Claims-aware support to enable users identity to flow between SharePoint components in a secure manner

· Parallelized refresh of PowerPivot data

· Performance optimizations

 

Get the PowerPivot November Technology Preview now at http://powerpivot.com/download.aspx

For instructions on installing PowerPivot for SharePoint go here.

Additional questions or need help? Try our public forums for PowerPivot for Excel or PowerPivot for SharePoint.

Linked Tables

In our previous blog post, we discussed how easy it is to use the Copy/Paste feature of the PowerPivot [aka Gemini] add-in to bring in data from virtually any external source that cannot otherwise be accessed by means of an ODBC, OLEDB or Managed .NET provider. In these scenarios, business users are free to copy new or updated data at any time, but the source of the data is not retained in the model itself, and so multiple copies may need to be manually performed over the model’s life span in order to reflect changes to the sources.

Now consider the following business scenario: an analyst needs to produce a forecast of their company’s sales by product over the next year. Then, he or she must create a report combining the actual and forecasted data, with the forecasts being reviewed and updated at the beginning of each quarter. To do this they will need to thoroughly analyze actual historic data, explore trends, make a number of assumptions, to eventually come up with the forecasted values; some of them might be directly inferred using basic arithmetic formulas; but others could result from far more complex calculations, or need to be keyed in manually.

Excel is the natural tool of choice for such scenarios that blend ad-hoc data analysis, business modeling, data entry, and reporting tasks. Together with the PowerPivot add-in, it allows business analysts to seamlessly mash up, massage, and relate massive amounts of data from multiple heterogeneous, refreshable sources. And thanks to the linked tables feature, regular Excel tables can now be combined with other external data sources to support scenarios where data editing is required.

For example, consider the following table in Excel:

Linked Tables 1

This workbook contains sales data for a number of products; the data was simply keyed in manually in Excel as a regular range, and then made into a table by applying one of the available formats from the Format as Table gallery in the Table Tools ribbon tab. Alternatively, it may have resulted from a query against an external database, or from a bunch of user-defined Excel formulas, or a combination of both.

Making this data available to a PowerPivot model is as simple as going to the Gemini [aka PowerPivot] ribbon tab in Excel, and clicking the Create Linked Table button:

Linked Tables 2

At this point, PowerPivot first analyzes the table’s data, automatically detecting the columns names and data types, and then adds a new table to the model while maintaining a live link to the source table in Excel:

Linked Tables 3

Notice the name of the table in the PowerPivot window: it defaults to the name of the source table in Excel, as it appears too in the Excel Table field of the PowerPivot window Home ribbon tab. Also notice the link icon next to the table name, which lets users quickly identify the linked tables in a model; this is important because linked tables have some unique capabilities and behaviors, which they do not share with other types of PowerPivot tables.

Now to truly appreciate that, let’s switch back to Excel and see what happens when the source table is updated. The shortest and easiest way to go back directly to the right source table in Excel, is to click the Go to Excel Table button in the PowerPivot window ribbon; note that this, as well as the PowerPivot linked table feature in general, will work only in the context of the current Excel workbook, i.e. the workbook that the PowerPivot model is defined in.

Back in Excel, update the contents of the Quantity column by either typing in new values, or by changing the column’s formula – in the latter case, you will probably want to make sure that the new formula gets propagated to all rows, as shown below:

Linked Tables 4

Next, in the Gemini ribbon tab of Excel, locate the Update Mode for Linked Table split button: it defaults to Automatic Update, meaning that data from all Excel tables in this workbook gets automatically propagated to the corresponding linked tables, as soon as the user switches to the PowerPivot window. In manual mode, the refresh behavior is controlled by means of the Update All or Update Selected buttons; also note that these buttons exist in both the Excel and PowerPivot windows:

Linked Tables 5

Now, to force an immediate update, simply click on the Update Selected button; this automatically switches back the focus to the PowerPivot window and triggers the refresh:

Linked Tables 6

Notice how the linked table now contains the most recent data from Excel; thanks to this live link between Excel and PowerPivot, business users can conveniently add data to their models, which can then easily be edited and maintained as a whole right from within the familiar, self-contained Excel environment.

Once a linked table has been created in a PowerPivot model, it behaves just like any other table and supports renaming any of the existing columns, adding new columns from the source or by writing DAX formulas, participating in relationships with other tables from the same or a different data source, etc.

Thanks to linked tables, sophisticated dashboards that mix and match refreshable data from sanctioned, IT-managed sources, and adhoc, volatile data that captures the business knowledge and assumptions of subject matter experts, are easier to build than ever before; for example, the below report combines data from a SQL Server relational database, a flat file, and a linked table:

Linked Tables 7

Simply put, linked tables are regular Excel tables that are surfaced in a PowerPivot model; by leveraging the editing and computing capabilities of Excel in the PowerPivot add-in, business analysts can build richer models that truly bring together the best of both worlds.

Introducing PowerPivot

At the Microsoft SharePoint Conference yesterday in Las Vegas, we announced the official name for Gemini – “Microsoft SQL Server PowerPivot” or “PowerPivot” for short. PowerPivot consists of two components – PowerPivot for Excel 2010 and PowerPivot for SharePoint 2010.

PowerPivot for Excel is a data analysis tool that delivers unmatched computational power directly within the application users already know and love — Microsoft Excel. Leveraging familiar Excel features, users can transform enormous quantities of data from virtually any source with incredible speed into meaningful information to get the answers they need in seconds. PowerPivot for Excel consists of the following components:

  • The Excel 2010 addin that delivers the seamless PowerPivot user experience integrated within Excel.
  • The VertiPaq engine that compresses and manages millions of rows of data in memory with blazing fast performance.

PowerPivot for SharePoint enables end users to effortlessly and securely share their PowerPivot applications with others and work seamlessly in the browser using Excel Services. PowerPivot for SharePoint also helps IT improve their operational efficiencies by tracking PowerPivot usage patterns over time, discovering mission-critical applications, and improving system performance by adding resources. PowerPivot for SharePoint consists of the following components:

  • PowerPivot Gallery – the Silverlight based gallery where users can share PowerPivot applications with others and visualize and interact with applications produced by others using Excel Services and Reporting Services.
  • PowerPivot Management Dashboard – the dashboard that enables IT to monitor and manage the PowerPivot for SharePoint environment.
  • PowerPivot Web Service – the “front-end” service that exposes PowerPivot data via XML/A to external applications such as Report Builder.
  • PowerPivot System Service – the “back-end” service that manages the PowerPivot application database, load balancing, usage data collection, automatic data refresh, etc.
  • Analysis Services – the Analysis Services server running the VertiPaq in-memory engine and integrated with SharePoint to load and manage the data within PowerPivot workbooks.

To stay up to speed on all things PowerPivot, here are some links you’ll want to track:

And yes, we’ll be “rebranding” this blog as well to reflect the PowerPivot name. Stay tuned…

Copy/Paste

One of the core tenets of Gemini is that the user should be able to import data from virtually any source. The blog post Importing data into Gemini talks about the steps involved in importing data from a typical relational source. Gemini supports all of the popular relational databases – SQL Server, Access, Oracle, Sybase, Informix, DB2, Teradata, etc. Gemini also supports importing from third party OLEDB, ODBC and ADO.NET providers. The blog post Data Feeds talks about importing data from Reporting Services reports and other ATOM based data feed providers. Look for even more data import options in the next CTP of Gemini. Remember that all of these data sources are refreshable, i.e. you can publish the Gemini workbook to SharePoint and schedule a recurring job to automatically refresh the data.

Despite all of the data source options available in Gemini, there will be times where a user sees some data in tabular form that is not accessible via  any data source connectivity API such as OLEDB, ODBC, ADO.NET, ATOM, etc. Suppose you are viewing a web page that contains an HTML table, for example, the exchange rates to/from US Dollars at http://moneycentral.msn.com/investor/market/exchangerates.aspx?selRegion=0&selCurrency=1.

 MSN Money Exchange Rates

In order to import this data into Gemini, you just need to highlight the table contents and copy it to the clipboard. Switching to the Gemini window, you will see that the “To New Table” button in the ribbon is now enabled. Gemini automatically recognizes that a tabular chunk of data has been copied to the clipboard. Pressing the button brings up the following dialog which shows a preview of the data that is about to be pasted. You can choose to use the first row in your data as the column names in the resulting Gemini table.

 Paste Preview 

Press the OK button and you will see a new table in the Gemini window containing the data that you pasted. This is just like any other Gemini table – you can change the table name, the column names, add calculated columns, etc.

Now suppose you’re viewing the same exchange rates web page a day later and you would like your Gemini workbook to contain the latest exchange rates. Just copy the HTML table to the clipboard like before and select exchange rates table in the Gemini window. You will notice that in addition to the “To New Table” button, the “Paste Append” and “Paste Replace” buttons are also enabled. Press the “Paste Replace” button and it will bring up the following preview dialog. Press the OK button and the data in the exchange rates table is now refreshed.

Paste Replace

You can also use the “Paste Append” option to append the data from the clipboard to the Gemini table without touching the existing rows. For both “Paste Replace” and “Paste Append”, Gemini will validate that the structure of the data in the clipboard matches that of the Gemini table. The number of columns and their data types must match. If you have renamed the columns in the Gemini table or added calculated columns, Gemini will still let you paste the data.

In summary, Copy/Paste is a quick and easy way to get data into Gemini. Most applications that deal with tabular data (Excel, Access, Internet Explorer, etc.) do support copying it into the clipboard in a tabular format, and that makes the data easily available for Gemini to paste into a table. Although Copy/Paste is temptingly easy, it is important to note that the data in the table is static, i.e. not refreshable. Hence Copy/Paste should only be used when there is no other refreshable data source option available. For example, it would be easy to Copy/Paste data out of a Reporting Services report, however the recommended way would be to use data feeds to allow automatic data refresh.

Introduction to Data Analysis Expressions (DAX) in Gemini

Gemini and DAX Extend the Power of PivotTables

Excel PivotTables are not new, but Excel PivotTables that are based on multiple tables of data are new with the Gemini add-in for Excel 2010. This new capability in Gemini is a very powerful tool for data analysis when combined with our new DAX expression language.

clip_image002

Looking at the PivotTable above, the values in the data area of this pivot come from a sales table, the customer information comes from a customer table, and the product information comes from a product table. In Excel, without Gemini, this analysis would only be possible by constructing a single flattened table with all of the information. Working with the single, flattened table in Excel introduces a variety of issues around performance, memory consumption, organization of fields in the field list, etc. More importantly, for some scenarios, a single flattened table simply isn’t possible.

What Is DAX?

DAX is an expression language based on Excel formula syntax. Because DAX is designed to work with multiple tables of data, it includes functions that implement relational database concepts. DAX also adds new functionality that allows you to create dynamic aggregations, making DAX formulas smart about calculating values in a PivotTable. Excel users will be happy to learn that DAX includes part of the existing Excel function library, and many functions are the same as Excel functions.

Here are a few simple DAX formulas to show how these look a lot like Excel formulas.

DAX formula

Comment

= [First Name] & “ “ & [Last Name]

concatenation of strings is just like in Excel

=SUM(Sales[Amount])

SUM function takes a column instead of a range of cells

=RELATED (Product[Cost])

new RELATED function follows relationship between tables

Where Is DAX Used?

When you start with a set of data tables, it is unlikely that the raw data already contains all the values that you need. You may need to perform two distinct types of calculations to generate the desired analysis: Calculated Columns and Measures.

Calculated Columns

Calculated columns in Gemini are just like the calculated columns in any Excel table. This is not a new concept.

When you enter the formula which defines a calculated column, that formula is evaluated for each cell in the column, and you immediately get a calculated column which is fully populated with values. For example, if you have one column [Qty] containing the sales quantity, and another column [PRICE] with the sales price, you might want to calculate an AMOUNT column which is simply the QTY times the PRICE. You could write this just as you would in Excel: = [QTY] * [PRICE]

The values that are created in calculated columns may subsequently be used in a PivotTable in any of several ways. A calculated column may contain:

· Numbers which are to be aggregated in the Values area of a PivotTable.

· Values which are to be placed in row labels or column labels of a PivotTable.

· Values which are to appear in a slicer and which will be used to slice the PivotTable results.

Measures

Measures in Gemini are formulas which are placed into the Values area of a PivotTable. Once they’ve been placed into the PivotTable, they are evaluated for each cell in the Values area. Each cell evaluation will be different based on the unique combination of the row labels, column labels, and filters associated with that cell.

There are two types of measures in Gemini: “implicit” measures and DAX measures

Implicit Measures: When you check the checkbox for a numeric field in the PivotTable field list, Gemini will automatically create a measure for you which is the “Sum of <numeric field >” and this measure is placed in the Values area of the PivotTable. You can change the Field Settings, to change the aggregation from SUM to COUNT, AVERAGE, MIN, or MAX. We call these “implicit” measures because all you need to do is select a field (column) from your table(s) and an aggregation (SUM, COUNT, AVERAGE, MIN, or MAX). The formula associated with those measures is implied by the selection and will always be a simple aggregation of a column of values.

DAX Measures: When you create a new measure using the New Measure dialog, you can give the measure a name and then provide any DAX formula you like. You’re not limited to simple aggregation of a column. By using various functions within the DAX expression language, you can perform some pretty sophisticated analysis. It is important to understand that when you enter the formula which defines a measure, the formula is not immediately evaluated. Gemini simply captures the name of the measure and the formula associated with it. Then, when the measure is placed into the Values area of the PivotTable, the measure will be evaluated for each cell of that PivotTable. If you look at the PivotTable at the top of this blog post, you can see that the measure “Margin” is evaluated for each combination of year and customer and customer type.

Of course, whether you check a checkbox to get an “implicit” measure, or whether you use the “New Measure” dialog to enter a DAX formula, after the measure is created, Gemini will also add that measure to the PivotTable.

Calculated Column Examples Including Some Basic DAX Functions

Here are two simple examples of calculated columns in the Sales table used in the PivotTable at the start of this post:

clip_image004

The [Amount] column is defined by a formula which could have been pasted from Excel: = [Qty] *[Sold Price]

clip_image006

The UnitCost column uses a new DAX function, the RELATED() function. RELATED() follows an existing relationship between two tables and returns a value from the related table. In this case, we are going from the Sales table to the related Product table, and returning the value from the [Cost] column in that table.

We also have a new function named RELATEDTABLE() which will follow an existing relationship and return a table which contains all the related rows from the specified table. Here’s an example of how that could be used to return the sum of all the sales for a given product, by creating a calculated column in the Product table.

clip_image008

You’ll notice that this example uses a new SUMX() function instead of the traditional SUM() function. SUMX() takes two arguments: the first argument is a table over which we will be adding up values, and the second argument is the expression to be evaluated for each row in the table. These are the values which will be added up. In this case, we’re saying that for each product in the product table, we want to get the table containing related sales (the sales for this product) and add up the numbers in the [Amount] column of that related table.

DAX Doesn’t Refer to Data by Cell Location

In DAX, we do not have the Excel notion of addressing a single cell of data. We do not identify values by where they are located in the grid. We can’t refer to cell B23 or range B12:C15. DAX functions always operate on columns of data in a table. Instead of viewing our data set as a large rectangle of cells, we think of the data as a set of tables which contain columns and rows.

DAX Aggregation Functions

DAX includes several aggregation functions from Excel including SUM, AVERAGE, MIN, MAX, COUNT, but instead of taking multiple arguments (a list of ranges,) they take a reference to a column. DAX also adds some new aggregation functions which aggregate any expression over the rows of a table.

· SUMX (Table, Expression)

· AVERAGEX (Table, Expression)

· COUNTAX (Table, Expression)

· MINX (Table, Expression)

· MAXX (Table, Expression)

DAX Table Functions

Excel has no functions which return a table. The formula which defines a calculated column must return a scalar value, so that the result can be placed into the column. But having functions which return tables of data can be very powerful when those tables are used as intermediate results which are passed as arguments to other functions (typically the new DAX aggregation functions). You’ve already seen an example of this above, when I returned a “related” table of sales transactions and then used SUMX to add up those transactions, placing a single sum into the products table.

The notion of functions which return tables may be new to Excel users, but hopefully this is a concept that will be easy to pick up. Here are a few examples of functions which return tables:

· RelatedTable (Table) returns table containing related rows of data

· Filter (Table, Condition) returns table filtered to include rows where condition is true

· Distinct (Column) returns one column table containing the distinct values in a column

DAX Measures deserve their own blog post...

The real power of data analysis using DAX formulas comes not so much from calculated columns, but from using DAX to define measures. But that’s much more complex than can be addressed in a single blog post, so I’ll dive deeper into DAX measures in a separate blog post sometime soon.

I hope you’re working with the CTP build of Gemini and that you’ll send us your feedback on Gemini, including any thoughts you may have about DAX!

Thanks!

Howie

Data Feeds

One of the keys to self-service BI is that people need the highest quality, up-to-date data for their analysis. Many people don’t have direct access to query the databases underlying the applications that they use on a day to day basis, both because of the complexity of the data and also the need for adding business logic and security on top of the raw data.

Gemini includes support for data feeds, which lets you get your data from all kinds of interesting places such as applications and data services. One of the most common places where people get their data today is Reporting Services reports. So let’s start with some great news for Gemini users.

Using a Reporting Services report as a data feed

In SQL Server 2008 R2, every single Reporting Services report can be used as a data feed for your Gemini models! This means that users can get refreshable data feeds sourced from all the places where SSRS can get data, without any further work required to make data available for Gemini.

Once you have found a useful report, you can add it to your Gemini model by clicking the new orange “Export to Data Feed” button on the toolbar. This will launch Excel and take you straight into the import wizard. If you already have Excel workbooks open, you will be prompted to either pick an open model to add the feed to, or create a brand new model.

image

The import wizard for data feeds is very similar to the wizard for other types of data. In particular, you can preview data and pick which columns from the feed to add to your model. If the data feed contains multiple tables, such as a Reporting Services report with different data regions, then you can pick which tables to add and specify table names.

image

You can also start from the Gemini client, browse your report server for a useful report and add it to your Gemini model. Whether you start from the report, or start from the Gemini client, you can add as many feeds as you need to build your model, including combining it with data from other sources (see Edward’s blog on Importing data into Gemini).

image

Using a data feed from the Internet

Reporting Services is just one of the many places where you can get data feeds. You can also use data feeds from the Internet or other sources, as long as they follow the REST and Atom model (see next section for technical details on developing a custom data feed). One really cool place for getting data is the Open Government Data Initiative (OGDI) at http://ogdisdk.cloudapp.net/ which is built on the Azure Services Platform.

clip_image002

If you know the URL of the data feed, you can use the From Data Feeds button in the Gemini client to import it. You can also set up advanced properties to have more control over how the data shows up, including choosing options like whether to include all the standard Atom columns like title and author name which are not shown by default.

Developing a custom data feed

If you are a developer, you are probably already wondering how you can enable Gemini users to get data feeds from your applications. The good news is that there are lots of easy ways to support this. The basic technology is REST and Atom, so if you already have web services support you are in good shape to get started. One of the simplest ways to build a data feed is to use the great support for syndication in WCF. You can add your custom columns to the feed using the extensibility mechanisms that they supply, see one of the OGDI feeds mentioned above for an example, such as http://ogdi.cloudapp.net/v1/gsa/ConusPerDiemRates2009/ (you may need to turn off feed reader view in your browser to be able to see the raw XML).

clip_image004

If you are developing on the .NET platform and want to build a flexible and powerful data service, then ADO.NET Data Services (“Astoria”) is your best bet. If you haven’t used this technology yet, you will be surprised at how few lines of code it takes to get a huge amount of functionality, including everything you need for data feeds in Gemini.

Other than the data feeds themselves, if you want to add the orange data feed button to allow users to launch Gemini from your application or save a data feed file for later use, then you will also have to build a service document (.atomsvc file) for your data feeds. ADO.NET Data Services includes a service document already, but if you have built your own data feed then you can use WCF’s ServiceDocument class, or otherwise the format is simple enough that you can build one yourself.

If you developed have a desktop application rather than a server application that could include REST web services, but you still want to get data into Gemini, then you have a similar option. Gemini can import data from regular text files and the clipboard so in many cases users will already be able to use your data. But, if you want to let users launch Gemini right from your application to import data, you can create local Atom XML files and a corresponding data service document and then use the Gemini launcher to provide the same kind of seamless experience you saw in Reporting Services earlier.

Sharing and finding data feeds

If you have developed a custom data feed for your company but don’t have an existing application that users would expect to use for getting data, Gemini’s SharePoint integration includes a great solution. You can add a Data Feed Library to your SharePoint site, which will allow you to publish and edit data service documents as .atomsvc files. This means that a user could do search SharePoint for a term like “Customer data”, find your data service document in a Data Feed Library, and launch into Gemini just by clicking the file to import data from your feed.
Importing data into Gemini

This time around we will take a closer look at ways of getting your data into Gemini model. In particular we will into importing data using “conventional” ways. Using variety of OLEDB/ODBC/.Net client libraries.

Choosing Data Source

To import your data click on the “From Database ” in the Gemini Client Window and enter Import Wizard.

clip_image002

Here you see long pre-populated list of data sources you can import from. If you don’t see your favorite data source, you can click choose “Others” and Import Wizard will let you build your own OLEDB connection string.

Click on the “Build” button in the “Specify a Connection String” page and Import Wizard would launch Data Link Properties dialog. Click on the “Provider” tab in the dialog and you will see even longer list of OLEDB providers registered on your machine:

clip_image003

You can then choose any of the providers, but for the interest sake choose a “Microsoft OLE DB Provider for ODBC Drivers”.

Using this option you could then import data into Gemini model using any of the ODBC drivers you have installed on your machine.

clip_image004

But what about .Net providers? There isn’t a link or any other high level selection you make. The choice for particular provider type is accessible as advanced option once you’ve selected the type of data source.

Let’s go back to the first page of the Import wizard and choose “Microsoft SQL Server” as a source. Click next and in the next page click on the “Advanced” button right below the Database name. clip_image005

The Advanced properties dialog is available for you to control fine details of your connection. You can pick and choose particular provider type from the Providers drop down box. You can change any of the provider properties in the properties window below. You can also see the complete connection string used to establish connection in the box below. If you see empty property box like this:

clip_image006

This means that you don’t have particular provider installed. Go back and install the provider you would like to use.

Choosing data to import

Once you’ve selected a datasource you would see a choice to import data by ether selecting list of tables or by writing your own SQL query.

In CTP2 Gemini Import Wizard does not give you a SQL Query builder. You simply get a text box where you can either write or paste a query you’ve constructed elsewhere. Gemini Import Wizard will try to validate the query and will not let you proceed if you have a syntax error.

If you choose to go with Tables selection, you would see a “Select Tables and Views” page

clip_image008

Here you can pick one or many tables. “Select Related Tables” will cause Gemini Import Wizard to select all tables related to the once you’ve already selected if this information is present in the data source. Import Wizard will also remember table relationships and create these between tables in the Gemini model.

You can also preview, explicitly select the columns, and filter content of particular table

clip_image010

Don’t worry if you don’t see entire content of your table- Preview dialog is only showing the first 50 rows using something like “Select Top 50 <columns> from <table>”.

This dialog also lets you to delete any columns you don’t need and set filters to limit amount of data you’re bringing into Gemini.

It is very good idea to restrict the import to just the data you need. Large number of unnecessary columns or rows will take up unnecessary space in the Gemini model.

Once you’ve made your choice, the last page – Import Summary, lets you review your choices. Just in case you forgot something and need to go back before import starts.

clip_image012

Clicking on the finish button would bring up the Progress Dialog and would start actual import of the data.

clip_image014

Here you see number of rows imported per table and if any errors occurred during the import you could see an error message by following a link that appears in the Message column. All tables are imported serially and speed of data import largely depends on the ability of your data source to deliver the data. While importing large tables you would see progress showing count of rows imported (in increments of 10,000).

The last row in the progress report – “Data preparation” is the stage where Gemini creates relationships between tables.

If you forgot something or import takes too long, you can cancel import. In this case all tables that show “Success” will be imported and any following tables would show error. Also the ability of Gemini import wizard to cancel import operation might be dependent on ability of the relational backend to cancel the query Gemini sends to it. Imagine you’ve constructed complex SQL query to bring data into Gemini and it takes awhile for relational database to process it. If you try and cancel out of import operation, the Gemini Import Wizard would forward cancel request to the relational database but that might take longer to respond, keeping you waiting.

Conclusion.

We discussed various details on how you can import data from various data sources into the Gemini model. If you’ve worked with Excel or other data modeling tools – data import in Gemini should be quite intuitive. Some of the concepts of the importing data are still little rough. But this is the first CTP and we will try to improve on data import experiences further. In the meantime, if you have any specific data import questions or observations, please give us feedback on the Gemini newsgroups.

Gemini Blogs & Videos

It’s been almost a month since we released the first CTP of Gemini and there has been a lot of activity on the newsgroups and in the blogosphere. Here are some interesting blog links…

Kasper de Jonge (http://business-intelligence.kdejonge.net/)

Thomas Ivarsson (http://thomasianalytics.spaces.live.com/)

Chris Webb (http://cwebbbi.spaces.live.com/)

Vidas Matelis (http://www.ssas-info.com/VidasMatelisBlog/)

Marco Russo (http://sqlblog.com/blogs/marco_russo/)

Also check out Donald Farmer’s blog (http://www.beyeblogs.com/donaldfarmer/). It has a somewhat old post (Microsoft Project Gemini links) with lots of links to articles, press reports, analyst reports and blogs about Gemini.

Excel 2010 is a key part of the Gemini experience. Check out the Excel Team Blog (http://blogs.msdn.com/excel) for information about the cool features in Excel 2010 such as slicers, sparklines, improved data bars and charts, etc.

Got a minute? Watch a geminute at http://www.youtube.com/geminute. This is Donald Farmer’s new YouTube channel featuring one minute videos on Gemini. Also check out http://www.microsoft.com/officebusiness/office2010/Default.aspx?vid=Geminiv for an interesting video that highlights the Gemini value proposition.

That’s it for now.

Overview of Gemini features

The key question behind Gemini is: how much value can an Excel user derive from Microsoft’s BI stack without having to pick up a book on OLAP, Dimensional Modeling or Analysis Services?

Significant, as it turns out.

This article provides a quick overview of Gemini features, pausing on the way to point out key design goals. Note that the content of this article is based on August Community Technology Preview (CTP) and will likely change in the future.

Introducing the Gemini Addin for Excel

After installation, Gemini shows up on the Excel 2010 ribbon:

clip_image002

Clicking on ‘Load & Prepare Data’ button launches the Gemini addin window:

clip_image004

Let’s pause here to recognize how Gemini and Excel complement each other:

  • Gemini addin provides an embedded data engine that can:
    • efficiently hold and operate on data sets larger than those possible with Excel
    • import data from a variety of data sources
    • allow modeling operations such as specification of relationships between tables, and formulas expressing business logic
  • Gemini addin also provides a Gemini Client Window (shown above) which is an Excel-oriented user interface over the embedded data engine
  • Excel provides powerful visualization capabilities such as PivotTables, PivotCharts and Slicers. Together with the Gemini addin, these visualizations can also use Gemini data. Remember that with Excel Services, these are also available in a thin browser environment!
  • An Excel document is a great way of packaging up both data and visualization. This means everything you see in the Gemini addin window is saved inside your workbook to ease management and sharing. No separate files to worry about keeping in sync, no separate services to manage.

Importing Data into Gemini

Of course it all starts from data, and because of that, Gemini enables you to bring in data from a variety of different data sources. Some of these are shown below, future CTPs will likely allow you to access other types of data as well:

clip_image005

Based on the type of data source, you can select from a list of tables, preview data, select and filter columns to import, and/or specify a query to use for the import.

Copy/Paste

Users frequently want to combine their data with a small dataset they got in an email, found on the web, or just typed in themselves. As you would expect, Gemini addin allows mashing up this with the rest of the data in Gemini. If the source data is coming from the host Excel document, Gemini also allows created a link between to keep this data in sync.

Data Feeds

In addition to the structured data sources that you expect Gemini would support, Gemini also supports reading data from Atom-based Data Feeds or from SQL Server Reporting Services 2008R2 reports. The beauty of these is that consumer don’t need to have direct access to the underlying data sources and don’t have to replicate any logic built into the report definition either… they just point and consume and data feeds show up as any other table in Gemini!

image

Working with Tables

Once tables have been imported into Gemini, they show up as tabs in the Client Window:

image

If the data source contained relationship between these tables, Gemini data import process would pick them up automatically. Otherwise, you can create these manually:

Why would you want to specify these relationships? Knowledge of relationships allows Gemini to navigate and group data correctly, and this prevents users from having to bring in all columns from related tables into one very wide conglomerate.

Calculations

Imported data is a good starting point. While we’ll have a more drill down into formulas in the future, from an overview perspective, it’s important to recognize that a design point of Gemini is to allow customers to express their business logic without requiring knowledge of dimensional modeling. Excel users expect the following simple formulas to work:

=if([Gross]>100000000, “Blockbuster”, if([Gross]>10000000, “Normal”, “Tragic”))

… and it does with Gemini.

Architecturally, Gemini calculation engine does not push its data to Excel to recalculate. That would take too long and given the Gemini data engine can support data sets much larger than what Excel does natively, could run into scalability limits. Instead, Gemini implements significant number of Excel functions itself. Here’s a snippet from the August CTP list:

clip_image006

You’ll notice that some familiar functions – ABS, AVERAGE, AVERAGEA – show up as expected by Excel users. Some others – ALL, ALLEXCEPT, AVERAGEX – are new. While building Gemini, we wanted to create a tool that’s easy to start with, but kept up when demands grow and users want to push further. Here’s a quick example:

Let’s say we have a database that contains a list of movies in a Movie table and box office sales in the BoxOffice table. Of course, a particular movie may have BoxOffice sales across years, and the user wants to find out what was the total money made by a movie across these years. How would we do it in Gemini? Well, once a relationship has been created between these two tables, all we have to do is for each row in the Movie table, go across to the BoxOffice, restrict the rows for just that movie, add up the Gross column for those rows and return results. Then repeat for all movies. Sounds complicated? Not if you know about the RELATEDTABLE and SUMX functions:

=SUMX(RELATEDTABLE(BoxOffice),BoxOffice[Gross])

We’ll cover this in more depth in the future.

Using Excel Visualizations

Importing and modeling is very useful to shape the data perfectly. In order to allow users to create rich, interactive, BI applications, the Gemini experience depends on Excel for the next step: visualizations.

Excel has a robust arsenal of tools in this area: PivotTables, PivotCharts, and numerous configuration options for these. Excel 2010 also introduces Slicers which provide both slicing of data in other connected controls as well as feedback to other Slicers based on data. For example, if we didn’t have any sales in Canada for 2005, after selecting Canada, 2005 would automatically be colored to reflect the fact that there’s no data there:

image 

Continuing with the principle that end users shouldn’t have to worry about what’s a dimension and what’s a measure and why can I use one as the other, Gemini addin also overrides the PivotTable field list with one that we believe is much easier. Note again the tight integration between Gemini and Excel, Slicers show up as drop areas in the field list:

image

Building a Rich, Appealing BI Application

After spending a few minutes formatting and laying out controls just the way we want them, we end up with a very functional, self-contained workbook which acts as an application:

image

 

Sharing Gemini Applications

While many workbooks are built for personal use, some are worthy of being shared across a workgroup. Here again, Gemini works the way Office users do. Since Gemini data is stored within an Excel document file, any way to move that document – through file shares, emails, publishing to SharePoint, etc. – transport the Gemini contents along as well. Users without the Gemini addin can browse the data, those with the addin get the full experience. Just as Excel and Gemini light up together, Gemini also extends SharePoint capabilities in several ways.

Report Gallery

For the more visually-inclined amongst us, a flat SharePoint list leaves something to be desired. File names, data last updated and by who are useful but only tell part of the story. Gemini provides Silverlight based skins that present different views on document libraries. These views show snapshots of the contents of documents. In the example below, we see two workbooks with two spreadsheets within them:

image 

These snapshots are also live links in that clicking on a thumbnail of the a worksheet will take users directly into ECS with the worksheet loaded.

Scheduled Data Refresh

The Gemini model embedded within the spreadsheet keeps information about where data came from. Once published to SharePoint, users can specify schedules for the data refresh operation so the workbooks use the resources of the server to stay fresh.

 

Using Gemini Applications as Data Sources

Once published to SharePoint, Gemini models embedded within workbooks appear as an Analysis Services databases! This means any AS client tool – Excel, Report Builder, etc. – can connect to this database as if it were on just another AS server. The only difference for these clients is use of a URL to the document stored in SharePoint instead of a server name. Gemini services running on SharePoint handle loading the right database, managing its lifetime, and transparently redirecting client queries to the right database on the right server.

image

Summary

This was a whirlwind tour of Gemini functionality as it is available in August CTP. This link describes how you can get your own hands on the CTP.

In the next few weeks we’ll dig in deeper to specific areas. In the meantime, keep the feedback coming on the Gemini newsgroups!

Getting Access to the Gemini CTP

As indicated in the previous blog post, you can get access to the Gemini CTP if you are a member of the Office 2010 Tech Preview by visiting http://www.microsoft.com/officebusiness/office2010/Default.aspx?vid=Gemini.

If you are not a member of the Office Tech Preview, you can still apply for the Gemini CTP. We have a limited number of seats that we can refer to the Office Tech Preview, so hurry! Here are the instructions:

  • Visit this link in your browser.
  • If you are not registered in Microsoft Connect, you will need to register by supplying a Windows Live ID and some basic information about yourself.
  • Once you are registered, the above link will take you to the SQL Server Connect site within MSConnect.
  • At the top of the page, you should see “Fill out the Gemini CTP2 Nomination Profile – Required” in red. You don’t need to register again. Click on the second link and it will take you the Gemini Nomination page.
  • Click on “Respond to this survey”.
  • Complete the survey and press Finish.

That’s it! We will review your application and if it meets our criteria and Office Tech Preview seats are still available, we will contact you by email.

Welcome to the Gemini Team Blog!

Gemini is the code name for the new breakthrough Self-Service Business Intelligence (BI) capabilities being delivered in the SQL Server 2008 R2 release. Gemini enables end users to build BI applications by integrating data from a variety of sources, modeling, refining and analyzing the data, adding business logic, building reports and visualizations and ultimately sharing it with their coworkers in an environment that is managed and secured by IT. See SQL Server 2008 R2 | Self-Service Business Intelligence for more information.

Following is a sneak preview of Gemini that Donald Farmer (from our team) presented at TechEd 2009 this summer.

 

  

 

The Gemini team is proud to announce our first preview release as part of the SQL Server 2008 R2 August CTP (Community Technology Preview). See the Data Platform Insider Blog for more information on the August CTP. To download the Gemini CTP, visit http://www.microsoft.com/officebusiness/office2010/Default.aspx?vid=Gemini. Gemini requires Office 2010, so you must be a member of the Office 2010 Tech Preview.

In the days, weeks and months to come, folks from the Gemini product team will be blogging here and sharing information about the product.

Enjoy!

Page view tracker