MSDN Blogs

When jokes become reality

A couple of years ago Charles Petzold wrote about C# Application Markup Language, an XML syntax for C#. Go read it if you haven't seen it yet.

Well, it was an April Fools' Day joke, of course (the publication date at the bottom hints at it).

But yesterday, while playing with Windows Workflow Foundation rules, I looked at the ruleset file and discovered exactly that: a huge XML file that looked familiar :), very close to Charles' joke. This is how WWF internally stores the simple one-line rules that one types in the editor. Luckily for us, we don't have to type this XML; WWF converts the human-readable C# into this monstrosity itself, so it is more an internal detail than a language humans would ever use.

Posted by michen

The way NOT to write an HTTPS server

Note: this is posted under the rant tag, so you have been warned ;)

I've just got a new Wi-Fi router, a DIR-655 from DLink. It seems like a nice router, but at least one feature is just plain horribly broken. And it is a security feature, which makes me wonder how secure the rest of the code is.

The router can be accessed and managed through its built-in HTTP server, like almost any other router. It also has an option to enable an HTTPS server, which was presumably meant to make management more secure. This is not very important if you connect from the local network, but it is very important if you do remote management over the internet.

So I decided to enable it and connect using HTTPS. Internet Explorer immediately complained that there were problems with the certificate and advised me not to proceed :). IE warned me that (1) the certificate could not be validated, (2) the certificate had expired, and (3) the certificate was issued to a different site. I proceeded nevertheless and checked the certificate, and indeed: it is self-signed, expired in September 2008, and issued to www.dlink.com, which is obviously different from 192.168.0.1 :). Worse, I asked other people with the same router to compare certificate hashes, and it turned out that all the routers ship with the same certificate!

I can understand why the certificate is self-signed: obviously DLink cannot put a real certificate on the router. But why has it expired, why is it shared by all routers, and, most bizarre of all, why does it name www.dlink.com as the site?

They could easily have generated an individual certificate for each router, issued to the correct internal IP address for the internal-facing server and to the DynDNS name for the internet-facing server. It does not add any hardware cost; the software could simply generate a random self-signed certificate the very first time the router boots with a new configuration. The user could then configure the browser to trust this particular certificate and know that he is connecting to his own router, not to any of the thousands of other routers that today share the same certificate.
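To make the first-boot idea concrete, here is a minimal C# sketch using .NET's `CertificateRequest` API (available in .NET Framework 4.7.2+ and .NET Core 2.0+). Router firmware obviously would not run .NET; the class name `DeviceCertificate`, the 2048-bit RSA key, the validity period and the DynDNS host name in the usage note are all assumptions for illustration. The point is simply that each call produces a fresh key pair and a certificate naming the address the device is actually reached at.

```csharp
using System;
using System.Net;
using System.Security.Cryptography;
using System.Security.Cryptography.X509Certificates;

// Hypothetical helper: creates a per-device self-signed certificate.
static class DeviceCertificate
{
    public static X509Certificate2 Create(IPAddress lanAddress, string dynDnsName, int validYears)
    {
        using (RSA key = RSA.Create(2048))   // new random key pair on every call
        {
            var request = new CertificateRequest(
                "CN=" + lanAddress, key, HashAlgorithmName.SHA256, RSASignaturePadding.Pkcs1);

            // Browsers match the Subject Alternative Name, so list every name the device answers to.
            var san = new SubjectAlternativeNameBuilder();
            san.AddIpAddress(lanAddress);            // internal-facing server
            if (!string.IsNullOrEmpty(dynDnsName))
                san.AddDnsName(dynDnsName);          // internet-facing server
            request.CertificateExtensions.Add(san.Build());

            DateTimeOffset now = DateTimeOffset.UtcNow;
            return request.CreateSelfSigned(now.AddDays(-1), now.AddYears(validYears));
        }
    }
}
```

For example, `DeviceCertificate.Create(IPAddress.Parse("192.168.0.1"), "myrouter.example.org", 10)` yields a certificate whose thumbprint differs from every other router's, so trusting it in the browser really does identify one specific device.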

The way they implemented this feature, it is totally broken and makes no sense at all.

Posted by michen

Does buffer.NextRow() skip the first row in a buffer?

I got a follow-up question to my old post about enumerating rows in an SSIS buffer, which suggested using the following code to process rows in a custom SSIS transform:

while (buffer.NextRow())
{
    // do something with the row
}

Here is the question:

buffer.NextRow() moves the pointer forward, so following your code to the letter will make you skip the first row.

Does it? Actually, no. It behaves like many other enumeration interfaces, e.g. COM's IEnumVARIANT and .NET's IEnumerator, so I'll quote the documentation for IEnumerator.MoveNext:

After an enumerator is created or after the Reset method is called, an enumerator is positioned before the first element of the collection, and the first call to the MoveNext method moves the enumerator over the first element of the collection.

Can you guess why it was designed this way? Let's think about what would happen if the enumerator were initially positioned on the first element (the first row, in the SSIS case). How would we know whether the first element exists at all? The enumerator would have to provide another property, like EndOfCollection, making everything more complicated. Note that, as explained in my linked post, buffer.EndOfRowset is not such an indicator. It does not tell you that the enumerator has finished enumerating rows in the current buffer; it tells you that the current buffer is the very last buffer you will receive.

With the enumerator initially positioned before the first row, the first call to NextRow() immediately returns false if the buffer is empty, and positions the enumerator on the first row if it exists. So the code is correct.
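The same contract is easy to observe with plain .NET `IEnumerator<T>`. The sketch below does not use the SSIS `PipelineBuffer`; the class and method names are made up for illustration. It shows that a `while (MoveNext())` loop neither skips the first element nor needs a separate end-of-collection check, because the first `MoveNext()` call is what answers "is there a first element?".

```csharp
using System;
using System.Collections.Generic;

static class EnumeratorDemo
{
    // Same shape as the "while (buffer.NextRow())" loop.
    public static int CountItems<T>(IEnumerable<T> source)
    {
        int count = 0;
        using (IEnumerator<T> e = source.GetEnumerator())
        {
            // The enumerator starts *before* the first element, so the first
            // MoveNext() either lands on element 0 or returns false when empty.
            while (e.MoveNext())
                count++;
        }
        return count;
    }

    // Reads the first element, proving the first MoveNext() does not skip it.
    public static T FirstOrFallback<T>(IEnumerable<T> source, T fallback)
    {
        using (IEnumerator<T> e = source.GetEnumerator())
            return e.MoveNext() ? e.Current : fallback;
    }
}
```

An empty collection gives a count of 0 without ever touching `Current`, and `FirstOrFallback(new[] { 10, 20 }, -1)` returns 10, the very first element.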

A note regarding my original post: the SSIS team found that this change caused too many problems for users, so the final release of SSIS 2008 reverted to the SSIS 2005 behavior. Interestingly, this was not a simple undo of a code change, since the data flow engine had been substantially rewritten; it was new code that simulates the old behavior. Thus the "wrong" code will keep working in 2008. I still recommend changing it according to my previous post; I think my loop just looks cleaner :).

Posted by michen

It's Not Easy Being BI

A very funny post by Matthew Roche, for everybody working in Business Intelligence:

http://bi-polar23.blogspot.com/2008/06/it-not-easy-being-bi.html

Posted by michen

Lookup multiple rows?

Can SSIS Lookup do what this user wants it to do?

I have a problem with a lookup output, I get this warning: The Lookup transformation encountered duplicate reference key values when caching reference data. I know what it is, but I don't like to avoid this warning, I'd like to get all the rows (two in this case) that the lookup output provides me.

Unfortunately, no. The reason is that the Lookup transform is synchronous: it does not add or remove rows*, it only modifies column values, so it cannot produce two output rows for one input row.

It would of course be possible to make an asynchronous Lookup, or to provide an option for this, but the current Lookup is complex enough that I think more options would kill it :)

If you need this functionality, you can use the Merge Join transform.
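The difference between the two semantics can be sketched outside SSIS with LINQ. This is an illustrative model only (the names `LookupVsJoin`, `LookupFirst` and `InnerJoin` are hypothetical, and it does not reproduce SSIS internals): a synchronous lookup emits exactly one output row per input row, so with duplicate reference keys it can use only one match (this sketch keeps the first), while a join emits one row per matching pair and can therefore fan an input row out into several.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class LookupVsJoin
{
    // Lookup-like: one output row per input row; duplicate reference keys are ignored.
    public static List<(int Key, string Match)> LookupFirst(
        IEnumerable<int> input, IEnumerable<(int Key, string Value)> reference)
    {
        var cache = new Dictionary<int, string>();
        foreach (var r in reference)
            if (!cache.ContainsKey(r.Key))   // keep only the first row for each key
                cache[r.Key] = r.Value;

        return input
            .Select(k => (Key: k, Match: cache.TryGetValue(k, out var v) ? v : null))
            .ToList();
    }

    // Join-like: one output row per matching (input, reference) pair.
    public static List<(int Key, string Match)> InnerJoin(
        IEnumerable<int> input, IEnumerable<(int Key, string Value)> reference)
    {
        return input
            .Join(reference, k => k, r => r.Key, (k, r) => (Key: k, Match: r.Value))
            .ToList();
    }
}
```

With reference rows (1, "a"), (1, "b"), (2, "c") and input keys 1 and 2, the lookup produces 2 rows while the join produces 3. Note that in SSIS the Merge Join transform additionally requires both inputs to be sorted on the join key.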

Notes
*What happens to the rows that are redirected to the "not found" output in SSIS 2008? They are not deleted from the buffer (a synchronous transform can't do that); they are just marked as belonging to the other path, so the components on the main path do not see them.

Posted by michen

Application termination when user logs off

Do you know how Windows terminates all the applications when the user logs off? I had not thought much about this, and assumed it was the normal process: after all the WM_QUERYENDSESSION and WM_ENDSESSION processing, the application's main window closes, posts WM_QUIT, and the application quits in the regular way.

But a recent bug reported against one of my GUI applications made me look deeper. The application settings, normally saved when the application exits, were not saved if the user logged off, so I took a closer look at what happens at logoff.

This was a managed application, and the relevant code in its Main() function looked like this:

static void Main()
{
    LoadPreferences();
    Application.Run(new MainForm());
    SavePreferences();
}

This works great when the application is closed normally. But if the user logs off, the application closes without saving the preferences: the code following Application.Run is never executed at all.

What happens? Does Application.Run throw an exception that causes the following code to be bypassed? The debugger ruled this out rather quickly. I then assumed that something in Windows Forms calls ExitProcess in response to WM_ENDSESSION, or maybe the default message handler does, but that was proved wrong too. A repro with unmanaged ATL code showed that the behavior is not related to the managed libraries at all.

Finally my colleague debugged this issue a little deeper and found that CSRSS bluntly terminates the process after it has processed the WM_ENDSESSION message. Here is an outline of the whole sequence (for non-console applications).

The user selects Start/Log off and then selects “OK” in the confirmation dialog. This calls ExitWindowsEx(EWX_LOGOFF); if the CTRL key was down during the OK button click, it also adds the EWX_FORCE flag, which makes it ignore the result of the WM_QUERYENDSESSION message (as if all applications had returned TRUE). ExitWindowsEx(EWX_LOGOFF) causes roughly the following activity in the logon session’s instance of CSRSS:

For each process in the session
{
    For each UI thread in the process
    {
        For each top-level window in the thread
        {
            Send WM_QUERYENDSESSION and get back the result
            Wait with a short timeout and show an “End program” dialog if it takes too long; if the user says “kill”, call TerminateProcess on the process and continue the “for process” loop
            If EWX_FORCE was not specified and WM_QUERYENDSESSION returned FALSE, break (and continue the “UI thread” loop)
        }
        // Note that even if one window in one thread returned FALSE to WM_QUERYENDSESSION, windows in the other threads in the same process are still sent WM_QUERYENDSESSION
    }
    bool bDoShutdown = (all threads agreed (i.e., all windows returned TRUE to WM_QUERYENDSESSION)) or EWX_FORCE
    For each UI thread in the process
    {
        For each top-level window in the thread
        {
            Send WM_ENDSESSION(bDoShutdown)
            If (bDoShutdown) wait for the return (wait with a short timeout and show an “End program” dialog if it takes too long; whatever the user says, consider that the return from WM_QUERYENDSESSION)
        }
    }
    If (bDoShutdown) TerminateProcess on the process
}
I have omitted some details, in particular those related to processes without UI and to console processes, but the general picture should be clear. Presumably the Windows team decided that graceful cleanup is not needed when the user logs off and all the applications are being closed anyway.

Since the application is forcefully terminated, it should not rely on being able to execute any code after the message loop. Even objects like SafeHandle are not finalized. All really important cleanup and termination code should be executed when the main form closes, or in a handler for the WM_ENDSESSION message. Another option (although I did not try it) is to catch the WM_ENDSESSION message, terminate the message loop, and then exit the application gracefully, although this goes against the Windows design of killing applications fast to ensure a quick logoff.
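A minimal Windows Forms sketch of the WM_ENDSESSION approach, assuming the `LoadPreferences`/`SavePreferences` placeholders from the `Main()` example above (their bodies here are stubs). The form overrides `WndProc` and saves when it receives WM_ENDSESSION (0x0016 in WinUser.h) with a non-zero wParam, which means the session really is ending and the process may be terminated as soon as the message returns. It also saves in `OnFormClosed` for the normal exit path, so nothing depends on code after `Application.Run`.

```csharp
using System;
using System.Windows.Forms;

class MainForm : Form
{
    const int WM_ENDSESSION = 0x0016;

    protected override void WndProc(ref Message m)
    {
        // wParam != 0: the session is ending. CSRSS may call TerminateProcess
        // right after this message is processed, so save now.
        if (m.Msg == WM_ENDSESSION && m.WParam != IntPtr.Zero)
            Program.SavePreferences();

        base.WndProc(ref m);
    }

    protected override void OnFormClosed(FormClosedEventArgs e)
    {
        // Normal close: save here instead of after Application.Run.
        Program.SavePreferences();
        base.OnFormClosed(e);
    }
}

static class Program
{
    [STAThread]
    static void Main()
    {
        LoadPreferences();
        Application.Run(new MainForm());
        // Nothing important after this line: it does not run at logoff.
    }

    static void LoadPreferences() { /* placeholder: read the settings file */ }

    internal static void SavePreferences() { /* placeholder: write the settings file */ }
}
```

Keeping `SavePreferences` idempotent makes it harmless if both paths happen to run.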

Update: Raymond Chen described the reasons for this behavior:
http://blogs.msdn.com/oldnewthing/archive/2008/04/21/8413175.aspx

Posted by michen

Configuring .NET for running SSIS packages from custom applications

If you execute SSIS packages from custom applications, you own the application, and thus you are responsible for configuring the .NET runtime properly to get maximum performance.

.NET configuration is usually done through .exe.config files, so it is just a matter of providing a good config file. How do you know what is good? The simplest way is to look at the config file that SSIS provides and copy the appropriate settings. Let's take a look at the DTExec.exe.config provided with SQL Server 2008 and discuss the choices made by the SSIS team.

<configuration>
  <startup>
    <requiredRuntime version="v2.0.50727"/>
  </startup>
  <runtime>
    <gcServer enabled="true"/>
    <disableCommitThreadStack enabled="true"/>
    <generatePublisherEvidence enabled="false"/>
  </runtime>
</configuration>

requiredRuntime - this line simply specifies which version of the .NET runtime you want to use. 2.0 or above should be good.

gcServer - DTEXEC uses the "server" version of the garbage collector. Server GC performs better for a typical SSIS load, especially on multiprocessor machines. This is important if your SSIS package uses managed transforms (e.g. ADO.NET source, script transform, or a custom transform written in .NET).

disableCommitThreadStack - by default .NET commits the thread stacks (i.e. reserves memory for the stacks, usually backed by the page file). But SSIS creates a lot of threads while typically using little stack space, so it performs better if stack memory is not committed immediately when a thread is created. With this option your application might perform a bit better and require a smaller page file (note that the page file is not actually used until needed anyway). The drawback of this choice is that the application might fail if Windows is totally out of memory and the application can't extend its stack. But in that situation something has already gone badly wrong, and it is probably better to fail fast anyway.

generatePublisherEvidence - this tells the .NET runtime that DTEXEC does not use Publisher evidence, so .NET does not have to verify Authenticode signatures. This improves startup performance a little, but mainly it prevents problems that may occur when Authenticode checks the certificate revocation list.

Now that you know the choices made by the SSIS team, you can test whether they are appropriate for your application as well, and copy them to your application's config file if needed.
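When testing, it helps to confirm which settings the runtime actually picked up. A small sketch (the class name `RuntimeSettingsCheck` is hypothetical) that reports `GCSettings.IsServerGC` together with the processor count; the processor count matters because the runtime uses workstation GC on a machine with a single logical CPU regardless of the `gcServer` setting. It does not check `disableCommitThreadStack` or `generatePublisherEvidence`, which have no comparable public property.

```csharp
using System;
using System.Runtime;

static class RuntimeSettingsCheck
{
    // One-line summary of the GC mode the runtime actually selected.
    // IsServerGC can be false even with gcServer enabled="true" on a
    // single-processor machine, because the setting is ignored there.
    public static string Describe()
    {
        return string.Format("Server GC: {0}, processors: {1}",
            GCSettings.IsServerGC, Environment.ProcessorCount);
    }
}
```

Logging `RuntimeSettingsCheck.Describe()` once at startup is a cheap way to confirm that the .exe.config next to your executable is really being read.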

P.S. Also make sure you create SSIS packages on an MTA thread; see Matt's blog for details:
http://blogs.msdn.com/mattm/archive/2007/09/14/running-packages-from-custom-applications.aspx
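A minimal console sketch of that advice, assuming a reference to the SSIS managed runtime assembly (Microsoft.SqlServer.ManagedDTS.dll) and that the first command-line argument is the path of a .dtsx file. `[MTAThread]` makes the main thread a multithreaded apartment before any SSIS object is created; `Application.LoadPackage` and `Package.Execute` are the standard SSIS runtime calls for loading and running a package from a file.

```csharp
using System;
using Microsoft.SqlServer.Dts.Runtime;   // SSIS managed runtime

static class PackageRunner
{
    [MTAThread]   // create and run the package on an MTA thread
    static int Main(string[] args)
    {
        var app = new Application();
        // args[0] is assumed to be the path of a .dtsx file; no event sink is passed.
        Package package = app.LoadPackage(args[0], null);
        DTSExecResult result = package.Execute();

        Console.WriteLine("Package result: " + result);
        return result == DTSExecResult.Success ? 0 : 1;
    }
}
```

In an application whose main thread must stay STA (for example Windows Forms), one option is to run the package on a separate `Thread` after calling `SetApartmentState(ApartmentState.MTA)` on it.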

Posted by michen

Random() is only random if you are using it right

I like the quote "With great power comes great responsibility" when applied to .NET: .NET gives you great power, but you need to use it wisely and know how this stuff works.

Recently I saw code (written by a candidate interviewing for our team) that demonstrated an interesting problem caused by incorrect usage of the Random class. The code generated a random position for an object and tested whether it satisfied some condition; if it did not, it generated another random position, tested the new position, and so on until it generated one that worked. The code to generate each position was pretty simple: create a new instance of the Random class and use it to generate 4 random integers. The percentage of failed positions was not very big (but noticeable), the code to test a position was rather fast, and we needed just 10 good positions to create a complete configuration (all this was about a variant of the Battleship game).

But it took very long to generate the whole configuration - about a second on average.

Playing with this code, I found that moving the Random object to a member variable, so that it is reused rather than creating a new Random object for each position, made the code 10000 times faster! Obviously I expected some savings since it creates fewer objects, but ten thousand times? My first reaction was: is Random really such a heavy object?

It turned out the problem was more interesting: it was caused by the use of the default constructor. As MSDN puts it, if you use the default constructor the "seed is initialized to a value based on the current time." It actually uses GetTickCount() as the seed. Can you spot the problem now? The Random() constructor was very quick, but during the period when GetTickCount() returns the same value (about 20ms on most chipsets), all the Random objects created this way have the same seed and generate the same positions! Instead of trying new positions, the code tried the same position again and again until GetTickCount() returned a new value. So don't create too many Random instances; a single one will behave much better in most cases.
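The effect can be reproduced deterministically by passing the same explicit seed to two instances, which is what happens implicitly when `new Random()` is called twice within one tick on .NET Framework (newer runtimes such as .NET Core no longer seed the default constructor from the tick count). The sketch also shows the fix: one shared instance reused for every position. The class name, method names and the width/height parameters are illustrative only.

```csharp
using System;
using System.Linq;

static class RandomSeedDemo
{
    // Two generators with the same seed produce exactly the same sequence.
    public static bool SameSeedSameSequence(int seed, int count)
    {
        var a = new Random(seed);
        var b = new Random(seed);
        return Enumerable.Range(0, count).All(_ => a.Next() == b.Next());
    }

    // Preferred pattern: a single generator reused for every position.
    static readonly Random Generator = new Random();

    public static (int X, int Y) NextPosition(int width, int height)
    {
        return (Generator.Next(width), Generator.Next(height));
    }
}
```

One caveat: `Random` is not thread-safe, so if positions are generated from several threads, use one instance per thread or protect the shared one with a lock.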

Posted by michen

SSIS event handler threading

I've got an interesting question/statement about event handlers:

Tasks fire the same EH at the same time. My understanding is all EHs fire at the same time (Parallel).

If I understand the question correctly, the package has an event handler that can handle multiple events, and these events fire at about the same time. What happens?

Well, I haven't worked much with event handlers, so let's experiment and test how it works. This is really simple. I created a package with two dummy script tasks, A and B (not connected by any precedence constraints), that fail (by returning failure), and an OnError event handler on the package. The event handler contains a simple script task that prints the name of the event source:

MsgBox(Dts.Variables("SourceName").Value, MsgBoxStyle.Information, "Event Handler")

What will be the result of executing this? Will we get two message boxes at the same time, or two message boxes one after another?

The answer: two message boxes pop up sequentially, one after another, but never at the same time. Sometimes A is first, sometimes B. The event handler (like any other task) is not reentrant, and only a single instance of an event handler can run at a time, so the second event has to wait until the event handler finishes processing the first one.
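The observed behavior can be modeled in a few lines of C#. This is a toy model, not SSIS code, and the class name `SerializedEventHandler` is hypothetical: two "tasks" raise OnError concurrently, but a lock lets only one handler invocation run at a time, so the second event waits, and which one goes first depends on thread scheduling, just like A and B in the experiment.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class SerializedEventHandler
{
    readonly object gate = new object();
    int running;      // invocations currently inside the handler
    int maxRunning;   // highest concurrency observed

    public int MaxConcurrency { get { lock (gate) return maxRunning; } }

    public void OnError(string sourceName)
    {
        lock (gate)   // non-reentrant: one invocation at a time
        {
            running++;
            maxRunning = Math.Max(maxRunning, running);
            Thread.Sleep(50);   // stands in for the message box waiting on the user
            Console.WriteLine("OnError from " + sourceName);
            running--;
        }
    }

    // Tasks A and B fail at about the same time and both raise OnError.
    public static int Run()
    {
        var handler = new SerializedEventHandler();
        Task a = Task.Run(() => handler.OnError("A"));
        Task b = Task.Run(() => handler.OnError("B"));
        Task.WaitAll(a, b);
        return handler.MaxConcurrency;
    }
}
```

`Run()` always reports a maximum concurrency of 1, while the order of the "A" and "B" lines can change from run to run.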

Posted by michen

Functional sort in C#

On an internal mailing list, we were discussing functional languages, and this Haskell sort code:

qsort []     = []
qsort (x:xs) = qsort (filter (< x) xs) ++ [x] ++ qsort (filter (>= x) xs)

While trying to explain how this code works (which is very different from what it looks like to C++/C# programmers, due to lazy evaluation), I came up with the following C# code (using LINQ) that is logically similar to the Haskell version. Obviously, the Haskell code is much nicer, though.

using System;
using System.Collections.Generic;
using System.Linq;

static IEnumerable<T> Qsort<T>(IEnumerable<T> s) where T : IComparable<T>
{
    using (IEnumerator<T> e = s.GetEnumerator())
    {
        if (e.MoveNext())
        {
            T x = e.Current;   // the pivot, like x in (x:xs)

            // qsort (filter (< x) xs)
            foreach (T t in Qsort(s.Skip(1).Where(y => y.CompareTo(x) < 0)))
                yield return t;

            yield return x;

            // qsort (filter (>= x) xs)
            foreach (T t in Qsort(s.Skip(1).Where(y => y.CompareTo(x) >= 0)))
                yield return t;
        }
    }
}
Posted by michen

SQL 2008 & VS 2008

Currently SQL Business Intelligence Development Studio (BIDS) and all its project types (AS, IS and RS) live in Visual Studio 2005. So don't try to open a solution that contains an IS project in VS 2008 yet. What about the final SQL 2008 release? Now that Visual Studio 2008 is released, what are the plans for BIDS and for supporting BI projects in VS 2008?

Note that I am commenting on unreleased software, so the plans may change, and this information is provided as-is (like everything else in this blog, but even more so :)). Still, we got a lot of questions, so I'll try to summarize the discussion.

SQL 2005 is not going to change: SSIS 2005 is hosted in VS 2005, and you can't open SSIS projects in VS 2008 until you install SSIS 2008.

 

For SQL 2008 the plans are different:

  1. SSIS 2008 RTM will work with VS 2008 (and only with VS 2008; there will be no option to use SSIS 2008 projects in the VS 2005 IDE).
  2. It is not yet final when this will be implemented; maybe in the next CTP, or in a CTP refresh soon after it. Update: starting with CTP6, SSIS 2008 lives in VS 2008.
  3. The SSIS 2008 designer (hosted in VS 2008) will convert packages to the SSIS 2008 format. The SSIS 2005 runtime will not be able to execute the packages after this conversion.

Now regarding setup: how do I remove VS 2005 without affecting the BIDS installation? If you have both VS 2005 and BIDS, there should be at least two entries for Visual Studio SKUs in Add/Remove Programs:

  1. VS 2005 Premier Partner Edition - this is the one installed by SQL 2005 (and the current SQL 2008 CTP). Do not remove it if you want to keep SQL 2005 BIDS.
  2. The actual VS, e.g. VS 2005 Team System, or VS 2005 Pro - this is the one you want to remove if you want to uninstall Visual Studio 2005.

 

That was about designing SSIS packages. If you want to write custom SSIS components (tasks, transforms, etc.), you can use either VS 2005 or VS 2008 to develop either SSIS 2005 or SSIS 2008 components; there is a lot more flexibility here. If you use VS 2008 and want to target SSIS 2005, open the project properties and make the project target the .NET 2.0 runtime.

Posted by michen

VALVe/Steam horrors

(Update: VALVe has removed the worst recommendations that I describe below. I don't know whether my article had any influence on this. They still want to run Steam as a service, though.)

I originally planned to use this blog for work-related stuff only, but VALVe drove me mad, so I decided to write this.

I run Vista on my home PC (and at work too, actually), and all of us run as non-admins (not Vista's protected admin, but real non-admin). My son plays Counter-Strike, and thus has to run Steam. It worked mostly fine until recently, when it started asking to install Steam as a service. Well, running internet-facing software as a service does not seem like a good idea. Before installing it, I decided to find out more, and the findings were much worse than I had ever expected. Here is an official FAQ from VALVe that I found:

http://support.steampowered.com/cgi-bin/steampowered.cfg/php/enduser/std_adp.php?p_faqid=460

Here are some of the recommendations from this FAQ:

  • Go to: Start > Run and type in: cmd
  • Type in the following:
    net localgroup Administrators /add Local service
  • Restart your computer.

To put it simply, they ask you to add the Local Service account to the Administrators group! Apparently, they did not bother to find out the list of specific permissions they need, but simply figured out that administrator permissions "fix" the problem. That alone would be pretty bad, but they did even worse. Instead of creating a special account for Steam or using some account that already has administrative permissions, they want to add an account shared by multiple services to the Administrators group, making not only their service but all the services running on the machine much more dangerous and less secure.

Then VALVe gives an even stranger recommendation:

If during this process you receive the error: System error 5 has occurred. Access is denied Please follow these instructions and then try the above steps again:

  1. Go to Start > Control Panel > System & Maintenance > Administrative Tools > Local Security Policy.
  2. In the left pane, expand Local Policies > Security Options.
  3. Double-click Network Security LAN Manager Authentication Level.
  4. In the drop down list, change the default setting (NTLMv2 only) to Send LM & NTLM - use NTLMv2 session if negotiated.

Changing the LanMan authentication level for Vista Home users to a completely insecure level? Maybe I'm missing something, but I can't see any reason to do this. This setting affects authentication to remote computers (it mostly matters in NT domains and should be of no concern to an internet game); it would not help one avoid a local Access Denied while performing the steps above. It is not only completely insecure, but IMO it also does not make any sense at all.

So these guys
1) don't understand anything about security,
2) can't write software that does not require administrative privileges,
3) don't even try to use the specific minimum list of permissions,
4) ask the user to make completely unreasonable and insecure changes to the OS configuration that affect not just the Steam service, but the whole machine,
5) want to run network-facing service with admin rights.

Now, would you trust guys who are clueless about security, obviously can't write secure code, and apparently don't care about YOUR machine's security at all, to run a network-facing service with admin rights on your machine? I don't. I think I'll try to stay away from them from now on.

P.S. As with everything else in this blog, this post represents my personal opinion, and does not represent the view of my employer.

Posted by michen

SQL 2008 November CTP is available

Get it from https://connect.microsoft.com/SQLServer

One of the improvements in this build is the ability to persist Lookup reference data and use non-OLEDB sources for Lookup.

Posted by michen

Deploying packages

How can one deploy packages programmatically? Here is the original question:

Is it possible to deploy a package programmatically? We have an application which has a work flow for approval of object. If the object (ssis package) is approved by the concerned authority it has to be deployed to sql server. How can this code be integrated into the application?

First, what does it mean to deploy a package? A package is simply a document (a DTSX file). The SSIS utilities (DTEXEC, DTUTIL and others) natively support three locations for such files: an actual file in the OS file system, a row in a SQL Server table, or a virtualized file in the SSIS package store (which wraps the previous two). If you use a custom application to run your packages, you can store them anywhere you want (e.g. in application resources) and use LoadFromXML to load them.

Since a package is just a file, in most cases simply copying the file to the destination directory, or to the SQL table, is all it takes to deploy it. One can use the SSIS API (like SaveToSqlServer) to do this, run DTUTIL, or copy the file using Win32 or .NET APIs.
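A sketch of the two most common cases, assuming a reference to the SSIS managed runtime assembly (Microsoft.SqlServer.ManagedDTS.dll); the class and method names are hypothetical. File-system deployment is a plain `File.Copy`; SQL Server deployment loads the package and calls `Application.SaveToSqlServer`, where passing null for the user name and password is assumed to select Windows authentication. To place the package in a specific folder on the server, `SaveToSqlServerAs` takes an explicit package path instead.

```csharp
using System;
using System.IO;
using Microsoft.SqlServer.Dts.Runtime;   // SSIS managed runtime

static class PackageDeployer
{
    // File system: deploying is just copying the .dtsx file.
    public static void DeployToFolder(string sourceDtsx, string targetFolder)
    {
        string target = Path.Combine(targetFolder, Path.GetFileName(sourceDtsx));
        File.Copy(sourceDtsx, target, true);   // overwrite an older version
    }

    // SQL Server: load the package and save it to the server.
    // Null credentials are assumed to mean Windows authentication.
    public static void DeployToSqlServer(string sourceDtsx, string serverName)
    {
        var app = new Application();
        Package package = app.LoadPackage(sourceDtsx, null);
        app.SaveToSqlServer(package, null, serverName, null, null);
    }
}
```

An approval workflow like the one in the question would call one of these methods once the package is approved; running DTUTIL from the application is an equally valid alternative.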

In some cases it is necessary to make some adjustments to the package; for example, if you have multiple packages calling each other, the parent package must be modified to reference the new location of the child package(s). This can be done using the SSIS API as well, but it is easier, and usually better for maintainability, to do this using SSIS configurations: edit the configuration values instead of the package itself.

Posted by michen