<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://blogs.msdn.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Alik Levin's : Operations</title><link>http://blogs.msdn.com/alikl/archive/tags/Operations/default.aspx</link><description>Tags: Operations</description><dc:language>en-US</dc:language><generator>CommunityServer 2.1 SP1 (Build: 61025.2)</generator><item><title>Enterprise Architect's Best Friend Is Production System Engineer</title><link>http://blogs.msdn.com/alikl/archive/2009/02/10/enterprise-architect-s-best-friend-is-production-system-engineer.aspx</link><pubDate>Tue, 10 Feb 2009 16:31:08 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:9410539</guid><dc:creator>alikl</dc:creator><slash:comments>9</slash:comments><comments>http://blogs.msdn.com/alikl/comments/9410539.aspx</comments><wfw:commentRss>http://blogs.msdn.com/alikl/commentrss.aspx?PostID=9410539</wfw:commentRss><wfw:comment>http://blogs.msdn.com/alikl/rsscomments.aspx?PostID=9410539</wfw:comment><description>&lt;table cellspacing="5" cellpadding="2" width="557" border="0"&gt;&lt;tbody&gt;     &lt;tr&gt;       &lt;td valign="top" width="265"&gt;&amp;#160;&lt;a href="http://practicethis.com/" target="_blank"&gt;&lt;img title="Alik Levin" height="50" alt="Alik Levin" src="http://blogs.microsoft.co.il/blogs/mcs/WindowsLiveWriter/d20b00ba5cce_FD44/image_5.png" width="50" border="0" /&gt;&lt;/a&gt;&amp;#160;&amp;#160;&amp;#160; Dear software architect! When you build your new system, do you think about end users? Of course you do! That is why you build this new system - your end users demand it.           &lt;br /&gt;If you think about end users, can you tell me who they are? Right, the people who will actually use it. Don't you think something is missing? 
Don't you think there is another person who&amp;#160; will be using your system? Are you taking the Production System Engineer into account?&lt;/td&gt;        &lt;td valign="top" width="275"&gt;&amp;#160;&lt;a href="http://blogs.msdn.com/blogfiles/alikl/WindowsLiveWriter/EnterpriseArchitectsBestFriendIsProducti_D787/image_6.png"&gt;&lt;img style="border-right: 0px; border-top: 0px; border-left: 0px; border-bottom: 0px" height="244" alt="image" src="http://blogs.msdn.com/blogfiles/alikl/WindowsLiveWriter/EnterpriseArchitectsBestFriendIsProducti_D787/image_thumb_2.png" width="164" border="0" /&gt;&lt;/a&gt;           &lt;br /&gt;          &lt;p&gt;&lt;em&gt;&lt;font size="1"&gt;by &lt;/font&gt;&lt;/em&gt;&lt;a href="http://www.flickr.com/photos/21276832@N02/"&gt;&lt;b&gt;&lt;em&gt;&lt;font size="1"&gt;dariorug&lt;/font&gt;&lt;/em&gt;&lt;/b&gt;&lt;/a&gt;&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;   &lt;/tbody&gt;&lt;/table&gt;  &lt;h3&gt;Why Should You Care About the Production System Engineer?&lt;/h3&gt;  &lt;p&gt;The Production System Engineer is the person you should give special treatment. Why? Because she is the one who maintains your system in production. If you build a better system from an operations perspective, it will get better treatment in production from the system engineer. If the system gets better treatment, it will treat its end users better. Connecting the dots?&lt;/p&gt;  &lt;h3&gt;What Does the Production System Engineer Care About the Most?&lt;/h3&gt;  &lt;p&gt;From my observations, this is what Production System Engineers care about the most:&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;What do I check when end users ask me&amp;#160; the following questions?      &lt;ul&gt;       &lt;li&gt;Why is it not working? &lt;/li&gt;        &lt;li&gt;Why is it so slow? &lt;/li&gt;        &lt;li&gt;Why am I not allowed to do this operation? &lt;/li&gt;     &lt;/ul&gt;   &lt;/li&gt;    &lt;li&gt;How do I configure this? &lt;/li&gt;    &lt;li&gt;What alerts does your system raise when it fails? 
&lt;/li&gt;    &lt;li&gt;Where are all alerts sent? &lt;/li&gt;    &lt;li&gt;How do I roll back the version? &lt;/li&gt;    &lt;li&gt;What should I do when I see a specific alert? &lt;/li&gt;    &lt;li&gt;How do I distribute patches for your system? &lt;/li&gt;    &lt;li&gt;How do I know what the source of the incident is? &lt;/li&gt;    &lt;li&gt;How do I get detailed information regarding the incident? &lt;/li&gt;    &lt;li&gt;How do I recognize the trends that usually lead to an incident? &lt;/li&gt;    &lt;li&gt;How do I back up the configuration? &lt;/li&gt; &lt;/ul&gt;  &lt;h3&gt;Conclusion&lt;/h3&gt;  &lt;p&gt;Make friends with the Production System Engineer. Ask her tons of questions, learn her pains, and offer solutions that relieve the pain or, better yet, remove it completely.&lt;/p&gt;  &lt;p&gt;What else should an Architect take into account when thinking about operations and the Production System Engineer? What's your take?&lt;/p&gt;  &lt;h3&gt;Related Materials&lt;/h3&gt;  &lt;ul&gt;   &lt;li&gt;&lt;a href="http://blogs.msdn.com/alikl/archive/2008/01/20/design-for-operations-dfo-problems-and-solution-frame.aspx"&gt;Design For Operations [DFO] &amp;#8211; Problems And Solution Frame&lt;/a&gt; &lt;/li&gt;    &lt;li&gt;&lt;a href="http://blogs.msdn.com/ace_team/archive/2008/02/14/do-you-really-need-a-distributed-architecture.aspx"&gt;Do You Really Need A Distributed Architecture?&lt;/a&gt; &lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;&lt;strong&gt;This post is made with &lt;a href="http://practicethis.com/" target="_blank"&gt;PracticeThis.com&lt;/a&gt; plugin for Windows Live Writer&lt;/strong&gt;&lt;/p&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=9410539" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/alikl/archive/tags/Operations/default.aspx">Operations</category><category domain="http://blogs.msdn.com/alikl/archive/tags/Architecture/default.aspx">Architecture</category></item><item><title>Design For Operations [DFO] – Problems And 
Solution Frame</title><link>http://blogs.msdn.com/alikl/archive/2008/01/20/design-for-operations-dfo-problems-and-solution-frame.aspx</link><pubDate>Sun, 20 Jan 2008 20:02:06 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:7173960</guid><dc:creator>alikl</dc:creator><slash:comments>2</slash:comments><comments>http://blogs.msdn.com/alikl/comments/7173960.aspx</comments><wfw:commentRss>http://blogs.msdn.com/alikl/commentrss.aspx?PostID=7173960</wfw:commentRss><wfw:comment>http://blogs.msdn.com/alikl/rsscomments.aspx?PostID=7173960</wfw:comment><description>&lt;p&gt;The patterns &amp;amp; practices team maintains the &lt;a href="http://www.codeplex.com/dfo"&gt;Design for Operations [DFO] project on CodePlex&lt;/a&gt;. The project focuses on:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&amp;#8220;Developing tools and guidance to help enable the development of highly manageable applications on the Windows platform.&amp;#8221; &lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;This post summarizes my understanding of the project&amp;#8217;s problems and solutions frame. Most of the content is copied directly from the Manageability Guidance document, which runs to more than 300 pages and can be found &lt;a href="http://www.codeplex.com/dfo/Release/ProjectReleases.aspx?ReleaseId=2770" target="_blank"&gt;here&lt;/a&gt;, along with a few interpretations of mine.&lt;/p&gt;  &lt;p&gt;&lt;b&gt;&lt;/b&gt;&lt;/p&gt;  &lt;p&gt;&lt;b&gt;&lt;font size="4"&gt;Problems Frame&lt;/font&gt;&lt;/b&gt;&lt;/p&gt;  &lt;p&gt;&lt;b&gt;Active players and their concerns&lt;/b&gt;&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;&lt;b&gt;End User&lt;/b&gt;.       &lt;ul&gt;       &lt;li&gt;Why is it not working? &lt;/li&gt;        &lt;li&gt;Why is it so slow? &lt;/li&gt;        &lt;li&gt;Why am I not allowed to do this operation? &lt;/li&gt;     &lt;/ul&gt;   &lt;/li&gt;    &lt;li&gt;&lt;b&gt;Operator&lt;/b&gt;.       &lt;ul&gt;       &lt;li&gt;How do I configure this? &lt;/li&gt;        &lt;li&gt;Why did it fail without alerts? 
&lt;/li&gt;        &lt;li&gt;Where are all alerts sent? &lt;/li&gt;        &lt;li&gt;How do I roll back the version? &lt;/li&gt;        &lt;li&gt;What should I do when I see this alert? &lt;/li&gt;     &lt;/ul&gt;   &lt;/li&gt;    &lt;li&gt;&lt;b&gt;Developer&lt;/b&gt;.&lt;b&gt;&lt;/b&gt;       &lt;ul&gt;       &lt;li&gt;How come end users do not understand the exception message? &amp;#8211; it is a simple call stack dump! &lt;/li&gt;        &lt;li&gt;What do I do with this &amp;#8220;Unspecified error&amp;#8221; thing? &lt;/li&gt;        &lt;li&gt;What component throws this exception? &lt;/li&gt;        &lt;li&gt;Here is the patch &amp;#8211; just drop it in to fix the problem. &lt;/li&gt;     &lt;/ul&gt;   &lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;&lt;b&gt;&lt;/b&gt;&lt;/p&gt;  &lt;p&gt;&lt;b&gt;Operations Challenges&lt;/b&gt;&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;How do I know what the source of the incident is? For example, &amp;#8220;It is IIS authentication&amp;#8221;. &lt;/li&gt;    &lt;li&gt;How do I get detailed information regarding the incident? For example, &amp;#8220;SPN is not configured for IIS Application account&amp;#8221;. &lt;/li&gt;    &lt;li&gt;How do I recognize the trends that usually lead to an incident? For example, &amp;#8220;Yesterday we had 10% CPU utilization and today it is 20% - it must mean something&amp;#8221;. &lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;&lt;b&gt;&lt;font size="4"&gt;Solution Frame&lt;/font&gt;&lt;/b&gt;&lt;/p&gt;  &lt;p&gt;&lt;b&gt;Representing Applications as Managed Entities&lt;/b&gt;&lt;/p&gt;  &lt;p&gt;A &lt;i&gt;managed entity &lt;/i&gt;is any logical part of an application that a system administrator needs to configure, monitor, and create reports about while managing that application or service. 
Examples of managed entities are a Web service, a database, an Exchange routing group, an Active Directory site, a computer, a server role, a network device, a hardware component, or a subnet.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://blogs.msdn.com/blogfiles/alikl/WindowsLiveWriter/5d7032e04ef4_65E0/clip_image002_2.gif"&gt;&lt;img style="border-top-width: 0px; border-left-width: 0px; border-bottom-width: 0px; border-right-width: 0px" height="243" alt="clip_image002" src="http://blogs.msdn.com/blogfiles/alikl/WindowsLiveWriter/5d7032e04ef4_65E0/clip_image002_thumb.gif" width="379" border="0" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;&lt;b&gt;Model Comprehensive Management Models&lt;/b&gt;&lt;/p&gt;  &lt;p&gt;Creating a comprehensive management model consists of modeling in a variety of different areas to provide a total system view, including the following:&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;&lt;b&gt;Configuration modeling&lt;/b&gt;. This involves encapsulating all the settings that control the behavior or functionality of an application or system component. &lt;/li&gt;    &lt;li&gt;&lt;b&gt;Task modeling&lt;/b&gt;. This involves cataloging the complete list of tasks that administrators have to perform to administer and manage a software system or application. &lt;/li&gt;    &lt;li&gt;&lt;b&gt;Instrumentation modeling&lt;/b&gt;. This involves capturing the instrumentation used to record the operations of a system or application. Instrumentation provides information to the operations team to increase understanding about how the application functions, and to diagnose problems with an application. &lt;/li&gt;    &lt;li&gt;&lt;b&gt;Health modeling&lt;/b&gt;. This involves defining what it means for a system or application to be healthy (operating normally) or unhealthy (operating in a degraded condition or not working at all). A health model represents logically the parts of an application or service the operations team is responsible for keeping operational. 
&lt;/li&gt;    &lt;li&gt;&lt;b&gt;Performance modeling&lt;/b&gt;. This involves capturing the expected baseline performance of an application. Performance counters can then be used to report and expose performance on an ongoing basis, and a monitoring tool can compare this performance to the expected performance. &lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;&lt;b&gt;Building Effective Health Models&lt;/b&gt;&lt;/p&gt;  &lt;p&gt;An application is considered healthy if it is operating within a series of defined parameters. A number of factors may result in a change in application health, including the following:&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;Change in application configuration &lt;/li&gt;    &lt;li&gt;An application update &lt;/li&gt;    &lt;li&gt;A change in an external dependency &lt;/li&gt;    &lt;li&gt;A hardware change &lt;/li&gt;    &lt;li&gt;A network change &lt;/li&gt;    &lt;li&gt;Bad input to the application &lt;/li&gt;    &lt;li&gt;Scalability problems &lt;/li&gt;    &lt;li&gt;Operator error &lt;/li&gt;    &lt;li&gt;Change in deployment &lt;/li&gt;    &lt;li&gt;Malicious attack &lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;&lt;b&gt;Steps to handle the problem&lt;/b&gt;&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;Detect a problem. &lt;/li&gt;    &lt;li&gt;Verify that the problem still exists. &lt;/li&gt;    &lt;li&gt;Diagnose the cause(s) of the problem. &lt;/li&gt;    &lt;li&gt;Resolve the problem. &lt;/li&gt;    &lt;li&gt;Verify that the problem was resolved. &lt;/li&gt;    &lt;li&gt;[My addition] Log the incident and convert it into a Knowledge Base gem. 
&lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;&lt;a href="http://blogs.msdn.com/blogfiles/alikl/WindowsLiveWriter/5d7032e04ef4_65E0/image_2.png"&gt;&lt;img style="border-top-width: 0px; border-left-width: 0px; border-bottom-width: 0px; border-right-width: 0px" height="167" alt="image" src="http://blogs.msdn.com/blogfiles/alikl/WindowsLiveWriter/5d7032e04ef4_65E0/image_thumb.png" width="467" border="0" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;There are a few key terms mentioned above - &amp;quot;Modeling&amp;quot;, &amp;quot;Design&amp;quot;, &amp;quot;Building&amp;quot;, &amp;quot;Maintain&amp;quot;, &amp;quot;Testing&amp;quot;. To me it is absolutely clear that Design For Operations is no different from the Security Development Lifecycle or the Performance Development Lifecycle. &amp;quot;Operations&amp;quot; is just another important non-functional requirement that needs to be carried through the whole development lifecycle to be successfully implemented and deployed in production. 
It should have been called the Operations Development Lifecycle.&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;My related posts&lt;/strong&gt;&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;&lt;a href="http://blogs.msdn.com/alikl/archive/2007/07/16/use-sysinternals-debugview-to-diagnose-the-application.aspx"&gt;Use Sysinternals DebugView To Diagnose The Application&lt;/a&gt; &lt;/li&gt;    &lt;li&gt;&lt;a href="http://blogs.msdn.com/alikl/archive/2007/05/02/asp-net-health-monitoring-means-logging-and-auditing.aspx"&gt;ASP.NET Health Monitoring Means Logging And Auditing&lt;/a&gt; &lt;/li&gt;    &lt;li&gt;&lt;a href="http://blogs.msdn.com/alikl/archive/2007/05/07/security-engineering-big-rocks.aspx"&gt;Security Engineering Big Rocks&lt;/a&gt; &lt;/li&gt;    &lt;li&gt;&lt;a href="http://blogs.msdn.com/alikl/archive/2007/04/27/threat-modeling-big-chunks.aspx"&gt;Threat Modeling Big Chunks&lt;/a&gt; &lt;/li&gt;    &lt;li&gt;&lt;a href="http://blogs.msdn.com/alikl/archive/2007/07/04/t-shooting-kerberos.aspx"&gt;T-Shooting Kerberos&lt;/a&gt; &lt;/li&gt;    &lt;li&gt;&lt;a href="http://blogs.msdn.com/alikl/archive/2007/04/03/who-access-my-file.aspx"&gt;Who Access My File?&lt;/a&gt; &lt;/li&gt;    &lt;li&gt;&lt;a href="http://blogs.msdn.com/alikl/archive/2008/01/14/chain-of-responsibility-design-pattern-focus-on-security-performance-and-operations.aspx"&gt;Chain Of Responsibility Design Pattern &amp;#8211; Focus On Security, Performance, And Operations&lt;/a&gt; &lt;/li&gt;    &lt;li&gt;&lt;a href="http://blogs.msdn.com/alikl/archive/2007/11/02/identify-asp-net-web-services-and-wcf-performance-issues-by-examining-iis-logs.aspx"&gt;Identify ASP.NET, Web Services, And WCF Performance Issues By Examining IIS Logs&lt;/a&gt; &lt;/li&gt;    &lt;li&gt;&lt;a href="http://blogs.msdn.com/alikl/archive/2007/08/15/use-performance-counters-templates-to-streamline-performance-analysis.aspx"&gt;Use Performance Counters Templates To Streamline Performance Analysis&lt;/a&gt; &lt;/li&gt; &lt;/ul&gt;&lt;img 
src="http://blogs.msdn.com/aggbug.aspx?PostID=7173960" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/alikl/archive/tags/Practices/default.aspx">Practices</category><category domain="http://blogs.msdn.com/alikl/archive/tags/Planning+Phase/default.aspx">Planning Phase</category><category domain="http://blogs.msdn.com/alikl/archive/tags/Operations/default.aspx">Operations</category></item></channel></rss>