Larry Osterman's WebLog

Confessions of an Old Fogey

Nobody ever reads the event logs…

In my last post, I mentioned that someone was complaining about the name of the bowser.sys component that I wrote 20 years ago.  I also mentioned that he included a screen shot of the event viewer.

What was also interesting was the contents of the screen shot.

“The browser driver has received too many illegal datagrams from the remote computer <redacted> to name <redacted> on transport NetBT_Tcpip_<excluded>.  The data is the datagram.  No more events will be generated until the reset frequency has expired.”

I added this message to the browser 20 years ago to detect computers that were going wild sending illegal junk on the intranet.  The idea was that every one of these events indicated that something had gone horribly wrong on the machine which originated the event, and that a developer or network engineer should investigate the problem.  These illegal datagrams were often caused by malfunctioning networking hardware, which was not uncommon 20 years ago.
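The last sentence of that message ("No more events will be generated until the reset frequency has expired") describes a throttle on the event itself. Here is a minimal sketch of that idea in Python. It is not the actual bowser.sys code; the class name, the threshold, and the reset interval are all hypothetical and chosen only for illustration.

```python
import time


class ThrottledEventLogger:
    """Emit at most `max_events` events per `reset_seconds` window.

    Hypothetical illustration of a "reset frequency" throttle; the names
    and default values are assumptions, not the browser driver's real ones.
    """

    def __init__(self, max_events=3, reset_seconds=60.0,
                 clock=time.monotonic, sink=print):
        self.max_events = max_events
        self.reset_seconds = reset_seconds
        self.clock = clock          # injectable so the window can be tested
        self.sink = sink            # where accepted events are written
        self.window_start = clock()
        self.count = 0

    def log(self, message):
        """Return True if the event was written, False if it was suppressed."""
        now = self.clock()
        # Start a fresh window once the reset interval has elapsed.
        if now - self.window_start >= self.reset_seconds:
            self.window_start = now
            self.count = 0

        self.count += 1
        if self.count < self.max_events:
            self.sink(message)
            return True
        if self.count == self.max_events:
            # Last event of the window: tell the reader we are going quiet.
            self.sink(message + " No more events will be generated "
                      "until the reset frequency has expired.")
            return True
        return False
```

With a throttle like this, a misbehaving machine produces a handful of entries per interval instead of flooding the log, which keeps the one event that matters from being buried under its own repeats.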

But you’ll note that the person reporting the problem only complained about the name of the source of the event log entry.  He never bothered to look at the contents of this “error” event log entry to see if there was something that was worth reporting.

Part of the reason that nobody bothers to read the event logs is that too many components log to the event log.  The event logs on customers' computers are filled with unactionable, meaningless events (“The <foo> service has started.  The <foo> service has entered the running state.  The <foo> service is stopping.  The <foo> service has entered the stopped state.”).  Customers stop reading the event log because there’s never anything actionable in it.

There’s a pretty important lesson here: Nobody ever bothers reading event logs because there’s simply too much noise in the logs. So think really hard about when you want to write an event to the event log.  Is the information in the log really worth generating?  Is there important information that a customer will want in those log entries?

Unless you have a way of uploading troublesome logs to be analyzed later (and I know that several enterprise management solutions do have such mechanisms), it’s not clear that there’s any value to generating log entries.

  • I think actionable messages should be given to the user by more noticeable means. The log is not a way of communicating with the user; it's a troubleshooting tool. And when you have to troubleshoot some weird misbehavior of the system, and you've got nothing but the logs, there is no such thing as "too much information". It may be more or less convenient to browse the logs, but that comes down to using good tools for log analysis.

  • Isn't that what the filter function is for? That's always the first thing I click: filter out information messages so I can look at the relevant warning/error/critical entries. Once something interesting has been identified that way, it sometimes makes sense to remove the filter and look at the information entries around the same time.

    The real issue with the event log is that nobody looks into it if they don't think they have a reason to look at it. Many quite important messages can get lost that way.

    What I would like is a program that shows an icon and bubble hint in the notification area if a warning or error entry gets added to the system log. But I haven't found anything like that yet. Guess I'll have to write one myself sooner or later.

  • Log in XML. Provide XSL that filters only error messages.

  • The problem is that you are trying to log information for two different groups of people.  Users generally only want to see error messages, maybe the occasional informational message.  Developers (or tech support) want to see everything (or at least more than error messages).  Since there is only one log sink it is inevitable that one of those groups is not going to be happy.  We ship a product that defaults to error-only logging.  The problem is that at this level there is not enough context to figure out what went wrong, so we almost always have to get the customer to turn up the log level and reproduce the issue to get usable log files.  The flipside is that the verbose logs are filled with so much stuff that it can be hard to figure out what is relevant for the issue you are tracking.

  • This post is just plain wrong. In an enterprise environment the event logs are a wealth of valuable information that we use on a daily basis to debug all sorts of problems. Even the "The <foo> service is stopping.  The <foo> service has entered the stopped state" entries are of great importance. For the average Mr Joe, I agree that the logs are of little use. But for enterprise, we need all the logging we can get.

  • Kim: You're right and you're wrong.  I did call out that enterprise management environments have mechanisms to log this.  But the Windows event log isn't the right place for such operational events.  The system and application logs should only be for actionable errors.  Other errors should go into component-specific error logs where they can be enabled or disabled as needed.

  • I'll jump in here and make the claim that the writers/maintainers of the event viewer at Microsoft do not actually use the tool themselves.  If they did, it wouldn't be so difficult and un-useful to use, and they would have improved it a bit in the course of 15 years.  These people could learn a lot from how the interfaces of many of the NirSoft tools work.

  • mpbk: Actually the event viewer was totally redesigned in Windows Vista and is dramatically better than it was in XP and before.  If you haven't used Vista yet, you should try it.  

  • Unfortunately the mess in the event log is being leveraged by scammers who are calling up unsuspecting average (i.e. non-technical) PC users claiming to be from Microsoft/their ISP/whatever saying "we believe that your computer is causing a lot of errors on the internet. Let's look in the event log to see if that's the case". They then walk the customer through opening event viewer, and the resulting overwhelming amount of info (info, warnings and errors alike) is used to convince said user to allow the scammer to remotely connect to their machine to "fix it". Or to sell them a bogus product to "fix it". In either case maliciousness ensues. I hear from these people (victims) all the time. Have a peek at the comments here: ask-leo.com/what_is_the_event_viewer_and_should_i_care.html (it's an old article, but you can see people are finding it after they've been called), and here for one person's "transcript" of his experience: ask-leo.com/is_my_isp_calling_me_to_clear_up_my_problems_with_windows.html

    Leo

  • "If you haven't used Vista yet, you should try it" - You meant "try Windows 7", right? ;-)

  • Actually no, I meant Vista.  The rewritten event viewer came online in Vista, not Win7.

  • "Nobody ever reads the event logs…" <-- almost true

    "Nobody ever bothers reading event logs because there’s simply too much noise in the logs." <-- it's like saying nobody reads Wikipedia because there's too much information in there. Sometimes it's better to have more information than no information at all. More information/noise shouldn't stop users from investigating or monitoring issues. With a powerful search/filtering mechanism the impact of noise could be reduced considerably.

    The main problem I have found with logs is not knowing what to search for, or which events are relevant and which are not, especially when troubleshooting or monitoring.

  • I agree that the primary event logs (System and Application) contain way too much noise.  

    New features such as filtering help, but that assumes all events are logged with the right level (definitely not always the case).  These new features are also offset by the SLOW nature of the new event log interface.  On my 8 core primary workstation they still take significantly longer to interact with than an older system running XP.  I remember a (I think) PDC presentation by Mark Russinovich where he tried to demonstrate something in the event logs.  He hit Filter and waited... and waited... and waited... and eventually just moved on.  An inevitable price of progress, I suppose.

    I admit wondering if it wouldn't be better to advise individual programs to create their own event logs for their own purposes.  If Symantec and National Instruments had their own logs, for example, it would cut down on a lot of the noise.  System should be for the operating system and Application is just a bit bucket :)

  • In System management (I work with System Center Operations Manager) we face every day the issue of choosing the right events to act upon and what to ignore as simply noise... or more generally, the issue with inconsistent instrumentation that makes it hard to tell if an application is healthy or if it isn't... and more importantly *what to do* if it is not healthy.

    It's a tough problem to solve, as it essentially boils down to educating developers to move away from "debug" logs (easy to write and good when you are testing your code) toward reliable instrumentation that tells whether the app is "behaving" correctly, which is what the sysadmin is worried about. I am NOT blaming the developers here - finger pointing is NOT solving anything! - they might not have the mindset for "monitoring" their application, and that's why this type of instrumentation only improves over time, when the application developers AND the monitoring guys work hand in hand for a few release cycles, IMHO.

    So, I second the "only a few useful and actionable events" in EVT movement I read in between Larry's lines. Everything else can be moved to ETW/ETL, for example.
