Techspeak for the socially diminished

I’ve been building a little SNMP Management Pack over the past few days to discover and monitor a bunch of PowerWare UPSs, which turned out to take quite a lot more energy and time than expected. Mostly because I am really bad with SNMP and how it works, because I’ve never really looked into the inner workings of building an SNMP management pack, and because we ran into a couple of errors that prevented the discovery process from working.

To make it clear right away, this is not going to be a “Building an SNMP Management Pack Tutorial” since there are plenty of good ones out there already, and to be extra helpful I’m gonna include a few links right away:

It’s the second, the NetApp one, that I’ve used as a guide to building the UPS management pack, since it goes through the process of building your own filtered discovery using SystemOID to identify your hardware classes and then building the monitors on top of those.

Let’s get to it

When building the discovery of my hardware classes I ran into problems. The discovery simply did not work. At first I got some strange errors about “invalid queries”, something that turned out to be related to me reading two guides–seriously though, pick one guide that is closest to what you want to achieve and stick to it–and mixing up the XPathQuery variables. Silly me.
I got those errors to go away and I was able to get a few objects into my base class, but none of the hardware classes that were populated through the return value of an SNMP OID got discovered.
The only error I got this time was the following:

Log Name:      Operations Manager
Source:        Health Service Modules
Date:          2010-09-02 11:19:12
Event ID:      11001
Task Category: None
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      CENSORED
Description:
Error sending an SNMP GET message to IP Address XX.XX.XX.XX, Community String:=CENSORED, Status 0x6c.

One or more workflows were affected by this.

Workflow name: CENSORED.MP.CLASS.DISCOVERY
Instance name: CENSORED_DEVICENAME
Instance ID: {5C7EFB30-D885-8843-0DD7-EA86B4FD2311}
Management group: CENSORED
I went through all the other logical steps of troubleshooting an error like that, which include double-checking firewall settings, OIDs, IP addresses, allowed hosts and so forth. It wasn’t until I loaded the PowerMIB into a MIB browser installed on the proxy machine (in this case a Management Server) that I realized there was no problem sending an SNMP GET to the UPS from that server. I launched Wireshark and had it listen to SNMP traffic between the UPS and the Management Server. The thing that struck me right away was that I could see a bunch of “SNMP Get-Request” packets but no “SNMP Get-Response”, which means that Operations Manager did send an SNMP GET but there was no response.
After a bit of intense staring I noticed what you see in the screenshot.

SNMP Error in Wireshark

For some reason Operations Manager does not care about what SNMP version you configure when you do the initial discovery of a network device. Even if you do specify SNMP v1, your probes may very well be using SNMP v2c instead, and in many cases that will result in these SNMP GET errors in the Operations Manager event log.
To avoid this, you have to specify which SNMP version to use in your System.SnmpProbe according to the information provided here: http://msdn.microsoft.com/en-us/library/ee809331.aspx
Since I am such a nice guy, here’s an example of the working probe. The added line is the <Version> element.
<IsWriteAction>false</IsWriteAction>
<IP>$Config/IP$</IP>
<CommunityString>$Config/CommunityString$</CommunityString>
<Version>1</Version>
<SnmpVarBinds>
	<SnmpVarBind>
		<OID>1.3.6.1.4.1.534.1.1.1.0</OID>
		<Syntax>0</Syntax>
		<Value VariantType="8"></Value>
	</SnmpVarBind>
	<SnmpVarBind>
		<OID>1.3.6.1.4.1.534.1.1.2.0</OID>
		<Syntax>0</Syntax>
		<Value VariantType="8"></Value>
	</SnmpVarBind>
	<SnmpVarBind>
		<OID>1.3.6.1.4.1.534.1.1.3.0</OID>
		<Syntax>0</Syntax>
		<Value VariantType="8"></Value>
	</SnmpVarBind>
</SnmpVarBinds>
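
If you have more than one probe in your management pack, it is easy to forget the <Version> element in one of them. Here is a minimal Python sketch of a sanity check for that. The helper name probe_version is made up for this post and is not part of Operations Manager or its authoring tools. It also assumes you have copied the probe configuration out of the MP as a text fragment like the one above.

```python
import xml.etree.ElementTree as ET


def probe_version(fragment: str):
    """Return the text of <Version> in a probe config fragment, or None.

    Hypothetical helper for illustration only.
    """
    # The fragment has several top-level elements, so wrap it in a dummy
    # root element to make it parseable as one XML document.
    root = ET.fromstring(f"<Probe>{fragment}</Probe>")
    version = root.find("Version")
    if version is None or not version.text:
        return None
    return version.text.strip()


WITH_VERSION = """
<IsWriteAction>false</IsWriteAction>
<IP>$Config/IP$</IP>
<CommunityString>$Config/CommunityString$</CommunityString>
<Version>1</Version>
"""

WITHOUT_VERSION = """
<IsWriteAction>false</IsWriteAction>
<IP>$Config/IP$</IP>
<CommunityString>$Config/CommunityString$</CommunityString>
"""

print(probe_version(WITH_VERSION))     # 1
print(probe_version(WITHOUT_VERSION))  # None
```

A None result means the probe leaves the SNMP version up to Operations Manager, which is exactly the situation that bit me.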

That’s it. Working perfectly now.

Best of luck to you too.

Getting “ESENT Keys are required to install this application” when you are trying to modify/change an agent installation?

This seems to be most common on Windows 2008, and I guess it’s because of UAC and the fact that the Control Panel isn’t opened in administrative mode.

To work around this you need to run the msiexec command on the correct installation GUID from an administrative command prompt.

Besides running through the registry to find the GUID, one of the easier ways is this:

  1. Open an administrative command prompt.
  2. Run wmic product
  3. Locate your product by its name. The GUID (it looks a bit like this: {25097770-2B1F-49F6-AB9D-1C708B96262A}) directly after it is the one you want. Copy it.
  4. Run msiexec /i <PASTEYOURGUIDHERE>
  5. Modify the agent as you please
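
If scrolling through the wmic output by eye gets tedious, steps 2 to 4 can be sketched in Python. This is only an illustration under a few assumptions: the output has been saved as text (narrowing it down with wmic product get IdentifyingNumber,Name makes it easier to read), the product names in SAMPLE are made up, and the helper names find_product_guid and modify_command are my own.

```python
import re

# Matches a GUID in braces, e.g. {25097770-2B1F-49F6-AB9D-1C708B96262A}
GUID_RE = re.compile(
    r"\{[0-9A-Fa-f]{8}(?:-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12}\}"
)


def find_product_guid(wmic_output: str, product_name: str):
    """Return the first GUID on a line that mentions product_name, or None."""
    for line in wmic_output.splitlines():
        if product_name.lower() in line.lower():
            match = GUID_RE.search(line)
            if match:
                return match.group(0)
    return None


def modify_command(guid: str) -> str:
    """Build the command from step 4 for the given GUID."""
    return f"msiexec /i {guid}"


# Made-up sample output; the product names are placeholders.
SAMPLE = """\
IdentifyingNumber                       Name
{25097770-2B1F-49F6-AB9D-1C708B96262A}  System Center Operations Manager 2007 Agent
{11111111-2222-3333-4444-555555555555}  Some Other Product
"""

guid = find_product_guid(SAMPLE, "Operations Manager")
print(modify_command(guid))
```

The command it prints still has to be run from the administrative command prompt in step 1.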

That’s pretty much it. Good luck.

What do you do when you cannot delete a file or folder on a Windows server?

Check the file permissions! And if that doesn’t help?

Check the share permissions! Yes, if it is a shared folder. And if that doesn’t help?

Check the file ownership! Great! But then what?

Well, the file could be in use, and then you would have to shut the locking process down and perhaps kick a user out. In a really bad scenario it could also be a symptom of a broken filesystem, a reserved filename (like “lpt1” or “PRN”) or even an invalid name (silly things like a space in the beginning or the end of a filename).
Another possible reason could actually be that the path to the file or folder is too long. You won’t actually get an error telling you that the file path exceeds the roughly 260 characters (MAX_PATH) that Windows can handle, just a simple “Access Denied”.

There are some more or less tedious workarounds for the problem, like renaming all the directories, starting from the root, to shorter names, or using the old DOS 8.3 names (like “dokume~1.doc”) that Windows can auto-generate for you. Personally, I have two favourite ways of handling this.

  1. Map the parent-directory of the file/folder you are trying to access/delete as a network drive and access your files that way.
    This is particularly useful if the folder you are trying to access is on a DFS share, or on a share on the central file server with file paths like “\\servername01\Central Projects\Central Services\IT Department\Develop Methods for Automatically Deploying New Central Servers\2.2.1 Auto-Deploying SQL-Server 2005 Cluster\Documents\Preparations\Whitepapers\SQL Server 2005 Failover Clustering White Paper.doc”
  2. Create a new share to a folder further down the hierarchy. This works locally too: if you are logged on to, say, SRV01, you create a new share on “D:\Fileshares\Central Projects\Central Services\IT Department\Develop Methods for Automatically Deploying New Central Servers\” called “Autodeploymethods” and access it from “\\SRV01\Autodeploymethods\”. That way the file path stays under the limit.
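
To see why the second trick works, here is a small Python sketch that only does the string arithmetic. It assumes the classic Win32 MAX_PATH value of 260, which counts the terminating NUL character. The helper names is_too_long and via_share and the deeply nested folder names are made up for the example.

```python
MAX_PATH = 260  # classic Win32 limit, including the terminating NUL


def is_too_long(path: str, limit: int = MAX_PATH) -> bool:
    """True if the path does not fit within the limit."""
    # The limit includes the terminating NUL, so the visible string
    # has to be strictly shorter than the limit.
    return len(path) >= limit


def via_share(path: str, shared_folder: str, share_unc: str) -> str:
    """Rewrite path so it goes through a share pointing at shared_folder."""
    if not path.lower().startswith(shared_folder.lower()):
        raise ValueError("path is not below the shared folder")
    rest = path[len(shared_folder):].lstrip("\\")
    return share_unc.rstrip("\\") + "\\" + rest


SHARED_FOLDER = (
    r"D:\Fileshares\Central Projects\Central Services\IT Department"
    r"\Develop Methods for Automatically Deploying New Central Servers"
)
SHARE_UNC = r"\\SRV01\Autodeploymethods"

# A made-up, deeply nested file below the shared folder.
deep = (
    SHARED_FOLDER
    + "\\"
    + "\\".join(["Some Deeply Nested Folder Name"] * 6)
    + "\\file.doc"
)
short = via_share(deep, SHARED_FOLDER, SHARE_UNC)

print(is_too_long(deep))   # True
print(is_too_long(short))  # False
```

The share simply swaps a long prefix for a short one, so the rest of the path has more room before it hits the limit.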

Now. When designing file servers, you really should think about how deep the file paths may get. This is especially true on DFS shares since you might have to deal with the full FQDN too, and not only the actual folder structure. Many big corporations I know use “codes” for departments, assign a project ID (quite simply a number or maybe an abbreviation) to each project, and use these for the file shares too. Another scenario that could lead to similar problems is intranet sites where users can create and manage their own subsites and where filenames and folders are not stored in a database.

I have only seen this phenomenon on Windows systems so far, and I’ve actually used a Linux live CD on occasion when admin access is denied.

Read More:
http://support.microsoft.com/kb/320081

I’ve wrestled a bit with a critical status on one of the Organization States at a client’s site that won’t go back to green even though all the underlying monitors have gone back to green. And apparently I am not alone on this one. Others, like me, have read and re-read the MP guide in search of a monitor, rule or discovery with a forgotten override, and I don’t know how many times I’ve made a small change and tried resetting the health once again. Anyhow.
Marius Sutara posted an answer on the TechNet forums last week with a “fix” (-ish), or rather the acknowledgement that the problem is not a 40c. The problem might be related to other MPs as well, but I’ve only seen it on the new Exchange MP so far. In that same post, Pete Zerger provided links to two nifty little tools that will help you reset the health of the monitor.

In case you wonder why on earth I post when there’s already a “solution” out there: PageRank, baby!
Not for me, but for the forum post, making it show up earlier on Google.

Just wanted to raise a word of caution about the TCP Port Check in Operations Manager 2007.

Some customers have noticed that the system logs on some Unix machines are completely swamped with “connection error”, “TCP Connect failed”, “TCP Session Lost” and similar messages, and after a bit of research the problematic servers were narrowed down to those monitored by Operations Manager. Specifically, those that are targeted by a TCP Port Check.

It would seem like the TCP connection never fully initializes on the target server. Kind of like knocking on your neighbour’s door and then hiding. Then when the door opens, no one is there.

Maybe there’s a setting somewhere to modify how “deep” a Port Check should go before closing. Perhaps fully initializing and then sending a proper “Close” instead of just cutting the connection. In a few extreme cases we have noticed that the target server even goes so far as to start a session but never ends it, since there’s no closure, and finally has no sessions to spare for the real users. But on most servers it’s just an annoyance since the “real” errors are very hard to find among all the connection-related log entries.
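
For comparison, this is roughly what I mean by a “deeper” check, sketched in Python. To be clear, this is not how Operations Manager implements its TCP Port Check (I have not seen its internals). It is just an illustration of completing the handshake and then closing the connection in an orderly way. The function name polite_port_check is made up.

```python
import socket


def polite_port_check(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a full TCP connection could be opened on host:port."""
    try:
        # create_connection returns only after the three-way handshake
        # has completed, so the target sees a fully established connection.
        with socket.create_connection((host, port), timeout=timeout) as conn:
            try:
                # Tell the peer we are done before closing, instead of
                # just disappearing.
                conn.shutdown(socket.SHUT_RDWR)
            except OSError:
                # The peer closed first; the port was reachable anyway.
                pass
        return True
    except OSError:
        # Refused, timed out, unreachable and so on.
        return False


# Demo against a throwaway listener on an OS-assigned local port.
with socket.socket() as server:
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    demo_port = server.getsockname()[1]
    print(polite_port_check("127.0.0.1", demo_port))
```

Whether the target application still logs something for a connection that opens and closes without sending any data depends on the application, so keep an eye on the logs either way.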

Anyway. Just a good thing to keep in mind when running TCP Port Checks from Operations Manager 2007. Keep an eye on the logs when implementing the port checks.

UPDATE: This problem seems to be fixed in the latest update!

The MSMQ Management Pack seems to have a few problems with its discovery script that can lead to the following error showing up in the logs:

The process started at 13:34:40 failed to create System.Discovery.Data. Errors found in output:

C:\Program Files\System Center Operations Manager 2007\Health Service State\Monitoring Host Temporary Files 49\9788\DiscoverQueues.vbs(107, 4) Microsoft VBScript runtime error: Subscript out of range: '[number: 0]'

Command executed: "C:\WINDOWS\system32\cscript.exe" /nologo "DiscoverQueues.vbs" {615D37C9-477D-62E2-0833-6ECBF0E89A87} {A176AC83-CC31-01C3-5DE9-E2DFF64E7CC7} "MASKED.server.fqdn" "MSMQ" "true" "true" "False" "false"
Working Directory: C:\Program Files\System Center Operations Manager 2007\Health Service State\Monitoring Host Temporary Files 49\9788\

One or more workflows were affected by this.

Workflow name: Microsoft.MSMQ.2003.DiscoverQueues

Instance name: MASKED.server.fqdn

Instance ID: {A176AC83-CC31-01C3-5DE9-E2DFF64E7CC7}

Management group: MASKED

This seems to be related to the discovery of public queues on servers that have none. One quick fix, or rather workaround, is to override the discovery on these servers and set DiscoverPublic to False.
Screenshot of Override

I have seen this error popping up every now and then at multiple customer sites and haven’t really been able to solve it yet. It does not look like I am alone either.
The error message usually looks like this:

Error doing IIS Discovery

Error: 0x80070002
Details: The system cannot find the file specified.

One or more workflows were affected by this. 

Workflow name: Microsoft.Windows.InternetInformationServices.2003.DiscoverBase
Instance name: Microsoft.Windows.InternetInformationServices.2003.ServerRole
Instance ID: {A81E4808-4D05-9BFE-4043-DC668527F2D0}
Management group: MASKED

Or…

Error doing IIS Discovery

Error: 0x80070006
Details: The handle is invalid.

One or more workflows were affected by this. 

Workflow name: Microsoft.Windows.InternetInformationServices.2000.DiscoverWebSites26to50
Instance name: IIS Web Server
Instance ID: {D36DA76A-027F-8F3E-4160-115279A1E23A}
Management group: MASKED

I have been trying to figure out what file is missing and/or if the “invalid handle” is related. Possibly a file handle? Could be, but not necessarily, since these two errors occur on different servers with an increasing repeat count (at least once a day). The IIS MP does call the IIS*.VBS scripts in %windir%\System32 but as far as I can tell, on the systems I have tried it on, the scripts return valid data. This by no means proves that there is no error, and evidently I am missing something. But what? Does anyone have a clue about this?

References and other victims:

And no, neither of these provides even a hint to a working solution.

Here’s my summary of the problems with the NetworkAdapterCheck.vbs script in the Windows Server 2000 Operating System Management Pack for Operations Manager 2007 that is causing the “failed to create System.PropertyBagData” error I wrote about earlier.
This information is also available on https://connect.microsoft.com/feedback/ViewFeedback.aspx?FeedbackID=432627&SiteID=446

Symptoms

This “research” comes from getting an obscene amount of “Script or Executable Failed to run” alerts in the Operations Console. Each time it was the NetworkAdapterCheck.vbs script that could not create PropertyBagData. The error message copied from one of the alerts looks like this:

The process started at 14:29:26 failed to create System.PropertyBagData, no errors detected in the output. The process exited with 0

Command executed: "C:\WINNT\system32\cscript.exe" /nologo "NetworkAdapterCheck.vbs" MASKEDCOMPUTERNAME 0 false true false
Working Directory: C:\Program Files\System Center Operations Manager 2007\Health Service State\Monitoring Host Temporary Files 2882781

One or more workflows were affected by this.

Workflow name: Microsoft.Windows.Server.2000.NetworkAdapter.NetworkAdapterConnectionHealth
Instance name: 0
Instance ID: {F4C478D3-38E5-8C29-3957-E3B7F486216E}
Management group: MASKED

This error repeats almost as often as the script is scheduled to run and appears on almost every Windows 2000 server.

On several systems at different sites I’ve noticed an abundance of the following errors in the Operations Manager 2007 logs:

Event Type:    Warning
Event Source:    Health Service Modules
Event Category:    None
Event ID:    21405
Date:        2009-04-21
Time:        09:00:26
User:        N/A
Computer:    MASKED
Description:
The process started at 09:00:26 failed to create System.PropertyBagData, no errors detected in the output.  The process exited with 0
Command executed:    "C:\WINNT\system32\cscript.exe" /nologo "NetworkAdapterCheck.vbs" MASKEDCOMPUTER.DOMAIN 7 false true false
Working Directory:    C:\Program Files\System Center Operations Manager 2007\Health Service State\Monitoring Host Temporary Files 1\50958\

One or more workflows were affected by this.

Workflow name: Microsoft.Windows.Server.2000.NetworkAdapter.NetworkAdapterConnectionHealth
Instance name: 7
Instance ID: {A6F89C78-1217-578E-B03D-5ED377A9A40B}
Management group: MASKED

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

Always the “NetworkAdapterCheck.vbs”. Always “The process exited with 0”, which usually means that life is A-OK.

This may be a result of your AV-software blocking the scripts.
The first thing to do would be to exclude HealthService.exe and MonitoringHost.exe. If this doesn’t work, try excluding the entire C:\Program Files\System Center Operations Manager 2007\Health Service State\ directory.

There is a bug report on Microsoft Connect that you can add your vote to if you cannot get rid of the problem. But of course, if you have a solution, do post it here or on the bug report. :D