Improving the ESB Toolkit: Fixing the "endless loop" bug when creating Fault messages

The ESB Toolkit is great. However, it's not without its fair share of bugs. I've been meaning to blog about this since I first came across this bug a year or so ago, but it's taken me a while.

If you use the ExceptionManagementDb functionality, you're probably familiar with putting something like this in your exception handling logic in an orchestration:

FaultMessage = Microsoft.Practices.ESB.ExceptionHandling.ExceptionMgmt.CreateFaultMessage();

// Set properties
FaultMessage.FailureCategory = "Error";
FaultMessage.FaultCode = "1";
FaultMessage.FaultDescription = ex.Message;
FaultMessage.FaultSeverity = Microsoft.Practices.ESB.ExceptionHandling.FaultSeverity.Critical;

// Add original request
Microsoft.Practices.ESB.ExceptionHandling.ExceptionMgmt.AddMessage(FaultMessage, RequestMessage);

One of the more difficult bugs is the "endless loop" bug you get when you use the above code either:

a) outside of an exception handler

OR

b) when catching a BizTalk exception (i.e. any exception which inherits from Microsoft.XLANGs.BaseTypes.XLANGsException)

 

The first case is documented here whilst the second is something I came across the first time we hit an EmptyPartException in a catch block.

In these cases (and possibly others), the Host Instance running the orchestration pegs the CPU at 100% and stops responding: your code effectively never gets past the CreateFaultMessage statement.

The reason for this is that there is a bug in the ESB code that causes an endless loop to occur in certain scenarios. At the point the bug occurs, the code walks through the exception context segments in the orchestration, attempting to find the last exception. It's supposed to keep looping until it finds the last exception segment. Instead, in certain situations, it never exits the loop.

I found this out by decompiling the Microsoft.Practices.ESB.ExceptionHandling.dll assembly using ILSpy.

I recompiled the assembly and then traced through the code as it executed.

The problem lies in a method called GetServiceXlangInfo().

The code that causes the problem is shown below; the faulty check is the if (successorSegment == null) test inside the loop:

try
{
   int exceptionSegmentIndex = segmentIndex;
   object successorSegment = null;
   exception = null;

   while (exception == null && exceptionSegmentIndex > -1)
   {
     exception = Context.RootService.RootContext.__MyService._stateMgrs[index].__MyService._segments[exceptionSegmentIndex].ExceptionContext._exception;

     if (exception == null)
     {
        successorSegment = Context.RootService.RootContext.__MyService._stateMgrs[index].__MyService._segments[exceptionSegmentIndex].ExceptionContext._successorSegment;

        // BUG: never true when a segment's successor is the segment itself
        if (successorSegment == null)
        {
           break;
        }

        exceptionSegmentIndex = (int)successorSegment;
     }
   }

Looking carefully, you can see that the while loop exits only when an exception is found, when successorSegment is null, or when exceptionSegmentIndex drops below zero.

However, the logic is faulty: what's supposed to happen is that starting at the current segment in the orchestration it looks for the exception object. If it doesn't find it, then it moves down the segments looking for the exception object, and then exits the loop when it finds the exception object, or if there are no more segments to search.

However, it appears that if you're not in an exception handler, or you're catching an exception that inherits from XLANGsException, no exception object is ever found and successorSegment always equals the current segment. In other words, you stop moving down the tree of segments and keep iterating over the same segment, never finding the exception.

The fix is to break out of the loop if no successor segment is found OR if the current segment is the same as the successor segment.

i.e. replace the line: if (successorSegment == null)

with this: if ((successorSegment == null) || (exceptionSegmentIndex == (int)successorSegment))

Whilst you're at it, you may as well also check for out-of-bounds indexes, as I can foresee other bugs arising. The entire bit of code to replace would therefore look like this:

try
{
   int exceptionSegmentIndex = segmentIndex;
   object successorSegment = null;
   exception = null;

   while ((exception == null) && (exceptionSegmentIndex > -1))
   {
     // FIX: Added code to check if exceptionSegmentIndex is out of bounds before using it as an indexer
     // (a valid index must be strictly less than Length)
     if (exceptionSegmentIndex < Context.RootService.RootContext.__MyService._stateMgrs[index].__MyService._segments.Length)
     {
        exception = Context.RootService.RootContext.__MyService._stateMgrs[index].__MyService._segments[exceptionSegmentIndex].ExceptionContext._exception;
     }

     if (exception == null)
     {
        // FIX: Added code to check if exceptionSegmentIndex is out of bounds before using it as an indexer
        if (exceptionSegmentIndex < Context.RootService.RootContext.__MyService._stateMgrs[index].__MyService._segments.Length)
        {
           successorSegment = Context.RootService.RootContext.__MyService._stateMgrs[index].__MyService._segments[exceptionSegmentIndex].ExceptionContext._successorSegment;
        }

        // FIX: Fixes the endless loop error that happens when a segment's successor is the segment itself
        if ((successorSegment == null) || (exceptionSegmentIndex == (int)successorSegment))
        {
           break;
        }

        exceptionSegmentIndex = (int)successorSegment;
     }
   }
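
The patched loop depends on BizTalk internals, so it can't be run on its own. As a rough illustration only, the sketch below reproduces the same walk over a hypothetical Segment class (my own stand-in, not an ESB Toolkit or XLANG type) so you can see the self-referencing case terminate. The names Segment, SegmentWalkDemo and FindException are assumptions made for this example.

```csharp
using System;

// Hypothetical stand-in for an orchestration segment; not a BizTalk type.
public sealed class Segment
{
    public Exception Exception;      // plays the role of ExceptionContext._exception
    public object SuccessorSegment;  // plays the role of ExceptionContext._successorSegment (boxed int or null)
}

public static class SegmentWalkDemo
{
    // Walks from startIndex along the successor links and returns the
    // first exception found, or null if the walk ends without one.
    public static Exception FindException(Segment[] segments, int startIndex)
    {
        int index = startIndex;
        Exception exception = null;

        // The bounds check is folded into the loop condition here.
        while (exception == null && index > -1 && index < segments.Length)
        {
            exception = segments[index].Exception;
            if (exception == null)
            {
                object successor = segments[index].SuccessorSegment;

                // Stop when there is no successor, or when the successor
                // is the segment we are already on (the endless-loop case).
                if (successor == null || index == (int)successor)
                {
                    break;
                }
                index = (int)successor;
            }
        }
        return exception;
    }

    public static void Main()
    {
        // Segment 1 points at itself and holds no exception.
        var selfReferencing = new[]
        {
            new Segment { SuccessorSegment = 1 },
            new Segment { SuccessorSegment = 1 },
        };
        Console.WriteLine(FindException(selfReferencing, 0) == null); // True

        // Segment 1 holds the exception we are looking for.
        var withException = new[]
        {
            new Segment { SuccessorSegment = 1 },
            new Segment { Exception = new InvalidOperationException("boom") },
        };
        Console.WriteLine(FindException(withException, 0).Message); // boom
    }
}
```

Note that this guard, like the fix above, only catches a segment that points at itself. I haven't seen a cycle spanning several segments, but if one could occur you'd need to track the indexes already visited.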

This isn't a simple issue for an end-user to fix, as Microsoft doesn't supply the source code for this assembly (although the source for much of the rest of the ESB Toolkit is supplied).

The solution is to pull the CreateFaultMessage() and GetServiceXlangInfo() methods (and any other methods/member variables required) into your own class, and then call your fixed version of the CreateFaultMessage() method. I'm unsure how much Microsoft would frown upon this, but if it fixes a production issue then I don't see many other choices.

I was hoping that the toolkit v2.1 would fix this bug, but it hasn't – here's hoping a future release will.

I appreciate that the current thinking is not to use this code outside of an exception block - but if you do, it shouldn't bring BizTalk to its knees!

In the meantime, I logged a bug report on this with Microsoft, although from what I can see they're aware of the issue.

Friday, May 06, 2011 1:03:10 PM (GMT Daylight Time, UTC+01:00)

 

Tuesday, August 23, 2011 7:04:14 AM (GMT Daylight Time, UTC+01:00)
Hi Daniel,

You're legend! Hopes microsoft would release a patch to address this. In the mean while, I am sticking with your suggestion to call my own CreateFaultMessage() method to address this.

Keep it up!
Andy
Andy Dang
All content © 2019, Daniel Probert