The ESB Toolkit is great. However, it's not without its fair
share of bugs. I've been meaning to blog about this since I first came across this
bug a year or so ago, but it's taken me a while.
If you use the ExceptionManagementDb functionality,
you're probably familiar with putting something like this in your exception
handling logic in an orchestration:
    FaultMessage = Microsoft.Practices.ESB.ExceptionHandling.ExceptionMgmt.CreateFaultMessage();

    // Set properties
    FaultMessage.FailureCategory = "Error";
    FaultMessage.FaultCode = "1";
    FaultMessage.FaultDescription = ex.Message;
    FaultMessage.FaultSeverity = Microsoft.Practices.ESB.ExceptionHandling.FaultSeverity.Critical;

    // Add original request
    Microsoft.Practices.ESB.ExceptionHandling.ExceptionMgmt.AddMessage(FaultMessage, RequestMessage);
One of the more difficult bugs is the "endless loop"
bug you get when you use the above code either:
a) outside of an exception handler
OR
b) when catching a BizTalk exception (i.e. any exception
which inherits from Microsoft.XLANGs.BaseTypes.XLANGsException)
The first case is documented here
whilst the second is something I came across the first time we hit an EmptyPartException in a catch block.
In these cases (and possibly others), the Host Instance running the
orchestration ends up pegging the CPU at 100% and stops responding:
your code effectively stops at the CreateFaultMessage statement.
The reason for this is that there is a bug in the ESB code
that causes an endless loop to occur in certain scenarios. At the point the bug
occurs, the code walks through the exception context segments in the
orchestration, attempting to find the last exception. It's supposed to keep
looping until it finds the last exception segment. Instead, in certain
situations, it never exits the loop.
I found this out by decompiling the Microsoft.Practices.ESB.ExceptionHandling.dll
assembly using ILSpy.
I recompiled the assembly and then traced through the code
as it executed.
The problem lies in a method called GetServiceXlangInfo().
The code that causes the problem is the successor segment check,
if (successorSegment == null), inside the while loop below:
    try
    {
        int exceptionSegmentIndex = segmentIndex;
        object successorSegment = null;
        exception = null;
        while (exception == null && exceptionSegmentIndex > -1)
        {
            exception = Context.RootService.RootContext.__MyService._stateMgrs[index].__MyService._segments[exceptionSegmentIndex].ExceptionContext._exception;
            if (exception == null)
            {
                successorSegment = Context.RootService.RootContext.__MyService._stateMgrs[index].__MyService._segments[exceptionSegmentIndex].ExceptionContext._successorSegment;
                if (successorSegment == null)
                {
                    break;
                }
                exceptionSegmentIndex = (int)successorSegment;
            }
        }
Looking carefully, you can see that a while loop is entered,
which will only exit if (successorSegment == null) or if an exception is found.
However, the logic is faulty: what's supposed to happen is
that starting at the current segment in the orchestration it looks for the exception
object. If it doesn't find it, then it moves down the segments looking for the exception
object, and then exits the loop when it finds the exception object, or if there
are no more segments to search.
However, it appears that if you're not in an exception
handler, or you're catching an exception that inherits from XLANGsException,
then you end up with a situation where not only is no exception object found,
but successorSegment is also always equal to the current segment, i.e.
you stop moving down the tree of segments and just keep iterating over the same
segment, never finding the exception.
The fix is to break out of the loop if no successor segment
is found OR if the current segment is the same as the successor segment,
i.e. replace the line:
    if (successorSegment == null)
with this:
    if ((successorSegment == null) || (exceptionSegmentIndex == (int)successorSegment))
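To see the effect of that guard in isolation, here is a minimal sketch that models the segment walk with plain objects. SegmentModel and SegmentWalk are hypothetical names invented for illustration; they are not types from the ESB Toolkit or the XLANG runtime, and the real code reads these values from private runtime fields.

```csharp
using System;

// Hypothetical stand-in for one orchestration segment's exception context.
public sealed class SegmentModel
{
    public Exception Exception;      // null when this segment holds no exception
    public object SuccessorSegment;  // boxed int index of the next segment, or null
}

public static class SegmentWalk
{
    // Walks the successor chain from startIndex and returns the first
    // exception found, or null if the chain ends without one.
    public static Exception FindException(SegmentModel[] segments, int startIndex)
    {
        int index = startIndex;
        Exception exception = null;
        while (exception == null && index > -1)
        {
            exception = segments[index].Exception;
            if (exception == null)
            {
                object successor = segments[index].SuccessorSegment;
                // Guard: stop when there is no successor, or when the
                // successor points back at the segment we are already on.
                if (successor == null || index == (int)successor)
                {
                    break;
                }
                index = (int)successor;
            }
        }
        return exception;
    }
}
```

With a single segment whose SuccessorSegment is 0 and whose Exception is null, FindException(segments, 0) returns null instead of spinning. Without the second half of the guard, the same input would never leave the loop.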
Whilst you're at it, you may as well also check for
out-of-bounds indexes, as I can foresee other bugs arising. The entire bit of
code to replace would therefore look like this:
    try
    {
        int exceptionSegmentIndex = segmentIndex;
        object successorSegment = null;
        exception = null;
        while ((exception == null) && (exceptionSegmentIndex > -1))
        {
            // FIX: Added code to check if exceptionSegmentIndex is out of bounds before using it as an indexer
            if (exceptionSegmentIndex < Context.RootService.RootContext.__MyService._stateMgrs[index].__MyService._segments.Length)
            {
                exception = Context.RootService.RootContext.__MyService._stateMgrs[index].__MyService._segments[exceptionSegmentIndex].ExceptionContext._exception;
            }
            if (exception == null)
            {
                // FIX: Added code to check if exceptionSegmentIndex is out of bounds before using it as an indexer
                if (exceptionSegmentIndex < Context.RootService.RootContext.__MyService._stateMgrs[index].__MyService._segments.Length)
                {
                    successorSegment = Context.RootService.RootContext.__MyService._stateMgrs[index].__MyService._segments[exceptionSegmentIndex].ExceptionContext._successorSegment;
                }
                // FIX: Fixes the endless loop error that happens occasionally
                if ((successorSegment == null) || (exceptionSegmentIndex == (int)successorSegment))
                {
                    break;
                }
                exceptionSegmentIndex = (int)successorSegment;
            }
        }
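The self-reference check only catches a segment whose successor is itself. As a more defensive variation, the sketch below also stops if the walk ever revisits any earlier segment. This assumes that longer cycles could occur, which is not something observed here, and it uses two plain arrays (hypothetical, and assumed to be the same length) in place of the runtime's private segment fields. SafeSegmentWalk is an invented name, not part of the toolkit.

```csharp
using System;
using System.Collections.Generic;

public static class SafeSegmentWalk
{
    // exceptions[i]: the exception held by segment i, or null.
    // successors[i]: boxed int index of segment i's successor, or null.
    // Assumption: both arrays have the same length.
    public static Exception FindException(Exception[] exceptions, object[] successors, int startIndex)
    {
        var visited = new HashSet<int>();
        int index = startIndex;

        // Stop on an out-of-bounds index or on any segment already visited.
        // HashSet<int>.Add returns false when the value is already present.
        while (index > -1 && index < exceptions.Length && visited.Add(index))
        {
            if (exceptions[index] != null)
            {
                return exceptions[index];
            }
            object successor = successors[index];
            if (successor == null)
            {
                break;
            }
            index = (int)successor;
        }
        return null;
    }
}
```

The trade-off is a small allocation per call in exchange for a loop that is bounded by the number of segments, whatever the successor values turn out to be.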
This isn't a simple issue for an end-user to fix, as Microsoft
doesn't supply you with the source code for this assembly (although the source
for much of the rest of the ESB Toolkit is supplied).
The solution is to pull the CreateFaultMessage() and GetServiceXlangInfo()
methods (and any other methods/member variables required) into your own class, and
then call your fixed version of the CreateFaultMessage() method. I'm unsure
how much Microsoft would frown upon this, but if it fixes a production issue
then I don't see many other choices.
I was hoping that the toolkit v2.1 would fix this bug, but
it hasn't – here's hoping a future release will.
I appreciate that the current thinking is not to use this code outside of an exception block - but if you do, it shouldn't bring BizTalk to its knees!
In the meantime, I logged a bug report on this with
Microsoft, although from what I can see they're aware of the issue.