Aggregating messages and removing duplicates in a BizTalk Map
Aggregating messages is a fairly common task in BizTalk.
By "aggregating" I mean taking two separate messages with repeating elements and combining them into a new message which contains the elements of both messages, duplicates included - the same as doing a UNION ALL in SQL.

However, what if you want to remove duplicates?
It's not as easy as it seems, and in truth the only way I have found to do this is via custom XSLT.

Combining two messages
This is actually fairly easy: you use a single Looping functoid, with two inputs and one output.
You can then either link the elements, or use the Mass Copy functoid to copy the element data across:
[Figure: Standard BizTalk aggregation map, which allows duplicates]

So if I had these two messages:

Message 1:
<ns0:Employees xmlns:ns0="http://TestIndexMap.Employees">
  <Employee firstName="Karin" lastName="Smith" dept="Managers" empNumber="100" />
  <Employee firstName="Daniel" lastName="Smith" dept="Staff" empNumber="101" />
</ns0:Employees>

Message 2:
<ns0:Employees xmlns:ns0="http://TestIndexMap.Employees">
  <Employee firstName="Heidi" lastName="Klum" dept="Models" empNumber="200" />
  <Employee firstName="Elle" lastName="MacPherson" dept="Models" empNumber="201" />
  <Employee firstName="Daniel" lastName="Smith" dept="Staff" empNumber="101" />
  <Employee firstName="Naomi" lastName="Campbell" dept="Models" empNumber="203" />
</ns0:Employees>

I would end up with this message:
<ns0:Employees xmlns:ns0="http://TestIndexMap.Employees">
  <Employee firstName="Karin" lastName="Smith" dept="Managers" empNumber="100" />
  <Employee firstName="Daniel" lastName="Smith" dept="Staff" empNumber="101" />
  <Employee firstName="Heidi" lastName="Klum" dept="Models" empNumber="200" />
  <Employee firstName="Elle" lastName="MacPherson" dept="Models" empNumber="201" />
  <Employee firstName="Daniel" lastName="Smith" dept="Staff" empNumber="101" />
  <Employee firstName="Naomi" lastName="Campbell" dept="Models" empNumber="203" />
</ns0:Employees>
Note that the Employee with empNumber 101 is repeated.

Removing Duplicates
What if I wanted to remove duplicates from the messages?
That is, instead of the combined message above, suppose that I wanted this:
<ns0:Employees xmlns:ns0="http://TestIndexMap.Employees">
  <Employee firstName="Karin" lastName="Smith" dept="Managers" empNumber="100" />
  <Employee firstName="Daniel" lastName="Smith" dept="Staff" empNumber="101" />
  <Employee firstName="Heidi" lastName="Klum" dept="Models" empNumber="200" />
  <Employee firstName="Elle" lastName="MacPherson" dept="Models" empNumber="201" />
  <Employee firstName="Naomi" lastName="Campbell" dept="Models" empNumber="203" />
</ns0:Employees>

When you use the Looping Functoid with two inputs, you will end up with two separate <xsl:for-each> loops in the XSLT.
The XSLT for the above map looks like this:
<?xml version="1.0" encoding="UTF-16"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:msxsl="urn:schemas-microsoft-com:xslt" xmlns:var="http://schemas.microsoft.com/BizTalk/2003/var" exclude-result-prefixes="msxsl var s0" version="1.0" xmlns:s0="http://schemas.microsoft.com/BizTalk/2003/aggschema" xmlns:ns0="http://TestIndexMap.Employees">
  <xsl:output omit-xml-declaration="yes" method="xml" version="1.0" />
  <xsl:template match="/">
    <xsl:apply-templates select="/s0:Root" />
  </xsl:template>
  <xsl:template match="/s0:Root">
    <ns0:Employees>
      <xsl:for-each select="InputMessagePart_0/ns0:Employees/Employee">
        <Employee>
          <xsl:copy-of select="./@*" />
          <xsl:copy-of select="./*" />
        </Employee>
      </xsl:for-each>
      <xsl:for-each select="InputMessagePart_1/ns0:Employees/Employee">
        <Employee>
          <xsl:copy-of select="./@*" />
          <xsl:copy-of select="./*" />
        </Employee>
      </xsl:for-each>
    </ns0:Employees>
  </xsl:template>
</xsl:stylesheet>

What you need to do is put some sort of condition over one of the loops that says "only copy the current item if it doesn't exist in the other message".

It's this "if it doesn't exist in the other message" that can be tricky.
If you use a plain XPath expression to search the other message, then you incur the penalty of scanning that message each time you iterate through the loop.
Depending on the size of your messages, this can be costly.
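
For reference, the plain XPath version of that condition might look like the sketch below. It assumes the same aggregate input structure as the generated XSLT above (s0:Root containing InputMessagePart_0 and InputMessagePart_1) and treats empNumber as the unique ID; it is an illustration of the costly approach, not a recommendation.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:s0="http://schemas.microsoft.com/BizTalk/2003/aggschema"
    xmlns:ns0="http://TestIndexMap.Employees"
    exclude-result-prefixes="s0">
  <xsl:output omit-xml-declaration="yes" method="xml" version="1.0" />
  <xsl:template match="/s0:Root">
    <ns0:Employees>
      <!-- First message: copy every Employee unchanged -->
      <xsl:for-each select="InputMessagePart_0/ns0:Employees/Employee">
        <Employee>
          <xsl:copy-of select="./@*" />
          <xsl:copy-of select="./*" />
        </Employee>
      </xsl:for-each>
      <!-- Second message: copy an Employee only if no Employee in the first
           message has the same empNumber. The predicate is re-evaluated
           against the first message for every Employee in this loop. -->
      <xsl:for-each select="InputMessagePart_1/ns0:Employees/Employee">
        <xsl:if test="not(/s0:Root/InputMessagePart_0/ns0:Employees/Employee[@empNumber = current()/@empNumber])">
          <Employee>
            <xsl:copy-of select="./@*" />
            <xsl:copy-of select="./*" />
          </Employee>
        </xsl:if>
      </xsl:for-each>
    </ns0:Employees>
  </xsl:template>
</xsl:stylesheet>
```

With n Employee elements in the first message and m in the second, this does on the order of n x m comparisons, which is where the cost comes from on large messages.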

The best way of doing it is to build an index of unique IDs (i.e. primary key values!) that you can use to check whether the item exists.
Luckily, XSLT has a dedicated construct for this: the <xsl:key> element.

This builds an index of items which you can search, and lookups against it are very fast.
You can read more about it here and here
(Unfortunately, there's no functoid for this element).

Expanding the above sample, we will use the empNumber attribute as our unique ID.
The pseudo code for the map will be:
  1. Build an index of all the empNumber attributes in Message 1
  2. Loop through all the items in Message 1, and copy them all to the Combined Message
  3. Loop through the items in Message 2: if there is no empNumber in our index which matches the current empNumber, then copy the item across
The XSLT for this is:
(note: in order to get the base XSLT, I created the above map with a single looping functoid and two mass copy functoids, and exported the XSLT using the Validate Map command. I then modified this XSLT file, and set the Custom XSL Path property on the map to point to my modified file)
<?xml version="1.0" ?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:msxsl="urn:schemas-microsoft-com:xslt" xmlns:var="http://schemas.microsoft.com/BizTalk/2003/var" exclude-result-prefixes="msxsl var s0" version="1.0" xmlns:s0="http://schemas.microsoft.com/BizTalk/2003/aggschema" xmlns:ns0="http://TestIndexMap.Employees">
  <xsl:output omit-xml-declaration="yes" method="xml" version="1.0" />
  <!-- This next line generates an index of the empNumber values in the first message-->
  <xsl:key name="duplicates" match="InputMessagePart_0/ns0:Employees/Employee" use="@empNumber"/>
  <xsl:template match="/">
    <xsl:apply-templates select="/s0:Root" />
  </xsl:template>
  <xsl:template match="/s0:Root">
    <ns0:Employees>
      <!-- Loop through the Employee elements in the first message -->
      <xsl:for-each select="InputMessagePart_0/ns0:Employees/Employee">
        <Employee>
          <!-- Copy across all elements and attributes in this element -->
          <xsl:copy-of select="./@*" />
          <xsl:copy-of select="./*" />
        </Employee>
      </xsl:for-each>
      <!-- Loop through the Employee elements in the second message -->
      <xsl:for-each select="InputMessagePart_1/ns0:Employees/Employee">
        <!-- We query the index to see if there is an Employee element with
             this empNumber value in the first message.
             If not, then we copy across this Employee element -->
        <xsl:if test="count(key('duplicates', @empNumber)) = 0">
          <Employee>
            <!-- Copy across all elements and attributes in this element -->
            <xsl:copy-of select="./@*" />
            <xsl:copy-of select="./*" />
          </Employee>
        </xsl:if>
      </xsl:for-each>
    </ns0:Employees>
  </xsl:template>
</xsl:stylesheet>

The <xsl:key> element takes three attributes:
<xsl:key name="" match="" use="" />

name is a unique name for this index (can be anything you want)
match is an XPath pattern identifying the elements you want to index (any element in the input message that fits the pattern is added to the index)
use is an XPath expression, evaluated relative to each element found by match, that gives the value you want to search on.

(match and use are actually more powerful than I've described, but that's beyond the scope of this post - see the links above for further reading on what you can do with these parameters).

So in my example, match points to the Employee element (i.e. I will create an index of Employee elements), and use is the empNumber attribute (as this is what I want to search on).
In fact, the <xsl:key> element is very powerful, and you can use it to create quite complicated indexes.
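
As one small illustration of a more complicated index, the sketch below keys on two attributes at once. The key name byFullName, and the idea of treating lastName plus firstName as the identity of an employee, are assumptions made purely for illustration; the sample in this post uses empNumber.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:s0="http://schemas.microsoft.com/BizTalk/2003/aggschema"
    xmlns:ns0="http://TestIndexMap.Employees"
    exclude-result-prefixes="s0">
  <xsl:output omit-xml-declaration="yes" method="xml" version="1.0" />
  <!-- Hypothetical composite index over the first message. The '|' separator
       keeps values such as "ab" + "c" distinct from "a" + "bc". -->
  <xsl:key name="byFullName"
           match="InputMessagePart_0/ns0:Employees/Employee"
           use="concat(@lastName, '|', @firstName)" />
  <xsl:template match="/s0:Root">
    <ns0:Employees>
      <xsl:for-each select="InputMessagePart_0/ns0:Employees/Employee">
        <Employee>
          <xsl:copy-of select="./@*" />
          <xsl:copy-of select="./*" />
        </Employee>
      </xsl:for-each>
      <xsl:for-each select="InputMessagePart_1/ns0:Employees/Employee">
        <!-- The lookup value must be built exactly as in the use attribute -->
        <xsl:if test="not(key('byFullName', concat(@lastName, '|', @firstName)))">
          <Employee>
            <xsl:copy-of select="./@*" />
            <xsl:copy-of select="./*" />
          </Employee>
        </xsl:if>
      </xsl:for-each>
    </ns0:Employees>
  </xsl:template>
</xsl:stylesheet>
```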

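To perform a lookup in the index, you use the key() function.
This function takes two parameters: key(name, value)
name is the name of the index to use
value is the value to look up in the index (i.e. the value referred to in the use attribute)
The function returns the set of indexed elements whose use value matches, which is empty when there is no match; that is why the map tests count(key(...)) = 0.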
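
One thing to be aware of: the stylesheet above only checks Message 2 against Message 1, so an Employee repeated inside a single message would still come through twice. If that matters, a possible variation is to index the Employee elements of both messages and keep only the first element returned for each empNumber (a technique commonly known as Muenchian grouping). This is a sketch under the same assumptions about the aggregate input structure; the key name allEmployees is made up for the example.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:s0="http://schemas.microsoft.com/BizTalk/2003/aggschema"
    xmlns:ns0="http://TestIndexMap.Employees"
    exclude-result-prefixes="s0">
  <xsl:output omit-xml-declaration="yes" method="xml" version="1.0" />
  <!-- Hypothetical key name: indexes the Employee elements of BOTH messages -->
  <xsl:key name="allEmployees" match="ns0:Employees/Employee" use="@empNumber" />
  <xsl:template match="/s0:Root">
    <ns0:Employees>
      <!-- Keep an Employee only if it is the first element, in document
           order, that the index returns for its empNumber -->
      <xsl:for-each select="*/ns0:Employees/Employee[generate-id() = generate-id(key('allEmployees', @empNumber)[1])]">
        <Employee>
          <xsl:copy-of select="./@*" />
          <xsl:copy-of select="./*" />
        </Employee>
      </xsl:for-each>
    </ns0:Employees>
  </xsl:template>
</xsl:stylesheet>
```

For the two sample messages this should give the same five Employee elements as the two-loop version, since the first occurrence of empNumber 101 is the one in Message 1.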

I've put together a sample solution which shows how this works.
You can download it here: TestAggregateMaps Solution.zip (25.78 KB)

Wednesday, May 21, 2008 2:42:26 PM (GMT Daylight Time, UTC+01:00)

 

Monday, September 15, 2008 5:33:48 PM (GMT Daylight Time, UTC+01:00)
unable to delete duplicates using the given xslt. please can you send me the solution.
ram
All content © 2020, Daniel Probert