Exclude last occurrence of duplicate nodes. Duplicate nodes are nodes with multiple attributes with the same values

I have been trying to write an XPath/XSLT for my problem of detecting and eliminating duplicate nodes. In my case, duplicate nodes are nodes with multiple attributes that have the same values. The way I want to eliminate duplicates is by excluding the last occurrence of each duplicate node. Please advise if there is any other method.

Please note: Duplicate nodes = nodes with the same values of the operator1, operator2 and operator3 attributes.

XML:

<data id = "root">
  <record id="1" operator1='xxx' operator2='yyy' operator3='zzz'/>
  <record id="2" operator1='abc' operator2='yyy' operator3='zzz'/>
  <record id="3" operator1='abc' operator2='yyy' operator3='zzz'/>
  <record id="4" operator1='xxx' operator2='yyy' operator3='zzz'/>
  <record id="5" operator1='xxx' operator2='lkj' operator3='tyu'/>
  <record id="6" operator1='xxx' operator2='yyy' operator3='zzz'/>
  <record id="7" operator1='abc' operator2='yyy' operator3='zzz'/>
  <record id="8" operator1='abc' operator2='yyy' operator3='zzz'/>
  <record id="9" operator1='xxx' operator2='yyy' operator3='zzz'/>
  <record id="10" operator1='rrr' operator2='yyy' operator3='zzz'/>
</data>

Output I need:

<data id = "root">
  <record id="1" operator1='xxx' operator2='yyy' operator3='zzz'/>
  <record id="2" operator1='abc' operator2='yyy' operator3='zzz'/>
  <record id="3" operator1='abc' operator2='yyy' operator3='zzz'/>
  <record id="4" operator1='xxx' operator2='yyy' operator3='zzz'/>
  <record id="6" operator1='xxx' operator2='yyy' operator3='zzz'/>
  <record id="7" operator1='abc' operator2='yyy' operator3='zzz'/>
</data>

The closest I have come is with this XPath, but it doesn't work exactly.

"//record[(./@operator1 = following-sibling::record/@operator1) and (./@operator2 = following-sibling::record/@operator2) and (./@operator3 = following-sibling::record/@operator3)]".

I have searched the whole internet but without any luck. Any help is really appreciated. Thanks a lot.

Answers


I have been trying to write an XPath/XSLT for my problem of detecting and eliminating duplicate nodes. In my case, duplicate nodes are nodes with multiple attributes that have the same values. The way I want to eliminate duplicates is by excluding the last occurrence of each duplicate node. Please advise if there is any other method.

Please note: Duplicate nodes = nodes with the same values of the operator1, operator2 and operator3 attributes.

This is a conflicting definition of duplicate nodes elimination.

You are not eliminating duplicate nodes by just removing the last of a sequence of duplicates. Your desired result:

<data id = "root">
    <record id="1" operator1='xxx' operator2='yyy' operator3='zzz'/>
    <record id="2" operator1='abc' operator2='yyy' operator3='zzz'/>
    <record id="3" operator1='abc' operator2='yyy' operator3='zzz'/>
    <record id="4" operator1='xxx' operator2='yyy' operator3='zzz'/>
    <record id="6" operator1='xxx' operator2='yyy' operator3='zzz'/>
    <record id="7" operator1='abc' operator2='yyy' operator3='zzz'/>
</data>

still contains duplicates, such as the records with ids (1, 4, and 6) and (2, 3, and 7).

Proper duplicate elimination, also called deduplication, leaves only one item from each group of duplicate items. This is traditionally accomplished in XSLT 1.0 by using the Muenchian method for grouping:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>
 <xsl:key name="kRecByAtts" match="record"
  use="concat(@operator1,'***',
              @operator2,'***',
              @operator3)"/>

 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match=
  "record
      [not(generate-id()
      =
       generate-id(key('kRecByAtts',
                       concat(@operator1,'***',
                              @operator2,'***',
                              @operator3)
                       )[1]
                   )
            )
       ]
  "/>
</xsl:stylesheet>

When this transformation is applied to the following XML document (the desired result from the question, not the original 10-record input):

<data id = "root">
    <record id="1" operator1='xxx' operator2='yyy' operator3='zzz'/>
    <record id="2" operator1='abc' operator2='yyy' operator3='zzz'/>
    <record id="3" operator1='abc' operator2='yyy' operator3='zzz'/>
    <record id="4" operator1='xxx' operator2='yyy' operator3='zzz'/>
    <record id="6" operator1='xxx' operator2='yyy' operator3='zzz'/>
    <record id="7" operator1='abc' operator2='yyy' operator3='zzz'/>
</data>

the wanted, correct result is produced:

<data id="root">
   <record id="1" operator1="xxx" operator2="yyy" operator3="zzz"/>
   <record id="2" operator1="abc" operator2="yyy" operator3="zzz"/>
</data>
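
If an XSLT 2.0 processor is available (an assumption; the question does not say which version is in use), the same deduplication can be sketched without keys by using xsl:for-each-group. Inside the group the context item is the first record of that group, so copying it keeps exactly one record per attribute combination:

```xml
<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="data">
  <xsl:copy>
   <xsl:copy-of select="@*"/>
   <!-- One group per distinct (operator1, operator2, operator3) triple -->
   <xsl:for-each-group select="record"
        group-by="concat(@operator1,'***',
                         @operator2,'***',
                         @operator3)">
    <!-- The context item here is the first record of the current group -->
    <xsl:copy-of select="."/>
   </xsl:for-each-group>
  </xsl:copy>
 </xsl:template>
</xsl:stylesheet>
```

Applied to the same six-record document, this should keep the records with id 1 and id 2, just like the Muenchian version.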

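Here is an example XSLT 1.0 stylesheet that eliminates the last of each group of duplicate 'record' elements. Note that a record whose attribute combination is unique is also the last of its group, so it is removed as well, which matches the desired output in the question (records 5 and 10 are dropped):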

<xsl:stylesheet 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="1.0">

  <xsl:key name="k1" match="record" 
           use="concat(@operator1, '|', @operator2, '|', @operator3)"/>

  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="record[generate-id() 
                              = 
                              generate-id(key('k1', 
                                              concat(@operator1, '|', 
                                                     @operator2, '|', 
                                                     @operator3))[last()])]"/>

</xsl:stylesheet>
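
For comparison, the XPath in the question tests each attribute against any following sibling independently, so the three matches may come from three different records. A key-free sketch that requires a single following sibling to match all three attributes is shown below. It relies on the XSLT current() function, which XSLT 1.0 does not allow in match patterns, so the test is placed in an xsl:if inside an xsl:for-each instead. Treat it as an illustration of the idea rather than a replacement for the keyed stylesheet:

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>

  <xsl:template match="data">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:for-each select="record">
        <!-- Keep the record only if ONE later sibling matches all three
             attributes; current() is the record being processed -->
        <xsl:if test="following-sibling::record
                        [@operator1 = current()/@operator1 and
                         @operator2 = current()/@operator2 and
                         @operator3 = current()/@operator3]">
          <xsl:copy-of select="."/>
        </xsl:if>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

On the original 10-record input this keeps the records with ids 1, 2, 3, 4, 6 and 7. It scans the following siblings for every record, so the keyed version above is likely the better choice for large documents.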
