How does Find Similar Work Items filtering work?

Erin Schnabel (6667) | asked Mar 25 '14, 11:48 a.m.
We're trying to use the "find similar work items" function (the action available from a failing test in a build result) in automation to identify known issues. We have found that the results are sometimes completely unpredictable.

We have determined (from experimentation) that munging the subject line reduces the percentage of the match. For example, a defect summary like "Test Failure (xxxxxxxx-xxxx): com.whatever.package.Test" will usually match 100% with the same failure on a subsequent build (which is good). If you add text to the summary (fore or aft), the match drops. This makes sense.

However, over the past few days we've seen a case where a closed item matched 100%, but an identical open item (identical summary apart from the build label, and identical description) matched only 86%. We have no idea why. We opened the latter item because the former had been closed as fixed; the problem recurred, so we created the new item, but RTC's "find similar work items" does not match it. Then, oddly, another item was opened for the same problem (a duplicate). THAT item also matched 100%, superseding the first one but bypassing the second. ALL THREE have identical summary lines (aside from the build label) and identical leading description text.
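A minimal sketch of why appending text lowers the match, assuming the percentage behaves like a length-normalized similarity score. It uses plain term-frequency cosine similarity as a stand-in; this is not RTC's actual scoring, and the tokenizer is a simplified assumption. Because the vector norm grows with every added word while the shared terms stay the same, any text added to the summary pulls the score below 100%.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Simplified analyzer (assumption): lowercase, split on non-alphanumerics.
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def cosine(a: str, b: str) -> float:
    """Cosine similarity of the raw term-frequency vectors of two strings."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(n * vb[t] for t, n in va.items())
    norm_a = math.sqrt(sum(n * n for n in va.values()))
    norm_b = math.sqrt(sum(n * n for n in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


summary = "Test Failure (xxxxxxxx-xxxx): com.whatever.package.Test"
same = cosine(summary, summary)
padded = cosine(summary, summary + " intermittent on Windows")  # extra words appended
print(f"identical: {same:.0%}, padded: {padded:.0%}")
```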

What the heck? Can anyone shed some light on how the "find similar work items" algorithm works so we can do the right thing to ensure accurate matching for known issues?

M Holder commented Mar 25 '14, 12:03 p.m. | edited Mar 25 '14, 12:29 p.m.

What was even stranger: initially, when we had two work items, A and B, they had the following matches:
A: 100% (but closed)
I then raised item C (by right-clicking the test failure and creating a work item) and then had these matches:
Then, when I cancelled C as a duplicate of B:
A: 56%
B: 52%
C: 100%

All three defects had the same summary (excluding the build ID) and identical, auto-generated descriptions. I don't know why they have different matches, and I really don't know why creating a new defect would change the match factor of the others.

One answer

Eric Jodet (6.3k5111120) | answered Apr 01 '14, 8:03 a.m.
When a work item save completes, the work item's summary, description and comments are passed to Lucene for indexing.
When a request is made to find potential duplicates, the summary, description and comments of the work item whose duplicates are being sought are sent as a search query against the Lucene index.

So the full-text search criteria are:
- summary
- description
- comments
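The index-on-save / query-on-find flow described above can be sketched as a toy in-memory index. This is an illustration under stated assumptions, not RTC's implementation: `WorkItem`, `ToyIndex`, `on_save` and `find_similar` are hypothetical names, the TF-IDF cosine score is a simplified stand-in for Lucene's real scoring (which also involves field norms, boosts and other factors), and treating the displayed percentage as relative to the best hit is a guess. It does show one plausible explanation for the behavior in the question: idf weights depend on the whole index, so saving a new work item can change the match percentages of existing items even though their own text is unchanged.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


def tokenize(text: str) -> list[str]:
    # Simplified analyzer (assumption): lowercase, split on non-alphanumerics.
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class WorkItem:  # hypothetical model of a work item
    item_id: str
    summary: str
    description: str
    comments: list[str] = field(default_factory=list)

    def indexed_text(self) -> str:
        # Per the answer: summary, description and comments are all indexed.
        return " ".join([self.summary, self.description, *self.comments])


class ToyIndex:
    """Hypothetical in-memory stand-in for RTC's Lucene index."""

    def __init__(self) -> None:
        self.docs: dict[str, Counter] = {}

    def on_save(self, item: WorkItem) -> None:
        # Each save (re)indexes the item's combined text.
        self.docs[item.item_id] = Counter(tokenize(item.indexed_text()))

    def idf(self, term: str) -> float:
        # Lucene ClassicSimilarity-style idf: depends on the WHOLE index,
        # so adding an item changes the weights used for every other item.
        df = sum(1 for tf in self.docs.values() if term in tf)
        return 1.0 + math.log(len(self.docs) / (df + 1))

    def weights(self, tf: Counter) -> dict[str, float]:
        return {t: n * self.idf(t) for t, n in tf.items()}

    def find_similar(self, query_text: str) -> dict[str, int]:
        # Assumption: each percentage is the score relative to the best hit.
        if not self.docs:
            return {}
        q = self.weights(Counter(tokenize(query_text)))
        raw = {i: cosine(q, self.weights(tf)) for i, tf in self.docs.items()}
        best = max(raw.values())
        return {i: round(100 * s / best) if best else 0 for i, s in raw.items()}


summary = "Test Failure: com.whatever.package.Test"
failure = summary + " NullPointerException"  # text supplied by the failing test

index = ToyIndex()
index.on_save(WorkItem("A", summary, "NullPointerException", ["Fixed in build 42"]))
index.on_save(WorkItem("B", summary, "NullPointerException",
                       ["Still failing", "Seen again on Windows"]))
before = index.find_similar(failure)

index.on_save(WorkItem("C", summary, "NullPointerException"))
after = index.find_similar(failure)
print(before)
print(after)
```

Under these assumptions, C (whose indexed text matches the failure exactly) becomes the best hit, and A and B both drop even though their own text never changed. Comments accumulated over an item's lifetime also dilute its score, which may help explain why items with "identical" summaries and descriptions can still rank differently.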

Hope it helps,
