Jazz Forum Welcome to the Jazz Community Forum Connect and collaborate with IBM Engineering experts and users

How does Find Similar Work Items filtering work?

We're trying to use the "find similar work items" function (the action available from a failing test in a build result) in automation to identify known issues. We have found that the results are sometimes completely unpredictable.

We have determined (from experimentation) that munging the subject line reduces the match percentage. For example, a defect summary like "Test Failure (xxxxxxxx-xxxx): com.whatever.package.Test" will usually match 100% with the same failure on a subsequent build (which is good). If you add text to the summary (fore or aft), the match goes down. This makes sense.

However, over the past few days we've seen a case where the match was 100% for a closed item, but an identical open item (identical summary apart from the actual build label, and identical description) only matched 86%. We have no idea why this would be. We opened the latter item because the former had been closed as fixed. The problem recurred, so we created the new item, but the RTC "find similar work items" is not matching the new item. Then, oddly, another item was opened for the same problem (a duplicate). THAT item also matched 100%, superseding the first one but bypassing the second one. ALL THREE have identical summary lines (aside from the build label) and identical leading description text.

What the heck? Can anyone shed some light on how the "find similar work items" algorithm works so we can do the right thing to ensure accurate matching for known issues?

3 votes

Comments

What was even stranger: initially, when we had two work items, A and B, they had the following matches:
A: 100% (but closed)
B: 86%
I then raised item C, by right-clicking on the test failure and creating a work item, and then had these matches:
A: 38%
B: 34%
C: 100%
Then, when I cancelled C as a duplicate of B:
A: 56%
B: 52%
C: 100%

All three defects had the same summary (excluding the build ID) and identical, auto-generated descriptions. So I don't know why they have different matches, and I really don't know why creating a new defect would change the match percentage of the others.



One answer

When a work item save completes, the work item's summary, description, and comments are passed to Lucene for indexing.
When a call is made to find potential duplicates,
the summary, description, and comments of the work item for which duplicates need to be found are sent as a search against the Lucene index.

So the (full) text search criteria are:
- summary
- description
- comments

Hope it helps,
Eric
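
A rough illustration of why the three fields listed in the answer matter: if comments are part of the searched text, two work items with identical summaries and descriptions can still score differently once one of them has comments. The sketch below is not the RTC or Lucene implementation. It uses a plain cosine similarity over term counts as a stand-in for the real scoring, and the work items, comment text, and function names are all made up for the example.

```python
import math
import re
from collections import Counter


def indexed_text(summary, description, comments):
    """Join the three fields the answer says are sent to the index."""
    return " ".join([summary, description, *comments])


def term_counts(text):
    """Lowercase the text and count alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two term-count vectors (0.0 to 1.0)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical data: same summary and description on both items.
summary = "Test Failure: com.whatever.package.Test"
description = "junit.framework.AssertionFailedError: expected true"

# Item A has no comments; item B has picked up two comments (assumed text).
item_a = indexed_text(summary, description, [])
item_b = indexed_text(
    summary,
    description,
    ["Reopened, fails again on the nightly build", "Assigned to the platform team"],
)

# The query is built from a fresh failure with no comments yet.
query = term_counts(indexed_text(summary, description, []))
score_a = cosine_similarity(query, term_counts(item_a))
score_b = cosine_similarity(query, term_counts(item_b))

print(f"A: {score_a:.0%}  B: {score_b:.0%}")
```

In this toy model, item B scores lower than item A only because of its extra comment text. Whether that accounts for the percentages reported in the question is not something this thread confirms, but comparing the comments on the three work items would be a reasonable first check.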

0 votes


Question asked: Mar 25 '14, 11:48 a.m.

Question was seen: 4,065 times

Last updated: Apr 01 '14, 8:03 a.m.
