Memory leak from RTC's java process?

A few weeks ago, our RTC 2.0 server running within WebSphere 6.0 on Linux crashed due to a memory crunch.
After the restart, I started to monitor the RAM usage to see if I could catch anything.
Based on data from running top, the RAM used by the Jazz java process on this server kept going up after the restart, adding almost 1 GB in less than 10 days.
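(In case anyone wants to reproduce this kind of monitoring: the numbers below come straight from top, but a small script along the following lines would capture the same trend automatically. This is only a sketch; the PID, output path, and sampling interval are placeholders for whatever applies on your server.)

#!/usr/bin/env python
# Sample the resident set size (VmRSS) of one process over time and append
# timestamped values to a CSV so the growth can be charted later.
import time

PID = 12345                    # PID of the Jazz java process (placeholder)
OUTFILE = "/tmp/jazz_rss.csv"  # where samples are appended (placeholder)
INTERVAL = 300                 # seconds between samples

def rss_kb(pid):
    # Return VmRSS in kB from /proc/<pid>/status, or None if unreadable.
    try:
        with open("/proc/%d/status" % pid) as f:
            for line in f:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1])
    except IOError:
        return None
    return None

while True:
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    sample = rss_kb(PID)
    with open(OUTFILE, "a") as out:
        out.write("%s,%s\n" % (stamp, sample if sample is not None else "NA"))
    time.sleep(INTERVAL)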
Then, on Jan. 15, SDS made a change to the SMTP server settings that broke the email notification process on RTC.
What is interesting about this is that it caused a 1 GB drop in RAM usage on this RTC server between Jan. 15 and 18:
Jan. 15: Java: VIRT 3370m, RES 2.9g
Jan. 18: Java: VIRT 2309m, RES 2.0g
It then started to crawl back up once email notification was working again:
Jan. 19: Java: VIRT 2581m, RES 2.2g
Before the break, we saw the following errors in the Jazz log:
WARN bm.team.repository.common.transport.ServerHttpUtil - CRJAZ1228I An error occurred while trying to marshal a different error back to the client.
During the break, while email notification was not working, we only saw errors like:
Could not connect to SMTP host: asc-bh-01.swg.usma.ibm.com, port: 25;
and WARN messages like:
WARN net.jazz.ajax.service.internal.http.ProxyHandler - Attempt made to make a remote request to URI http://cobertura.sourceforge.net/, but no whitelist is configured
After the break was fixed, the pre-break errors came back, and so did the rising RAM usage:
WARN bm.team.repository.common.transport.ServerHttpUtil - CRJAZ1228I An error occurred while trying to marshal a different error back to the client.
Two interesting observations:
1. When the email notification service stopped working, almost 1 GB of RAM was somehow released;
2. As soon as the email notification service started working again, we started to see the "error occurred while trying to marshal a different error back to the client" messages, and the RAM consumption started to crawl back up, about 200 MB overnight.
Can we conclude that there is some kind of memory leak? How do I go further to collect more evidence?
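(Is capturing JVM dumps the right next step here? My understanding is that on the IBM JDK that ships with WebSphere, sending SIGQUIT to the java process writes a javacore file, and a heapdump as well if heap dump generation is enabled; the exact behaviour depends on the JDK and WebSphere version, so please correct me if that is wrong. Something as simple as the following sketch would trigger it, with the PID as a placeholder:)

# Ask the Jazz java process for a javacore (and heapdump, if enabled).
# Assumption: an IBM JDK where SIGQUIT triggers dump generation.
import os, signal

PID = 12345  # PID of the Jazz java process (placeholder)
os.kill(PID, signal.SIGQUIT)
print("Sent SIGQUIT to %d; look for javacore*/heapdump* files in the server's working directory." % PID)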
6 answers

We are on 2.0.
Since email notification started working again, the java process has been adding about 200 MB of RAM consumption each day.
What made me suspicious about email notification is that when it stopped working, about 1 GB of RAM was released, and right after it started working again, the java process's RAM usage started climbing again, increasing by about 200 MB each day according to the top data.
That's weird, the work item e-mail service is a stateless task that should not increase memory consumption in any way. What's the exact version, is it 2.0.0.2?

Can you estimate how many e-mails are sent out for work item changes per day?
It is hard to say, but I counted the connection error entries in the log while the SMTP server was down (a quick scan of jazz.log, sketched below):
Jan. 16: about 150.
Jan. 17: over 300.
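(For reference, the counting was just a scan of jazz.log along these lines; the log path and the assumption that each matching line starts with a YYYY-MM-DD date are specific to our setup, so adjust as needed:)

# Count "Could not connect to SMTP host" entries in jazz.log, grouped by date.
from collections import defaultdict

LOG = "/path/to/jazz.log"  # placeholder; use your server's jazz.log location
counts = defaultdict(int)

with open(LOG) as f:
    for line in f:
        if "Could not connect to SMTP host" in line:
            counts[line[:10]] += 1  # first 10 chars assumed to be the date

for day in sorted(counts):
    print("%s  %d" % (day, counts[day]))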
One thing worth mentioning: I found in the log that the Jazz server was restarted around midday on Jan. 17 (we run RTC within WebSphere, and Jazz can be restarted as an application in WebSphere). So I suspect the RAM usage drop comes from that restart, not just from the email notification service being down.
Still, the daily 200 MB increase in RAM usage after the Jazz server restart is very alarming.

I changed this topic's title, as it is now clear that, due to an auto-patch setup, our RTC server was rebooted on Jan. 17, so the 1 GB drop in RAM usage was not entirely due to the email notification process being down. Sorry for the confusion.
Still, given the memory crunch we had a few weeks ago, we are alarmed that after the reboot the RTC java process's RAM usage keeps rising at over 100 MB per day, starting at 2.0 GB and now standing at 2.5 GB.
The only error in jazz.log is the CRJAZ1228I "An error occurred while trying to marshal a different error back to the client."
Any tips on how to break down the RTC java process's memory usage, so we can track which component(s) keep taking more and more RAM and failing to release it, would be appreciated very much.
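(A general technique that seems applicable here, assuming I understand it correctly: compare per-class heap histograms taken some time apart. On HotSpot JVMs, jmap -histo <pid> prints one; on the IBM JDK that ships with WebSphere I believe you instead take a heapdump, for example via SIGQUIT with heap dumps enabled, and export a class histogram from a tool such as the Eclipse Memory Analyzer. Given two such snapshots saved as text files, a small diff like the sketch below shows which classes grew the most. The expected input format, whitespace-separated lines ending in a byte count and a class name, and the script name are assumptions for illustration only.)

# histodiff.py: diff two per-class heap histograms and print the classes
# whose byte totals grew the most between snapshot A and snapshot B.
import sys

def load(path):
    sizes = {}
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) < 3 or parts[-1][0].isdigit():
                continue  # skip headers, separators, and totals lines
            try:
                nbytes = int(parts[-2])  # byte count assumed second-to-last
            except ValueError:
                continue
            sizes[parts[-1]] = sizes.get(parts[-1], 0) + nbytes
    return sizes

before, after = load(sys.argv[1]), load(sys.argv[2])
growth = [(after.get(c, 0) - before.get(c, 0), c) for c in set(before) | set(after)]
for delta, cls in sorted(growth, reverse=True)[:20]:
    print("%+12d  %s" % (delta, cls))

(Usage would be something like: python histodiff.py histo_monday.txt histo_friday.txt.)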

Just an update on what we found on this issue.
We sent the java coredump files to WAS support when I raised a PMR against WebSphere, since our RTC server runs within WAS. Their analysis of the coredumps pointed to an RTC component called ClientMessages, which may not handle its memory allocation cleanly.
I then moved on to work with RTC support and the dev team, and got confirmation today that this is a known issue in RTC 2.0 (https://jazz.net/jazz/resource/itemName/com.ibm.team.workitem.WorkItem/73406, "ClientMessages will keep classloaders in memory"). It has been addressed in RTC 2.0.0.2.