JBEs randomly dying after upgrade from RTC 4.0.7 to 5.0.2
After an RTC server upgrade, I had to update the Java build engines (JBEs) from 4.0.7 to 5.0.2, along with the Java version (running Java 1.6). There are a number of build engines, all running with unique engine names (no duplicates).
Before the upgrade, builds were running without error. After the update, as soon as a build submission is seen by the JBE, the JBE and its java process appear to die immediately (the jbe process is no longer running on the system). The RTC server still reports the job as 'running' on that JBE (the job shows as running in the build_queue).
I will add that the same build definition has been run many times before without failing. The failures seem to be random.
I'm still trying to narrow down this problem as it is difficult to troubleshoot with few logs. Also, I am adding -verbose to the jbe start command to get more info.
From JBE system a:
2021-09-08 16:22:57 [Jazz build engine] Exception occurred in build loop: CRJAZ0215E The following record was not found in the database: com.ibm.team.build.internal.common.model.impl.BuildResultHandleImpl@fb266ca8 (stateId: [UUID _p0eaoBDcEey0jLveLaq5Rg], itemId: [UUID _py3REBDcEey0jLveLaq5Rg], origin: , immutable: true)
com.ibm.team.repository.common.ItemNotFoundException: CRJAZ0215E The following record was not found in the database: com.ibm.team.build.internal.common.model.impl.BuildResultHandleImpl@fb266ca8 (stateId: [UUID _p0eaoBDcEey0jLveLaq5Rg], itemId: [UUID _py3REBDcEey0jLveLaq5Rg], origin: , immutable: true)
From JBE system b:
2021-09-08 11:20:53 [Jazz build engine]
2021-09-08 11:20:53 [Jazz build engine] Sleeping for 5 seconds...
2021-09-08 11:20:59 [Jazz build engine] Exception occurred in build loop: UpdateItemCurrentRow failure: Params=[_8v-7ABDfEey0jLveLaq5Rg,_Nc8H4BDdEey0jLveLaq5Rg,_8mLewBDfEey0jLveLaq5Rg]
com.ibm.team.repository.common.StaleDataException: UpdateItemCurrentRow failure: Params=[_8v-7ABDfEey0jLveLaq5Rg,_Nc8H4BDdEey0jLveLaq5Rg,_8mLewBDfEey0jLveLaq5Rg]
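Since there are few logs to go on, one way to compare the systems is to tally which exception types show up in each JBE's output. The following is only a minimal sketch: it assumes the log lines look like the excerpts above (a fully qualified exception class name at the start of a line, followed by a colon), and the function name `tally_exceptions` is hypothetical.

```python
import re
from collections import Counter

# Matches a fully qualified exception class at the start of a line, e.g.
# "com.ibm.team.repository.common.StaleDataException: UpdateItemCurrentRow failure..."
# This pattern is an assumption based on the log excerpts in this post.
EXCEPTION_LINE = re.compile(r"^\s*((?:[A-Za-z_]\w*\.)+[A-Z]\w*Exception):")


def tally_exceptions(lines):
    """Count how often each exception class appears in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = EXCEPTION_LINE.match(line)
        if match:
            # Keep only the simple class name, without the package prefix.
            simple_name = match.group(1).rsplit(".", 1)[-1]
            counts[simple_name] += 1
    return counts
```

It could be run against the captured output of each JBE (for example, a file that the jbe start command's output is redirected to) to see whether a given system mostly hits ItemNotFoundException or StaleDataException.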
There are about 16 JBE systems running (a single jbe instance per system), and all JBEs are running with different JBE names.
The RTC server is running version 6.0.6.1. The JBE systems are x86_64 Linux, running JBE version 5.0.2 and the 32-bit version of Java 1.6.
Also, there are about 16 virtual machines running SLES 11.3, each running a single JBE instance.
Anyone know why the errors might occur?
One answer
Consider upgrading all JBEs and all uses of the Build System Toolkit to the same version your server runs on. Use the JRE shipped with the JBE. If you think you have a reason not to, state it here.
Consider opening a case with support. There might be an issue with the database, hence the ItemNotFoundException and the StaleDataException. Item not found means the item is missing; stale data means another process has updated the item in the meantime.
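To illustrate the difference between the two errors, here is a toy sketch of an optimistic update check. This is not RTC's implementation; every name in it (`Store`, `StaleDataError`, `ItemNotFoundError`, the state id handling) is hypothetical and only meant to show the general idea of an update being rejected because it was based on an outdated state.

```python
import uuid


class StaleDataError(Exception):
    """Raised when an update is based on an outdated state of the item."""


class ItemNotFoundError(Exception):
    """Raised when the requested item does not exist in the store."""


class Store:
    """Toy in-memory store that gives each item a new state id on every write."""

    def __init__(self):
        self._items = {}  # item_id -> (state_id, value)

    def create(self, item_id, value):
        state_id = uuid.uuid4().hex
        self._items[item_id] = (state_id, value)
        return state_id

    def fetch(self, item_id):
        if item_id not in self._items:
            raise ItemNotFoundError(item_id)
        return self._items[item_id]

    def update(self, item_id, expected_state_id, new_value):
        current_state_id, _ = self.fetch(item_id)
        # Reject the write if someone else changed the item since it was read.
        if current_state_id != expected_state_id:
            raise StaleDataError(item_id)
        new_state_id = uuid.uuid4().hex
        self._items[item_id] = (new_state_id, new_value)
        return new_state_id
```

In this sketch, two writers that both read the same state and then both try to update it would see the second update fail with `StaleDataError`, while asking for an item that was never stored (or has been removed) fails with `ItemNotFoundError`.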
It is unclear what the JBE's issue is. If it persists after you have upgraded to the same version, consider stopping the scheduler, abandoning the builds, and restarting the scheduler.
Comments
Brock Rother
Sep 09 '21, 12:53 p.m.