Build Forge not reporting status back to RTC - intermittent issue
RTC 3.0.1.2
BF 7.1.2.2
We are running the RTC - BF integration (our build definitions have a Build Forge tab) and are starting builds from RTC.
Every so often the build result in BF is not reported back to RTC. We are still charting occurrences and trying to identify a pattern. My initial guesstimate is that it affects 5 to 10 builds out of around 250 to 300 a day, and the failures may cluster around the time onshore starts work.
We had a spike of them when we needed to run a massive job purge in BF.
Since this is prod, turning on significant logging is not an option - very targeted logging could be an option. Reproducing this in a test env is also not feasible in my opinion, since we would either have to reproduce the load of about 600 worldwide users or have a fairly good idea of exactly what we need to put load on.
We have a PMR open, but the default path forward there is to turn on more logging or reproduce the problem in a test env, and neither is really feasible.
I am hoping that somebody has some hints or guesses.
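As an aside on the charting mentioned above, here is a minimal Python sketch of one way to check whether the missed results cluster at a particular time of day. It assumes you can export, for each build, a start timestamp and a flag saying whether RTC received the result; the function name, the tuple layout, and the sample data are all hypothetical and not taken from RTC or BF.

```python
from collections import Counter
from datetime import datetime


def missed_rate_by_hour(builds):
    """Group builds by start hour and count how many never reported back.

    builds: iterable of (start_time_iso, reported_back) tuples (assumed layout).
    Returns {hour: (missed, total, missed_fraction)} for hours that had builds.
    """
    total = Counter()
    missed = Counter()
    for started_iso, reported_back in builds:
        hour = datetime.fromisoformat(started_iso).hour
        total[hour] += 1
        if not reported_back:
            missed[hour] += 1
    return {h: (missed[h], total[h], missed[h] / total[h]) for h in sorted(total)}


# Hypothetical sample data, only to show the call; not from the real system.
sample = [
    ("2000-01-03T08:01:00", True),
    ("2000-01-03T08:01:05", False),
    ("2000-01-03T08:02:00", False),
    ("2000-01-03T14:30:00", True),
]

for hour, (m, t, frac) in missed_rate_by_hour(sample).items():
    print(f"{hour:02d}:00  missed {m} of {t} ({frac:.0%})")
```

Bucketing by minute instead of hour would be a small change and might show bursts of simultaneous starts more clearly.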
One answer
We refactored the integration in the 4.x line, so there definitely could be something going on here. For targeted log4j logging, the class to trace would be BuildForgeEventPollerRunnable in the com.ibm.rational.buildforge.team.internal.service package.
~Spencer
Comments
Thanks a bunch
I have managed to reproduce it in UAT and we'll turn on tracing and see whether we can catch it.
It happens when more than 10 builds (more than 15 since an RTC DB server change) are started at the same time.
We manage to do that regularly, sometimes through build automation but also through users kicking off a series of builds.
These builds are using the same Build Engine.
Is something like that in your stress testing for 4.x?
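For anyone wanting to try the same thing in a test environment, here is a rough Python sketch of a harness that releases a number of build requests at the same instant. It is only a sketch under assumptions: `start_build` is a hypothetical callable you would have to write yourself (for example, something that requests one build against the shared Build Engine and later checks whether RTC shows a result). Nothing here is an RTC or Build Forge API.

```python
import threading


def start_simultaneously(start_build, count):
    """Call start_build(i) from `count` threads released at the same moment.

    start_build is a hypothetical, caller-supplied callable. It should request
    one build and return True if the result was reported back, else False.
    Returns the indices of builds whose result was not reported.
    """
    barrier = threading.Barrier(count)
    results = [None] * count

    def worker(i):
        barrier.wait()  # hold every thread until all of them are ready
        results[i] = start_build(i)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # A build whose callable returned a falsy value (or never set a result)
    # is treated as "not reported".
    return [i for i, ok in enumerate(results) if not ok]


# Stand-in callable, purely to exercise the harness; it does not talk to
# any server and its behaviour is made up.
def fake_start_build(i):
    return i < 10


print(start_simultaneously(fake_start_build, 12))
```

Running it with increasing values of `count` against a real `start_build` would be one way to look for the threshold at which results stop coming back.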