How to troubleshoot BuildForge performance?


Jirong Hu (1.5k9290258) | asked Jul 02 '15, 10:35 p.m.
I have a BF management console server and a shared DB2 server. Everything was normal until they moved the servers around and completely virtualized the BF server. Now every click takes 10 seconds to respond. A ping from the BF server to the DB2 server returns in 1 ms, so it doesn't look like a network issue. Can you point me to some documentation or give me some ideas?

Thanks
Jirong

5 answers



Tiffany Pei (139523) | answered Jul 29 '15, 11:45 p.m.
edited Jul 29 '15, 11:46 p.m.
The difference in behavior between "root" and an "LDAP/domain user" suggests that LDAP is the culprit.
On the BF server, if an LDAP cache service is running, then after the first LDAP query, subsequent LDAP lookups should be served from the local LDAP cache.
In your case, the local LDAP cache service is most likely not running, so every LDAP call goes to the LDAP server, giving you the extra 5 sec delay.
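
To illustrate why a local cache removes the per-click delay, here is a minimal sketch of a time-limited cache in Python. It is not how nscd or BuildForge implements caching; the class name, the stand-in lookup function, and the 1800-second lifetime are all assumptions made up for the example.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Cache lookup results for a fixed number of seconds."""

    def __init__(self, lookup: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._lookup = lookup      # the slow call, e.g. a directory query
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so the cache can be tested
        self._entries: Dict[str, Tuple[float, str]] = {}
        self.misses = 0            # how many times the slow call was made

    def get(self, key: str) -> str:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]        # served locally, no round trip
        self.misses += 1
        value = self._lookup(key)  # goes to the (slow) server
        self._entries[key] = (now, value)
        return value


def fake_directory_lookup(user: str) -> str:
    """Hypothetical stand-in for a directory query; not a real LDAP call."""
    return "cn=" + user + ",ou=people,dc=example,dc=com"


if __name__ == "__main__":
    cache = TTLCache(fake_directory_lookup, ttl_seconds=1800)
    for _ in range(5):
        cache.get("jirong")
    print("slow lookups made:", cache.misses)
```

With a cache like this in front of the directory, only the first of the five calls reaches the slow lookup; without it, all five would pay the full round trip.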

Comments
Howard Hsiao commented Jul 29 '15, 11:52 p.m.
JAZZ DEVELOPER

@hujirong please check whether nscd is running on the BF server.
nscd stands for Name Service Cache Daemon, which provides a cache for the most common name service requests.


Jirong Hu commented Jul 30 '15, 10:02 a.m.

Our BF server is on Windows, so how do I start this daemon? I also found this post, because we are using ClearCase on Windows too. Where (on the ClearCase server?) and how do I check this service on Windows? http://www-01.ibm.com/support/docview.wss?uid=swg21500565


Donald Nong (14.5k414) | answered Jul 03 '15, 12:06 a.m.
For a virtualized machine to have performance identical to or close to that of a physical machine, make sure that the assigned resources are all _dedicated_, not _shared_. The resources can include CPUs, memory and hard disks. When the resources are shared, you just don't know what you are actually getting. Just think about what happens when Windows has to heavily utilize its "virtual memory".

Howard Hsiao (5.5k17) | answered Jul 03 '15, 1:00 a.m.
JAZZ DEVELOPER
What does your virtualised environment look like?
How many virtual machines are running on the physical host?
How CPU/memory intensive are the other virtual machines?
On this BuildForge virtual machine, is BuildForge the only application in which you notice slowness?

If you are using a virtualised solution from VMware, here is the general guideline for optimising performance:
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1008360

Jirong Hu (1.5k9290258) | answered Jul 03 '15, 10:03 a.m.
The above sounds like the system admin's work. Is there anything I can do as the BF admin while waiting for the SA to troubleshoot? What can I check in BF itself?

Jirong

Comments
Howard Hsiao commented Jul 03 '15, 10:16 a.m.
JAZZ DEVELOPER

As a BF admin, you can find out which operations take significantly longer than expected and which operations take about the same time as before.

Once you have the two groups of operations (one group is normal and the other takes longer), you can look for the differences between them and then come up with probable causes.
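
One way to collect the two groups is to time a handful of console pages and split them by a threshold. The following Python sketch only assumes plain HTTP access to the console; the host name `bf.example.com`, the page paths, and the 2-second threshold are hypothetical and need to be replaced with values from your own installation. Pages that require a login session will only measure the time to the login redirect or error response.

```python
import time
import urllib.error
import urllib.request
from typing import Dict, List, Tuple


def time_request(url: str, timeout: float = 30.0) -> float:
    """Return the seconds taken to get a response for url once."""
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            response.read()
    except urllib.error.HTTPError:
        pass  # an error status is still a response, so keep the timing
    return time.perf_counter() - start


def split_by_threshold(timings: Dict[str, float],
                       threshold: float) -> Tuple[List[str], List[str]]:
    """Split operation names into (normal, slow) by elapsed seconds."""
    normal = sorted(name for name, t in timings.items() if t < threshold)
    slow = sorted(name for name, t in timings.items() if t >= threshold)
    return normal, slow


if __name__ == "__main__":
    # Hypothetical pages; substitute real console URLs.
    pages = {
        "home": "http://bf.example.com/",
        "projects": "http://bf.example.com/projects",
        "servers": "http://bf.example.com/servers",
    }
    timings = {name: time_request(url) for name, url in pages.items()}
    normal, slow = split_by_threshold(timings, threshold=2.0)
    print("normal:", normal)
    print("slow:", slow)
```

Running the same script once as a local user and once as a directory user, or once from the server itself and once from a workstation, gives further pairs of groups to compare.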


Jirong Hu commented Jul 03 '15, 10:40 a.m. | edited Jul 03 '15, 10:49 a.m.

Everything is slow. On the web pages, what used to take 1 sec to respond now takes 10 sec, and I mean every operation/click on the web page. This is my testing environment, so there is no load at all. Initially I suspected that the connection to DB2 is slow, but not all operations/clicks have to connect to DB2, right? How can I do a trace from a click on the web page all the way to DB2?


Jirong Hu commented Jul 03 '15, 9:02 p.m.

I just installed a brand new Tomcat 7 on this server, it performs perfectly normal, every click gets back instantly.


Donald Nong commented Jul 05 '15, 10:01 p.m.

Did you mean a brand new Tomcat 7 without BF running on it? In that case it would be quite different, since it may only serve "static" content.
At this stage, I would suggest you run a network trace on the BF server, so that it can capture both the traffic between the browser and BF server, and between the BF server and the DB2 server. But again, that will be a job for the system/network administrator.
If you're still struggling, you'd better contact Support.


Jirong Hu commented Jul 06 '15, 9:33 a.m.

It's a new Tomcat with no BF. I will ask the SA to do a trace and also submit a PMR to IBM support, although support usually won't provide this type of help. Won't they just ask us to buy a consultant from IBM?


Meanwhile, if I want to do this trace myself, can you tell me how? Which commands should I use, and so on?

Thanks
Jirong


Donald Nong commented Jul 06 '15, 8:03 p.m.

One of the popular network capture and analysis tools is Wireshark.
If you're using Windows, just use Wireshark to capture and analyze the packets. The tool is quite easy to use.
If you're using Linux, the command "tcpdump" is usually bundled, and you can use it to capture and then use Wireshark to analyze (with its nice GUI).
http://www.thegeekstuff.com/2010/08/tcpdump-command-examples/
(I usually use "tcpdump -XX -i eth0 -w capture.pcap")
The most difficult part is the analysis, as you need sufficient network knowledge to understand the packets.
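
Before a full packet capture, a cheaper first check is to time a TCP connection from the BF server to the database listener, since ping only measures ICMP round trips. This is a different and much coarser technique than a capture, sketched here in Python under stated assumptions: the host name `db2.example.com` is hypothetical, and port 50000 is only a commonly used DB2 default that should be confirmed for your instance.

```python
import socket
import statistics
import time
from typing import List


def time_tcp_connect(host: str, port: int, timeout: float = 5.0) -> float:
    """Return the seconds needed to resolve host and open one TCP connection."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass  # connection established; close it immediately
    return time.perf_counter() - start


def sample_connects(host: str, port: int, samples: int = 5) -> List[float]:
    """Time several connections so one outlier does not mislead."""
    return [time_tcp_connect(host, port) for _ in range(samples)]


if __name__ == "__main__":
    # Hypothetical target; replace with the real DB2 host and port.
    results = sample_connects("db2.example.com", 50000)
    print("median seconds:", statistics.median(results))
    print("worst seconds:", max(results))
```

The measured time includes name resolution, so a slow DNS or directory lookup shows up here even when ping by IP address looks fine. If the connect times are consistently in the millisecond range, the delay is more likely above the network layer.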


Spencer Murata commented Jul 09 '15, 8:14 a.m.
FORUM MODERATOR / JAZZ DEVELOPER

Also, if the problem is on click, you can try something like Firebug to see exactly where the time goes. If it is a problem with the network, then you can proceed with a Wireshark trace, but if it's a problem with the scripting, then it will narrow the problem down for support.


~Spencer


Jirong Hu (1.5k9290258) | answered Jul 29 '15, 11:16 p.m.
We may have identified the issue: LDAP is slow. A lot of entries were added to the LDAP directory after a few companies merged.

This is what we have observed:
1. If I log in as user "root", everything is normal; each click in the web GUI responds within a second.
2. If I log in as an LDAP/domain user, it's very slow.
3. If I use the Softerra LDAP Browser to search for the user, it takes about 5 seconds, which roughly matches the response time we get for each click in the BF web GUI.

If LDAP is causing the issue, there is one thing I don't understand. I can understand LDAP's slowness affecting login, but why is every click on the Project, Step, Server, or any other menu still slow after login? Does BF have to search LDAP for each action? Shouldn't there be a token or something like that, valid for the next half an hour? We do have access groups created to control access to projects and steps.
