Jobs queued for one server in 7.1.2 do not seem to be FIFO

David Parrott

(21●2●3) Oct 13 '11, 6:57 a.m.

We have been investigating the order of execution of build jobs that have been submitted to Build Forge. Our builds typically take several hours each, so it is important to us that the builds be executed in the order in which they were submitted.

However, instead of First In First Out (FIFO) behaviour, we are seeing a seemingly random order of execution that is very far from FIFO.

An experiment is described below in which we fire jobs into Build Forge at 2-minute intervals. Each job lasts about 15 minutes and can run on only one server, and the experiment is configured so that each job runs to completion before the next job can start.

We would expect to see BUILD_1 run to completion, then BUILD_2, then BUILD_3, then BUILD_4 and so on.

The actual execution order we are seeing is clearly demonstrated by a time-ordered list of the job directories created by Build Forge:

Oct 13 09:40 BUILD_1
Oct 13 09:55 BUILD_4
Oct 13 10:10 BUILD_14
Oct 13 10:23 BUILD_13
Oct 13 10:38 BUILD_8
Oct 13 10:53 BUILD_25
Oct 13 11:09 BUILD_3
Oct 13 11:24 BUILD_11
Oct 13 11:39 BUILD_21
Oct 13 11:54 BUILD_9
Oct 13 12:09 BUILD_15
Oct 13 12:24 BUILD_20
Oct 13 12:39 BUILD_6
Oct 13 12:54 BUILD_24
Oct 13 13:09 BUILD_18
Oct 13 13:24 BUILD_5
Oct 13 13:39 BUILD_2
and so on...

We would welcome any comments. For example:
Is Build Forge supposed to execute jobs as FIFO?
Have we misunderstood what Build Forge is supposed to do?
Have we made a fundamental mistake in the setup used in the experiment (see below for the details)?

Thanks
David Parrott

Experiment

In order to test Build Forge job queueing, we have set up a very simple arrangement as follows:

One server called "nosem1", with:
Max Jobs=1

The collector for this server is called "nosem1" and sets the following:
BF_RESERVE
STREAM: type=Set Value; value=#davep000#

We have a selector called "davep000" with the following condition:
STREAM must contain #davep000#

We have a project called "davep000" with the following properties:
Max Threads=unlimited
Run Limit=unlimited
Sticky=Sticky
Selector=davep000

The project has two simple steps:
Step 1: execute "set" to show the environment
Step 2: sleep 15 minutes

Finally, we have the Scheduler set to run project "davep000" every 2 minutes. We make this schedule active for around an hour for this experiment, then make it inactive again.

So essentially we are firing a new "davep000" job every two minutes. Only one job can run at a time, because of the combination of "max jobs = 1", BF_RESERVE and Sticky. Each job takes a little over 15 minutes thanks to the sleep in step 2.
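For a sense of scale, the backlog this arrangement builds up can be estimated with a few lines of Python. This is only a rough sketch: the 2-minute interval, 15-minute job length and one-hour window come from the description above, while the function name `backlog` is purely illustrative and the model ignores any Build Forge overhead.

```python
import math

def backlog(window_min, submit_every_min, job_min):
    """Return (submitted, started, waiting) at the end of the submission window."""
    # Jobs are submitted at t = 0, 2, 4, ... minutes, up to but excluding the window end.
    submitted = math.ceil(window_min / submit_every_min)
    # With a single slot, a job can start at t = 0, 15, 30, ... minutes.
    started = min(submitted, math.ceil(window_min / job_min))
    return submitted, started, submitted - started

print(backlog(60, 2, 15))
```

Under these assumptions roughly 30 jobs are submitted in the hour and only 4 have started, so most of the jobs are waiting for the same server at the same time.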

0 votes

2 answers

7,108 views

Jonas Gryder

(116●6●3) Oct 13 '11, 10:04 a.m.

I recently posed a similar question in a PMR and got a pretty good explanation. If jobs are waiting, whether for a server, a queue, or a semaphore, each job polls every so often to see if the slot is open. The first job to find it open gets the slot, so whenever more than one job is waiting, the run order can appear random. You would need to adjust the run queue size or the number of jobs the server can run simultaneously, or find some other way to control how many jobs are waiting for the same resource.

As a result, you would want the wait to be as close to job completion as possible, so other things are not held up. For example, you might first make sure that there is no wait on the run queue, then look to the project, then the server, then any semaphores, taking into account, of course, that you still don't want to overload any single resource.
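The polling explanation above can be illustrated with a toy simulation. This is a sketch under stated assumptions, not Build Forge's actual implementation: the function `simulate`, the 1-minute poll interval and the random poll phase are all hypothetical, chosen only to show how "first poll to find the slot open wins" produces a non-FIFO order.

```python
import math
import random

def simulate(num_jobs=10, submit_every=2.0, job_len=15.0, poll_every=1.0, seed=0):
    """Return job ids in the order they obtained the single slot (toy model)."""
    rng = random.Random(seed)
    # Assumption: each job begins polling at its submit time plus a random phase.
    next_poll = {j: j * submit_every + rng.uniform(0, poll_every)
                 for j in range(num_jobs)}
    free_at = 0.0  # time at which the single slot next becomes free
    order = []
    while next_poll:
        # Polls made while the slot is busy fail; move each waiting job
        # forward to its first poll at or after the slot becomes free.
        for j in list(next_poll):
            t = next_poll[j]
            if t < free_at:
                missed = math.ceil((free_at - t) / poll_every)
                next_poll[j] = t + missed * poll_every
        # The job whose poll arrives first takes the slot, whatever its submit order.
        winner = min(next_poll, key=next_poll.get)
        start = next_poll.pop(winner)
        order.append(winner)
        free_at = start + job_len
    return order

for seed in range(3):
    print(seed, simulate(seed=seed))
```

In this model the first job always runs first, because nothing else has been submitted yet. After that, the winner depends only on where each waiting job happens to be in its poll cycle, not on how long it has been waiting.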

0 votes

David Parrott

(21●2●3) Oct 13 '11, 12:27 p.m.

Thanks for that response, Jonas. Interesting.

We have previously used semaphores to control access to servers in a much more complex environment, and observed the same 'random' behaviour there. So we had hoped that by returning to the most basic BF queueing we would escape that. However, your explanation makes it clear that they all suffer a similar fate.

The Run Queue size may not be something we can change much due to the volume of builds that pass through BF and the large number of server definitions we have. The number of jobs is always one per server for our own operational reasons. However, we can certainly take another look at our design in light of your explanation and suggestions. It would be a shame if we had to resort to an external queueing mechanism as we would prefer to avoid bespoke add-ons. IMHO not having a FIFO queue is a fundamental flaw in the design of BF.

0 votes
