Jazz Forum Welcome to the Jazz Community Forum Connect and collaborate with IBM Engineering experts and users

implementation of checkin, accept

Can somebody respond about the implementation of these operations in RTC?

1) When a file is checked in, which of the following is happening:

a) The client determines the delta with respect to the previous version, uploads it to the server, which applies it to the repository.

b) The client uploads the file to the server, which determines the delta and applies it to the repository.

I tested how long the checkin takes for 100 text files of 500 KB each after creating a small delta (adding one line to every file) and after creating a big delta (previous version empty). The times are as follows:

small delta: 40 seconds
big delta: 80 seconds

(I subtracted 20 seconds for the overhead of invoking scm.)

If (a) is happening, I would have expected the small delta checkin to be much faster, since hardly any data is being uploaded.
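
For reference, the timings above can be turned into an effective throughput figure. This is only a back-of-the-envelope sketch: it assumes the full ~500 KB per file is transferred in both runs (the one extra line in the small-delta run is ignored), and the constant and function names are made up for illustration.

```python
# Effective throughput for the check-in test described above.
# Assumption: every file is sent in full (about 500 KB) in both runs.
FILE_COUNT = 100
FILE_SIZE_KB = 500


def effective_throughput_kb_per_s(elapsed_seconds: float) -> float:
    """Total payload in KB divided by the elapsed time in seconds."""
    total_kb = FILE_COUNT * FILE_SIZE_KB
    return total_kb / elapsed_seconds


small_delta = effective_throughput_kb_per_s(40)  # 50000 KB in 40 s
big_delta = effective_throughput_kb_per_s(80)    # 50000 KB in 80 s
print(f"small delta: {small_delta:.0f} KB/s, big delta: {big_delta:.0f} KB/s")
```

The two runs differ by a factor of two. If only the added lines were uploaded in the small-delta case, one would expect a much larger gap than that.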

2) Same question with respect to accept: When files are downloaded to the sandbox, is it the deltas that are downloaded or the full contents of the files?

3) When binary files are uploaded or downloaded are they split up and the pieces transmitted separately in parallel? I'm thinking here of a tool like Free Download Manager, which greatly optimizes downloading files by downloading sections in parallel.


5 answers

Permanent link
1) When a file is checked in, which of the following is happening:

b) The client uploads the file to the server, which determines the delta and applies it to the repository.


We currently upload the full text. This costs more in terms of bandwidth, but it lowers the on-disk footprint of the sandbox.

...
small delta: 40 seconds
big delta: 80 seconds

(I subtracted 20 seconds for the overhead of invoking scm.)


That's very charitable of you.

I suspect this has to do with the back-end storage of the repository, but I don't have experience in that area, so I can't speak knowledgeably on it. I can say that the back-end store will likely have a fairly large impact. If you were testing against Derby, then I think you can expect much better run times with a more expensive database.

2) Same question with respect to accept: When files are downloaded to the sandbox, is it the deltas that are downloaded or the full contents of the files?


We download the full text. The content should be compressed with HTTP's compression, however.
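
To get a feel for the trade-off between compressed full text and a delta, here is a rough sketch. It is not RTC's wire format or delta algorithm: `zlib` stands in for HTTP compression, `difflib` stands in for a delta, and the file content is invented for illustration.

```python
import difflib
import zlib

# Hypothetical text file: 5000 distinct lines, then one line added.
old_lines = [f"signal_{i} = value_{i % 97}\n" for i in range(5000)]
new_lines = old_lines + ["one_added_line = 1\n"]

# Full text, as it would be sent without any delta computation.
full_text = "".join(new_lines).encode("utf-8")

# Deflate-compressed full text (a stand-in for HTTP compression).
compressed_full = zlib.compress(full_text)

# A unified diff against the previous version (a stand-in for a delta).
delta = "".join(
    difflib.unified_diff(old_lines, new_lines, "old", "new")
).encode("utf-8")

print("full text bytes:      ", len(full_text))
print("compressed full bytes:", len(compressed_full))
print("delta bytes:          ", len(delta))
```

For a one-line change the delta is far smaller than even the compressed full text, but producing it requires the client to keep the previous version around, which is the on-disk cost mentioned above.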

3) When binary files are uploaded or downloaded are they split up and the pieces transmitted separately in parallel? I'm thinking here of a tool like Free Download Manager, which greatly optimizes downloading files by downloading sections in parallel.


We use a number of connections when uploading or downloading content. By default we open up to 10 connections, and pull files across those individually. We don't multiplex files across connections, however.

If you want to change the number of download connections in RTC, see Window > Preferences > Team > Jazz Source Control and modify the "Maximum number of download threads".

You can do the same thing in the CLI by modifying ~/.jazz-scm/preferences.properties and adding the line "content.threads={integer-of-your-choice}".
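
If you want to script that edit, something like the following would do. It is a minimal sketch that assumes the file is plain `key=value` lines; the helper name `set_content_threads` is hypothetical and not part of the CLI, and it works on text rather than touching the real file.

```python
def set_content_threads(properties_text: str, threads: int) -> str:
    """Return properties text with content.threads set to `threads`.

    Any existing content.threads entry is dropped and a new one is appended.
    Assumes simple key=value lines (an assumption about the file format).
    """
    if threads < 1:
        raise ValueError("threads must be a positive integer")
    kept = [
        line
        for line in properties_text.splitlines()
        if line.split("=", 1)[0].strip() != "content.threads"
    ]
    kept.append(f"content.threads={threads}")
    return "\n".join(kept) + "\n"


# Hypothetical existing contents; "some.other.key" is invented for the example.
before = "some.other.key=1\ncontent.threads=10\n"
print(set_content_threads(before, 16), end="")
```

To apply it for real you would read ~/.jazz-scm/preferences.properties, pass its contents through the function, and write the result back.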

e



Permanent link
On 8/13/2009 5:08 AM, David.Sedlock.infineon.com wrote:
Can somebody respond about the implementation of these operations in
RTC?

1) When a file is checked in, which of the following is happening:

a) The client determines the delta with respect to the previous
version, uploads it to the server, which applies it to the
repository.

b) The client uploads the file to the server, which determines the
delta and applies it to the repository.

b. The client doesn't currently cache or use the previous state to
optimize what is sent over the wire. Instead, we perform the delta
compression on the server.

I tested how long the checkin takes for 100 text files of 500 KB each
after creating a small delta (adding one line to every file) and
after creating a big delta (previous version empty). The times are as
follows:

small delta: 40 seconds
big delta: 80 seconds

(I subtracted 20 seconds for the overhead of invoking scm.)

If (a) is happening, I would have expected the small delta checkin to
be much faster, since hardly any data is being uploaded.

2) Same question with respect to accept: When files are downloaded to
the sandbox, is it the deltas that are downloaded or the full
contents of the files?

Same as above. Full content.

3) When binary files are uploaded or downloaded are they split up and
the pieces transmitted separately in parallel? I'm thinking here of a
tool like Free Download Manager, which greatly optimizes downloading
files by downloading sections in parallel.

We pipeline at the file level, so during check-in/load/accept, etc...
we use N threads to transfer files in parallel. But we don't chunk large
files into smaller pieces yet.
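
As an illustration of what file-level pipelining means (not the actual RTC client code), here is a small sketch with a thread pool. `transfer_file` is a hypothetical stand-in for sending one whole file over one connection; each file is handled by exactly one worker and is never split.

```python
from concurrent.futures import ThreadPoolExecutor


def transfer_file(name: str, content: bytes) -> tuple[str, int]:
    """Hypothetical stand-in for sending one whole file over one connection."""
    return name, len(content)


def transfer_all(files: dict[str, bytes], max_threads: int = 10) -> dict[str, int]:
    """Transfer every file as a single unit, at most max_threads at a time."""
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        results = pool.map(lambda item: transfer_file(*item), files.items())
        return dict(results)


# 25 invented files of 1 to 25 bytes each.
files = {f"file_{i}.txt": b"x" * (i + 1) for i in range(25)}
sent = transfer_all(files, max_threads=10)
print(len(sent), "files,", sum(sent.values()), "bytes")
```

With this scheme, one very large file still occupies a single worker for its whole duration, so the extra threads only help when there are many files.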

Cheers,
Jean-Michel



Permanent link
Many thanks to Jean-Michel and echughes. I'm wondering if you see some room for optimization here.

I think the RTC decision to go for a non-replicated repository is right. (ClearCase multisite is a misery the likes of which I hope never to experience in my life again if we can get rid of CC.) But the burden of proof is on RTC to make sure that every ounce of fat is squeezed out of the client-server data transfers.

Large binary files are the output of chip design; I can't ignore this. If we can copy a file over a poor WAN connection in 10 minutes, we can't live with RTC copying it over in 20. Even 12 is too much. (11 is ok!) Getting it into and out of the RDBMS creates inevitable overhead (see my latest posting here http://jazz.net/forums/viewtopic.php?p=21305#21305), so the transfer has to be really clever. Multiplexing larger files is probably a good optimization.
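
To make the multiplexing idea concrete: the core of it is splitting one file into byte ranges that could each go over its own connection and be reassembled on the other side. The sketch below shows only that splitting step, under my own assumptions; `split_into_ranges` is a hypothetical helper and nothing here reflects how RTC works today.

```python
def split_into_ranges(file_size: int, parts: int) -> list[tuple[int, int]]:
    """Split [0, file_size) into at most `parts` contiguous (start, end) ranges.

    `end` is exclusive. The last range may be shorter than the others.
    """
    if file_size < 0 or parts < 1:
        raise ValueError("file_size must be >= 0 and parts must be >= 1")
    if file_size == 0:
        return []
    chunk = -(-file_size // parts)  # ceiling division
    return [
        (start, min(start + chunk, file_size))
        for start in range(0, file_size, chunk)
    ]


# Invented 1000-byte payload split into 4 ranges, then reassembled in order.
data = bytes(i % 256 for i in range(1000))
ranges = split_into_ranges(len(data), 4)
pieces = [data[start:end] for start, end in ranges]
print(ranges)
print(b"".join(pieces) == data)
```

Whether this actually beats a single stream over a poor WAN link would have to be measured.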

As for text files, maybe the HTTP compression and pipelining files in parallel is enough and nothing would be gained by using deltas. Wouldn't it be a good idea to baseline that yourselves?



Permanent link
I've created bug 90173 to track this.

David: Please describe your use case(s) and environment fully in the bug.



Permanent link

If you want to change the number of download connections in RTC, see Window > Preferences > Team > Jazz Source Control and modify the "Maximum number of download threads".


In Eclipse the maximum allowed value is 25. Is this because performance degrades with the number of threads?

If the user is waiting anyway until all the files are loaded, he may as well tax his client fully with this task.

Of course, if higher values do not bring any overall increase in transfer speed because the threads don't get the CPU time to do the job, there is no point.


You can do the same thing in the CLI by modifying ~/.jazz-scm/preferences.properties and adding the line "content.threads={integer-of-your-choice}".


Are values greater than 25 ignored?


Question details

Question asked: Aug 13 '09, 4:55 a.m.

Question was seen: 4,746 times

Last updated: Aug 13 '09, 4:55 a.m.
