Is there a way to split the results of a single RTC Java query into batches for different threads or processes?
Here is the query:
Fetch all DEFECT work items with a processing status of CLASSIFIED (a custom attribute) within a specified PROJECTAREA.
The query by itself does not seem to offer a way of splitting it across multiple entities.
So here is the use case/requirement:
I need to implement parallelism in my batch program so that multiple subjobs can split up and work on the result list from one RTC query.
Ideally, if the query were more complex, I would split it so that each subjob runs part of the query, but that is not quite possible with this one.
Here is what I am hoping is feasible, and where I need help:
- Run the same query and determine the number of results available, say 3000 items in some sorted order.
- Split the results among my parallel jobs, so that if I have 3 jobs, each of them fetches 1000 results, with a flow such as this:
  JOB1 - Rerun the query and fetch the first 1000 results (1 to 1000).
  JOB2 - Rerun the query (or, if the results can be stored on the RTC side, reuse them) and fetch the next 1000 results (1001 to 2000).
  JOB3 - Rerun the query (or, if the results can be stored on the RTC side, reuse them) and fetch the next 1000 results (2001 to 3000).
Is this possible?
Is there a mechanism to store query results in a sorted order so that multiple entities can fetch different parts/regions of the results?
I am a bit lost.
Please help
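The arithmetic behind the split in the list above is simple. Below is a minimal Java sketch, assuming every job sees the same total count and the same sort order. `JobRanges` and `rangeFor` are hypothetical names, not part of the RTC API, and they only compute positions; they do not fetch anything:

```java
/**
 * Hypothetical helper (not an RTC API): splits `total` sorted results across
 * `jobs` jobs and reports the inclusive, 1-based positions one job owns.
 */
final class JobRanges {
    /** Inclusive 1-based range; empty when first > last. */
    static final class Range {
        final int first;
        final int last;

        Range(int first, int last) {
            this.first = first;
            this.last = last;
        }

        int size() {
            return Math.max(0, last - first + 1);
        }

        @Override
        public String toString() {
            return first + ".." + last;
        }
    }

    private JobRanges() {}

    static Range rangeFor(int total, int jobs, int jobNumber) {
        if (total < 0 || jobs < 1 || jobNumber < 1 || jobNumber > jobs) {
            throw new IllegalArgumentException("need total >= 0 and 1 <= jobNumber <= jobs");
        }
        // Ceiling division: every job gets the same share except possibly the last ones.
        int perJob = (total + jobs - 1) / jobs;
        int first = (jobNumber - 1) * perJob + 1;
        int last = Math.min(total, jobNumber * perJob);
        return new Range(first, last);
    }
}
```

For 3000 results and 3 jobs, `rangeFor(3000, 3, 2)` gives positions 1001 to 2000. Because of the ceiling division, the last job may get fewer items when the total does not divide evenly.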
One answer
Comments
Ralph,
I am a bit unclear about the process described. How can I differentiate what goes to each process? Or rather, how does each process pick a different part of the unresolved set?
Mark, have you looked at the post? The section Process Paged Results should explain how you can get paged results and how you could send each paged result subset to a thread. If I remember correctly, you can paginate unresolved results too.
I have to check whether I can find the code and make it available for download. That will take a while. But the post shows the main things you need to know: you can get paged results that each contain a number of items, and you can process each page independently.
Thanks Ralph,
Yes, I am going through the article. I saw the paged results section. My problem is that the subjobs or processes do not run sequentially, so I am not clear how they will fetch different results. I will wait for your code; that may make things clearer.
The paged results look good, but how can I tell job3 to point to page 3 of the results?
I assume the following as a sample process:
- Get the total number of results, say 3000.
- Set the page size to 1000.
- Have job1 run the query to get resolved results and take the first page. That part is straightforward.
My dilemma is how to point to the second or third page, unless I have each process page through all the results and stop at the page it needs.
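The fallback described in the last sentence, where every process walks the pages in order and keeps only its own, can be sketched like this. The `Iterator<List<T>>` is a stand-in for the paged query results (an assumption; how the RTC client exposes pages is not shown here), and `PageSkipper` is a hypothetical name:

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

final class PageSkipper {
    private PageSkipper() {}

    /**
     * Advances a page iterator to the requested 1-based page and returns it,
     * discarding the pages before it. `pages` stands in for whatever paged
     * result source the real query exposes (hypothetical).
     */
    static <T> List<T> fetchPage(Iterator<List<T>> pages, int pageNumber) {
        if (pageNumber < 1) {
            throw new IllegalArgumentException("pageNumber must be >= 1");
        }
        for (int current = 1; pages.hasNext(); current++) {
            List<T> page = pages.next();
            if (current == pageNumber) {
                return page;
            }
            // Earlier pages are fetched but ignored; this is the cost of the approach.
        }
        throw new NoSuchElementException("result set has fewer than " + pageNumber + " pages");
    }
}
```

The catch is cost: job N still fetches the N - 1 pages before its own, so the later jobs do more transfer work than the earlier ones.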
I think what I did back then was pass each page to a separate thread. I have some code up here: https://hub.jazz.net/project/rschoon/Jazz%20In%20Flight in com.ibm.js.team.workitem.automation.examples. There is a class SynchronizeAttributesParallel, but it is pretty much a mess and all commented out. I wrote the code at a trade fair and had no time to consolidate it.
So I am back trying to get this to work. Increasingly, I realize I need the parallelism for performance. I understand the paged results better now, but they do not fit my case so nicely. So, my use case:
All jobs will run the same query independently, but I have to find a way for them to fetch different pages of the same cached results.
Unfortunately, with WebSphere Batch I can only pass simple objects as property keys (i.e. a number or a simple string) from the "master job" to the subjobs, not complex objects (e.g. a partial list of results).
The master job will set the page size by querying the number of results and dividing it by the number of subjobs.
My current approach is to pass each subjob a number (1, 2, 3, ...), so job 1 has key 1, job 2 has key 2, and so on. Since each subjob runs the same job, I still need a way to tell, say, job 2 to run the same query as the other subjobs but only fetch the 2nd page.
If there are 6 jobs, then knowing the page size, job 6 should fetch page 6 of the results.
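One way to combine the key-per-subjob idea with the string-only restriction: the master passes the job key and the page size as string properties, and each subjob reruns the query, sorts the IDs it gets back into a fixed order, and keeps only its own slice. This is a minimal sketch under several assumptions: the property names `jobKey` and `pageSize` are hypothetical, `java.util.Properties` stands in for however WebSphere Batch delivers the properties, and the work item IDs are assumed to be obtainable cheaply (for example from unresolved results) before only the slice is resolved:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Properties;

final class SubjobSlice {
    private SubjobSlice() {}

    /**
     * Returns the IDs the subjob identified by the hypothetical "jobKey"
     * property should process, using the "pageSize" computed by the master.
     */
    static List<Integer> idsForSubjob(Properties props, List<Integer> allIds) {
        int jobKey = requiredInt(props, "jobKey");
        int pageSize = requiredInt(props, "pageSize");
        if (jobKey < 1 || pageSize < 1) {
            throw new IllegalArgumentException("jobKey and pageSize must be >= 1");
        }
        // Sort a copy so every subjob sees the same order, whatever order the query returned.
        List<Integer> sorted = new ArrayList<>(allIds);
        Collections.sort(sorted);
        int from = (jobKey - 1) * pageSize;
        if (from >= sorted.size()) {
            return Collections.emptyList();
        }
        int to = Math.min(sorted.size(), from + pageSize);
        return new ArrayList<>(sorted.subList(from, to));
    }

    // Properties arrive as strings, so parse and validate them here.
    private static int requiredInt(Properties props, String name) {
        String value = props.getProperty(name);
        if (value == null) {
            throw new IllegalArgumentException("missing property " + name);
        }
        return Integer.parseInt(value.trim());
    }
}
```

This only works if the query returns the same set of items to every subjob. If work items change state between the reruns, slices can overlap or miss items, so a snapshot taken once by the master is the more robust option.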
Rather long-winded, but I hope it makes sense. There are other, quite different approaches, but they are too sloppy. For example, the master job could get all work item IDs, split them, and write each part to a separate file, then send each subjob one filename.
Each subjob would then be responsible for reading its work item IDs and using them to fetch the resolved work items.
I would rather not use that if I can avoid it
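For comparison, the file-based alternative is short to write with `java.nio.file`, and it fits the WebSphere Batch restriction because each subjob only needs a file path as a simple string property. A minimal sketch; the `subjob-N.txt` naming, the one-ID-per-line format, and the round-robin distribution are all illustrative choices, not anything RTC or WebSphere Batch prescribes:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class IdFiles {
    private IdFiles() {}

    /** Master side: writes the IDs round-robin into `jobs` files and returns their paths. */
    static List<Path> writeSplits(List<Integer> ids, int jobs, Path dir) throws IOException {
        if (jobs < 1) {
            throw new IllegalArgumentException("jobs must be >= 1");
        }
        List<List<String>> buckets = new ArrayList<>();
        for (int j = 0; j < jobs; j++) {
            buckets.add(new ArrayList<>());
        }
        for (int i = 0; i < ids.size(); i++) {
            buckets.get(i % jobs).add(String.valueOf(ids.get(i)));
        }
        List<Path> files = new ArrayList<>();
        for (int j = 0; j < jobs; j++) {
            Path file = dir.resolve("subjob-" + (j + 1) + ".txt");
            Files.write(file, buckets.get(j)); // one ID per line, UTF-8
            files.add(file);
        }
        return files;
    }

    /** Subjob side: reads back the IDs from the file path it was given. */
    static List<Integer> readSplit(Path file) throws IOException {
        List<Integer> ids = new ArrayList<>();
        for (String line : Files.readAllLines(file)) {
            if (!line.trim().isEmpty()) {
                ids.add(Integer.parseInt(line.trim()));
            }
        }
        return ids;
    }
}
```

Because the master takes a single snapshot of the IDs, every subjob works on a disjoint, fixed set even if the underlying query results change while the subjobs run.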
Sorry, I have no more information to provide. I used the pages to do work in parallel threads, and it just worked well for me. Otherwise, you would have to iterate over the unresolved results and pass a collection of those entries to each thread to work on them in parallel.
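The pattern described here (iterate the unresolved results once, collect the entries into batches, and hand each batch to a thread) can be sketched with a standard `ExecutorService`. This is a generic sketch: `source` stands in for the unresolved query results and `worker` for whatever resolves and processes a batch. Whether the RTC client objects used inside `worker` can safely be shared across threads is something to check in the RTC documentation:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

final class BatchDispatcher {
    private BatchDispatcher() {}

    /**
     * Reads `source` sequentially on the calling thread, groups entries into
     * batches of `batchSize`, and hands each batch to a worker thread.
     * Returns the per-batch results in submission order.
     */
    static <T, R> List<R> processInParallel(Iterator<T> source, int batchSize, int threads,
                                            Function<List<T>, R> worker)
            throws InterruptedException, ExecutionException {
        if (batchSize < 1 || threads < 1) {
            throw new IllegalArgumentException("batchSize and threads must be >= 1");
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<R>> futures = new ArrayList<>();
            List<T> batch = new ArrayList<>(batchSize);
            while (source.hasNext()) {
                batch.add(source.next());
                if (batch.size() == batchSize || !source.hasNext()) {
                    final List<T> toProcess = batch;
                    futures.add(pool.submit(() -> worker.apply(toProcess)));
                    batch = new ArrayList<>(batchSize); // fresh list; the submitted one is not touched again
                }
            }
            List<R> results = new ArrayList<>();
            for (Future<R> f : futures) {
                results.add(f.get()); // waits for the batch and rethrows any worker failure
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

Results come back in submission order because the futures are read in the order the batches were created, even though the batches may finish in any order.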