Is there a way to split the same RTC Java query results into batches for different threads or processes?

mark owusu-ansah (5659) | asked Jun 27 '13, 12:02 p.m.
edited Jun 27 '13, 1:49 p.m.
The requirement is a bit odd because the query is very simple and I cannot find a way to split the query itself, so I am looking at the results as a possible option.
Here is the query:
Fetch all DEFECT work items with a processing status of CLASSIFIED (a custom attribute) within a specified PROJECTAREA.
That by itself does not seem to offer a way of splitting the query across multiple entities.

So here is the use case/requirement:
I need to implement parallelism in my batch program so that multiple subjobs can split and work on the results list from one RTC query.
Ideally, if the query were more complex, I would split it so that each subjob runs a part of the query, but that is not quite possible here.

Here is what I am hoping is feasible, and where I need help:
- Run the query and determine the number of results available, say 3000 items, in some sorted order.
- Split the results among my parallel jobs, so if I have 3 jobs, each of them fetches 1000 results, with a flow such as this:
 JOB1 - Rerun the query and fetch the first 1000 results, i.e. 1 to 1000.
 JOB2 - Rerun the query (or, if the results can be stored on the RTC side, reuse them) and fetch the next 1000 results, 1001 to 2000.
 JOB3 - Rerun the query (or reuse the stored results) and fetch the next 1000 results, 2001 to 3000.
Is this possible?
Is there a mechanism to store query results in a sorted order so that multiple entities can fetch different parts/regions of the results?
I am a bit lost. Please help.
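The split arithmetic the question describes can be sketched in plain Java. This is only an illustration of the index math (the 3000-result, 3-job figures are the example from the question); the class and method names are made up, and it assumes the query can return its results in a stable sorted order:

```java
// Sketch: divide a sorted result set of `total` items among `jobs`
// parallel workers, giving each worker a contiguous [start, end) range.
public class ResultSplitter {

    // Returns {startIndex, endIndexExclusive} for the given job (0-based).
    public static int[] rangeFor(int total, int jobs, int jobIndex) {
        int base = total / jobs;       // items every job gets
        int remainder = total % jobs;  // first `remainder` jobs get one extra
        int start = jobIndex * base + Math.min(jobIndex, remainder);
        int size = base + (jobIndex < remainder ? 1 : 0);
        return new int[] { start, start + size };
    }

    public static void main(String[] args) {
        // 3000 results split across 3 jobs, as in the question
        for (int job = 0; job < 3; job++) {
            int[] r = rangeFor(3000, 3, job);
            System.out.println("JOB" + (job + 1) + " fetches results "
                    + (r[0] + 1) + " to " + r[1]); // 1-based, as in the post
        }
    }
}
```

Handing out the remainder to the first few jobs keeps the ranges contiguous even when the total does not divide evenly.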

One answer

Ralph Schoon (62.9k33645) | answered Jun 28 '13, 3:00 a.m.
I have described what I think is possible here. You can probably get the unresolved result set and then pass the results in batches to other processes as well. That is all I know.

mark owusu-ansah commented Jun 28 '13, 11:04 a.m.

I am a bit unclear on the process described. How can I differentiate what is going to each process? Or rather, how does each process pick a different part of the unresolved set?

Ralph Schoon commented Jun 28 '13, 11:17 a.m. | edited Jun 28 '13, 11:18 a.m.

Mark, have you looked at the post? The section Process Paged Results should explain how you can get paged results and how you could send each paged result subset over to some thread. I think I remember you can paginate unresolved results also.

Ralph Schoon commented Jun 28 '13, 11:29 a.m.

I will have to see if I can find the code and make it available for download. That will take a while. But the post shows the main items you need to know: you can get paged results that hold a set number of items, and you can process each page independently.
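The page-per-thread dispatch described here can be sketched with plain JDK classes. A java.util.Iterator stands in for the RTC query result set (the real IQueryResult API differs, so treat the types as placeholders); the point is the pattern of cutting an iterator into pages and submitting each page to a thread pool:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch: read an iterator-style result set page by page and hand each
// page to a worker thread. A plain Iterator stands in for the RTC
// IQueryResult here; the dispatch pattern is the point, not the RTC types.
public class PagedDispatcher {

    // Cut the iterator into fixed-size pages (last page may be partial).
    public static <T> List<List<T>> toPages(Iterator<T> results, int pageSize) {
        List<List<T>> pages = new ArrayList<>();
        List<T> page = new ArrayList<>(pageSize);
        while (results.hasNext()) {
            page.add(results.next());
            if (page.size() == pageSize) {
                pages.add(page);
                page = new ArrayList<>(pageSize);
            }
        }
        if (!page.isEmpty()) pages.add(page); // trailing partial page
        return pages;
    }

    // Submit each page to a fixed pool so pages are processed in parallel.
    public static <T> void dispatch(List<List<T>> pages, int threads)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (List<T> page : pages) {
            pool.submit(() -> {
                for (T item : page) {
                    // process one item; each page is independent of the others
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.MINUTES);
    }
}
```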

mark owusu-ansah commented Jun 28 '13, 11:51 a.m.

Thanks Ralph,
Yes, I am going through the article. I saw the paged results section. My problem is that the subjobs or processes do not run sequentially, so I am not clear how they will fetch different results. I will wait for your code; that may make things clearer.
The paged results look good, but how can I tell job 3 to point to page 3 of the results?
I assume the following as a sample process:
- Get the total number of results, say 3000.
- Set the page size to 1000.
I can have job 1 run the query to get resolved results and fetch the first page - straightforward.
My dilemma is how to point to the second or third page, unless I have each process page through all the results and stop at the page it needs.
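One way around this dilemma, if the result set only exposes a forward iterator and every job sees the same stable sort order (an assumption that has to hold for any page-based split), is to have job N skip past the first N-1 pages before collecting its own. A plain Iterator again stands in for the RTC result set; the names are illustrative:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Sketch: let job N re-run the same query and keep only page N.
// Assumes every job sees the results in the same stable sort order;
// a plain Iterator stands in for the query result set.
public class PagePicker {

    // pageNumber is 1-based, matching "job 3 fetches page 3".
    public static <T> List<T> fetchPage(Iterator<T> results, int pageSize,
                                        int pageNumber) {
        long skip = (long) (pageNumber - 1) * pageSize;
        for (long i = 0; i < skip && results.hasNext(); i++) {
            results.next(); // iterate past the earlier pages without using them
        }
        List<T> page = new ArrayList<>(pageSize);
        while (results.hasNext() && page.size() < pageSize) {
            page.add(results.next());
        }
        return page;
    }
}
```

If the skipped entries are unresolved handles rather than fully resolved work items, paging past them should be comparatively cheap; each job then resolves only the items on its own page.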

Ralph Schoon commented Jun 28 '13, 11:59 a.m.

I think what I did back then was pass each page to a separate thread. I have some code up here. There is a class SynchronizeAttributesParallel, but it is pretty much a mess and all commented out. I wrote the code at a trade fair and had no time to consolidate it.

mark owusu-ansah commented Sep 17 '13, 9:47 p.m.

So I am back to trying to get this to work. Increasingly, I realize I need the parallelism for performance. I understand the scoped page results better, but they do not fit so nicely. So, my use case:
All jobs will run the same query independently, but I have to find a way for them to fetch different pages of the same cached results.
Unfortunately, using WebSphere batch, I can only pass simple objects as property keys (i.e. a number or a simple string) from the "master job" to the subjobs, instead of, say, complex objects (i.e. a partial list of results).
The master job will set the page scope by querying the number of results and dividing by the number of subjobs.
My current approach is to pass each subjob a number: 1, 2, 3, and so on, so job 1 has key 1, job 2 has key 2, etc. Since each will run the same query, I still need a way to tell, say, job 2 to run the same query as the other subjobs but fetch only the 2nd page.
With 6 jobs, knowing the page size, job 6 should fetch the page 6 results.

mark owusu-ansah commented Sep 17 '13, 9:51 p.m.

Rather long-winded, but I hope it makes sense. There are other approaches, but they are too sloppy. For example: from the master job, get all work item IDs, split the results, and write them to separate files, then send each job one filename.
Each job would then be responsible for reading its work item IDs and using them to fetch the resolved work items.
I would rather not use that approach if I can avoid it.
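For completeness, the file-based fallback described above could look roughly like this: the master job writes each subjob's share of work item IDs to its own file and passes only the filename. The file naming and the integer ID list are illustrative, not from any RTC API:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Sketch of the fallback: the master job splits the work item IDs and
// writes one ID file per subjob; each subjob later reads its own file
// and resolves only those work items.
public class IdFileSplitter {

    public static List<Path> writeIdFiles(List<Integer> workItemIds, int jobs,
                                          Path dir) throws IOException {
        List<Path> files = new ArrayList<>();
        int perJob = (workItemIds.size() + jobs - 1) / jobs; // ceiling division
        for (int job = 0; job < jobs; job++) {
            int from = job * perJob;
            int to = Math.min(from + perJob, workItemIds.size());
            List<String> lines = new ArrayList<>();
            for (int i = from; i < to; i++) {
                lines.add(String.valueOf(workItemIds.get(i)));
            }
            Path file = dir.resolve("subjob-" + (job + 1) + ".ids");
            Files.write(file, lines); // one ID per line
            files.add(file);
        }
        return files;
    }
}
```

The upside is that only a filename (a simple string, which WebSphere batch can pass as a property key) crosses the master/subjob boundary; the downside, as noted, is the extra file handling.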

Ralph Schoon commented Sep 18 '13, 2:52 a.m.

Sorry, I have no more information to provide. I used the pages to do work in parallel threads and it just worked well for me. Otherwise you would have to iterate the unresolved results and pass a collection of those entries to the thread to work on them in parallel.

