How can I push a large number of RDF resources into a JFS repository? Is there a best practice?
I need better performance when importing RDF. My requirement is to import 10,000+ RDF resources into a JFS repository as fast as possible, and I have to query the data immediately after the import because I need to validate it. Currently I use the JFS storage RESTful services API to push the data and query it with SPARQL.
According to https://jazz.net/wiki/bin/view/Main/DavidJohnson_IndexingAndQuery, the indexing process is asynchronous.
I use the approach described in https://jazz.net/wiki/bin/view/Main/JFSIndexStoreQueryAPI#Using_If_Modified_Since_Last_Mod
However, I do not think it is very efficient. Issuing a large number of HTTP requests is too heavy, and indexing lags far behind the inserts. Importing this data always takes hours, which is not acceptable for me. Is there a more efficient way to write RDF data into JFS, such as writing directly to the database? I do not think 10,000 is a large number. Thanks a lot.
Accepted answer
You can save time spent in the HTTP layer by using https://jazz.net/wiki/bin/view/Main/JFSBulkOperations.
If you find the indexing slow, you might want to check the speed of the disk where the indices are stored (./conf/<your application>/indices).
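As a rough illustration of why batching helps, the sketch below groups resources into fixed-size batches so that one request carries many resources instead of one. It does not show the actual JFSBulkOperations payload format, which is documented on the wiki page linked above; `send_batch` is a hypothetical callable you would implement against that API, and the batch size of 500 is an arbitrary assumption.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def push_in_batches(
    resources: Sequence[T],
    send_batch: Callable[[List[T]], None],  # hypothetical: wraps one bulk request
    batch_size: int = 500,                  # assumed value, tune for your server
) -> int:
    """Send all resources in batches and return the number of requests made."""
    requests_made = 0
    for batch in chunked(resources, batch_size):
        send_batch(batch)
        requests_made += 1
    return requests_made
```

With 10,000 resources and a batch size of 500, this makes 20 calls to `send_batch` rather than 10,000 individual requests.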
One other answer
There is no way to explicitly control the batching of index activity or to make it transactional, as you suggest. There is internal batching when there is a large backlog of resources to index.
We had been thinking of batching JFS changes when committed, but this was never requested by any current consumer.
You should file an enhancement request describing your use case at https://jazz.net/jazz/web/projects/Jazz%20Foundation#action=com.ibm.team.workitem.newWorkItem.
Knowing the time of the last committed resource, you can hold queries until indexing status has reached that point, or use conditional queries (https://jazz.net/wiki/bin/view/Main/JFSIndexStoreQueryAPI#Using_If_Modified_Since_Last_Mod) in the interim.
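The "hold queries until indexing has caught up" idea can be sketched as a simple polling loop. This is only an outline under stated assumptions: `get_indexed_through` is a hypothetical callable that returns the timestamp up to which the index is current (how you obtain that from JFS is not shown here), and the timeout and poll interval are arbitrary defaults.

```python
import time
from datetime import datetime
from typing import Callable


def wait_until_indexed(
    last_commit: datetime,
    get_indexed_through: Callable[[], datetime],  # hypothetical status lookup
    timeout_s: float = 300.0,                     # assumed default
    poll_interval_s: float = 2.0,                 # assumed default
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until the index covers `last_commit`.

    Returns True once the reported index timestamp is at or past
    `last_commit`, or False if `timeout_s` elapses first.
    """
    deadline = clock() + timeout_s
    while True:
        if get_indexed_through() >= last_commit:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval_s)
```

Record the commit time of the last resource you pushed, call `wait_until_indexed` with it, and run the validating SPARQL queries only once it returns True.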