It's all about the answers!

Ask a question

Custom fields and the Jazz DW


Jeremy Hughes (2164837) | asked Oct 08 '08, 6:54 a.m.
Hi, is there any way to have custom fields 'data warehoused' so that
they can be used in reports? Sorry if I just haven't found the
instructions to do that.

Thanks,
Jeremy

One answer



permanent link
James Moody (3.3k24) | answered Oct 08 '08, 2:36 p.m.
JAZZ DEVELOPER
Jeremy Hughes wrote:
Hi, is there any way to have custom fields 'data warehoused' so that
they can be used in reports? Sorry if I just haven't found the
instructions to do that.

Thanks,
Jeremy

Hi Jeremy.

There are two ways I could imagine doing this. Here are some comments on
each:

1. We already aggregate and store a large amount of data about work
items in the data warehouse. You might want to simply add the custom
fields to this collected data. However, as you've probably noticed, it's
not possible to do this today. This is for a few reasons: (A) We don't
know about these fields. The collection code is written in Java and
knows only about the core attributes of a work item. (B) Collecting data
in a data warehouse isn't as simple as just specifying which fields to
store. Adding new fields (in this case, called "dimensions" in data
warehouse-speak) can have very serious space implications on the
existing tables; allowing our mechanism to be extensible in some way
would open the door to people being able to explode the size of the data
warehouse by many orders of magnitude. (C) The schema for the data
warehouse, once created, is not mutable. So we can't dynamically add new
fields to the database as new custom fields are created (what would
happen with the existing data?).

As a followon to (B) above, imagine a case where you want to add two
custom fields to our data collection. One represents a "Platform" field,
which has ten possible values (Windows, Linux, etc). Another represents
the type of ice cream that the person who fixes the bug gets (Vanilla,
Chocolate, etc). There are ten possible ice cream flavours. Adding these
two fields to our existing tables has the potential to increase the size
of the data warehouse (the work items states table, specifically) 100
times, or 2 orders of magnitude. That's because there are now 10 * 10 =
100 times more permutations of dimensions for each fact, and the states
table stores a row for each permutation (omitting those for which the
count is zero, of course). So this looks like a trivial thing to want to
do but would have a disastrous impact on the performance of both the
data collection and of the generation of reports.

Having said that...

2. Instead of piggy-backing on the existing tables, create some new
tables from scratch. The data warehouse is an extensible mechanism, and
you can define your own tables and write your own snapshot collection
code. It may turn out that your needs for the tables don't include much
of the other stuff that we collect, so you can include your custom
fields, and the other fields that you need, without consuming the known
universe. There are a few resources that might help you in this task:

- Our article on extending the data warehouse, found here:
https://jazz.net/wiki/bin/view/Main/DataWarehouseExtending
- The code in our work items snapshot, found in
com.ibm.team.datawarehouse.service.workitems. It's kind of hairy, but it
shows how we collect all work item changes, store them in the changes
table (where each row represents a single work item change) and then
from the changes table generate the states table (where each row
represents a point-in-time fact about the number of work items which
satisfy some permutation of criteria).

In addition, there's an excellent article on data warehousing in
general, which describes the facts and dimensions and discusses space
implications:

http://ciobriefings.com/Publications/WhitePapers/DesigningtheStarSchemaDatabase/tabid/101/Default.aspx

Hope this helps.

james
Jazz Reports Team Lead

Your answer


Register or to post your answer.