Setting up totals
Inside the
Dimension Info
step you'll find a mention to the total:
/* SET THE APPROXIMATE TOTAL FOR THE VALUES */
var total = 5000;
Setting up dimensions
This is done within the same file you setup the dimensions. You can have as much as you want, but you must pay attention to the fact that if you use lots of dimensions / high cardinality we can quickly end up with a huge database.
While there's nothing particularly wrong with that, it's up to you doing specific optimizations like indexes or even aggregate tables. That's outside the scope of CDG.
Here's a sample dimension definition:
var countries = [
{countryName: "Italianos", proportion: 30},
{countryName: "Portugueses", proportion: 18},
{countryName: "Alemães", proportion: 12},
{countryName: "Espanhóis", proportion: 10},
{countryName: "Japoneses", proportion: 15},
{countryName: "Coreanos", proportion: 8},
{countryName: "Chineses", proportion: 3},
{countryName: "Outros", proportion: 5}
];
The sample provided in CDG is written in Portuguese, in order to specifically test character encoding support. The generated files are encoded in UTF-8 and we always recommend using this standard in databases too.
By defining this object, CDG will create a dimension with 8 members and one level called
countryName
. You can have other properties in there and CDG will create a Mondrian schema with different levels. The provided example has only one.
There's a special property in there called
proportion
- it will be used by CDG to do the breakdown of the total. In the example, roughly 30% of the total will be assigned to Italians and so on. In every aspects of the code there's a random factor in place.
Configuring Date Dimension
The date dimension is always a specific case, since it usually acts as a snapshot dimension.
Since configuring all possible members of this dimension would take too much time, we provide an utility function that generates all the dates between 2000 and 2012 down to the month.
This is standard JavaScript, so feel free to change this function in order to change the date range, month names or even adding the day level (be aware that adding the day level will substantially increase the number of values in the fact table).
/* CONFIGURE THE DATE DIMENSION. */
var dateDim = [];
var months = [
[1,"Jan","Janeiro"], [2,"Fev","Fevereiro"], [3,"Mar","Março"], [4,"Abr","Abril"],
[5,"Mai","Maio"], [6,"Jun","Junho"], [7,"Jul","Julho"], [8,"Ago","Agosto"],
[9,"Set","Setembro"], [10,"Out","Outubro"], [11,"Nov","Novembro"],
[12,"Dec","Dezembro"]
];
range(2000,2012).map(function(year){
range(0,12).map(function(month){
var m = months[month];
dateDim.push(
{"year":year ,"monthNo": m[0], "monthAbbrev":m[1], "monthDesc": m[2]}
);
});
});
;
Final Configuration
In the end of the script there's the final configuration that will be used by CDG:
/* MAKE THE FINAL CONFIGURATION. DIMENSIONS CAN EITHER BE SNAPSHOT OR REGULAR BREAKDOWNS */
var outputArray = [
{name: "Date", dimension: dateDim, toBreakdown: false, increment: 0.05 },
{name: "Provices", dimension: provinces, toBreakdown: true },
{name: "Countries", dimension: countries, toBreakdown: true },
{name: "Dates", dimension: gender, toBreakdown: true }
]
There we define the names and types of the dimension. The property
toBreakdown
should be true for normal dimensions and false for snapshot dimensions.
If it's a snapshot dimension, you need to specify the increment property. The value 0.05 basically means that we'll have roughly 5% increase each month.
After that, you can run the transformation.
Output
After running the transformation, we get this output:
- A .ddl file to create the database and the table;
- A file with SQL inserts to populate the Data Warehouse;
- A Mondrian schema file to use within Mondrian or to import to Pentaho.
cdg/kettle/output
├── cdgsample.ddl
├── cdgsample.mondrian.xml
└── cdgsample.sql
Result
The result, after declaring this new datasource and registering the cube in Mondrian, is a new cube free to use.