Pentaho Data Refinery

This project contains several PDI Job and Transformation steps for use in building and publishing analysis models. The job steps include Build Model and Publish Model. The transformation steps include Annotate Stream and Shared Dimension.

Build Model creates an analytic model and stores it in a variable called ${JobEntryBuildModel.Mondrian.Schema.Model Name} where Model Name is the name you specified in the step.
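As a small illustration of the naming rule above, the sketch below builds the variable name from a model name. The class `BuildModelVariableName` and the model name "Sales Model" are hypothetical and are not part of this project; only the prefix is taken from the text above.

```java
// Hypothetical helper for illustration only; not part of pentaho-data-refinery.
final class BuildModelVariableName {
    // Prefix as given in the README text.
    static final String PREFIX = "JobEntryBuildModel.Mondrian.Schema.";

    /** Returns the variable name for the given model name. */
    static String nameFor(String modelName) {
        return PREFIX + modelName;
    }

    /** Returns the ${...} reference form of the variable. */
    static String referenceFor(String modelName) {
        return "${" + nameFor(modelName) + "}";
    }

    public static void main(String[] args) {
        // Prints: ${JobEntryBuildModel.Mondrian.Schema.Sales Model}
        System.out.println(referenceFor("Sales Model"));
    }
}
```

Note that the model name is used verbatim, including any spaces.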

Publish Model uses the model and connection information generated by Build Model and publishes a Data Source to the selected BA Server.

Annotate Stream allows you to instruct the Build Model step how to use a particular field when generating the model.

Shared Dimension allows you to specify a separate dimension table that can later be linked to the fact table.

Troubleshooting

Model Annotations

When your published schema doesn't reflect the Model Annotations you specified through Annotate Stream and Shared Dimension, first check your PDI job logs. Each annotation that is applied prints either a success message or a failure message. A failure in Annotate Stream only prevents that particular annotation from being applied. Any failure in a Shared Dimension annotation also causes the Link Dimension annotation to fail, which means the dimension will not be available in your model. For failed annotations, review the annotation properties to ensure they are correct and consistent with each other. For example, when specifying parent attributes, the parent must be in the same dimension and hierarchy. All names are case sensitive.
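The parent attribute rule can be checked by hand before running the job. The following is a minimal sketch of that check, assuming a hypothetical `Attribute` holder for the annotation properties; it does not use the project's own classes, and the dimension and hierarchy names are made up.

```java
import java.util.Objects;

// Hypothetical sketch for illustration only; not the project's API.
final class ParentAttributeCheck {
    /** Stand-in for the properties of an attribute annotation. */
    static final class Attribute {
        final String name;
        final String dimension;
        final String hierarchy;

        Attribute(String name, String dimension, String hierarchy) {
            this.name = name;
            this.dimension = dimension;
            this.hierarchy = hierarchy;
        }
    }

    /**
     * Returns true when parent and child share the same dimension and
     * hierarchy. The comparison is case sensitive, as the names are.
     */
    static boolean sameDimensionAndHierarchy(Attribute child, Attribute parent) {
        return Objects.equals(child.dimension, parent.dimension)
            && Objects.equals(child.hierarchy, parent.hierarchy);
    }

    public static void main(String[] args) {
        Attribute state = new Attribute("State", "Geography", "Region");
        Attribute city = new Attribute("City", "geography", "Region");
        // "geography" differs from "Geography" in case, so this prints false.
        System.out.println(sameDimensionAndHierarchy(city, state));
    }
}
```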

Dealing with Auto Modeled schema elements

Auto Modeling will create an Attribute for every field in your data source. Annotations that are specified for a given field will remove any elements the auto modeler created for that field. Auto modeled fields are identified by a dimension with a single hierarchy and a single level where the names of all three are equal. This means that if you create a field annotation that has those same properties, it could be removed if you have a second annotation on that same field. For example, let's assume you have a field product_id. The auto modeler will create an attribute with Dimension, Hierarchy and Level all named Product ID. You want to keep the level, but also create a measure, Product Count, on the product_id field. In this case you will have to specify two annotations, one Create Attribute and one Create Measure. The Create Attribute annotation should be created after the Create Measure annotation, because otherwise Create Measure would identify the attribute as being auto-modeled and remove it from the model.
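The ordering effect in the product_id example can be simulated with a toy model. This is only a sketch under simplifying assumptions: the classes below are hypothetical, every dimension is assumed to hold one hierarchy and one level, and "auto-modeled" is reduced to the name-equality rule described above. It is not how the project implements annotations.

```java
import java.util.ArrayList;
import java.util.List;

// Toy simulation for illustration only; not the project's implementation.
final class AutoModelOrderingSketch {
    /** A level with its dimension and hierarchy names, tied to a source field. */
    static final class Level {
        final String field, dimension, hierarchy, level;

        Level(String field, String dimension, String hierarchy, String level) {
            this.field = field;
            this.dimension = dimension;
            this.hierarchy = hierarchy;
            this.level = level;
        }

        /** Simplified rule: dimension, hierarchy and level names are all equal. */
        boolean looksAutoModeled() {
            return dimension.equals(hierarchy) && hierarchy.equals(level);
        }
    }

    static final class Model {
        final List<Level> levels = new ArrayList<>();
        final List<String> measures = new ArrayList<>();

        /** Every annotation first removes auto-modeled-looking elements of its field. */
        private void removeAutoModeled(String field) {
            levels.removeIf(l -> l.field.equals(field) && l.looksAutoModeled());
        }

        void createAttribute(String field, String dim, String hier, String lvl) {
            removeAutoModeled(field);
            levels.add(new Level(field, dim, hier, lvl));
        }

        void createMeasure(String field, String measureName) {
            removeAutoModeled(field);
            measures.add(measureName);
        }
    }

    /** Starting point: what the auto modeler produces for product_id. */
    static Model autoModel() {
        Model m = new Model();
        m.levels.add(new Level("product_id", "Product ID", "Product ID", "Product ID"));
        return m;
    }

    /** Problematic order: the measure annotation removes the user's attribute. */
    static Model attributeThenMeasure() {
        Model m = autoModel();
        m.createAttribute("product_id", "Product ID", "Product ID", "Product ID");
        m.createMeasure("product_id", "Product Count");
        return m;
    }

    /** Recommended order: the attribute is created last and survives. */
    static Model measureThenAttribute() {
        Model m = autoModel();
        m.createMeasure("product_id", "Product Count");
        m.createAttribute("product_id", "Product ID", "Product ID", "Product ID");
        return m;
    }

    public static void main(String[] args) {
        System.out.println(attributeThenMeasure().levels.size()); // 0
        System.out.println(measureThenAttribute().levels.size()); // 1
    }
}
```

In both orders the Product Count measure is present; only the recommended order keeps the Product ID level.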

Known Issue - Dimension with Multiple Hierarchies

When using model annotations to create a dimension with multiple hierarchies, do not give any hierarchy the same name as the dimension, and do not leave any hierarchy name empty. No errors will be reported, but your published model will be incorrect.
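Because this issue produces no errors, it may help to check hierarchy names yourself before publishing. The sketch below is one possible pre-check, assuming you have the dimension name and its hierarchy names at hand; `HierarchyNameCheck` is a hypothetical helper and is not part of this project.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-check for illustration only; not part of pentaho-data-refinery.
final class HierarchyNameCheck {
    /**
     * Returns a description of each hierarchy name that would trigger the
     * known issue. Dimensions with fewer than two hierarchies are not checked.
     */
    static List<String> problems(String dimension, List<String> hierarchies) {
        List<String> found = new ArrayList<>();
        if (hierarchies.size() < 2) {
            return found;
        }
        for (String hierarchy : hierarchies) {
            if (hierarchy == null || hierarchy.isEmpty()) {
                found.add("empty hierarchy name in dimension " + dimension);
            } else if (hierarchy.equals(dimension)) {
                found.add("hierarchy has the same name as dimension " + dimension);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        names.add("Time");
        names.add("Fiscal");
        // One problem: the first hierarchy is named like the dimension.
        System.out.println(problems("Time", names));
    }
}
```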

How to build

Pentaho Data Refinery is built with Maven.

Pre-requisites for building the project:

  • Maven, version 3+
  • Java JDK 11
  • This settings.xml in your ~/.m2 directory

Building it

This is a Maven project. To build it, use the following command:

$ mvn clean install

Optionally, you can specify -Drelease to trigger obfuscation and/or uglification (as needed).

Optionally, you can specify -Dmaven.test.skip=true to skip the tests (even though you shouldn't, as you know).

The build result will be a Pentaho package located in target.

Running the tests

Unit tests

This will run all unit tests in the project (and sub-modules). To run integration tests as well, see Integration Tests below.

$ mvn test

To remote debug a single Java unit test (default port is 5005):

$ cd core
$ mvn test -Dtest=<<YourTest>> -Dmaven.surefire.debug

Integration tests

In addition to the unit tests, there are integration tests that test cross-module operation. The following command runs the integration tests:

$ mvn verify -DrunITs

To run a single integration test:

$ mvn verify -DrunITs -Dit.test=<<YourIT>>

To run a single integration test in debug mode (for remote debugging in an IDE) on the default port of 5005:

$ mvn verify -DrunITs -Dit.test=<<YourIT>> -Dmaven.failsafe.debug

To skip the tests:

$ mvn clean install -DskipTests

To write the build log to a text file:

$ mvn clean install test >log.txt

IntelliJ

  • Don't use IntelliJ's built-in Maven. Make it use the same one you use from the command line.
    • Project Preferences -> Build, Execution, Deployment -> Build Tools -> Maven -> Maven home directory