Skip to content

markmcdowall/ensj-healthcheck

 
 

Repository files navigation

EnsEMBL HealthCheck
===================


REQUIREMENTS
============

1. Java 6 JDK (v1.6 or later - see http://java.com/en/download/); this is already
   installed on the Farm in /software/jdk1.6.0_14.  Make sure that your
   JAVA_HOME environment variable is pointing to the correct directory and that the
   *correct* Java executables are in your path; put something like the following in
   your .cshrc:

     setenv JAVA_HOME /software/jdk1.6.0_14
     setenv PATH ${JAVA_HOME}/bin:${PATH} 

   Note that if you get errors indicating that the java executable can't be found,
   check that $JAVA_HOME is set correctly by doing 

     which java

   and setting $JAVA_HOME to the directory in which bin/java resides.


INSTALLATION
============

1. Obtain the source files by checking out the ensj-healthcheck module from Git.
git clone https://github.com/Ensembl/ensj-healthcheck.git
Use the -r option to check out a specific tag if required.

2. cd ensj-healthcheck

3. Edit database.defaults.properties to contain values that correspond to the database
server which you want to connect to. You can also overwrite any values in the config file
on the command line.


RUNNING
=======

A number of shell scripts (with a .sh extension) are provided to aid in running
healthchecks. These are summarised below; the main one you will use is called
run-configurable-testrunner.sh; note that this script actually passes all of its
command-line options through to the ConfigurableTestRunner class.

 Usage: ./run-configurable-testrunner.sh -d my_db -h my_host -g healthcheck_group
 
 Options:
        [--conf -c value...]               : Name of one or many configuration files. Parameters in configuration files override each other. If a parameter is provided in more than one file, the first occurrence  is used.

        [--databaseURL value]              : Parameter used in org.ensembl.healthcheck.testcase.EnsTestCase

        [--dbtype value]                   : If set, this will be used as the type for all databases.

        [--driver value]                   : Parameter used in org.ensembl.healthcheck.testcase.EnsTestCase

        [--driver1 value]                  : Driver for server 1 (support for multiple staging servers in Ensembl)

        [--driver2 value]                  : Driver for server 2 (support for multiple staging servers in Ensembl)

        [--endSession value]               : Flag to run an empty testrunnerUsed to mark the end of a parallel run

        [--exclude_groups -G value...]     : Specify which groups of tests should not be run. Fully qualified class names can be used as well as their short names.

        [--exclude_tests -T value...]      : Specify which tests should not be run. Fully qualified class names can be used as well as their short names.

        [--file.separator value]           : Parameter used in org.ensembl.healthcheck.testcase.EnsTestCase

        [--funcgen_schema.file value]      : Parameter used only in and org.ensembl.healthcheck.testcase.funcgen.CompareFuncgenSchema

        [--help -h]                        : display help

        [--host -h value]                  : The host for the database server you wish to connect to.

        [--host1 value]                    : Host for server 1 (support for multiple staging servers in Ensembl)

        [--host2 value]                    : Host for server 2 (support for multiple staging servers in Ensembl)

        [--ignore.previous.checks value]   : Parameter used only in org.ensembl.healthcheck.testcase.generic.ComparePreviousVersionExonCoords, org.ensembl.healthcheck.testcase.generic.ComparePreviousVersionBase and org.ensembl.healthcheck.testcase.generic.GeneStatus

        [--include_groups -g value...]     : Specify which groups of tests should be run. Fully qualified class names can be used as well as their short names.

        [--include_tests -t value...]      : Specify which tests should be run. Fully qualified class names can be used as well as their short names.

        [--master.funcgen_schema value]    : Parameter used in org.ensembl.healthcheck.testcase.funcgen.CompareFuncgenSchema

        [--master.schema value]            : Parameter used only in org.ensembl.healthcheck.testcase.generic.CompareSchema, and org.ensembl.healthcheck.testcase.funcgen.CompareFuncgenSchema

        [--master.variation_schema value]  : Parameter used only in master.variation_schema

        [--output -o value]                : Specify the level of output that will be used. The allowed options are "All", "None", "Problem", "Current", "Warning" and "Info", .

        [--output.database value]          : The name of the database where the results of the healthchecks are written to, if the database reporter is used.

        [--output.driver value]            : The driver for the database where the results of the healthchecks are written to, if the database reporter is used.

        [--output.host value]              : The name of the database where the results of the healthchecks are written to, if the database reporter is used.

        [--output.password value]          : The password for the database where the results of the healthchecks are written to, if the database reporter is used.

        [--output.port value]              : The port of the database where the results of the healthchecks are written to, if the database reporter is used.

        [--output.release value]           : Gets written into the session table for describing the test session, if the database reporter is used.

        [--output.schemafile value]        : If output.database does not exist, it will be created automatically. This file should have the SQL commands to create the schema. Please remember that hashes (#) are not allowed to start comments in SQL. Use two dashes "--" at the beginning of a line instead. If the configuratble testrunner can't find this file from the current working directory, it will search for it in the classpath.

        [--output.user value]              : The user name for the database where the results of the healthchecks are written to, if the database reporter is used.

        [--password value]                 : Parameter used in org.ensembl.healthcheck.testcase.EnsTestCase

        [--password1 value]                : Password for server 1 (support for multiple staging servers in Ensembl)

        [--password2 value]                : Password for server 2 (support for multiple staging servers in Ensembl)

        [--perl value]                     : Parameter used only in org.ensembl.healthcheck.testcase.AbstractPerlBasedTestCase

        [--port -P value]                  : The port for the database server you wish to connect to.

        [--port1 value]                    : Port for server 1 (support for multiple staging servers in Ensembl)

        [--port2 value]                    : Port for server 2 (support for multiple staging servers in Ensembl)

        [--production.database value]      : The name of the Ensembl production database to use to retrieve division information. Assumed to be on the same server as the output databases.

        [--repair value]                   : Allow the tests to try to repair the database (if they can)

        [--reporterType -R value]          : Specify the reporter type that will be used. The allowed options are "Database" and "Text".

        [--schema.file value]              : Parameter used only in org.ensembl.healthcheck.testcase.generic.CompareSchema,

        [--secondary.database value]       : Some tests require a second database containing the previous release. This configures the database name for the second database server.

        [--secondary.driver value]         : Some tests require a second database containing the previous release. This configures the driver for the second database server.

        [--secondary.host value]           : Some tests require a second database containing the previous release. This configures the hostname of the second database server.

        [--secondary.password value]       : Some tests require a second database containing the previous release. This configures the password for the second database server.

        [--secondary.port value]           : Some tests require a second database containing the previous release. This configures the port of the second database server.

        [--secondary.user value]           : Some tests require a second database containing the previous release. This configures the user name for the second database server.

        [--sessionID value]                : The session to add these results forUsed in parallel run

        [--species value]                  : If set, this will be used as the species for all databases, overriding anything thename or meta table of the database may indicate.

        [--testRegistryType -r value]      : Specify the type of test registry that will be used. The allowed options are "Discoverybased" and "ConfigurationBased"

        [--test_databases -d value...]     : Name of databases that should be tested (e.g.: ensembl_compara_bacteria_5_58). If there is more than one database, separate with spaces. Any configured tests will be run on these databases. Does not support same format as output.databases!

        [--test_divisions -D value...]     : Names of division to which databases to test should belong e.g. EPl or EnsemblPlants. This option requires the production database to be set up.

        [--user value]                     : Parameter used in org.ensembl.healthcheck.testcase.EnsTestCase

        [--user.dir value]                 : Parameter used in org.ensembl.healthcheck.testcase.EnsTestCase

        [--user1 value]                    : User for server 1 (support for multiple staging servers in Ensembl)

        [--user2 value]                    : User for server 2 (support for multiple staging servers in Ensembl)

        [--variation_schema.file value]    : Parameter used only in org.ensembl.healthcheck.testcase.variation.CompareVariationSchema

Here is an example commandline : 

      ./run-configurable-testrunner.sh -d homo_sapiens_core_80_38 --dbtype core --species homo_sapiens --output problem -g PostGenebuild

Test Groups
-----------

It is possible to run a single test or a group of tests.
Groups of tests are defined in src/org/ensembl/healthcheck/testgroup
and represent sets of healthchecks that are usually run together

Other Utilities
---------------

Run each of these with the -h option to show usage.
   
  database-name-matcher.sh
  Shows which database names match a particular regular expression.
  
  compile-healthcheck.sh
  Only used if you've made changes to the source, e.g. when writing your own
  tests.

WRITING YOUR OWN TESTS
======================

If you want to write your own healthchecks, rather than running the pre-defined
ones, see the file README-writing-tests.txt.

About

Ensembl's Automated QC Framework

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

  • Java 97.3%
  • Perl 2.1%
  • Other 0.6%