Kurator Web Application

Kurator-Web is a Play Framework implementation of the web application front-end for kurator-akka.

Quickstart

The following quickstart describes the process of setting up a local instance of the web app for testing and demonstration of workflows.

Download the latest distribution zip file from the releases page on GitHub https://github.com/kurator-org/kurator-web/releases, and unzip:

unzip kurator-web-1.0.0.zip

Copy the template config file at conf/application.conf.example to conf/application.conf. Edit the copy and uncomment the line kurator.autoInstall = true to enable the auto-install.

cp conf/application.conf.example conf/application.conf
nano conf/application.conf
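
If you would rather not edit the file by hand, the uncommenting step can be scripted. The sketch below assumes the option is commented out with a leading # in the template; check the template first, since the exact layout of that line is an assumption here. The helper name uncomment_key is hypothetical and not part of kurator-web.

```shell
# uncomment_key KEY FILE
# Print FILE with a leading "#" removed from the line that assigns KEY.
# Assumption: the template comments the option out as "# KEY = value".
uncomment_key() {
    key_pattern=$(printf '%s' "$1" | sed 's/\./\\./g')   # escape dots for sed
    sed "s/^[[:space:]]*#[[:space:]]*\(${key_pattern}[[:space:]]*=.*\)$/\1/" "$2"
}
```

For example, uncomment_key kurator.autoInstall conf/application.conf.example > conf/application.conf would write the edited copy in one step. All other lines are passed through unchanged.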

Run bin/kurator-web to perform the initial auto installation of jython. This will also automatically create the packages and workspace directories.

Obtain the latest release of the kurator-validation packages zip file from https://github.com/kurator-org/kurator-validation/releases. This file contains the python actors, workflow yaml and web application descriptors. Copy the downloaded zip file to the packages directory.

cp ~/Downloads/kurator-validation-1.0.0-packages.zip packages

Use the pip installer at jython/bin/pip to install any python dependencies (check documentation for python workflows).

Re-run bin/kurator-web to start the Play server and automatically unpack and deploy the workflows in packages. Once the server starts, the web app should be accessible at http://localhost:9000/kurator-web/. Log in with the default "admin" account using the password "admin".

Building and Testing Kurator-Web

Follow these instructions to set up a build environment and run the application using the embedded Play server for development and testing purposes. This is the default method of deployment for the firuta.huh.harvard.edu test instance as well as the kurator.acis.ufl.edu production environment.

Prerequisites

Kurator-Web requires Java version 1.8 or higher. To determine the version of java installed on your computer use the -version option to the java command. For example,

$ java -version
java version "1.8.0_66"
Java(TM) SE Runtime Environment (build 1.8.0_66-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.66-b17, mixed mode)
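
The version check above can also be automated. The following is a minimal sketch, not part of kurator-web: the function names are hypothetical, and it assumes the first line of the java -version output contains the version in double quotes, as in the example above. It accepts both the older 1.x numbering (1.8.0_66) and the newer scheme (11.0.2).

```shell
# java_version_ok VERSION
# Succeed if VERSION (e.g. "1.8.0_66" or "11.0.2") is Java 1.8 or newer.
java_version_ok() {
    version="$1"
    major="${version%%.*}"
    if [ "$major" = "1" ]; then
        # Legacy scheme: the real version is the second component.
        rest="${version#1.}"
        minor="${rest%%.*}"
        [ "$minor" -ge 8 ]
    else
        # Newer scheme: Java 9 and later report the major version first.
        [ "$major" -ge 9 ]
    fi
}

# Print the quoted version from the first line of "java -version",
# which writes its output to stderr.
installed_java_version() {
    java -version 2>&1 | sed -n '1s/.*version "\([^"]*\)".*/\1/p'
}
```

A typical use would be java_version_ok "$(installed_java_version)" || echo "Java 1.8 or higher is required" >&2.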

See the following for instructions regarding the installation of Oracle Java 8 in Debian: http://www.webupd8.org/2014/03/how-to-install-oracle-java-8-in-debian.html

Or download and manually install via the Oracle website: http://www.oracle.com/technetwork/java/javase/downloads/index.html

Other development prerequisites include maven and git. If you do not currently have them installed you can use the following command.

sudo apt-get install git maven

Check that your maven is at least version 3.0.

$ mvn --version
Apache Maven 3.0.4
Maven home: /usr/share/maven
Java version: 1.8.0_73, vendor: Oracle Corporation
Java home: /usr/local/java/jdk1.8.0_73/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "3.2.0-6-686-pae", arch: "i386", family: "unix"

For production environments the default database used is MySQL. If MySQL is not already installed, install it now via apt-get:

sudo apt-get install mysql-client mysql-server

Create the kurator user and database with privileges in MySQL:

CREATE DATABASE kurator;
GRANT ALL PRIVILEGES ON kurator.* TO 'kurator'@'localhost' IDENTIFIED BY 'password';
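
On MySQL 8.0 and later, GRANT no longer accepts an IDENTIFIED BY clause, so the account has to be created with CREATE USER before privileges are granted. The sketch below prints the equivalent statements from a hypothetical helper function so that the password is supplied in one place. It assumes the password contains no single quotes.

```shell
# kurator_db_sql PASSWORD
# Print SQL that creates the kurator database and user (MySQL 8.0 style).
# Assumption: PASSWORD contains no single quotes.
kurator_db_sql() {
    db_password="$1"
    cat <<EOF
CREATE DATABASE IF NOT EXISTS kurator;
CREATE USER IF NOT EXISTS 'kurator'@'localhost' IDENTIFIED BY '${db_password}';
GRANT ALL PRIVILEGES ON kurator.* TO 'kurator'@'localhost';
EOF
}
```

The output can be piped to the client, for example kurator_db_sql 'password' | mysql -u root -p.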

Kurator-web supports sending email notifications upon new user registration and activation. In order to use this feature, the server hosting the web app will need an outgoing SMTP server.

Debian has the exim4 mail server installed by default. To configure the mail server for kurator-web run:

dpkg-reconfigure exim4-config

When you reach the "Mail Server configuration" dialog, select "internet site" as the option. Next, when prompted to enter the FQDN or system mail name, enter your domain (e.g. firuta.huh.harvard.edu, kurator.acis.ufl.edu, etc.).

Since the web app will access SMTP via localhost only, the default list of IP addresses to listen on (127.0.0.1 and ::1) is sufficient if kurator-web is the only application that needs the mail server.

For the rest of the configuration process, the default values will work unless you wish to change them.

Test the mail server via:

mail -s "Test Subject" user@example.com < /dev/null

See https://www.digitalocean.com/community/tutorials/how-to-install-the-send-only-mail-server-exim-on-ubuntu-12-04 for more information.

Kurator web requires a kurator home directory that will contain the workspace and Python packages. Create a kurator user via the useradd command and use this user's home directory:

sudo useradd -m -U kurator
sudo passwd kurator

The Python workflows that use the native actor require Python 2.7 and the pip installer. Additionally, the kurator native libraries provided by kurator-akka require the python-dev and r-base packages:

sudo apt-get install python python-pip python-dev r-base 

Python workflows that use the Jython actor require installation of Jython. Download the Jython 2.7.1b3 installer jar and run the installer from the command line as root.

sudo java -jar jython-installer-2.7.1b3.jar

Select the standard installation when prompted (option 2) and when asked to provide the target directory enter "/opt/jython". This will install jython to "/opt/jython".

Log in as the kurator user created previously and create a directory for the projects and another directory for deployments in the user's home directory:

cd /home/kurator
mkdir projects
mkdir deployments

Clone projects from GitHub

Log in as the kurator user for the following steps.

Place the git clones of each project within the /home/kurator/projects directory. The clones can live elsewhere, but the instructions that follow about symbolic links assume the projects are inside the projects directory.

cd /home/kurator/projects 

Clone the kurator-akka project:

git clone https://github.com/kurator-org/kurator-akka.git

Clone prerequisites for ffdq and dq reports (optional, only if you want to do development with them):

git clone https://github.com/kurator-org/ffdq-api.git
git clone https://github.com/kurator-org/kurator-ffdq.git

and the event_date_qc and geo_ref_qc projects (optional, only if you want to do development with them):

git clone https://github.com/FilteredPush/event_date_qc.git
git clone https://github.com/FilteredPush/geo_ref_qc.git

Clone the kurator-validation and kurator-fp-validation projects containing the workflows:

git clone https://github.com/kurator-org/kurator-validation.git
git clone https://github.com/kurator-org/kurator-fp-validation.git

The kurator-fp-validation project also depends on the FP-KurationServices project:

git clone https://github.com/FilteredPush/FP-KurationServices.git

Finally clone the web app project found in this repository:

git clone https://github.com/kurator-org/kurator-web.git
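
The required clones can also be scripted so that re-running the setup skips projects that are already present. This is a sketch only: the function names are hypothetical, the list contains just the non-optional repositories named above, and the directory name is taken from the clone URL, which is what git does by default.

```shell
# Required repositories, as listed in the steps above.
KURATOR_REPOS="
https://github.com/kurator-org/kurator-akka.git
https://github.com/kurator-org/kurator-validation.git
https://github.com/kurator-org/kurator-fp-validation.git
https://github.com/FilteredPush/FP-KurationServices.git
https://github.com/kurator-org/kurator-web.git
"

# repo_dir URL: print the directory name git creates for a clone URL.
repo_dir() {
    basename "$1" .git
}

# clone_missing DIR: clone each repository into DIR unless already present.
clone_missing() {
    for url in $KURATOR_REPOS; do
        target="$1/$(repo_dir "$url")"
        if [ -d "$target/.git" ]; then
            echo "skipping $(repo_dir "$url") (already cloned)"
        else
            git clone "$url" "$target" || return 1
        fi
    done
}
```

Running clone_missing /home/kurator/projects as the kurator user would fetch whatever is missing. Note that the directory created for the last FilteredPush dependency is the one printed by repo_dir, so later cd commands need to use that name.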

FFDQ and QC actor libraries

See the NOTE below: these libraries are available from Maven Central and only need local builds if you are going to do development on them.

The projects that make up kurator and the set of workflows standard to the production deployments are shown below, with links between them to indicate the dependency graph. Projects are listed from left to right in build order.

The first projects to build are the ffdq library and api projects as well as the event_date_qc and geo_ref_qc projects that depend on ffdq:

ffdq-api --> kurator-ffdq --> event_date_qc
                              geo_ref_qc

Starting from the projects directory in the kurator user's home directory (/home/kurator/projects/), build these projects using maven install:

cd ffdq-api
mvn clean install

cd ../kurator-ffdq
mvn clean install

cd ../event_date_qc
mvn clean install

cd ../geo_ref_qc
mvn clean install
cd ..
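
The build order above can also be expressed as a small loop that stops at the first failing project. This is a sketch under the assumption that all projects were cloned into one directory; the function name build_in_order and the MVN override variable are hypothetical conveniences, not part of any kurator project.

```shell
# Projects in dependency order (see the graph above).
FFDQ_BUILD_ORDER="ffdq-api kurator-ffdq event_date_qc geo_ref_qc"

# build_in_order PROJECTS_DIR PROJECT...
# Run "mvn clean install" in each project, stopping at the first failure.
# MVN may be overridden, for example MVN=echo for a dry run.
build_in_order() {
    projects_dir="$1"
    shift
    for project in "$@"; do
        echo "==> building $project"
        # The subshell keeps the caller's working directory unchanged.
        ( cd "$projects_dir/$project" && ${MVN:-mvn} clean install ) || return 1
    done
}
```

For example, build_in_order /home/kurator/projects $FFDQ_BUILD_ORDER builds the four libraries in order. Because each build runs in a subshell, there is no need to change back to the projects directory between steps.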

NOTE: the latest stable versions of all of the projects above are also available via Maven Central. Local clones of these projects are not required if you are working only on the other projects in a development environment (i.e. kurator-akka, kurator-validation, kurator-fp-validation and kurator-web below). These dependencies and the qc libraries are downloaded automatically when running maven install on kurator-validation.

  1. https://mvnrepository.com/artifact/org.datakurator
  2. https://mvnrepository.com/artifact/org.filteredpush

Kurator-akka and workflows

Next are the kurator-akka top-level project and the kurator-validation and kurator-fp-validation projects, which contain the Python and Java actors:

kurator-akka --> kurator-validation --> kurator-fp-validation

Starting from the projects directory (/home/kurator/projects/), build these projects using maven install:

cd kurator-akka
mvn clean install

cd ../kurator-validation
mvn clean install
cd ..

The packages directory of the kurator-validation project contains all the Python actors and configuration that are currently deployed in production. In order to install the Python dependencies via pip, use the requirements.txt file provided in packages/kurator_dwca as an argument to pip:

pip install -r kurator-validation/packages/kurator_dwca/requirements.txt

In order to build the kurator-fp-validation workflows via maven install, first build the FP-KurationServices dependency, then kurator-fp-validation. You may need to skip the tests when building FP-KurationServices, as some of them invoke network services and may fail:

cd FP-KurationServices
mvn clean install -DskipTests

cd ../kurator-fp-validation
mvn clean install

Configuration

Once you have successfully built the dependencies the next step is to configure the web app.

A template web application configuration file can be found at conf/application.conf.example. Make a copy of this file named application.conf in the same directory and edit it to set the database and SMTP server connection information:

cd kurator-web/conf
cp application.conf.example application.conf
vi application.conf

By default the Play application is configured to use the embedded in-memory H2 database. If you plan on using MySQL as the production database, comment out the two lines in conf/application.conf that configure the H2 database and uncomment the lines for the MySQL configuration instead.

Set the values of db.default.user and db.default.password to the username and password used when creating the kurator database in MySQL.

Next configure the host and user for smtp (localhost, and the kurator user if exim4 was configured according to the prerequisites) in the mailer section of the config. These are the settings the web app will use when sending the notification emails.

Also set the kurator.email property to the user@hostname according to settings that your mail server is using (e.g. kurator@kurator1.acis.ufl.edu). This property is used as the sender email for notifications that the web app sends out.

The python.path property should by default point to the packages directory of kurator-validation or kurator-fp-validation created earlier via git clone (e.g. /home/kurator/projects/kurator-validation/packages).

Lastly, if you would like the play server to accept connections from all hosts instead of just localhost, set the value of the http.address property to 0.0.0.0.

Build and Run

Once the web application is configured, build a distribution zip file via the included activator utility:

cd /home/kurator/projects/kurator-web/
bin/activator dist

Unzip the distribution archive to the deployments directory in /home/kurator, then create a symbolic link in the deployment that points to the packages directory of the kurator-validation project to deploy the workflows. By default, kurator-web expects to find the "packages" directory relative to the deployment root directory (e.g. /deployments/kurator-web/packages). Because of this link, any updates to the Python workflows that arrive in the kurator-validation project via git pull are redeployed automatically. You will need to recreate this symbolic link any time you unzip a new kurator-web...zip file.

cd /home/kurator
unzip projects/kurator-web/target/universal/kurator-web-1.0.2.zip -d deployments
cd /home/kurator/deployments/kurator-web-1.0.2/
ln -s /home/kurator/projects/kurator-validation/packages 

NOTE: the Java workflows are contained in the kurator-validation jar file. To update them, rebuild and redeploy the web app via bin/activator dist by repeating the steps described above. Alternatively, redeploy with the -u option to unzip, which retains the symbolic link: unzip -u projects/kurator-web/target/universal/kurator-web-1.0.2.zip -d deployments

Create a symbolic link "kurator-web" for the current deployment:

cd /home/kurator/
ln -s /home/kurator/deployments/kurator-web-1.0.2 deployments/kurator-web

NOTE: The instructions which follow assume that your latest kurator-web-x.x.x deployment is found at the symbolic link /home/kurator/deployments/kurator-web. If you build a new kurator-web version higher than 1.0.2, you will need to update the kurator-web symbolic link as well as creating the packages symbolic link.
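
Since both symbolic links have to be recreated for every new version, the two steps can be wrapped in one function. The following is a sketch with a hypothetical function name. It uses ln -sfn so that an existing link is replaced rather than followed, and it assumes the deployment does not already contain a real (non-link) packages directory.

```shell
# activate_deployment DEPLOYMENTS_DIR VERSION PACKAGES_DIR
# Point DEPLOYMENTS_DIR/kurator-web at kurator-web-VERSION and link the
# workflow packages directory into that deployment.
activate_deployment() {
    deployments_dir="$1"
    release="kurator-web-$2"
    packages_dir="$3"

    [ -d "$deployments_dir/$release" ] || {
        echo "no such deployment: $deployments_dir/$release" >&2
        return 1
    }
    # -n replaces an existing link instead of descending into its target.
    ln -sfn "$packages_dir" "$deployments_dir/$release/packages" || return 1
    ln -sfn "$deployments_dir/$release" "$deployments_dir/kurator-web"
}
```

With the paths used in this guide, the call would be activate_deployment /home/kurator/deployments 1.0.2 /home/kurator/projects/kurator-validation/packages. Running it again with a higher version number switches the kurator-web link to the new deployment.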

Run the Play production server from the distribution directory unzipped within deployments, referencing the kurator-validation jar you wish to use. If this jar doesn't exist, workflows will fail with a single-line error message about not being able to find a class. Use:

cd deployments/kurator-web
bin/kurator-web -Dhttp.port=80 -Dkurator.jar=/home/kurator/projects/kurator-validation/target/kurator-validation-1.0.3-SNAPSHOT-jar-with-dependencies.jar

By default the Play server listens on port 9000; the -Dhttp.port option in the command above changes the port to 80. After starting the server, open http://localhost/kurator-web/ in your browser to test the web application. The -Dkurator.jar option is required: it must point to a copy of the kurator-validation jar, which the command-line workflow runner in the web app uses to run workflows.
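
Because a missing jar only shows up later as a failing workflow, it can help to check the path before the server starts. The wrapper below is a sketch with hypothetical function names. It is meant to be run from the deployment root directory and falls back to Play's default port of 9000 when no port is given.

```shell
# check_kurator_jar JAR: succeed only if JAR is an existing, non-empty file.
check_kurator_jar() {
    if [ ! -f "$1" ] || [ ! -s "$1" ]; then
        echo "kurator jar not found or empty: $1" >&2
        return 1
    fi
}

# start_kurator_web JAR [PORT]: run from the deployment root directory.
start_kurator_web() {
    check_kurator_jar "$1" || return 1
    bin/kurator-web -Dhttp.port="${2:-9000}" -Dkurator.jar="$1"
}
```

For example, start_kurator_web /home/kurator/projects/kurator-validation/target/kurator-validation-1.0.3-SNAPSHOT-jar-with-dependencies.jar 80 refuses to start if the jar has not been built yet.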

Systemd startup script

Create a unit file for the kurator web systemd service at /etc/systemd/system/kurator.service with the following contents:

[Unit]
After=network.target

[Service]
EnvironmentFile=/home/kurator/deployments/kurator-web/conf/env
MemoryLimit=8G
PIDFile=/home/kurator/deployments/kurator-web/RUNNING_PID
WorkingDirectory=/home/kurator/deployments/kurator-web
ExecStart=/home/kurator/deployments/kurator-web/bin/kurator-web -Dhttp.port=80 -Dkurator.jar=/home/kurator/projects/kurator-validation/target/kurator-validation-1.0.2-jar-with-dependencies.jar
Restart=on-failure
User=root
Group=kurator

# See http://serverfault.com/a/695863
SuccessExitStatus=143

[Install]
WantedBy=multi-user.target

If your deployment uses the same directories as described in this guide, the example above can be used as is. Otherwise, replace the paths with the ones used by your deployment, and check that the -Dkurator.jar path names the kurator-validation jar that was actually built, since the version in the file name changes between releases. By default the unit file starts the web app listening on port 80; change the -Dhttp.port option in the ExecStart command to use a different port.

Once you are done with this file, enable the service via:

systemctl enable kurator.service

Reboot the machine or start the service manually by using:

sudo systemctl start kurator