
JonathanJarrett/graphene


Graphene: Search and Graph your data

Graphene is a high-performance, Java-based web framework you can use to build a search and graphing application on top of your data. It is datastore agnostic, but has built-in support for generating a Neo4J graph database from your non-graph datastore (RDBMS, Solr, etc.).
For configuration, build, and deployment instructions, see the Graphene Wiki.

Using Graphene

The core of Graphene (this project) is meant to be used as a WAR overlay for your Java-based web application. We will be providing two examples, using the Kiva and Enron (limited) datasets. In the future we will flesh out a complete Maven archetype to get you going more quickly.

Our goal is that you should not have to modify this project to suit your individual needs (although we welcome ideas and suggestions). The intent is for this project to serve as an underlying framework, with your individual implementations wired into your own code using IOC.
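The wiring idea can be sketched in plain Java: a registry maps each interface to the implementation that callers receive, and a single "Module" class decides those bindings. This is a hypothetical illustration, not Tapestry's API; `SimpleRegistry`, `GreetingService`, and `KivaGreetingService` are made-up names. In Tapestry IoC itself, the equivalent binding is typically declared in a module class through a static `bind(ServiceBinder binder)` method that calls `binder.bind(...)`. The sketch avoids lambdas so it also compiles on Java 7.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical service interface that framework code depends on.
interface GreetingService {
    String greet(String name);
}

// Hypothetical dataset-specific implementation supplied by your application.
class KivaGreetingService implements GreetingService {
    @Override
    public String greet(String name) {
        return "Hello from Kiva, " + name;
    }
}

// Minimal registry: maps each interface to an implementation class and
// hands out one shared instance per interface (singleton scope).
class SimpleRegistry {
    private final Map<Class<?>, Class<?>> bindings = new HashMap<>();
    private final Map<Class<?>, Object> instances = new HashMap<>();

    <T> void bind(Class<T> iface, Class<? extends T> impl) {
        bindings.put(iface, impl);
    }

    <T> T getService(Class<T> iface) {
        Object instance = instances.get(iface);
        if (instance == null) {
            Class<?> impl = bindings.get(iface);
            if (impl == null) {
                throw new IllegalStateException("No binding for " + iface.getName());
            }
            try {
                // Instantiate lazily via the no-argument constructor.
                instance = impl.getDeclaredConstructor().newInstance();
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot instantiate " + impl.getName(), e);
            }
            instances.put(iface, instance);
        }
        return iface.cast(instance);
    }
}

// Hypothetical "Module" class: the one place that decides which
// implementation is wired to which interface.
class AppModule {
    static void bind(SimpleRegistry registry) {
        registry.bind(GreetingService.class, KivaGreetingService.class);
    }
}
```

Because framework code only ever asks for `GreetingService`, swapping in a different dataset's implementation means changing one line in the module, not the framework.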

Building Graphene

Graphene is built using Apache Maven version 3.0.4 and a recent version of Java 7.

  • A plain 'mvn clean install' will build all the jar files and a single war file (to be overlaid on your project).
  • Test execution is part of the build, but you can add -DskipTests=true to cut down on build time.
  • A BuildAll.bat is supplied for Windows users. It performs a few cleans to work around some Windows issues, and then compiles and installs to your local Maven repository.

Graphene overview

  • Graphene expects that you are familiar with some modern Java concepts:
    • Interfaces and implementations
    • Maven, or at least how to search for answers to your Maven questions
    • Dependency Injection (aka Inversion of Control, IOC, or DI)
      • Graphene uses [Apache Tapestry](http://tapestry.apache.org/) to provide the IOC framework. It is very similar to Guice, but also allows distributed configuration and can act as a lightweight OSGi alternative. We use the terms 'wiring' and 'binding' interchangeably. Essentially, a registry is created that lives for the life of your program and defines which implementation a service receives when it asks for an interface. The IOC wiring is mostly done at the customer implementation level, although basic shared services are wired in modules within graphene-parent. For consistency, any class that performs IOC wiring is suffixed with the word "Module", e.g. AppModule.java or DAOModule.java.
  • Graphene currently requires you to implement an ExtJS UI, although the Kiva and Enron demos should be helpful in setting up the application for your own dataset.
  • Graphene is structured as a multi-module Maven project.

    The modules are:

    • graphene-parent

      • graphene-analytics

        • Still under development. Intended for precomputed or post-ingest analytics.
      • graphene-dao

        • Defines all DAO interfaces used under graphene-parent, as well as any business logic interfaces and implementations
      • graphene-dao-neo4j

        • Defines some DAO implementations for a standardized Neo4J property graph, plus graph querying using Cypher.
      • graphene-dao-sql

        • Defines some DAO implementations for standardized SQL tables. These are not used unless you wire them in via IOC.
      • graphene-export

        • Defines utilities for converting internal lists into CSV and native Excel XLS files
      • graphene-hts

        • Under development. Defines utilities for entity extraction and resolution based on the nature and context of the data.
      • graphene-ingest

        • Defines some basic ingest utilities which are used only by other ingest modules.
      • graphene-introspect

        • Used during the ingest phase. Currently its main function is to run a series of queries against every table and every column, so you can get a feel for the bounds of your data and which columns are interesting.
      • graphene-memorydb

        • An in-house, in-memory database for property graphs, intended for small datasets or for when a graph database is not available. An implementation of this is preloaded into memory when the application starts (which may take several minutes, depending on how much you tell it to load).
      • graphene-model

        • This module contains generic model and view classes which are used by graphene's services.
        • Your DAO implementations translate your domain specific (and database specific) objects into the more generic objects defined in this module.
        • Your DAO implementations receive query objects (POJOs) defined in this module.
        • Many of the objects are generated using [Apache Avro](http://avro.apache.org/).
      • graphene-rest

        • This module defines the REST interfaces which will be exposed to the UI.
        • Your REST implementations will adhere to the interfaces defined here.
        • The REST interfaces define the resource paths, so the UI will not break because of mismatched paths.
        • This module uses the Tynamo RESTEasy integration with the Apache Tapestry web framework.
      • graphene-search

        • General search utilities
      • graphene-util

        • Utilities which have cross cutting concerns for all modules. For example, query timing, logging, memory and file utilities.
        • Almost all modules require this module as a dependency.
      • graphene-web

        • This module defines some basic wiring and imports many of the *Module.java classes from the other graphene Maven modules.
        • The web module also contains shared HTML, CSS, and JS resources used by ExtJS, Cytoscape, and many other libraries.
        • It also contains Apache Tapestry-based UI components and pages (currently limited).
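The graphene-export module described above converts internal lists into CSV. As a rough illustration of what that involves, here is a minimal, hypothetical CSV writer that follows the RFC 4180 quoting rules: quote a field only if it contains a comma, quote, or line break, and double any embedded quotes. `CsvExporter` and its methods are illustrative names, not graphene-export's actual API.

```java
import java.util.List;

// Hypothetical CSV helper; not graphene-export's real API.
final class CsvExporter {
    private CsvExporter() {}

    // Quote a field only when needed (comma, quote, CR, or LF),
    // doubling any embedded quotes. Null becomes an empty field.
    static String escape(String field) {
        if (field == null) {
            return "";
        }
        boolean needsQuotes = field.indexOf(',') >= 0 || field.indexOf('"') >= 0
                || field.indexOf('\r') >= 0 || field.indexOf('\n') >= 0;
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Render a header row followed by data rows.
    static String toCsv(List<String> header, List<List<String>> rows) {
        StringBuilder sb = new StringBuilder();
        appendRow(sb, header);
        for (List<String> row : rows) {
            appendRow(sb, row);
        }
        return sb.toString();
    }

    private static void appendRow(StringBuilder sb, List<String> fields) {
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(escape(fields.get(i)));
        }
        sb.append("\r\n"); // RFC 4180 uses CRLF line breaks
    }
}
```

Quoting only when necessary keeps simple exports readable while still round-tripping names like `Smith, J.` correctly in Excel and other CSV readers.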

    Developing with Graphene

    We recommend that your application use the Maven module structure shown in the Kiva and Enron demos. For example, if you are developing for a dataset such as IMDB, the structure would look as follows:

    • graphene-imdb
      • graphene-imdb-ingest (aka the ingest module)
      • graphene-imdb-web (aka the web module)

    The pom.xml at the graphene-imdb level lists the ingest and web modules as children, so you can build both parts together.

    We recommend that the ingest module depend on parts from the web module (and not the other way around), so that code relating to ingest doesn't get deployed with your war.

    The ingest module

    The ingest module handles the ETL (Extract, Transform, Load) of your data into a more generic format.
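As an illustration only, an ingest step can be split into those three stages. The sketch below assumes a hypothetical pipe-delimited lender file (`id|name|country`) and a hypothetical `GenericEntity` target type; Graphene's real ingest utilities and model classes will differ.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical generic record produced by the ingest step.
final class GenericEntity {
    final String id;
    final String type;
    final Map<String, String> attributes;

    GenericEntity(String id, String type, Map<String, String> attributes) {
        this.id = id;
        this.type = type;
        this.attributes = Collections.unmodifiableMap(new LinkedHashMap<>(attributes));
    }
}

// Hypothetical ingest job for an assumed "id|name|country" lender file.
final class LenderIngest {
    private LenderIngest() {}

    // Extract: split one raw line into fields (keeping empty trailing fields).
    static String[] extract(String line) {
        return line.split("\\|", -1);
    }

    // Transform: map dataset-specific fields onto the generic entity shape.
    static GenericEntity transform(String[] fields) {
        if (fields.length != 3) {
            throw new IllegalArgumentException("Expected 3 fields, got " + fields.length);
        }
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("name", fields[1].trim());
        attrs.put("country", fields[2].trim());
        return new GenericEntity(fields[0].trim(), "lender", attrs);
    }

    // Load: here the entities are collected in a list; a real loader would
    // write them to your RDBMS, Neo4J, or another store.
    static List<GenericEntity> run(List<String> rawLines) {
        List<GenericEntity> loaded = new ArrayList<>();
        for (String line : rawLines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            loaded.add(transform(extract(line)));
        }
        return loaded;
    }
}
```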

    The web module

    If you ETL'd into an RDBMS

    The first thing you might want to do with the web module is to set up and run the DTOGeneration.java main(). Once it connects to your database, it generates Java model objects that reflect your database tables, plus query helper objects that ensure you write valid SQL (invalid SQL or bad type conversions are caught at compile time). This step uses QueryDSL for code generation, replacing code that is tedious and error-prone to write by hand.
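To show why generated query objects catch mistakes at compile time, here is a hand-written, hypothetical stand-in for a generated table class. It is not QueryDSL's API or output; `Column`, `SqlPredicate`, and `QLoans` (for an assumed `loans` table) are illustrative names. Each column carries its Java type, so comparing a column to a value of the wrong type fails to compile, and values are passed as bound parameters rather than concatenated into the SQL.

```java
import java.util.Collections;
import java.util.List;

// A SQL fragment plus its bound parameter (never string-concatenated).
final class SqlPredicate {
    final String sql;
    final List<Object> params;

    SqlPredicate(String sql, Object param) {
        this.sql = sql;
        this.params = Collections.singletonList(param);
    }
}

// A column whose Java type is fixed at compile time (hypothetical; in the
// real workflow, QueryDSL's generated classes play this role).
final class Column<T> {
    final String name;

    Column(String name) {
        this.name = name;
    }

    // Only a value of the column's own type is accepted.
    SqlPredicate eq(T value) {
        return new SqlPredicate(name + " = ?", value);
    }
}

// Hand-written stand-in for a generated class for an assumed "loans" table.
final class QLoans {
    static final String TABLE = "loans";
    static final Column<Long> ID = new Column<>("id");
    static final Column<String> COUNTRY = new Column<>("country");

    private QLoans() {}

    static String selectWhere(SqlPredicate p) {
        return "SELECT * FROM " + TABLE + " WHERE " + p.sql;
    }
}
```

With this shape, `QLoans.COUNTRY.eq("MX")` compiles, while `QLoans.ID.eq("MX")` is rejected by the compiler because `ID` is a `Column<Long>` (and `ID.eq(42L)` is needed rather than `ID.eq(42)`).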

    If you didn't ETL into an RDBMS

    You can skip DTOGeneration and go straight to implementing your DAOs.

    DAO Implementation

    Graphene expects you to create implementations for most of the DAOs, following the interfaces provided in the graphene-dao module. This allows the storage mechanism you choose to be independent of the main services and UI. In the DAO implementations, you will mostly be querying your datastore and then converting the results into one of the model or view objects in the graphene-model module. (try saying that 10 times fast!)
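A minimal sketch of that pattern follows. `EntityDAO` and `EntityView` are hypothetical stand-ins for an interface from graphene-dao and a view class from graphene-model (the real names and methods differ), `BorrowerRow` is an assumed domain-specific row type, and an in-memory list stands in for your datastore query.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for a generic view object from graphene-model.
final class EntityView {
    final String id;
    final String label;

    EntityView(String id, String label) {
        this.id = id;
        this.label = label;
    }
}

// Hypothetical stand-in for an interface from graphene-dao.
interface EntityDAO {
    List<EntityView> findByName(String namePart);
}

// Domain-specific row as it comes out of your datastore.
final class BorrowerRow {
    final long borrowerId;
    final String firstName;
    final String lastName;

    BorrowerRow(long borrowerId, String firstName, String lastName) {
        this.borrowerId = borrowerId;
        this.firstName = firstName;
        this.lastName = lastName;
    }
}

// Your implementation: query your own store, then convert to generic objects.
final class BorrowerDAOImpl implements EntityDAO {
    private final List<BorrowerRow> table; // stands in for a JDBC/Solr/Neo4J query

    BorrowerDAOImpl(List<BorrowerRow> table) {
        this.table = new ArrayList<>(table);
    }

    @Override
    public List<EntityView> findByName(String namePart) {
        String needle = namePart.toLowerCase(Locale.ROOT);
        List<EntityView> results = new ArrayList<>();
        for (BorrowerRow row : table) {
            String fullName = row.firstName + " " + row.lastName;
            if (fullName.toLowerCase(Locale.ROOT).contains(needle)) {
                results.add(toView(row));
            }
        }
        return results;
    }

    // Convert the domain-specific row into the generic model object.
    private static EntityView toView(BorrowerRow row) {
        return new EntityView("borrower:" + row.borrowerId, row.firstName + " " + row.lastName);
    }
}
```

Services and the UI only see `EntityDAO` and `EntityView`, so replacing the list with a real datastore query changes nothing above the DAO layer.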

    Running a Graphene application

    By default, Graphene expects the following software to be available (although it is easy to change this or override the defaults):

    • Tomcat 7 or 8, with the required JDBC driver on the classpath (e.g. in tomcat/lib or packaged with your app)
    • You should set the CATALINA_HOME environment variable
    • Graphene supports Chrome and Firefox, and does not require any browser plugins.

    Licensing

    Graphene is an open source project with dependencies on other open source projects. This project was funded by DARPA as part of the XDATA program.
