
andrescanoutrera/toolbox

 
 


Scope

The AMIDST toolbox offers a collection of scalable and parallel algorithms for inference and learning of hybrid Bayesian networks (BNs) from streaming data. For example, AMIDST provides parallel multi-core implementations of Bayesian parameter learning, using streaming variational Bayes and variational message passing. Additionally, AMIDST leverages existing functionality and algorithms by interfacing with software tools such as Hugin and MOA. AMIDST is an open source Java toolbox released under the Apache Software License version 2.0.
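To give a feel for what learning from streaming data means, here is a minimal sketch of Bayesian parameter updating over batches. It is not the AMIDST API and it is not variational Bayes: it assumes a single multinomial variable with a conjugate Dirichlet prior, where the update is exact. The class `StreamingDirichlet` and its methods are hypothetical names chosen for this illustration. The point it shows is that the posterior after one batch acts as the prior for the next, so the stream never has to be held in memory.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: hypothetical class, not part of the AMIDST toolbox.
class StreamingDirichlet {

    // Dirichlet pseudo-counts, one per state of the multinomial variable.
    private final double[] alpha;

    StreamingDirichlet(int numStates, double priorCount) {
        alpha = new double[numStates];
        Arrays.fill(alpha, priorCount);
    }

    // Folds one batch of observed state indices into the posterior.
    // The updated pseudo-counts become the prior for the next batch.
    void update(int[] batch) {
        for (int state : batch) {
            alpha[state] += 1.0;
        }
    }

    // Posterior mean of the state probabilities.
    double[] posteriorMean() {
        double total = Arrays.stream(alpha).sum();
        return Arrays.stream(alpha).map(a -> a / total).toArray();
    }

    public static void main(String[] args) {
        StreamingDirichlet model = new StreamingDirichlet(3, 1.0);

        // A toy "stream" of three batches; a real stream would be read lazily.
        List<int[]> stream = Arrays.asList(
                new int[] {0, 0, 1},
                new int[] {2, 0, 1},
                new int[] {0, 0, 0});

        for (int[] batch : stream) {
            model.update(batch);
            System.out.println(Arrays.toString(model.posteriorMean()));
        }
    }
}
```

Each batch is discarded after `update` returns. Models with latent variables have no closed-form update like this one, which is where approximate schemes such as streaming variational Bayes come in.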

The figure below shows a non-exhaustive taxonomy of relevant data mining tools dealing with probabilistic graphical models (PGMs) and data streams. To the best of our knowledge, existing software systems for PGMs focus only on mining stationary data sets. The main goal of AMIDST is to fill this gap and provide a significant contribution within the areas of PGMs and mining data streams.

Scalability

Scalability is a main concern for the AMIDST toolbox. The algorithms are implemented in parallel using the Java 8 functional programming style, so users who need more capacity to process data streams can use more CPU cores. As an example, the following figure shows how the data processing capacity of the toolbox increases with the number of CPU cores when AMIDST's learning engine learns a hybrid BN model that includes a class variable C, two latent variables (dashed nodes), and multinomial (blue nodes) and Gaussian (green nodes) observable variables. As can be seen, with the variational learning engine the AMIDST toolbox can process data on the order of gigabytes (GB) per hour, depending on the number of available CPU cores, even for large and complex PGMs with latent variables. These experiments were carried out on an Ubuntu Linux server with an x86_64 architecture and 32 cores. The size of the processed data set was measured in Weka's ARFF format.
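The Java 8 functional style mentioned above can be illustrated with a small sketch that uses only the standard `java.util.stream` API. It is not taken from the AMIDST code base; the class `ParallelSufficientStats` and its nested `Stats` type are hypothetical. The sketch computes the sufficient statistics of a Gaussian (count, sum, sum of squares) with a map and reduce pipeline. Because merging two partial results is associative, the same pipeline runs on one core or many by toggling `parallel()`.

```java
import java.util.Arrays;
import java.util.stream.DoubleStream;
import java.util.stream.IntStream;

// Illustrative only: hypothetical class, not part of the AMIDST toolbox.
class ParallelSufficientStats {

    // Immutable partial result: count, sum and sum of squares.
    static final class Stats {
        final long count;
        final double sum;
        final double sumSq;

        Stats(long count, double sum, double sumSq) {
            this.count = count;
            this.sum = sum;
            this.sumSq = sumSq;
        }

        static Stats of(double x) {
            return new Stats(1, x, x * x);
        }

        // Associative merge, which is what makes a parallel reduce valid.
        Stats merge(Stats other) {
            return new Stats(count + other.count, sum + other.sum, sumSq + other.sumSq);
        }

        double mean() {
            return sum / count;
        }

        // Population (maximum likelihood) variance.
        double variance() {
            double m = mean();
            return sumSq / count - m * m;
        }
    }

    static Stats compute(double[] data, boolean parallel) {
        DoubleStream stream = Arrays.stream(data);
        if (parallel) {
            stream = stream.parallel(); // work is split across the common fork-join pool
        }
        return stream.mapToObj(Stats::of)
                     .reduce(new Stats(0, 0.0, 0.0), Stats::merge);
    }

    public static void main(String[] args) {
        double[] data = IntStream.range(0, 1_000_000)
                                 .mapToDouble(i -> i % 10)
                                 .toArray();

        Stats sequential = compute(data, false);
        Stats parallel = compute(data, true);

        System.out.println("sequential mean = " + sequential.mean());
        System.out.println("parallel mean   = " + parallel.mean());
        System.out.println("parallel var    = " + parallel.variance());
    }
}
```

The only difference between the sequential and parallel runs is the call to `parallel()`; the reduction logic is unchanged. How much speed-up results depends on the data size and the number of available cores.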

Documentation

  • Getting Started! explains how to install the AMIDST toolbox, how the toolbox makes use of the new functional-style programming features of Java 8, and why it is based on a modular architecture.

  • Toolbox Functionalities describes the main functionalities (i.e., data streams, PGMs, learning and inference engines, etc.) of the AMIDST toolbox.

  • Code Examples includes a list of source code examples explaining how to use some functionalities of the AMIDST toolbox.

  • API JavaDoc of the AMIDST toolbox.

Contributing to AMIDST

AMIDST is an open source toolbox, and end users are encouraged to upload their contributions (which may include basic contributions, major extensions, and/or use cases) following the indications given in this link.

Publications & Use-Cases

The repository https://github.com/amidst/toolbox-usecases contains the source code and details of the publications and use cases that use the AMIDST toolbox.

Upcoming Developments

The AMIDST toolbox is an expanding project. Upcoming developments include, for instance, the implementation of dynamic models for handling data streams, the integration of the toolbox into Big Data platforms such as Spark and Flink to increase its scalability, and a new link to R to expand the AMIDST user base.

About

Analysis of data streams using expressive and flexible Bayesian networks
