BaseX over LMDB

A long-time wish: making BaseX's on-disk storage more robust.

BaseX Core was stripped down to its essentials (not entirely; more could still be removed to make it even leaner).

LmdbData, TableLmdbAccess, and the related builder and indexes were created to work on top of LMDB through lmdbjni.

The idea is to strengthen the BaseX storage structure and later replicate it (with BookKeeper? jgroups-raft?) for high availability.

build

gradle clean install test

run

From the project base directory:

java -jar basex-lmdb.jar

simple usage

In a browser or with curl, issue an HTTP GET request to http://localhost:8080/doc('file://etc/books.xml')

Any XQuery expression can be placed after http://localhost:8080/ the same way.
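Queries containing characters that are not legal in a URL path (spaces, <, >, {, }, ", and especially ?, which would start the query string) have to be percent-encoded by the client. The following is a minimal sketch assuming the server percent-decodes the path before evaluating it, which is not confirmed here. XQueryUrl and toQueryUrl are hypothetical names. Characters such as ( ) ' : / are left as-is, matching the README examples.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds a GET URL whose path is an XQuery expression.
// Assumes the server percent-decodes the path before evaluating it.
final class XQueryUrl {
    // Left unencoded: RFC 3986 unreserved characters plus path-safe
    // characters used in the README examples, such as ( ) ' : /
    private static final String SAFE = "-._~!$'()*,;:@/";

    private XQueryUrl() {}

    static String toQueryUrl(String baseUrl, String xquery) {
        StringBuilder url = new StringBuilder(baseUrl);
        if (!baseUrl.endsWith("/")) {
            url.append('/');
        }
        // Encode byte by byte so non-ASCII characters become UTF-8 escapes.
        for (byte b : xquery.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;
            boolean alphaNum = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9');
            if (alphaNum || SAFE.indexOf(c) >= 0) {
                url.append((char) c);
            } else {
                url.append(String.format("%%%02X", c));
            }
        }
        return url.toString();
    }
}
```

For example, <lakes>{doc("etc/factbook")//lake}</lakes> becomes http://localhost:8080/%3Clakes%3E%7Bdoc(%22etc/factbook%22)//lake%7D%3C/lakes%3E, while doc('file://etc/books.xml') passes through unchanged.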

XQuery details

The query string of the URL is interpreted as external variables bound in the XQuery context, except for the following two parameters:

content-type: the content type of the result. Default is "text/xml".

indent-content: whether the result should be indented. Default is "no". Accepted values are "yes"/"no" or "true"/"false".
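Below is one way a client might assemble that query string, offered as a sketch. QueryParams and withParams are hypothetical. How the query itself declares the external variables is not covered here. The sketch also assumes the handler decodes '+' as a space in the query string, which is common servlet behaviour but not confirmed for this server.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: appends external variables plus the two reserved
// parameters (content-type, indent-content) as a query string.
final class QueryParams {
    private QueryParams() {}

    static String withParams(String queryUrl, Map<String, String> externalVars,
                             String contentType, boolean indent) {
        // The reserved names are not passed on as variables, so refuse them here.
        if (externalVars.containsKey("content-type") || externalVars.containsKey("indent-content")) {
            throw new IllegalArgumentException("reserved parameter name used as a variable");
        }
        Map<String, String> params = new LinkedHashMap<>(externalVars);
        if (contentType != null) {
            params.put("content-type", contentType); // server default: text/xml
        }
        params.put("indent-content", indent ? "yes" : "no"); // server default: no
        StringJoiner query = new StringJoiner("&", "?", "");
        for (Map.Entry<String, String> p : params.entrySet()) {
            query.add(encode(p.getKey()) + "=" + encode(p.getValue()));
        }
        return queryUrl + query;
    }

    private static String encode(String s) {
        // Form encoding: spaces become '+', '/' becomes %2F.
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }
}
```

Rejecting the reserved names up front makes the rule above explicit: a variable called content-type could never reach the query anyway.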

All items below assume you are in the project base directory:

create a collection named etc:

curl -X PUT 'http://localhost:8080/etc'

create a document named factbook inside the etc collection:

curl --upload-file ./db/xml/etc/factbook.xml 'http://localhost:8080/etc/factbook'

create a document named lakes inside the etc collection from the result of an XQuery:

curl -X PUT 'http://localhost:8080/etc/lakes/<lakes>\{doc("etc/factbook")//lake\}</lakes>'

remove the document named factbook from the collection etc:

curl -X DELETE 'http://localhost:8080/etc/factbook'

remove the etc collection:

curl -X DELETE 'http://localhost:8080/etc'
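The same collection and document lifecycle can be driven from Java. This is a sketch using the JDK 11+ java.net.http client. CollectionClient and its methods are hypothetical. Note that curl --upload-file sends an HTTP PUT, so the upload is modelled as a PUT here.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

// Hypothetical client mirroring the curl commands above.
final class CollectionClient {
    private final String base; // e.g. "http://localhost:8080"
    private final HttpClient http = HttpClient.newHttpClient();

    CollectionClient(String base) {
        this.base = base;
    }

    // curl -X PUT 'http://localhost:8080/etc'
    HttpRequest createCollection(String collection) {
        return HttpRequest.newBuilder(URI.create(base + "/" + collection))
                .PUT(HttpRequest.BodyPublishers.noBody()).build();
    }

    // curl --upload-file <file> 'http://localhost:8080/etc/factbook'
    HttpRequest putDocument(String collection, String name, HttpRequest.BodyPublisher body) {
        return HttpRequest.newBuilder(URI.create(base + "/" + collection + "/" + name))
                .PUT(body).build();
    }

    HttpRequest putFile(String collection, String name, Path file) throws FileNotFoundException {
        return putDocument(collection, name, HttpRequest.BodyPublishers.ofFile(file));
    }

    // curl -X DELETE 'http://localhost:8080/etc/factbook' (or .../etc)
    HttpRequest delete(String path) {
        return HttpRequest.newBuilder(URI.create(base + "/" + path)).DELETE().build();
    }

    // Sends a request and returns only the HTTP status code.
    int send(HttpRequest request) throws IOException, InterruptedException {
        return http.send(request, HttpResponse.BodyHandlers.discarding()).statusCode();
    }
}
```

A typical sequence would be send(createCollection("etc")) followed by send(putFile("etc", "factbook", Path.of("db/xml/etc/factbook.xml"))). Which status codes the server returns for each operation is not documented here, so check them rather than assume 200.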

some updates to the factbook document:

curl -d 'rename node doc("etc/factbook")//lake[1] as "LAKE"' -X POST http://localhost:8080

curl -d 'replace value of node doc("etc/factbook")//LAKE/@name with "Casper Sea"' -X POST http://localhost:8080

curl -d "insert node <lake name='Lago da Paz'/> into doc('etc/factbook')/mondial" -X POST http://localhost:8080
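The updates above send the XQuery Update expression as the POST body. Below is a Java sketch of the same thing. UpdateClient is hypothetical, and the Content-Type header only copies what curl -d sends by default. Whether the server actually inspects it is an assumption.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper: POSTs an XQuery Update expression, like curl -d above.
final class UpdateClient {
    private UpdateClient() {}

    static HttpRequest updateRequest(String base, String xqueryUpdate) {
        return HttpRequest.newBuilder(URI.create(base))
                // Same default content type curl -d uses; server expectations are assumed.
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(xqueryUpdate))
                .build();
    }

    // Synchronous send, so updates that depend on each other run in order.
    static String run(HttpClient client, String base, String xqueryUpdate)
            throws IOException, InterruptedException {
        HttpResponse<String> response =
                client.send(updateRequest(base, xqueryUpdate), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() >= 400) {
            throw new IOException("update failed with HTTP " + response.statusCode() + ": " + response.body());
        }
        return response.body();
    }
}
```

The synchronous send matters here: the second update targets //LAKE, which only exists after the rename has been applied.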

bigger things

If you want bigger examples, try db/xml/shakespeare.zip and db/xml/religion.zip from the base directory:

> cd db/xml
> unzip shakespeare.zip
> curl -X PUT 'http://localhost:8080/shakespeare'
> cd shakespeare
> for F in *; do N=$(echo "$F" | cut -d '.' -f 1); curl --upload-file "$F" "http://localhost:8080/shakespeare/$N" & done; wait
> cd ..
> unzip religion.zip
> curl -X PUT 'http://localhost:8080/religion'
> cd religion
> for F in *; do N=$(echo "$F" | cut -d '.' -f 1); curl --upload-file "$F" "http://localhost:8080/religion/$N" & done; wait
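The upload loop can also be written in Java, which may be handy for the larger data sets below. This sketch is not part of the project. BulkUpload is hypothetical, and it assumes file names are URL-safe (no spaces or reserved characters). Like the shell loop, it starts every upload at once and then waits for all of them.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical Java counterpart of the loop above: one PUT per file,
// started concurrently (like "&") and awaited together (like "wait").
final class BulkUpload {
    private BulkUpload() {}

    // Same rule as cut -d '.' -f 1: the part of the name before the first dot.
    static String docName(String fileName) {
        int dot = fileName.indexOf('.');
        return dot < 0 ? fileName : fileName.substring(0, dot);
    }

    static List<Integer> uploadAll(HttpClient client, String base, String collection, Path dir)
            throws IOException {
        List<Path> files;
        try (Stream<Path> listing = Files.list(dir)) {
            files = listing.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        List<CompletableFuture<Integer>> pending = new ArrayList<>();
        for (Path file : files) {
            String name = docName(file.getFileName().toString());
            HttpRequest put = HttpRequest.newBuilder(URI.create(base + "/" + collection + "/" + name))
                    .PUT(HttpRequest.BodyPublishers.ofFile(file))
                    .build();
            pending.add(client.sendAsync(put, HttpResponse.BodyHandlers.discarding())
                    .thenApply(HttpResponse::statusCode));
        }
        // join() blocks until each upload finishes; failures surface as CompletionException.
        return pending.stream().map(CompletableFuture::join).collect(Collectors.toList());
    }
}
```

Firing every request at once mirrors the shell version. For very large directories, capping the number of in-flight uploads would likely be gentler on the server.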

even bigger things

I think this is not yet the hardest test for basex-lmdb, but it is a feasible real-world example at hand: download the National Library of Medicine sample data (ftp://ftp.nlm.nih.gov/nlmdata/sample/medline/) and load it the same way as the shakespeare example above. The biggest file there is over 150 MB and has over 4.5 million XML nodes.

There is also the XMark benchmark data generator if you want to get serious.

extra documentation

As the title says, this is BaseX itself, so the BaseX documentation on XQuery and modules can be used as is (with some exceptions yet to be listed).

todo

considering Kafka for replication. Can I embed it?
optimize XQuery updates by writing to an LSM-based store before writing to LMDB, freeing the synchronous client sooner. Since the idea is to (maybe) replicate with jgroups-raft, which uses LevelDB internally, would simply writing to the cluster do the trick?
create an OS-based Maven profile for dealing with lmdbjni dependencies
port the tests and improve the LmdbDataManager tests
improve the error codes returned by the REST XQueryHandler
migrate XQueryHandler to a servlet and create a Maven WAR-packaged project
add more documentation on configuration and on running standalone or as a servlet
document the URIs used in fn:doc() (bxl://, file://, jdbc://) and their related configuration where it fits
create new URIs accessed through fn:doc(): http:// with HtmlUnit, and extras with Commons VFS
replicate with jgroups-raft. Ideas?
assuming the replication above uses Raft and the cluster is healthy, what about distributing XQuery queries among the cluster members for load balancing?
create a Camel component for basex-lmdb and use it as a solid integration database (in the end, canonical messages passing through are all XML anyway... right?).

mauricioscastro/basex-lmdb

Folders and files

Latest commit

History

Repository files navigation

BaseX over LMDB

build

run

simple usage

XQuery details

All items below assumes you are in the project basedir:

create a collection named etc:

create a document named factbook inside etc collection:

create a document named lakes inside etc collection as the result of an xquery:

remove the document named factbook from the collection etc:

remove the etc collection:

some updates to factbook document:

bigger things

even bigger things

extra documentation

todo

About

Resources

Stars

Watchers

Forks

Languages