
This is a modified version of Apache Cassandra, based on the early release 0.6.8 (2011). It was an attempt to make Cassandra more flexible and more available.
Please forgive me for renaming it "dastor" and renaming the Java packages to "com.bigdata.dastor...". No disrespect is intended to its parent project, Cassandra.

In the end, however, we concluded that the architecture and design principles of Cassandra/Dynamo are wrong for big-data storage, and we gave the project up. See the Chinese-language article 深入研究Cassandra后重读Dynamo Paper ("Rereading the Dynamo Paper after Studying Cassandra in Depth"). We now prefer Google Bigtable and Apache HBase.

Please refer to the following material for details:
DaStor/Cassandra Evaluation Report for CDR Storage & Query
Cassandra Compression

Developed Features

Admin Tools

Configuration improvement based on config-files
Script framework and scripts
Admin tools
CLI shell
WebAdmin
Ganglia, Jmxetric

Compression

New serialization format.
Support Gzip and LZO.
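
As a rough illustration of the Gzip side of this feature, the sketch below compresses and restores one serialized block with the JDK's `java.util.zip` classes. It is not DaStor's actual serialization format: the class name `GzipBlockCodec` and the idea of compressing one whole block at a time are assumptions made for the example. LZO is not shown because the JDK does not ship an LZO codec.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of DaStor: gzip round trip for one in-memory block.
final class GzipBlockCodec {
    private GzipBlockCodec() {}

    /** Compresses a serialized block with gzip. */
    static byte[] compress(byte[] raw) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The stream is closed here, so the gzip trailer has been written.
        return buffer.toByteArray();
    }

    /** Restores the original bytes of a block produced by compress(). */
    static byte[] decompress(byte[] compressed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = gzip.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

Calling `GzipBlockCodec.decompress(GzipBlockCodec.compress(block))` should return the original bytes; how much space is saved depends on how repetitive the block is.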

Bucket mapping and reclaim

Mapping plug-in
Reclaim command and mechanism.

Java Client API

Easy and Simple

Concurrent Compaction

From a single compaction thread to multiple threads, with each bucket compacted independently.
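
The general idea can be sketched with a fixed thread pool that runs one compaction task per bucket. This is only an illustration under assumptions: `BucketCompactionSketch`, `compactBucket`, and `compactAll` are hypothetical names, and the body of `compactBucket` is a placeholder rather than DaStor's compaction code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: names are illustrative, not DaStor's API.
final class BucketCompactionSketch {
    private BucketCompactionSketch() {}

    /** Placeholder for compacting one bucket; returns the bucket id when done. */
    static int compactBucket(int bucketId) {
        // A real implementation would merge this bucket's on-disk files here.
        return bucketId;
    }

    /**
     * Submits one task per bucket to a fixed-size pool, so buckets are
     * compacted independently. Returns the ids in submission order.
     */
    static List<Integer> compactAll(List<Integer> bucketIds, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> pending = new ArrayList<>();
            for (final int id : bucketIds) {
                Callable<Integer> task = () -> compactBucket(id);
                pending.add(pool.submit(task));
            }
            List<Integer> done = new ArrayList<>();
            for (Future<Integer> result : pending) {
                done.add(result.get()); // wait for this bucket to finish
            }
            return done;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Because no task touches another bucket's files, the tasks need no locking among themselves in this sketch; the pool size bounds how many buckets are compacted at once.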

Scalability

Easy to scale-out
More controllable

Benchmarks

Writes and Reads
Throughput and Latency

Bug fix

...

DaStor/Cassandra vs. Bigtable

Scalability: Bigtable scales better.
Scaling a DaStor/Cassandra cluster has to be controlled carefully and may affect running services, which is a serious operational burden.
Scaling Bigtable is easy.

Data Distribution: Bigtable's high-level partitioning/indexing scheme is more fine-grained, and therefore more effective.
DaStor/Cassandra's consistent hash partitioning scheme is too coarse-grained, so we must cut the data up into bucket-level partitions. With big data, however, the right trade-off is not always easy to find.
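
To make the granularity point concrete, here is a minimal consistent-hash ring in Java. It is a generic sketch of the technique, not DaStor or Cassandra code: `HashRingSketch` is a hypothetical name, and small integer tokens are passed in directly instead of being derived from a real hash function.

```java
import java.util.TreeMap;

// Hypothetical sketch of a consistent-hash ring: each node owns the arc of
// hash values that ends at its token.
final class HashRingSketch {
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    /** Places a node on the ring at the given token. */
    void addNode(int token, String node) {
        ring.put(token, node);
    }

    /**
     * Returns the node with the smallest token >= keyHash, wrapping around
     * to the lowest token when keyHash is past the highest one.
     */
    String ownerOf(int keyHash) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("ring has no nodes");
        }
        Integer token = ring.ceilingKey(keyHash);
        return ring.get(token != null ? token : ring.firstKey());
    }
}
```

With nodes at tokens 10, 50 and 90, every key hash in 51..90 belongs to the node at 90. Adding a node at token 70 moves the hashes 51..70 to it and leaves the other arcs unchanged. The unit of placement is a whole arc of the ring, which is the coarse granularity discussed above.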

Indexing: Bigtable may need less memory to hold indexes.
Bigtable's indexes are more general, and their cost is amortized evenly across different users/rows, especially when the data is skewed.
Bigtable keeps only one copy of the indexes even when storage is replicated, because it relies on the GFS layer for replication (multiple copies of data, one copy of indexes).

Local Storage Engine: Bigtable provides better read performance, with fewer disk seeks.
Bigtable vs. Cassandra is roughly like InnoDB vs. MyISAM.

In my opinion, Bigtable's architecture and data model make more sense.
The Cassandra project may be a mistake for big data, and mixing Dynamo with Bigtable may be an even bigger one. Cassandra is only a partial Dynamo, and it targets the wrong field: big-data storage. The result is a distorted design.
