dat

Dat is a version-controlled, decentralized data sync tool designed to improve collaboration between data people and data systems. (GitHub)

YOUTUBE AKpJgNoT1b8 Dat + Federated Wiki

The dat module is designed with a small-core philosophy. It defines an API for reading, writing and syncing datasets and is implemented using Node.js.
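To make "reading, writing and syncing datasets" concrete, here is a toy sketch of what a small core with only those three operations might look like. It is not dat's API: the class `ToyDataset`, its method names, the change log, and the use of a SHA-1 digest as a version identifier are all assumptions for illustration. Sync here is a simple fast-forward pull that copies the changes one replica is missing.

```python
import hashlib
import json


class ToyDataset:
    """Hypothetical small-core dataset: read, write, sync. Not dat's real API."""

    def __init__(self):
        self.log = []    # ordered list of (key, value) changes
        self.rows = {}   # current view: key -> latest value

    def write(self, key, value):
        # Every write is recorded in the log and applied to the current view.
        self.log.append((key, value))
        self.rows[key] = value

    def read(self, key):
        return self.rows.get(key)

    def version(self):
        # Hash the whole change log, so replicas with identical
        # histories report the same version string.
        digest = hashlib.sha1()
        for key, value in self.log:
            digest.update(json.dumps([key, value]).encode())
        return digest.hexdigest()

    def sync_from(self, other):
        # Assumes our log is a prefix of the other's (a fast-forward pull);
        # merging diverged histories is out of scope for this sketch.
        if other.log[: len(self.log)] != self.log:
            raise ValueError("histories diverged; cannot fast-forward")
        for key, value in other.log[len(self.log):]:
            self.write(key, value)
```

Usage: after `b.sync_from(a)`, `b` holds the same rows as `a` and `a.version() == b.version()`. Keeping the core this small leaves storage, transport and conflict handling to pluggable modules, which is the spirit of the small-core design described above.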

# Future

This release is the first step towards our goal of creating a streaming interface between every database or file storage backend in the world. We are trying to solve hard problems the right way, and that takes a lot of time. (GitHub)

In the future we would also like to work on a way to easily host and share datasets online. We envision a sort of data package registry, similar to npmjs.org, but designed with datasets in mind. This kind of project could also eventually turn into a sort of "GitHub for data".

We also want to hook dat up to P2P networks, so that we can make downloads faster but also so that datasets become more permanent. Dat advisor Juan Benet is now working on IPFS, which we are excited to hook up to dat when it is ready.

Certain datasets are simply too large to share, so we also expect to work on a distributed computation layer on top of dat in the future (similar to the ESGF project).

# Installation

Internally, dat has two kinds of data storage: tabular and blob. The default tabular store is LevelDB, and the default blob store is a content-addressable blob store, which stores each file under a key derived from its contents. Both default backends can be swapped out for others.
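A content-addressable blob store with a swappable backend can be sketched as follows. This is a minimal illustration, not dat's code: the names `BlobBackend`, `MemoryBackend` and `ContentAddressableStore`, the in-memory backend, and the choice of SHA-256 as the addressing hash are all assumptions.

```python
import hashlib
from abc import ABC, abstractmethod


class BlobBackend(ABC):
    """Hypothetical pluggable interface: anything that can put/get bytes by key."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...


class MemoryBackend(BlobBackend):
    """Simplest possible backend; a filesystem or cloud backend could replace it."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]


class ContentAddressableStore:
    """Stores each blob under the SHA-256 hex digest of its contents."""

    def __init__(self, backend: BlobBackend) -> None:
        self.backend = backend

    def write(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        # Identical content always maps to the same key, so it is stored once.
        self.backend.put(key, data)
        return key

    def read(self, key: str) -> bytes:
        data = self.backend.get(key)
        # Re-hashing on read detects blobs corrupted by the backend.
        if hashlib.sha256(data).hexdigest() != key:
            raise ValueError("blob contents do not match their address")
        return data
```

Because the store only talks to the `BlobBackend` interface, swapping `MemoryBackend` for another implementation changes where the bytes live but not how they are addressed or verified.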

Install the command-line tool globally with npm:

```
$ npm install dat -g
```

Create a new dat in a directory:

```
$ cd /test
$ dat init
Initialized a new dat at /test/.dat
```

Show the current status, including key (row) count, file count, and last-updated time:

```
$ dat status
Current version is now 8eaf3b0739d32849687a544efae8487b5b05df52 (latest)
2 datasets, 438 keys, 32 files, 3 versions, 143 Mb total
Last updated 3 seconds ago (Tue Jun 02 2015 13:46:54 GMT-0700 (PDT))
```