dadoonet/dropbox

Dropbox River for Elasticsearch (PROJECT STOPPED)

Welcome to the Dropbox River Plugin for Elasticsearch

This river plugin helps you index documents from your Dropbox account.

WARNING: You need to have the Attachment Plugin installed.

Versions

Dropbox River Plugin    ElasticSearch            Attachment Plugin
master (0.2.0)          0.21.0.Beta1-SNAPSHOT    1.6.0
0.1.0                   0.20.4                   1.6.0

Build Status

Thanks to CloudBees for the build status.

Getting Started

Installation

Just type:

$ bin/plugin -install fr.pilato.elasticsearch.river/dropbox/0.1.0

This will do the job...

-> Installing fr.pilato.elasticsearch.river/dropbox/0.1.0...
Trying http://download.elasticsearch.org/fr.pilato.elasticsearch.river/dropbox/dropbox-0.1.0.zip...
Trying http://search.maven.org/remotecontent?filepath=fr/pilato/elasticsearch/river/dropbox/0.1.0/dropbox-0.1.0.zip...
Trying https://oss.sonatype.org/service/local/repositories/releases/content/fr/pilato/elasticsearch/river/dropbox/0.1.0/dropbox-0.1.0.zip...
Downloading ......DONE
Installed dropbox

Get Dropbox credentials (token and secret)

First, you need to create your own application in Dropbox Developers.

If you create a Full Dropbox application, you will have access to all folders.

If you create an App folder application, you will only have access to the files in your app folder. You will get Dropbox HTTP Error 403 : {"error": "Forbidden"} errors when accessing other folders.

Note your AppKey and your AppSecret.

You then need to get authorization from the user for this new application.

Just open the _dropbox REST Endpoint with your AppKey and AppSecret parameters: http://localhost:9200/_dropbox/oauth/AppKey/AppSecret

$ curl http://localhost:9200/_dropbox/oauth/AppKey/AppSecret

You will get back a URL:

{
  "oauth_token":"OAUTHTOKEN",
  "oauth_secret":"OAUTHSECRET",
  "url" : "https://www.dropbox.com/1/oauth/authorize?oauth_token=OAUTHTOKEN"
}

Open the URL in your browser. Dropbox will ask you to allow your application to access your Dropbox account. If you have added an oauth_callback parameter to the URL, Dropbox will redirect your user to that endpoint.

For example, https://www.dropbox.com/1/oauth/authorize?oauth_token=OAUTHTOKEN&oauth_callback=http://yourwebserver/callback will redirect your user to http://yourwebserver/callback if the user allows your application to access their Dropbox folders.
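The authorize URL with an optional callback can be assembled programmatically. The following is a minimal sketch, not part of the plugin: the function name `authorize_url` is hypothetical, and percent-encoding the callback value is an assumption made here for safety (the example above shows it unencoded).

```python
from urllib.parse import urlencode

# Authorize endpoint as shown in the reply from the _dropbox REST endpoint.
AUTHORIZE_URL = "https://www.dropbox.com/1/oauth/authorize"


def authorize_url(oauth_token, callback=None):
    """Build the URL the user must open to authorize the application.

    `callback` is optional; when given, it is sent as oauth_callback.
    """
    params = {"oauth_token": oauth_token}
    if callback is not None:
        # Assumption: the callback is percent-encoded as a query value.
        params["oauth_callback"] = callback
    return AUTHORIZE_URL + "?" + urlencode(params)


print(authorize_url("OAUTHTOKEN"))
print(authorize_url("OAUTHTOKEN", "http://yourwebserver/callback"))
```

The first call reproduces the URL returned by the endpoint; the second adds the redirect target.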

Once you get back the success reply from Dropbox, you can get the user token and secret by calling:

$ curl http://localhost:9200/_dropbox/oauth/AppKey/AppSecret/OAUTHTOKEN/OAUTHSECRET

You will get back a JSON document like the following:

{
  "token" : "yourtoken",
  "secret" : "yoursecret"
}

Use these two values when you create the river (see below).

By the way, you can use the SettingUpDropboxTestsCases test class to get a token and a secret for your user.
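The two REST calls of this flow can be sketched as small helpers. This is only an illustration: the function names are hypothetical, the base URL assumes a local node on port 9200 as in the curl examples, and nothing is actually sent over the network here.

```python
import json

BASE = "http://localhost:9200"  # assumption: local node, as in the curl examples


def request_token_url(app_key, app_secret, base=BASE):
    """Step 1: endpoint that returns oauth_token, oauth_secret and the authorize URL."""
    return f"{base}/_dropbox/oauth/{app_key}/{app_secret}"


def user_token_url(app_key, app_secret, oauth_token, oauth_secret, base=BASE):
    """Step 2: endpoint to call once the user has allowed the application."""
    return f"{base}/_dropbox/oauth/{app_key}/{app_secret}/{oauth_token}/{oauth_secret}"


def parse_step1_reply(body):
    """Extract the values needed for the browser step and for step 2."""
    reply = json.loads(body)
    return reply["oauth_token"], reply["oauth_secret"], reply["url"]


# Sample reply copied from the step 1 example above.
sample = """{
  "oauth_token":"OAUTHTOKEN",
  "oauth_secret":"OAUTHSECRET",
  "url" : "https://www.dropbox.com/1/oauth/authorize?oauth_token=OAUTHTOKEN"
}"""
token, secret, url = parse_step1_reply(sample)
print(url)
print(user_token_url("AppKey", "AppSecret", token, secret))
```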

Creating a Dropbox river

First, we create an index to store our documents (optional):

$ curl -XPUT 'localhost:9200/mydocs/' -d '{}'

We create the river with the following properties:

  • AppKey: AAAAAAAAAAAAAAAA
  • AppSecret: BBBBBBBBBBBBBBBB
  • Token: XXXXXXXXXXXXXXXX
  • Secret: YYYYYYYYYYYYYYYY
  • Dropbox directory URL : /tmp
  • Update Rate : every 15 minutes (15 * 60 * 1000 = 900000 ms)
  • Get only docs like *.doc and *.pdf
  • Don't index resume*

$ curl -XPUT 'localhost:9200/_river/mydocs/_meta' -d '{
  "type": "dropbox",
  "dropbox": {
    "appkey": "AAAAAAAAAAAAAAAA",
    "appsecret": "BBBBBBBBBBBBBBBB",
    "token": "XXXXXXXXXXXXXXXX",
    "secret": "YYYYYYYYYYYYYYYY",
    "name": "My tmp dropbox dir",
    "url": "/tmp",
    "update_rate": 900000,
    "includes": "*.doc,*.pdf",
    "excludes": "resume*"
  }
}'
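The river definition can also be generated rather than written by hand, which avoids arithmetic slips in update_rate. A minimal sketch follows; the helper names `minutes_to_ms` and `river_meta` are hypothetical, and the setting keys are the ones used in the curl example above.

```python
import json


def minutes_to_ms(minutes):
    """Convert an update interval in minutes to milliseconds."""
    return minutes * 60 * 1000


def river_meta(appkey, appsecret, token, secret, name, url,
               every_minutes, includes=None, excludes=None):
    """Build the body to PUT to /_river/<rivername>/_meta."""
    dropbox = {
        "appkey": appkey,
        "appsecret": appsecret,
        "token": token,
        "secret": secret,
        "name": name,
        "url": url,
        "update_rate": minutes_to_ms(every_minutes),
    }
    # Only emit the filters that were actually requested.
    if includes is not None:
        dropbox["includes"] = includes
    if excludes is not None:
        dropbox["excludes"] = excludes
    return {"type": "dropbox", "dropbox": dropbox}


meta = river_meta(
    "AAAAAAAAAAAAAAAA", "BBBBBBBBBBBBBBBB",
    "XXXXXXXXXXXXXXXX", "YYYYYYYYYYYYYYYY",
    name="My tmp dropbox dir", url="/tmp", every_minutes=15,
    includes="*.doc,*.pdf", excludes="resume*",
)
print(json.dumps(meta, indent=2))
```

The printed JSON can be passed to curl with -d in place of the inline document.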

Adding another Dropbox river

We add another river with the following properties:

  • AppKey: AAAAAAAAAAAAAAAA
  • AppSecret: BBBBBBBBBBBBBBBB
  • Token: 2XXXXXXXXXXXXXXX
  • Secret: 2YYYYYYYYYYYYYYY
  • Dropbox directory URL : /tmp2
  • Update Rate : every hour (60 * 60 * 1000 = 3600000 ms)
  • Get only docs like *.doc, *.xls and *.pdf

We also configure this river to index into the same index and type as the previous one:

  • index: mydocs
  • type: doc

$ curl -XPUT 'localhost:9200/_river/mynewriver/_meta' -d '{
  "type": "dropbox",
  "dropbox": {
    "appkey": "AAAAAAAAAAAAAAAA",
    "appsecret": "BBBBBBBBBBBBBBBB",
    "token": "2XXXXXXXXXXXXXXX",
    "secret": "2YYYYYYYYYYYYYYY",
    "name": "My tmp2 dropbox dir",
    "url": "/tmp2",
    "update_rate": 3600000,
    "includes": [ "*.doc" , "*.xls", "*.pdf" ]
  },
  "index": {
    "index": "mydocs",
    "type": "doc",
    "bulk_size": 50
  }
}'

Note that you can index for another Dropbox application (appkey and appsecret may differ from those of the previous river).

Note that you can use the same credentials (appkey, appsecret, token, secret) as the previous river if you only want to index another directory for the same user.
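The includes and excludes settings used in the two rivers are wildcard patterns, given either as a comma-separated string or as a list. How the plugin evaluates them is not described here, so the sketch below is only an approximation under two stated assumptions: patterns behave like shell-style wildcards, and an exclude match wins over an include match. The function names are hypothetical.

```python
from fnmatch import fnmatchcase


def as_patterns(value):
    """Accept both forms shown above: "*.doc,*.pdf" or ["*.doc", "*.pdf"]."""
    if value is None:
        return []
    if isinstance(value, str):
        return [p.strip() for p in value.split(",") if p.strip()]
    return list(value)


def is_indexable(filename, includes=None, excludes=None):
    """Assumed rule: excludes take precedence; empty includes means everything."""
    if any(fnmatchcase(filename, p) for p in as_patterns(excludes)):
        return False
    inc = as_patterns(includes)
    if not inc:
        return True
    return any(fnmatchcase(filename, p) for p in inc)


for name in ["report.pdf", "resume.doc", "notes.txt"]:
    print(name, is_indexable(name, "*.doc,*.pdf", "resume*"))
```

Under these assumptions, report.pdf is kept, resume.doc is dropped by the exclude pattern, and notes.txt is dropped because no include pattern matches it.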

Searching for docs

This is a common use case in Elasticsearch: we want to search for something ;-)

$ curl -XGET http://localhost:9200/mydocs/doc/_search -d '{
  "query" : {
    "match" : {
        "_all" : "I am searching for something !"
    }
  }
}'

Advanced

Autogenerated mapping

When the Dropbox river detects a new type, it automatically creates a mapping for this type:

{
  "doc" : {
    "properties" : {
      "file" : {
        "type" : "attachment",
        "path" : "full",
        "fields" : {
          "file" : {
            "type" : "string",
            "store" : "yes",
            "term_vector" : "with_positions_offsets"
          },
          "author" : {
            "type" : "string"
          },
          "title" : {
            "type" : "string",
            "store" : "yes"
          },
          "name" : {
            "type" : "string"
          },
          "date" : {
            "type" : "date",
            "format" : "dateOptionalTime"
          },
          "keywords" : {
            "type" : "string"
          },
          "content_type" : {
            "type" : "string"
          }
        }
      },
      "name" : {
        "type" : "string",
        "analyzer" : "keyword"
      },
      "pathEncoded" : {
        "type" : "string",
        "analyzer" : "keyword"
      },
      "postDate" : {
        "type" : "date",
        "format" : "dateOptionalTime"
      },
      "rootpath" : {
        "type" : "string",
        "analyzer" : "keyword"
      },
      "virtualpath" : {
        "type" : "string",
        "analyzer" : "keyword"
      }
    }
  }
}

Creating your own mapping (analyzers)

If you want to define your own mapping, for example to set analyzers, you can push the mapping before starting the Dropbox river.

{
  "doc" : {
    "properties" : {
      "file" : {
        "type" : "attachment",
        "path" : "full",
        "fields" : {
          "file" : {
            "type" : "string",
            "store" : "yes",
            "term_vector" : "with_positions_offsets",
            "analyzer" : "french"
          },
          "author" : {
            "type" : "string"
          },
          "title" : {
            "type" : "string",
            "store" : "yes"
          },
          "name" : {
            "type" : "string"
          },
          "date" : {
            "type" : "date",
            "format" : "dateOptionalTime"
          },
          "keywords" : {
            "type" : "string"
          },
          "content_type" : {
            "type" : "string"
          }
        }
      },
      "name" : {
        "type" : "string",
        "analyzer" : "keyword"
      },
      "pathEncoded" : {
        "type" : "string",
        "analyzer" : "keyword"
      },
      "postDate" : {
        "type" : "date",
        "format" : "dateOptionalTime"
      },
      "rootpath" : {
        "type" : "string",
        "analyzer" : "keyword"
      },
      "virtualpath" : {
        "type" : "string",
        "analyzer" : "keyword"
      }
    }
  }
}

To send the mapping to Elasticsearch, refer to the Put Mapping API.
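As an illustration, a Put Mapping request for this mapping could be prepared as follows. This is a sketch under assumptions: the endpoint form /{index}/{type}/_mapping is the one used by Elasticsearch releases of this era, the index mydocs and type doc are taken from the river examples, and the mapping is trimmed to the analyzed file field for brevity (a real request would carry all the fields shown above). The helper name is hypothetical and the request is built but not sent.

```python
import json
from urllib.request import Request


def put_mapping_request(index, doc_type, mapping, base="http://localhost:9200"):
    """Prepare (but do not send) a PUT request carrying the mapping."""
    body = json.dumps(mapping).encode("utf-8")
    return Request(f"{base}/{index}/{doc_type}/_mapping", data=body, method="PUT")


# Trimmed mapping: only the attachment field with the french analyzer.
mapping = {
    "doc": {
        "properties": {
            "file": {
                "type": "attachment",
                "path": "full",
                "fields": {
                    "file": {
                        "type": "string",
                        "store": "yes",
                        "term_vector": "with_positions_offsets",
                        "analyzer": "french",
                    }
                },
            }
        }
    }
}

request = put_mapping_request("mydocs", "doc", mapping)
print(request.get_method(), request.full_url)
# To send it for real: urllib.request.urlopen(request)
```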

Meta fields

Dropbox River creates some meta fields:

Field         Description                                   Example
name          Original file name                            mydocument.pdf
pathEncoded   BASE64 encoded file path (for internal use)   112aed83738239dbfe4485f024cd4ce1
postDate      Indexing date                                 1312893360000
rootpath      BASE64 encoded root path (for internal use)   112aed83738239dbfe4485f024cd4ce1
virtualpath   Relative path                                 mydir/otherdir
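The postDate example looks like a Unix timestamp in milliseconds. Assuming that interpretation (it is not stated explicitly here), it can be turned into a readable date as in this small sketch; the function name is hypothetical.

```python
from datetime import datetime, timezone


def post_date_to_datetime(millis):
    """Interpret postDate as milliseconds since the Unix epoch (assumption)."""
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)


print(post_date_to_datetime(1312893360000).isoformat())
# prints 2011-08-09T12:36:00+00:00
```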

Advanced search

You can also search on meta fields.

$ curl -XGET http://localhost:9200/mydocs/doc/_search -d '{
  "query" : {
    "term" : {
        "name" : "mydocument.pdf"
    }
  }
}'
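The two kinds of search shown in this document differ in how the input is treated: a match query analyzes the text before searching, while a term query looks for the exact term. Because the name field is mapped with the keyword analyzer, the whole file name is indexed as a single term, which is why a term query on the full name can find it. The request bodies can be built as in this sketch; the helper names are hypothetical.

```python
import json


def match_all_fields(text):
    """Full-text search over the _all field (the text is analyzed)."""
    return {"query": {"match": {"_all": text}}}


def term_on_name(filename):
    """Exact lookup on the name meta field (the value is not analyzed)."""
    return {"query": {"term": {"name": filename}}}


print(json.dumps(match_all_fields("I am searching for something !")))
print(json.dumps(term_on_name("mydocument.pdf")))
```

Either printed body can be passed to curl with -d against the _search endpoint.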

Behind the scenes

How does it work?

TO BE COMPLETED

License

This software is licensed under the Apache 2 license, quoted below.

Copyright 2011-2013 David Pilato

Licensed under the Apache License, Version 2.0 (the "License"); you may not
use this file except in compliance with the License. You may obtain a copy of
the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
License for the specific language governing permissions and limitations under
the License.
