Social Networks Crawler

This application collects information about the social graph of users of several social networks: Twitter, Instagram, Facebook, and Foursquare.

Build

The project is written in Java 8 and built with Apache Maven; both must be installed on your computer. Run mvn clean package in the project directory to assemble the artifacts. The executable jars appear in the target directory of each module.

Configuration

The crawler reads its settings from the crawler.properties file in the working directory. The file should have the following format:

# Database
mongo.host=
mongo.port=
mongo.database=
mongo.username=
mongo.password=

# Twitter
twitter.auth_token=

# Foursquare
foursquare.client_id=
foursquare.client_secret=

# Instagram
instagram.access_token=

# Facebook
facebook.login=
facebook.password=

The Foursquare and Instagram crawlers use the networks' APIs, so your application must be registered with those services. The Twitter crawler reuses the request that a browser sends when it loads a subscriptions list; you can find the auth_token value for the config in your browser's cookies. The Facebook crawler acts like a browser, so you must specify your login and password in the config.
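Before starting a crawler it can be useful to check that the keys it needs are actually filled in. The following is a minimal sketch of such a check, not code from this project: the CrawlerConfig class and its methods are hypothetical, and which keys are mandatory for which crawler is an assumption left to the caller.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical helper (not part of the project) for reading crawler.properties. */
class CrawlerConfig {

    /** Parses properties-format text, e.g. the contents of crawler.properties. */
    static Properties parse(String text) {
        return read(new StringReader(text));
    }

    /** Reads crawler.properties from the working directory (UTF-8 is an assumption). */
    static Properties loadFromWorkingDirectory() {
        try (Reader reader = Files.newBufferedReader(
                Paths.get("crawler.properties"), StandardCharsets.UTF_8)) {
            return read(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the keys from {@code required} that are absent or have a blank value. */
    static List<String> missingKeys(Properties props, List<String> required) {
        List<String> missing = new ArrayList<>();
        for (String key : required) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }

    private static Properties read(Reader reader) {
        Properties props = new Properties();
        try {
            props.load(reader); // lines starting with '#' are treated as comments
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }
}
```

For example, a Twitter run could call CrawlerConfig.missingKeys(CrawlerConfig.loadFromWorkingDirectory(), Arrays.asList("mongo.host", "twitter.auth_token")) and stop early if the returned list is not empty.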

The list of users to visit is read from an input file given as a command-line argument. If no file is given, standard input is used. Users are listed as ids, one per line. The Twitter and Facebook crawlers also require an id-to-user-name mapping, so for them each line must contain the id and the user name separated by a comma. Pass the -names option to indicate that the input file is in this format.

Run command example:

java -jar twitter.jar -names userList.csv
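To illustrate the id and user name format expected with -names, here is a minimal sketch of a parser for such input. It is not the project's own parsing code: the UserListParser class is hypothetical, ids are kept as strings, and skipping blank lines and trimming whitespace are assumptions.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not part of the project) for reading an "id,name" user list. */
class UserListParser {

    /**
     * Reads lines of the form "id,name" and returns an id -> name map
     * that preserves the order of the input. Blank lines are skipped.
     */
    static Map<String, String> parseNames(BufferedReader reader) {
        Map<String, String> names = new LinkedHashMap<>();
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty()) {
                    continue;
                }
                int comma = line.indexOf(',');
                if (comma < 0) {
                    throw new IllegalArgumentException("Expected id,name but got: " + line);
                }
                String id = line.substring(0, comma).trim();
                String name = line.substring(comma + 1).trim();
                names.put(id, name);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }
}
```

The reader can wrap either a file or standard input, for example new BufferedReader(new InputStreamReader(System.in)), which mirrors the fallback described above.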

Making your own crawler

The core module is designed to let you create your own modules that collect information from your favorite social network. To do that, you only need to implement the FriendsService interface and mark your class with the @Target annotation. The name of the result collection in MongoDB is specified as the argument of this annotation. If your FriendsService needs the id-to-name mapping, it should have a constructor that takes a NamesService; otherwise it should have a no-argument constructor.

To start crawling, run the main method of the ru.ifmo.ctd.mekhanikov.crawler.Runner class. It will find your FriendsService and start the process.
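The shape of a custom crawler described above can be sketched as follows. This README does not show the real signatures of FriendsService, NamesService, or @Target, so the three declarations at the top are stand-ins written only to make the example compile: the method names getFriends and getName, the use of string ids, and the annotation's value() element are all assumptions. Check the core module for the actual definitions.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.util.Collections;
import java.util.List;

// --- Stand-ins for the core module's types (ASSUMED shapes, not the real API) ---

/** Assumed: carries the name of the MongoDB result collection. */
@Retention(RetentionPolicy.RUNTIME)
@interface Target {
    String value();
}

/** Assumed: resolves a user id to a user name. */
interface NamesService {
    String getName(String id);
}

/** Assumed: returns the ids of a user's friends. */
interface FriendsService {
    List<String> getFriends(String userId);
}

// --- A hypothetical crawler for an imaginary network ---

@Target("example_friends") // results would go to the "example_friends" collection
class ExampleFriendsService implements FriendsService {
    private final NamesService names;

    // Takes a NamesService because this crawler needs the id-to-name mapping;
    // a crawler that does not need it would declare a no-argument constructor.
    ExampleFriendsService(NamesService names) {
        this.names = names;
    }

    @Override
    public List<String> getFriends(String userId) {
        String userName = names.getName(userId);
        if (userName == null) {
            return Collections.<String>emptyList();
        }
        // Placeholder: a real crawler would query the social network here.
        return Collections.singletonList("friend-of-" + userName);
    }
}
```

In a real module you would drop the stand-ins, import the types from the core module, and replace the placeholder body with requests to the network you are crawling.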
