- Failed insertion SQL statements are stored in the file file\txt\failed-insertion-sql.txt.
- Weibos whose comments could not be retrieved are stored in the file \file\txt\weiboid_2000_error.txt.
- Weibos with more than 2000 comments are listed in the file \file\txt\weibo-ids-with-more-than-2000-comments.txt.
- Weibo API call details are stored in the file weibo.log. This file is very detailed!

A summary log file is stored in file\txt\summary.txt.
Before you start, you may want to run examples\weibo4j\wang\ResetLogFiles.java to clear these log files:
- failed-insertion-sql.txt
- summary.txt
- weiboid_2000_error.txt
- weibo-ids-with-more-than-2000-comments.txt
- ./weibo.log
- ./weibo.log.1
Note: make sure the two weibo.log files (weibo.log and weibo.log.1) are empty before a fresh run.
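What a reset amounts to can be sketched as follows. This is not the code of ResetLogFiles.java, which is not reproduced here. The class name LogResetSketch and the helper truncate are hypothetical, and the sketch only assumes that "clearing" a log file means emptying it.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: empties the log files passed on the command line.
// It is NOT the actual ResetLogFiles.java.
class LogResetSketch {

    // Truncates the file to zero bytes, creating it (and its folder) if missing.
    static void truncate(Path file) {
        try {
            Path parent = file.getParent();
            if (parent != null && !Files.exists(parent)) {
                Files.createDirectories(parent);
            }
            Files.write(file, new byte[0],
                    StandardOpenOption.CREATE,
                    StandardOpenOption.TRUNCATE_EXISTING,
                    StandardOpenOption.WRITE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Each argument is a path relative to the project root,
        // e.g. file/txt/summary.txt or weibo.log.
        for (String arg : args) {
            truncate(Paths.get(arg));
            System.out.println("Cleared " + arg);
        }
    }
}
```

Run without arguments, the sketch touches nothing, so it is safe to try before pointing it at the real log files.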
The following settings are defined at the top of the file \examples\weibo4j\wang\CommentExtractor.java:
public static int PAGE_SIZE = 200; // Number of comments to be retrieved every page
public static int START_INDEX = 0; // If the last run stopped at 100, set this to 100 for the next run.
public static int PAUSE_SECONDS_COMMENTS = 45; // Time in seconds to pause between retrieving comments of two consecutive weibos
public static int PAUSE_SECONDS_PAGING = 10; // Time in seconds to pause between retrieving two pages of comments of the same weibo.
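To get a feel for how these settings interact, the following sketch estimates the number of pages and the total pause time for a run. It assumes that one pause of PAUSE_SECONDS_PAGING happens between consecutive pages of the same weibo and one pause of PAUSE_SECONDS_COMMENTS between consecutive weibos. The class PagingEstimateSketch and its methods are hypothetical and are not part of CommentExtractor.java.

```java
// Hypothetical sketch: estimates pages and pause time from the settings above.
class PagingEstimateSketch {
    static final int PAGE_SIZE = 200;
    static final int PAUSE_SECONDS_COMMENTS = 45;
    static final int PAUSE_SECONDS_PAGING = 10;

    // Number of pages needed for the given comment count (ceiling division).
    static int pagesFor(int commentCount) {
        if (commentCount <= 0) {
            return 0;
        }
        return (commentCount + PAGE_SIZE - 1) / PAGE_SIZE;
    }

    // Total seconds spent pausing for a run over the given weibos,
    // where commentCounts[i] is the number of comments of the i-th weibo.
    static long pauseSeconds(int[] commentCounts) {
        long total = 0;
        for (int count : commentCounts) {
            int pages = pagesFor(count);
            // Pauses happen between pages, so one fewer than the page count.
            total += (long) Math.max(pages - 1, 0) * PAUSE_SECONDS_PAGING;
        }
        // Pauses happen between weibos, so one fewer than the weibo count.
        total += (long) Math.max(commentCounts.length - 1, 0) * PAUSE_SECONDS_COMMENTS;
        return total;
    }

    public static void main(String[] args) {
        int[] counts = {450, 200};
        System.out.println("Pause seconds: " + pauseSeconds(counts));
    }
}
```

Under these assumptions, a weibo with 2000 comments needs 10 pages, so the paging pauses alone add 90 seconds for that weibo.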
In the file src/config.properties, fill in the common Weibo configurations:
baseURL=https://api.weibo.com/2/
accessTokenURL=https://api.weibo.com/oauth2/access_token
authorizeURL=https://api.weibo.com/oauth2/authorize
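A quick way to check that the three entries are present is sketched below using the standard java.util.Properties class. The class ConfigSketch and the method missingKeys are hypothetical; how the project itself reads config.properties is not shown here.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical sketch: loads a properties file and reports missing entries.
class ConfigSketch {
    // The three keys listed above.
    static final String[] REQUIRED_KEYS = {"baseURL", "accessTokenURL", "authorizeURL"};

    // Parses key=value lines from the given reader.
    static Properties load(Reader reader) {
        Properties props = new Properties();
        try {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Returns the required keys that are absent or blank.
    static List<String> missingKeys(Properties props) {
        List<String> missing = new ArrayList<>();
        for (String key : REQUIRED_KEYS) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }
}
```

For the real file, pass a reader opened on src/config.properties, for example one obtained from java.nio.file.Files.newBufferedReader.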
Whenever an "out of rate limit" error occurs, the program automatically switches to another access token. So please provide a list of App Key / App Secret / Redirect-URI values in the file \file\txt\weibo-auth-info.txt. The file should look like this:
2208476072 9bd6e9b16fee38f9065a2ad9c4d79727 http://www.baidu.com
3722953941 47cf202ca308ad73869992e1b5230b60 http://www.baidu.com
Each line describes one Weibo app. The three columns are App Key, App Secret, and Redirect-URI, in that order. NOTE: the columns are separated by a SINGLE SPACE.
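The single-space rule matters because a line is most naturally split on exactly one space. The sketch below parses lines in this format and rejects malformed ones. It also shows one possible way to move to the next app after a rate-limit error, a simple round-robin, which is an assumption and not necessarily how the program switches. The names AuthInfoSketch, App, parse, and nextIndex are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: parses weibo-auth-info.txt style lines.
class AuthInfoSketch {

    // One Weibo app: App Key, App Secret, Redirect-URI.
    static final class App {
        final String key;
        final String secret;
        final String redirectUri;

        App(String key, String secret, String redirectUri) {
            this.key = key;
            this.secret = secret;
            this.redirectUri = redirectUri;
        }
    }

    // Splits each non-empty line on a single space; any other shape is an error.
    static List<App> parse(List<String> lines) {
        List<App> apps = new ArrayList<>();
        for (String line : lines) {
            if (line.isEmpty()) {
                continue;
            }
            String[] parts = line.split(" ");
            if (parts.length != 3 || parts[0].isEmpty()) {
                throw new IllegalArgumentException("Expected 3 space-separated columns: " + line);
            }
            apps.add(new App(parts[0], parts[1], parts[2]));
        }
        return apps;
    }

    // Assumed rotation policy: advance to the next app, wrapping around at the end.
    static int nextIndex(int current, int size) {
        return (current + 1) % size;
    }
}
```

With this parsing, a line that contains two spaces between columns produces an empty extra column and is rejected instead of being silently misread.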
- Open the project weibo-api in Eclipse.
- Right-click the file examples\weibo4j\examples\oauth2\OAuth4Code.java and choose Run as Java Application...
- On the Console panel, the following lines will be displayed:
  https://api.weibo.com/oauth2/authorize?client_id=3722953941&redirect_uri=http://www.baidu.com&response_type=code
  Hit enter when it's done.[Enter]:
- Meanwhile, your browser will open a page automatically. Enter your Weibo credentials, grant the permission, and submit.
- The page will be redirected to http://www.baidu.com/?code=873652da20697fdc137b6585e24cae05. Copy the string after code=, i.e., 873652da20697fdc137b6585e24cae05, paste it into the Eclipse console, and press Enter.
- An access token will be issued, like:
  AccessToken [accessToken=2.00oCIiNDN5IxDE78814a2afeBDSCHC, expireIn=157679999, refreshToken=,uid=2951008222]
- If you provide multiple lines in the file file\txt\weibo-auth-info.txt, the process above will repeat for each line.
The access tokens will be written to the file file\txt\access-code.txt automatically.
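The two string manipulations in this flow, building the authorization URL and pulling the code out of the redirected URL, can be sketched as follows. This is an illustration only and not the code of OAuth4Code.java. The class OAuthCodeSketch and its methods are hypothetical. The URL is assembled without URL-encoding the redirect URI simply to mirror the console output shown above.

```java
import java.net.URI;

// Hypothetical sketch of the URL handling in the authorization-code flow.
class OAuthCodeSketch {

    // Builds the URL the user opens in the browser.
    // Mirrors the console output above; the redirect URI is not URL-encoded here.
    static String authorizeUrl(String authorizeURL, String clientId, String redirectUri) {
        return authorizeURL
                + "?client_id=" + clientId
                + "&redirect_uri=" + redirectUri
                + "&response_type=code";
    }

    // Returns the value of the "code" query parameter of the redirected URL.
    static String extractCode(String redirectedUrl) {
        String query = URI.create(redirectedUrl).getQuery();
        if (query != null) {
            for (String pair : query.split("&")) {
                if (pair.startsWith("code=")) {
                    return pair.substring("code=".length());
                }
            }
        }
        throw new IllegalArgumentException("No code parameter in: " + redirectedUrl);
    }

    public static void main(String[] args) {
        System.out.println(authorizeUrl(
                "https://api.weibo.com/oauth2/authorize", "3722953941", "http://www.baidu.com"));
        System.out.println(extractCode(
                "http://www.baidu.com/?code=873652da20697fdc137b6585e24cae05"));
    }
}
```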
Before you can grab the data, configure your local database. Here we use MySQL. Install MySQL and create a database named sina_weibo. Then create the table comment using the SQL statement defined in the file \file\sql\create_Comment.sql.
Important: to support Chinese characters, set the collation to gbk - gbk_chinese_ci (character set gbk, collation gbk_chinese_ci). All columns in the tables that contain Chinese characters must be set to gbk - gbk_chinese_ci as well.
In the file examples\weibo4j\wang\db\LocalDbConnection.java, at line 45, set your database username and password.
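For orientation, a database setup along these lines might look like the sketch below. It is not the content of LocalDbConnection.java. The class DbSetupSketch, the host localhost, the port 3306, and the use of the MySQL Connector/J driver with the characterEncoding=GBK URL parameter are all assumptions to adapt to your installation.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical sketch of the database setup; NOT the actual LocalDbConnection.java.
class DbSetupSketch {

    // MySQL statement that creates the database with the gbk character set
    // and the gbk_chinese_ci collation.
    static final String CREATE_DATABASE_SQL =
            "CREATE DATABASE sina_weibo CHARACTER SET gbk COLLATE gbk_chinese_ci";

    // Builds a MySQL Connector/J URL that asks the driver to use GBK encoding.
    static String jdbcUrl(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database
                + "?useUnicode=true&characterEncoding=GBK";
    }

    // Opens a connection to a local MySQL server (assumed host and port).
    // Requires the MySQL Connector/J driver on the classpath.
    static Connection connect(String user, String password) throws SQLException {
        return DriverManager.getConnection(jdbcUrl("localhost", 3306, "sina_weibo"), user, password);
    }
}
```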
- Right-click the file \examples\weibo4j\wang\CommentExtractor.java and choose Run as Java Application...
- When the Eclipse console shows 1 - Getting comments of ..., the retrieval has started.
If the program stops for any reason, you can run it again from where it stopped. Change the value of START_INDEX at line 17.
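The idea of resuming can be sketched as skipping the weibos that were already processed. The sketch assumes START_INDEX is a zero-based count of weibos to skip; whether that matches the counter printed on the console should be checked against CommentExtractor.java. The class ResumeSketch and the method remaining are hypothetical.

```java
import java.util.List;

// Hypothetical sketch: selects the weibos still to be processed on a resumed run.
class ResumeSketch {

    // Returns the part of the list starting at startIndex (zero-based, assumed).
    static List<String> remaining(List<String> weiboIds, int startIndex) {
        if (startIndex < 0 || startIndex > weiboIds.size()) {
            throw new IllegalArgumentException("START_INDEX out of range: " + startIndex);
        }
        return weiboIds.subList(startIndex, weiboIds.size());
    }
}
```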