REUSE

Summer REU project for predicting the likelihood of a pull request being accepted, rejected, or reverted.

Collect GitHub Pull Requests

Make sure there is a "Repos" folder in this repo.
Run file_analysis.py to start cloning pull requests into the "Repos" directory. The script prompts you for a Repo ID, which is the "id" field in the JSON returned by https://api.github.com/repos/USERNAME/REPONAME. To mine multiple repos, add as many Repo IDs as you want to the listofRepoID array in file_analysis.py:

listofRepoID = [] #Repo Example: 19148949
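
The Repo ID can also be looked up from a script instead of a browser. The following is a minimal sketch, assuming Python 3 and unauthenticated API access; the helper names `extract_repo_id` and `fetch_repo_id` are illustrative and are not part of file_analysis.py.

```python
import json
import urllib.request


def extract_repo_id(api_response_text):
    """Return the numeric repo ID from the JSON text of a /repos/USERNAME/REPONAME response."""
    data = json.loads(api_response_text)
    return int(data["id"])


def fetch_repo_id(username, reponame):
    """Query the GitHub API for one repo and return its ID (requires network access)."""
    url = "https://api.github.com/repos/{}/{}".format(username, reponame)
    with urllib.request.urlopen(url) as response:
        return extract_repo_id(response.read().decode("utf-8"))
```

Calling `fetch_repo_id("USERNAME", "REPONAME")` with a real owner and repo name should return a value that can be pasted into listofRepoID. Unauthenticated requests are rate limited by GitHub, so this is only practical for a handful of lookups.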

While file_analysis.py is running, a loading bar shows the script's progress.
Once the script has finished, each pull should have a before version, an after version, or both for its files. There should also be a pull info.txt file for each pull.
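
A quick way to confirm that every pull received an info file is to walk the "Repos" folder. This is a rough sketch under an assumed layout of Repos/REPO/PULL/info.txt; the actual folder nesting produced by file_analysis.py may differ, and `find_incomplete_pulls` is a hypothetical helper, not an existing script in this repo.

```python
import os


def find_incomplete_pulls(repos_dir="Repos", info_name="info.txt"):
    """Return the pull folders under repos_dir that have no info file.

    Assumes one folder per repo, each containing one folder per pull.
    """
    missing = []
    for repo in sorted(os.listdir(repos_dir)):
        repo_path = os.path.join(repos_dir, repo)
        if not os.path.isdir(repo_path):
            continue  # skip stray files in the Repos folder
        for pull in sorted(os.listdir(repo_path)):
            pull_path = os.path.join(repo_path, pull)
            info_path = os.path.join(pull_path, info_name)
            if os.path.isdir(pull_path) and not os.path.isfile(info_path):
                missing.append(pull_path)
    return missing
```

An empty list from `find_incomplete_pulls()` suggests the collection step completed for every pull it started.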

Be sure that TestChangeDistiller has codemining-treelm and ChangeDistiller as Maven dependencies. Also make sure there is a .ser file named after the repo in the Entropy-model folder (/Entropy-model). Due to their size, the .ser files are not included in this repo.
Run AnalyzeWork's main from TestChangeDistiller. It asks for two arguments: the repo path and whether you want to store changes. The repo path should be the directory of the repo. To store changes, enter true; this adds ChangeDistiller's output and the entropy to each pull's info file.
While the program runs, it should print ChangeDistiller information.
To check whether the program succeeded, open any pull info.txt file and look for the additional info.

Make sure all repos in the Repos folder have been processed with AnalyzeWork.
Run Format-ARFF-Files.py. This script reads every info.txt file and formats the info for conversion to an ARFF file.
When the script is done, a file called Data-Complete.txt should appear in the Weka folder.
Run the Driver main from the Weka-SimpleModel project to convert the txt file into an ARFF file. If a problem occurs, it will print: "Problem found when reading.."
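
For readers unfamiliar with ARFF, the sketch below shows the general shape of the format Weka expects: a relation name, one attribute declaration per column, then comma-separated data rows. It is not the project's converter (that is the Driver in Weka-SimpleModel), and the relation and attribute names used in the usage note are made up for illustration.

```python
def to_arff(relation, attributes, rows):
    """Build ARFF text.

    attributes is a list of (name, kind) pairs, where kind is either a type
    string such as "numeric" or a list of allowed nominal values.
    """
    lines = ["@relation " + relation, ""]
    for name, kind in attributes:
        if isinstance(kind, (list, tuple)):
            # Nominal attributes list their allowed values in braces.
            kind = "{" + ",".join(kind) + "}"
        lines.append("@attribute {} {}".format(name, kind))
    lines += ["", "@data"]
    for row in rows:
        lines.append(",".join(str(value) for value in row))
    return "\n".join(lines) + "\n"
```

For example, `to_arff("pulls", [("entropy", "numeric"), ("outcome", ["accepted", "rejected", "reverted"])], [(1.5, "accepted")])` produces a file with two attribute lines and one data row. Values containing spaces or commas would need quoting, which this sketch does not handle.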
