Summer REU project for predicting the likelihood of a pull request being accepted, rejected, or reverted.
- Make sure there is a "Repos" folder in this repo.
- Run file_analysis.py to start cloning pull requests into the "Repos" directory. It will prompt you for the repo ID, which is the "id" field in the JSON returned by https://api.github.com/repos/USERNAME/REPONAME (replace USERNAME and REPONAME with the repository's owner and name). To mine multiple repos, add as many repo IDs as you want to the listofRepoID array in file_analysis.py:
listofRepoID = [] #Repo Example: 19148949
- Once file_analysis.py is running, there should be a loading bar that shows the script's progress.
- Once the script is finished, each pull should have the before file versions, the after file versions, or both. There should also be an info.txt file for each pull.
- Be sure that TestChangeDistiller has codemining-treelm and ChangeDistiller as Maven dependencies. Also make sure there is a .ser file named after the repo in the [Entropy-model folder](/Entropy-model). Due to their size, the .ser files are not included in this repo.
- Run AnalyzeWork's main from TestChangeDistiller. It takes two arguments: the repo path and whether you want to store changes. The repo path should be the repo's directory. For the second argument, enter true to add ChangeDistiller's output and entropy to each pull's info file.
- While the program runs, it should print ChangeDistiller information.
- To check whether the program succeeded, open any pull's info.txt file and look for the additional info.
- Make sure all repos in the Repos folder have been processed with AnalyzeWork.
- Run Format-ARFF-Files.py. This script will go through all info.txt files and format the info into an ARFF file.
- When the script is done, a file called Data-Complete.txt should appear in the Weka folder.
- Run Driver's main from the Weka-SimpleModel project to convert the txt file into an ARFF file. If a problem occurs, it will print: "Problem found when reading.."
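
The repo ID lookup in the first steps can also be scripted instead of done in a browser. The following is a minimal sketch, not part of this repo: the helper names (repo_api_url, extract_repo_id, fetch_repo_id) are made up for illustration, and it assumes the API response is a JSON object whose "id" field holds the numeric repo ID. Unauthenticated requests to the GitHub API are rate limited, so this is only meant for looking up a handful of repos.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com/repos"


def repo_api_url(username, reponame):
    """Build the API URL for a repository (hypothetical helper)."""
    return f"{API_ROOT}/{username}/{reponame}"


def extract_repo_id(payload):
    """Return the numeric "id" field from the API's JSON response text."""
    return json.loads(payload)["id"]


def fetch_repo_id(username, reponame):
    """Download the repository metadata and return its ID (needs network access)."""
    with urllib.request.urlopen(repo_api_url(username, reponame)) as response:
        return extract_repo_id(response.read().decode("utf-8"))


# Example (replace the placeholders with a real owner and repository name):
# listofRepoID = [fetch_repo_id("USERNAME", "REPONAME")]
```

The returned IDs can then be pasted into listofRepoID in file_analysis.py, or entered at the prompt when the script asks for a repo ID.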