Our goal in this project is to integrate information about companies with information about the cities in which their headquarters are located. The resulting dataset can then be analyzed from a data science point of view to find relationships, e.g. how the population of a city correlates with the size or other attributes of the companies headquartered there. To gather more information about companies, we first combine several company datasets derived from different sources. We then integrate this result with the data about locations.
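As a rough illustration of the last step, the sketch below joins already-fused company records to city data and computes a Pearson correlation between city population and company size. It is not the project's actual code: the class `IntegrationSketch`, the `Company` fields, the exact-match join on city name, and the sample values are all hypothetical and only chosen to keep the example small.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class IntegrationSketch {

    // Hypothetical fused company entry: name, headquarters city, employee count.
    static class Company {
        final String name;
        final String headquartersCity;
        final double employees;

        Company(String name, String headquartersCity, double employees) {
            this.name = name;
            this.headquartersCity = headquartersCity;
            this.employees = employees;
        }
    }

    // Joins companies to city populations by exact city name (an assumption;
    // real data would need identity resolution). Returns {population, employees}
    // pairs and skips companies whose city is not in the map.
    static List<double[]> joinOnCity(List<Company> companies, Map<String, Double> cityPopulation) {
        List<double[]> pairs = new ArrayList<>();
        for (Company c : companies) {
            Double population = cityPopulation.get(c.headquartersCity);
            if (population != null) {
                pairs.add(new double[] {population, c.employees});
            }
        }
        return pairs;
    }

    // Pearson correlation coefficient between the two columns of the pairs.
    // Returns NaN if either column has zero variance or the list is empty.
    static double pearson(List<double[]> pairs) {
        int n = pairs.size();
        double meanX = 0.0;
        double meanY = 0.0;
        for (double[] p : pairs) {
            meanX += p[0];
            meanY += p[1];
        }
        meanX /= n;
        meanY /= n;

        double cov = 0.0;
        double varX = 0.0;
        double varY = 0.0;
        for (double[] p : pairs) {
            double dx = p[0] - meanX;
            double dy = p[1] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        return cov / Math.sqrt(varX * varY);
    }

    public static void main(String[] args) {
        // Made-up sample values, for illustration only.
        Map<String, Double> cityPopulation = new HashMap<>();
        cityPopulation.put("CityX", 100000.0);
        cityPopulation.put("CityY", 500000.0);
        cityPopulation.put("CityZ", 2000000.0);

        List<Company> companies = Arrays.asList(
                new Company("CompanyA", "CityX", 120),
                new Company("CompanyB", "CityY", 900),
                new Company("CompanyC", "CityZ", 15000),
                new Company("CompanyD", "UnknownCity", 50));

        List<double[]> pairs = joinOnCity(companies, cityPopulation);
        System.out.println("Matched companies: " + pairs.size());
        System.out.println("Pearson correlation: " + pearson(pairs));
    }
}
```

An exact string match on the city name is the simplest possible join; with real data, differing spellings across sources would likely require the same kind of matching rules used for the company datasets.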
- data: Contains the raw datasets, the gold standards, and the mapping/resolution/fusion results
- latex: .tex files for the project report
- lib: .jar files used as libraries in Java
- queries: The queries we used to collect data from Freebase and DBpedia
- RapidMinerRepo: Contains the RapidMiner processes used to learn matching rules with linear regression. Can be imported as a repository in RapidMiner
- src: Java source code
- usecase: Sample files provided for a movie/actors use case
- Oliver Frendo
- Dandan Li
- Zehui Wang
- Yi-Ru Cheng