Ensemble MT-MrSBC
Ensemble Multi-Type Multi-Relational Structural Bayesian Classifier

-----------------------------------------------------
INSTALLATION
-----------------------------------------------------

Ensemble MT-MrSBC is written in Java and uses PostgreSQL 9.5 as its DBMS.
All you need to do is download the system, contained in "EnsembleMRSBC.zip", and unzip it into a folder of your choice. The package contains the following folders and files:

- folds: a folder containing the 10 folds used for the 10-fold cross validation.
- db: a folder containing all the database dumps used for the experiments. Some dump names contain the string "_small" or "_big". The "small" version is a sampled database used by the RelWEKA and GNetMine algorithms, which fail on the full database ("big").
- config.properties: an example configuration file for running the system.
- EnsembleMRSBC.jar: the runnable jar file of the system.
- runExperiments.bat: launches the system on Windows using several configuration files placed in the same directory as the jar file.

The other files ("classificationResults.dtd", "config.ini", "model.dtd", "postgres.xml" and "postgresOut.xml") are required by the Mr-SBC system to perform the classification and should be left as they are.

-----------------------------------------------------
CONFIGURATION
-----------------------------------------------------

Open the "config.properties" file and edit the parameters in order to run the system. You can also prepare several configuration files and run them in sequence.
There are many parameters: some must be left unchanged, others can be changed, and only a few must necessarily be edited. Below is the list of parameters you must examine (an illustrative example configuration is sketched at the end of this file). The meaning of each parameter is also explained in the config.properties file, but read the following guide if you are in trouble.

List of parameters you must examine:

- system: system to run (set it to 2 for ST-MrSBC, to 3 for MT-MrSBC; no other values are allowed).
- target: target to consider (used only by ST-MrSBC, ignored by MT-MrSBC). An integer value that selects the target table according to the parameters at the end of the file (it depends on the database used).
- shuffleTargets: set it to true to process the targets in random order, to false for lexicographic ordering.
- dbName: the database you want to use.
- kTuplesToDelete: percentage of tuples to be deleted. For example, for 20% sampling this value must be set to 80. This parameter corresponds to (100 - perc) in the paper.
- numberOfRuns: number of executions of the ensemble learning for each target type. This corresponds to the "z" parameter in the paper.
- runsPerBlock: the interval at which accuracy results are reported. For example, set it to 1 to show the results for every iteration of the ensemble, to 5 to show them every 5 iterations, and so on.
- nfold: number of cross-validation folds. If you use our folds, it must be set to 10.
- selfTraining: specifies whether self-training should be performed. Leave this value set to 1 for ST-MrSBC. If you are running MT-MrSBC and you do not want to exploit previous predictions made on the same (current) type, set it to 0 (default is 1 for both the ST and MT versions).
- prediction: if you have more than one target attribute for the same type, set it to 1 to use the predictions obtained for the other target attributes of the same target table. If you do not have multiple target attributes of the same type, set it to 0 (default is 0 for both systems).
- existingFolds: set it to 1 to use our folds, or to 0 to generate new folds.
- port: database port.
- databaseAddress: database address.
- dbUser: PostgreSQL username.
- dbPassword: PostgreSQL password.

All the other parameters, after the #DATASET PARAMETER section, must be left unchanged. Please do not change them.

-----------------------------------------------------
EXECUTION
-----------------------------------------------------

Move to the folder where you unzipped the system and simply run the command:

"java -jar EnsembleMRSBC.jar"

We suggest adding the VM parameter -Xmx in order to exploit all your free RAM; note that VM parameters must be placed before -jar. For example, if you have 32 GB of RAM and your system and other processes are taking 2 GB, run the system this way:

"java -Xmx30G -jar EnsembleMRSBC.jar"

If you want to run the system with several configuration files (on Windows), use the file "runExperiments.bat". It uses 8 GB of RAM; feel free to edit it.

-----------------------------------------------------
GATHERING RESULTS
-----------------------------------------------------

You can find the results directly in the subfolder "output".

That's all folks!
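-----------------------------------------------------
APPENDIX: EXAMPLE CONFIGURATION (ILLUSTRATIVE)
-----------------------------------------------------

The following sketch shows how the parameters described in the CONFIGURATION section might be filled in for an MT-MrSBC run with 10-fold cross validation on the provided folds. All concrete values (database name, credentials, run counts) are illustrative assumptions, not a tested setup; the authoritative format and the dataset-specific parameters are those documented in the shipped config.properties.

  # system: 2 = ST-MrSBC, 3 = MT-MrSBC
  system=3
  # target table index (only read by ST-MrSBC, ignored by MT-MrSBC)
  target=1
  # true = random ordering of the targets, false = lexicographic ordering
  shuffleTargets=true
  # database to use (hypothetical name; must match a dump restored from the db folder)
  dbName=yourDatabaseName
  # 80 means 20% sampling, i.e. (100 - perc) in the paper
  kTuplesToDelete=80
  # "z" parameter in the paper: ensemble executions per target type (illustrative value)
  numberOfRuns=10
  # report accuracy results at every iteration
  runsPerBlock=1
  # 10-fold cross validation using the provided folds
  nfold=10
  existingFolds=1
  # exploit previous predictions on the same type (default)
  selfTraining=1
  # no multiple target attributes on the same target table
  prediction=0
  # PostgreSQL connection settings (5432 is the PostgreSQL default port; credentials are placeholders)
  port=5432
  databaseAddress=localhost
  dbUser=postgres
  dbPassword=yourPassword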