.. Silas documentation master file, created by
   sphinx-quickstart on Mon Nov 19 12:49:44 2018.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

.. _sec-basic-tutorial:

Basic Tutorial
==============

.. toctree::
   :maxdepth: 2
   :caption: Contents:

In this tutorial, we will use the well-known flight dataset used in the benchmark available `on this github page `_. It contains flight records from 2005 to 2006. The goal is to predict whether a flight will be delayed by more than 15 minutes.

.. _sec-basic-tutorial-download-data:

Download Dataset
----------------

Navigate to the directory where the Silas executable is located; we will call it *bin* from here on. Create a new directory called *tutorial1* in *bin*.

Download the `training data `_ of 1 million flight records and the `testing data `_. Place both files in the *bin/tutorial1/data/* folder.

.. _sec-basic-tutorial-gen-config:

Generate Configuration Files for Machine Learning
-------------------------------------------------

Go back to *bin*. Open a terminal in this directory and run the following command::

    silas gen-all -o tutorial1 tutorial1/data/train-1m.csv tutorial1/data/test.csv

This command generates all the configuration files required for Silas machine learning, using default settings. It writes the configuration files to *tutorial1* and sanitises the data files.

Note that the configuration generator automatically chooses the last feature (the last column in the dataset) as the outcome feature (target), which is fine for this example. For other datasets, check the "outcome_feature" entry in settings.json and make sure that the outcome feature is not listed in "selected_features".

.. _sec-basic-tutorial-ml:

Run Machine Learning
--------------------

To build a predictive model using Silas machine learning, run the command::

    silas learn -o model/flights tutorial1/settings.json

This command runs machine learning with the parameters in *tutorial1/settings.json* and stores the predictive model in *model/flights*. It also reports the performance of the predictive model on the testing dataset.

With the default parameters, you will probably get an ROC-AUC of 0.75+ (elsewhere it may be displayed as 75+).

You may want to tune the parameters of the learner to improve the results. To do so, open *tutorial1/settings.json* in your preferred text editor and change the value of *feature_proportion* from *"sqrt"* to *0.8*. Then re-train the model with the same command; you should obtain an ROC-AUC of 0.76+.

.. _sec-basic-tutorial-predict:

Use Machine Learning To Perform Prediction
------------------------------------------

Now that we have a predictive model stored in *model/flights*, we can use it to predict the outcome of new data samples. Since we only have two data files at hand, let's just run prediction over the testing dataset::

    silas predict -o tutorial1/predictions.csv model/flights/ tutorial1/data/clean-test.csv

This command writes the predictions to *tutorial1/predictions.csv*. Each row in this file corresponds to a row in *test.csv*. The first column gives the predicted outcome value, and the second column gives the probability of that value.

.. Indices and tables
.. ==================

.. * :ref:`genindex`
.. * :ref:`modindex`
.. * :ref:`search`
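
As a quick sanity check on the prediction file, the following Python sketch counts how often each outcome was predicted and averages the reported probability per outcome. It relies only on the two-column layout described above (outcome value, then probability). Whether the file starts with a header row is not stated here, so the sketch simply skips any row whose second column is not numeric. The function names ``summarise_predictions`` and ``summarise_file`` are illustrative and are not part of Silas.

.. code-block:: python

    import csv
    from collections import Counter, defaultdict


    def summarise_predictions(lines):
        """Return {outcome: (row_count, mean_probability)}.

        `lines` is any iterable of CSV text lines, such as an open file.
        Assumption: column 1 is the predicted outcome, column 2 its probability.
        """
        counts = Counter()
        prob_sums = defaultdict(float)
        for row in csv.reader(lines):
            if len(row) < 2:
                continue  # skip blank or malformed rows
            try:
                probability = float(row[1])
            except ValueError:
                continue  # non-numeric second column: assumed to be a header
            outcome = row[0].strip()
            counts[outcome] += 1
            prob_sums[outcome] += probability
        return {o: (n, prob_sums[o] / n) for o, n in counts.items()}


    def summarise_file(path):
        """Summarise a predictions CSV stored at `path`."""
        with open(path, newline="") as handle:
            return summarise_predictions(handle)

Calling ``summarise_file("tutorial1/predictions.csv")`` after running the predict command should return one entry per predicted outcome value. A heavily skewed count or an unusually low mean probability can be a hint that the learner parameters are worth revisiting.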