Machine Learning - Decision Trees and Random Forests for Classification and Regression - Python Example using Real Data

I've written the following code as an example of applying Decision Trees and Random Forests to classification and regression in Python, using a downloadable CSV file with real data about US housing prices and property characteristics.


The Use Case

You want to predict the price of a property by training two supervised machine learning algorithms: Decision Trees (Classification and Regression Trees) and Random Forests. The principle is very simple. You have a set of historical data, which you split into training data and test data. Each algorithm learns from the training data, and you then measure its accuracy on the test data to see which one performs better on this sample.
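As a minimal sketch of that split-train-compare loop, here is one way it could look with scikit-learn (an assumption: the post does not name the library). The data below is synthetic and made up only so the snippet runs on its own, and R² is used as the "accuracy" measure because price is a continuous value. For a classification target, `DecisionTreeClassifier`, `RandomForestClassifier` and `accuracy_score` would be the counterparts.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Made-up stand-in data: 500 "properties" with 3 numeric features.
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(500, 3))
# Price depends on the first two features plus noise (purely illustrative).
y = 50_000 + 20_000 * X[:, 0] + 5_000 * X[:, 1] ** 2 + rng.normal(0, 10_000, size=500)

# Hold out 20% of the rows as test data; the rest is training data.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "decision_tree": DecisionTreeRegressor(random_state=0),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

# Fit each model on the training data and score it on the unseen test data.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))
    print(f"{name}: R^2 = {scores[name]:.3f}")

best = max(scores, key=scores.get)
print("Best on this split:", best)
```

Fixing `random_state` keeps the split and the models reproducible, so the comparison between the two algorithms is not just noise from a different random split.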

CSV File

The file can be downloaded from my repository here:

https://github.com/Markuspg1/machine-learning1

Loading CSV File Function

Here's a Python function that will help you load any CSV file going forward.
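The original function isn't reproduced in this text, so the following is only a sketch of what such a loader could look like, built on pandas. The name `load_csv` and its behaviour (checking the path exists, printing the shape) are hypothetical choices, not the author's code.

```python
import io
from pathlib import Path

import pandas as pd


def load_csv(path, **read_csv_kwargs):
    """Hypothetical helper: load a CSV file into a pandas DataFrame.

    `path` may be a filename or any file-like object accepted by
    pandas.read_csv; extra keyword arguments are passed straight through.
    """
    # Fail early with a clear message if a filename was given but doesn't exist.
    if isinstance(path, (str, Path)) and not Path(path).is_file():
        raise FileNotFoundError(f"CSV file not found: {path}")
    df = pd.read_csv(path, **read_csv_kwargs)
    print(f"Loaded {df.shape[0]} rows x {df.shape[1]} columns")
    return df


# Quick self-check on a tiny in-memory CSV (made-up values, not the real data).
sample = load_csv(io.StringIO("price,rooms\n250000,3\n410000,5\n"))
```

Passing `**read_csv_kwargs` through means options such as `sep` or `usecols` still work without changing the helper.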


You can test this function with the following code; here we use it to load the CSV into our df variable.
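The author's test code isn't reproduced here either. As a sketch, loading a file into `df` with pandas' own `pd.read_csv` (the call a loader helper would typically wrap) and checking that the load worked could look like this. The temporary file and its three rows are made up so the snippet runs on its own; replace `csv_path` with the path to the CSV downloaded from the repository.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical placeholder file: swap in the path to the downloaded CSV.
tmp_dir = Path(tempfile.mkdtemp())
csv_path = tmp_dir / "housing_sample.csv"
csv_path.write_text("price,rooms,area_sqft\n250000,3,1400\n410000,5,2300\n320000,4,1800\n")

df = pd.read_csv(csv_path)

# Basic checks that the load succeeded.
assert not df.empty, "the CSV loaded but contains no rows"
print(df.shape)   # (rows, columns)
print(df.head())  # first five rows
```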


Use the following code to see the full list of columns with their data types and null counts.
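A minimal sketch using standard pandas calls: `DataFrame.info()` prints each column's name, dtype and non-null count, and `isnull().sum()` gives the missing-value count per column directly. The small DataFrame is made up for illustration, with one missing price on purpose.

```python
import numpy as np
import pandas as pd

# Tiny made-up DataFrame standing in for the housing data.
df = pd.DataFrame({
    "price": [250000.0, 410000.0, np.nan, 320000.0],
    "rooms": [3, 5, 4, 4],
    "city": ["A", "B", "B", "C"],
})

# Column names, dtypes and non-null counts in one summary.
df.info()

# Explicit count of missing values per column.
null_counts = df.isnull().sum()
print(null_counts)
```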


Data Analysis

Once you've loaded the CSV file and confirmed it was successful, we continue by analyzing the data: we find the columns whose values correlate most strongly with the result (the price) and use those columns as training features. Using all columns is also an option, but the algorithms tend to behave better and be more accurate when the training features are strongly correlated with the target, because those features give the algorithm more useful information for its decisions while learning.
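One way to do this selection, sketched with pandas under stated assumptions: rank numeric columns by the absolute Pearson correlation with the target and keep those above a threshold. The column names, the synthetic data and the 0.3 cut-off are all hypothetical illustration choices, not values from the real dataset.

```python
import numpy as np
import pandas as pd

# Made-up stand-in data: two informative features and one pure-noise feature.
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "area_sqft": rng.uniform(800, 3000, n),
    "rooms": rng.integers(1, 7, n),
    "noise": rng.normal(0, 1, n),
})
df["price"] = 150 * df["area_sqft"] + 40_000 * df["rooms"] + rng.normal(0, 20_000, n)

TARGET = "price"
# Absolute Pearson correlation of every numeric column with the target, strongest first.
corr = df.corr(numeric_only=True)[TARGET].drop(TARGET).abs().sort_values(ascending=False)
print(corr)

# Keep features whose |correlation| clears the (arbitrary) threshold.
selected = corr[corr >= 0.3].index.tolist()
print("Selected features:", selected)
```

Pearson correlation only captures linear relationships, while decision trees can also exploit non-linear ones, so this ranking is a useful heuristic for picking features rather than a guarantee of the best set.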


Part 2 Coming Soon
