Once you have defined your columns, you have to give each of them its proper data type.


As of the current version of nbt, four types are supported: string, number, boolean and date.


(Hint: if you have provided some sample rows so that nbt can extract information about the data, some types may already be configured, but we encourage you to review them.)
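How nbt itself derives types from sample rows is not described here. As a rough, hypothetical illustration of why a pre-filled type can still be wrong, the sketch below guesses one of the four types from a column's sample values. `guess_type` and its rules (`true`/`false` for booleans, ISO `YYYY-MM-DD` for dates) are assumptions, not nbt's algorithm.

```python
from datetime import date


def _is_number(value: str) -> bool:
    """True if the text parses as a number (integer or decimal)."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def _is_iso_date(value: str) -> bool:
    """True if the text parses as an ISO date such as 2024-01-15."""
    try:
        date.fromisoformat(value)
        return True
    except ValueError:
        return False


def guess_type(values: list[str]) -> str:
    """Hypothetical heuristic: guess 'boolean', 'number', 'date' or 'string'."""
    non_empty = [v.strip() for v in values if v.strip()]
    if not non_empty:
        return "string"  # nothing to go on, fall back to the most general type
    if {v.lower() for v in non_empty} <= {"true", "false"}:
        return "boolean"
    if all(_is_number(v) for v in non_empty):
        return "number"
    if all(_is_iso_date(v) for v in non_empty):
        return "date"
    return "string"


print(guess_type(["02139", "10001"]))  # zip codes look numeric
```

Because zip codes consist of digits, a heuristic like this one labels them `"number"`, which is exactly the kind of guess worth reviewing against the rules of thumb that follow.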

Data typing is very important for the analysis because different algorithms may behave differently depending on a column's type.


Some rules of thumb to follow:

  • Dates are almost never good predictors. When working with dates, convert them to numbers, for instance by calculating the number of years or months elapsed.
  • Numbers should be used when something has a logical ordering (a higher number means a higher value of the property). Salary is an example where a number is correct. Zip Code is an example where it is not: although zip codes usually consist of digits, it is better to use them as strings so they can be grouped.
  • Booleans for true/false values.
  • Strings for everything else.
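The rules above can be sketched as a small data-preparation step done before the data reaches nbt. This is a minimal example under assumptions: the column names (`hire_date`, `zip_code`, `salary`, `tenure_months`), the reference date and the helper `months_between` are all hypothetical.

```python
from datetime import date


def months_between(start: date, end: date) -> int:
    """Whole months from start to end, usable as a number column."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    # The last month only counts once its day of the month has been reached.
    if end.day < start.day:
        months -= 1
    return months


row = {"hire_date": date(2019, 3, 15), "zip_code": "02139", "salary": 52000}
reference = date(2024, 1, 10)  # hypothetical "as of" date

prepared = {
    "tenure_months": months_between(row["hire_date"], reference),  # date -> number
    "zip_code": row["zip_code"],  # kept as a string so equal codes group together
    "salary": float(row["salary"]),  # ordered quantity -> number
}
print(prepared["tenure_months"])  # 57
```

Keeping the zip code as text also preserves leading zeros: converting `"02139"` with `int()` yields `2139`, which no longer matches the real code.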


You can also reject columns, which means they will be sent with the data but will not be considered in the analysis. Date columns should be rejected, as should columns that have the same value in all rows, since they don't provide any additional information.
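Rejecting columns is done in nbt itself; the sketch below only illustrates the two rejection rules on sample data. `columns_to_reject` is a hypothetical helper that assumes rows are dictionaries and column types use the names `"string"`, `"number"`, `"boolean"` and `"date"`.

```python
from datetime import date


def columns_to_reject(rows: list[dict], types: dict[str, str]) -> list[str]:
    """Return columns that are dates or hold the same value in every row."""
    rejected = []
    for column, column_type in types.items():
        if column_type == "date":
            rejected.append(column)
            continue
        distinct = {row.get(column) for row in rows}
        # One distinct value (or no rows at all) carries no information.
        if len(distinct) <= 1:
            rejected.append(column)
    return rejected


rows = [
    {"country": "US", "salary": 52000, "hire_date": date(2019, 3, 15)},
    {"country": "US", "salary": 61000, "hire_date": date(2021, 7, 1)},
]
types = {"country": "string", "salary": "number", "hire_date": "date"}
print(columns_to_reject(rows, types))  # ['country', 'hire_date']
```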

(Hint: it is fine to have rejected columns; there is no need to go to the data source and remove them from the model. A column that is constant today may take different values in the future, so you can leave it in your model and simply reject it until it makes sense to include it.)