1.Input and Explore
dataset:df{X,y(target)}
upload files:
predict:X
save w/id
select as target:
target
click to populate!
Parse Field
select as time:
time
Parse Data
Time/groupby
Histogram
Correl_all
Correl_target
Coeff_target
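A minimal sketch of what the Correl_all and Correl_target views might compute, assuming they are Pearson correlations from pandas. The toy data and the column names `a`, `b`, and `target` are placeholders, not taken from the tool.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset; column names are placeholders.
rng = np.random.default_rng(0)
df = pd.DataFrame({"a": rng.normal(size=100), "b": rng.normal(size=100)})
df["target"] = 2.0 * df["a"] + rng.normal(scale=0.1, size=100)

# Correl_all: pairwise Pearson correlation of every numeric column.
corr_all = df.select_dtypes("number").corr()

# Correl_target: each feature's correlation with the target,
# ordered by absolute strength (strongest first).
corr_target = corr_all["target"].drop("target").sort_values(key=abs, ascending=False)
print(corr_target)
```

Sorting by absolute value keeps strongly negative correlations near the top, which is usually what a feature screen wants.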
2.Preprocess
Drop row with nan?
Clear Outliers with IQR?
Drop row if condition!
UnderSampling(0 more!)
Create field:X.eval("fld1=...;...");no target!
Drop Field
Encode Field
time to serials(YMW)
DropNonNumCol_fillna
Normalization?
Stop training to show X
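The preprocessing options above can be sketched with pandas. This is an illustration under assumptions: `clear_outliers_iqr`, the multiplier `k=1.5`, and the column names `fld_a` and `fld_b` are hypothetical, and the tool's own rules may differ.

```python
import pandas as pd


def clear_outliers_iqr(df, cols, k=1.5):
    """Keep rows whose values in `cols` lie within [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = df[cols].quantile(0.25)
    q3 = df[cols].quantile(0.75)
    iqr = q3 - q1
    within = ((df[cols] >= q1 - k * iqr) & (df[cols] <= q3 + k * iqr)).all(axis=1)
    return df[within]


# Hypothetical feature table with one missing value and one outlier.
X = pd.DataFrame({
    "fld_a": [1.0, 2.0, 3.0, 4.0, 100.0, None],
    "fld_b": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
})

X = X.dropna()                                   # "Drop row with nan?"
X = clear_outliers_iqr(X, ["fld_a", "fld_b"])    # "Clear Outliers with IQR?"
X = X.eval("fld1 = fld_a * fld_b")               # "Create field" via X.eval
print(X)
```

`DataFrame.eval` with an assignment returns a new frame by default, so the original `X` is only replaced because the result is reassigned.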
3.train/validate split
size(04magic):
random_state:
shuffle
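The size, random_state, and shuffle settings map naturally onto scikit-learn's `train_test_split`. The sketch below assumes that mapping; the `test_size=0.2` value and the toy arrays are placeholders.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 10 samples, 2 features.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# test_size is the held-out fraction; random_state makes the shuffle repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, shuffle=True
)
print(X_train.shape, X_test.shape)
```

For time-ordered data, passing `shuffle=False` keeps the validation rows at the end of the series.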
===model parameters===
Tree:
Depth:
alpha:
gamma:
C:
degree
4.Model Selection
DecisionTreeClassifier
RandomForestClassifier(independent)
RandomForestRegressor
GradientBoostingClassifier(tree by tree)
GradientBoostingRegressor
XGBClassifier(derivatives)
XGBRegressor
LinearRegression
weight regularization:
Lasso(L1)
Ridge(L2)
none
LogisticRegression(class)
SVM.SVC(class)
SVM.SVR(regress)
Linear
Poly
RBF
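One way the menu entries and the model parameters could be wired to scikit-learn estimators is sketched here. `build_model` is a hypothetical helper, and mapping Tree to `n_estimators` and Depth to `max_depth` is an assumption. The XGBoost entries are omitted to keep the example dependent on scikit-learn only; the linear model uses scikit-learn's class name, `LinearRegression`.

```python
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import Lasso, LinearRegression, LogisticRegression, Ridge
from sklearn.svm import SVC, SVR
from sklearn.tree import DecisionTreeClassifier


def build_model(name, trees=100, depth=3, alpha=1.0, C=1.0, degree=3, kernel="rbf"):
    """Return an unfitted estimator for a menu entry (hypothetical helper)."""
    factories = {
        "DecisionTreeClassifier": lambda: DecisionTreeClassifier(max_depth=depth),
        "RandomForestClassifier": lambda: RandomForestClassifier(
            n_estimators=trees, max_depth=depth),
        "GradientBoostingClassifier": lambda: GradientBoostingClassifier(
            n_estimators=trees, max_depth=depth),
        "LinearRegression": lambda: LinearRegression(),
        "Lasso": lambda: Lasso(alpha=alpha),      # L1 penalty
        "Ridge": lambda: Ridge(alpha=alpha),      # L2 penalty
        "LogisticRegression": lambda: LogisticRegression(C=C),
        "SVC": lambda: SVC(kernel=kernel, C=C, degree=degree),
        "SVR": lambda: SVR(kernel=kernel, C=C, degree=degree),
    }
    return factories[name]()


# Example: a polynomial-kernel SVM classifier of degree 2.
model = build_model("SVC", kernel="poly", degree=2)
```

The `degree` argument only affects the `"poly"` kernel; scikit-learn ignores it for `"linear"` and `"rbf"`.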
5.Error/Accuracy
MAE:
MSE:
R2_score:
recall:
accuracy:
Predicted=[0]
feature_importances:
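The metrics listed above correspond to functions in `sklearn.metrics`. The following sketch computes them on small made-up label vectors, which are assumptions for illustration only. MAE, MSE, and R2 apply to regressors; recall and accuracy apply to classifiers.

```python
from sklearn.metrics import (
    accuracy_score,
    mean_absolute_error,
    mean_squared_error,
    r2_score,
    recall_score,
)

# Hypothetical regression outputs.
y_true_reg = [3.0, 5.0, 2.0, 7.0]
y_pred_reg = [2.0, 5.0, 4.0, 7.0]
mae = mean_absolute_error(y_true_reg, y_pred_reg)  # mean of |error|
mse = mean_squared_error(y_true_reg, y_pred_reg)   # mean of error squared
r2 = r2_score(y_true_reg, y_pred_reg)              # 1 - SS_res / SS_tot

# Hypothetical binary classification outputs.
y_true_cls = [1, 0, 1, 1]
y_pred_cls = [1, 0, 0, 1]
accuracy = accuracy_score(y_true_cls, y_pred_cls)  # fraction of correct labels
recall = recall_score(y_true_cls, y_pred_cls)      # true positives / actual positives

print(mae, mse, r2, accuracy, recall)
```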
6.y_pred vs y_test
model save?
model load?
img here
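Saving and loading a fitted model could look like the sketch below. It assumes Python's standard `pickle` module; the tool may use a different mechanism such as joblib. The file name `model.pkl` and the toy training data are placeholders.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.tree import DecisionTreeClassifier

# Fit a small model on hypothetical data.
model = DecisionTreeClassifier(max_depth=2, random_state=0)
model.fit([[0], [1], [2], [3]], [0, 0, 1, 1])

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "model.pkl"   # placeholder file name
    with open(path, "wb") as f:      # "model save?"
        pickle.dump(model, f)
    with open(path, "rb") as f:      # "model load?"
        restored = pickle.load(f)

# The restored model predicts the same labels as the original.
pred = restored.predict([[0], [3]])
print(pred)
```

Pickled models should only be loaded from trusted sources, and ideally with the same scikit-learn version that wrote them.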