Exploratory Data Analysis in ML


Exploratory Data Analysis

 Univariate / single variable / single axis

1. Mean - influenced by extreme values, but it takes every value into account equally.

2. Median - not influenced by extreme values, but it ignores the shape of the distribution: what lies at the low and high ends does not matter.

3. Percentile - good for showing how the data is spread, but several measures are needed (for example the 25th, 50th, and 75th percentiles).

4. Standard deviation - takes every value and the spread of the distribution into account, but is harder to calculate by hand.
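The four measures above can be compared on a small sample. This is a minimal sketch using Python's standard `statistics` module; the `values` list is made-up data with one extreme value added so the effect on each measure is visible.

```python
import statistics

# Hypothetical sample: five typical values plus one extreme value (100).
values = [10, 12, 11, 13, 12, 100]
trimmed = values[:-1]  # the same sample without the extreme value

# Mean: every value counts equally, so the extreme value pulls it up.
mean_all = statistics.mean(values)
mean_trimmed = statistics.mean(trimmed)

# Median: the middle of the sorted data, unaffected by the extreme value here.
median_all = statistics.median(values)
median_trimmed = statistics.median(trimmed)

# Percentiles: n=4 returns the three quartile cut points (25th, 50th, 75th).
q1, q2, q3 = statistics.quantiles(values, n=4)

# Standard deviation: uses every value, so the extreme value inflates it.
stdev_all = statistics.stdev(values)
stdev_trimmed = statistics.stdev(trimmed)

print("mean:", mean_all, "vs", mean_trimmed)
print("median:", median_all, "vs", median_trimmed)
print("quartiles:", q1, q2, q3)
print("stdev:", stdev_all, "vs", stdev_trimmed)
```

In this sample the median is 12 with or without the extreme value, while the mean and the standard deviation both change noticeably.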


Bivariate / two variables / two axes

1. Correlation - positive: the two variables rise and fall together; negative: one rises as the other falls; no correlation: a change in one tells you nothing about the other (no linear relationship).
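The three cases above can be illustrated with the Pearson correlation coefficient, which ranges from -1 to +1. This is a sketch only: the `pearson` helper and the small data lists are made up for illustration, and Pearson measures linear relationships only.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances (the 1/n factors cancel in the ratio).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data for illustration.
hours = [1, 2, 3, 4, 5]
score = [52, 58, 61, 70, 74]   # rises as hours rise
errors = [9, 7, 6, 4, 2]       # falls as hours rise
unrelated = [2, 1, 3, 1, 2]    # no linear trend against hours

r_positive = pearson(hours, score)
r_negative = pearson(hours, errors)
r_none = pearson(hours, unrelated)

print("positive:", round(r_positive, 3))
print("negative:", round(r_negative, 3))
print("none:", round(r_none, 3))
```

A value near +1 or -1 indicates a strong linear relationship; a value near 0 indicates none.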


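Garbage in, garbage out - a model is only as good as the data it is trained on.

Overfitting or underfitting - the model either memorizes the training data or is too simple to capture the pattern.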

Insufficient data - use a simpler model, data augmentation, synthetic data, or ensemble learning.

Transfer learning - reuse a pre-trained model that was built for a similar task.

Data augmentation - create modified copies of existing data, for example by scaling or rotating an image.

Synthetic data - artificially generated data.
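The data augmentation idea above can be sketched without any image library by treating an image as a grid of pixel values. The helper names `rotate_90_clockwise` and `flip_horizontal` and the tiny `image` are made up for illustration; real projects usually rely on an image or deep learning library for this.

```python
def rotate_90_clockwise(image):
    """Rotate a 2D grid of pixels by 90 degrees clockwise."""
    # Reverse the row order, then transpose rows into columns.
    return [list(row) for row in zip(*image[::-1])]

def flip_horizontal(image):
    """Mirror a 2D grid of pixels left to right."""
    return [row[::-1] for row in image]

# Hypothetical 2x3 "image" of pixel values.
image = [
    [1, 2, 3],
    [4, 5, 6],
]

rotated = rotate_90_clockwise(image)
flipped = flip_horizontal(image)

# One original sample becomes three training samples.
augmented = [image, rotated, flipped]

print(rotated)  # [[4, 1], [5, 2], [6, 3]]
print(flipped)  # [[3, 2, 1], [6, 5, 4]]
```

Each transformed copy keeps the same label as the original, which is how augmentation stretches a small dataset.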


Curse of dimensionality - too many columns (features).
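One way to see why too many columns hurts: as the number of dimensions grows, the nearest and the farthest points end up at almost the same distance, so distance-based methods lose their ability to tell points apart. The sketch below uses uniformly random points; the `distance_contrast` helper and its parameters are made up for illustration.

```python
import math
import random

def distance_contrast(num_dims, num_points=200, seed=0):
    """Ratio of nearest to farthest distance from one random query point.

    A ratio close to 1 means all points look about equally far away.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    query = [rng.random() for _ in range(num_dims)]
    distances = []
    for _ in range(num_points):
        point = [rng.random() for _ in range(num_dims)]
        distances.append(math.dist(query, point))
    return min(distances) / max(distances)

low_dim_ratio = distance_contrast(2)     # few columns
high_dim_ratio = distance_contrast(500)  # many columns

print("2 dimensions:  ", round(low_dim_ratio, 3))
print("500 dimensions:", round(high_dim_ratio, 3))
```

The ratio is much closer to 1 in the 500-dimensional case than in the 2-dimensional one.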

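Outdated data - too many rows, including old records that no longer reflect current conditions.

Concept drift - caused by outdated data. As things change over time, the patterns the model learned stop matching reality.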
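A simple way to watch for concept drift is to compare a model's accuracy on older data with its accuracy on recent data. This is a rough sketch, not a standard detector: the `accuracy` and `drift_suspected` helpers, the 0.10 tolerance, and the prediction and label lists are all made up for illustration.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

def drift_suspected(old_accuracy, recent_accuracy, tolerance=0.10):
    """Flag drift when accuracy has dropped by more than the tolerance."""
    return (old_accuracy - recent_accuracy) > tolerance

# Hypothetical predictions from the same model at two points in time.
old_preds = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
old_labels = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]
recent_preds = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
recent_labels = [0, 0, 1, 0, 1, 1, 0, 1, 0, 1]

old_acc = accuracy(old_preds, old_labels)
recent_acc = accuracy(recent_preds, recent_labels)
drifted = drift_suspected(old_acc, recent_acc)

print("old accuracy:", old_acc)
print("recent accuracy:", recent_acc)
print("drift suspected:", drifted)
```

When drift is suspected, the usual response is to retrain the model on more recent data.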










