Data Exploration in ML
Data Exploration
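Univariate analysis (a single variable / single axis)
1. Mean - influenced by extreme values, but it has the advantage of treating every value equally
2. Median - not influenced by extreme values; however, it ignores the shape of the distribution: the values at the extremes (the start and end of the sorted data) do not matter
3. Percentile - good for showing how the data is spread, but several measures are needed to describe it (e.g., the 25th, 50th and 75th percentiles)
4. Standard deviation - takes every value and the spread of the distribution into account, but is harder to calculate by hand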
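A minimal sketch using Python's standard `statistics` module shows how the four measures above react to an outlier. The daily sales figures and variable names are made up for illustration; 95 plays the role of the extreme value.

```python
import statistics

# Hypothetical daily sales; 95 is an extreme value (outlier).
sales = [10, 12, 11, 13, 12, 11, 95]
without_outlier = sales[:-1]

mean = statistics.mean(sales)        # pulled up by the outlier
median = statistics.median(sales)    # middle value of the sorted data
# n=4 returns the 25th, 50th and 75th percentile cut points.
q1, q2, q3 = statistics.quantiles(sales, n=4)
stdev = statistics.stdev(sales)      # sample standard deviation

print(f"mean   = {mean:.2f} (without outlier: {statistics.mean(without_outlier):.2f})")
print(f"median = {median} (without outlier: {statistics.median(without_outlier)})")
print(f"quartiles = {q1}, {q2}, {q3}")
print(f"stdev  = {stdev:.2f} (without outlier: {statistics.stdev(without_outlier):.2f})")
```

Dropping the single outlier moves the mean from about 23.43 to 11.5, while the median only moves from 12 to 11.5. The standard deviation also shrinks sharply, because, like the mean, it uses every value.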
Bivariate analysis (two variables / two axes)
1. Correlation - positive: the two variables move in the same direction; negative: one rises as the other falls; no correlation: a change in one has no (linear) effect on the other
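Correlation is commonly measured with the Pearson coefficient r, which runs from -1 (perfect negative) through 0 (no linear relationship) to +1 (perfect positive). A minimal sketch, computed by hand so only the standard library is needed; the three example series are invented for illustration. Note that r only picks up linear relationships.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient r, between -1 and 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    # Covariance term (unnormalised) divided by the product of the spreads.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

hours_studied = [1, 2, 3, 4, 5]
exam_score = [52, 58, 65, 70, 78]    # rises with hours studied -> positive
hours_gaming = [9, 7, 6, 4, 2]       # falls as hours studied rise -> negative
shoe_size = [40, 43, 39, 42, 41]     # unrelated -> close to zero

print(f"studied vs score:  r = {pearson(hours_studied, exam_score):+.3f}")
print(f"studied vs gaming: r = {pearson(hours_studied, hours_gaming):+.3f}")
print(f"studied vs shoe:   r = {pearson(hours_studied, shoe_size):+.3f}")
```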
Garbage in, garbage out
Overfitting or underfitting
Insufficient data - use a simpler model, data augmentation, synthetic data or ensemble learning
Transfer learning - start from a similar model that is already available and pre-trained
Data augmentation - create new examples by slightly changing existing data, e.g., scaling or rotating an image
Synthetic data - artificially generated data
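A minimal sketch of data augmentation and synthetic data on toy inputs, standard library only. The "image" is a small grid of pixel values, and the helpers `flip_horizontal`, `rotate_90`, `upscale` and `synthetic_samples` are hypothetical names; real projects usually rely on an image or data library instead. For synthetic data it assumes the simplest possible generator: fit a normal distribution to one numeric column and sample new values from it, which ignores any relationships between columns.

```python
import random
import statistics

# --- Data augmentation: new training images derived from an existing one ---

def flip_horizontal(img):
    """Mirror the image left to right."""
    return [list(reversed(row)) for row in img]

def rotate_90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def upscale(img, k):
    """Scale the image up by an integer factor k (nearest neighbour)."""
    return [[p for p in row for _ in range(k)] for row in img for _ in range(k)]

# --- Synthetic data: artificial values drawn from a fitted distribution ---

def synthetic_samples(values, n, seed=0):
    """Draw n artificial values from a normal distribution fitted to `values`.
    Assumes the column is roughly normally distributed."""
    rng = random.Random(seed)  # fixed seed makes the output reproducible
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)
    return [rng.gauss(mu, sigma) for _ in range(n)]

image = [[1, 2],
         [3, 4]]
print(flip_horizontal(image))  # [[2, 1], [4, 3]]
print(rotate_90(image))        # [[3, 1], [4, 2]]
print(upscale(image, 2))       # [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]

measurements = [4.8, 5.1, 5.0, 4.9, 5.2]  # hypothetical measured column
fake = synthetic_samples(measurements, n=1000)
print(len(fake), round(statistics.mean(fake), 2))
```

Each augmented image keeps the original label, so one labelled example becomes several; the synthetic values keep roughly the same mean and spread as the real column.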
Curse of dimensionality - too many columns (features)
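One symptom of having too many columns is that distances between rows stop being informative: as dimensions grow, the nearest and farthest neighbours of a point end up almost equally far away, which hurts distance-based methods such as k-nearest neighbours. A minimal sketch with uniformly random points in a unit hypercube illustrates this; the function name `distance_contrast`, the point count and the seed are arbitrary choices for the demo.

```python
import math
import random

def distance_contrast(dim, n_points=200, seed=0):
    """Relative gap between the farthest and nearest of n_points random
    points (uniform in [0, 1]^dim) as seen from one random query point."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [
        math.dist(query, [rng.random() for _ in range(dim)])
        for _ in range(n_points)
    ]
    nearest, farthest = min(dists), max(dists)
    return (farthest - nearest) / nearest

for dim in (2, 10, 100, 1000):
    print(f"{dim:>4} columns: contrast = {distance_contrast(dim):.2f}")
```

The contrast shrinks as columns are added, so "nearest" carries less and less meaning unless the number of rows grows very quickly as well.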
Outdated data - too many old rows that no longer reflect current conditions
Concept drift - caused by outdated data, as things change over time
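A simple way to notice that data has gone stale is to compare recent values of a column with an older reference window. The sketch below measures how far the recent mean has moved, in units of the reference standard deviation; the function name `drift_score`, the sample numbers and the threshold of 2.0 are hypothetical choices. Strictly speaking, this detects a shift in the input data, a common warning sign, rather than concept drift itself, which is a change in the relationship between the inputs and the target; in practice it is often paired with tracking the model's error on recent data.

```python
import statistics

DRIFT_THRESHOLD = 2.0  # hypothetical cut-off, tune for each problem

def drift_score(reference, recent):
    """How many reference standard deviations the recent mean has moved."""
    ref_mean = statistics.mean(reference)
    ref_std = statistics.stdev(reference)
    return abs(statistics.mean(recent) - ref_mean) / ref_std

last_year = [100, 102, 98, 101, 99, 100, 103, 97]  # reference window
this_month = [120, 118, 123, 121]                  # recent window, shifted
stable_month = [101, 99, 100, 102]                 # recent window, unchanged

for name, recent in (("this_month", this_month), ("stable_month", stable_month)):
    score = drift_score(last_year, recent)
    status = "drift, consider retraining" if score > DRIFT_THRESHOLD else "ok"
    print(f"{name}: score = {score:.2f} -> {status}")
```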