Exploratory Data Analysis in ML

Exploratory Data Analysis

Univariate / single variable / single axis

1. Mean - influenced by extreme values, but the good part is that it weighs every value equally

2. Median - not influenced by extreme values; however, the distribution is not considered, so it doesn't matter what lies at the low or high end

3. Percentile - good for showing how the data is spread, but several measures are needed (for example the 25th, 50th, and 75th percentiles)

4. Standard deviation - considers every value and the spread of the distribution, but is harder to calculate
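
The four measures above can be compared on one small sample. This is a minimal sketch using Python's standard `statistics` module; the sample values are hypothetical and chosen only to include one extreme value.

```python
import statistics

# Hypothetical sample: five typical values plus one extreme value (100).
values = [10, 12, 11, 13, 12, 100]

# Mean: every value counts equally, so the extreme value pulls it upward.
mean = statistics.mean(values)

# Median: only the middle of the sorted data matters, so 100 has no effect.
median = statistics.median(values)

# Percentiles: three cut points (25th, 50th, 75th) are needed to describe the spread.
quartiles = statistics.quantiles(values, n=4)

# Standard deviation (population form): uses every value's distance from the mean.
std_dev = statistics.pstdev(values)

print("mean:", mean)
print("median:", median)
print("quartiles:", quartiles)
print("standard deviation:", std_dev)
```

Replacing 100 with a typical value such as 12 moves the mean and the standard deviation a lot, while the median stays the same.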

Bivariate / two variables / two axes

1. Correlation - positive: the two variables move together; negative: they move in opposite directions; no correlation: a change in one says nothing (linearly) about the other
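
The three cases above can be illustrated with the Pearson correlation coefficient, which ranges from -1 to +1. This is a minimal sketch assuming Pearson correlation is the measure meant; the helper `pearson` and the sample data are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance term: do x and y deviate from their means in the same direction?
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

hours = [1, 2, 3, 4, 5]
score = [2, 4, 6, 8, 10]    # rises as hours rise
errors = [10, 8, 6, 4, 2]   # falls as hours rise
noise = [1, -1, 0, -1, 1]   # no linear relationship with hours

positive = pearson(hours, score)
negative = pearson(hours, errors)
none = pearson(hours, noise)

print(positive, negative, none)
```

A coefficient near zero only rules out a linear relationship; the variables may still be related in a non-linear way.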

Garbage in, garbage out - a model can only be as good as the data it is trained on

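Overfitting or underfitting - the model memorizes the training data (overfit) or is too simple to capture the pattern (underfit)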

Insufficient data - use a simpler model, data augmentation, synthetic data, or ensemble learning

Transfer learning - reuse an available model that is already pre-trained on a similar task

Data augmentation - create modified copies of existing data, for example by scaling or rotating an image
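
A minimal sketch of image augmentation, assuming an image is represented as a plain list of pixel rows. The helper names are hypothetical; real projects typically use an image library instead.

```python
def flip_horizontal(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]

def rotate_90_clockwise(image):
    """Rotate the image by 90 degrees clockwise."""
    # Reverse the row order, then transpose rows into columns.
    return [list(row) for row in zip(*image[::-1])]

def scale_pixels(image, factor):
    """Scale every pixel value by a constant factor (for example, brightness)."""
    return [[pixel * factor for pixel in row] for row in image]

# Hypothetical 2x2 "image".
original = [[1, 2],
            [3, 4]]

# One original sample becomes four training samples.
augmented = [
    original,
    flip_horizontal(original),
    rotate_90_clockwise(original),
    scale_pixels(original, 2),
]

for sample in augmented:
    print(sample)
```

Each transformed copy keeps the original label, which is why augmentation helps when labeled data is scarce.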

Synthetic data - artificially generated data

Curse of dimensionality - too many columns (features)
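
One common way to see why too many columns hurt: as the number of dimensions grows, the nearest and farthest points end up at almost the same distance, so distance-based methods lose their contrast. This is a minimal sketch with randomly generated points; the function name `distance_contrast`, the point counts, and the seed are assumptions made for illustration.

```python
import math
import random

def distance_contrast(num_points, num_dims, seed=0):
    """Ratio of nearest to farthest distance from one point to all the others.

    A ratio close to 1 means every point looks about equally far away.
    """
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(num_dims)] for _ in range(num_points)]
    query = points[0]
    dists = [math.dist(query, p) for p in points[1:]]
    return min(dists) / max(dists)

low_dim = distance_contrast(num_points=200, num_dims=2)
high_dim = distance_contrast(num_points=200, num_dims=500)

print("2 columns:  ", low_dim)
print("500 columns:", high_dim)
```

The ratio for 500 columns comes out much closer to 1 than the ratio for 2 columns, which is the loss of contrast described above.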

Outdated data - too many old rows

Concept drift - caused by outdated data, because things change over time