Data Exploration in ML
Univariate / single variable / single axis
1. Mean - influenced by extreme values; the good part is that it weighs every data point equally.
2. Median - not influenced by extreme values; however, it ignores the shape of the distribution, so what sits at the two ends does not matter.
3. Percentile - good for showing how the data is spread, but several measures are involved (one number per percentile).
4. Standard deviation - takes every data point and the spread of the distribution into account, but is harder to calculate.

Bivariate / two variables / two axes
1. Correlation - positive: the two variables move together; negative: they move in opposite directions; no correlation: the variables show no linear relationship.

Garbage in, garbage out
Overfit or underfit
Insufficient data - use a simpler model, data augmentation, synthetic data, or ensemble learning
Transfer learning - reuse a similar model that is already available pre-trained
Data augmentation - modify existing data, e.g. scale or rotate an image
Synthetic data - artificially generated data
Curse of dimensionality - too much...
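
The single-variable and two-variable measures in these notes can be tried out with a small sketch. It uses only Python's standard library. The sample numbers and the names `values`, `hours`, `score`, `errors` and `pearson` are made up for illustration and are not from the notes.

```python
import math
import statistics

# Hypothetical sample with one extreme value (95) to show its pull on the mean.
values = [2, 3, 3, 4, 5, 5, 6, 95]

mean = statistics.mean(values)                 # uses every point, so 95 drags it up
median = statistics.median(values)             # middle of the sorted data, ignores how far 95 is
quartiles = statistics.quantiles(values, n=4)  # 25th, 50th, 75th percentile cut points
std_dev = statistics.pstdev(values)            # population standard deviation

print("mean:", mean)
print("median:", median)
print("quartiles:", quartiles)
print("std dev:", std_dev)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    # Sum of products of deviations (covariance numerator).
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    # Square roots of the summed squared deviations of each variable.
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical two-variable data.
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 70, 74]   # rises with hours: positive correlation
errors = [9, 7, 6, 4, 2]       # falls as hours rise: negative correlation

print("hours vs score:", pearson(hours, score))
print("hours vs errors:", pearson(hours, errors))
```

With this sample the mean lands well above the median because of the single extreme value, which is the trade-off described in items 1 and 2. The correlation is computed by hand here so each step is visible; a library routine would normally be used instead.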