
Showing posts from January, 2021

Data Exploration in ML

Data Exploration

Single variable (one data point, one axis)

1. Mean - influenced by extreme values, but on the plus side it weights every data point equally.
2. Median - not influenced by extreme values, but it ignores the shape of the distribution: the values at the low and high ends do not matter.
3. Percentile - good for describing the data, but several measures are needed together.
4. Standard deviation - takes every data point and the spread of the distribution into account, but is harder to calculate.

Bivariate (two data points, two axes)

1. Correlation - positive: the variables move together; negative: they move in opposite directions; no correlation: a change in one says nothing about the other.

Garbage in, garbage out

Overfit or underfit

Insufficient data - use a simpler model, data augmentation, synthetic data, or ensemble learning.

Transfer learning - reuse a similar model that is already available pre-trained.

Data augmentation - alter existing data, for example by scaling or rotating an image.

Synthetic data - artificially generated data.

Curse of dimensionality - too much...
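
The single-variable measures and the correlation listed above can be tried out with a small sketch. This is only an illustration using Python's standard library: the helper names `summarize` and `pearson` and the sample numbers are made up for this example and are not from the original notes.

```python
import statistics


def summarize(values):
    """Return the mean, median, quartiles, and sample standard deviation."""
    # quantiles(n=4) returns the three cut points that split the data into quarters.
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "quartiles": (q1, q2, q3),
        "stdev": statistics.stdev(values),
    }


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical sample: five similar values, then the same data plus one extreme value.
incomes = [30, 32, 35, 38, 40]
with_outlier = incomes + [500]

base = summarize(incomes)
skewed = summarize(with_outlier)
print("mean:", base["mean"], "->", skewed["mean"])
print("median:", base["median"], "->", skewed["median"])

# Two variables: one pair rises together, the other moves in opposite directions.
hours = [1, 2, 3, 4, 5]
score_up = [2, 4, 6, 8, 10]
score_down = [10, 8, 6, 4, 2]
print("positive correlation:", pearson(hours, score_up))
print("negative correlation:", pearson(hours, score_down))
```

With this sample, adding the single extreme value moves the mean from 35 to 112.5, while the median only moves from 35 to 36.5, which is the trade-off the list describes.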