Statistics-Related Interview Questions Commonly Asked in Data Science, Machine Learning, and Deep Learning

Published in

Becoming Human: Artificial Intelligence Magazine

Jul 27, 2020

How Would You Handle Missing Values in a Dataset?

→ First check whether each feature with missing values follows a roughly normal (symmetric) distribution or a skewed one.

If the feature is skewed, replace the missing values with the median, because the median is not affected by outliers.

If the feature is roughly symmetric, the mean, median, and mode are close to one another, so any of them can be used as the replacement value.
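
As a rough illustration of this rule, here is a minimal sketch using only the Python standard library. The helper names `skewness` and `impute`, the cutoff of 0.5 for calling a feature "skewed", and the toy columns are all assumptions made for the example, not part of any library.

```python
import statistics

def skewness(values):
    """Population skewness; close to 0 for a symmetric sample."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return 0.0
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

def impute(column, skew_threshold=0.5):
    """Fill None entries with the median if the column is skewed, else the mean.

    skew_threshold is an assumed rule of thumb, not a fixed standard.
    """
    observed = [v for v in column if v is not None]
    if abs(skewness(observed)) > skew_threshold:
        fill = statistics.median(observed)
    else:
        fill = statistics.fmean(observed)
    return [fill if v is None else v for v in column]

# Toy data (illustrative only).
incomes = [30, 32, 35, None, 31, 33, 400]   # one large outlier, so skewed
heights = [160, 165, None, 170, 175, 180]   # roughly symmetric

filled_incomes = impute(incomes)   # missing value filled with the median
filled_heights = impute(heights)   # missing value filled with the mean
print(filled_incomes)
print(filled_heights)
```

In the skewed column the mean would be pulled far upward by the single outlier, which is why the median is the safer fill value there.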

What is a Measure of Central Tendency?

→ A measure of central tendency is a single value that summarizes the center of a whole dataset. The most common measures of central tendency are the mean, median, and mode.
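
A small sketch of the three measures, using Python's built-in `statistics` module on a made-up list of scores:

```python
import statistics

scores = [2, 3, 3, 5, 7, 10]   # toy data, illustrative only

mean = statistics.mean(scores)      # arithmetic average
median = statistics.median(scores)  # middle value of the sorted data
mode = statistics.mode(scores)      # most frequent value

print(mean, median, mode)
```

Here the mean is 5, the median is 4.0 (the average of the two middle values 3 and 5), and the mode is 3.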

What is the Central Limit Theorem?

→ The central limit theorem states that if you have a population with mean μ and standard deviation σ and take sufficiently large random samples of size n from the population with replacement, then the distribution of the sample means will be approximately normal, with mean μ and standard deviation σ/√n, regardless of the shape of the population distribution.
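
The theorem can be checked empirically with a short simulation. The sketch below is only an illustration under assumed settings: the population is exponential (strongly skewed, with μ = 1 and σ = 1), each sample has 50 observations, and 2000 samples are drawn.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

N = 50          # size of each sample (assumed to be "sufficiently large")
SAMPLES = 2000  # number of repeated samples

def sample_mean(n):
    # Exponential population with rate 1: skewed, mu = 1, sigma = 1.
    return statistics.fmean([random.expovariate(1.0) for _ in range(n)])

means = [sample_mean(N) for _ in range(SAMPLES)]

mean_of_means = statistics.fmean(means)   # should be close to mu = 1
sd_of_means = statistics.stdev(means)     # should be close to sigma / sqrt(n)
expected_sd = 1.0 / math.sqrt(N)

# Share of sample means within one standard deviation of the centre;
# for an approximately normal distribution this is roughly 68%.
within_one_sd = sum(
    abs(m - mean_of_means) <= sd_of_means for m in means
) / SAMPLES

print(mean_of_means, sd_of_means, expected_sd, within_one_sd)
```

Even though the individual observations are far from normal, the sample means cluster symmetrically around the population mean.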

What are Type I and Type II Errors?

→ A Type I error is also known as a false positive and occurs when a researcher incorrectly rejects a true null hypothesis.

→ A Type II error is also known as a false negative and occurs when a researcher fails to reject a null hypothesis that is actually false.
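
To make the two error types concrete, the following sketch simulates many repetitions of a simple two-sided z-test of H0: mean = 0. Everything about the setup is an assumption chosen for illustration: 25 observations per experiment, a known standard deviation of 1, a significance level of 0.05, and a true mean of 0.5 in the case where H0 is false.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

N = 25          # observations per experiment (assumed)
SIGMA = 1.0     # known population standard deviation (assumed)
Z_CRIT = 1.96   # two-sided critical value for alpha = 0.05
TRIALS = 4000   # number of simulated experiments

def rejects_null(true_mean):
    """Run one z-test of H0: mean = 0 and report whether H0 is rejected."""
    sample = [random.gauss(true_mean, SIGMA) for _ in range(N)]
    z = statistics.fmean(sample) / (SIGMA / math.sqrt(N))
    return abs(z) > Z_CRIT

# H0 is true (the mean really is 0): every rejection is a Type I error.
type_1_rate = sum(rejects_null(0.0) for _ in range(TRIALS)) / TRIALS

# H0 is false (the mean is really 0.5): every non-rejection is a Type II error.
type_2_rate = sum(not rejects_null(0.5) for _ in range(TRIALS)) / TRIALS

print(f"Type I rate ~ {type_1_rate:.3f}, Type II rate ~ {type_2_rate:.3f}")
```

The Type I rate should land near the chosen significance level, while the Type II rate depends on the sample size and on how far the true mean is from the null value.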

What is Inferential Statistics?

→ Inferential statistics is the process of using data analysis to infer the properties of an underlying probability distribution. It draws conclusions about a population from a sample, for example by testing hypotheses and deriving estimates.

What is the Empirical Rule?

→ For a normal distribution, about 68% of the values lie between μ-σ and μ+σ.

→ About 95% of the values lie between μ-2σ and μ+2σ.

→ About 99.7% of the values lie between μ-3σ and μ+3σ.
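
These three figures can be reproduced from the normal distribution itself: the probability of falling within k standard deviations of the mean is erf(k/√2). A minimal sketch (the function name `coverage` is just an illustrative choice):

```python
import math

def coverage(k):
    """Probability that a normal variable lies within k sigma of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} sigma: {coverage(k):.4f}")
```

The printed values are 0.6827, 0.9545, and 0.9973, which round to the familiar 68%, 95%, and 99.7%.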

What is Dispersion?

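→ Dispersion (also called variability) describes how spread out the values of a dataset are around its center. Common measures of dispersion are the range, variance, standard deviation, and inter-quartile range.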
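
A brief sketch of a few common spread measures on made-up data, using the standard `statistics` module (population versions are used here; `statistics.variance` and `statistics.stdev` give the sample versions):

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # toy data, illustrative only

value_range = max(data) - min(data)   # largest minus smallest value
variance = statistics.pvariance(data) # mean squared deviation from the mean
std_dev = statistics.pstdev(data)     # square root of the variance

print(value_range, variance, std_dev)
```

For this list the mean is 5, the range is 7, the population variance is 4, and the population standard deviation is 2.0.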

What is a Quartile?

→ Quartiles are measures of position, not of central tendency. The three quartiles Q1, Q2 (the median), and Q3 divide an ordered dataset into 4 subgroups of equal size.

What is the Inter-Quartile Range?

→ The difference between the third and the first quartile is known as the Inter-Quartile Range (Q3-Q1). It measures the spread of the middle 50% of the data.
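
Quartiles and the inter-quartile range can be computed with `statistics.quantiles` (Python 3.8+). This sketch uses the `"inclusive"` method on made-up data; other quartile conventions exist and can give slightly different cut points.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]   # toy data, illustrative only

# n=4 asks for the three cut points that split the data into four groups.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

print(q1, q2, q3, iqr)
```

Here Q1 = 3.0, Q2 = 5.0, Q3 = 7.0, so the inter-quartile range is 4.0.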

What is Feature Engineering?

→ Feature engineering is the process of using domain knowledge to extract features from raw data via data mining techniques. These features can be used to improve the performance of machine learning algorithms.
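
As one small, hypothetical example of the idea, a raw timestamp string is rarely useful to a model as-is, but domain knowledge suggests that the hour of day and whether it is a weekend may matter. The function name `engineer_features` and the chosen features are assumptions made for illustration.

```python
from datetime import datetime

def engineer_features(raw_timestamp):
    """Turn one raw timestamp string into simple numeric features."""
    ts = datetime.strptime(raw_timestamp, "%Y-%m-%d %H:%M:%S")
    return {
        "hour": ts.hour,
        "day_of_week": ts.weekday(),           # Monday = 0 ... Sunday = 6
        "is_weekend": int(ts.weekday() >= 5),  # 1 for Saturday or Sunday
    }

features = engineer_features("2020-07-25 14:30:00")
print(features)
```

Which derived features actually help depends on the problem, so choices like these are usually validated against model performance.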

What is Exploratory Data Analysis (EDA)?

→ Exploratory data analysis is an approach to analyzing data sets to summarize their main characteristics, often with visual methods. A statistical model can be used or not, but primarily EDA is for seeing what the data can tell us beyond the formal modeling or hypothesis testing task.

What is Principal Component Analysis?

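→ Principal component analysis (PCA) is a method that projects data from a higher-dimensional space into a lower-dimensional space. The new axes (the principal components) are orthogonal directions chosen so that the first captures as much of the variance in the data as possible, and each following one captures as much of the remaining variance as possible.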
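
To show the mechanics on the smallest possible case, the sketch below reduces made-up 2-D points to 1-D using only the standard library. It relies on the closed-form eigen-decomposition of a symmetric 2x2 covariance matrix, which is an illustrative shortcut; in practice a library routine (for example an SVD-based one) would normally be used for higher dimensions.

```python
import math
import statistics

# Toy 2-D data, illustrative only.
xs = [2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1]
ys = [2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9]

# Step 1: centre the data.
mx, my = statistics.fmean(xs), statistics.fmean(ys)
cx = [x - mx for x in xs]
cy = [y - my for y in ys]
n = len(xs)

# Step 2: sample covariance matrix [[a, b], [b, c]].
a = sum(u * u for u in cx) / (n - 1)
c = sum(v * v for v in cy) / (n - 1)
b = sum(u * v for u, v in zip(cx, cy)) / (n - 1)

# Step 3: eigenvalues and leading eigenvector of the symmetric 2x2 matrix.
half_trace = (a + c) / 2
radius = math.hypot((a - c) / 2, b)
lam1 = half_trace + radius            # variance along the first component
lam2 = half_trace - radius            # variance along the second component
theta = 0.5 * math.atan2(2 * b, a - c)
pc1 = (math.cos(theta), math.sin(theta))   # unit vector of the first component

# Step 4: project every centred point onto the first component (2-D to 1-D).
projected = [u * pc1[0] + v * pc1[1] for u, v in zip(cx, cy)]

explained = lam1 / (lam1 + lam2)   # share of total variance that is kept
print(pc1, explained)
```

The variance of `projected` equals the largest eigenvalue `lam1`, which is exactly the "maximum variance" property that defines the first principal component.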