Statistics for Data Science

MANPREET KAUR
Sep 6, 2022


Data science is a constantly evolving field, and statistics is a large part of it. Statistical methods help us analyze the underlying patterns in data so that we can make well-founded decisions.

What is Statistics?

Statistics is a branch of mathematics that provides methods for collecting, analyzing, interpreting, and presenting data. The goal of statistics is to understand the world by examining data.

Data can be described as “facts or pieces of information.” For example, consider the ages of five students: {18, 19, 21, 23, 25}. With the help of statistics, we can calculate the mean (average), median, and mode of these ages.
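As a small illustration, the three summaries for the ages above can be computed with Python's standard-library statistics module. This is a minimal sketch. It assumes Python 3.8 or later, where statistics.multimode is available. Because every age occurs exactly once, multimode returns all five values, which means this dataset has no single mode.

```python
import statistics

# Student ages from the example above
ages = [18, 19, 21, 23, 25]

mean_age = statistics.mean(ages)      # sum of values / number of values
median_age = statistics.median(ages)  # middle value of the sorted data
modes = statistics.multimode(ages)    # all values tied for the highest count

print(f"mean={mean_age}, median={median_age}, modes={modes}")
# prints: mean=21.2, median=21, modes=[18, 19, 21, 23, 25]
```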

Why is Statistics important for Data Science?

Data scientists typically spend a large share of their time pre-processing data, and this work requires a good understanding of statistics. A few general steps are usually performed to process any dataset and extract insights:

1. Identifying the importance of features using various statistical tests.

2. Finding relationships between features to detect and eliminate duplicate (redundant) features.

3. Converting the features into the required format.

4. Normalizing and scaling the data. This step also involves identifying the distribution of the data (e.g., with histograms) and the nature of the data.

5. Applying any required adjustments to the data before passing it on for further processing.

6. Identifying the right model once the data has been processed.

7. Verifying the results with evaluation metrics suited to the problem type (classification or regression).
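Steps 2 and 4 can be sketched with the standard library alone. The example uses Pearson correlation to flag a redundant feature and two common rescalings: z-score standardization and min-max scaling. It is illustrative only. The feature values are hypothetical, as is the same height recorded in two units. The helper names pearson, z_score, and min_max are also hypothetical, and the 0.95 redundancy threshold is an arbitrary choice.

```python
import math
import statistics

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def z_score(values):
    """Standardize to mean 0 and population standard deviation 1."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]

def min_max(values):
    """Linearly rescale values into the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical features: the same height recorded in centimetres and in inches
height_cm = [160.0, 172.0, 181.0, 168.0, 175.0]
height_in = [h / 2.54 for h in height_cm]

r = pearson(height_cm, height_in)
if abs(r) > 0.95:  # illustrative threshold for "redundant"
    print(f"r = {r:.3f}: height_in looks redundant with height_cm")

print([round(z, 2) for z in z_score(height_cm)])
print(min_max(height_cm))
```

Z-scores suit methods that assume centred data. Min-max scaling keeps values in a fixed range. Which one is appropriate depends on the model chosen in step 6.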

Types of Statistics

There are two types of statistics that we can use for a given problem: descriptive and inferential.

1. Descriptive statistics: This consists of organizing and summarising data, both numerically (e.g., mean, median, and standard deviation) and with plots such as histograms, pie charts, bar charts, and boxplots. It is used in exploratory data analysis and feature engineering. For instance, suppose we are given the age, height, and weight of university students. Using descriptive statistics, we can answer questions such as: What is the average age of the students in the classroom? What kind of relationship exists between the students' weight and height?

2. Inferential statistics: This consists of collecting sample data and drawing conclusions about the population from it, using techniques such as hypothesis testing. For example, inferential statistics can answer the question: Is the average age of the students in a classroom less than, greater than, or equal to the average age of all students in the university?
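One common way to test the classroom question is a two-sided one-sample t-test. The sketch below compares the example ages with a university-wide mean age. It rests on several assumptions: the university mean of 22 is hypothetical, the classroom is treated as a random sample from an approximately normal population, and the critical value 2.776 comes from a standard t-table (α = 0.05, two-sided, 4 degrees of freedom). For the "less than" or "greater than" versions of the question, a one-sided test with a different critical value would be used instead. SciPy's scipy.stats.ttest_1samp can compute p-values directly if SciPy is available.

```python
import math
import statistics

def one_sample_t(sample, pop_mean):
    """t statistic for H0: the population mean behind `sample` equals pop_mean."""
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean
    return (statistics.mean(sample) - pop_mean) / se

classroom_ages = [18, 19, 21, 23, 25]
university_mean = 22.0  # hypothetical population mean, assumed known

t = one_sample_t(classroom_ages, university_mean)

# Two-sided critical value for alpha = 0.05 with df = n - 1 = 4 (t-table value;
# only valid for this sample size)
T_CRIT_DF4 = 2.776

if abs(t) > T_CRIT_DF4:
    print(f"t = {t:.3f}: reject H0 at the 5% level")
else:
    print(f"t = {t:.3f}: no evidence that the classroom mean differs from {university_mean}")
# prints: t = -0.625: no evidence that the classroom mean differs from 22.0
```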

Other key concepts needed to understand the fundamentals of statistics for data science include probability, sampling and its types, types of variables, and regression/classification problems.

Thank you for reading my article. Please leave a comment below; I’d love to know what you think.


Written by MANPREET KAUR

Data Science and Analytics Professional
