In the 1970s, John Tukey proposed a new definition of statistics: rather than treating it as a purely mathematical science, he argued that deriving hypotheses from data was the future. It was both a reform of statistics and the announcement of an as-yet unrecognized science. That science has long since been known as Data Science, and it draws on computer science, mathematics, and statistics as well as the applied sciences.

In this series of articles, I will cover the basic parts of statistics that are crucial for a data scientist, and I will try to answer the following questions:

· What is statistics?

· Why should I learn statistics?

· How can statistics help me in my profession?

**Definitions and Concepts**

For the sake of formality, let us start with the Wikipedia definition of statistics:

“Statistics is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data.”

There are two basic types of data sets, namely, a population and a sample.

A **population** is the collection of all outcomes, responses, measurements, or counts that are of interest.

Meanwhile, a **sample** is a subset of a population. Sample data are used to form conclusions about populations.
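To make the distinction concrete, here is a minimal Python sketch. The population is simulated (the heights, the population size of 10,000 and the sample size of 200 are all assumptions chosen purely for illustration), and the sample is drawn from it at random without replacement. The point is only that a statistic computed on the sample, such as the mean, can stand in for the same statistic of the population.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: simulated heights (in cm) of 10,000 people.
population = [random.gauss(170, 10) for _ in range(10_000)]

# A sample is a subset of the population: 200 values drawn without replacement.
sample = random.sample(population, k=200)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
```

The two means will usually be close but not identical; how close depends on the sample size and on how the sample was drawn.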

The figure below shows an example of systematic sampling, in which the sample is built by picking specimens according to a fixed pattern rather than at random.
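The same idea can be sketched in a few lines of Python. The helper name `systematic_sample`, the population of 20 numbered specimens and the step of 4 are all hypothetical choices made for this illustration; the pattern used here is simply "take every fourth specimen, starting from the second one".

```python
def systematic_sample(items, step, start=0):
    """Return every `step`-th element of `items`, beginning at index `start`."""
    if step < 1:
        raise ValueError("step must be a positive integer")
    return items[start::step]


# Hypothetical population: specimens numbered 1 to 20.
population = list(range(1, 21))

# Fixed pattern: start at index 1 (specimen 2), then take every 4th specimen.
sample = systematic_sample(population, step=4, start=1)
print(sample)  # [2, 6, 10, 14, 18]
```

Because the pattern is fixed, running the code again always yields the same sample, which is exactly what separates this approach from random sampling.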