Data: The New Oil of the 21st Century

Tej Narayan
Published in Analytics Vidhya
8 min read · Jul 13, 2020


“You can have data without information, but you cannot have information without data.”

– Daniel Keys Moran

Source: Unsplash by Grant Durr

Data is the collection of facts, figures, and information that can be processed by computers to perform various tasks. Data can be numerical, textual, audio, visual, or any other form that can be stored and analyzed. Data is important because it is the source of knowledge, insight, and innovation for individuals, organizations, and society.

Data has become the new oil of the 21st century because it has immense value and potential for generating wealth, improving lives, and solving problems. Data is the fuel that powers the digital economy and the engine of artificial intelligence. Data is also the raw material that can be refined, transformed, and used for various purposes, such as decision making, prediction, optimization, and personalization.

Data is a necessity for most companies across the world because it gives them a competitive edge and helps them understand their customers, markets, products, and processes better. Data enables companies to create value by offering better products and services, enhancing customer experience and loyalty, increasing efficiency and productivity, and reducing costs and risks.

There are millions of data scientists, researchers, statisticians, computer scientists, and other professionals who work with data every day to discover what is hidden in it and how to make it speak and serve. They use various methods and tools to collect, clean, analyze, visualize, and interpret data, and apply various statistical and machine learning algorithms to extract insights and patterns from it. They use data to create solutions, innovations, and opportunities for various domains and sectors, such as health, education, business, finance, agriculture, energy, environment, security, and entertainment.

Companies like Amazon, Google, Walmart, Alibaba, and many others base their business on data collected from their consumers. They use data to analyze the behavior, preferences, lifestyles, and habits of their customers, and to offer them personalized recommendations, offers, and services. They use data to optimize their operations, logistics, inventory, pricing, and marketing. They use data to enhance their innovation, creativity, and growth. They use data to create value for themselves and their customers.

Data is also the key to unlocking many mysteries and challenges of the world and humanity. Data can help us understand the past, present, and future of our planet, our society, our culture, and ourselves. Data can help us find answers to questions, such as how the universe began, how life evolved, how the brain works, how diseases can be cured, how climate change can be mitigated, how poverty can be eradicated, and how peace can be achieved.

Data is a powerful and precious resource that can be used for good or evil, for benefit or harm, for progress or regression. Data is a responsibility and a right that requires careful and ethical handling, protection, and governance. Data is a gift and a challenge that demands curiosity, creativity, and collaboration.

Data is the new oil of the 21st century, and we are the explorers, the refiners, the users, and the guardians of it.

“In God we trust, all others bring data.” — W Edwards Deming

According to the Oxford dictionary, data is “facts and statistics collected together for reference or analysis”.

Statistics is not a new concept; the term can be traced back to Germany around 1749. Since then, it has undergone many transformations, from being used mainly for demographic purposes, such as counting the population, to encompassing all kinds of data collection and analysis. Statistics is now a powerful tool for making sense of the vast and complex information that surrounds us.

Statistics is the study of how to learn from data, which are pieces of information that can be measured or described, often with numbers. Data can come from observing one or more people or things and their qualities or quantities.

Building on data and its analysis, a newer discipline has emerged that we call data science, with millions of engineers now working as data scientists or statisticians. Harvard Business Review has called data scientist the sexiest job of the 21st century. Data science is an interdisciplinary field focused on extracting knowledge from data sets, that is, on using data to “understand and analyze actual phenomena”, or, put simply, on making sense of data. Much of the credit goes to John W. Tukey, who wrote about data analysis in his 1962 paper “The Future of Data Analysis”.

How is all this data being generated? Data is used in every sector of business, such as social media, e-commerce, banking, government, and entertainment. Here are some awe-inspiring facts:

1. Google processes over 1.2 trillion searches per year, or about 3.5 billion searches per day, which works out to roughly 40,000 search queries per second.

2. On YouTube/Facebook, users send 31.25 million messages and watch 2.77 million videos each minute.

3. On WhatsApp, over 55 billion messages and 4.5 billion photos are sent each day.

4. Retail giants like Walmart handle more than 1 million customer transactions every hour.

5. According to forecasts, the volume of digital data will increase to 163 zettabytes by 2025.
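The per-day, per-year, and per-second search figures in the first item are the same quantity in different units. The conversion can be checked with a few lines of Python. This is only an arithmetic sketch: the function names are made up for illustration, and the 3.5 billion input is the figure quoted above, not a new measurement.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds in a day


def per_second(per_day: float) -> float:
    """Convert a daily count into an average per-second rate."""
    return per_day / SECONDS_PER_DAY


def per_year(per_day: float, days: int = 365) -> float:
    """Convert a daily count into a yearly total (365-day year assumed)."""
    return per_day * days


searches_per_day = 3.5e9  # figure quoted in the list above

print(f"{per_second(searches_per_day):,.0f} searches per second")
print(f"{per_year(searches_per_day) / 1e12:.1f} trillion searches per year")
```

The results land close to the rounded numbers in the list: a little over 40,000 searches per second and a little under 1.3 trillion per year.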

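Let’s look at data in more detail. In statistics, data is usually described with four measurement scales (nominal, ordinal, interval, and ratio); cardinal numbers are often listed alongside them, which gives five: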

Nominal, Ordinal, Interval, Ratio and Cardinal.

Nominal: from the Latin nominalis, which means “pertaining to names”. It is another name for a category. E.g. gender (male, female) or hair color (brown, black, blue, green, etc.).

Ordinal: means “in order”, as in “first”, “second”, and “ninety-ninth”. E.g. class ranking (1st, 2nd, 90th, …) or socioeconomic status (poor, middle class, rich, super rich).

Interval: values are ordered and equally spaced, so the differences between them mean something, but the zero point is arbitrary. E.g. temperature in Celsius or Fahrenheit, where every degree on the thermometer is the same size.

Ratio: the same as the interval scale, except that zero on the scale means the quantity does not exist, so ratios are meaningful as well. E.g. age, weight, height, sales figures, years of education.

Cardinal: a cardinal number, sometimes called a “counting number”, is used for counting, as when we count 1, 2, 3, … We use these numbers to answer questions like “how many?”
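One practical way to tell the scales apart is to ask which operations make sense on each. The sketch below uses small made-up values (not taken from any dataset) and only Python's standard library. The variable names are illustrative assumptions.

```python
from statistics import median_low, mode

# Nominal: labels only, so the most frequent label (mode) is the natural summary.
hair_colors = ["brown", "black", "brown", "green"]
most_common_color = mode(hair_colors)

# Ordinal: labels with an order, so we can rank them and take a median,
# but the size of the gap between two levels is undefined.
levels = ["poor", "middle class", "rich", "super rich"]
rank = {label: i for i, label in enumerate(levels)}
observed = ["rich", "poor", "middle class", "middle class", "super rich"]
median_status = levels[median_low(rank[s] for s in observed)]

# Interval: differences are meaningful, ratios are not,
# because the zero point depends on the unit.
temps_c = [10.0, 20.0]
temps_f = [c * 9 / 5 + 32 for c in temps_c]   # same temperatures in Fahrenheit
difference_c = temps_c[1] - temps_c[0]        # "10 degrees warmer" is meaningful
ratio_c = temps_c[1] / temps_c[0]             # looks like "twice as hot" ...
ratio_f = temps_f[1] / temps_f[0]             # ... but not in Fahrenheit

# Ratio: a true zero exists, so ratios survive a change of unit.
weights_kg = [30.0, 60.0]
weights_lb = [w * 2.20462 for w in weights_kg]
ratio_kg = weights_kg[1] / weights_kg[0]
ratio_lb = weights_lb[1] / weights_lb[0]

print(most_common_color, median_status)
print(difference_c, ratio_c, round(ratio_f, 2))
print(ratio_kg, round(ratio_lb, 2))
```

The temperature ratio changes when the unit changes, while the weight ratio stays the same. That is the difference between an interval scale and a ratio scale.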

However, at the highest level, two kinds of data exist:

Quantitative and Qualitative.

Quantitative data deals with numbers and things we can measure: dimensions such as height, width, and length, as well as temperature, humidity, prices, area, and volume. There are two types of quantitative data: continuous and discrete. Commonly, counts are discrete and measurements are continuous.

· Continuous data can be divided and reduced to finer and finer levels. For example, we can measure the height of a child at progressively more precise scales (meters, centimeters, and beyond), so height is continuous data.

· Discrete data is a count that can’t be made more precise. It involves integers. E.g. the number of children (or adults, or pets) in a family is discrete data, because we are counting whole, indivisible entities: we can’t have 2.5 kids or 1.3 pets.

Qualitative data deals with characteristics and descriptors that can’t be easily measured, but can be observed subjectively — such as smells, tastes, textures, attractiveness, and color.
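As a rough illustration of the split described above, the following sketch sorts a column of values into one of the three kinds. It is a deliberately simple heuristic built for this article, not a standard library routine: it treats whole numbers as counts and decimals as measurements, which will not hold for every real dataset (a height rounded to whole centimeters would be labeled discrete, for instance).

```python
def describe(values):
    """Rough heuristic: label a list of values by the kind of data it holds."""

    def is_int(v):
        # bool is a subclass of int in Python, so exclude it explicitly
        return isinstance(v, int) and not isinstance(v, bool)

    def is_number(v):
        return isinstance(v, (int, float)) and not isinstance(v, bool)

    if all(is_int(v) for v in values):
        return "quantitative (discrete)"
    if all(is_number(v) for v in values):
        return "quantitative (continuous)"
    return "qualitative"


children_per_family = [2, 0, 3, 1]           # counts
heights_m = [1.32, 1.405, 1.28]              # measurements
textures = ["smooth", "rough", "silky"]      # descriptors

for name, column in [
    ("children_per_family", children_per_family),
    ("heights_m", heights_m),
    ("textures", textures),
]:
    print(name, "->", describe(column))
```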

“Torture the data, and it will confess to anything.” — Ronald Coase

Source: gettyimages.co.uk

To illustrate the value and potential of data, we will examine a simple dataset that contains information about the Nobel prize winners from 1901 to 2016. This dataset was obtained from Kaggle, a platform for data science and machine learning. The dataset has 979 rows, each representing a Nobel laureate, and various attributes, such as name, country, category, year, and motivation.

By exploring this dataset, we can learn many interesting facts and stories about the Nobel prize and its recipients, such as the youngest and oldest winners, the most common categories and countries, the gender gap, and the controversies and scandals.

Summary of dataset

Among the top 6 countries by number of Nobel winners, the US dominates the list, followed by the UK and Germany.

Looking at the city-wise details, New York contributed the most, followed by Paris and London:

The result shows that 29% of winners are from the US, followed by the UK, Germany, and France:
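Rankings and percentage shares like the ones above come down to counting how often each value appears in a column. Here is a minimal sketch of that step using only the standard library. The column name "Birth Country" and the helper name top_shares are assumptions made for illustration, and the rows are placeholder values rather than records from the Nobel dataset; the linked notebook may compute these differently.

```python
from collections import Counter


def top_shares(rows, column, k=6):
    """Return the k most frequent values in `column` as (value, count, percent)."""
    # Skip rows where the column is missing or empty
    values = [row[column] for row in rows if row.get(column)]
    counts = Counter(values)
    total = len(values)
    return [
        (value, n, round(100 * n / total, 1))
        for value, n in counts.most_common(k)
    ]


# Placeholder rows, NOT taken from the Nobel dataset.
rows = [
    {"Birth Country": "Country A"},
    {"Birth Country": "Country A"},
    {"Birth Country": "Country B"},
    {"Birth Country": ""},  # missing value, ignored
]

print(top_shares(rows, "Birth Country", k=2))
# [('Country A', 2, 66.7), ('Country B', 1, 33.3)]
```

With the real file, the rows could come from csv.DictReader, and the same function could be pointed at a city or organization column instead.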

Comparison of Sex and Category by Year
Category wise winners
Winners multiple times
Top 15 Organizations with maximum number of Laureates

University of California dominates followed by Harvard and MIT.

Motivations for Laureates

This is an interesting finding: each of the top five motivations is shared by more than 6 winners.

If we look at the life span of Nobel laureates, three winners reached 103, and eight of them reached or crossed the 100 mark.
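Life spans like these can be derived from a birth date and a death date. The sketch below shows one way to compute completed years. The function name and the example dates are hypothetical, and it assumes the dates are stored as ISO-style YYYY-MM-DD strings, which may not match the actual file.

```python
from datetime import date


def age_in_years(born: date, died: date) -> int:
    """Completed years between a birth date and a death date."""
    years = died.year - born.year
    # Subtract one if the final birthday had not yet been reached
    if (died.month, died.day) < (born.month, born.day):
        years -= 1
    return years


# Hypothetical dates for illustration only, not a real laureate.
born = date.fromisoformat("1900-06-15")
died = date.fromisoformat("2003-06-14")

print(age_in_years(born, died))
```

Comparing (month, day) pairs avoids the small errors that come from dividing a day count by 365.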

Maximum age — Winners
Cities where maximum winners died

Another interesting fact: judging by where they died, Nobel laureates seem to have preferred to spend their old age in Paris, followed by Cambridge and London.

These are just a few exploration results, and more can be found. The complete analysis report is available here: https://www.kaggle.com/tyadav/nobel-laureate-eda/notebook

Some of the possible applications of data are:

· Helping computers act intelligently

· Solving medical mysteries

· Helping make better decisions

· Helping find solutions to complex business problems

· Powering tomorrow’s innovations

Conclusion:

Data is a vital asset for organizations and governments, as it can reveal insights, patterns, and trends that can help solve various problems and improve lives. However, data alone is not enough; it needs to be processed, analyzed, and communicated in a meaningful way. This document summarizes what data is, how it can be used, and how to make it speak and benefit from it.

(Note: The dataset available on Kaggle covers records from 1901 to 2016. I cannot comment on the accuracy of the data; this analysis is for exploration and visualization only, for learning purposes.)

Thanks for reading, please share your feedback at tej_on@outlook.com.


Tej Narayan
Analytics Vidhya

Data Scientist, Passion writing, Data Visualization, Story telling.