Big Data Basics: A Beginner's Tutorial

This tutorial explains the fundamentals of Big Data: its definition, where the data comes from, how much of it there is, the “Three V’s” that characterize it, and what it can be used for.

Definition of Big Data

According to Gartner in 2012, “Big Data are high-volume, high-velocity, and/or high-variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization.”

In simpler terms, Big Data refers to the massive amounts of data collected and stored from sources such as:

  • Web Data, E-commerce
  • Purchases at department or grocery stores
  • Bank/Credit Card Transactions
  • Social Networks

How Much Data Are We Talking About?

  • Google processes 20 PB (Petabytes) a day (Statistics: year 2008)
  • 1 PB = 10^15 bytes = 1 million gigabytes = 1 thousand terabytes
  • Facebook had 2.5 PB of user data + 15 TB/day (Statistics: April 2009)
  • eBay had 6.5 PB of user data + 50 TB/day (Statistics: May 2009)
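
To make these magnitudes concrete, here is a minimal Python sketch of the unit arithmetic. It assumes decimal (SI) units, matching the 1 PB = 10^15 bytes convention above; the helper name `convert` is purely illustrative.

```python
# Decimal (SI) storage units, matching 1 PB = 10^15 bytes as stated above.
BYTES_PER_UNIT = {"GB": 10**9, "TB": 10**12, "PB": 10**15}

def convert(value, from_unit, to_unit):
    """Convert a storage size between decimal units (illustrative helper)."""
    return value * BYTES_PER_UNIT[from_unit] / BYTES_PER_UNIT[to_unit]

print(convert(1, "PB", "TB"))   # 1000.0
print(convert(1, "PB", "GB"))   # 1000000.0
print(convert(20, "PB", "TB"))  # 20000.0 (Google's daily volume, 2008 figure)

# Days of new data at 15 TB/day needed to match Facebook's stored 2.5 PB
# (April 2009 figures), assuming the daily rate stays constant.
days = convert(2.5, "PB", "TB") / 15
print(round(days))              # 167
```

In other words, at the 2009 rate Facebook was adding roughly as much data every five to six months as it already held.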

The Three V’s of Big Data

Big Data is often characterized by the “Three V’s”:

  • High-Volume: The sheer amount of data.
  • High-Velocity: The speed at which data is generated, collected, and processed.
  • High-Variety: The different types of data, such as:
    • Text, audio, video, image data, XML
    • Relational data (e.g., tables, transactions, legacy systems)
    • Graph data (semantic web, social networks)
    • Streaming data (data that can only be scanned once)
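
The "scanned once" constraint on streaming data can be illustrated with a small Python sketch. It is only an illustration, not a production streaming system: the function name `one_pass_stats` and the sample readings are made up for this example. The point is that each value is seen exactly once and only a constant amount of state is kept.

```python
def one_pass_stats(stream):
    """Consume an iterable once, keeping only constant-size running state."""
    count = 0
    mean = 0.0
    maximum = None
    for x in stream:
        count += 1
        mean += (x - mean) / count  # incremental update of the running mean
        if maximum is None or x > maximum:
            maximum = x
    return count, mean, maximum

# A generator can be iterated only once, which mimics a data stream.
readings = (v for v in [3, 8, 1, 6])
print(one_pass_stats(readings))  # (4, 4.5, 8)
```

Because nothing is stored beyond the three running values, the same approach works no matter how long the stream is.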

What Can Be Done with Big Data?

  • Aggregation and Statistics: Data warehousing and OLAP (Online Analytical Processing).
  • Indexing, Searching, and Querying:
    • Keyword-based search
    • Pattern matching (XML/RDF)
  • Knowledge Discovery:
    • Data Mining
    • Statistical Modeling
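
As a rough illustration of the first two categories, the Python sketch below shows a group-by total (the kind of roll-up a data warehouse or OLAP tool performs) and a tiny inverted index for keyword-based search. The records, documents, and function names are all hypothetical; real systems do the same work at far larger scale and usually across many machines.

```python
from collections import defaultdict

# Hypothetical purchase records: (store, category, amount)
purchases = [
    ("north", "grocery", 12.0),
    ("north", "clothing", 40.0),
    ("south", "grocery", 8.0),
    ("south", "grocery", 5.0),
]

def total_by(records, key_index):
    """Aggregation: sum the amount (field 2) grouped by the chosen field."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec[key_index]] += rec[2]
    return dict(totals)

def build_index(docs):
    """Indexing: map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search(index, keyword):
    """Keyword-based search: look up one word in the inverted index."""
    return sorted(index.get(keyword.lower(), set()))

print(total_by(purchases, 0))  # {'north': 52.0, 'south': 13.0}
print(total_by(purchases, 1))  # {'grocery': 25.0, 'clothing': 40.0}

# Hypothetical documents keyed by id
docs = {1: "big data basics", 2: "data mining tutorial", 3: "hadoop basics"}
index = build_index(docs)
print(search(index, "data"))    # [1, 2]
print(search(index, "Basics"))  # [1, 3]
```

Building the index once up front means each search is a dictionary lookup rather than a scan over every document.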

Hadoop, an open-source framework for distributed storage and processing across clusters of machines, was developed to handle the growing demands of Big Data.
