Big data: The power of information and the data around us

For some time now, the term Big Data has been gaining prominence and relevance, to the point that it has become one of the disruptive technological paradigms of this century.
Date 21/10/2016
Category Big Data
Big Data refers to the storage of large amounts of data and the procedures used to find repetitive patterns within that data.

We first heard this term in 2004, when two Google engineers published a paper entitled "MapReduce: Simplified Data Processing on Large Clusters".

In it they described a programming model, which they called MapReduce, designed to meet Google's own needs: it simplified the processing of large volumes of data across clusters of machines (we will talk about it in future posts).
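To give a rough idea of how the model works, here is a minimal, single-machine sketch of MapReduce applied to its classic word-count example. The function names are illustrative; a real framework like the one described in the paper distributes these two phases across many machines:

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit an intermediate (word, 1) pair for every word seen."""
    for doc in documents:
        for word in doc.split():
            yield word, 1

def reduce_phase(pairs):
    """Reduce: sum the counts emitted for each distinct word."""
    counts = defaultdict(int)
    for word, count in pairs:
        counts[word] += count
    return dict(counts)

docs = ["big data big clusters", "data processing on large clusters"]
print(reduce_phase(map_phase(docs)))
```

The appeal of the model is that the programmer only writes the two functions; the framework handles partitioning the input, scheduling the work and regrouping intermediate pairs by key.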

Looking back a little, the first process we could describe as Big Data was born to compile statistics, and it relied on punch cards prepared by operators.

Over time, users began to interact directly with the machines and operators were no longer needed. Nothing was more important to IT departments than attending to the requests of those users, who demanded quick answers in the processing and use of that data. Information was growing exponentially and becoming more accessible to users, who in turn demanded a large number of reports.

The first query tools appeared, allowing users to generate their own reports, and concepts such as Data Warehouse, Data Mart or Business Analytics began to be used. Big Data had begun to take shape: a model that is here to stay, and one we will have to become familiar with, knowing where and how it is present.

The importance of categorizing data

It is very important to classify data when working with large volumes of information. Two of the most commonly used classification criteria in Big Data relate to data structure and data origin.

As far as structure is concerned, data is usually organized into two categories: structured data and unstructured data.

Managing unstructured data has become one of the main challenges companies face in information management and Big Data: it is data that is not stored in a traditional database, and its growth rate is much higher than that of structured data.

We speak of structured data when its length and format are well defined. It typically represents around 20% of the data a company manages.
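A quick, invented illustration of the difference: a structured record can be loaded and queried directly because every field has a known name and format, while extracting the same figure from free text requires interpretation (here, a deliberately fragile regular expression):

```python
import csv, io, re

# Structured data: fixed schema, trivially queryable.
structured = "customer_id,amount,currency\n1001,250.00,EUR\n1002,99.90,EUR\n"
rows = list(csv.DictReader(io.StringIO(structured)))
total = sum(float(r["amount"]) for r in rows)

# Unstructured data: free text with no schema; the same values must
# be extracted by pattern matching, which breaks if the wording changes.
unstructured = "The customer paid 250.00 EUR last Tuesday, then 99.90 EUR."
amounts = [float(m) for m in re.findall(r"(\d+\.\d{2}) EUR", unstructured)]

print(round(total, 2), round(sum(amounts), 2))  # prints: 349.9 349.9
```

The fragility of that second path is precisely why unstructured data is the harder management challenge.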

As for origin, it is very diverse: data generated on social networks, data from email, documents such as Word, Excel or PowerPoint files, and so on. Processing it requires specific architectures in which scalability is an essential feature, given the growing processing and storage needs.

There is no single criterion for classifying the origin of the data, but we can think of these groups:

  • Internet: click data, search engines, web content and information from the various social networks (Twitter, Facebook, LinkedIn, ...).
  • Machine to Machine (M2M): communication between machines, such as RFID data, GPS and sensors that capture temperature, light, height, pressure, sound, and so on.
  • Biometrics: facial recognition and genetic information.
  • Created by human beings: medical reports, recordings, emails, etc.
  • Transactions: communication logs, billing records, banking operations, etc.

Internet of Things and Big Data

Another concept closely related to Big Data is the Internet of Things (IoT), which refers to the technology that allows all kinds of things to be connected to the Internet. Last year IBM announced that the Internet of Things would become the leading source of information for Big Data.

By combining the two technologies, Big Data and the Internet of Things, projects become possible such as the one carried out by the McLaren-Honda Formula 1 team in collaboration with IBM, in which information collected from 160 sensors fitted to the car was transmitted in real time to cloud services and analysed by the IBM Watson cognitive computing platform.

The use of this technology has allowed the team to make decisions in real time during a race, based on information collected and analyzed through Big Data and cognitive computing.
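The kind of real-time aggregation involved can be sketched very simply. The sensor names and readings below are invented for illustration; the actual McLaren-Honda system streams far richer telemetry to cloud services:

```python
from statistics import mean

# Hypothetical telemetry stream: each reading tags a value with its sensor.
telemetry = [
    {"sensor": "tyre_temp_fl", "value": 98.0},
    {"sensor": "tyre_temp_fl", "value": 104.5},
    {"sensor": "brake_temp_r", "value": 612.0},
    {"sensor": "tyre_temp_fl", "value": 101.2},
]

def summarize(readings, sensor):
    """Average the readings received so far from one sensor."""
    values = [r["value"] for r in readings if r["sensor"] == sensor]
    return mean(values) if values else None

# A pit-wall dashboard might compare this average against a safe limit
# to support in-race decisions.
avg = summarize(telemetry, "tyre_temp_fl")
print(f"tyre_temp_fl avg: {avg:.1f}")  # prints: tyre_temp_fl avg: 101.2
```

In a real deployment the per-sensor aggregation would run continuously over the incoming stream rather than over a stored list.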

If the analyses and statistics hold, Fernando Alonso will once again become Formula 1 World Champion, as Big Data predicts.

If you want to know more about Big Data and the Internet of Things, I encourage you to enroll in one of the courses we organize periodically; you can request more information at



Fernando Bonet