I analyze, therefore I am!

Author: Kadir KUĞU, Software Consultant / Senior Product Manager

“According to estimates, by 2020, there will be four times more digital data than all the sand on the Earth.” This IBM remark about big data sounds like a magazine headline, but it gives us an idea of how much data surrounds us. Yet Big Data is not only about volume: variety, velocity and veracity must also be addressed. These four characteristics are referred to as the 4Vs of Big Data. Since the internet spread first to mobile phones and then to all kinds of objects, the data being produced has become more varied, its volume has grown and it is now generated at a dizzying speed.

Figure 1: Digital Around The World in 2018 [1]

A few years ago, the world population was expected to reach 8 billion by 2020; as of 2018, this prediction has almost come true. Another prediction is that the number of connected devices worldwide will reach 50 billion by 2020.

Figure 2: Number of connected devices worldwide from 2012 to 2020 [2]

The transformation of data into big data is disrupting business models and social life in the modern world. By analyzing big data, it is now possible to deliver products to more customers, even to develop customer-specific products and campaigns, and to increase energy and time efficiency. In the near future, a company’s assets will be directly proportional to the value of its data and its ability to analyze that data. Let’s take a look at some everyday uses of big data analysis.

  • One of the biggest problems in the business world is finding qualified human resources and then keeping them. While it is getting difficult to find the right person, it is getting much harder to retain that person in today’s competitive environment. Is it possible to predict when employees will quit by collecting the right data and analyzing it correctly?
  • Perhaps the fiercest competition in the business world is about acquiring new customers and maintaining the loyalty of existing ones. Losing customers is referred to in the literature as Customer Churn. To cope with this problem, millions of dollars are spent on television commercials and special campaigns. Technically, this is the same problem as the one above: by putting the customer in place of the employee in that scenario, it can be addressed with big data analysis in the same way.

In the example below, employees who are likely to resign can be predicted with 80% accuracy using sample company data on the Knime platform. This rate can be increased further by cleaning the data, trying different data analysis filters and algorithms, and ensuring the volume, variety and veracity of the data set.

Figure 3: A sample solution of the Churn Problem on Knime
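The following is a minimal sketch of the same churn idea in plain Python; it does not reproduce the Knime workflow in Figure 3. It assumes hypothetical employee records (satisfaction score, weekly overtime hours, years at the company, whether the person left), drops rows with missing values as a basic cleaning step, fits a decision stump (a one-level decision tree, chosen here only for brevity) and measures accuracy on held-out rows.

```python
# Hypothetical employee records: (satisfaction 0-10, overtime hours/week,
# years at company, left_company). None marks a missing value.
TRAIN = [
    (2, 15, 1, 1), (3, 12, 2, 1), (4, 10, 1, 1), (8, 2, 5, 0),
    (7, 4, 6, 0), (9, 1, 3, 0), (6, 8, 2, 0), (3, 5, 7, 0),
    (None, 6, 4, 1), (5, 14, 1, 1),
]
TEST = [(4, 11, 2, 1), (8, 3, 4, 0), (2, 7, 1, 1), (7, 6, 5, 0)]
FEATURES = ["satisfaction", "overtime_hours", "years_at_company"]


def clean(rows):
    """Drop rows with a missing value, the simplest form of data cleaning."""
    return [r for r in rows if all(v is not None for v in r)]


def predict_one(row, feature, threshold, leave_if_below):
    """Predict 1 (will leave) or 0 (will stay) from a single feature test."""
    below = row[feature] <= threshold
    return int(below == leave_if_below)


def accuracy(rows, model):
    """Fraction of rows whose prediction matches the recorded outcome."""
    hits = sum(predict_one(r, *model) == r[-1] for r in rows)
    return hits / len(rows)


def fit_stump(rows):
    """Try every (feature, threshold, direction) and keep the most accurate."""
    best_model, best_acc = None, -1.0
    for feature in range(len(FEATURES)):
        for threshold in sorted({r[feature] for r in rows}):
            for leave_if_below in (True, False):
                model = (feature, threshold, leave_if_below)
                acc = accuracy(rows, model)
                if acc > best_acc:
                    best_model, best_acc = model, acc
    return best_model


train, test = clean(TRAIN), clean(TEST)
model = fit_stump(train)
print("split on", FEATURES[model[0]], "at", model[1])
print("train accuracy:", accuracy(train, model))
print("test accuracy:", accuracy(test, model))
```

On this toy data the stump splits on overtime hours and fits the training rows perfectly, which is exactly why accuracy is reported on rows the model never saw. A real analysis would use far more data and richer models, for example Knime’s learner and predictor nodes or a library such as scikit-learn.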

The following are examples of other areas where data analysis methods can be used. It is important to remember that the algorithms to be used in each area will vary. However, the volume, variety and accuracy of the data will be equally important for a good result in all these areas.

  • Cross Selling
  • Market Basket Analysis
  • Customer Relationship and Satisfaction Management
  • Competition Analysis
  • Fraud Protection
  • Credit and Insurance Risk Assessment
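As one concrete illustration of these areas, market basket analysis is often framed as association rules scored by support (the share of baskets containing both items) and confidence (how often the second item appears when the first one does). The sketch below assumes a handful of hypothetical shopping baskets and only counts item pairs; dedicated tools use algorithms such as Apriori or FP-Growth over much larger data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, one set of items per purchase.
BASKETS = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
    {"milk", "chips"},
]


def pair_rules(baskets, min_support):
    """Return (lhs, rhs, support, confidence) rules for frequent item pairs."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # estimate of P(rhs | lhs)
            rules.append((lhs, rhs, support, confidence))
    # Strongest rules first; ties broken alphabetically for stable output.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))


for lhs, rhs, support, confidence in pair_rules(BASKETS, min_support=0.4):
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

Here every basket containing butter also contains bread, so the rule butter -> bread gets confidence 1.00; in cross selling, such a rule would suggest offering bread to customers who buy butter.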

The use of technology in every aspect of life also brings a race against time. The approach of collecting data, spending a long time cleaning it and then preparing detailed analysis reports is starting to wear out. Some problems, such as credit card fraud, make it necessary to analyze data instantly while it is flowing rather than after it has been stored. This leads to the concept of Stream Analytics. In classical methods, data gains value as it is gathered and accumulated over time. In Stream Analytics, data that cannot be analyzed in real time has no value.
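A minimal sketch of the streaming idea, assuming a simple rule that is not taken from any particular product: each card’s running mean and standard deviation are updated incrementally with Welford’s algorithm, so no history needs to be stored, and a transaction is flagged when it exceeds the card’s mean by more than k standard deviations. The card IDs, amounts, k=3 and the three-transaction warm-up are all hypothetical; real fraud detection combines many more signals.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Welford's online algorithm: mean and variance without storing history."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0

    def update(self, x: float) -> None:
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def std(self) -> float:
        # Sample standard deviation; 0.0 until there are two observations.
        return math.sqrt(self.m2 / (self.count - 1)) if self.count > 1 else 0.0


def flag_stream(transactions, k=3.0, min_history=3):
    """Yield (card, amount, suspicious) for each transaction as it arrives."""
    stats = {}
    for card, amount in transactions:
        s = stats.setdefault(card, RunningStats())
        suspicious = s.count >= min_history and amount > s.mean + k * s.std()
        yield card, amount, suspicious
        s.update(amount)  # the decision is made before the baseline changes


# Hypothetical interleaved stream of (card_id, amount) events.
STREAM = [("A", 20), ("A", 25), ("B", 100), ("A", 22),
          ("B", 110), ("A", 24), ("B", 105), ("A", 500)]

for card, amount, suspicious in flag_stream(STREAM):
    print(card, amount, "SUSPICIOUS" if suspicious else "ok")
```

The generator decides on each transaction before the next one arrives, which is the key difference from analyzing a stored data set. In this sketch flagged amounts still update the statistics; a production system might exclude them so that a fraudster cannot gradually shift the baseline.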

In recent years, universities have opened Data Analytics departments, and educational institutions offer certificate programs for enthusiasts. In addition, a short internet search turns up hundreds of free educational resources. There are dozens of paid and free platforms and programming languages available for data analysis work. To achieve better results in this area, you will need to study algorithms such as artificial neural networks, decision trees, support vector machines and Bayesian networks in depth, and to learn to work on platforms like Knime, SAS, RapidMiner, Pentaho, Tableau and Power BI; if you want to go deeper on the software side, you will have to spend hours working in languages such as Python, R or Scala.


[1] Digital in 2018, Hootsuite and We Are Social, https://digitalreport.wearesocial.com

[2] Internet of Things: Number of connected devices worldwide from 2012 to 2020, Statista, https://www.statista.com/statistics/471264/iot-number-of-connected-devices-worldwide/
