Big Data & Analytics

The costs associated with processing the Volume, Velocity and Variety (and the fourth V, Veracity) of data keep dropping – and these newer approaches are already far more cost-effective than trying to push Big Data through RDBMS-style solutions.
The context is hugely different – data sources today mostly churn out unstructured data, such as the huge quantities of video uploaded to YouTube every minute…or the volumes of raw graphics, text, video and other data posted to Facebook. This requires a new wave of applications and tool sets.
Where we once built cities first and then the roadways to connect them, today we can analyze the flow of traffic: where the volume moves to and fro at different times of day and night…which intersections are congested and which flow freely, accident occurrences and seasonal trends…all of this data will assist in future metropolitan planning and in the design of cities and their required infrastructure…so fasten your seat-belts!
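To make the idea concrete, here is a minimal Python sketch, assuming hypothetical traffic-sensor records and an invented congestion threshold; a real city platform would stream readings like these at vastly larger scale:

```python
from collections import defaultdict

# Hypothetical traffic-sensor records: (intersection, hour_of_day, vehicle_count).
readings = [
    ("5th & Main", 8, 1200), ("5th & Main", 14, 400),
    ("Oak & Hill", 8, 150),  ("Oak & Hill", 17, 900),
    ("5th & Main", 17, 1350),
]

CONGESTION_THRESHOLD = 1000  # vehicles/hour; an assumed cutoff for illustration

totals = defaultdict(list)
for intersection, hour, count in readings:
    totals[intersection].append((hour, count))

# Find each intersection's peak hour and classify it.
for intersection, by_hour in totals.items():
    peak_hour, peak_count = max(by_hour, key=lambda hc: hc[1])
    status = "congested" if peak_count >= CONGESTION_THRESHOLD else "free-flowing"
    print(f"{intersection}: peak {peak_count} vehicles at {peak_hour}:00 ({status})")
```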

**********************************************************************

Trying to resist or reverse the Big Data & Analytics wave – or the Cognitive Computing, Robotics, Nanotechnology and Cloud trends – is like trying to squeeze toothpaste back into the tube…that genie is out of the bottle…that train has long gone…

**********************************************************************

It is impossible to separate these new technologies; they are converging right before our eyes. Big data and analytics drive huge requirements in hardware, storage management, software, security and curation, to name just a few. The Volume, Velocity and Variety (and Veracity) factors are better managed when in-house resources do not have to be engaged end to end – some sensitive data can stay safely behind the organization's firewall, while the rest sits in a cloud. Today's cloud offers an affordable, robust, secure and scalable environment – a great marriage between big data's requirements and a cost-efficient way to extract value from it.
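As a rough illustration of that split, here is a minimal Python sketch; the storage functions and the "sensitive" flag are hypothetical stand-ins for real on-premises and cloud storage clients:

```python
# Hybrid routing sketch: sensitive records stay behind the firewall,
# everything else goes to cloud storage. store_on_premises / store_in_cloud
# are placeholders for real clients (e.g. a local database vs. an object store).

def store_on_premises(record: dict) -> None:
    print(f"[on-prem] kept behind the firewall: {record['id']}")

def store_in_cloud(record: dict) -> None:
    print(f"[cloud]   shipped to cloud storage: {record['id']}")

def route(record: dict) -> None:
    # The routing rule is the whole point: sensitivity decides placement.
    if record.get("sensitive"):
        store_on_premises(record)
    else:
        store_in_cloud(record)

for rec in [{"id": "payroll-001", "sensitive": True},
            {"id": "clickstream-42", "sensitive": False}]:
    route(rec)
```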

*************************************************************************

Imagine a smart card with a nano chip embedded in it. This card lets you pay for, or use, a host of services in the country you reside in as well as in some neighbouring countries. So you can pay for a bus ride, a subway ride, an air ticket, a visit to a museum, meals at several restaurants, your car rental and even the pet services that groom your dog.
Now think about all the diverse data you are generating, readily available on your card's chip. Imagine that data being extracted from, say, 1.2 billion people in China or 1.1 billion people in India…or, to be realistic, let's say only 30% of those countries' populations use the card. Would you agree that this would definitely meet the criteria for Big Data – and that analyzing it would offer profound insights into the demographics and psychographics of those users, as well as a host of other potential decision data points that could trigger new products and services for them?

All such transactions (each of which generates its own data set), when considered in their totality (in terms of their Volumes, Varieties, Velocities and Veracities), contribute to Big Data. So this is just one example of the sort of interdependence that could exist between nanotechnology and big data.
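To see how quickly this adds up, here is a back-of-the-envelope estimate in Python; the 30% adoption rate comes from the text, while transactions per day and bytes per transaction are purely illustrative assumptions:

```python
# Rough sizing of the smart-card example. All figures below are assumptions.
population = 1_200_000_000          # e.g. China
adoption = 0.30                     # the 30% uptake assumed in the text
txns_per_user_per_day = 4           # bus, meal, subway, museum... (assumed)
bytes_per_txn = 2_000               # record plus metadata and logs (assumed)

users = population * adoption
daily_bytes = users * txns_per_user_per_day * bytes_per_txn
print(f"{users:,.0f} users -> {daily_bytes / 1e12:.2f} TB/day, "
      f"{daily_bytes * 365 / 1e15:.2f} PB/year")
```

Even with these modest per-transaction sizes, a single card scheme crosses the petabyte-per-year mark – squarely in Big Data territory.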

**************************************************************************

The issue is how we upgrade our IT security processes and scale them to deal with the new Volume, Variety, Velocity and Veracity requirements of big data. As I said above, we need 21st-century (and beyond) tools to keep pace with the exponential growth in sheer data volume alone. Intrusion Detection Systems (IDS) need to be upgraded to include statistical-anomaly, protocol-anomaly and traffic-anomaly based detection, to name only a few approaches.
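As one small illustration of the statistical-anomaly style, here is a minimal Python sketch that flags readings far outside a historical baseline; the data and the z-score cutoff are assumptions, not a production IDS:

```python
# Statistical-anomaly detection sketch: flag traffic readings whose z-score
# against the historical baseline exceeds a cutoff.
from statistics import mean, stdev

baseline = [980, 1010, 995, 1002, 990, 1015, 1005]  # requests/min, a normal week
mu, sigma = mean(baseline), stdev(baseline)

def is_anomalous(requests_per_min: float, z_cutoff: float = 3.0) -> bool:
    z = abs(requests_per_min - mu) / sigma
    return z > z_cutoff

for observed in (1004, 4500):   # the second value mimics a flood/DDoS pattern
    print(observed, "-> ANOMALY" if is_anomalous(observed) else "-> normal")
```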
The kind of data processing, application availability and storage that a basic smartphone is capable of is ridiculous – in fact, you could probably mix and release an album of the quality of, say, The Beatles' Abbey Road on an iPhone with GarageBand. Abbey Road Studios requires a giant bricks-and-mortar footprint, not to mention huge investments in recording hardware and mixing software…compare that with the ubiquitous iPhone…pretty neat, huh?

*************************************************************************

Big Data is not about replacing the RDBMS. Big data growth is happening mainly because of the growth in the volume of unstructured data (think video) that is not suited to traditional database vendors (like Oracle). Moreover, not only is the data too unstructured and voluminous for traditional RDBMS players, but the cost of processing and crunching through these data sets with traditional RDBMS technology is too high (and that's putting it mildly).
A whole new breed of Big Data companies is emerging to capitalize on the trend, championing commodity hardware and both proprietary and open-source technology to capture and analyze these new data sets. This suggests the traditional RDBMS vendors are in for the fight of their lives against new players who can offer better price points. A few of the new players who come to mind include 1010 Data and Cloudera. It's going to be an interesting ride for sure…
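For a feel of what those open-source stacks do, here is a toy Python version of the map/shuffle/reduce pattern they popularized (think Hadoop), counting terms across schema-less text; real deployments run the same phases across clusters of commodity machines:

```python
# Toy map/shuffle/reduce over unstructured text: no RDBMS schema required.
from collections import Counter
from itertools import chain

documents = [
    "big data is not about replacing rdbms",
    "unstructured data needs new tools",
    "commodity hardware makes big data affordable",
]

# Map: emit words per document. Shuffle + Reduce: sum the counts per word.
mapped = chain.from_iterable(doc.split() for doc in documents)
counts = Counter(mapped)
print(counts.most_common(3))
```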

Big Data refers to data sets that commonly used software tools are unable to capture, process and curate within reasonably acceptable elapsed times. The threshold is a moving target, having risen from the terabyte scale to the many petabytes of today. Here is Gartner's definition: “Big data is high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization.”
We will require new techniques and tools to tap into its vast potential. Cognitive solutions, analytics, data mining, deep learning, predictive modeling and more can all be extracted from Big Data – the caveat being the need for 21st-century (and beyond) tool sets; the old tools just won't suffice.
It is here to stay – as inexorably as the Internet, YouTube, Facebook, Wiki and hosts of other spin-offs – all of which provide cost-free big data by-products of digital interactions.

How many of us believed in "Moore's law" when we first heard about it?
That over the history of computing hardware, the number of transistors in a dense integrated circuit would double every two years?
Well, the fact is that it has proven true, and we now model many manufacturing goals on this somewhat simplistic law (a quick back-of-the-envelope projection follows this passage).
Or that mainframe computing would give way to the thin-client and distributed-server models, or to the cloud technologies of today?
Or that technology would advance to the point where an entire generation may never have to use a laptop or a desktop, thanks to the quantum leaps made by smartphone technology?
So it is easy to understand why some may doubt innovations like self-driving cars, Big Data and analytics, Robotics and micro-robotics…but remember the book Future Shock from 40 years ago: “The illiterate of the 21st century will not be those who cannot read and write, but those who cannot learn, unlearn, and relearn.”
― Alvin Toffler
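For the curious, here is that doubling rule made concrete in a few lines of Python, using the Intel 4004's 2,300 transistors (1971) as the baseline:

```python
# Project transistor counts forward assuming one doubling every two years.
baseline_year, baseline_transistors = 1971, 2_300   # Intel 4004

def moores_law_estimate(year: int) -> float:
    doublings = (year - baseline_year) / 2
    return baseline_transistors * 2 ** doublings

for year in (1991, 2011):
    print(year, f"~{moores_law_estimate(year):,.0f} transistors")
```

The projections land within an order of magnitude of the real chips of those years – remarkable for so simple a rule.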

***********************************************************************

Think of Big Data systems as capable of reading, interpreting and storing data in whatever form it arrives, regardless of format. In other words, diverse data sources are constantly being added, and the system can differentially scan for the latest data introduced…and generate output that includes that newest data. If a law firm is managing a client's legal documentation, it will constantly add newer documents into the Big Data system, so that each report it extracts is inclusive of the last data set loaded.
So if IBM conceivably had to load every known written, spoken, physical or tangible book, journal, whitepaper, cinema script, map or web site into its system to prepare for the Jeopardy challenge, it would need to keep loading incremental data as it becomes available – if it wants to be ready to take on Jeopardy again in the future.
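Here is a minimal Python sketch of that incremental-loading idea, using a "high-water mark" timestamp so each run ingests only what arrived since the last one; the document records and field names are invented:

```python
# Incremental load: remember the timestamp of the newest document ingested,
# and on each run pull only documents newer than that mark.
documents = [
    {"id": "brief-001", "loaded_at": 100},
    {"id": "brief-002", "loaded_at": 250},
    {"id": "brief-003", "loaded_at": 400},
]

def incremental_load(all_docs: list, high_water_mark: int) -> tuple:
    new_docs = [d for d in all_docs if d["loaded_at"] > high_water_mark]
    if new_docs:
        high_water_mark = max(d["loaded_at"] for d in new_docs)
    return new_docs, high_water_mark

mark = 0
fresh, mark = incremental_load(documents, mark)
print(f"ingested {[d['id'] for d in fresh]}, new mark = {mark}")

documents.append({"id": "brief-004", "loaded_at": 500})  # a new filing arrives
fresh, mark = incremental_load(documents, mark)
print(f"ingested {[d['id'] for d in fresh]}, new mark = {mark}")
```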

Consider the implications: Big Data and Analytics capable systems located in a cloud, accessible to all at reasonable rates. Think of the explosion of app development and services that could be offered without the need to own or invest in expensive hardware, software, middleware, etc.
If there were a million cases of Ebola scattered across the globe, think of how the various case histories, symptoms and failed/successful treatment records could be assimilated, drilled into, analyzed and interpreted through cognitive systems.
Think of the superlative insights and treatment options that would emerge.
The above scenario could be – and indeed already is being – applied to the treatment of cancer, with IBM Watson analytics at the forefront.
Big Data, Analytics and Cognitive Computing do what no single doctor, surgeon or psychiatrist can: read, absorb and interpret the millions of medical journals, books, papers, references and patient histories into decision data.
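As a toy illustration of the aggregation such systems perform, here is a Python sketch that pools invented case records and ranks treatments by observed success rate; real cognitive systems do this across millions of records and free-text journals:

```python
# Pool anonymized case records and rank treatments by success rate.
from collections import defaultdict

cases = [
    {"symptom": "fever+rash", "treatment": "protocol-A", "recovered": True},
    {"symptom": "fever+rash", "treatment": "protocol-A", "recovered": True},
    {"symptom": "fever+rash", "treatment": "protocol-B", "recovered": False},
    {"symptom": "fever+rash", "treatment": "protocol-B", "recovered": True},
]

outcomes = defaultdict(lambda: [0, 0])          # treatment -> [successes, total]
for c in cases:
    outcomes[c["treatment"]][1] += 1
    outcomes[c["treatment"]][0] += c["recovered"]

ranked = sorted(outcomes.items(), key=lambda kv: kv[1][0] / kv[1][1], reverse=True)
for treatment, (ok, total) in ranked:
    print(f"{treatment}: {ok}/{total} recovered ({ok/total:.0%})")
```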

***********************************************************************

Uber is a great example of Big Data, Cognitive Computing and technology (the hardware and software in GPS systems and smartphones) being used to create a technology disruption (or innovation, depending on which side of the licensed taxi union fence you sit on).
You install a simple app on your smartphone and you can see where the nearest Uber car is, how long it will take to reach you, how much your ride will cost and how long it will take to get to your destination (a sketch of the "nearest car" step follows this passage).
You exit at your destination without needing to fumble for currency, small change, tips, etc. Everything is pre-set, and your credit card is charged automatically.
Yes – there are numerous cab drivers who have lost their jobs as a result, but the consumer has gained additional latitude in his/her choice of transport – whether it is a regular cab or an Uber.
On another note, the taxi drivers of London, who previously had to have 'The Knowledge' of all roads and destinations memorized, no longer need to – thanks to GPS.
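For the technically curious, here is a minimal Python sketch of that "nearest car" step, picking the driver with the smallest great-circle (haversine) distance from the rider; the coordinates are invented for illustration:

```python
# Pick the nearest driver by great-circle distance between GPS coordinates.
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance between two (lat, lon) points, in kilometres.
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

rider = (51.5074, -0.1278)                       # central London
drivers = {"car-7": (51.5110, -0.1200), "car-3": (51.4900, -0.1500)}

nearest = min(drivers, key=lambda d: haversine_km(*rider, *drivers[d]))
print("nearest car:", nearest,
      f"({haversine_km(*rider, *drivers[nearest]):.1f} km away)")
```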

***********************************************************************

Big Data is part of the current technology disruption wave that is poised to change the very economics of our business models. Factor in other current technology trends such as Cognitive Computing, deep learning systems, the Cloud, Robotics and nanotechnology, and you have a veritable technology revolution in the offing.
Big Data was recently put to the test when IBM took on the game of Jeopardy and beat the two best players in that exciting, frontier-challenging battle.
Chess – the game IBM's Deep Blue mastered when it defeated Kasparov – is a finite construct, with fixed rules, set patterns and predictable piece movements. Jeopardy was a different beast altogether. It required not just big data processing but cognitive methods, supercomputing capability and the programming to sift through literally millions of data sets, select the most likely (correct) responses, and then prioritize the best response – all within the allowed response times.
Explore IBM’s Watson.

************************************************************************

Here is a potential 'disruption' that is probably already happening around us: let us say you wear a smart watch that sends signals on key health parameters – blood pressure, temperature, blood sugar, breathing, heart rate, etc. – to a local Medicaid hub. The staffers are not MDs but students of pharmacology or holders of similar qualifications. The hub is supported by a Big Data and Analytics application in a cloud. Your medical inputs are processed in real time by the hub application, and your case is mapped against millions of similar cases in the database. It can prioritize the symptoms with causal data as well as rank the remedies that worked best on similar cases. Your prescription, with recommendations and notes, is transmitted back to your smart watch.
It's easy to see how such a change could impact walk-in clinics, pharmacies, private MD offices, drug stores, etc…and that is precisely what disruptive innovation is all about!
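Here is a toy Python sketch of the case-matching step, treating the vitals as a vector and surfacing the remedy that worked for the closest stored case; all records, fields and the distance measure are illustrative assumptions (a real system would add clinical safeguards):

```python
# Match a patient's vitals against stored cases by vector distance.
from math import dist

# (systolic BP, temperature C, heart rate) -> remedy that worked
historical_cases = [
    ((120, 36.8, 70), "rest and hydration"),
    ((150, 38.5, 95), "antipyretic, follow-up in 24h"),
    ((160, 37.0, 88), "blood-pressure medication review"),
]

def recommend(vitals, cases, k=1):
    ranked = sorted(cases, key=lambda c: dist(vitals, c[0]))
    return [remedy for _, remedy in ranked[:k]]

watch_reading = (152, 38.3, 97)   # streamed from the smart watch
print("suggested, pending review:", recommend(watch_reading, historical_cases))
```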

*********************************************************************

Consider the 'S' Curve in business (definition borrowed from Andrew Latham, Demand Media):

Businesses, or the products of businesses, that follow an S curve are characterized by a shallow start, where only early adopters and niche markets buy the product or invest in the company. Then they experience rapid growth, and the product or business attains a dominant position in the market. After the rapid growth, these businesses maintain a high performance level but with little growth, which often signals a mature but saturated market.

We see similar 'S' curve examples in cellphone/smartphone penetration in third-world markets, and in Moore's law, where we can predict that the number of transistors we can pack onto a square inch of integrated circuit will double every two years…and the costs keep dropping too.
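The S curve is just the logistic function, and a few lines of Python make its shape visible; the parameters below (saturation level, growth rate, inflection year) are invented for illustration, loosely styled on smartphone penetration:

```python
# Logistic S-curve: f(t) = L / (1 + e^(-k(t - t0))).
# L = saturation level, k = growth rate, t0 = year of fastest growth.
from math import exp

def s_curve(t, L=100.0, k=0.8, t0=2012):
    return L / (1 + exp(-k * (t - t0)))

for year in (2006, 2010, 2012, 2014, 2018):
    penetration = s_curve(year)
    print(year, f"{penetration:5.1f}%  " + "#" * int(penetration / 5))
```

The printout shows the shallow start, the steep middle and the flat, saturated top – the three phases in the definition above.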

*****************************************************************************

Imagine a computer designed to compete against human players – for example, against the contestants on the Indian TV program "Kaun Banega Crorepati."
You would need to develop a massive 'Big Data' reference library conceivably covering almost any topic or subject the quizmaster might use to set his or her questions. The computer's program would have to be truly 'cognitive' – in other words, capable of determining the meaning and intent behind each question. It would need to understand the question and respond correctly: to Observe, Interpret, Evaluate and then Decide based on the data library it has, then prioritize the most probable answer and respond (a toy version of this loop is sketched below). Such a 'man versus machine' contest has already occurred: a computer was pitted against the popular U.S. game show Jeopardy in 2011. You may watch it on YouTube.
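Here is a toy Python sketch of that Observe/Interpret/Evaluate/Decide loop: retrieve candidates from a tiny reference library, score them by term overlap, and answer only when confidence clears a threshold; the library and scoring are stand-ins for a real cognitive pipeline:

```python
# Toy question answering: score candidates by overlap with question terms,
# and only answer when confidence clears a bar (otherwise pass the question).
reference_library = {
    "Mount Everest": "highest mountain peak earth nepal himalayas",
    "K2": "second highest mountain peak pakistan karakoram",
    "Nile": "longest river africa egypt",
}

def answer(question: str, threshold: float = 0.3):
    q_terms = set(question.lower().split())              # Observe / Interpret
    scored = []
    for candidate, evidence in reference_library.items():
        overlap = q_terms & set(evidence.split())
        scored.append((len(overlap) / len(q_terms), candidate))  # Evaluate
    confidence, best = max(scored)                       # Decide
    return best if confidence >= threshold else "pass"   # risk control

print(answer("which is the highest mountain peak on earth"))
```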