After working for a few years on NoSQL databases & so-called Big Data projects, I realized people use NoSQL as a synonym for Big Data. Indeed, most people out there believe that using a NoSQL database makes a project a Big Data project.
So I thought I'd add my own bit to the mess by defining Big Data :)
Some suggest it’s the 3 V’s that matter:
1. Volume - The size of the data set (terabytes, petabytes and so on).
2. Velocity - The rate at which your data grows (1 GB per day, etc.).
3. Variety - How varied your data set is, i.e. whether it is dynamic & lacks a standard structure.
So what?
1. RDBMSs & existing infrastructure have been handling volume & velocity for a while now, probably at a higher licensing cost, haven't they?
2. But what about variety? An RDBMS is much more structured & may not work well for unstructured data.
3. Are ACID properties an overhead? Maybe, maybe not, but giving them up is the price you pay for performance.
Assuming your system falls into one of the above categories, what next?
1. What do you want to do with the data?
2. What are the current pain points in business analytics?
3. How real-time is your analytics expected to be?
Business analytics is core to any business, and some of the traditional business analytics tools aren't that real-time, given their limited ability to do parallel processing & the licensing cost involved. I do agree this is a motivation to use technologies like Hadoop, which is open source & runs on low-cost commodity hardware, unlike the proprietary solutions.
The only way I can describe Big Data is as a filler for the existing gaps in the Business Analytics & Intelligence space:
1. It’s critical to have parallelism when generating BI jobs, reports & so on.
2. It’s critical to be able to handle huge volume, velocity & variety of data for real-time analytics.
3. Obviously, the cost involved with traditional BI solutions.
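To make the parallelism point concrete, here is a minimal map/reduce-style sketch in Python, assuming a hypothetical list of (region, amount) sales records. It splits the records into partitions, aggregates each partition independently (the map step), then merges the partial results (the reduce step), which is the same shape a Hadoop job takes. The function names are illustrative, and threads are used only to keep it self-contained; a real engine would spread the partitions across processes or machines.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def partition(records, n):
    """Split records round-robin into n partitions (hypothetical helper)."""
    return [records[i::n] for i in range(n)]


def map_partition(records):
    """Map step: total revenue per region within one partition."""
    totals = Counter()
    for region, amount in records:
        totals[region] += amount
    return totals


def reduce_totals(acc, partial):
    """Reduce step: merge a partial aggregate into the accumulator.
    Counter.update adds values (unlike `+`, it keeps zero/negative totals)."""
    merged = Counter(acc)
    merged.update(partial)
    return merged


def revenue_by_region(records, workers=4):
    """Aggregate each partition concurrently, then merge the partials."""
    parts = partition(records, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_partition, parts))
    return dict(reduce(reduce_totals, partials, Counter()))


# Hypothetical sample data.
sales = [("EU", 120), ("US", 200), ("EU", 80), ("APAC", 50), ("US", 100)]
print(revenue_by_region(sales, workers=2))  # {'EU': 200, 'US': 300, 'APAC': 50}
```

Because each partition is aggregated without looking at the others, adding workers (or machines) scales the map step, and only the small partial results need to be merged at the end.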
I tend to believe it is just like the way we normalize a database for real-time transactions & de-normalize it for business analytics (remember the star schema!). A Big Data architecture puts on the analytics hat right from the ingestion of data, so there is less to worry about with offline de-normalization etc.
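A rough sketch of what de-normalizing at ingestion can look like, using hypothetical customer and order records: each incoming order is enriched with the customer attributes the analytics side needs at the moment it is stored, so a query such as revenue per country needs no join later. The table and field names are illustrative assumptions, not any particular product's schema.

```python
# Hypothetical dimension data: normalized, stored once and referenced by id.
CUSTOMERS = {
    1: {"name": "Acme", "country": "DE"},
    2: {"name": "Globex", "country": "US"},
}


def ingest(order, customers=CUSTOMERS):
    """De-normalize one incoming order at ingestion time by copying the
    customer attributes that analytics queries need into the record."""
    customer = customers[order["customer_id"]]
    return {**order, "customer_name": customer["name"], "country": customer["country"]}


# Hypothetical incoming orders (the "transactional" side).
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 250},
    {"order_id": 11, "customer_id": 2, "amount": 90},
    {"order_id": 12, "customer_id": 1, "amount": 40},
]
analytics_store = [ingest(o) for o in orders]

# Analytics query: revenue per country, with no join at query time.
revenue = {}
for row in analytics_store:
    revenue[row["country"]] = revenue.get(row["country"], 0) + row["amount"]
print(revenue)  # {'DE': 290, 'US': 90}
```

The trade-off is the usual one for de-normalized data: if a customer's country changes, already-ingested records keep the old value unless they are reprocessed.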
Oops! You’re right, I started off saying “Defining Big Data”. I am lost, maybe intentionally, on a quest to see what’s in play for 2014 in the Big Data space.
Happy New Year to you all!