A Reference Architecture For Real-Time Big Data, Guest Series Part 3
Source: Agilence Inc.
By Derek Rodner, vice president of product strategy, Agilence
In the first part of this series we identified the different types of data (structured and unstructured) and we formalized a definition of Big Data from a retailer’s perspective. As a recap, the definition is as follows:
The collection of large amounts of (structured and unstructured) data from multiple, disparate, and often unrelated systems into a single repository to enable retailers to more efficiently and effectively understand their business and their customers in a timely manner.
Part 2 of this series covered the challenges and benefits of Big Data for retailers and offered perspectives from the industry.
Now we turn our attention to the more technical aspects of Big Data. As mentioned in the previous articles, it’s not enough just to aggregate the data into a single repository. The data has to be readily available. Timeliness of data is of utmost importance in today’s fast-paced world.
In other industries, Big Data is handled by exporting and consolidating vast amounts of data into a massive business intelligence database. This process, called ETL (extract, transform, and load), strips out unnecessary data and loads the scrubbed information into an OLAP database. There are four problems with this scenario for retail.
- In those industries, the data typically comes from only a few applications and is already centrally located.
- It takes time to perform the ETL process and in retail, stale data is bad data.
- You have to know what you want to learn before you build the database. What if you are stripping out the very data you need?
- These industries don’t have video.
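To make the pattern and its third pitfall concrete, here is a minimal sketch of the extract, transform, and load steps described above. All field names, records, and function names are invented for illustration; a real pipeline would read from source systems and write to a warehouse rather than in-memory lists.

```python
# Minimal ETL sketch (hypothetical data): pull raw point-of-sale records,
# strip out fields deemed "unnecessary," and load the scrubbed rows into
# a target repository.

def extract(source):
    """Pull raw records from a source system (here, an in-memory list)."""
    return list(source)

def transform(rows, keep_fields):
    """Keep only the fields in keep_fields -- the 'scrubbing' step.
    Anything dropped here is unrecoverable downstream, which is the
    third problem above: you must know your questions in advance."""
    return [{k: v for k, v in row.items() if k in keep_fields} for row in rows]

def load(rows, target):
    """Append the scrubbed rows to the target repository."""
    target.extend(rows)
    return target

# Hypothetical POS records carrying more detail than the warehouse keeps.
pos_records = [
    {"store": 101, "sku": "A1", "amount": 19.99, "cashier": "jdoe"},
    {"store": 101, "sku": "B7", "amount": 4.50,  "cashier": "msmith"},
]

warehouse = []
load(transform(extract(pos_records), keep_fields={"store", "sku", "amount"}), warehouse)
# The cashier field is gone; if a later loss-prevention question needs it,
# the ETL process must be rebuilt and the history re-extracted -- if the
# source systems still have it.
```

Note that the delay the article warns about lives in this same loop: each extract/transform/load pass has to finish before the data is queryable, which is why stale data is the second problem on the list.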