This article originally appeared on the BeyeNETWORK
There's a convergence occurring… If we think about our minds and our primitive understanding of their function, they store information in both short-term and long-term memory. We need the long-term memory to understand and interpret the context of the short-term memory. Has anyone said that short-term memory uses a different structure of neuron cells than long-term memory? No!
Why then do we insist on separate structures for different processing purposes, like the ODS (operational data store), OLTP (on-line transaction processing) and DW (data warehouse)? If we are ever to construct a Nanohouse™ then we must accept the idea that the model (structure) of the data is not dependent on the function or utilization of the information!
The future Nanohouse™ is one type of structure with potentially different types of functions attached. The more we separate our systems, the further from our destination we get. That is, if our destination is a thinking machine, or even a semi-smart device.
Figure 1-1 DNA Double Helix
This image was generated for purposes of demonstrating DNA and a double helix; it shows chemical bonding of the different elements. It’s much like a three-dimensional model of the Data Vault, in that the Data Vault Hubs are the center or nucleus of the molecules, the Satellites are the surface area, and the Link tables are the valence bonds. http://www.chemicalgraphics.com/
Integration of OLTP, ODS, and ADW
Convergence has caused a heated debate in the data warehousing industry. Many individuals in the industry are fighting over the best method of building an integrated historical data store. Furthermore, they still argue over storing operational data with the historical data. This will not move the industry forward. It’s as if you had two woolly mammoths arguing over a tar pit – is it black or is it hot? Will we sink into it? Well, it’s black and hot, and we’re already sinking! As a result, we need to evolve our models to breathe new life into the industry.
Can we build a system that contains all the information needed for OLTP, ODS, and ADW (active data warehouse)? I believe we can. I also believe that if we don’t, then we cannot move forward with our goal of understanding context through induction or deduction. While there are very different uses for the information, there really shouldn’t be different structures. Today, we typically apply third normal form (3NF) for OLTP systems and ODSs. We also apply a combination model for the ADW, using bits and pieces of OLTP and Star Schema.
What is an ADW?
An active data warehouse combines historical data with the active arrival of transactional information. It stores information in real-time and applies it with the knowledge of the long-term history. It is almost an operational data warehouse… It is a crude equivalent of short-term and long-term memory storing the data in a single model, in a single place.
The war rages on: do we build a Star or do we build a 3NF with history? This really shouldn’t be a question. If we tie this information set to neural models and Nanotechnology we begin to see that the models themselves converge, while the functions (application of the models and contents) remain independent. This is most evident in the way the brain functions.
What about the way the brain functions?
First, you have to be willing to agree that the neurons in the brain (or all across the body) are similar in structure, while they vary in form. A neuron is said to network with other neurons through a series of dendrites and synapses. Many scientists have speculated that the networking of the neurons is what’s important: the formation of the neural pathways makes the difference.
Figure 1-2 Images of Neuron Cells
This image of neuron cells represents the most basic of the Data Vault data models. The Data Vault includes a Hub (the center of the neuron), Satellites (the context within the neuron, or its chemical make-up) and Links (the dendrites and synapses that form the network between the neurons).
Take the brain, for instance: we have a notion of two components, short-term memory and long-term memory. Has anyone said that the neurons for short-term and long-term memory look different, or have physically different structures? No. Most scientists will state that the neurons have similar composition, but function differently when presented with different proteins and chemicals, which are also known as stimuli (throughout the body).
Why do we insist on separating ODS, OLTP and ADW?
Why do so many people insist on this train of thought? Will it move us forward? It might, but only to a point. Independent systems work to build foundations and levels of understanding. As with any independent system, the more integrated it is, the more powerful it becomes. As our understanding deepens, we should be building single structured systems with the same architecture and forming different utilizations of the information within them.
The Data Vault proposes a set of structures which are capable of storing both current and historical information within the same structure. It doesn’t mean that the system that uses the data doesn’t require different functions. What it means is that we can consolidate our information into a large and very powerful integrated data store. The technology has come far enough to carry the complex functions to the next level.
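To make the idea of one structure serving two functions concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the Data Vault specification: the class names, the load_time field, and the record, current and as_of methods are all hypothetical, and a real implementation would live in relational tables with additional columns. The point is only that the "operational" question (what is true now?) and the "warehouse" question (what was true then?) can be asked of the same Hub, Satellite and Link structures.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional, Tuple


@dataclass
class Satellite:
    """Descriptive context for a hub, stamped with the time it arrived."""
    load_time: datetime
    attributes: Dict[str, str]


@dataclass
class Hub:
    """A business key plus every version of its context (hypothetical layout)."""
    business_key: str
    satellites: List[Satellite] = field(default_factory=list)

    def record(self, load_time: datetime, attributes: Dict[str, str]) -> None:
        # Append only: earlier versions are kept, never overwritten.
        self.satellites.append(Satellite(load_time, attributes))
        self.satellites.sort(key=lambda s: s.load_time)

    def current(self) -> Optional[Dict[str, str]]:
        """Operational view: the most recently loaded context."""
        return self.satellites[-1].attributes if self.satellites else None

    def as_of(self, when: datetime) -> Optional[Dict[str, str]]:
        """Historical view: the context in effect at `when`, if any."""
        result = None
        for sat in self.satellites:  # sorted by load_time
            if sat.load_time <= when:
                result = sat.attributes
        return result


@dataclass(frozen=True)
class Link:
    """An association between two or more hubs, by business key."""
    hub_keys: Tuple[str, ...]


# Example data (invented for illustration).
customer = Hub("CUST-001")
customer.record(datetime(2004, 1, 1), {"city": "Denver"})
customer.record(datetime(2005, 6, 1), {"city": "Boston"})
order = Hub("ORD-42")
placed = Link((customer.business_key, order.business_key))

now_view = customer.current()                    # short-term memory
then_view = customer.as_of(datetime(2004, 6, 1))  # long-term memory
```

Both views read from the same list of Satellites; only the function applied to the structure differs, which is the separation of form and function the article argues for.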
What about the Nanohousing Initiative?
Nanohousing™ is the goal of the Data Vault. The Data Vault data modeling techniques are based on “wet-technology”, that is to say, the blurring of lines between natural models and man-made models or representations. Try to imagine the model as a three-dimensional (3D) chemical model, or a neuron/synapse/dendrite model. Its ability to represent information stores in multi-dimensional space brings together a new way of thinking about integrated information.
Nanohousing with the Data Vault simply helps bridge the gaps between today’s data modeling efforts and the future of information representation. Nanohousing brings to the table the ability to think of information as a chemical structure with high-valence bonding capabilities. DNA (in the double helix) appears to be a single type of structure, capable of holding all types of information for many different purposes, yet it is unique as a chemical model.
Figure 1-3 Chemical Bonding Images
Imagine modeling your information on the molecular level. Each molecule could be built as a container for DNA. The DNA could house the data within each of the chemical structures. Again, each Hub in the Data Vault should be treated as the “nucleus” for the chemical, with the Satellites as the surface area (determining the nature of the bonds) and the Links represented either by another chemical type, or by the valence bonds themselves.
Why separate any of these systems in the future if OLTP and ADW can coexist? The conclusion is a natural one based on the answers to the following business questions: Why isn’t there one corporate answer across the enterprise? Why doesn’t my enterprise use one corporate application that’s fully web-enabled? Why doesn’t my data reside in a single location so it can be mined with the historical information? Why not?
Just as short-term memory houses “today’s events”, we need the long-term memory to place these events into context. If there are two different applications of the same information, why not use the same physical structure? Why not integrate all our systems’ information into a single structure with different functionality?
Lest we forget – there’s convergence between technology, biology, molecular science, mathematics, physics, and chemistry: Nanotechnology!
But wait! There's More!
I’m suggesting that thinking systems must evolve from a similar structure (form). I’m also suggesting that in order to get there, our data sets must converge into the same repeatable structures. I also believe that combining all the information (regardless of utilization) will allow us to begin to see patterns in business we have never identified. Furthermore, it will lead to the application of neural net technologies, data mining and high performance alerting systems.
DNA computing will finally couple form and function together. The form will be the DNA chemical structure. The function will be the enzymes we use to build, split, replicate or search the information. Of course, new concepts of data mining will be developed. Inductive and deductive reasoning will begin to appear; especially if we introduce the DNA and molecular structures of neurons.
The network of these neurons will make the difference in the intelligence of the system. Teaching the system to learn will be a huge challenge, but it will be tackled.
Convergence is everywhere. It’s time for this industry to change. A new breed of design must emerge and new systems must be built. While this may be called an operational integrated historical data store, it will combine both types of data. We will be creating the equivalents of short-term and long-term memory.
Nanohousing is the next evolution; it is information convergence. Form and function are converging at the molecular level. Nanohousing begins where other conventional systems end. It gives us the ability to program, de-program, and alter the atomic levels of chemical components. In order to reach the atomic level of these systems, we absolutely must consider integrating all our information stores, along with our models. Otherwise the necessary patterns will not emerge, the learning states will not be found and the model will not adapt via neural networks.