From Database to MPP to Data Lake

Posts

Showing posts from November, 2011

The Secret behind A B I N I T I O.

November 08, 2011

Background on Ab Initio: Ab Initio is considered one of the major ETL players, alongside Informatica, DataStage and SSIS (if SSIS can be considered a major ETL player ;-) ). In fact, in most respects it is considered better than all of the above. Its user interface and debugging are simpler than Informatica's (ease of use being one of Informatica's strong points), and it offers a level of parallelism that DataStage may still take years to reach. Superiority: (Hearsay) It is said to be able to turn a 6-person, 6-month ETL project into a 4-person, 2-month one. (This sounds like hyperbole, yet it has been established in many cases that development takes half the time it does with INFA, with much better code quality.) Other statistics suggest that if you call Ab Initio, there is a 75% chance you will be talking to a PhD holder. In other words, it is considered to be ingeniously simple. Secrecy: In spite of all this, there is very little you would know about it (if anything at all). ...

Trends in Data warehouse and Data Visualization tools

November 07, 2011

The data warehouse of any enterprise is always in an evolving state. We now seem to be maturing from a report-only phase to an analytical phase, and are therefore one step closer to our dream of a real-time data warehouse. Every change of phase brings with it new sets of challenges and therefore, often, new sets of tools and technologies. As we move from the reporting phase to the analytical phase, BI giants such as Cognos, MSTR, BO, OBIEE and Hyperion seem to be getting more and more obsolete. This is where we welcome data visualization tools like QlikView, Corda, Tableau, Panopticon and TIBCO Spotfire. The advantages of these tools are very much in your face. They need very little ramp-up time; that is to say, you can complete your first report within two days of installation (including the learning phase). This compares favorably with traditional reporting tools, where you would spend considerable time building the metadata layer. (MSTR claims you can bring down a 30-day activity to 1 hour.) They are visual...

Finding Skewed Tables in Teradata

November 04, 2011

Teradata distributes the data for a given table based on its Primary Index, and its Primary Index alone. If this Primary Index is not chosen appropriately, it can cause performance bottlenecks. The following are the general parameters we should consider while creating Primary Indexes: (1) Access Path, (2) Volatility, (3) Data Distribution. Volatility is usually not an issue in a well-designed database. That is to say, we do not expect...
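
To make the data distribution point concrete, here is a rough Python sketch of how hashing a Primary Index value decides where a row lands, and how a poor choice of index piles rows onto a few AMPs. It is only a simulation under stated assumptions: `NUM_AMPS`, `amp_for`, `rows_per_amp` and `skew_factor` are hypothetical names, SHA-256 stands in for Teradata's own row hash (which works differently and goes through a hash map), and the skew formula is one commonly quoted definition rather than an official one.

```python
import hashlib
from collections import Counter

NUM_AMPS = 8  # hypothetical number of AMPs in the system


def amp_for(pi_value, num_amps=NUM_AMPS):
    """Pick an AMP by hashing the Primary Index value.

    SHA-256 is only a stand-in here; Teradata uses its own row hash.
    """
    digest = hashlib.sha256(str(pi_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_amps


def rows_per_amp(pi_values, num_amps=NUM_AMPS):
    """Count how many rows each AMP would receive."""
    counts = Counter(amp_for(v, num_amps) for v in pi_values)
    return [counts.get(amp, 0) for amp in range(num_amps)]


def skew_factor(counts):
    """Percent skew, assuming the definition (1 - avg / max) * 100.

    0 means perfectly even; values near 100 mean one AMP holds most rows.
    """
    max_rows = max(counts)
    if max_rows == 0:
        return 0.0
    avg_rows = sum(counts) / len(counts)
    return (1 - avg_rows / max_rows) * 100


# A nearly unique Primary Index (e.g. an order id) spreads rows evenly.
unique_pi = range(10_000)
# A low-cardinality Primary Index (e.g. a two-valued flag) cannot use
# more than two AMPs, however many AMPs the system has.
flag_pi = ["Y" if i % 2 == 0 else "N" for i in range(10_000)]

even_counts = rows_per_amp(unique_pi)
skewed_counts = rows_per_amp(flag_pi)

print("unique PI rows per AMP:", even_counts, "skew %.1f" % skew_factor(even_counts))
print("flag PI rows per AMP:  ", skewed_counts, "skew %.1f" % skew_factor(skewed_counts))
```

The same Primary Index value always hashes to the same AMP, so a column with only a handful of distinct values can never spread its rows over more than that handful of AMPs. That is the kind of table a skew check is meant to catch.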