Posts

Showing posts from November, 2011

The Secret behind A B I N I T I O.

Background on Abinitio: Abinitio is considered one of the major ETL players, alongside Informatica, DataStage and SSIS (if SSIS can be considered a major ETL player ;-) ). In fact, in most respects it is considered better than all of the above. Its user interface and debugging are simpler than Informatica's (ease of use being one of Informatica's strong points), and it offers parallelism that DataStage may still take years to match.
Superiority: (Hearsay) It is said to be able to turn a 6-person, 6-month ETL project into a 4-person, 2-month one. (This sounds like hyperbole, yet it has been established in many cases that development takes half the time it does with INFA, with much better code quality.) Other statistics suggest that if you call Abinitio, there is a 75% chance you will be talking to a PhD holder. In other words, it is considered to be ingeniously simple.
Secrecy: In spite of all this, there is very little you would know about it (if anything at all).
Reason: It is so because Abi

Trends in Data warehouse and Data Visualization tools

A data warehouse for any enterprise is always in an evolving state. We now seem to be maturing from the report-only phase to the analytical phase, and are therefore one step closer to our dream of a real-time data warehouse. Every change of phase brings with it new sets of challenges and therefore, often, new sets of tools and technologies. As we move from the reporting phase to the analytical phase, BI giants such as Cognos, MSTR, BO, OBIEE and Hyperion seem to be getting more and more obsolete. This is where we welcome data visualization tools like QlikView, Corda, Tableau, Panopticon and TIBCO Spotfire. The advantages of these tools are very much in your face. They have very little ramp-up time; that is to say, you can complete your first report within two days of installation (including the learning phase). This compares favourably with traditional reporting tools, where you would spend considerable time building a metadata layer. (MSTR claims you can bring down a 30-day activity to 1 hour.) They are visually very a

Finding Skewed Tables in Teradata

Skewed Tables in Teradata
Teradata distributes the data for a given table based on its Primary Index, and its Primary Index alone. If this Primary Index is not chosen appropriately, it can cause performance bottlenecks. The following are the general parameters we should consider when creating Primary Indexes:
1: Access Path
2: Volatility
3: Data Distribution
Volatility is usually not an issue in a well-designed database; that is to say, we do not expect UPDATE statements to modify the primary index itself. This usually leaves us with data distribution and access path.
Access path implies that a particular column (or set of columns) is always used in join conditions. The advantage of using these columns as the PI is that it avoids redistribution of data during joins (one of the most expensive operations for Teradata) and can therefore reduce spool usage.
Data distribution implies