ETL and ELT: how much difference can the order of those letters really make? To explore that question, we first need to understand what each step is.
E is for Extract
API, database connection, flat file, JSON, XML… Whatever it is, Extract grabs raw data from a source system.
L is for Load
Next, Load that data into the repository (expect big differences depending on whether this step comes before or after Transform).
T is for Transform
Make sense of the data with respect to your use case. Structure, clean, model, join, blend, govern and secure the data.
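The three steps above can be sketched in a few lines of Python. This is a minimal, illustrative example only: the JSON payload, table names and SQLite in-memory database are all stand-ins (in a real pipeline the source might be an API or flat file, and the repository a cloud warehouse).

```python
import json
import sqlite3

# Extract: grab raw data from a source system (an inline JSON payload here;
# in practice this could be an API response, database query or flat file).
raw = json.loads('[{"name": " Ada ", "spend": "120.50"},'
                 ' {"name": "Grace", "spend": "95.00"}]')

# Load: land the records, untouched, in a repository
# (an in-memory SQLite database stands in for the warehouse).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE raw_customers (name TEXT, spend TEXT)")
db.executemany("INSERT INTO raw_customers VALUES (?, ?)",
               [(r["name"], r["spend"]) for r in raw])

# Transform: structure and clean the data for the use case –
# trim stray whitespace and cast spend from text to a number.
db.execute("""
    CREATE TABLE customers AS
    SELECT TRIM(name) AS name, CAST(spend AS REAL) AS spend
    FROM raw_customers
""")
print(db.execute("SELECT * FROM customers").fetchall())
# → [('Ada', 120.5), ('Grace', 95.0)]
```

Whether the Transform step runs before or after the Load is exactly the ETL-versus-ELT question.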
To dive into the detail, our blogs on Extract and Load and the Transformation layer are definitely worth a read.
For more on ETL and ELT, you’re in the right place. Why does the order matter and what difference could it make to your business?
The traditional approach
ETL has been the standard approach to building a data warehouse for many years. On-premises servers have physical constraints in both storage and compute power, so a great deal of thought and planning is needed. Transforming data before loading it into the warehouse minimises the volume of data, and reduces repetition and redundancy.
That sounds good – but at what cost? Let’s examine that question by considering the following factors:
ETL requires the data to be loaded more than once – a more accurate acronym would be ELTL, since data is typically staged before it is transformed. Crucially, transformation cannot begin until that initial load completes. Maintaining the end data state is also time-consuming: any reprocessing means rerunning the entire ETL pipeline.
ETL typically requires multiple scripts, often in several programming languages, to perform the necessary structuring of the data before loading.
ETL infrastructure creates a high barrier to entry to the transform layer. The use of programming languages demands technical expertise. Iterative development of the logic is difficult and time-consuming as a result.
Enter stage left – ELT
Cloud infrastructure (think AWS, GCP, Azure and others) has effectively removed the constraints associated with on-premises servers. Storage is cheap enough not to be a consideration for the vast majority of use cases. And compute power is scalable, flexible and on demand.
By loading the data once, without transform, you can remove the programmatic overhead. The barrier to entry is lower and the time to execute a pipeline is dramatically reduced. Crucially, the data transformation challenge ends up in the hands of the people best placed to solve it: your analysts.
Consider a columnar, cloud-based data warehouse – scalable and quick. It can empower your analysts to execute the necessary data transformations super-fast. They can also use SQL to apply the business logic iteratively and transparently.
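As a hedged sketch of what that looks like in practice, the snippet below pushes business logic into a SQL view over already-loaded raw data. SQLite again stands in for a columnar cloud warehouse, and the table, column and view names are purely illustrative – the point is that an analyst can edit and rerun the SQL without touching any loading code.

```python
import sqlite3

# Raw data has already been loaded once, untransformed.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE raw_orders (customer TEXT, amount REAL, status TEXT)")
db.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", [
    ("Ada", 120.5, "complete"),
    ("Ada", 10.0, "cancelled"),
    ("Grace", 95.0, "complete"),
])

# Business logic lives in SQL, where an analyst can iterate on it directly:
# here, exclude cancelled orders and aggregate spend per customer.
db.execute("""
    CREATE VIEW customer_spend AS
    SELECT customer, SUM(amount) AS total_spend
    FROM raw_orders
    WHERE status = 'complete'
    GROUP BY customer
""")
print(db.execute("SELECT * FROM customer_spend ORDER BY customer").fetchall())
# → [('Ada', 120.5), ('Grace', 95.0)]
```

Because the view is defined over the raw table, changing the logic means editing one SQL statement and re-querying – no pipeline rerun required.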
No more communication breakdown between analysts and engineers. No more wasting of engineers’ talents and skills on lift-and-shift data tasks. And most important of all, no more delays on insight or answers to your questions.
ELT unlocks the power in your data. Kleene unlocks ELT for your business.