Towards Data Freedom: ETL Pipelines
Kicking off the series with ETL: what it is, why it’s painful, and how we make it easier to build, iterate, and inspect.
Welcome to the first installment in a series of posts we’re calling Towards Data Freedom! Our goal here is simple: cut through the noise and gatekeeping around data so that practitioners no longer need to memorize buzzwords, and anyone can be a data cowboy.
We’re starting with a doozy because we at Structify believe the biggest problems have the coolest solutions.
What is an ETL Pipeline?
So, what is an ETL pipeline? Starting with the acronym, ETL stands for Extract, Transform, Load (or ELT, in the many cases where loading happens before transformation). Each piece has its own intricacies and pain points, so stay tuned for follow-ups where we’ll deep-dive into each.
At a high level:
- Extract: grab your data—CSV, internal database, or an API call
- Transform: clean, normalize, and derive the fields you need
- Load: put your data where it needs to be—an S3 bucket, a warehouse, or a database that powers a Tableau visualization
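To make the three steps concrete, here is a minimal sketch in Python using only the standard library. Everything in it is an assumption made for illustration: the sample CSV, the `users` table, and the function names `extract`, `transform`, `load`, and `run_pipeline` are hypothetical, and an in-memory SQLite database stands in for whatever warehouse or bucket you would actually load into.

```python
import csv
import io
import sqlite3

# Hypothetical sample input standing in for a CSV file, database query, or API response.
RAW_CSV = """name,signup_date,plan
 Ada Lovelace ,2024-01-05,PRO
grace hopper,2024-02-11,free
,2024-03-01,pro
"""


def extract(raw_text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Transform: drop rows without a name, normalize fields, derive signup_year."""
    cleaned = []
    for row in rows:
        name = row["name"].strip()
        if not name:
            continue  # skip records missing a required field
        cleaned.append({
            "name": name.title(),
            "plan": row["plan"].strip().lower(),
            "signup_year": int(row["signup_date"][:4]),
        })
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows into a SQLite table and return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users (name TEXT, plan TEXT, signup_year INTEGER)"
    )
    conn.executemany("INSERT INTO users VALUES (:name, :plan, :signup_year)", rows)
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]


def run_pipeline(raw_text, conn):
    """Chain the three steps in ETL order."""
    return load(transform(extract(raw_text)), conn)
```

Calling `run_pipeline(RAW_CSV, sqlite3.connect(":memory:"))` runs all three steps in order. Keeping each step as its own function is one possible design choice: it lets you swap the source or the destination without touching the transform. An ELT variant would load the raw rows first and run the cleanup inside the database instead.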
Historically, “ETL pipeline” has been a loaded term. It covers everything from one-off scrapers that go-to-market (GTM) teams use to pull their data to robust, production-grade pipelines that serve as the backbone for companies as big as Amazon.
Why do we care?
No matter which way you slice it, these pipelines are painful to build and maintain. The same handful of problems comes up again and again:
“Oh, this data isn’t clean at all.”
“The API’s data dictionary changed and now all my visualizations are broken.”
“My custom scraper broke this morning and I have to burn 3 hours to fix it before I actually start my day.”
“My upload is broken because an input contained unexpected null values my transform was supposed to catch but didn’t.”
...and the list goes on.
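Two of these complaints, the changed data dictionary and the unexpected nulls, come down to records reaching the load step without being checked. As a rough sketch of one way to guard against that (not a description of any particular tool), the Python below splits incoming rows into valid and rejected before loading. The `REQUIRED_FIELDS` schema and the `validate` function are hypothetical names chosen for this example.

```python
REQUIRED_FIELDS = ("id", "email")  # hypothetical schema, for illustration only


def validate(rows, required=REQUIRED_FIELDS):
    """Split rows into (valid, rejected) so bad records never reach the load step."""
    valid, rejected = [], []
    for row in rows:
        # A field fails if it is absent (schema drift) or present but null/empty.
        problems = [field for field in required if row.get(field) in (None, "")]
        if problems:
            rejected.append((row, problems))
        else:
            valid.append(row)
    return valid, rejected
```

Each rejected entry keeps the original row alongside the list of fields that failed, so a broken upload can be traced back to the exact input that caused it.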
We listened to customers, and what they need most is ETL tooling that:
- Automates the building
- Makes iteration on the different pieces of the workflow a breeze
- Makes inspectability easy and consistent
We believe that giving users access to this suite of tools will change the big data landscape for the better, and for good. Check us out to learn more, and if this sounds useful to your team, send us a message: let’s build together.
Yours in data,
Alex Reichenbach