Is Airbyte the King of ETL?
Along with the rise and normalization of cloud computing, ETL (Extract, Transform, Load) data platforms have become an industry standard for most enterprise companies. They enable faster development, standardize schemas across data sources, and are easy to use for engineering teams facing demanding timelines and internal stakeholders.
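Because the rest of this post uses both "ETL" and "ELT", it may help to see the difference in a small sketch. The example below is illustrative only. An in-memory SQLite database stands in for a cloud warehouse, and the campaign rows and the `cost_micros` field are hypothetical. ETL reshapes the data before loading it. ELT loads the raw rows untouched and does the transformation inside the warehouse.

```python
import sqlite3

# Hypothetical source rows, standing in for records pulled from an ad platform API.
# "cost_micros" (cost in millionths of a currency unit) is an illustrative field name.
RAW_ROWS = [
    {"campaign": "brand", "cost_micros": 1_500_000},
    {"campaign": "search", "cost_micros": 2_250_000},
]


def etl(conn: sqlite3.Connection) -> None:
    """ETL: transform in the pipeline first, then load only the cleaned shape."""
    conn.execute("CREATE TABLE spend_etl (campaign TEXT, cost REAL)")
    cleaned = [(r["campaign"], r["cost_micros"] / 1_000_000) for r in RAW_ROWS]
    conn.executemany("INSERT INTO spend_etl VALUES (?, ?)", cleaned)


def elt(conn: sqlite3.Connection) -> None:
    """ELT: load the raw rows untouched, then transform with SQL in the warehouse."""
    conn.execute("CREATE TABLE spend_raw (campaign TEXT, cost_micros INTEGER)")
    conn.executemany("INSERT INTO spend_raw VALUES (:campaign, :cost_micros)", RAW_ROWS)
    # The transformation lives in the warehouse as a view over the raw table.
    conn.execute(
        "CREATE VIEW spend_elt AS "
        "SELECT campaign, cost_micros / 1000000.0 AS cost FROM spend_raw"
    )


if __name__ == "__main__":
    # An in-memory SQLite database stands in for the cloud warehouse.
    conn = sqlite3.connect(":memory:")
    etl(conn)
    elt(conn)
    for table in ("spend_etl", "spend_elt"):
        print(table, conn.execute(f"SELECT * FROM {table} ORDER BY campaign").fetchall())
```

Both paths end with the same cleaned table. The ELT path also keeps the raw rows in `spend_raw`, so the transformation can later be changed and rerun without extracting the data from the source again.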
I am currently working on a marketing warehouse implementation for a client that has little prior footprint in the cloud. Because of this, I have an open field to find the optimal solution for their cloud setup. When it came time to choose the integration platform they would use to bring in a myriad of data sources, from marketing to sales to other third-party tools, I found Airbyte so compelling that it was hard to give competing platforms serious consideration.
The Use Case
The company I am working for is an emerging energy supplier that uses many data sources, including Salesforce, HubSpot, Google Ads, Microsoft Ads, SEO tools, and call center data. They don’t have a large data engineering team, so a low-maintenance solution is key for long-term management. They also want data integrated into their warehouse of choice as soon as possible so they can begin using it for automated reporting and machine learning use cases.
Evaluating the Options
Knowing that I had a wide variety of options to choose from, I began my search for the right integration solution. For this use case, I derived several key criteria that an ETL / ELT platform would have to meet to be selected for this implementation.
Integration Capability - The platform needed to integrate as many of the client's current sources as possible, and make it easy to build integrations for any sources it did not yet support.
Low Maintenance - The platform needed to be simple and user friendly, with excellent instructions on how to set up and maintain connections over time and how to triage issues as they come up.
Price - This is always an important factor; optimizing cost without sacrificing performance is a key component of success.
Future Proof - Technology is always changing and growing, so understanding how flexible we can be with our chosen integration partner is paramount. A platform that neither locks its users into a long-term contract nor supports only one major cloud platform as a destination leaves room for future changes without ripping out the current solution and starting again from scratch.
Performance - Last but not least, the chosen platform needs to meet the business's integration requirements and keep up with them as they change and grow.
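One common way to turn criteria like these into an evaluation table is a weighted scoring matrix. The sketch below is a generic illustration of that technique, not the evaluation used for this client. The weights, the 1 to 5 scale, and the "Platform A" / "Platform B" scores are hypothetical placeholders.

```python
import math
from dataclasses import dataclass

# Criteria from the list above. The weights are hypothetical placeholders,
# not the values used in the client evaluation; tune them per project.
WEIGHTS = {
    "integration": 0.30,
    "low_maintenance": 0.25,
    "price": 0.20,
    "future_proof": 0.15,
    "performance": 0.10,
}
# Sanity check: the weights should add up to 1.
assert math.isclose(sum(WEIGHTS.values()), 1.0)


@dataclass
class Candidate:
    name: str
    scores: dict[str, int]  # 1 (poor) to 5 (excellent) for each criterion


def weighted_score(candidate: Candidate, weights: dict[str, float]) -> float:
    """Sum of weight * score across all criteria; every criterion must be scored."""
    missing = weights.keys() - candidate.scores.keys()
    if missing:
        raise ValueError(f"{candidate.name} has no score for: {sorted(missing)}")
    return sum(weight * candidate.scores[name] for name, weight in weights.items())


def rank(candidates: list[Candidate], weights: dict[str, float]) -> list[Candidate]:
    """Order candidates from highest to lowest weighted score."""
    return sorted(candidates, key=lambda c: weighted_score(c, weights), reverse=True)


if __name__ == "__main__":
    # Placeholder platforms and scores, purely for illustration.
    candidates = [
        Candidate("Platform A", {"integration": 5, "low_maintenance": 4, "price": 3,
                                 "future_proof": 3, "performance": 4}),
        Candidate("Platform B", {"integration": 4, "low_maintenance": 4, "price": 5,
                                 "future_proof": 5, "performance": 4}),
    ]
    for c in rank(candidates, WEIGHTS):
        print(f"{c.name}: {weighted_score(c, WEIGHTS):.2f}")
```

Making the weights explicit forces the priorities to be agreed on up front. Rejecting any candidate with a missing score keeps an unscored criterion from being silently treated as zero.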
Selecting the ETL Platforms
This is going to sound highly technical: to find integration platform options, I googled “data integration platforms”, “Popular ETL tools”, and other related searches. Combining what I found on popular blogs and cloud platform affiliated sites with my own previous industry experience, I narrowed my selection down to 6 possible platforms.
I initially ranked these options based on my anecdotal experience with the platforms earlier in my career, with Fivetran first and Airbyte second. From there, I put them into an evaluation table with the client's requirements as the main criteria.
Below is a subset of that evaluation for review.
As you can see from the table, when comparing my number one and number two options, Fivetran and Airbyte, I had a difficult time convincing myself that Airbyte wasn't the better choice.
First, Airbyte's open source option makes it more enticing than Fivetran, Stitch, or Matillion, all of which require a sales contract and some type of credit system to pay for the service. From an engineer's perspective, being able to host my own instance of Airbyte and manage its hosting cost in my own cloud platform gives me more control over the cost of ingestion, and lets me scale and adjust the configuration as needed to maximize performance. And if my client decides to switch clouds in the future, this implementation can be lifted and shifted. Points to Airbyte for Price, Performance, and Future Proof.
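The cost argument can be made concrete with a simple break-even calculation: self-hosting costs roughly the same each month, while a usage-based service grows with data volume. The sketch below rests on several simplifying assumptions. It treats managed pricing as linear in volume (real vendors use their own units, tiers, and minimums), models self-hosting as infrastructure plus a fixed amount of engineer time, and uses hypothetical placeholder figures rather than actual vendor quotes.

```python
def self_hosted_monthly_cost(infra_usd: float, ops_hours: float, hourly_rate_usd: float) -> float:
    """Roughly fixed monthly cost of self-hosting: cloud infrastructure plus engineer time."""
    return infra_usd + ops_hours * hourly_rate_usd


def usage_based_monthly_cost(volume_units: float, usd_per_unit: float) -> float:
    """Managed-service cost, simplified to scale linearly with synced volume."""
    return volume_units * usd_per_unit


def break_even_volume(fixed_monthly_usd: float, usd_per_unit: float) -> float:
    """Monthly volume above which the self-hosted option becomes cheaper."""
    if usd_per_unit <= 0:
        raise ValueError("usd_per_unit must be positive")
    return fixed_monthly_usd / usd_per_unit


if __name__ == "__main__":
    # All inputs are hypothetical placeholders, not quotes from any vendor.
    fixed = self_hosted_monthly_cost(infra_usd=300, ops_hours=10, hourly_rate_usd=100)
    unit_price = 50.0  # hypothetical USD per unit of synced volume (e.g. per million rows)
    print(f"Self-hosted: ${fixed:,.0f}/month")
    print(f"Break-even volume: {break_even_volume(fixed, unit_price):,.1f} units/month")
    for volume in (10, 26, 40):
        print(volume, usage_based_monthly_cost(volume, unit_price))
```

Below the break-even volume the usage-based option is cheaper; above it, the fixed self-hosting cost wins. The ops-hours input matters as much as the infrastructure bill, which is why the low-maintenance criterion feeds directly into price.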
Integration-wise, the platform had built-in connectors for most of the platforms the client needed, with more being added each month. It also offers a custom connector option that still uses low-code configuration for any smaller sources that aren't currently supported. On the destination side, all major platforms were included, and when testing each one I was surprised by how easy it was to set up.
Using Airbyte’s UI, I was able to configure connections from my major sources to BigQuery, Redshift, and Azure SQL Server within three weeks, all without ever contacting sales or reaching out to Airbyte's support staff. Speaking of the UI, it is slick. My only complaint is that Fivetran and others provide better Entity Relationship Diagrams (ERDs) showing how the data schemas relate for most connectors, and I wasn't able to find anything equivalent in Airbyte’s documentation or connector setup UI. Compared with the other open source options, such as Meltano and Apache Airflow, Airbyte takes the cake when it comes to Low Maintenance.
Conclusion
ETL platforms like Airbyte, and the others listed previously, are all relatively young companies. So my finding that Airbyte is the best of the bunch, based on my evaluation at this point in time, shouldn't carry too much weight when thinking about the next 3 to 5 years.
Much of Airbyte's success comes from being younger than the rest of the competition, which has let it see what customers complained about in the services those companies provide. That said, leveraging open source with a simple UI / UX and focusing on rapid connector development has served the 4-year-old company well. As the industry grows and customers push further into the cloud, with ML and AI use cases buzzing around everyone's heads, who knows which ETL / ELT providers will emerge as long-term players in the space. One thing is for sure, though: Airbyte makes a good argument for being at the head of the pack.
If you're unsure which ETL / ELT tool your company should be using, reach out to see how Lakefront can help you navigate this ever-changing landscape.