Agile Data Engineering
We use Matillion to produce highly curated datasets for customers. Matillion has allowed us to move this workload entirely into the data warehouse where the final work product lives, and cut processing time from days to hours.
Matillion allowed us to move a legacy ETL pipeline to BigQuery with speed, precision, and confidence. It is now used exclusively by our team for all of our ELT needs. The user interface is clean, intuitive, and allows members of the team to be immediately productive. Onboarding new users has never been easier.
Feature parity between the different flavors (Redshift, Snowflake, BigQuery) is lacking; however, Matillion continues to make progress in narrowing the gap.
Easy to use solution, allowing rapid delivery of ELT processes
We came to Matillion while looking for a tool that would allow us to orchestrate our load processes for Amazon Redshift. We'd used other ETL tools previously and were blown away by how easy it was to pick up and understand Matillion. We liked the simplicity, and the fact that we could always dig into the SQL that was generated. New features are added frequently, and we usually find one or two improvements that directly impact us with each release.
Their support team are also very responsive, and Matillion are always keen to help you get the most out of their software, so there's a lot of support if you want it.
2+ years later, and we're using Matillion to run all our ELT jobs across various data warehouse projects, and very happy with it.
- Very intuitive design environment - super quick to learn.
- Components map closely to SQL capabilities in Redshift.
- Easy to view the SQL generated.
- Sample, row count, and data lineage make it easy to unit test as you build.
- Powerful management options via REST API (orchestration, Github integration, etc.).
- Native AWS integration.
- Aggressive release cycle, constantly adding new features.
- Easy to document in-workflow.
- Lots of third party integrations.
- User concurrency/licencing model (simple, but can't be customised to fit your need).
- Exception handling could be improved.
- A single Task View would be helpful for multi-environment/project support.
Responsive to User Needs
Gets better and better with each release. Keep up the good work!
I've been using Matillion for about 2 and a half years. I've seen the software improve so much in that time. If I ever found a feature to be lacking, it would be included within the next couple release cycles. The software really seems to adapt as user needs have been evolving. Some of my favorite recent features are the ability to configure some components with just text (really saves me time) and the many Grid orchestration components which allow me to greatly reduce the complexity of the orchestration jobs.
UX is sometimes lacking or inconsistent. For example, let's say my goal is to replace an existing job with a new job of the same name. I delete the old job, and import the new one with the same name. All Orchestration components that used that job will understand to use the new job, but for some reason the scheduler doesn't.
Also, jobs will have validation errors simply because the components haven't been validated (grey borders). It can be confusing because you may be searching for an error in your work when all you actually have to do is revalidate the job.
Great for pipelining data to warehouse/lake, not getting it back
It was easy to get set up, and it worked great for pushing data into Snowflake as a data warehouse. However, we quickly realized that we would need yet another tool to get the data back out to the source systems to synchronize/integrate it. We could sort of do it with Matillion, but it required a lot of custom programming and was not very intuitive. We ended up using a different iPaaS tool that could handle traffic/integration in both directions.
It is extremely fast and easy to pull data from various sources and pipe it into Snowflake. The graphic interface lets this happen without programming.
There is almost no ability to get data back from your warehouse into the other systems to synchronize the data. It's great to have it all in the warehouse, but it seems pretty critical to have that data flowing back to other systems that are part of your environment.
The pricing model is frustrating. You are billed for when the machine instance is ON, not when it is actively doing something. So, if you have a couple of hours of jobs that run in a day, you have to shut down the machine to save money the rest of the day. With it shut down, development can't happen. So, we had to turn it on and off all the time. It would be much better if they billed as it is used, not as it is on.
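The trade-off above can be put in rough numbers. This is a minimal sketch, assuming a hypothetical hourly rate (not Matillion's actual pricing) and the couple of hours of daily jobs mentioned in the review; `monthly_cost` and all constants are illustrative names only.

```python
# Hypothetical figures for illustration only; not Matillion's actual pricing.
HOURLY_RATE_USD = 2.00     # assumed price per instance-hour
JOB_HOURS_PER_DAY = 2      # "a couple of hours" of scheduled jobs per day
DAYS_PER_MONTH = 30


def monthly_cost(hours_on_per_day: float, rate: float = HOURLY_RATE_USD) -> float:
    """Cost when billing follows instance uptime rather than job activity."""
    return hours_on_per_day * DAYS_PER_MONTH * rate


always_on = monthly_cost(24)                 # instance left running all day
jobs_only = monthly_cost(JOB_HOURS_PER_DAY)  # instance started only for the jobs

print(f"Always on:        ${always_on:,.2f}")
print(f"On for jobs only: ${jobs_only:,.2f}")
print(f"Share of the always-on bill spent idle: {1 - jobs_only / always_on:.0%}")
```

Under these assumed numbers most of the always-on bill pays for idle time, which is why the instance ends up being switched off between job runs at the expense of development time.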
Matillion is a great out of the box product with minimal requirements. You spin the machine up, allow your database's firewall to communicate with Matillion. Start creating jobs, schedule them, and sit back. It helps you focus on visualizing your data.
It is a pay for what you use type of software. Most other ETL platforms are a flat monthly fee, sometimes at crazy prices. The pay for what you use model makes it easy to justify cost to leadership. You can schedule jobs to make sure you maximize your time.
The user experience feels a little dated. Some things could be better displayed/explained.
The collection of components; especially the integration components that hook into a very popular CRM and another quite popular ERP.
The Python component. Depending on the box where your instance is hosted, this component will drag along like a broken train or move at the speed of a shooting star.
Need on-the-fly connectivity to different instances of Redshift
I like the range of API integrations in Matillion; they let me grab data directly from different sources without manually extracting and importing it into my database.
I have been using Matillion for Redshift for more than 2 years. When I set up a project in Matillion with a connection to a Redshift cluster, there is sometimes a need to extract data from another Redshift instance while running a job. In that case there is no straightforward way of getting the data from the other Redshift instance, as there is with the components that extract data from databases like SQL Server, Oracle, etc. Instead, I first need to run another job connected to the second Redshift cluster to extract the records, and then run a different job to load the extracted records from S3 into the main Redshift cluster.
Matillion - orchestrating complex ELT for analytics
Our business requires us to ingest large amounts of data from customers on a frequent basis. Once typical ELT has been performed, we create customized data stores which are used in a variety of ways, including modeling, machine learning and a SaaS front end. Matillion has been instrumental in our ability to move this processing from file systems to BigQuery, where our processing time has dropped from days to hours.
- Provides great framework for visualizing complex queries with business users that are not SQL savvy
- Logical flow of Transformation builder allowed quick transformation from file based to database based ELT
- Just works! We have had very few issues with the tool over the 2 years I have been using it.
- Would like a more robust scheduler
- Would like built-in messaging (e.g. for job failures/success) to be able to email out. This is not provided in the version for BigQuery that we are running on GCS.
- Better source code control (at the transformation/orchestration level, rather than the whole project)
- Could always use additional documentation on how to use some of the more complex parts of Matillion.
Our experience with Matillion has been great. We've used it to greatly improve and automate our data intake and transformation process.
I appreciate the ease of use of the tool. It is very simple and intuitive to build data pipelines using Matillion.
It's necessary to perform our own server maintenance to support the tool, which isn't too difficult, but can be a pain at times. If this were a hosted solution, that would be easier.
I really like Matillion ETL, but I was a bit disappointed with some of its limitations and quirks. For example, when Twitter updated its API, all my jobs stopped working and the only thing I could do was develop everything again without using the Matillion component.
When I was trying to build upsert procedures on RDS databases, I was thrilled that the Matillion component has that option, but it simply didn't work. It seems like these functionalities haven't received the same attention as the others. I contacted Matillion support, which is simply great, and these things are still waiting to be fixed.
When I wanted specific Python libraries, I couldn't get them working on my instance because pip was out of date and couldn't be updated. I tried to update it and messed up my instance.
Overall, Matillion is great, when you're creating a flow of data migration and you need to parse the data before and do some automations, but there are some cases when you need something a little more specific, that's when it becomes a pain.
The ease of use. Matillion has a great set of tools that make complex and difficult processes simpler and faster to develop.
There are many tools in Matillion that don't get enough attention, and because of that they are fragile, while some of them do not work.
I like working with Matillion, but the tool could be made more flexible by providing some additional features.
1) Most of the components are user friendly.
2) Developing ETL orchestrations and transformations takes less time.
3) Advanced features available in some of the components make complex scenarios achievable.
1) The OAuth documentation does not provide details of the permissions and access levels on account_id or client_id that a connector requires.
2) Metric naming conventions differ between the console and the Matillion data model, and no mapping document for these naming conventions is available.
3) Connector for Outbrain is not available.
4) Indirect file loading is not available. For example, if we have five files with the same structure, a regex cannot be used to load all of them in a single component (S3 load, S3 put or Excel), and we need to use the file iterator instead.
5) Properties are lost when a component's configuration is changed from Basic to Advanced. Ideally the Advanced mode should include all the Basic properties, with the additional settings provided on top.
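The pattern-based loading asked for in item 4 can be sketched outside the tool. This is a minimal illustration in plain Python, not Matillion's API: the object keys, the pattern, and the `select_keys` and `load_file` helpers are all hypothetical.

```python
import re

# Hypothetical object keys, standing in for a bucket listing.
keys = [
    "incoming/sales_2020_01.csv",
    "incoming/sales_2020_02.csv",
    "incoming/sales_2020_03.csv",
    "incoming/inventory_2020_01.csv",
    "incoming/readme.txt",
]

# One pattern selects every file that shares the same structure.
pattern = re.compile(r"incoming/sales_\d{4}_\d{2}\.csv")


def select_keys(all_keys, regex):
    """Return the keys that match the pattern in full, in sorted order."""
    return sorted(k for k in all_keys if regex.fullmatch(k))


def load_file(key):
    """Placeholder for the per-file load step (hypothetical)."""
    return f"loaded {key}"


# Iterating over the matches plays the role of the file iterator workaround.
results = [load_file(k) for k in select_keys(keys, pattern)]
print(results)
```

Using `fullmatch` rather than `search` keeps partial matches, such as a key with an unexpected suffix, out of the load.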
Matillion for Data Engineers
An easy-to-use tool that has allowed me to build a fully automated data warehouse to report to the business. It was easy to learn and implement, and as difficult as SQL to master. I think the real complexity comes from optimising complex workflows: the more T you need in the ETL/ELT process, the more Matillion falls down. That being said, it is a very good tool for 90% of your work.
It is easy to use and quick to get started, there are many inbuilt APIs and transformations that integrate it across a broad suite of systems, and enable you to transform data for the business without having to spend too much time thinking about scripts or bespoke batch files.
The filter transformer is pretty minimal considering the power of the SQL it is utilising, it cannot use compound logic or reference other columns/objects in the query.
There are also optimisation issues in very complex workspaces.
Both of these have easy to use workarounds but it is unfortunate it is not supported more at surface level.
Matillion is an Easy To Leverage (ETL) solution!
- Easy to orchestrate an ETL pipeline
- Integration with Redshift and other AWS services provides convenience
- Support team is easy to reach and have relatively quick response times
- They also support the other 2 popular cloud data warehouses, Snowflake and BigQuery
- Some components are available out of the box for easy integration
- Task history is useful for debugging and monitoring current state of ETL pipelines
- A GUI is provided for implementing complicated workflows, that can be easily followed
- Since it's mainly GUI based, it becomes difficult for data engineers to compare changes between old and new versions of an ETL.
- There are hard limits on the number of concurrent users, which are a consequence of the pricing model.
- Integration with GIT is not yet available but we heard that it would be coming in the near future.
Easy to learn and use. The knowledge from other products is transferable. The documentation is amazing (one probably cannot understand how good without the experience). Support is always available and of good quality.
There is no restart-from-point-of-failure option for failed loads. Everything has to start from scratch or remaining jobs need to be run manually.
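For readers unfamiliar with the missing feature, restart-from-point-of-failure is commonly built on a checkpoint of completed steps. This is a minimal sketch in plain Python, not a Matillion feature or API: `run_pipeline`, the step names, and the checkpoint file are all hypothetical.

```python
import json
import os
import tempfile


def run_pipeline(steps, checkpoint_path):
    """Run (name, func) steps in order, skipping those already completed."""
    done = []
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            done = json.load(f)
    executed = []
    for name, func in steps:
        if name in done:
            continue  # finished in an earlier run
        func()  # an exception here leaves the checkpoint at the last success
        done.append(name)
        executed.append(name)
        with open(checkpoint_path, "w") as f:
            json.dump(done, f)
    return executed


# Demo with hypothetical steps: "transform" fails on the first attempt only.
attempts = {"transform": 0}


def extract():
    pass


def transform():
    attempts["transform"] += 1
    if attempts["transform"] == 1:
        raise RuntimeError("simulated failure")


def load():
    pass


steps = [("extract", extract), ("transform", transform), ("load", load)]
checkpoint = os.path.join(tempfile.mkdtemp(), "checkpoint.json")

try:
    run_pipeline(steps, checkpoint)
except RuntimeError:
    pass  # first run stops at "transform"; "extract" is already recorded

resumed = run_pipeline(steps, checkpoint)  # picks up at "transform"
print(resumed)
```

The second call skips the step recorded before the failure and continues from the failed one, instead of starting from scratch.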
Matillion when used with Google BigQuery
We are now using Matillion as our TL tool to load into our Google Data Lake which will be the source of all our BI reporting data (replacing SQL servers).
Matillion is easy to install (Google Marketplace, one-click build) and easy to use initially. It sometimes lacks a few specialist bits (usually around new things in the Google estate), but these can be overcome by using scripts or SQL statements, and I am sure some will become components in their own right.
Matillion support is very good - I usually get a response within the hour
From an engineer's point of view, one of the selling points for Matillion (apart from the absolute simplicity of use, at least to start with) is that it produces its own documentation!
Sometimes the error messages can be confusing, but this is often down to the messages coming from BigQuery.
It's also not always obvious that it is producing SQL statements 'under the hood', and that can mean breaking SQL statement size limits without realising it.
Review of Matillion as ETL cloud Engineer
It was a great working as well as learning experience; I hope Matillion will emerge as an industry leader.
Matillion integration with AWS services is flawless and has real time impact in batch jobs pulling data from on-premises to cloud platform.
The restriction on the number of users who can access Matillion. A lot of the time, huge data sets get stuck and fail, along with many other issues. Lastly, a better full log aimed specifically at developers would help with debugging issues.
Geographically disparate data to the cloud in 3 days flat
Piping and transforming billions of rows of data from geographically dispersed machines into Snowflake
* If you've done any type of ETL work before, this will be a breeze. If you haven't, it might take a few hours as opposed to minutes to get up and running
* The trial is full featured, get started today without waiting
* The software is on an EC2 instance in your own VPC in AWS. If you have company policies for how/where data can be processed, you have complete control
* You can pay by the hour or the year, no hassle no fuss!
If you do encounter an issue the support organization is amazing. All questions were answered the same day, many within the hour.
More examples to work through on the transformation components would be helpful, but again... it's pretty easy
We're using Matillion for all the ETL in our environment: 3000 employees, around 500 BI users, around 30 different systems that we connected to. We were able to switch some existing jobs from Talend and create a whole lot more in just 6 minutes.
+ Fast & easy to use
+ Completely embedded into AWS environment
+ Comes light, it's not overloaded with stuff no one uses
+ Some really innovative features
- Sometimes feels like it is not "fully under control", e.g. when a component runs into an OutOfMemory, the whole environment crashes. And there are different ways to trigger that...
- Scheduling overview could be more sophisticated, it's kind of hard to have a proper overview there
- Variable handling seems sometimes to be a bit difficult, e.g. when setting it in one process, it's not always set in sub-jobs
- In some cases, one can see that Matillion is still "under development": Same features are available in some components, but are missing in others
Great ETL product
Matillion is very intuitive.
The GUI is easy to use and fairly easy to learn.
The support team is great and they respond to your questions quickly.
I like how the Matillion team is frequently looking to add new functionality and features. The new keyword search functionality is very helpful.
Matillion works great extracting data from a variety of databases and is very efficient unloading onto S3 or Amazon Redshift. However, when we attempted to use it to unload large datasets to Microsoft SQL Server it was very slow and we experienced performance issues.
A Great Cloud-based Tool
I like it overall. It is easy to quickly learn and get up to speed. It is made to work in the environment I'm working in (AWS) and has a nice, clean browser-based interface to build and schedule jobs in EC2. Email-based support is responsive. However, I sometimes feel it is too expensive compared to alternatives for the value it adds and the ELT model, while beneficial in many ways, hamstrings the component feature-set.
Well integrated with AWS. Great browser-based interface. Develop in the same place where the code will run, as opposed to other solutions I use that develop in Windows and run in Linux. Easy to schedule.
I have mixed feelings on the overall value. The hourly rate really adds up over time. This is probably my biggest misgiving. Otherwise, I sometimes get frustrated with the ELT model because it means if the feature you need isn't supported within Redshift, it's up to you to create Python scripts or Bash scripts to enable it.
Fast track to results
Very helpful support which takes care of all our problems, especially with "stupid" beginner questions.
Very easy to use and very easy to get results. Fast implementation cycles compared to other tools. Best is to loop over variables which enables simple but powerful solutions.
Sometimes more flexibility is needed on specific tasks, e.g. Excel or CSV import.
SalesLoft Matillion for Data Warehouse
Evaluating an ETL tool is quite a challenge. With a wide variety of options in the market, it is difficult to know where to start. We shortlisted three different ETL tools based on the capabilities to perform full load, incremental load with change data capture (CDC), process chains, job scheduling and monitoring, job success/failure notification to name a few most important criteria. With a very intuitive development interface, Matillion ETL for Redshift came out as a winner.
Quick development capabilities to perform full load, incremental load with change data capture (CDC), process chains, job scheduling and monitoring, job success/failure notification, customer support to name a few
Wish to see Matillion having Github control
Matillion ETL For Medium to Large Company
Matillion has worked incredibly well answering our business needs. We have integrated it into our core business and use it daily with nearly no issues. We began with Matillion ETL for Redshift but soon switched to Matillion ETL for Snowflake and have had incredible success with it thus far. I recommend this ETL tool to any company of any size as it can be scaled to any business needs and you only pay for what you use.
Matillion is easy to integrate into our snowflake system. We use it to build and create materialized tables and views, run scheduled builds, and have used it to create CDC processes. It is slick and intuitive and has a great user interface.
The browser and platform crash occasionally, about once a week. The account will log you out, but it works in real time so it usually does not delete any of your work.
Good ease of use, should improve customer service
With the tools given, most business problems can be solved (though not efficiently, if you consider Snowflake), but there's a lot of room for improvement on the customer service side. Also, having the same functionality for both Redshift and Snowflake would make it a lot better; for now there are more functions and components for Redshift.
The tool was very easy to use and was intuitive most of the time. The functions and components for Redshift were amazing, even though I didn't use them.
1. Initially the customer service was good, calls were being scheduled either on the same day or on the next day. But once you start using it, support calls would take weeks. Sometimes support will say they'll get back to you but they don't.
2. In one particular instance, a support executive said they would get back to me because they couldn't find an immediate solution, but even after 3 weeks there was absolutely no response on their side. I got a reply after I followed up again though.
3. There are a good number of components for Redshift which are not available for Snowflake; I hope they will incorporate them as soon as possible.
Matillion is working great!
We are using Matillion to pull API JSON data from various systems and push that data into Snowflake. Matillion is working rock solid.
I don't have to babysit the software. Using automation and integration with AWS, I can get messages if there are issues with any orchestration jobs.
Getting the API pulls working was a little clunky, but it came together relatively quickly utilizing Matillion's support. Their support team is fantastic!