I’ve done a lot of work with Apache Airflow over the last two years, but my experience of how people use it is limited to the teams I’ve worked on and a few glances through talks I’ve seen. I have plenty of ideas of things I’d like to improve, but I have no idea what might be bugging everyone else.
So last week (okay, the week before actually. I got busy) I sent out this tweet:
Do you use @ApacheAirflow? Want to help me get an indication of how people use it? Mind taking a (very short, 7 question) survey?
Thank you to the 152 people who responded! As promised, here is a summary of the results.
This post doesn’t really have a conclusion as such - I’ve just presented and summarized the results.
Before we go any further, a recap of the questions I asked. These were picked fairly randomly, and I tried to make the survey quick to answer, so I didn’t ask too many.
- On a scale of 0-10 how likely are you to recommend Apache Airflow? (0 being not at all)
- How do you expect your use of Airflow to evolve in 2019? Increase, Stay about the same, Not sure yet, Decrease
- How many active DAGs do you have in your Airflow cluster(s)? 1–5, 6–20, 21–50, 51+
- Roughly how many Tasks do you have defined in your DAGs? 1–10, 11–50, 51–200, 201+
- What executor do you use? Sequential, Local, Celery, Kubernetes, Dask, Mesos
- What would you like to see added/changed in Airflow for version 2.0 and beyond? Free text input
- Anything else you’d like to mention? Free text input
1. Airflow’s Net Promoter Score
The average score was 8.3, which is pretty good (though the channels I used to find respondents are probably going to impart a large chunk of selection bias to the results). Still, a pretty good figure, and there were some “detractors”, and I’m glad they responded too!
NPS = Percent_of_9_and_10s - Percent_of_0_to_6s
NPS = 70/152 - 17/152
NPS = 0.348684
(65 responses, or 42% were in the “passive” 7-8 range.)
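If you want to check the arithmetic yourself, here is a minimal sketch of the calculation in Python. The function name `net_promoter_score` is just something I made up for illustration; the counts (70 promoters, 65 passives, 17 detractors) are the ones from the survey.

```python
def net_promoter_score(promoters: int, passives: int, detractors: int) -> float:
    """Return the NPS as a fraction between -1 and 1.

    Hypothetical helper for illustration: promoters scored 9-10,
    passives 7-8, detractors 0-6.
    """
    total = promoters + passives + detractors
    if total == 0:
        raise ValueError("need at least one response")
    # Share of promoters minus share of detractors; passives only
    # contribute to the total.
    return (promoters - detractors) / total


# Counts from this survey: 70 + 65 + 17 = 152 responses.
nps = net_promoter_score(promoters=70, passives=65, detractors=17)
print(f"NPS = {nps:.6f}, or about {round(nps * 100)} on the usual -100 to 100 scale")
```

NPS is normally quoted as a whole number on a -100 to 100 scale, which is where the 35 comes from.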
A Net Promoter Score of 35 is good (0–50 is good, 50+ is fantastic), though I have no idea what to compare this to - the NPS for an OSS project is an entirely different thing from the score for a commercial company.
Still, 88% of people had a 7+ rating, which I think is pretty good.
2. Change of use in Airflow for 2019
|Stay about the same||28 responses||18.4%|
|Not sure yet||4 responses||2.6%|
Interestingly, only one of the detractors (people who rated 0–6) expected their use to decrease, and 9 (of the 17) expected it to increase.
3 and 4. How many DAGs and Tasks people have
- Some people asking for a Kubernetes Executor - well there is one!
5. What executor do you use?
It’s probably no surprise that many people use the Celery executor - it’s widely written about and is the “default” horizontal scaling approach. A few people using the Sequential executor commented on why: they mostly just call APIs from their tasks (EMR, Cloud Dataproc, etc.), so their tasks don’t present a heavy load.
That’s it for the quantitative answers. Now for the harder work of summarizing the free-text fields.
6. What would you like to see added/changed in Airflow for version 2.0 and beyond?
To be able to summarize these answers in any useful format I’ve had to try and classify the responses given. I classified each response against the following categories and sub-categories. In total 70% of the responses had
Scheduler - 23 comments
High-availability or run multiple schedulers: 8 comments
Performance of scheduler: 8 comments
Ed: Comments about the CPU use of the scheduler when running, or the time it takes the scheduler to queue tasks
Reparsing of DAG files: 5 comments
The scheduler currently re-parses the DAG files in a fairly tight loop, which can be a bit heavy on external systems if you have a dynamic DAG.
Improvements to SubDAGs: 2 comments
General requests for “improve subdags”. Ed: I agree, and I’m surprised more people didn’t ask for this.
Webserver and WebUI - 39 comments
Accessibility: 3 comments
Colour blind/high contrast mode. General accessibility improvements. Absolutely, we should be better about this.
User Experience: 11 comments
Lots of comments around asking for a “Better UI” or a “Cleaner UI”
Performance: 7 comments
Comments about the UI being slow - especially for large DAGs or a large number of DAGs.
The Web server shouldn’t have to parse the DAGs. Ed: Agreed, and AIP-12 will go a large way towards that
Auto-updating: 3 comments
Having to refresh the page to see tasks changing state is so 2001 ;)
Ed: this would make a huge difference to the feel of the UI, but might need larger architectural changes to make happen. Sadly.
Operational Visibility: 2 comments
Requests to make it easier to see the state of the whole Airflow system from within the UI - i.e. helping work out why tasks in a DAG might not be progressing etc.
Ed: people after my own heart!
Timezone handling: 5 comments
Better handling of timezones in the UI, specifically better support for local timezone. Ed: not clear if “local” means the viewer’s timezone, or just the configured timezone - i.e. do people access Airflow from multiple TZs?
Misc Feature Request: 8 comments
Comments that didn’t fit elsewhere - things like parameterized DAG trigger from UI, more control, keyboard shortcuts, grouping/collapsing rows.
Core - 15 comments
The “core” of Airflow, excluding the scheduler or the webserver.
Plugins: 4 comments
Requests for a more clearly defined plugin architecture, splitting Airflow into core and plugins. Ed: they may not need to be plugins to split, just Python modules would work.
More Operators: 11 comments
Requests for more operators/sensors. One good request was to have “composable” operators to avoid the explosion of XtoY operators. Ed: this would be nice! If someone wants to start an Airflow Improvement Proposal for this that would be ace.
Pull Request review/merge time - 3 comments
Three people commented about how long it takes to get PRs reviewed or merged. Ed: Absolutely, and we’d love to get through them quicker, but there is only so much time the volunteer-based committers can spend on this in a day without getting fired ;)
DAGs - 16 comments
Inter-DAG dependencies: 3 comments
A better way of declaring cross-DAG dependencies. Ed: None of the comments specifically said what the current ExternalTaskSensor was lacking.
Event-based Sensors: 4 comments
The ability for sensors to respond to external events without polling. Ed: the new mode="reschedule" on sensors goes a little way to helping with this, but this could still be improved.
Versioned DAGs: 4 comments
Asking for better handling of DAGs as they change over time. Ed: Again, AIP-12 will go a large way towards that.
Misc: 4 comments
Various DAG API changes such as more flexibility in retry, SLA, timeout. Better isolation between DAGs. Ed: PythonVirtualenvOperator might help a little bit with this.
Documentation - 21 comments
Lots of requests for better docs (Ed: yes please!), many mentioning “best practice” around deployment, upgrade process etc. Clearer write-ups of what new features each release brings.
Kubernetes - 10 comments
Better/tighter Kubernetes integration. Easier deployments of DAGs on Kube. Further customization of pods that are run.
Ed: Some comments like “integration with Kubernetes” probably tie back to the previous point about docs - we have a Kubernetes executor and PodOperators too. Maybe people don’t know about them.
Other - 20 comments
Improved HTTP API: 5 comments
Calls for a better/more fully-featured HTTP API - anything you can do via Web UI or CLI should be possible via HTTP API too. Ed: Totally!
Test Framework for end users: 3 comments
Three people asked for “ways to test DAGs locally” or variations of that. Ed: Bas at GoDataDriven wrote https://blog.godatadriven.com/testing-and-debugging-apache-airflow which provides some useful tips.
Things that didn’t fit elsewhere, or didn’t deserve their own category: “Better security” (Ed: yes, security could always be improved, but what specifically?), multi-tenant clusters (Ed: RBAC helps a tiny bit there), execution_date is confusing to newcomers, Airflow should be on the Amazon Marketplace, etc.