Setting env vars automatically in Virtual Environments (2022-09-26, Ash Berlin-Taylor)
<p>As part of my development flow on Airflow I use <a class="reference external" href="https://virtualenvwrapper.readthedocs.io/en/latest/">virtualenvwrapper</a>, and a long time ago I set up the ability to automatically set environment variables whenever the virtualenv is activated, and to unset them when leaving. (I think this trick would work, with a few small changes, for other venv tools; it isn’t that specific to virtualenvwrapper.)</p>
<p>This trick lets me create an “env” file per virtualenv, so I can easily swap between different configs as I change virtualenvs. That’s super useful when I’m moving back and forth between versions, or running multiple envs in parallel while testing performance etc.</p>
<p>For example my “main” dev virtualenv has <tt class="docutils literal"><span class="pre">~/.virtualenvs/airflow/share/auto-env.zsh</span></tt> containing:</p>
<div class="highlight"><pre><span></span><span class="nv">AIRFLOW__CORE__SQL_ALCHEMY_CONN</span><span class="o">=</span>postgresql:///airflow
<span class="nv">AIRFLOW__DATABASE__SQL_ALCHEMY_CONN</span><span class="o">=</span>postgresql:///airflow
</pre></div>
<div class="section" id="how-it-works">
<h2>How it works</h2>
<p>In <tt class="docutils literal"><span class="pre">~/.virtualenvs</span></tt> (the path I have configured to keep my environments) there are a set of “hook scripts”, and I tweaked two of them in particular:</p>
<p><tt class="docutils literal">postactivate</tt>:</p>
<p>This one is fairly straightforward, but makes use of <tt class="docutils literal">set <span class="pre">-a</span></tt>, which automatically exports any variable set until <tt class="docutils literal">set +a</tt> is encountered. This just means the env file doesn’t need <tt class="docutils literal">export</tt> on every line — just a bit simpler. (<tt class="docutils literal">set <span class="pre">-a</span></tt> works in Bash as well as <span class="caps">ZSH</span>.)</p>
<div class="highlight"><pre><span></span><span class="ch">#!/usr/bin/zsh</span>
<span class="c1"># This hook is sourced after every virtualenv is activated.</span>
<span class="k">if</span> <span class="o">[[</span> -e <span class="s2">"</span><span class="nv">$VIRTUAL_ENV</span><span class="s2">/share/auto-env.zsh"</span> <span class="o">]]</span><span class="p">;</span> <span class="k">then</span>
<span class="nb">set</span> -a
<span class="nb">source</span> <span class="s2">"</span><span class="nv">$VIRTUAL_ENV</span><span class="s2">/share/auto-env.zsh"</span>
<span class="nb">set</span> +a
<span class="k">fi</span>
</pre></div>
<p><tt class="docutils literal">predeactivate</tt>:</p>
<p>This script is a bit more complex: it “parses” the env file (in a very simplistic manner) looking for the names of all the env vars we set, and then unsets them.</p>
<div class="highlight"><pre><span></span><span class="ch">#!/usr/bin/zsh</span>
<span class="c1"># This hook is sourced before every virtualenv is deactivated.</span>
<span class="nv">autoenv</span><span class="o">=</span><span class="s2">"</span><span class="nv">$VIRTUAL_ENV</span><span class="s2">/share/auto-env.zsh"</span>
<span class="k">if</span> <span class="o">[[</span> -e <span class="s2">"</span><span class="nv">$VIRTUAL_ENV</span><span class="s2">/share/auto-env.zsh"</span> <span class="o">]]</span><span class="p">;</span> <span class="k">then</span>
<span class="k">for</span> variable <span class="k">in</span> <span class="k">$(</span>grep -oP <span class="s1">'^[a-zA-Z_][a-zA-Z-1-9_]*(?==)'</span> <span class="s2">"</span><span class="nv">$autoenv</span><span class="s2">"</span><span class="k">)</span><span class="p">;</span> <span class="k">do</span>
<span class="nb">unset</span> <span class="s2">"</span><span class="si">${</span><span class="nv">variable</span><span class="si">}</span><span class="s2">"</span>
<span class="k">done</span>
<span class="k">fi</span>
</pre></div>
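<p>Putting the two hooks together, the mechanism can be tried out without virtualenvwrapper at all. Here’s a minimal sketch (the temp file and the <tt class="docutils literal">FOO</tt>/<tt class="docutils literal">BAZ_2</tt> variables are made up for illustration, and <tt class="docutils literal">grep <span class="pre">-oP</span></tt> needs <span class="caps">GNU</span> grep):</p>

```shell
# Stand-in for $VIRTUAL_ENV/share/auto-env.zsh
autoenv=$(mktemp)
printf 'FOO=bar\nBAZ_2=qux\n' > "$autoenv"

# "postactivate": source the file with auto-export switched on
set -a
. "$autoenv"
set +a
echo "$FOO"          # bar

# "predeactivate": pull the variable names back out and unset them
for variable in $(grep -oP '^[a-zA-Z_][a-zA-Z0-9_]*(?==)' "$autoenv"); do
  unset "$variable"
done
echo "${FOO-unset}"  # unset

rm -f "$autoenv"
```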
</div>
Apache Airflow User Survey 2019 — Results (2019-02-25, Ash Berlin-Taylor)
<p>I’ve done a lot of work with Apache Airflow over the last two years, but my experience of how people use it is limited to the teams I’ve worked on and a few glances through talks I’ve seen. I have plenty of ideas of things <em>I’d</em> like to improve, but I have no idea what might be bugging everyone else.</p>
<p>So last week (okay, the week before actually; I got busy) I sent out this tweet:</p>
<blockquote>
<p>Do you use @ApacheAirflow? Want to help me get an indication of how people use it? Mind taking a (very short, 7 question) survey?</p>
<p><a class="reference external" href="https://ashberlintaylor.typeform.com/to/hIO0Ks">https://ashberlintaylor.typeform.com/to/hIO0Ks</a></p>
<p>Ta!</p>
<p class="attribution">—<a class="reference external" href="https://twitter.com/AshBerlin/status/1096083434538180608">https://twitter.com/AshBerlin/status/1096083434538180608</a></p>
</blockquote>
<p>Thank you to the 152 people who responded! As promised here is a summary of the results.</p>
<p>This post doesn’t really have a conclusion as such - I’ve just presented and summarized the results.</p>
<div class="section" id="the-questions">
<h2>The questions</h2>
<p>Before we go any further, a recap of the questions I asked. These were picked fairly arbitrarily, and I tried to make the survey quick to answer, so I didn’t ask too many questions.</p>
<ol class="arabic simple">
<li><strong>On a scale of 0-10 how likely are you to recommend Apache Airflow? (0 being not at all)</strong></li>
<li><strong>How do you expect your use of Airflow to evolve in 2019?</strong> Increase, Stay about the same, Not sure yet, Decrease</li>
<li><strong>How many active DAGs do you have in your Airflow cluster(s)?</strong> 1—5, 6—20, 21—50, 51+</li>
<li><strong>Roughly how many Tasks do you have defined in your DAGs?</strong> 1—10, 11—50, 51—200, 201+</li>
<li><strong>What executor do you use?</strong> Sequential, Local, Celery, Kubernetes, Dask, Mesos</li>
<li><strong>What would you like to see added/changed in Airflow for version 2.0 and beyond?</strong> Free text input</li>
<li><strong>Anything else you’d like to mention?</strong> Free text input</li>
</ol>
</div>
<div class="section" id="the-results">
<h2>The results</h2>
<div class="section" id="airflow-s-net-promoter-score">
<h3>1. Airflow’s Net Promoter Score</h3>
<p>The average score was 8.3, which is pretty good (though the channels I used to find respondents are probably going to impart a large chunk of selection bias to the results). Still, a pretty good figure, and there were some “detractors”; I’m glad they responded too!</p>
<p><span class="caps">NPS</span> = %Promoters (scores 9—10) - %Detractors (scores 0—6)<br/>
<span class="caps">NPS</span> = (70 - 17) / 152 ≈ 0.349, i.e. a score of about 35</p>
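<p>Written out with the response counts, the arithmetic is just (a quick sketch, nothing more):</p>

```shell
awk 'BEGIN {
  promoters = 70    # scores of 9 or 10
  detractors = 17   # scores of 0 to 6
  total = 152
  printf "NPS = %.1f\n", 100 * (promoters - detractors) / total
}'
# prints: NPS = 34.9
```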
<p>(65 responses, or 42%, were in the “passive” 7—8 range.)</p>
<p>A Net Promoter Score of 35 is good (0—50 is good, 50+ is fantastic), though I have no idea what to compare this to - the <span class="caps">NPS</span> of an <span class="caps">OSS</span> project is an entirely different thing to the score for a commercial company.</p>
<p>Still, 88% of people had a 7+ rating, which I think is pretty good.</p>
<table border="1" class="docutils">
<colgroup>
<col width="27%" />
<col width="41%" />
<col width="32%" />
</colgroup>
<thead valign="bottom">
<tr><th class="head">Rating</th>
<th class="head">Responses</th>
<th class="head">Precent</th>
</tr>
</thead>
<tbody valign="top">
<tr><td>0</td>
<td>0</td>
<td>0%</td>
</tr>
<tr><td>1</td>
<td>0</td>
<td>0%</td>
</tr>
<tr><td>2</td>
<td>1</td>
<td>0.7%</td>
</tr>
<tr><td>3</td>
<td>2</td>
<td>1.3%</td>
</tr>
<tr><td>4</td>
<td>3</td>
<td>2%</td>
</tr>
<tr><td>5</td>
<td>2</td>
<td>1.3%</td>
</tr>
<tr><td>6</td>
<td>8</td>
<td>5.3%</td>
</tr>
<tr><td>7</td>
<td>14</td>
<td>9.2%</td>
</tr>
<tr><td>8</td>
<td>51</td>
<td>33.6%</td>
</tr>
<tr><td>9</td>
<td>28</td>
<td>18.4%</td>
</tr>
<tr><td>10</td>
<td>42</td>
<td>27.6%</td>
</tr>
</tbody>
</table>
</div>
<div class="section" id="change-of-use-in-airflow-for-2019">
<h3>2. Change of use in Airflow for 2019</h3>
<table border="1" class="docutils">
<colgroup>
<col width="49%" />
<col width="33%" />
<col width="18%" />
</colgroup>
<thead valign="bottom">
<tr><th class="head">Answer</th>
<th class="head">Responses</th>
<th class="head">Percent</th>
</tr>
</thead>
<tbody valign="top">
<tr><td>Increase</td>
<td>118 responses</td>
<td>77.6%</td>
</tr>
<tr><td>Stay about the same</td>
<td>28 responses</td>
<td>18.4%</td>
</tr>
<tr><td>Not sure yet</td>
<td>4 responses</td>
<td>2.6%</td>
</tr>
<tr><td>Decrease</td>
<td>2 responses</td>
<td>1.3%</td>
</tr>
</tbody>
</table>
<p>Interestingly, only one of the detractors (people who rated 0—6) expected their use to decrease, and 9 (of the 17) expected an increase.</p>
</div>
<div class="section" id="and-4-how-many-dags-and-tasks-people-have">
<h3>3 and 4. How many DAGs and Tasks people have</h3>
<p><strong>DAGs:</strong></p>
<table border="1" class="docutils">
<colgroup>
<col width="23%" />
<col width="46%" />
<col width="31%" />
</colgroup>
<thead valign="bottom">
<tr><th class="head"># DAGs</th>
<th class="head">Responses</th>
<th class="head">Percent</th>
</tr>
</thead>
<tbody valign="top">
<tr><td>1—5</td>
<td>25 responses</td>
<td>16.6%</td>
</tr>
<tr><td>6—20</td>
<td>44 responses</td>
<td>29.1%</td>
</tr>
<tr><td>21—50</td>
<td>27 responses</td>
<td>17.9%</td>
</tr>
<tr><td>51+</td>
<td>55 responses</td>
<td>36.4%</td>
</tr>
</tbody>
</table>
<p><strong>Tasks:</strong></p>
<table border="1" class="docutils">
<colgroup>
<col width="26%" />
<col width="44%" />
<col width="30%" />
</colgroup>
<thead valign="bottom">
<tr><th class="head"># Tasks</th>
<th class="head">Responses</th>
<th class="head">Percent</th>
</tr>
</thead>
<tbody valign="top">
<tr><td>1—10</td>
<td>41 responses</td>
<td>27.3%</td>
</tr>
<tr><td>11—50</td>
<td>44 responses</td>
<td>29.3%</td>
</tr>
<tr><td>51—200</td>
<td>23 responses</td>
<td>15.3%</td>
</tr>
<tr><td>201+</td>
<td>42 responses</td>
<td>28%</td>
</tr>
</tbody>
</table>
<ul class="simple">
<li>Some people asked for a Kubernetes Executor - well, there is one!</li>
</ul>
</div>
<div class="section" id="what-executor-do-you-use">
<h3>5. What executor do you use?</h3>
<table border="1" class="docutils">
<colgroup>
<col width="34%" />
<col width="41%" />
<col width="24%" />
</colgroup>
<thead valign="bottom">
<tr><th class="head">Executor</th>
<th class="head">Responses</th>
<th class="head">Percent</th>
</tr>
</thead>
<tbody valign="top">
<tr><td>Celery</td>
<td>95 responses</td>
<td>63.8%</td>
</tr>
<tr><td>Local</td>
<td>30 responses</td>
<td>20.1%</td>
</tr>
<tr><td>Kubernetes</td>
<td>16 responses</td>
<td>10.7%</td>
</tr>
<tr><td>Sequential</td>
<td>7 responses</td>
<td>4.7%</td>
</tr>
<tr><td>Dask</td>
<td>1 response</td>
<td>0.7%</td>
</tr>
<tr><td>Mesos</td>
<td>0 responses</td>
<td>0%</td>
</tr>
</tbody>
</table>
<p>It’s probably no surprise that many people use the Celery executor — it’s well written about and the “default” horizontal scaling approach. A few people using the Sequential executor commented on why: they mostly just call external APIs from their tasks (<span class="caps">EMR</span>, Cloud Dataproc etc.) so their tasks don’t present a heavy load.</p>
<p>That’s it for the quantitative answers. Now for the harder work of summarizing the free-text fields.</p>
</div>
<div class="section" id="what-would-you-like-to-see-added-changed-in-airflow-for-version-2-0-and-beyond">
<h3>6. What would you like to see added/changed in Airflow for version 2.0 and beyond?</h3>
<p>To be able to summarize these answers in any useful format I’ve had to try and classify the responses given. I classified each response against the following categories and sub-categories; in total around 70% of the responses fell into at least one of them.</p>
<div class="section" id="scheduler-23-comments">
<h4>Scheduler - 23 comments</h4>
<p><strong>High-availability or run multiple schedulers:</strong> 8 comments</p>
<p><strong>Performance of scheduler:</strong> 8 comments</p>
<p><em>Ed: Comments about the <span class="caps">CPU</span> use of the scheduler when running, or the time it takes the scheduler to queue tasks</em></p>
<p><strong>Reparsing of <span class="caps">DAG</span> files</strong>: 5 comments</p>
<p>The scheduler currently re-parses the <span class="caps">DAG</span> files in a fairly tight loop, which can be a bit heavy on external systems if you have a dynamic <span class="caps">DAG</span>.</p>
<p><strong>Improvements to SubDAGs:</strong> 2 comments</p>
<p>General requests for “improve subdags”. <em>Ed: I agree, and I’m surprised more people didn’t ask for this.</em></p>
</div>
<div class="section" id="webserver-and-webui-39-comments">
<h4>Webserver and WebUI - 39 comments</h4>
<p><strong>Accessibility:</strong> 3 comments</p>
<p>Colour blind/high contrast mode. General accessibility improvements. Absolutely, we should be better about this.</p>
<p><strong>User Experience:</strong> 11 comments</p>
<p>Lots of comments asking for a “Better <span class="caps">UI</span>” or a “Cleaner <span class="caps">UI</span>”.</p>
<p><strong>Performance:</strong> 7 comments</p>
<p>Comments about the <span class="caps">UI</span> being slow - especially for large DAGs or a large number of DAGs.</p>
<p>The Web server shouldn’t have to parse the DAGs. <em>Ed: Agreed, and</em> <a class="reference external" href="https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-12+Persist+DAG+into+DB"><span class="caps">AIP</span>-12</a> <em>will go a long way towards that.</em></p>
<p><strong>Auto-updating:</strong> 3 comments</p>
<p>Having to refresh the page to see tasks changing state is so 2001 ;)</p>
<p><em>Ed: this would make a huge difference to the feel of the <span class="caps">UI</span>, but might sadly need larger architectural changes to make happen.</em></p>
<p><strong>Operational Visibility:</strong> 2 comments</p>
<p>Requests to make it easier to see the state of the whole Airflow system from within the <span class="caps">UI</span> - i.e. helping work out why tasks in a <span class="caps">DAG</span> might not be progressing etc.</p>
<p><em>Ed: people after my own heart!</em></p>
<p><strong>Timezone handling:</strong> 5 comments</p>
<p>Better handling of timezones in the <span class="caps">UI</span>, specifically better support for the local timezone. <em>Ed: not clear if “local” means the viewer’s timezone or just the configured timezone - i.e. do people access Airflow from multiple TZs?</em></p>
<p><strong>Misc Feature Request:</strong> 8 comments</p>
<p>Comments that didn’t fit elsewhere - things like parameterized <span class="caps">DAG</span> triggering from the <span class="caps">UI</span>, more control, keyboard shortcuts, and grouping/collapsing rows.</p>
</div>
<div class="section" id="core-15-comments">
<h4>Core - 15 comments</h4>
<p>The “core” of Airflow, excluding the scheduler or the webserver.</p>
<p><strong>Plugins:</strong> 4 comments</p>
<p>Requests for a more clearly defined plugin architecture, splitting Airflow into core and plugins. <em>Ed: the split may not need plugins; plain Python modules would work.</em></p>
<p><strong>More Operators:</strong> 11 comments</p>
<p>Requests for more operators/sensors. One good request was to have “composable” operators, to avoid the explosion of XtoY operators. <em>Ed: this would be nice! If someone wants to start an Airflow Improvement Proposal for this that would be ace.</em></p>
</div>
<div class="section" id="pull-request-review-merge-time-3-comments">
<h4>Pull Request review/merge time - 3 comments</h4>
<p>Three people commented about how long it takes to get PRs reviewed or merged. <em>Ed: Absolutely, and we’d love to get through them quicker, but there is only so much time the volunteer-based committers can spend on this in a day without getting fired ;)</em></p>
</div>
<div class="section" id="dags-16-comments">
<h4>DAGs - 16 comments</h4>
<p><strong>Inter-<span class="caps">DAG</span> dependencies:</strong> 3 comments</p>
<p>A better way of declaring cross-dag dependencies. <em>Ed: None of the comments specifically said what the current ExternalTaskSensor was lacking.</em></p>
<p><strong>Event-based Sensors:</strong> 4 comments</p>
<p>The ability for sensors to respond to external events without polling. <em>Ed: the new</em> <tt class="docutils literal"><span class="pre">mode="reschedule"</span></tt> <em>on sensors goes a little way to helping with this, but it could still be improved</em>.</p>
<p><strong>Versioned DAGs:</strong> 4 comments</p>
<p>Asking for better handling of DAGs as they change over time. <em>Ed: Again,</em> <a class="reference external" href="https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-12+Persist+DAG+into+DB"><span class="caps">AIP</span>-12</a> <em>will go a long way towards that.</em></p>
<p><strong>Misc:</strong> 4 comments</p>
<p>Various <span class="caps">DAG</span> <span class="caps">API</span> changes, such as more flexibility in retries, <span class="caps">SLA</span>s and timeouts, and better isolation between DAGs. <em>Ed: PythonVirtualEnvOperator might help a little bit with the isolation</em>.</p>
</div>
<div class="section" id="documentation-21-comments">
<h4>Documentation - 21 comments</h4>
<p>Lots of requests for better docs <em>Ed: yes please!</em>, many mentioning “best practice” around deployment, the upgrade process etc., and clearer write-ups of what new features each release brings.</p>
</div>
<div class="section" id="kubernetes-10-comments">
<h4>Kubernetes - 10 comments</h4>
<p>Better/tighter Kubernetes integration. Easier deployments of DAGs on Kube. Further customization of pods that are run.</p>
<p><em>Ed: Some comments like “integration with Kubernetes” probably ties back to the previous point about docs - we have a Kubernetes executor and PodOperators too. Maybe people don’t know about them</em></p>
</div>
<div class="section" id="alternative-ways-of-authoring-dags-5-comments">
<h4>Alternative ways of authoring DAGs - 5 comments</h4>
<p><em>Ed: these are, I’m afraid, low-priority for the Airflow core team. One of the selling points of Airflow is that the DAGs are Python code. This could be added via a plugin though.</em></p>
<p><strong>Add a <span class="caps">DSL</span> (Domain Specific Language):</strong> 1 comment</p>
<p>A request to describe DAGs in <span class="caps">YAML</span>/<span class="caps">JSON</span> and then submit them via the <span class="caps">API</span> - helpful for non-Python teams. <em>Ed: JustEat described something similar (without the <span class="caps">API</span>) in their talk at the London Airflow Meetup #1.</em></p>
<p><strong><span class="caps">GUI</span> editor for DAGs:</strong> 4 comments</p>
<p>Various “<span class="caps">UI</span> to edit from Web”, “drag-and-drop” etc.</p>
</div>
<div class="section" id="other-20-comments">
<h4>Other - 20 comments</h4>
<p><strong>Improved <span class="caps">HTTP</span> <span class="caps">API</span>:</strong> 5 comments</p>
<p>Calls for a better/more fully-featured <span class="caps">HTTP</span> <span class="caps">API</span> - anything you can do via the Web <span class="caps">UI</span> or <span class="caps">CLI</span> should be possible via the <span class="caps">HTTP</span> <span class="caps">API</span> too. <em>Ed: Totally!</em></p>
<p><strong>Test Framework for end users:</strong> 3 comments</p>
<p>Three people asked for “ways to test DAGs locally” or variations of that. <em>Ed: Bas at GoDataDriven wrote <a class="reference external" href="https://blog.godatadriven.com/testing-and-debugging-apache-airflow">https://blog.godatadriven.com/testing-and-debugging-apache-airflow</a>, which provides some useful tips.</em></p>
<p><strong>Miscellaneous:</strong> 12 comments</p>
<p>Things that didn’t fit elsewhere, or didn’t deserve their own category: “Better security” <em>Ed: yes, security could always be improved, but what specifically?</em>, multi-tenant clusters <em>Ed: <span class="caps">RBAC</span> helps a tiny bit there</em>, <tt class="docutils literal">execution_date</tt> is confusing to newcomers, Airflow should be on the Amazon Marketplace, etc.</p>
</div>
</div>
</div>
Scaling/Transforming Culture — Breakfast Ops Jan 2019 (2019-02-01, Ash Berlin-Taylor)
<p>Last week I went along to “Breakfast Ops”, a semi-monthly event aimed at CTOs and other technical decision makers, run by <a class="reference external" href="https://www.scalefactory.com/">The Scale Factory</a>, and held under the <a class="reference external" href="https://en.wikipedia.org/wiki/Chatham_House_Rule">Chatham House Rule</a>, which says</p>
<blockquote>
When a meeting, or part thereof, is held under the Chatham House Rule,
participants are free to use the information received, but neither the
identity nor the affiliation of the speaker(s), nor that of any other
participant, may be revealed.</blockquote>
<p>The first topic discussed was about “Scaling/Transforming Culture”, mostly in a technical organisation but not limited to a technical team, and I found the discussion insightful — so I thought I’d share my notes.</p>
<div class="section" id="what-is-culture">
<h2>What is Culture</h2>
<p>We started off by asking what we even mean by “Culture”, and a few people gave some pithy statements (knowing that any “simple” definition is overly simplistic and often wrong):</p>
<ul class="simple">
<li>Culture is the sum of all those little stories</li>
<li>Culture is how people behave when the boss isn’t in the room</li>
<li>Not something you can “scale”, only change</li>
<li><span class="dquo">“</span>Values” written up on a wall are a Lie.</li>
</ul>
</div>
<div class="section" id="psychology-001">
<h2>Psychology 001</h2>
<p>(Not even 101, as this is second hand, and based off hastily scribbled notes.)</p>
<p>One of the participants did a “psychology-heavy” Masters course recently and shared insights I found useful.</p>
<p>The concept of “Othering” was mentioned, as how a team treats “Others” (another team, customers, the public) plays a big role in that team’s culture.</p>
<p>Culture is what we accept, and what we demonstrate, and ultimately what we believe in. What we accept plays a big part in how culture changes as a team or organisation grows: if new hires exhibit different culture (say making lewd jokes) but no one on the team publicly objects: that is now accepted and part of the culture.</p>
<p>Similarly it doesn’t matter at all what people say, but the behaviours they demonstrate are what matters: if we say we care about testing, but the team lead’s changes never come with tests then the rest of the team will likely start copying this behaviour (or feeling resentful about the double standards!)</p>
<p>Changing someone’s belief without changing their values is hard, bordering on impossible. If you wish to change someone’s culture and beliefs then it is best to start with their values. One way suggested to achieve this was to think of it in terms of “Come with me on this” - and make it convincing. This approach hopefully works by changing peoples values first (i.e. it makes them value testing) which should then naturally change what they do.</p>
</div>
<div class="section" id="nationality">
<h2>Nationality</h2>
<p>The discussion then moved on to point out that nationality also plays a big role in the culture of a team. One person gave an anecdote of how a distributed team of English, American and Polish members had different “default cultures”: the English and Americans would always speak a lot on calls, but the Poles were much quieter. This was worrying at first, but on speaking privately to the Polish team lead he was told “if we’re silent it means we agree, but don’t worry: if we think you’re wrong we’ll speak up!”.</p>
</div>
<div class="section" id="communication">
<h2>Communication</h2>
<p>Changing the culture of a company as it grows is also closely tied in to the difficulties and changing ways a growing company communicates: when there are 3 of you around a kitchen table the culture and comms is implicit, as you grow to 20 people everyone can know each other and a weekly all-hands meeting is a good way of sharing culture and communicating, but as the company grows to 50 or 100 people both communicating and sharing a culture get harder.</p>
<p>One large private company tried an interesting approach that they called “Culture first inductions”. The on-boarding process for every new hire (including senior director level!) started with a mandatory six week period in the call center, after which they had to give a “good” presentation about what they learnt. (How “good” was defined was not mentioned.) This change did wonders for the company’s bug rate and bug resolution time!</p>
<p>A counter-point to this example was mentioned: a large retail company had a similar policy where staff from head office would have to spend a week on a shop floor. However, rather than improving things (in this one example) it led to immediate frustration, as the speaker found plentiful examples of poor process and poor computer systems that would be trivial to improve, but senior leadership showed no appetite to listen or fix them. This sort of problem might be why the company in the previous example required a “good” presentation.</p>
<p>It was suggested that <a class="reference external" href="https://amzn.to/2G2Xzu7">Freedom from Command and Control — by John Seddon</a> would be worth a read.</p>
<p>A suggestion for how to improve shared culture across a whole company was to draw a parallel from “DevOps” (before it became a job title, if a bad one): if you can get all teams to work together it will help build shared culture.</p>
<p>That said, someone pointed out that different teams are, by necessity, going to have different cultures: “I don’t want my Finance team to move fast and break things!”.</p>
<div class="section" id="psychology-002-learn-from-counseling">
<h3>Psychology 002 - Learn from Counseling</h3>
<p>Since culture is closely tied in to comms and interpersonal relationships it was also suggested that managers should learn some of the basics of counselling, especially if they want to earn the trust of their reports (see the earlier point about “Come with me on this”)</p>
<ul>
<li><p class="first">Establish safe spaces.</p>
<p>Nothing else can really happen, and you won’t really know how your report is faring, if they don’t feel safe.</p>
</li>
<li><p class="first">They may say they feel safe, but if you dig a bit deeper you may find out “I feel safe except…” - often a worry at home, or about money or a promotion.</p>
</li>
<li><p class="first">Once you have a safe space then you can help the report, and try to change values.</p>
</li>
</ul>
<p>They couldn’t recommend any specific resources/<span class="caps">URLS</span> (they couldn’t remember them off the top of their head) but they recommended looking up guides on:</p>
<ul class="simple">
<li><span class="dquo">“</span>Basic” guides into Counseling and Creating Safe Spaces,</li>
<li>Nonviolent Communication</li>
</ul>
</div>
</div>
<div class="section" id="interviewing">
<h2>Interviewing</h2>
<p>Someone asked if people had any tips for how to assess culture at the interview stage - they were careful to point out that they wanted to avoid the bias-ridden “culture fit test” which just ends up hiring middle-aged white guys.</p>
<p>One person gave an example of how they did it in their company:</p>
<ul class="simple">
<li>They ask questions to assess against their “High Performing Behaviours”</li>
<li>Conceptually they have replaced “Culture Fit” with “Value Fit”</li>
<li>And importantly: they use a fixed script for each candidate and assess everyone on the same criteria.</li>
</ul>
</div>
<div class="section" id="final-thoughts-and-further-reading">
<h2>Final Thoughts and Further Reading</h2>
<p>If you want to change your culture then start by looking at the common denominators (e.g. is your whole team men?). An easy (sounding) change might be to introduce diversity.</p>
<p>The easiest time to change the culture of a team is at hiring (each new hire will nudge the team culture), but it is possible to change culture while keeping the same people - just harder.</p>
<div class="section" id="books">
<h3>Books</h3>
<p><a class="reference external" href="https://amzn.to/2G2Xzu7">Freedom from Command and Control — by John Seddon</a></p>
<p><a class="reference external" href="https://amzn.to/2G0UoTG">The Art of Growing Through Feedback: A Practical Guide on How to Give and Receive Feedback Graciously — by Adrian Pei</a></p>
<p><a class="reference external" href="https://amzn.to/2Be3wQJ">The Hard Thing About Hard Things: Building a Business When There Are No Easy Answers Hardcover — by Ben Horowitz</a></p>
</div>
<div class="section" id="talks">
<h3>Talks</h3>
<p><a class="reference external" href="https://www.youtube.com/watch?v=4yODalLQ2lM">Randical Candour — by Kim Scott</a></p>
<p><a class="reference external" href="https://www.youtube.com/watch?v=hZFShSjAhlQ">How to Break The Rules — by Dan North</a> (this one was in the pre-discussion so may not be so relevant to culture, but it came highly recommended).</p>
</div>
</div>
Extending user-data in Terraform modules (2017-08-28, Ash Berlin-Taylor)
<p>A very common task when launching an instance in <span class="caps">AWS</span> is wanting to configure the instance in some way - maybe we want to install a few packages, download and launch the latest release of our project, or perhaps register with Chef or Puppet and continue configuring that way.</p>
<p>One option might be to build an <span class="caps">AMI</span> with your desired behaviour baked in as a startup script, but often the parameters to these commands can subtly change from environment-to-environment/region-to-region, or we just don’t want to go to the effort of building and maintaining a custom <span class="caps">AMI</span>. </p>
<p>When you launch an instance on <span class="caps">AWS</span> you can give it a blob of “user-data”: up to 16kB of text that can be fetched on the instance itself via the <span class="caps">AWS</span> metadata service at <code>http://169.254.169.254/latest/user-data</code>. (This <span class="caps">URL</span> will only work on an <span class="caps">EC2</span> instance; 169.254.0.0/16 is an IPv4 “link-local” range, meaning it will never be routed over the public internet. Give or take.)</p>
<p>Putting the commands in user data is a good way to not have to 1) log in and run them manually (which I try to avoid in almost every circumstance), 2) build a custom <span class="caps">AMI</span> for each and every combo, or 3) pay <span class="caps">AMI</span> storage costs for an open source Terraform module.</p>
<p>If we wanted we could have our own startup task that examines the user-data and performs the actions/commands we specified. Luckily we don’t have to write it as it already exists.</p>
<p>It’s called <a href="https://cloudinit.readthedocs.io/en/latest/"><strong>cloud-init</strong></a>, and it comes pre-installed on almost every <span class="caps">AMI</span>. (This is the case for every <span class="caps">AMI</span> I’ve ever run or extended, including Debian, Ubuntu, Amazon Linux, and CentOS. I’ve yet to find an <span class="caps">AMI</span> it didn’t come pre-installed on in 6 years of using <span class="caps">AWS</span>.)</p>
<p>(cloud-init also works on <a href="https://docs.microsoft.com/en-us/azure/virtual-machines/linux/using-cloud-init">Azure</a> and <a href="https://cloud.google.com/container-optimized-os/docs/how-to/create-configure-instance#configuring_an_instance">Google Cloud</a> (or at least some bits of it? I haven’t used <span class="caps">GC</span>* myself yet), and <a href="https://cloudinit.readthedocs.io/en/latest/topics/datasources.html#datasource-documentation">many, many more</a> hosting platforms.)</p>
<p>cloud-init understands a few formats of user-data. If the user-data starts with <code>#!</code> it is treated as a script to run – i.e. put a shell script in your user-data and it will be run at boot. For simple cases we could stop there, but we’re talking about extensible Terraform modules here.</p>
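<p>A minimal sketch of that form of user-data (the package being installed here is purely for illustration):</p>
<div class="highlight"><pre><span></span><code>#!/bin/bash
# cloud-init runs this as root, once, on first boot.
# Output is captured in /var/log/cloud-init-output.log.
apt-get update
apt-get install -y nginx
</code></pre></div>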
<p>Another format it understands is <code>#cloud-config</code> which is a <span class="caps">YAML</span> document that cloud-init processes through various <a href="http://cloudinit.readthedocs.io/en/latest/topics/modules.html">modules</a> – it can write files, run commands (on first boot only or on every boot) and lots more. Nothing that couldn’t be done via a shell script, but it’s just a slightly more “declarative” syntax and with better error handling.</p>
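<p>A small, hypothetical <code>#cloud-config</code> document doing roughly the same job as a shell script, but declaratively:</p>
<div class="highlight"><pre><span></span><code>#cloud-config
---
packages:
 - nginx
write_files:
 - path: /etc/motd
   content: "Configured by cloud-init\n"
runcmd:
 - [ systemctl, enable, --now, nginx ]
</code></pre></div>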
<p>I’m not going to go into too much detail about what cloud-init can do, as covering any of it properly would warrant its own post. (And I’m not sure it’s interesting to enough people.) If you want a non-trivial example of what cloud-init can do then check out <a href="https://github.com/terraform-community-modules/tf_aws_nat/blob/v1.0.0/nat-user-data.conf.tmpl"><code>nat-user-data.conf.tmpl</code> from tf_aws_nat</a> – this uses cloud-init to install and update packages, write files, and run commands.</p>
<h2 id="specifying-user-data-in-terraform">Specifying user-data in Terraform<a class="headerlink" href="#specifying-user-data-in-terraform" title="Permanent link">¶</a></h2>
<p>Giving an <span class="caps">EC2</span> instance user-data in Terraform is quite easy. If you want a simple value you can give the <code>user_data</code> argument a string literal, but in most cases it’s complex enough that you either want to use the <a href="https://www.terraform.io/docs/configuration/interpolation.html#file-path-"><code>file()</code> function</a>, or the <a href="https://www.terraform.io/docs/configuration/index.html">template_file data source</a> if you need to interpolate values.</p>
<p>For example this (simplified) example taken from the <code>tf_aws_nat</code> module will launch an instance with the user-data populated with the contents of the “nat-user-data.conf” file found alongside the other .tf files:</p>
<div class="highlight"><pre><span></span><code>data "template_file" "user_data" {
  template = "${file("${path.module}/nat-user-data.conf")}"

  vars {
    name = "${var.name}"
  }
}

resource "aws_instance" "nat" {
  # ...
  user_data = "${data.template_file.user_data.rendered}"
}
</code></pre></div>
<p>The cloud-config provided in the module is quite powerful, but what if we want to allow the user of the module to also be able to customize the instance on boot? (Say they want to install a monitoring agent, or configure users’ ssh keys, etc.)</p>
<p>Luckily cloud-init can read from multiple user-data sections.</p>
<h2 id="multiple-sections-of-user-data">Multiple sections of user-data<a class="headerlink" href="#multiple-sections-of-user-data" title="Permanent link">¶</a></h2>
<p>One of the formats that cloud-init understands is a multi-part <span class="caps">MIME</span> archive (yes, <span class="caps">MIME</span> as in email – it’s the same format used when sending attachments via email), and since Terraform v0.6.9 (released January 8, 2016) it has been able to produce them using the <a href="https://www.terraform.io/docs/providers/template/d/cloudinit_config.html">template_cloudinit_config data source</a>:</p>
<div class="highlight"><pre><span></span><code>data "template_cloudinit_config" "x" {
  part {
    content = "#cloud-config\n---\nruncmd:\n - date"
  }

  part {
    filename     = "init.sh"
    content_type = "text/x-shellscript"
    content      = "echo Hi\ntouch /root/There\n"
  }
}
</code></pre></div>
<p>The “rendered” property of the data source produces something that looks like this:</p>
<div class="highlight"><pre><span></span><code>Content-Type: multipart/mixed; boundary="MIMEBOUNDARY"
MIME-Version: 1.0

--MIMEBOUNDARY
Content-Transfer-Encoding: 7bit
Content-Type: text/plain
Mime-Version: 1.0

#cloud-config
---
runcmd:
 - date

--MIMEBOUNDARY
Content-Disposition: attachment; filename="init.sh"
Content-Transfer-Encoding: 7bit
Content-Type: text/x-shellscript
Mime-Version: 1.0

echo Hi
touch /root/There

--MIMEBOUNDARY--
</code></pre></div>
<p>As we can see, we have two parts, each with a bit of metadata. (If you are wondering, I don’t think the filename has any bearing; it’s purely for our informational purposes.) cloud-init will process each part according to a predefined ordering – certain types first, then for each type the order in which it is defined.</p>
<p>Using this technique we could add a variable to our module so that we can have a complex cloud-config user data defined in the module, and still let the caller add an extra shell script to be run:</p>
<div class="highlight"><pre><span></span><code>variable "instance_boot_script" {
  default = ""
}

data "template_cloudinit_config" "userdata" {
  part {
    content = <<EOF
#cloud-config
---
runcmd:
 - [ wget, "http://example.org", -O, /tmp/index.html ]
EOF
  }

  part {
    filename     = "extra.sh"
    content_type = "text/x-shellscript"
    content      = "${var.instance_boot_script}"
  }
}

resource "aws_instance" "instance" {
  # ...
  user_data = "${data.template_cloudinit_config.userdata.rendered}"
}
</code></pre></div>
<p>This works, and probably gives enough control as the module caller could do anything they needed via a shell script, but it would be nice if they could also use the cloud-config format. With one extra option to the template_cloudinit_config data source this is possible.</p>
<h2 id="merging-cloud-init-sources">Merging cloud-init sources<a class="headerlink" href="#merging-cloud-init-sources" title="Permanent link">¶</a></h2>
<p>If cloud-init encounters two cloud-config parts it will merge them. Let’s say we had two cloud-config parts: this one in our module</p>
<div class="highlight"><pre><span></span><code><span class="nt">runcmd</span><span class="p">:</span><span class="w"></span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">bash1</span><span class="w"></span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">bash2</span><span class="w"></span>
</code></pre></div>
<p>and this one provided by the user</p>
<div class="highlight"><pre><span></span><code><span class="nt">runcmd</span><span class="p">:</span><span class="w"></span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">bash3</span><span class="w"></span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">bash4</span><span class="w"></span>
</code></pre></div>
<p>The intent here is probably to run <code>bash1</code>, <code>bash2</code>, <code>bash3</code>, <code>bash4</code> but by default cloud-init will just run <code>bash3</code> and <code>bash4</code>. This is because cloud-init follows a set of “merge rules”, one of which is when it finds a list/array it will replace the content. A few years ago this used to be the only option, and although it is still the default it is now possible to customize the merge behaviour.</p>
<p>There is a page in cloud-init’s docs about <a href="http://cloudinit.readthedocs.io/en/latest/topics/merging.html">merging user-data sections</a> that manages to give lots of detail whilst at the same time failing to mention what the possible merge settings are, or what results they give. After a bit of searching and trial and error I found a merge behaviour I’m happy with:</p>
<blockquote>
<p><code>list(append)+dict(recurse_array)+str()</code></p>
</blockquote>
<p>This tells cloud-init that when it encounters a list it should append the new one to the old, when it encounters a list inside a dictionary it should use the previous list rule, and when it encounters a string it should use the default behaviour (replace the old value).</p>
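<p>With that merge type the two lists of commands above end up appended into a single list, so the effective merged config looks like this:</p>
<div class="highlight"><pre><span></span><code>runcmd:
 - bash1
 - bash2
 - bash3
 - bash4
</code></pre></div>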
<p>There are a couple of ways cloud-init can be configured, but the easiest for us is to specify this as a <span class="caps">MIME</span> header on the extra parts, which Terraform supports.</p>
<h2 id="putting-it-all-together">Putting it all together<a class="headerlink" href="#putting-it-all-together" title="Permanent link">¶</a></h2>
<p>Taking everything we’ve mentioned so far we can turn it into a mostly-complete Terraform module:</p>
<div class="highlight"><pre><span></span><code>provider "aws" {}

data "aws_ami" "ami" {
  most_recent = true
  owners      = ["099720109477"] # Canonical

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-xenial-16.04-amd64-server-*"]
  }

  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }
}

variable "extra_userdata" {
  default     = ""
  description = "Extra user-data to add to the default built-in"
}

variable "extra_userdata_type" {
  default     = "text/cloud-config"
  description = "What format is extra_userdata in - eg 'text/cloud-config' or 'text/x-shellscript'"
}

variable "extra_userdata_merge" {
  default     = "list(append)+dict(recurse_array)+str()"
  description = "Control how cloud-init merges user-data sections"
}

data "template_cloudinit_config" "userdata" {
  part {
    content = "${file("${path.module}/cloud-init.yml")}"
  }

<span class="hll">  part {
</span><span class="hll">    filename     = "extra.sh"
</span><span class="hll">    content_type = "${var.extra_userdata_type}"
</span><span class="hll">    content      = "${var.extra_userdata}"
</span><span class="hll">    merge_type   = "${var.extra_userdata_merge}"
</span><span class="hll">  }
</span>}

resource "aws_instance" "instance" {
  ami           = "${data.aws_ami.ami.id}"
  instance_type = "t2.micro"
  user_data     = "${data.template_cloudinit_config.userdata.rendered}"
}
</code></pre></div>
<p>I’ve highlighted where we use the new input variables. This module would by default have just the built-in user data, but the caller can provide a value to the <code>extra_userdata</code> variable to add their own instance customization, like this:</p>
<div class="highlight"><pre><span></span><code>data "aws_region" "current" {
  current = true
}

data "template_file" "extra-userdata" {
  vars {
    region = "${data.aws_region.current.name}"
  }

  # Generally you shouldn't use a heredoc as a template, but it's easier to show here.
  template = <<EOF
---
runcmd:
 - [ 'sh', '-c', 'echo export AWS_DEFAULT_REGION=$${region} >> ~ubuntu/.bashrc' ]
EOF
}

module "mymod" {
  source = ...

  extra_userdata = "${data.template_file.extra-userdata.rendered}"
}
</code></pre></div>
<p>Nifty.</p>Patterns for extensible Terraform modules - AMI IDs2017-07-04T00:00:00+01:002022-09-26T10:55:31+01:00Ash Berlin-Taylortag:ash.berlintaylor.com,2017-07-04:/writings/2017/07/reusable-terraform-modules-ami-ids/<p>Terraform is my preferred tool for automating cloud infrastructure deployments. It’s not without its problems but on balance I still prefer it over any other tool I’ve tried. When reusable modules were added fairly early on in the development of Terraform (sometime before 0.5.0 in 2015) it became possible to package up Terraform resources and to build upon the work of others.</p>
<p>The <a href="https://github.com/terraform-community-modules">Terraform Community Modules</a> GitHub organization is a good example of building quality modules — the modules all have documentation, a sensible versioning strategy, and make things configurable via input variables. These modules have been a huge help to me in getting a new project up and running quickly at my latest job. I’d especially like to thank them for writing the <a href="https://github.com/terraform-community-modules/tf_aws_vpc">tf_aws_vpc</a> module as it let me create a <span class="caps">VPC</span> with all the right subnets super quickly.</p>
<p>However the problem with using other people’s modules is that they often won’t do <em>quite</em> what you need them to do. (This isn’t a dig at tf_aws_vpc or any other specific module.) Perhaps a module has a provisioner that SSHs directly to an <span class="caps">EC2</span> instance but in your deployment everything has to go via an <span class="caps">SSH</span> bastion/jump-host; or maybe it uses a stock Ubuntu <span class="caps">AMI</span> (Amazon Machine Image) with a single shared “ubuntu” user and a shared <span class="caps">SSH</span> keypair, and you can’t have any shared credentials in your environment so need to configure users somehow. The “best” case is that the change to the module is small and the maintainer is still active, so you can “easily” fork the module, add the feature you want and have it accepted upstream. Life rarely works out this nicely though.</p>
<p>In this series I’m going to cover some of the tricks that you, as a module author, can do to keep the scope of your module small, but provide suitable extension points so that users can tweak it to their hearts’ content without having to open pull requests for every site-specific feature.</p>
<p>(These tips will be written up using <span class="caps">AWS</span> resources as this is what I use most of the time. Some of them are <span class="caps">AWS</span> specific such as the <code>aws_ami</code> data source but the general principles should apply to other cloud providers.)</p>
<p>Let’s start with a trivial Terraform module – one that just spins up an <span class="caps">EC2</span> instance using an Ubuntu 16.04 <span class="caps">LTS</span> <span class="caps">AMI</span>:</p>
<div class="highlight"><pre><span></span><code>resource "aws_instance" "mod" {
  ami           = "ami-d15a75c7"
  instance_type = "t2.micro"
}
</code></pre></div>
<p>(You can view all of the code samples from this blog post on my <a href="https://github.com/ashb/blog-code-samples/tree/reusable-terraform-modules-ami-ids">code-samples Github repo</a>.)</p>
<h2 id="dont-hard-code-ami-ids">Don’t hard-code <span class="caps">AMI</span> IDs<a class="headerlink" href="#dont-hard-code-ami-ids" title="Permanent link">¶</a></h2>
<p>When terraforming instances for your own project hard-coding an <span class="caps">AMI</span> is fine - you know what image you want and what region you run it in. It isn’t a great way to build reusable modules though. The main reason is that AMIs are specific to each region, so hard-coding one means the module will only work in its original region. Perhaps more importantly, if a new version of the <span class="caps">AMI</span> is built to include security updates then the module source will need updating, and if anyone has pinned the module to a specific version (which is probably sensible to do so that things don’t break/change underneath you) they will also need to update the module source stanza at the call-site.</p>
<p>(You do apply security updates to your AMIs and rebuild them. Don’t you?)</p>
<p>There are a few ways around this hard-coding:</p>
<h3 id="1-use-a-variable-for-ami">1. Use a variable for <code>ami</code><a class="headerlink" href="#1-use-a-variable-for-ami" title="Permanent link">¶</a></h3>
<p>This lets the module caller easily change it without having to open a pull request. I would always recommend adding a default value to input variables where at all possible. In the module code it looks like:</p>
<div class="highlight"><pre><span></span><code>variable "ami_id" {
  default     = "ami-d15a75c7"
  description = "AMI ID to use when launching our EC2 instance. Default is Ubuntu 16.04 LTS in us-east-1"
}

resource "aws_instance" "mod" {
<span class="hll">  ami           = "${var.ami_id}"
</span>  instance_type = "t2.micro"
}
</code></pre></div>
<p>And then called like this:</p>
<div class="highlight"><pre><span></span><code>provider "aws" {
  region = "us-east-2"
}

module "mymod" {
  source = "..."

  # We're in a different region, so we need to specify a different AMI.
<span class="hll">  ami_id = "ami-5e94b23b"
</span>}
</code></pre></div>
<p>(In the specific case of different regions it is possible to use <a href="https://www.terraform.io/docs/configuration/variables.html#maps">“map” variables</a> to provide a lookup of region to <span class="caps">AMI</span> id.)</p>
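<p>A sketch of that map-variable approach (the <code>region</code> variable and the per-region IDs here are for illustration; the lookup keys would need to cover every region you support):</p>
<div class="highlight"><pre><span></span><code>variable "region" {}

variable "ami_ids" {
  # Region-to-AMI lookup: Ubuntu 16.04 LTS in each region we support.
  default = {
    "us-east-1" = "ami-d15a75c7"
    "us-east-2" = "ami-5e94b23b"
  }
}

resource "aws_instance" "mod" {
  ami           = "${lookup(var.ami_ids, var.region)}"
  instance_type = "t2.micro"
}
</code></pre></div>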
<p>Having the <code>ami_id</code> input variable allows for customization by the end user but puts some onus on them to find the right <span class="caps">AMI</span> and to manually track updates. There’s a way to avoid this though.</p>
<h3 id="2-use-the-aws_ami-data-source">2. Use the <code>aws_ami</code> data source<a class="headerlink" href="#2-use-the-aws_ami-data-source" title="Permanent link">¶</a></h3>
<p>Instead of having to specify an <span class="caps">AMI</span> manually Terraform can use the <span class="caps">AWS</span> APIs to find an image matching certain criteria automatically. This example is shown in the terraform docs for the <a href="https://www.terraform.io/docs/providers/aws/r/instance.html">aws_instance resource</a>.</p>
<p>In the module:</p>
<div class="highlight"><pre><span></span><code>data "aws_ami" "ubuntu" {
  most_recent = true

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-xenial-16.04-amd64-server-*"]
  }

  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }

  owners = ["099720109477"] # Canonical
}

resource "aws_instance" "mod" {
<span class="hll">  ami           = "${data.aws_ami.ubuntu.id}"
</span>  instance_type = "t2.micro"
}
</code></pre></div>
<p>When you terraform this, the <code>aws_ami</code> data source will use the filters to search for available images in the <em>current</em> region (“available” meaning the image is public, is in your <span class="caps">AWS</span> account, or is one to which you’ve been granted permissions).</p>
<p><strong>Note</strong>: You can add tags to any AMIs you build, and also filter on them, but searching by tag only works within the same <span class="caps">AWS</span> account. This is why the name in the above example is so long.</p>
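<p>For example, finding your own AMIs by a hypothetical <code>Role</code> tag might look like this:</p>
<div class="highlight"><pre><span></span><code>data "aws_ami" "mine" {
  most_recent = true
  owners      = ["self"] # Searching by tag only works for AMIs in your own account.

  filter {
    name   = "tag:Role"
    values = ["webserver"]
  }
}
</code></pre></div>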
<p>When using <code>most_recent = true</code> it will find the most recently published <span class="caps">AMI</span> every time Terraform is run, which can be a good thing or a bad thing depending on your point of view. Having Terraform destroy and re-create an <span class="caps">EC2</span> instance just to update to the latest <span class="caps">AMI</span> might not be what everyone wants, or they might want to use a different <span class="caps">AMI</span> “release” altogether, which they can’t do with this mechanism alone.</p>
<p>It is possible to combine both of these first two tips to give us a bonus tip:</p>
<h3 id="3-automatically-find-ami-ids-but-allow-override">3. Automatically find <span class="caps">AMI</span> IDs but allow override.<a class="headerlink" href="#3-automatically-find-ami-ids-but-allow-override" title="Permanent link">¶</a></h3>
<p>The advantage of this mechanism is that it works with zero extra input in the typical case but gives users control if they need it. It looks like this in the module:</p>
<div class="highlight"><pre><span></span><code>variable "ami_id" {
<span class="hll">  default     = "" # Note this is empty.
</span>  description = "Use this specific AMI ID for our EC2 instance. Default is Ubuntu 16.04 LTS in the current region"
}

data "aws_ami" "ubuntu" {
  most_recent = true

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-xenial-16.04-amd64-server-*"]
  }

  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }

  owners = ["099720109477"] # Canonical
}

resource "aws_instance" "mod" {
<span class="hll">  ami           = "${var.ami_id != "" ? var.ami_id : data.aws_ami.ubuntu.id}"
</span>  instance_type = "t2.micro"
}
</code></pre></div>
<p>This works by using the <a href="https://www.terraform.io/docs/configuration/interpolation.html#conditionals">conditionals feature of the interpolation language</a> introduced in Terraform version 0.8.0 (first released in December 2016): <code>"${var.ami_id != "" ? var.ami_id : data.aws_ami.ubuntu.id}"</code> — this will use the value from the <code>ami_id</code> variable if it is not empty, and fall back to the <span class="caps">AMI</span> the data source discovered.</p>
<p>The other advantage of this pattern is that it lets someone start with the module in “auto-discovery” mode, take the <span class="caps">AMI</span> that it found and feed it back in under the <code>ami_id</code> input to pin to a specific <span class="caps">AMI</span> – putting the user in control of when an upgrade happens.</p>
<p>Or they could use their own <code>aws_ami</code> data source to find and track an <span class="caps">AMI</span> of another distribution without us having to provide input variables for every constant used in our <span class="caps">AMI</span> filter.</p>
<p>(You can view all of the code samples for this post on my <a href="https://github.com/ashb/blog-code-samples/tree/reusable-terraform-modules-ami-ids">code-samples Github repo</a>.)</p>
<p>If you found this first part of my series on Terraform useful check back (or <a href="https://ash.berlintaylor.com/index.html">subscribe to the feed</a>) for more. In future posts I’ll cover:</p>
<ul>
<li><a href="https://ash.berlintaylor.com/writings/2017/08/reusable-terraform-modules-extending-userdata/">how you can make user data/cloud-init extendable</a></li>
<li>applying user-supplied tags in combination with computed ones (i.e. a computed Name tag in addition to whatever else is given)</li>
<li>conditionally disabling data sources (which is especially useful with private AMIs in a big organization)</li>
</ul>