Do You Know How to Write DevOps Documentation?

monicazhang

This post was last edited by monicazhang on 2018-8-23 14:53

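Every DevOps engineer has a toolbox full of command-line scripts for automating all sorts of tasks. But when the documentation is incomplete, even the author of a script becomes reluctant to run it once enough time has passed. Most operations work involves managing production environments, and a mistake there cannot be shrugged off. Writing good DevOps documentation is a craft in its own right, so here we share some of our experience in organizing operations scripts and their documentation.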

Task Management and Documentation with Fabric

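In earlier articles we mentioned that Glow uses Fabric to run all kinds of routine management tasks. Fabric provides very convenient features for organizing tasks and for looking up task documentation.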

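Fabric's main file is usually named fabfile.py, but once the number of tasks grows, keeping them all in one file clearly becomes hard to maintain. Fabric has a very practical feature: when fabfile.py imports other modules, it automatically discovers the Fabric tasks inside them. Taking advantage of this, we can sort tasks into different modules by category and import them all in fabfile.py. For example, the structure of Glow's DevOps repository looks roughly like this: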

$ tree
├── __init__.py
├── fabfile.py
├── fab_scripts
│ ├── __init__.py
│ ├── monitors.py
│ ├── mysql.py
│ ├── nginx.py
│ ├── redis.py
│ ├── scaling.py
│ ├── services.py

Apart from a few core task scripts, fabfile.py consists mainly of import statements:

# fabfile.py
from fab_scripts import monitors
from fab_scripts import mysql
from fab_scripts import nginx
from fab_scripts import redis
from fab_scripts import scaling
from fab_scripts import services

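This gathers the tasks scattered across multiple files into one place. We can run fab -l to list all executable tasks with their descriptions; each description comes from the first line of the corresponding task's docstring. For example: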

$ fab -l
Available commands:

    monitors.get                Get YAML definition of monitors
    monitors.list               List names of all monitors
    monitors.mute               Mute specific groups of monitors
    monitors.mute_all           Mute all monitors globally
    monitors.unmute             Unmute specific groups of monitors
    monitors.unmute_all         Unmute all monitors globally
    mysql.connection_list       Show number of DB connections group by host
    mysql.connection_sources    Show number of DB connections group by process
    nginx.turn_off_maintenance  Turn off site maintenance mode.
    nginx.turn_on_maintenance   Turn on site maintenance mode. Return 503.
    redis.auto_save             Update saving settings for redis instances
    redis.start_slave           Set master and start replication.
    redis.stop_slave            Stop replication and set its master to none.
    scaling.add_servers         Launch more instances in specific server group
    scaling.create_image        Create an image for provisioning new instances
    scaling.get_latest_tag      Get latest tag of deployed code
    services.cycle              Restart application services.
    services.start              Start application services.
    services.stop               Stop application services.
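To make the "first docstring line becomes the description" rule concrete, here is a minimal sketch in plain Python. It is not Fabric's implementation; the helper names `first_doc_line` and `list_tasks` are made up for illustration, and the two task functions are empty stand-ins for tasks that would live in nginx.py and redis.py.

```python
import inspect


def first_doc_line(func):
    """Return the first line of a function's docstring, or '' if it has none."""
    doc = inspect.getdoc(func) or ""
    lines = doc.splitlines()
    return lines[0].strip() if lines else ""


def list_tasks(namespaces):
    """Build (name, description) pairs sorted by namespaced task name.

    `namespaces` maps a namespace (the module name) to a list of functions.
    """
    rows = []
    for namespace, funcs in namespaces.items():
        for func in funcs:
            rows.append((f"{namespace}.{func.__name__}", first_doc_line(func)))
    return sorted(rows)


# Hypothetical stand-ins for tasks defined in nginx.py and redis.py.
def turn_on_maintenance():
    """Turn on site maintenance mode. Return 503.

    Longer explanation that a short listing would not show.
    """


def auto_save():
    """Update saving settings for redis instances"""


listing = list_tasks({"redis": [auto_save], "nginx": [turn_on_maintenance]})
for name, description in listing:
    print(f"{name:<30}{description}")
```

Because only the first line appears in the listing, it pays to keep that line short and self-explanatory and to move the details to the lines below it.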

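As you can see, when tasks are split across modules, the module name acts as a namespace. In the command list, commands under the same namespace are grouped together, which categorizes the tasks nicely. Running fab -d [task_name] displays the task's complete docstring. A well-structured docstring lets whoever runs the task clearly understand what it does and how its parameters are used. When we write docstrings for Fabric tasks, we generally divide them into three parts: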

A brief description of the task
The task's parameters
Concrete usage examples

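The last part is especially important. Some tasks take so many parameters that the parameter descriptions alone still leave people confused, whereas a few typical real-world examples go a long way toward showing users how the task is used. The following example shows the documentation of the deploy task (code deployment):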

$ fab -d deploy
Displaying detailed information for task 'deploy':

Deploy code to targeted server_group.
You need to put the ansible vault password in the ~/.ansible_vault_passwd directory,
otherwise you will be prompted to enter the vault password.

Args:
   server_group: Possible values include prod, stage
   release_tags: A list of release tags to be pushed

Examples:
   # Deploy to stage
   fab deploy:stage,glow_stage_1446102452
   # Deploy to production
   fab deploy:prod,glow_prod_1446102452
   # Deploy multiple repo at once
   fab deploy:prod,glow_prod_1446102452,nurture_prod_1445102467
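A convention like this is easier to keep up when it can be checked automatically. The following is a minimal sketch of such a check, not something the article describes: `missing_sections` and `REQUIRED_SECTIONS` are hypothetical names, and the check assumes each section header sits on its own line.

```python
import inspect

# Section headers we expect every task docstring to contain (an assumption
# based on the three-part convention described above).
REQUIRED_SECTIONS = ("Args:", "Examples:")


def missing_sections(func):
    """Return the required section headers absent from func's docstring."""
    doc = inspect.getdoc(func) or ""
    lines = {line.strip() for line in doc.splitlines()}
    return [section for section in REQUIRED_SECTIONS if section not in lines]


def deploy(server_group, *release_tags):
    """Deploy code to targeted server_group.

    Args:
        server_group: Possible values include prod, stage
        release_tags: A list of release tags to be pushed

    Examples:
        # Deploy to stage
        fab deploy:stage,glow_stage_1446102452
    """


def undocumented():
    """Do something without explaining how."""


print(missing_sections(deploy))        # []
print(missing_sections(undocumented))  # ['Args:', 'Examples:']
```

A check of this kind could run in a test suite so that a task without usage examples is caught before it reaches the shared repository.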

Dynamic Docstrings

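In Python, a docstring is simply the function's __doc__ attribute, so we can modify it at runtime just like an ordinary variable. This makes it possible to generate documentation dynamically or to reuse shared documentation. For example, our services module has three tasks, cycle, start, and stop, which restart, start, and stop our microservices respectively. When we view a task's documentation with fab -d, we naturally want it to show all the available microservices as well. But hard-coding the existing microservices would be a poor approach: not only would we have to copy the same passage of documentation three times, we would also have to remember to update it every time a new microservice is added. Instead, we use a Python decorator to add the available-service information to the docstring dynamically. For example, the cycle task is defined like this: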

@task
@services_doc
def cycle(*services, **kwargs):
    """Restart application services.

    Args:
       services: list of services need cycle (separate by comma)

    Examples:
       fab services.cycle:glow-www,glow-forum
    """

Note the @services_doc decorator used here. It is defined as follows:

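def services_doc(func):
    # Look up the services available in the current environment.
    services = get_available_services()
    # One service per line, indented below the heading.
    doc = """
    Possible values for services:
{}
    """.format("\n".join(" " * 8 + x for x in services))
    func.__doc__ += doc
    return func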

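We call get_available_services to dynamically obtain the microservices available in the current environment (how get_available_services is implemented does not matter here) and append them to the function's docstring. Now, when we view the documentation of cycle, all available microservices are displayed as well.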

$ fab -d services.cycle
Displaying detailed information for task 'services.cycle':

Restart application services

Args:
   services: list of services need cycle (separate by comma)

Examples:
   fab services.cycle:glow-www,glow-forum

Possible values for services:
   glow-www
   glow-user
   glow-forum
   ...
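The same idea can be tried without Fabric installed. The sketch below is a self-contained variant rather than Glow's actual code: `get_available_services` is a hypothetical stub returning a fixed list, and the decorator additionally normalizes indentation with `inspect.cleandoc` and tolerates functions that have no docstring.

```python
import inspect


def get_available_services():
    """Hypothetical stub; the real lookup is not shown in the article."""
    return ["glow-www", "glow-user", "glow-forum"]


def services_doc(func):
    """Append the list of available services to func's docstring."""
    # cleandoc strips the common leading indentation; `or ""` covers
    # functions that were defined without any docstring.
    base = inspect.cleandoc(func.__doc__ or "")
    listing = "\n".join("    " + name for name in get_available_services())
    func.__doc__ = f"{base}\n\nPossible values for services:\n{listing}\n"
    return func


@services_doc
def cycle(*services, **kwargs):
    """Restart application services.

    Args:
        services: list of services need cycle (separate by comma)
    """


@services_doc
def stop(*services, **kwargs):
    """Stop application services."""


print(cycle.__doc__)
```

Since both tasks share one decorator, adding a service to the lookup updates the documentation of every decorated task at once.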

Dynamic External Documentation

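Besides docstrings, we often need to write standalone external documents. At Glow, the vast majority of them are written in Markdown. Suppose we need a document describing the architecture of the production environment. It will certainly list the servers in production, what each one does, and their hostnames. We could write this by hand, but then we would have to remember to update the document every time a server is added to production.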

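In practice, however, we never create servers by hand in the AWS console; every server is created and maintained by Ansible. In other words, all server configuration and functional descriptions already exist in the Ansible playbooks. When writing external documentation, we should reference the information in Ansible rather than retype it by hand.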

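So in our production environment document we use HTML comments to mark the parts that need external references, and then run a script to fill the referenced content into the document. For example, the production environment document contains a section like this: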

## EC2 servers

| Server group | Instance type | Count | Description           |
|:-------------|:--------------|------:|:----------------------|
| bastion      | t2.small      |     1 | Bastion/Jumper server |
| www          | c3.large      |     4 | Web servers           |
...

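In the Markdown source, this table is enclosed by a pair of HTML comment tags that mark it as an external reference. Every time Ansible updates the server configuration, it runs a script that automatically finds this pair of tags in the document and updates the content between them. It is a very simple technique, but it helps a great deal in keeping the documentation in sync with the actual environment.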
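A script of this kind can be quite short. The following is a minimal sketch under stated assumptions, not Glow's actual script: the marker names `<!-- servers:start -->` and `<!-- servers:end -->` are invented for illustration, and the server records are hard-coded here, whereas in practice they would be read from the Ansible configuration.

```python
import re

# Hypothetical marker names; the article does not show the real ones.
START = "<!-- servers:start -->"
END = "<!-- servers:end -->"


def render_table(servers):
    """Render server records as a Markdown table."""
    lines = [
        "| Server group | Instance type | Count | Description |",
        "|:-------------|:--------------|------:|:------------|",
    ]
    for s in servers:
        lines.append(
            f"| {s['group']} | {s['type']} | {s['count']} | {s['description']} |"
        )
    return "\n".join(lines)


def fill_markers(document, content):
    """Replace whatever sits between START and END with content."""
    pattern = re.compile(
        re.escape(START) + r".*?" + re.escape(END), flags=re.DOTALL
    )
    replacement = f"{START}\n{content}\n{END}"
    # A function replacement keeps backslashes in content from being
    # interpreted as regex escape sequences.
    new_document, count = pattern.subn(lambda _m: replacement, document)
    if count == 0:
        raise ValueError("marker pair not found in document")
    return new_document


doc = f"## EC2 servers\n\n{START}\nstale table\n{END}\n\nMore prose.\n"
servers = [
    {"group": "bastion", "type": "t2.small", "count": 1,
     "description": "Bastion/Jumper server"},
    {"group": "www", "type": "c3.large", "count": 4,
     "description": "Web servers"},
]
updated = fill_markers(doc, render_table(servers))
print(updated)
```

The markers themselves are kept in the output, so the script can be run again after every configuration change and always rewrites the same region.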

Original author: 叶剑烨
