Posts by matsudaira

Concepts

https://blog.51cto.com/u_15048360/3202204
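Ansible is an automation and operations tool written in Python. It implements batch deployment, remote command execution, and similar functions. Ansible does its work through modules; the core itself only provides a framework.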

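Core components:
ansible: the core program
modules: core modules and custom modules
plugins: supplementary plugins, such as the mail plugin
playbooks: playbooks, configuration files that define multiple tasks and are executed automatically by ansible
inventory: defines the list of managed hosts
connection plugins: handle communication with the managed nodes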

Characteristics:
No agent needs to be installed on the managed nodes
No server-side daemon
Works through modules
Uses YAML
Based on SSH

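Execution process:
Load the configuration file, by default /etc/ansible/ansible.cfg
Load the module file
Ansible turns the module or command into a temporary Python file and transfers it to the remote server,
where it is stored as a .py file under .ansible/tmp/ in the executing user's home directory
Make the file executable (+x)
Execute it and return the result, delete the temporary Python file, and exit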
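
The last steps of the process above (write a temporary Python file, make it executable, run it, return the result, delete it) can be imitated locally. This is only a rough sketch and not Ansible's own code: there is no SSH transfer, and the names run_temp_module and module_demo.py are made up for the illustration.

```python
import os
import stat
import subprocess
import sys
import tempfile


def run_temp_module(source: str) -> str:
    """Write `source` to a temporary .py file, run it, and clean up.

    Hypothetical helper: it mimics the generate/execute/delete cycle
    locally and skips the transfer to a remote host.
    """
    tmp_dir = tempfile.mkdtemp(prefix="ansible-tmp-demo-")
    path = os.path.join(tmp_dir, "module_demo.py")
    try:
        # Generate the temporary Python file.
        with open(path, "w") as handle:
            handle.write(source)
        # Give the file the executable bit (+x).
        os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
        # Execute it and capture the result.
        result = subprocess.run(
            [sys.executable, path],
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout
    finally:
        # Delete the temporary file and its directory.
        if os.path.exists(path):
            os.remove(path)
        os.rmdir(tmp_dir)


if __name__ == "__main__":
    print(run_temp_module('print("pong")'), end="")
```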

Usage

Basics

Installation

yum install epel-release.noarch -y
yum install ansible -y

Configure passwordless SSH login

ssh-keygen
cat ~/.ssh/id_rsa.pub
On the managed machine, append the public key printed above to:
vi /root/.ssh/authorized_keys

Configuration files

ls /etc/ansible/
ansible.cfg  hosts  roles

cat ansible.cfg
[defaults]

# some basic default values...

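# Host inventory
#inventory      = /etc/ansible/hosts
# Where module files are stored
#library        = /usr/share/my_modules/
#module_utils   = /usr/share/my_module_utils/
# Directory for temporary files on the remote host
#remote_tmp     = ~/.ansible/tmp
# Directory for temporary files on the local host
#local_tmp      = ~/.ansible/tmp
#plugin_filters_cfg = /etc/ansible/plugin_filters.yml
# Default number of parallel processes (forks)
#forks          = 5
# Polling interval, in seconds, for checking on asynchronous tasks
#poll_interval  = 15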
#sudo_user      = root
#ask_sudo_pass = True
#ask_pass      = True
#transport      = smart
#remote_port    = 22
#module_lang    = C
#module_set_locale = False

vi hosts
# This is the default ansible 'hosts' file.
#
# It should live in /etc/ansible/hosts
#
#   - Comments begin with the '#' character
#   - Blank lines are ignored
#   - Groups of hosts are delimited by [header] elements
#   - You can enter hostnames or ip addresses
#   - A hostname/ip can be a member of multiple groups

# Ex 1: Ungrouped hosts, specify before any group headers.

## green.example.com
## blue.example.com
## 192.168.100.1
## 192.168.100.10

# Ex 2: A collection of hosts belonging to the 'webservers' group

## [webservers]
## alpha.example.org
## beta.example.org
## 192.168.1.100
## 192.168.1.110

# If you have multiple hosts following a pattern you can specify
# them like this:

## www[001:006].example.com

# Ex 3: A collection of database servers in the 'dbservers' group

## [dbservers]
##
## db01.intranet.mydomain.net
## db02.intranet.mydomain.net
## 10.25.1.56
## 10.25.1.57

# Here's another example of host ranges, this time there are no
# leading 0s:

## db-[99:101]-node.example.com
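
The host range patterns in the comments above are inclusive at both ends, and leading zeros in the start value are kept. The following is a small sketch of that expansion, not Ansible's actual parser: expand_host_pattern is a hypothetical helper that only handles a single numeric range.

```python
import re
from typing import List

# Matches a numeric range such as [001:006] or [99:101].
_RANGE = re.compile(r"\[(\d+):(\d+)\]")


def expand_host_pattern(pattern: str) -> List[str]:
    """Expand the first numeric [start:end] range in an inventory pattern."""
    match = _RANGE.search(pattern)
    if match is None:
        return [pattern]
    start, end = match.group(1), match.group(2)
    # A start value with leading zeros fixes the width of every number.
    width = len(start) if start.startswith("0") else 0
    prefix, suffix = pattern[: match.start()], pattern[match.end():]
    return [
        prefix + str(number).zfill(width) + suffix
        for number in range(int(start), int(end) + 1)  # end is inclusive
    ]


if __name__ == "__main__":
    for host in expand_host_pattern("www[001:006].example.com"):
        print(host)
    for host in expand_host_pattern("db-[99:101]-node.example.com"):
        print(host)
```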

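Common options

-a: arguments passed to the module
-m: the module to use
-C: check mode; report what would change without making the changes
-e: set extra variables
-f: number of parallel processes (forks)
-i: the inventory file to use
--syntax-check: check the playbook for syntax errors without running it

ansible-doc -l  list all modules
ansible-doc [-s] xx  show the usage of the given module; -s prints a short summary
ansible <host-pattern> [-m module_name] [-a args]  run an ad-hoc command
ansible-playbook  run a playbook
ansible-console  run interactively

ansible localhost -m module_name  run a module on the local machine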

playbook

Playbooks are written in YAML.
https://ansible-tran.readthedocs.io/en/latest/docs/playbooks_intro.html
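Each play contains a list of tasks. A task uses the specified module to run the configured command, according to settings such as hosts and remote_user. Tasks run from top to bottom, and each task should have a name so that it can be told apart in the output.
For example, this task runs a command with the shell module: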

tasks:
  - name: run this command and ignore the result
    shell: /usr/bin/somecommand || /bin/true

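Here is a more complete example:
install nginx and deploy its configuration file

Step 1: prepare the directories for the files
[root@master ~]# mkdir -p /root/ansible/{conf,bin}
Step 2: write the YAML file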
[root@master bin]# cat nginx.yaml
- hosts: server2 # target hosts
  remote_user: root # account used on the target machine
  vars: # variables
    hello: Ansible
  tasks: # task list
  - name: Install epel # task name
    yum:  # yum module
      name: epel-release.noarch
      state: latest
  - name: Install nginx
    yum:
      name: nginx
      state: present
  - name: Copy nginx configure file
    copy: # copy module
      src: /root/ansible/conf/site.conf
      dest: /etc/nginx/conf.d/site.conf
  - name: Start nginx
    service: # restarts nginx; add enabled: yes to start it on boot
      name: nginx
      state: restarted
  - name: Create index.html
    shell: echo "nginx1" > /usr/share/nginx/html/index.html

Step 3: write the conf file
[root@master bin]# cat site.conf
server {
    listen 8080;
    server_name 192.168.80.50:8080;
    location / {
        index index.html;
    }
}
Step 4: check for syntax errors; if there are none, run the playbook
[root@master bin]# ansible-playbook nginx.yaml --syntax-check
[root@master bin]# ansible-playbook nginx.yaml

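Instead of the key: value form, you can also write module arguments as key=value. Note that not every module is written this way: shell, for example, takes a free-form command string.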

---
- hosts: webservers
  vars:
    http_port: 80
    max_clients: 200
  remote_user: root
  tasks:
  - name: ensure apache is at the latest version
    yum: pkg=httpd state=latest
  - name: write the apache config file
    template: src=/srv/httpd.j2 dest=/etc/httpd.conf
    notify:
    - restart apache
  - name: ensure apache is running
    service: name=httpd state=started
  handlers:
    - name: restart apache
      service: name=httpd state=restarted
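
To make the relation between the two notations concrete, here is a minimal sketch that turns a key=value argument string into the equivalent key: value mapping. It is an illustration only: parse_kv_args is a hypothetical helper, and Ansible's real argument parsing handles more cases (quoting rules, free-form commands, templating).

```python
import shlex
from typing import Dict


def parse_kv_args(arg_string: str) -> Dict[str, str]:
    """Split 'k1=v1 k2=v2' into {'k1': 'v1', 'k2': 'v2'}."""
    args: Dict[str, str] = {}
    # shlex.split keeps quoted values such as dest="/tmp/my file" together.
    for token in shlex.split(arg_string):
        key, sep, value = token.partition("=")
        if not sep:
            raise ValueError("not a key=value pair: %r" % token)
        args[key] = value
    return args


if __name__ == "__main__":
    # The template task from the playbook above, as a mapping.
    print(parse_kv_args("src=/srv/httpd.j2 dest=/etc/httpd.conf"))
```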

Common modules

ping: test connectivity

[root@master ~]# vi /etc/ansible/hosts 
[root@master ~]# ansible -m ping test
172.17.120.142 | SUCCESS => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python"
    }, 
    "changed": false, 
    "ping": "pong"
}

shell: run commands remotely
https://bingostack.com/2021/03/ansible-shell-command/
https://docs.ansible.com/ansible/latest/collections/ansible/builtin/shell_module.html

ansible-doc -s shell
- name: Execute shell commands on targets
  shell:
      chdir:                 # Change into this directory before running the command.
      cmd:                   # The command to run followed by optional arguments.
      creates:               # A filename, when it already exists, this step will *not* be run.
      executable:            # Change the shell used to execute the command. This expects an absolute path to the executable.
      free_form:             # The shell module takes a free form command to run, as a string. There is no actual parameter named 'free form'. See the
                               examples on how to use this module.
      removes:               # A filename, when it does not exist, this step will *not* be run.
      stdin:                 # Set the stdin of the command directly to the specified value.
      stdin_add_newline:     # Whether to append a newline to stdin data.
      warn:                  # Whether to enable task warnings.
[root@master ~]# cat test-pwd.yaml 
- hosts: test
  tasks: 
  - name: ls tmp
    shell: cd /tmp && pwd && ls -l > ./log

[root@master ~]# ansible-playbook test-pwd.yaml --syntax-check

playbook: test-pwd.yaml
[root@master ~]# ansible-playbook test-pwd.yaml

[root@node01 ~]# cat /tmp/log 
total 8
srwxr-xr-x 1 root root    0 Feb  9 23:06 aliyun_assist_service.sock
drwx------ 2 root root 4096 Feb 23 17:01 ansible_command_payload_geujPy
-rw-r--r-- 1 root root    0 Feb 23 17:01 log
drwx------ 3 root root 4096 Feb  7 10:06 systemd-private-7cbd598af444427b8714fcd64c669e47-chronyd.service-Z4rJtW

service: manage services and enable them at boot

service:
      arguments:             # Additional arguments provided on the command line.
      enabled:               # Whether the service should start on boot. *At least one of state and enabled are required.*
      name:                  # (required) Name of the service.
      pattern:               # If the service does not respond to the status command, name a substring to look for as would be found in the output of the
                               `ps' command as a stand-in for a status result. If the string is found, the service will be
                               assumed to be started.
      runlevel:              # For OpenRC init scripts (e.g. Gentoo) only. The runlevel that this service belongs to.
      sleep:                 # If the service is being `restarted' then sleep this many seconds between the stop and start command. This helps to work
                               around badly-behaving init scripts that exit immediately after signaling a process to stop.
                               Not all service managers support sleep, i.e when using systemd this setting will be
                               ignored.
      state:                 # `started'/`stopped' are idempotent actions that will not run commands unless necessary. `restarted' will always bounce the
                               service. `reloaded' will always reload. *At least one of state and enabled are required.*
                               Note that reloaded will start the service if it is not already started, even if your chosen
                               init system wouldn't normally.
      use:                   # The service module actually uses system specific modules, normally through auto detection, this setting can force a
                               specific module. Normally it uses the value of the 'ansible_service_mgr' fact and falls

copy

  copy:
      backup:   back up the file before overwriting it
      content:  inline content to write, used instead of src
      dest:     destination path

Environment preparation

https://www.jianshu.com/p/57e3819a2e7c
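The kernel version must be 2.6.13 or later
uname -r
Run ll /proc/sys/fs/inotify and check that the following three entries are listed; if they are not, the kernel does not support inotify.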

ll /proc/sys/fs/inotify

total 0
-rw-r--r-- 1 root root 0 Jan 4 15:41 max_queued_events
-rw-r--r-- 1 root root 0 Jan 4 15:41 max_user_instances
-rw-r--r-- 1 root root 0 Jan 4 15:41 max_user_watches
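
The same check can be scripted. This is a small sketch under the assumption that the three entries listed above are the only thing to look for; missing_inotify_entries is a hypothetical helper name.

```python
import os
from typing import List

# The three entries expected under /proc/sys/fs/inotify.
REQUIRED_ENTRIES = ("max_queued_events", "max_user_instances", "max_user_watches")


def missing_inotify_entries(proc_dir: str = "/proc/sys/fs/inotify") -> List[str]:
    """Return the expected entries that are absent from proc_dir."""
    return [
        name
        for name in REQUIRED_ENTRIES
        if not os.path.isfile(os.path.join(proc_dir, name))
    ]


if __name__ == "__main__":
    missing = missing_inotify_entries()
    if missing:
        print("inotify does not look supported; missing:", ", ".join(missing))
    else:
        print("inotify is supported")
```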

Installation:
yum install inotify-tools

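Usage notes

inotify is the Linux mechanism for monitoring filesystem events; the inotifywait tool watches a file until a specified event occurs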

inotifywait -h
inotifywait 3.14
Wait for a particular event on a file or set of files.
Usage: inotifywait [ options ] file1 [ file2 ] [ file3 ] [ ... ]
Options:
        -h|--help       Show this help text.
        @<file>         Exclude the specified file from being watched.
        --exclude <pattern>
                        Exclude all events on files matching the
                        extended regular expression <pattern>.
        --excludei <pattern>
                        Like --exclude but case insensitive.
        -m|--monitor    Keep listening for events forever.  Without
                        this option, inotifywait will exit after one
                        event is received.
        -d|--daemon     Same as --monitor, except run in the background
                        logging events to a file specified by --outfile.
                        Implies --syslog.
        -r|--recursive  Watch directories recursively.
        --fromfile <file>
                        Read files to watch from <file> or `-' for stdin.
        -o|--outfile <file>
                        Print events to <file> rather than stdout.
        -s|--syslog     Send errors to syslog rather than stderr.
        -q|--quiet      Print less (only print events).
        -qq             Print nothing (not even events).
        --format <fmt>  Print using a specified printf-like format
                        string; read the man page for more details.
        --timefmt <fmt> strftime-compatible format string for use with
                        %T in --format string.
        -c|--csv        Print events in CSV format.
        -t|--timeout <seconds>
                        When listening for a single event, time out after
                        waiting for an event for <seconds> seconds.
                        If <seconds> is 0, inotifywait will never time out.
        -e|--event <event1> [ -e|--event <event2> ... ]
                Listen for specific event(s).  If omitted, all events are 
                listened for.

Exit status:
        0  -  An event you asked to watch for was received.
        1  -  An event you did not ask to watch for was received
              (usually delete_self or unmount), or some error occurred.
        2  -  The --timeout option was given and no events occurred
              in the specified interval of time.

Events:
        access          file or directory contents were read
        modify          file or directory contents were written
        attrib          file or directory attributes changed
        close_write     file or directory closed, after being opened in
                        writeable mode
        close_nowrite   file or directory closed, after being opened in
                        read-only mode
        close           file or directory closed, regardless of read/write mode
        open            file or directory opened
        moved_to        file or directory moved to watched directory
        moved_from      file or directory moved from watched directory
        move            file or directory moved to or from watched directory
        create          file or directory created within watched directory
        delete          file or directory deleted within watched directory
        delete_self     file or directory was deleted
        unmount         file system containing file or directory unmounted
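
As a sketch of how the options and events above fit together, the snippet below assembles an inotifywait command line and parses one line of its output. It assumes the output format '%w%f %e' (watched path plus file name, then the comma-separated event names); build_inotifywait_cmd and parse_event_line are hypothetical helpers, and nothing here actually starts inotifywait.

```python
from typing import Iterable, List, Tuple

# Event names taken from the "Events:" list in the help text above.
KNOWN_EVENTS = {
    "access", "modify", "attrib", "close_write", "close_nowrite", "close",
    "open", "moved_to", "moved_from", "move", "create", "delete",
    "delete_self", "unmount",
}


def build_inotifywait_cmd(path: str, events: Iterable[str] = (),
                          recursive: bool = False,
                          monitor: bool = True) -> List[str]:
    """Build an argument list for inotifywait from a few common options."""
    cmd = ["inotifywait", "-q"]          # -q: only print events
    if monitor:
        cmd.append("-m")                 # keep listening forever
    if recursive:
        cmd.append("-r")                 # watch directories recursively
    for event in events:
        if event not in KNOWN_EVENTS:
            raise ValueError("unknown event: %s" % event)
        cmd += ["-e", event]
    cmd += ["--format", "%w%f %e", path]
    return cmd


def parse_event_line(line: str) -> Tuple[str, List[str]]:
    """Split one '%w%f %e' output line into (path, [event names])."""
    path, _, events = line.rstrip("\n").rpartition(" ")
    return path, events.split(",")


if __name__ == "__main__":
    print(" ".join(build_inotifywait_cmd("/etc/nginx/conf.d",
                                         ["close_write", "create"],
                                         recursive=True)))
    print(parse_event_line("/etc/nginx/conf.d/site.conf CLOSE_WRITE,CLOSE\n"))
```

The resulting list can be passed to subprocess.Popen on a machine where inotify-tools is installed.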

Signals supported by nginx

SIGNALS
     The master process of nginx can handle the following signals:
     SIGINT, SIGTERM  Shut down quickly.
     SIGHUP           Reload configuration, start the new worker process with a new configuration, and gracefully shut
                      down old worker processes.
     SIGQUIT          Shut down gracefully.
     SIGUSR1          Reopen log files.
     SIGUSR2          Upgrade the nginx executable on the fly.
     SIGWINCH         Shut down worker processes gracefully.
     
# You can use nginx -s or kill; pay attention to how the two map to each other
#http://io.upyun.com/2017/08/19/nginx-signals/
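
The mapping between nginx -s and the signals in the table can be written down as a small lookup. This is a sketch assuming a Unix system; kill_command is a hypothetical helper that only builds the command string and sends nothing.

```python
import signal

# nginx -s <action> and the signal it corresponds to on the master process.
NGINX_S_TO_SIGNAL = {
    "stop": signal.SIGTERM,    # shut down quickly
    "quit": signal.SIGQUIT,    # shut down gracefully
    "reload": signal.SIGHUP,   # reload configuration
    "reopen": signal.SIGUSR1,  # reopen log files
}


def kill_command(action: str, master_pid: int) -> str:
    """Return the kill command equivalent to `nginx -s <action>`."""
    sig = NGINX_S_TO_SIGNAL[action]
    # sig.name is e.g. "SIGHUP"; kill accepts the name without "SIG".
    return "kill -%s %d" % (sig.name[3:], master_pid)


if __name__ == "__main__":
    # 1234 stands in for the PID of the nginx master process.
    for action in NGINX_S_TO_SIGNAL:
        print("nginx -s %-6s ->  %s" % (action, kill_command(action, 1234)))
```

SIGUSR2 and SIGWINCH from the table have no nginx -s counterpart and have to be sent with kill.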

You can use strace -c cmd or strace -p pid to trace the system calls of a command or of a running process; the output of the two differs somewhat

[root@master ~]# strace -c pwd
/root
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 21.71    0.000104          11         9           mmap
 16.91    0.000081          16         5           close
 10.86    0.000052          17         3         3 access
 10.02    0.000048          12         4           mprotect
  8.98    0.000043          14         3           open
  8.14    0.000039          19         2           munmap
  6.68    0.000032           8         4           fstat
  6.05    0.000029           7         4           brk
  4.38    0.000021          21         1           write
  2.71    0.000013          13         1           execve
  2.09    0.000010          10         1           arch_prctl
  0.84    0.000004           4         1           getcwd
  0.63    0.000003           3         1           read
------ ----------- ----------- --------- --------- ----------------
100.00    0.000479                    39         3 total
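
A summary table like the one above is easy to post-process. The following sketch parses the rows into dictionaries so they can be sorted or filtered; it assumes the column layout shown above, and parse_strace_summary is a hypothetical helper.

```python
from typing import Dict, List

SAMPLE = """\
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 21.71    0.000104          11         9           mmap
 10.86    0.000052          17         3         3 access
------ ----------- ----------- --------- --------- ----------------
100.00    0.000479                    39         3 total
"""


def parse_strace_summary(text: str) -> List[Dict[str, object]]:
    """Parse the per-syscall rows of `strace -c` output."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Data rows have 5 fields, or 6 when the errors column is filled.
        if len(parts) not in (5, 6):
            continue
        # Skip the dashed separators and the final "total" row.
        if not parts[0].replace(".", "", 1).isdigit() or parts[-1] == "total":
            continue
        rows.append({
            "percent": float(parts[0]),
            "seconds": float(parts[1]),
            "usecs_per_call": int(parts[2]),
            "calls": int(parts[3]),
            "errors": int(parts[4]) if len(parts) == 6 else 0,
            "syscall": parts[-1],
        })
    return rows


if __name__ == "__main__":
    for row in sorted(parse_strace_summary(SAMPLE), key=lambda r: r["calls"], reverse=True):
        print(row["syscall"], row["calls"], row["errors"])
```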

With -p, the amount of output may be more than you expect.

[root@master ~]# strace -p 1
strace: Process 1 attached
epoll_wait(4, [{EPOLLIN, {u32=1532991024, u64=93829388539440}}], 33, -1) = 1
clock_gettime(CLOCK_BOOTTIME, {tv_sec=319337, tv_nsec=129368879}) = 0
recvmsg(18, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="WATCHDOG=1", iov_len=4096}], msg_iovlen=1, msg_control=[{cmsg_len=28, cmsg_level=SOL_SOCKET, cmsg_type=SCM_CREDENTIALS, cmsg_data={pid=365, uid=0, gid=0}}], msg_controllen=32, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_CMSG_CLOEXEC) = 10
open("/proc/365/cgroup", O_RDONLY|O_CLOEXEC) = 16
fstat(16, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f32446e7000
read(16, "11:memory:/system.slice/systemd-"..., 1024) = 370
close(16)                               = 0
munmap(0x7f32446e7000, 4096)            = 0
timerfd_settime(3, TFD_TIMER_ABSTIME, {it_interval={tv_sec=0, tv_nsec=0}, it_value={tv_sec=319494, tv_nsec=691126000}}, NULL) = 0
epoll_wait(4, [{EPOLLIN, {u32=1532730112, u64=93829388278528}}], 33, -1) = 1
clock_gettime(CLOCK_BOOTTIME, {tv_sec=319337, tv_nsec=131285975}) = 0
recvmsg(38, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\4\1\1 \0\0\0@\r\0\0\211\0\0\0\1\1o\0\25\0\0\0", iov_len=24}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = 24
recvmsg(38, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="/org/freedesktop/DBus\0\0\0\2\1s\0\24\0\0\0"..., iov_len=168}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = 168
epoll_wait(4, [{EPOLLIN, {u32=1532730112, u64=93829388278528}}], 33, -1) = 1
clock_gettime(CLOCK_BOOTTIME, {tv_sec=319337, tv_nsec=131777119}) = 0
recvmsg(38, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\1\0\1@\1\0\0~/\0\0\305\0\0\0\1\1o\0\31\0\0\0", iov_len=24}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = 24
recvmsg(38, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="/org/freedesktop/systemd1\0\0\0\0\0\0\0"..., iov_len=512}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = 512
sendmsg(38, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\1\0\1\t\0\0\0\0E\1\0\207\0\0\0\1\1o\0\25\0\0\0/org/fre"..., iov_len=152}, {iov_base="\4\0\0\0:1.1\0", iov_len=9}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 161
recvmsg(38, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\2\1\1\4\0\0\0\311\237\0\0=\0\0\0\6\1s\0\4\0\0\0", iov_len=24}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = 24
recvmsg(38, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base=":1.0\0\0\0\0\5\1u\0\0E\1\0\10\1g\0\1u\0\0\7\1s\0\24\0\0\0"..., iov_len=60}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = 60
getuid()                                = 0
stat("/run/systemd", {st_mode=S_IFDIR|0755, st_size=400, ...}) = 0
mkdir("/run/systemd/system", 0755)      = -1 EEXIST (File exists)
stat("/run/systemd/system", {st_mode=S_IFDIR|0755, st_size=1200, ...}) = 0
umask(077)                              = 000
open("/run/systemd/system/.#session-811.scopeHaHZNz", O_RDWR|O_CREAT|O_EXCL|O_CLOEXEC, 0600) = 16
umask(000)                              = 077
fcntl(16, F_GETFL)                      = 0x8002 (flags O_RDWR|O_LARGEFILE)
umask(0777)                             = 000
fchmod(16, 0644)                        = 0
umask(000)                              = 0777
fstat(16, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f32446e7000
write(16, "# Transient stub\n", 17)     = 17
rename("/run/systemd/system/.#session-811.scopeHaHZNz", "/run/systemd/system/session-811.scope") = 0
close(16)                               = 0
munmap(0x7f32446e7000, 4096)            = 0
stat("/run/systemd/system", {st_mode=S_IFDIR|0755, st_size=1220, ...}) = 0
mkdir("/run/systemd/system/session-811.scope.d", 0755) = 0
umask(077)                              = 000
open("/run/systemd/system/session-811.scope.d/.#50-Slice.confd5TEip", O_RDWR|O_CREAT|O_EXCL|O_CLOEXEC, 0600) = 16
umask(000)                              = 077
fcntl(16, F_GETFL)                      = 0x8002 (flags O_RDWR|O_LARGEFILE)
umask(0777)                             = 000
fchmod(16, 0644)                        = 0
umask(000)                              = 0777
fstat(16, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f32446e7000
write(16, "[Scope]\nSlice=user-0.slice\n", 27) = 27
rename("/run/systemd/system/session-811.scope.d/.#50-Slice.confd5TEip", "/run/systemd/system/session-811.scope.d/50-Slice.conf") = 0
close(16)                               = 0

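To save the output, use strace -o outputfilename pwd; it writes the trace to a log file in the current directory
To see, with timestamps, the calls a command makes to one particular system call, you can do this (using the open syscall and the ls command as an example)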

[root@master ~]# strace -t -e open ls
10:26:01 open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
10:26:01 open("/lib64/libselinux.so.1", O_RDONLY|O_CLOEXEC) = 3
10:26:01 open("/lib64/libcap.so.2", O_RDONLY|O_CLOEXEC) = 3
10:26:01 open("/lib64/libacl.so.1", O_RDONLY|O_CLOEXEC) = 3
10:26:01 open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
10:26:01 open("/lib64/libpcre.so.1", O_RDONLY|O_CLOEXEC) = 3
10:26:01 open("/lib64/libdl.so.2", O_RDONLY|O_CLOEXEC) = 3
10:26:01 open("/lib64/libattr.so.1", O_RDONLY|O_CLOEXEC) = 3
10:26:01 open("/lib64/libpthread.so.0", O_RDONLY|O_CLOEXEC) = 3
10:26:01 open("/proc/filesystems", O_RDONLY) = 3
10:26:01 open("/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
flannel.yaml  jen-dep.yaml  jen-pv.yaml  jen-role.yaml  jen-svc.yaml  output
10:26:01 +++ exited with 0 +++

To also trace the child processes of a program started from some file, use strace -f filename

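When a process ends, it leaves behind an exit status. The parent process collects this status with wait(); once it has been collected, the system reclaims the process's PCB (process control block) and removes the process. If the parent never collects it, the process stays in the zombie state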
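
The life cycle described above can be reproduced in a few lines. This is a minimal sketch assuming Linux with /proc mounted; process_state and make_zombie_then_reap are hypothetical helper names.

```python
import os
import time
from typing import Tuple


def process_state(pid: int) -> str:
    """Return the one-letter state of a process from /proc/<pid>/stat."""
    with open("/proc/%d/stat" % pid) as handle:
        data = handle.read()
    # The state is the first field after the parenthesised command name.
    return data.rsplit(")", 1)[1].split()[0]


def make_zombie_then_reap(exit_code: int = 7, timeout: float = 2.0) -> Tuple[str, int]:
    """Fork a child that exits at once, observe it as a zombie, then reap it."""
    pid = os.fork()
    if pid == 0:
        # Child: exit immediately without running any cleanup handlers.
        os._exit(exit_code)
    # Parent: the child has exited but nobody has called wait() yet,
    # so it should show up in state Z (zombie).
    deadline = time.monotonic() + timeout
    state = process_state(pid)
    while state != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)
        state = process_state(pid)
    # wait() collects the exit status and lets the kernel remove the entry.
    _, status = os.waitpid(pid, 0)
    return state, os.WEXITSTATUS(status)


if __name__ == "__main__":
    state, code = make_zombie_then_reap()
    print("state before wait():", state)
    print("exit status collected:", code)
```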

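Troubleshooting approach:

A zombie is a process that has already ended, so it cannot be killed with kill. First look at its parent process to find out what went wrong and why the exit status is not being collected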

ps -p pid -o stat | tail -n 1

If the parent process is stopped (T), resuming the parent is enough
kill -SIGCONT pid

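If the parent process looks normal, trace its behavior with strace -p pid to see whether something like a deadlock has occurred

If it is an orphan process (the parent exited first), it is normally adopted by the init process. If there is still a problem in that case, check whether the init process itself is in trouble, using the same methods as above

Note: in newer versions the init process is no longer affected by stop signals sent with kill, and it may not be possible to attach strace to it either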

https://blog.csdn.net/lovely_nn/article/details/123043447

- job_name: kubelet/metrics
  honor_labels: true
  honor_timestamps: true
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: https
  kubernetes_sd_configs:
  - role: node
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: false
  relabel_configs:
  - separator: ;
    regex: __meta_kubernetes_node_label_(.+)
    replacement: $1
    action: labelmap
  - separator: ;
    regex: (.*)
    target_label: __address__
    replacement: kubernetes.default.svc:443
    action: replace
  - source_labels: [__meta_kubernetes_node_name]
    separator: ;
    regex: (.+)
    target_label: __metrics_path__
    replacement: /api/v1/nodes/${1}/proxy/metrics
    action: replace
  - separator: ;
    regex: (.*)
    target_label: job
    replacement: kubelet
    action: replace
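
To follow what the relabel_configs above do to one discovered node, the rules can be replayed by hand. This is a simplified sketch and not Prometheus's implementation: it only covers the labelmap and replace actions with fully anchored regexes, the node labels are invented sample values, and apply_relabel is a hypothetical helper.

```python
import re
from typing import Dict, List


def _to_py_template(replacement: str) -> str:
    """Convert $1 / ${1} references into Python's \\g<1> form."""
    return re.sub(r"\$\{?(\w+)\}?", r"\\g<\1>", replacement)


def apply_relabel(labels: Dict[str, str], rules: List[dict]) -> Dict[str, str]:
    """Apply labelmap and replace rules in order to a copy of labels."""
    labels = dict(labels)
    for rule in rules:
        regex = re.compile(rule.get("regex", "(.*)"))
        template = _to_py_template(rule.get("replacement", "$1"))
        if rule["action"] == "labelmap":
            # Copy every label whose NAME matches to the new name.
            for name, value in list(labels.items()):
                match = regex.fullmatch(name)
                if match:
                    labels[match.expand(template)] = value
        elif rule["action"] == "replace":
            # Join the source label VALUES and match them as a whole.
            source = rule.get("separator", ";").join(
                labels.get(name, "") for name in rule.get("source_labels", [])
            )
            match = regex.fullmatch(source)
            if match:
                labels[rule["target_label"]] = match.expand(template)
    return labels


# The four rules from the kubelet/metrics job above.
RULES = [
    {"regex": "__meta_kubernetes_node_label_(.+)", "replacement": "$1", "action": "labelmap"},
    {"regex": "(.*)", "target_label": "__address__",
     "replacement": "kubernetes.default.svc:443", "action": "replace"},
    {"source_labels": ["__meta_kubernetes_node_name"], "regex": "(.+)",
     "target_label": "__metrics_path__",
     "replacement": "/api/v1/nodes/${1}/proxy/metrics", "action": "replace"},
    {"regex": "(.*)", "target_label": "job", "replacement": "kubelet", "action": "replace"},
]

if __name__ == "__main__":
    # Invented labels for a discovered node named node01.
    discovered = {
        "__address__": "10.0.0.5:10250",
        "__meta_kubernetes_node_name": "node01",
        "__meta_kubernetes_node_label_zone": "zone-a",
    }
    for name, value in sorted(apply_relabel(discovered, RULES).items()):
        print(name, "=", value)
```

The effect is that every node is scraped through the API server proxy rather than directly on its own address.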