BlueKing single-host offline deployment: fixing the app_mgr component installation failure

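While testing a single-host offline deployment of Tencent BlueKing (蓝鲸智云), I ran into several installation problems. This post documents how I resolved one of them, "3.2 app_mgr component installation failure". It had me stuck for a long time (possibly because I am not familiar enough with Python or with the BlueKing product). I did solve it in the end, but the process itself is the part most worth recording.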

1. Problem description

Installing the app_mgr component offline fails.
Install command: ./bk_install app_mgr
The error output is as follows:

                  create virtualenv for paas_agent                  
Requirement already satisfied: pbr in /usr/local/lib/python2.7/site-packages
Requirement already satisfied: virtualenvwrapper in /usr/local/lib/python2.7/site-packages
Requirement already satisfied: virtualenv-clone in /usr/local/lib/python2.7/site-packages (from virtualenvwrapper)
Requirement already satisfied: stevedore in /usr/local/lib/python2.7/site-packages (from virtualenvwrapper)
Requirement already satisfied: virtualenv in /usr/local/lib/python2.7/site-packages (from virtualenvwrapper)
Requirement already satisfied: pbr>=1.6 in /usr/local/lib/python2.7/site-packages (from stevedore->virtualenvwrapper)
Requirement already satisfied: six>=1.9.0 in /usr/local/lib/python2.7/site-packages (from stevedore->virtualenvwrapper)
[192.168.1.6]20200303-174651 224   mkvirtualenv -a /data/bkce/paas_agent/paas_agent --extra-search-dir=/data/install/pip --no-download -p /usr/local/bin/python paas_agent
Already using interpreter /usr/local/bin/python
New python executable in /data/bkce/.envs/paas_agent/bin/python
Installing setuptools, pip, wheel...done.
Setting project for paas_agent to /data/bkce/paas_agent/paas_agent
Ignoring indexes: http://mirrors.cloud.tencent.com/pypi/simple
Requirement already satisfied (use --upgrade to upgrade): pbr in /data/bkce/.envs/paas_agent/lib/python2.7/site-packages
Ignoring indexes: http://mirrors.cloud.tencent.com/pypi/simple
Requirement already satisfied (use --upgrade to upgrade): virtualenvwrapper in /data/bkce/.envs/paas_agent/lib/python2.7/site-packages
Requirement already satisfied (use --upgrade to upgrade): virtualenv-clone in /data/bkce/.envs/paas_agent/lib/python2.7/site-packages (from virtualenvwrapper)
Requirement already satisfied (use --upgrade to upgrade): stevedore in /data/bkce/.envs/paas_agent/lib/python2.7/site-packages (from virtualenvwrapper)
Requirement already satisfied (use --upgrade to upgrade): virtualenv in /data/bkce/.envs/paas_agent/lib/python2.7/site-packages (from virtualenvwrapper)
Requirement already satisfied (use --upgrade to upgrade): pbr>=1.6 in /data/bkce/.envs/paas_agent/lib/python2.7/site-packages (from stevedore->virtualenvwrapper)
Requirement already satisfied (use --upgrade to upgrade): six>=1.9.0 in /data/bkce/.envs/paas_agent/lib/python2.7/site-packages (from stevedore->virtualenvwrapper)
Ignoring indexes: http://mirrors.cloud.tencent.com/pypi/simple
Requirement already satisfied (use --upgrade to upgrade): supervisor in /data/bkce/.envs/paas_agent/lib/python2.7/site-packages
Requirement already satisfied (use --upgrade to upgrade): six in /data/bkce/.envs/paas_agent/lib/python2.7/site-packages
Requirement already satisfied (use --upgrade to upgrade): meld3>=0.6.5 in /data/bkce/.envs/paas_agent/lib/python2.7/site-packages (from supervisor)
[192.168.1.6]20200303-174801 233   generate env variable settings.
[192.168.1.6]20200303-174801 151   exec: pip install --no-cache-dir  -r requirements.txt (/data/bkce/paas_agent/paas_agent)
Collecting Django==1.8.11 (from -r requirements.txt (line 1))
  Retrying (Retry(total=4, connect=None, read=None, redirect=None)) after connection broken by 'NewConnectionError('<pip._vendor.requests.packages.urllib3.connection.HTTPConnection object at 0x7f7b58e91150>: Failed to establish a new connection: [Errno 101] Network is unreachable',)': /pypi/simple/django/
  Retrying (Retry(total=3, connect=None, read=None, redirect=None)) after connection broken by 'NewConnectionError('<pip._vendor.requests.packages.urllib3.connection.HTTPConnection object at 0x7f7b58e91d50>: Failed to establish a new connection: [Errno 101] Network is unreachable',)': /pypi/simple/django/
  Retrying (Retry(total=2, connect=None, read=None, redirect=None)) after connection broken by 'NewConnectionError('<pip._vendor.requests.packages.urllib3.connection.HTTPConnection object at 0x7f7b58e91f10>: Failed to establish a new connection: [Errno 101] Network is unreachable',)': /pypi/simple/django/
  Retrying (Retry(total=1, connect=None, read=None, redirect=None)) after connection broken by 'NewConnectionError('<pip._vendor.requests.packages.urllib3.connection.HTTPConnection object at 0x7f7b58e5c110>: Failed to establish a new connection: [Errno 101] Network is unreachable',)': /pypi/simple/django/
  Retrying (Retry(total=0, connect=None, read=None, redirect=None)) after connection broken by 'NewConnectionError('<pip._vendor.requests.packages.urllib3.connection.HTTPConnection object at 0x7f7b58e5c2d0>: Failed to establish a new connection: [Errno 101] Network is unreachable',)': /pypi/simple/django/
  Could not find a version that satisfies the requirement Django==1.8.11 (from -r requirements.txt (line 1)) (from versions: )
No matching distribution found for Django==1.8.11 (from -r requirements.txt (line 1))
[192.168.1.6]20200303-174900 177   pip install (--no-cache-dir ) for paas_agent.  FAILED
[192.168.1.6]20200303-174900 47   Abort

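Note: "offline installation" means the target environment cannot reach the internet. If your deployment environment is allowed to access the internet, this component installs without any trouble (I tested this).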

2. Initial analysis

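The odd part is that only app_mgr reports an unreachable network during offline installation. Looking back at the error log above, this is the command run for this component: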

[192.168.1.6]20200303-174801 233   generate env variable settings.
[192.168.1.6]20200303-174801 151   exec: pip install --no-cache-dir  -r requirements.txt (/data/bkce/paas_agent/paas_agent)

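It looks like this pip command is not given --find-links to point at a local directory (nor --no-index), so pip tries to reach the remote pip index.
Other components, in contrast, all pass this option pointing at their own local directories: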

-- e.g. installing fta:
[192.168.1.6]20200302-001610 233   generate env variable settings.
[192.168.1.6]20200302-001610 151   exec: pip install --no-cache-dir --no-index --find-links=/data/src/fta/support-files/pkgs -r requirements.txt (/data/bkce/fta/fta)

-- e.g. installing bkdata:
[192.168.1.6]20200302-003237 233   generate env variable settings.
[192.168.1.6]20200302-003237 151   exec: pip install --no-cache-dir --no-index --find-links=/data/src/bkdata/support-files/pkgs -r requirements.txt (/data/bkce/bkdata/dataapi)

As shown, at the equivalent step these components use --no-index and --find-links to point pip at their own local package directories.
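The comparison above can be automated. The following is a minimal sketch, assuming the installer output has been saved as text; `find_online_pip_calls` is a hypothetical helper name, not part of bk_install, and the sample lines are copied from the logs above.

```shell
#!/usr/bin/env bash
# Sketch: flag logged "exec: pip install" commands that would need network access.
# find_online_pip_calls is a hypothetical helper name, not part of bk_install.
find_online_pip_calls() {
    # Read log text on stdin; print the pip install lines that lack --no-index.
    grep 'exec: pip install' | grep -v -e '--no-index' || true
}

# Sample lines copied from the logs above.
sample_log='[192.168.1.6]20200303-174801 151   exec: pip install --no-cache-dir  -r requirements.txt (/data/bkce/paas_agent/paas_agent)
[192.168.1.6]20200302-001610 151   exec: pip install --no-cache-dir --no-index --find-links=/data/src/fta/support-files/pkgs -r requirements.txt (/data/bkce/fta/fta)'

online_calls=$(printf '%s\n' "$sample_log" | find_online_pip_calls)
printf '%s\n' "$online_calls"
```

Only the paas_agent line is printed, which matches the observation that app_mgr is the one component whose pip call is not restricted to local packages.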

I made a few initial attempts:

2.1 Install the requirements offline with pip by hand, then retry app_mgr

pip install --no-cache-dir --no-index --find-links=/data/src/paas_agent/support-files/pkgs -r /data/bkce/paas_agent/paas_agent/requirements.txt

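The manual offline pip install succeeded, but ./bk_install app_mgr still failed afterwards, so pre-installing by hand had no effect.
The likely reason is that the installer runs pip inside the component's own virtualenv, which is isolated from the system site-packages.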
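If the virtualenv explanation is right, a manual install would have to go through the virtualenv's own pip rather than the system one. The sketch below only builds and prints such a command. `venv_pip_cmd` is a hypothetical helper, the paths are the ones that appear in the logs above, and the approach was not verified in this deployment.

```shell
#!/usr/bin/env bash
# Sketch: build (but do not run) a pip command that targets a specific virtualenv.
# venv_pip_cmd is a hypothetical helper; the paths are the ones seen in the logs above.
venv_pip_cmd() {
    local venv_dir=$1 pkg_dir=$2 req_file=$3
    # Calling <venv>/bin/pip installs into that virtualenv, not into the system site-packages.
    printf '%s/bin/pip install --no-cache-dir --no-index --find-links=%s -r %s\n' \
        "$venv_dir" "$pkg_dir" "$req_file"
}

cmd=$(venv_pip_cmd /data/bkce/.envs/paas_agent \
    /data/src/paas_agent/support-files/pkgs \
    /data/bkce/paas_agent/paas_agent/requirements.txt)
echo "$cmd"
```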

2.2 Locate the pip.conf files, back them up, and point them at the local directory
The files I modified were /data/src/.pip/pip.conf and /data/install/pip/pip.conf, with the content changed to:

[global]
find-links = /data/src/paas_agent/support-files/pkgs
[install]
find-links = /data/src/paas_agent/support-files/pkgs

But ./bk_install app_mgr still failed with the same error, so this had no effect.
Later attempts turned up more pip.conf files; modifying all of them did not help either.

2.3 Set an environment variable
The official documentation mentions an environment variable, PIP_FIND_LINKS:

export PIP_FIND_LINKS=/data/src/paas_agent/support-files/pkgs

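Running ./bk_install app_mgr again gave the same error.
My guess is that the options are hard-coded in the installer, so, much as with a crontab job, variables set in the outer shell have no effect; the setting has to be found inside the scripts.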
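Whether a variable set in the calling shell reaches pip depends on how the installer launches it. The sketch below is a generic check, not taken from bk_install: it shows that only an exported variable is inherited by child processes, and that a process started with a fresh environment (simulated here with `env -i`) sees nothing. Whether bk_install starts pip in such a fresh session is an assumption that would need confirming in its scripts.

```shell
#!/usr/bin/env bash
# Sketch: check whether a variable set in the current shell reaches a child process.
unset PIP_FIND_LINKS

# Plain assignment: visible in this shell only.
PIP_FIND_LINKS=/data/src/paas_agent/support-files/pkgs
seen_without_export=$(bash -c 'printf "%s" "${PIP_FIND_LINKS:-}"')

# Exported: inherited by child processes started from this shell.
export PIP_FIND_LINKS
seen_with_export=$(bash -c 'printf "%s" "${PIP_FIND_LINKS:-}"')

# A clean environment (env -i) stands in for a session that never saw the export.
seen_in_clean_env=$(env -i "${BASH:-/bin/bash}" -c 'printf "%s" "${PIP_FIND_LINKS:-}"')

echo "without export: '${seen_without_export}'"
echo "with export:    '${seen_with_export}'"
echo "clean env:      '${seen_in_clean_env}'"
```

Note also that --find-links on its own does not stop pip from contacting the index; the offline commands used by the other components pass --no-index as well.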

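2.4 Other attempts
For example, I added the environment variable above by hand to the app_mgr section of bk_install; that did not work either, and the error was unchanged.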

3. Pooling ideas

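At this point I was stuck, and something was clearly wrong. After I shared the analysis above with the customer, we agreed it was most likely a bug and reported it to BlueKing support.
Support's answer was that for offline installation they recommend setting up a complete local pip index. A full mirror would need close to 2 TB of storage, so I set up a local index containing only the required packages instead.
This still felt like a workaround that sidesteps the real problem, because the other components need no such index: they use --find-links to point at their local package directories.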

Since I had never done this before, configuring the local pip index also took a fair amount of searching and testing:

[root@rbtnode1 bin]# find /data -name pip.conf
/data/install/pip/pip.conf
/data/install/pip.conf
/data/src/service/.pip/pip.conf
/data/src/.pip/pip.conf
/data/src/pip.conf


cat /data/install/pip/pip.conf
cat /data/install/pip.conf
cat /data/src/service/.pip/pip.conf
cat /data/src/.pip/pip.conf
cat /data/src/pip.conf
cat ~/.pip/pip.conf

I did not know which pip.conf would actually be used, so I backed up all of them and changed every one to point at the local pip index:

[global]
trusted-host = 192.168.1.6
index-url = http://192.168.1.6:8080/simple
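Backing up several scattered pip.conf files by hand is easy to get wrong. A minimal sketch of doing it in one pass follows; `backup_pip_confs` is a hypothetical helper name, and the demo runs on a throwaway directory with placeholder content rather than on the real /data tree.

```shell
#!/usr/bin/env bash
# Sketch: back up every pip.conf under a directory tree before editing it.
# backup_pip_confs is a hypothetical helper name.
backup_pip_confs() {
    local root=$1 conf
    # -print0 with read -d '' keeps paths that contain spaces intact.
    find "$root" -name pip.conf -type f -print0 |
        while IFS= read -r -d '' conf; do
            # Never overwrite an earlier backup.
            [ -e "$conf.bak" ] || cp -p "$conf" "$conf.bak"
            echo "backed up: $conf"
        done
}

# Demo on a throwaway tree instead of the real /data directory.
demo_root=$(mktemp -d)
mkdir -p "$demo_root/install/pip" "$demo_root/src/.pip"
printf '[global]\nindex-url = http://mirrors.cloud.tencent.com/pypi/simple\n' > "$demo_root/install/pip/pip.conf"
cp "$demo_root/install/pip/pip.conf" "$demo_root/src/.pip/pip.conf"
backup_pip_confs "$demo_root"
```

On the real host the call would be `backup_pip_confs /data`, plus a separate copy of ~/.pip/pip.conf if it exists.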

For the details of setting up the local pip index, I relied on two articles found online.

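The installation still failed. I then edited the globals.env configuration file:

# HTTP proxy address used to access network resources such as yum repositories, e.g. BK_PROXY=http://192.168.0.1:8833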
export BK_PROXY=http://192.168.1.6:8080/simple

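I also discussed this with a colleague. Logically, the real fix is still to make this component pass --find-links like the others.
The only way forward was to trace the scripts from the top and look for the corresponding setting, starting from the main script, bk_install.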

4. Final fix

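I did not get far reading the scripts at first: I rarely write shell scripts, and it is one of my weak spots. Going from bk_install to bkcec, I saw so many other files being called that I lost the thread. So I went back to the original error log and noticed a line just before the failure that looked like output from the scripts: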

[192.168.1.6]20200303-174801 233   generate env variable settings.
[192.168.1.6]20200303-174801 151   exec: pip install --no-cache-dir  -r requirements.txt (/data/bkce/paas_agent/paas_agent)

Searching the files directly under /data/install for "generate env variable settings" showed that only utils.fc contains it:

[root@rbtnode1 install]# grep "generate env variable settings" *
grep: agent_setup: Is a directory
grep: appmgr: Is a directory
grep: bcs: Is a directory
grep: bin: Is a directory
grep: build: Is a directory
grep: deck: Is a directory
grep: extra: Is a directory
grep: health_check: Is a directory
grep: migrate: Is a directory
grep: pip: Is a directory
grep: scripts: Is a directory
grep: setuptools-36.0.1: Is a directory
grep: support-files: Is a directory
grep: templates: Is a directory
grep: uninstall: Is a directory
utils.fc:    log "generate env variable settings."
grep: verify: Is a directory
[root@rbtnode1 install]# ls -l utils.fc
-rw-r--r-- 1 root root 38897 Jan  9 16:11 utils.fc
[root@rbtnode1 install]# scp utils.fc 192.168.1.61:/tmp/
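The "Is a directory" noise above appears because grep was run without recursion, so subdirectories were skipped. A recursive search avoids both problems. This is a minimal sketch; `find_message_source` is a hypothetical helper name and the demo uses a throwaway tree rather than the real /data/install.

```shell
#!/usr/bin/env bash
# Sketch: search a directory tree recursively for a log message.
# find_message_source is a hypothetical helper name.
find_message_source() {
    local message=$1 root=$2
    # -r: recurse into subdirectories; -n: show line numbers;
    # -F: treat the message as a fixed string, not a regular expression.
    grep -rnF -- "$message" "$root"
}

# Demo on a throwaway tree that mimics part of the layout of /data/install.
demo_root=$(mktemp -d)
mkdir -p "$demo_root/bin" "$demo_root/scripts"
printf '    log "generate env variable settings."\n' > "$demo_root/utils.fc"
printf 'echo unrelated\n' > "$demo_root/scripts/other.sh"
matches=$(find_message_source "generate env variable settings" "$demo_root")
echo "$matches"
```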

I copied the file to another machine and found a piece of code that looked relevant:

_install_pypkgs () {
    local module=$1
    local project=$2
    local local_pip_src=$PKG_SRC_PATH/$module/support-files/pkgs
    local pip_options="--no-cache-dir "

    local _ordered_requirement_files=( $( shopt -s nullglob; echo 0[0-9]_requirements*.txt) )

    if [ "${#_ordered_requirement_files[@]}" -eq 0 ]; then
        _ordered_requirement_files=( requirements.txt )
    fi

    for reqr_file in ${_ordered_requirement_files[@]}; do
        if [ "${reqr_file//_local/}" != "$reqr_file" -o -f SELF_CONTAINED_PIP_PKG ]; then
            pip_options="--no-cache-dir --no-index --find-links=$local_pip_src"
        fi

        log "exec: pip install $pip_options -r $reqr_file ($PWD)"
        http_proxy=$BK_PROXY https_proxy=$BK_PROXY \
            pip install $pip_options -r $reqr_file      # <-- $pip_options here may well lack the --find-links option

        nassert "pip install ($pip_options) for $venv_name"
    done
    #shopt -s nullglob
}
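The branch that decides the options can be isolated and exercised on its own. In the sketch below, `choose_pip_options` is a hypothetical name; the condition mirrors the one in the excerpt, written with `||` instead of `-o`. It shows that the offline options are chosen only when the requirements file name contains "_local" or when a marker file named SELF_CONTAINED_PIP_PKG exists in the current directory.

```shell
#!/usr/bin/env bash
# Sketch: the option-selection branch of _install_pypkgs, isolated for inspection.
# choose_pip_options is a hypothetical name; the condition mirrors the excerpt above.
choose_pip_options() {
    local reqr_file=$1 local_pip_src=$2
    local pip_options="--no-cache-dir "
    # True when the file name contains "_local", or when a marker file named
    # SELF_CONTAINED_PIP_PKG exists in the current directory.
    if [ "${reqr_file//_local/}" != "$reqr_file" ] || [ -f SELF_CONTAINED_PIP_PKG ]; then
        pip_options="--no-cache-dir --no-index --find-links=$local_pip_src"
    fi
    printf '%s\n' "$pip_options"
}

pkgs=/data/src/paas_agent/support-files/pkgs
workdir=$(mktemp -d)
cd "$workdir" || exit 1

plain=$(choose_pip_options requirements.txt "$pkgs")
local_named=$(choose_pip_options 01_requirements_local.txt "$pkgs")
touch SELF_CONTAINED_PIP_PKG
with_marker=$(choose_pip_options requirements.txt "$pkgs")

echo "requirements.txt, no marker:   $plain"
echo "01_requirements_local.txt:     $local_named"
echo "requirements.txt, with marker: $with_marker"
```

Read this way, the paas_agent project directory apparently had neither a "_local" requirements file nor the marker file, which would explain the missing options. Creating the marker file in that directory might therefore be a less invasive fix than editing utils.fc, but that is an inference from the excerpt and was not tested here.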

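As the marked line in the function above shows, the $pip_options passed to pip install may well lack --find-links, because the offline value is assigned only inside the if branch. Without time to analyze the whole flow, I tried patching utils.fc directly to set pip_options just before the call: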

_install_pypkgs () {
    local module=$1
    local project=$2
    local local_pip_src=$PKG_SRC_PATH/$module/support-files/pkgs
    local pip_options="--no-cache-dir "

    local _ordered_requirement_files=( $( shopt -s nullglob; echo 0[0-9]_requirements*.txt) )

    if [ "${#_ordered_requirement_files[@]}" -eq 0 ]; then
        _ordered_requirement_files=( requirements.txt )
    fi

    for reqr_file in ${_ordered_requirement_files[@]}; do
        if [ "${reqr_file//_local/}" != "$reqr_file" -o -f SELF_CONTAINED_PIP_PKG ]; then
            pip_options="--no-cache-dir --no-index --find-links=$local_pip_src"
        fi

        log "exec: pip install $pip_options -r $reqr_file ($PWD)"
        # The original call (the next two lines) is commented out.
        #http_proxy=$BK_PROXY https_proxy=$BK_PROXY \
        #    pip install $pip_options -r $reqr_file
        # New: force the offline options, then call pip install.
        pip_options="--no-cache-dir --no-index --find-links=$local_pip_src"
        pip install $pip_options -r $reqr_file

        nassert "pip install ($pip_options) for $venv_name"
    done
    #shopt -s nullglob
}

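After modifying utils.fc I tested again, and the step that failed before no longer failed. (The "exec:" log line still shows no --find-links because it is written before the new assignment; the options actually passed to pip do include it, as the FAILED line further down shows.)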

[192.168.1.6]20200303-214725 235   generate env variable settings.
[192.168.1.6]20200303-214726 151   exec: pip install --no-cache-dir  -r requirements.txt (/data/bkce/paas_agent/paas_agent)
Ignoring indexes: http://192.168.1.6:8080/simple
Collecting Django==1.8.11 (from -r requirements.txt (line 1))
Collecting PyMySQL==0.6.7 (from -r requirements.txt (line 2))

(some output omitted)

Collecting idna<2.9,>=2.5 (from requests==2.21.0->-r requirements.txt (line 3))
  Could not find a version that satisfies the requirement idna<2.9,>=2.5 (from requests==2.21.0->-r requirements.txt (line 3)) (from versions: )
No matching distribution found for idna<2.9,>=2.5 (from requests==2.21.0->-r requirements.txt (line 3))
[192.168.1.6]20200303-214856 177   pip install (--no-cache-dir --no-index --find-links=/data/src/paas_agent/support-files/pkgs) for paas_agent.  FAILED
[192.168.1.6]20200303-214856 47   Abort
[root@rbtnode1 install]# 

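But the installation then aborted because a package was missing.
idna<2.9,>=2.5 is not listed in paas_agent's requirements.txt, but it is required as a dependency of requests==2.21.0. One fix is to gather the packages from the other locations into a single directory (/data/localpip) and then copy them into this component's package directory: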

[root@rbtnode1 pkgs]# pwd
/data/src/paas_agent/support-files/pkgs
[root@rbtnode1 pkgs]# ls -l |wc -l
62

[root@rbtnode1 pkgs]# cp -n /data/localpip/* ./
[root@rbtnode1 pkgs]# pwd
/data/src/paas_agent/support-files/pkgs
[root@rbtnode1 pkgs]# ls -l |wc -l
281
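Before rerunning the installer, it can help to check which packages still have no archive in the --find-links directory. The sketch below is a rough check under simplifying assumptions: `missing_pkgs` is a hypothetical helper, it matches by file-name prefix only, and it ignores version specifiers and pip's name normalisation. The demo file names are illustrative placeholders.

```shell
#!/usr/bin/env bash
# Sketch: report which package names have no archive in a --find-links directory.
# missing_pkgs is a hypothetical helper. Matching is by file-name prefix only, so it
# ignores version specifiers and pip's name normalisation (e.g. "_" vs "-"), and a
# name such as "django" is also satisfied by a file like django-celery-*.
missing_pkgs() {
    local dir=$1 name
    shift
    for name in "$@"; do
        # Case-insensitive prefix match, e.g. idna-2.8-py2.py3-none-any.whl.
        if [ -z "$(find "$dir" -maxdepth 1 -type f -iname "${name}-*" | head -n 1)" ]; then
            echo "$name"
        fi
    done
}

# Demo on a throwaway directory; the file names are illustrative placeholders.
demo_dir=$(mktemp -d)
touch "$demo_dir/Django-1.8.11-py2.py3-none-any.whl" \
      "$demo_dir/requests-2.21.0-py2.py3-none-any.whl"
missing=$(missing_pkgs "$demo_dir" django requests idna)
echo "missing: $missing"
```

On the real host the first argument would be /data/src/paas_agent/support-files/pkgs, followed by the package names pip reported as not found.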

Then I tried installing app_mgr again:

[root@rbtnode1 pkgs]# cd /data/install/
[root@rbtnode1 install]# ./bk_install app_mgr

This time it finally succeeded. The log is below; after appt is installed, the installer goes on to install appo, and both succeed:

Collecting chardet<3.1.0,>=3.0.2 (from requests==2.21.0->-r requirements.txt (line 3))
Collecting idna<2.9,>=2.5 (from requests==2.21.0->-r requirements.txt (line 3))
Collecting certifi>=2017.4.17 (from requests==2.21.0->-r requirements.txt (line 3))
Installing collected packages: Django, PyMySQL, urllib3, chardet, idna, certifi, requests, pytz, amqp, anyjson, kombu, billiard, celery, django-celery, redis, httplib2, xlrd, xlwt, MarkupSafe, Mako, Jinja2, pycrypto, gunicorn, six, SQLAlchemy, suds, supervisor, uWSGI, pytest-runner, setuptools-scm
  Running setup.py install for anyjson: started
    Running setup.py install for anyjson: finished with status 'done'
  Running setup.py install for billiard: started
    Running setup.py install for billiard: finished with status 'done'

(some output omitted)

Successfully installed Django-1.8.11 Jinja2-2.8 Mako-1.0.4 MarkupSafe-0.23 PyMySQL-0.6.7 SQLAlchemy-1.0.12 amqp-1.4.9 anyjson-0.3.3 billiard-3.3.0.23 celery-3.1.18 certifi-2019.3.9 chardet-3.0.4 django-celery-3.2.1 gunicorn-19.6.0 httplib2-0.9.1 idna-2.8 kombu-3.0.35 pycrypto-2.6.1 pytest-runner-2.8 pytz-2016.6.1 redis-2.10.5 requests-2.21.0 setuptools-scm-1.11.1 six-1.10.0 suds-0.4 supervisor-3.3.1 uWSGI-2.0.13.1 urllib3-1.24.1 xlrd-1.0.0 xlwt-1.1.2
[192.168.1.6]20200303-222848 175   pip install (--no-cache-dir --no-index --find-links=/data/src/paas_agent/support-files/pkgs) for paas_agent.  OK
[192.168.1.6]20200303-222858 453   apps isolate mode: virutalenv
Ignoring indexes: http://192.168.1.6:8080/simple
Requirement already satisfied (use --upgrade to upgrade): Django==1.8.11 in /data/bkce/.envs/paas_agent/lib/python2.7/site-packages (from -r requirements.txt (line 1))
Requirement already satisfied (use --upgrade to upgrade): PyMySQL==0.6.7 in /data/bkce/.envs/paas_agent/lib/python2.7/site-packages (from -r requirements.txt (line 2))

(some output omitted)

[192.168.1.6]20200303-222926 151   install python package for virtualenv paas_agent done.
[192.168.1.6]20200303-222927 468   local nginx is required for paas_agent. going to install it.
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
Package 1:nginx-1.12.2-2.el7.x86_64 already installed and latest version
Nothing to do
[192.168.1.6]20200303-222934 175   render: #etc#nginx.conf -> /data/bkce//etc/nginx.conf.  OK
[192.168.1.6]20200303-222935 175   render: #etc#nginx#paasagent.conf -> /data/bkce//etc/nginx/paasagent.conf.  OK
[192.168.1.6]20200303-222936 322   PLACE HOLDER __SID__ is replaced into empty
[192.168.1.6]20200303-222937 322   PLACE HOLDER __TOKEN__ is replaced into empty
[192.168.1.6]20200303-222937 175   render: #etc#paas_agent_config.yaml.tpl -> /data/bkce//etc/paas_agent_config.yaml.  OK
[192.168.1.6]20200303-222938 175   render: #etc#supervisor-paas_agent.conf -> /data/bkce//etc/supervisor-paas_agent.conf.  OK
[192.168.1.6]20200303-222939 56   install appt(allproject) done

                         initdata for appt()                         
[192.168.1.6]20200303-222946 182   exec initdata_appt on 192.168.1.6
[192.168.1.6]20200303-222958 262   update config file: paas_agent_config.yaml
[192.168.1.6]20200303-222958 268   register appt succeded.
[192.168.1.6]20200303-222958 502   create database bksuite_common
[192.168.1.6]20200303-222958 504   add version info to db
[192.168.1.6]20200303-223001 98   starting appt(ALL) on host: 192.168.1.6
[192.168.1.6]20200303-223052 77   activate appt(192.168.1.6) succeded

# appt is now installed; appo is installed next

(some output omitted)

                          install appo(all)                          
[192.168.1.6]20200303-223102 112   check dependences for paas_agent

(some output omitted)

                         initdata for appo()                         
[192.168.1.6]20200303-223509 182   exec initdata_appo on 192.168.1.6
[192.168.1.6]20200303-223533 262   update config file: paas_agent_config.yaml
[192.168.1.6]20200303-223534 268   register appo succeded.
[192.168.1.6]20200303-223535 502   create database bksuite_common
[192.168.1.6]20200303-223535 504   add version info to db
[192.168.1.6]20200303-223541 98   starting appo(ALL) on host: 192.168.1.6
[192.168.1.6]20200303-223613 77   activate appo(192.168.1.6) succeded
[192.168.1.6] paas_agent()    paas_agent                       RUNNING   pid 23792, uptime 0:06:10
[192.168.1.6] nginx: RUNNING
[192.168.1.6] paas_agent()    paas_agent                       RUNNING   pid 23792, uptime 0:06:42
[192.168.1.6] nginx: RUNNING
[192.168.1.6] rabbitmq: RUNNING

如果以上步骤没有报错, 你现在可以完成正式环境及测试环境的部署,可以:
 1. 通过./bk_install saas-o bk_nodeman 部署节点管理app, 或
 2. 通过开发者中心部署app.
若要安装蓝鲸监控, 日志检索, 需要先通过 ./bk_install bkdata 安装 bkdata
[root@rbtnode1 install]# 

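So, after a lot of stumbling, the problem that had puzzled me for so long was finally solved. I still need to strengthen my Python and shell scripting skills.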
