Quick Tip: TFA Log Collection Fails with an Insufficient Space Error

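Today, while analyzing a node eviction at a customer site, I found that TFA was installed, so I used its one-command collection to gather the logs covering the time of the failure: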

tfactl diagcollect -from "2020-08-14 03:00:00" -to "2020-08-14 05:00:00" -all

The collection failed with an insufficient-space error:

Not enough space in Repository or TFA_BASE to run collections

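Checking with df showed that the directory actually had plenty of free space. The error is really caused by the TFA repository's Maximum Size (MB) setting, which generally defaults to 10 GB. In this customer's environment, OSWatcher (osw) data had been retained for too long, so the repository had grown past that limit, and the collection failed with the space error.
The MOS note TFA Diagcollection Reports “Not enough space in Repository or TFA_BASE to run collections” (Doc ID 2300038.1)
gives a clear solution: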

  1. tfactl set reposizeMB=10240
  2. tfactl print repository

Note: the repository location can be changed using tfactl set repositorydir=

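Following the MOS solution, check the current value and then set it to something appropriate for the environment. Note that these commands must be run as root: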

tfactl print repository
tfactl set reposizeMB=20480
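
Rather than guessing a new limit, one option is to derive it from how much the repository currently holds. The following is a minimal sketch, not part of the MOS note: the helper names dir_usage_mb and suggest_repo_size_mb are hypothetical, and the headroom value and the rounding to a multiple of 1024 MB are assumptions chosen only for illustration.

```shell
# Sketch only: helper names, headroom, and rounding are illustrative assumptions.

# Print the disk usage of a directory in whole MB (rounded up).
dir_usage_mb() {
    du -sk "$1" | awk '{print int(($1 + 1023) / 1024)}'
}

# Print a suggested limit in MB: current usage plus headroom,
# rounded up to the next multiple of 1024 MB.
# $1 = used MB, $2 = headroom MB
suggest_repo_size_mb() {
    echo $(( ( ($1 + $2 + 1023) / 1024 ) * 1024 ))
}
```

With the repository path taken from the output of tfactl print repository, the result could be passed to the command above, for example tfactl set reposizeMB=$(suggest_repo_size_mb "$(dir_usage_mb /path/to/repository)" 4096), where /path/to/repository is a placeholder.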

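In the extreme case where the repository's filesystem really is short on space, you can even point the repository at another directory that has free space: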

mkdir /tmp/repository
tfactl set repositorydir=/tmp/repository
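
Before moving the repository, it may be worth confirming that the target filesystem really has enough room. Here is a small sketch under stated assumptions: free_mb and has_free_mb are hypothetical helper names, and the threshold passed in is whatever size you intend to allow. Also note that on many systems /tmp is cleaned on reboot or by periodic jobs, so a persistent directory may be a safer choice outside of a demonstration.

```shell
# Sketch only: helper names and thresholds are illustrative assumptions.

# Print the free space, in whole MB, of the filesystem holding a directory.
# df -P guarantees one output line per filesystem, so field 4 is "Available".
free_mb() {
    df -Pk "$1" | awk 'NR==2 {print int($4 / 1024)}'
}

# Succeed only if the directory's filesystem has at least $2 MB free.
has_free_mb() {
    [ "$(free_mb "$1")" -ge "$2" ]
}
```

For example, has_free_mb /tmp 20480 && tfactl set repositorydir=/tmp/repository would change the location only when about 20 GB is available.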

Then retry the TFA collection:

tfactl diagcollect -from "2020-08-14 03:00:00" -to "2020-08-14 05:00:00" -all

This time the required logs are collected successfully:

[root@db01 grid]# tfactl diagcollect -from "2020-08-14 03:00:00" -to "2020-08-14 05:00:00" -all
The -all switch is being deprecated as collection of all components is the default behavior. TFA will continue to collect all components.
Collecting data for all nodes
Scanning files from aug/14/2020 03:00:00 to aug/14/2020 05:00:00

Collection Id : 20200814235440db01

Detailed Logging at : /tmp/repository/collection_Fri_Aug_14_23_54_41_CST_2020_node_all/diagcollect_20200814235440_db01.log
2020/08/14 23:54:51 CST : NOTE : Any file or directory name containing the string .com will be renamed to replace .com with dotcom
2020/08/14 23:54:51 CST : Collection Name : tfa_Fri_Aug_14_23_54_41_CST_2020.zip
2020/08/14 23:54:51 CST : Collecting diagnostics from hosts : [db01, db02]
2020/08/14 23:54:52 CST : Scanning of files for Collection in progress...
2020/08/14 23:54:52 CST : Collecting additional diagnostic information...
2020/08/14 23:55:37 CST : Getting list of files satisfying time range [08/14/2020 03:00:00 CST, 08/14/2020 05:00:00 CST]
2020/08/14 23:55:50 CST : Collecting ADR incident files...
2020/08/14 23:56:49 CST : Completed collection of additional diagnostic information...
2020/08/14 23:56:50 CST : Completed Local Collection
2020/08/14 23:56:50 CST : Remote Collection in Progress...
.---------------------------------.
|        Collection Summary       |
+------+-----------+-------+------+
| Host | Status    | Size  | Time |
+------+-----------+-------+------+
| db02 | Completed | 803kB | 128s |
| db01 | Completed | 1.2MB | 118s |
'------+-----------+-------+------'

Logs are being collected to: /tmp/repository/collection_Fri_Aug_14_23_54_41_CST_2020_node_all
/tmp/repository/collection_Fri_Aug_14_23_54_41_CST_2020_node_all/db01.tfa_Fri_Aug_14_23_54_41_CST_2020.zip
/tmp/repository/collection_Fri_Aug_14_23_54_41_CST_2020_node_all/db02.tfa_Fri_Aug_14_23_54_41_CST_2020.zip

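This demonstration was run in a test environment without much data, so the archives are small; in a real production environment they are usually larger.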
