
[Experience Sharing] /limits.conf: running out of processes because of an Oracle bug

Posted on 2015-8-20 08:31:57
While checking SMIDB today, I found many errors in the CRS alert log, specifically:
2015-08-19 17:12:21.745:
[/oracle/app/11.2.0/grid_1/bin/oraagent.bin(6227)]CRS-5013:Agent "/oracle/app/11.2.0/grid_1/bin/oraagent.bin" failed to start process "/oracle/app/11.2.0/grid_1/bin/lsnrctl" for action "check": details at "(:CLSN00008:)" in "/oracle/app/11.2.0/grid_1/log/smidb11/agent/crsd/oraagent_grid/oraagent_grid.log"
2015-08-19 17:13:09.986:
[/oracle/app/11.2.0/grid_1/bin/oraagent.bin(6227)]CRS-5013:Agent "/oracle/app/11.2.0/grid_1/bin/oraagent.bin" failed to start process "/oracle/app/11.2.0/grid_1/bin/lsnrctl" for action "check": details at "(:CLSN00008:)" in "/oracle/app/11.2.0/grid_1/log/smidb11/agent/crsd/oraagent_grid/oraagent_grid.log"
2015-08-19 17:13:21.758:
[/oracle/app/11.2.0/grid_1/bin/oraagent.bin(6227)]CRS-5013:Agent "/oracle/app/11.2.0/grid_1/bin/oraagent.bin" failed to start process "/oracle/app/11.2.0/grid_1/bin/lsnrctl" for action "check": details at "(:CLSN00008:)" in "/oracle/app/11.2.0/grid_1/log/smidb11/agent/crsd/oraagent_grid/oraagent_grid.log"



Tracing further into the agent log showed:
2015-08-19 17:14:09.993: [ora.LISTENER.lsnr][1342174976]{1:63186:26462} [check] clsn_agent::check: Exception SclsProcessSpawnException
2015-08-19 17:14:21.744: [ora.asm][1342174976]{0:21:2} [check] CrsCmd::ClscrsCmdData::stat entity 1 statflag 33 useFilter 0
2015-08-19 17:14:21.759: [ora.asm][1342174976]{0:21:2} [check] AsmProxyAgent::check clsagfw_res_status 0
2015-08-19 17:14:21.761: [ora.LISTENER_SCAN1.lsnr][1339545344]{0:21:2} [check] Utils:execCmd action = 3 flags = 38 ohome = (null) cmdname = lsnrctl.
2015-08-19 17:14:21.761: [ora.LISTENER_SCAN1.lsnr][1339545344]{0:21:2} [check] (:CLSN00008:)Utils:execCmd scls_process_spawn() failed 1
2015-08-19 17:14:21.761: [ora.LISTENER_SCAN1.lsnr][1339545344]{0:21:2} [check] (:CLSN00008:) category: -2, operation: fork, loc: spawnproc28, OS error: 11, other: forked failed [-1]
2015-08-19 17:14:21.761: [ora.LISTENER_SCAN1.lsnr][1339545344]{0:21:2} [check] clsnUtils::error Exception type=2 string=
CRS-5013: Agent "/oracle/app/11.2.0/grid_1/bin/oraagent.bin" failed to start process "/oracle/app/11.2.0/grid_1/bin/lsnrctl" for action "check": details at "(:CLSN00008:)" in "/oracle/app/11.2.0/grid_1/log/smidb11/agent/crsd/oraagent_grid/oraagent_grid.log"




The ONS log:
[grid@smidb11 logs]$ tail ons.out
pthread_create() Resource temporarily unavailable
pthread_create() Resource temporarily unavailable
pthread_create() Resource temporarily unavailable
pthread_create() Resource temporarily unavailable
pthread_create() Resource temporarily unavailable
pthread_create() Resource temporarily unavailable
pthread_create() Resource temporarily unavailable
pthread_create() Resource temporarily unavailable
pthread_create() Resource temporarily unavailable
[2015-05-07T03:09:22+08:00] [ons] [TRACE:2] [] [internal] ONS worker process stopped (0)




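These errors mean that processes could not be started because a system resource was exhausted (OS error 11 on Linux is EAGAIN, "Resource temporarily unavailable"), so I checked the ulimit settings: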
[grid@smidb11 logs]$ ulimit -u
10240
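
On Linux the nproc limit counts threads, not only processes, which fits the pthread_create() failures in ons.out. It can therefore help to compare a user's current thread count with the limit. The following is only a rough sketch, assuming a Linux /proc layout; the function name count_user_threads and its optional second argument are hypothetical, and counting by directory owner is an approximation of what the kernel actually accounts for.

```shell
#!/bin/sh
# count_user_threads USER [PROC_ROOT]
# Count thread directories (<root>/<pid>/task/<tid>) owned by USER.
# PROC_ROOT defaults to /proc; the parameter exists only to make testing easy.
count_user_threads() {
    user=$1
    root=${2:-/proc}
    # Processes may exit while find is running, so errors are discarded.
    n=$(find "$root" -mindepth 3 -maxdepth 3 \
            -path "$root/[0-9]*/task/[0-9]*" -user "$user" 2>/dev/null | wc -l)
    # Arithmetic expansion normalizes any padding in the wc output.
    echo $((n))
}
```

Running `count_user_threads grid` as root or as grid gives a number that can be set against the limit reported above.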



limits.conf:
# End of file
grid soft nproc 10240
grid hard nofile 65536
oracle soft nproc 10240
oracle hard nofile 65536



The limits.conf configuration has some problems: hard nproc and soft nofile are not set. This will be corrected before the restart next Monday.
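
A check like this can be scripted. The sketch below is a minimal example, not a complete parser: the function name missing_limits is hypothetical, it looks only at the single file it is given (files under /etc/security/limits.d/ and wildcard or group entries are ignored), and it treats a "-" type as setting both soft and hard.

```shell
#!/bin/sh
# missing_limits FILE USER
# Print "USER TYPE ITEM" for each soft/hard nproc/nofile entry that FILE lacks.
missing_limits() {
    file=$1
    user=$2
    for item in nproc nofile; do
        for type in soft hard; do
            awk -v u="$user" -v t="$type" -v i="$item" '
                /^[[:space:]]*#/ { next }          # skip comment lines
                $1 == u && ($2 == t || $2 == "-") && $3 == i { found = 1 }
                END { exit (found ? 0 : 1) }
            ' "$file" || printf '%s %s %s\n' "$user" "$type" "$item"
        done
    done
}
```

Against the four lines shown above, `missing_limits /etc/security/limits.conf grid` would report `grid hard nproc` and `grid soft nofile`.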
[grid@smidb11 pam.d]$ cat login
#%PAM-1.0
auth [user_unknown=ignore success=ok ignore=ignore default=bad] pam_securetty.so
auth       include      system-auth
account    required     pam_nologin.so
account    include      system-auth
password   include      system-auth
# pam_selinux.so close should be the first session rule
session    required     pam_selinux.so close
session    required     pam_loginuid.so
session    optional     pam_console.so
# pam_selinux.so open should only be followed by sessions to be executed in the user context
session    required     pam_selinux.so open
session    required     pam_namespace.so
session    optional     pam_keyinit.so force revoke
session    include      system-auth
-session   optional     pam_ck_connector.so
[grid@smidb11 pam.d]$




The /etc/pam.d/login file does not load the resource-limits module; the following line should be added:
session required /lib64/security/pam_limits.so
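
Whether a PAM service file references pam_limits.so can be checked with a small helper. This is a sketch only; the function name pam_loads_limits is hypothetical, and it inspects just the one file it is given. Files pulled in through include lines (system-auth in the listing above) may already load the module, so they are worth checking the same way.

```shell
#!/bin/sh
# pam_loads_limits FILE
# Succeed (exit 0) if FILE has an active session line naming pam_limits.so.
pam_loads_limits() {
    # Commented-out lines do not match because the line must start with
    # "session" (optionally prefixed by "-") after any leading whitespace.
    grep -Eq '^[[:space:]]*-?session[[:space:]].*pam_limits\.so' "$1"
}
```

For example, `pam_loads_limits /etc/pam.d/login || echo "pam_limits.so not referenced in login"`.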
After searching online, I found a document on Oracle MOS that matches our situation exactly:
The processes and resources started by CRS (Grid Infrastructure) do not inherit the ulimit setting for "max user processes" from /etc/security/limits.conf setting (Doc ID 1594606.1)
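Verification confirmed that although ulimit -u for our grid user is set to 10240, the limit actually in effect at run time is still 1024.
This is Oracle Bug 17301761. Our database version is 11.2.0.4, which falls within the range affected by this bug.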
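
One way to do this kind of verification is to read the limit of an already running process from /proc instead of trusting ulimit -u in a login shell. A minimal sketch, assuming the usual Linux /proc/<pid>/limits layout; the function name soft_nproc_from_limits is hypothetical, and reading another user's limits file may require root.

```shell
#!/bin/sh
# soft_nproc_from_limits FILE
# Print the soft "Max processes" value from a /proc/<pid>/limits style file.
soft_nproc_from_limits() {
    # The row reads: Max processes <soft> <hard> processes
    awk '/^Max processes/ { print $3 }' "$1"
}
```

For instance, `soft_nproc_from_limits /proc/6227/limits` would show the value for the oraagent.bin process whose PID appears in the alert log above.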
There are two solutions:
1. Apply the patch.
2. Work around the bug using the method given by MOS, as follows:

The ohasd script needs to be modified to set the ulimit explicitly for all grid and database resources that are started by the Grid Infrastructure (GI).

1) go to GI_HOME/bin

2) make a backup of ohasd script file

3) in the ohasd script file, locate the following code:

    Linux)
        # MEMLOCK limit is for Bug 9136459
        ulimit -l unlimited
        if [ "$?" != "0" ]
        then
            $CLSECHO -phas -f crs -l -m 6021 "l" "unlimited"
        fi
        ulimit -c unlimited
        if [ "$?" != "0" ]
        then
            $CLSECHO -phas -f crs -l -m 6021 "c" "unlimited"
        fi
        ulimit -n 65536

In the above code, insert the following line just before the line with "ulimit -n 65536"

       ulimit -u 16384

4) Recycle CRS manually so that ohasd picks up the new ulimit setting for max user processes.
After the database is started, issue "ps -ef | grep pmon" and get its pid.
Then issue "cat /proc/<pid of the pmon process>/limits | grep process" and check whether Max processes is set to 16384.
Setting the number of processes to 16384 should be enough for most servers, since having 16384 processes normally means the server is very heavily loaded. Using a smaller number such as 4096 or 8192 should also suffice for most users.
In addition to the above, the ohasd template needs to be modified to ensure that the new ulimit setting persists even after a patch is applied.
1) go to GI_HOME/crs/sbs

2) make a backup of crswrap.sh.sbs

3) in crswrap.sh.sbs, insert the following line just before the line "# MEMLOCK limit is for Bug 9136459"

       ulimit -u 16384
Finally, although the above setting is successfully used to increase the number of processes setting, please test this on the test server first before setting the ulimit on the production.
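
The backup-and-insert steps for the ohasd script could be scripted roughly as follows. This is only a sketch under stated assumptions, not part of the MOS note: the function name add_nproc_ulimit and the ".bak" backup suffix are hypothetical, and it inserts before the first "ulimit -n 65536" line it finds, so the result should be reviewed by hand to confirm the line landed in the Linux branch.

```shell
#!/bin/sh
# add_nproc_ulimit SCRIPT [LIMIT]
# Back up SCRIPT to SCRIPT.bak, then insert "ulimit -u LIMIT" (default 16384)
# just before the first "ulimit -n 65536" line, reusing that line's indentation.
add_nproc_ulimit() {
    script=$1
    limit=${2:-16384}
    # Leave the file alone if an explicit "ulimit -u" line already exists.
    if grep -Eq '^[[:space:]]*ulimit -u ' "$script"; then
        return 0
    fi
    cp -p "$script" "$script.bak" || return 1
    # Writing through the redirect keeps the original file's mode and owner.
    awk -v limit="$limit" '
        !inserted && /^[[:space:]]*ulimit -n 65536/ {
            indent = $0
            sub(/ulimit.*/, "", indent)   # keep only the leading whitespace
            print indent "ulimit -u " limit
            inserted = 1
        }
        { print }
    ' "$script.bak" > "$script"
}
```

With the paths from the logs above, the call would look like `add_nproc_ulimit /oracle/app/11.2.0/grid_1/bin/ohasd`, tried first on a test server as the note advises.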


