Running "ls -la /" hangs, yet running "ls -la" against other directories under root (e.g. "ls -la /usr") does NOT hang. The system log (/var/adm/messages) shows NFS-related errors, even though this system is NOT a true NFS client.
Here are some sample errors that may appear in the /var/adm/messages file while "ls -la /" hangs:
Mar 28 09:23:19 moe nfs: [ID 333984 kern.notice] NFS server for volume management (/vol) not responding still trying
Mar 28 09:31:13 moe nfs: [ID 664466 kern.notice] NFS getattr failed for server for volume management (/vol): error 23 (RPC: Unitdata error)
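To confirm that the log really contains this kind of entry, a small filter like the one below can help. This is only a sketch: the function name vol_nfs_errors and the MESSAGES variable are made up for illustration, and the match strings are taken from the two sample lines above, so other releases may word the messages differently.

```shell
# Sketch only: print NFS kernel messages that mention volume management (/vol).
# MESSAGES is a hypothetical override so the filter can be tried on a copy of the log.
MESSAGES=${MESSAGES:-/var/adm/messages}

vol_nfs_errors() {
    # First keep lines logged by the nfs facility, then only those about /vol.
    grep ' nfs: ' "$MESSAGES" | grep 'volume management (/vol)'
}
```

Calling vol_nfs_errors with no arguments scans /var/adm/messages; set MESSAGES to another path to scan a rotated log such as messages.0.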
In this particular situation, the client had mounted a CD remotely from another system. That system was shut down before it unshared the CD and before the client could unmount the remote CD mount. The tail end of a truss shows that ls was hanging on /vol as well (output line numbers are shown; the hang is at lines 256-257):
# cd /
# truss -fall -vall -wall -rall ls -la
249 4884/1: lstat64("./xfn", 0xFFBEFAC0) = 0
250 4884/1: d=0x04680002 i=7 m=0040555 l=1 u=0 g=0 sz=1
251 4884/1: at = Mar 27 14:15:01 EST 2002 [ 1017256501 ]
252 4884/1: mt = Mar 27 14:15:01 EST 2002 [ 1017256501 ]
253 4884/1: ct = Mar 8 20:24:51 EST 2002 [ 1015637091 ]
254 4884/1: bsz=8192 blks=1 fs=autofs
255 4884/1: acl("./xfn", GETACLCNT, 0, 0x00000000) = 4
256 4884/1: lstat64("./vol", 0xFFBEFAC0) (sleeping...)
257 4884/1: lstat64("./vol", 0xFFBEFAC0) Err#131 ECONNRESET
258 4884/1: Received signal #2, SIGINT [default]
259 4884/1: *** process killed ***
Err#131 ECONNRESET means "Connection reset by peer": the connection was forcibly closed by the peer. This normally results from the loss of the connection on the remote host because of a timeout or a reboot.
Follow the instructions below to troubleshoot:
When "ls -la /" hangs, check the /etc/mnttab file for the PID associated with /vol, then check whether that process still exists. If "ps -ef | grep <vol-PID>" returns no processes, the vold process that owned the mount is gone. Use the commands below to get the PID and check it:
# grep "/vol" /etc/mnttab
moe:vold(pid222) /vol nfs ignore,dev=39c0001 1017179906
# ps -ef | grep 222 <== returns no output
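The two manual steps above can be combined into one check. The following is a rough sketch, not a tested tool: the function names vold_pid, pid_alive and check_vol and the MNTTAB variable are hypothetical. It assumes a POSIX-style shell and an awk that supports sub() (on Solaris that usually means nawk or /usr/xpg4/bin/awk rather than the old /usr/bin/awk), and it assumes the /vol entry looks like the one shown above.

```shell
# Sketch only: pull the vold PID out of the /vol entry in mnttab and
# check whether a process with that PID still exists.
# MNTTAB is a hypothetical override so the functions can be tried on a copy.
MNTTAB=${MNTTAB:-/etc/mnttab}

# Print the PID embedded in the "host:vold(pidNNN)" field of the entry
# whose mount point is exactly /vol. Prints nothing if there is no such entry.
vold_pid() {
    awk '$2 == "/vol" && $1 ~ /vold\(pid[0-9]+\)/ {
        pid = $1
        sub(/^.*\(pid/, "", pid)   # drop everything up to and including "(pid"
        sub(/\).*$/, "", pid)      # drop the closing parenthesis
        print pid
        exit
    }' "$MNTTAB"
}

# Return 0 if some process currently has the given PID, non-zero otherwise.
# Note: a PID can be reused, so a match is not proof that it is vold.
pid_alive() {
    ps -p "$1" >/dev/null 2>&1
}

# Report the state of the /vol entry in plain words.
check_vol() {
    pid=`vold_pid`
    if [ -z "$pid" ]; then
        echo "no vold entry for /vol in $MNTTAB"
    elif pid_alive "$pid"; then
        echo "a process with pid $pid exists"
    else
        echo "no process with pid $pid: the /vol entry looks stale"
    fi
}
```

Running check_vol on the system above would be expected to report that no process with pid 222 exists, which matches the empty ps output.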
The real solution is to unmount /vol:
# umount /vol
You may have to force the unmount. The -f option (forcible unmount) is available only in the Solaris 8 Operating Environment and later.
# umount -f /vol
If you are running a release earlier than Solaris 8, you may have to reboot to clear the "ls -la" hang.
Note: Check the /var/statmon/sm and /var/statmon/sm.bak directories to see whether a connection is still recorded for this server. If there is, there is a chance that the system will try to remount the filesystem after a reboot. The system will not try to remount if the umount command succeeds.
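A quick way to look at both directories at once is sketched below. The function name statmon_entries and the STATMON_ROOT variable are hypothetical, a POSIX-style shell is assumed, and the sketch assumes that each entry in these directories is named after a monitored host.

```shell
# Sketch only: list whatever entries exist under the statmon directories.
# STATMON_ROOT is a hypothetical override so the function can be tried elsewhere.
STATMON_ROOT=${STATMON_ROOT:-/var/statmon}

statmon_entries() {
    for dir in "$STATMON_ROOT/sm" "$STATMON_ROOT/sm.bak"; do
        # Skip a directory that does not exist on this system.
        [ -d "$dir" ] || continue
        for entry in "$dir"/*; do
            # An empty directory leaves the glob unexpanded; skip it.
            [ -e "$entry" ] || continue
            # Print "directory: entry-name".
            echo "$dir: ${entry##*/}"
        done
    done
}
```

If statmon_entries prints a line naming the server that exported the CD (moe in the example above), that is the leftover entry the note refers to.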