Use the information in this topic to troubleshoot problems that
you are having with client access to data.
Problem
This topic describes how to determine why a client cannot access or update user
data.
Investigation
- If a client cannot create a new file
- Perform the following steps until you resolve the problem:
- Verify that the master metadata server is online.
- Verify that the client is connected to the master metadata server:
- From the client, attempt to ping or establish a Secure Shell (SSH) session
with the metadata server.
- From the Administrative command-line interface (CLI), run the lsclient command
to see a list of all clients currently being served by the metadata servers
in the cluster.
- Use the ls -l command to list the directory
in which the file is to be created to verify that the path is correct and
accessible.
- Verify that you are logged into the client with a user name that has permission
to write files to directories:
- From a UNIX® client,
use the id command.
- From a Windows® client, right-click the directory. Then click .
- Verify that your user name has permission to write files to the specific
directory.
Important: If you are logged into a UNIX client as
root or a Windows client as Administrator, make sure that you
are running on a privileged client.
- Make sure you have space in your system pool.
- If using hard quotas, check the quota of the fileset in which the parent
directory resides to ensure that there is sufficient space to accommodate
the new file and that the fileset is attached. If allocating blocks fails,
check the quota. When the fileset reaches its hard quota, the client cannot allocate
blocks for a file until some of the old blocks are freed.
- Access the master metadata server through either the SAN File System console
or the CLI.
- List the filesets to view the server to which the fileset is attached,
the quota percentage and type, the attach point, and the directory:
- From the CLI, run the sfscli lsfileset -l command.
- From the SAN File System console, click .
- Make sure that the storage pool in which the file is stored has sufficient
space. Check the active policy set to validate the placement rule that applies
to your filename.
- Verify that the client can access the storage device:
- If the client is running Data Path Optimizer (DPO), use the datapath
query command to ensure that the operating system can access
the storage device.
- On an AIX® client,
use the stfsdisk command to determine which LUNs
can be accessed. Then, use the lsvol -l command
to correlate the LUN with the device. You can also use the lsdev
-C command to see the physical LUNs.
- On a Linux client, enter cat /proc/fs/sanfs/client/<client name>/disk/access
to determine which LUNs you can access. Then
use the lsvol -l command to correlate the LUN with
the device. Also, check the System Log for any messages about inaccessible
LUNs and volumes.
- Suspect a problem with corrupt SAN File System metadata.
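The client-side checks in the steps above (path, identity, and write permission) can be sketched as a small shell helper for UNIX clients. This is a minimal sketch that uses only standard utilities (id, ls, and the test builtin). The function names check_target_dir and show_dir_details and the status words they print are hypothetical, and the sketch does not replace the server-side checks (lsclient, lsfileset).

```shell
#!/bin/sh
# Hypothetical helper (not part of SAN File System): reports whether the
# directory in which a file is to be created exists and is writable by the
# current user.
check_target_dir() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "missing"        # path is wrong or not accessible
        return 1
    fi
    # Creating a file needs write and search (execute) permission on the directory.
    if [ -w "$dir" ] && [ -x "$dir" ]; then
        echo "writable"
        return 0
    fi
    echo "not-writable"
    return 2
}

# Hypothetical helper: show the evidence that the manual steps ask for.
show_dir_details() {
    id              # current user and group membership
    ls -ld "$1"     # owner, group, and mode of the directory itself
}
```

Calling check_target_dir with the parent directory of the new file prints one status word. If the result is not "writable", show_dir_details with the same directory shows the user identity and the directory mode to compare.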
- If a client cannot access an existing file
- Perform the following steps until you resolve the problem:
- Verify that your user name has permission to read that specific file.
- Verify that the master MDS is online.
- Verify that the client has access to the cluster using SSH and the lsclient command.
- Run the lsclient command on the server fileset
to ensure that the client lease is valid with the server.
- If the client is trying to access root-privileged files, ensure that the
client has administrative privileges (AdmPriv) on the server.
- Suspect a problem with corrupt SAN File System metadata.
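Checking whether a client appears in the lsclient listing can be scripted. The sketch below assumes that the listing is plain text in which the client name appears as a separate word. The column layout of lsclient is not described in this topic, so the function only searches for the name and does not interpret the session or lease state. The function name client_listed is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads a client listing on standard input and succeeds
# if the given client name appears in it as a whole word.
# Assumption: the listing is plain text, one client per line.
client_listed() {
    # -q: no output, -w: match whole words, -F: treat the name as a fixed string
    grep -qwF -- "$1"
}
```

One possible use is to pipe the output of the lsclient command into client_listed "$CLIENT_NAME", where CLIENT_NAME is a hypothetical variable that holds the client name. A nonzero exit status means that the name was not found in the listing.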
- If a client cannot update the attributes of an existing file
- Perform the following steps until you resolve the problem:
- Verify that your user name has permission to write to the file.
- On a Windows client, you can change attributes on Windows domain
files only. On a UNIX client, you can change attributes on UNIX domain files
only.
- Verify that the master MDS is online.
- Verify that the client has access to the cluster using SSH and the lsclient command.
- Run the lsclient command on the server fileset
to ensure that the client lease is valid with the server.
- If the client is trying to access root-privileged files, ensure that the
client has administrative privileges (AdmPriv) on the server.
- Make sure you are a local root superuser or a Windows administrator.
Even if you are, you might not be able to update the attributes of an
existing file when the client does not have administrative privileges.
- Suspect a problem with corrupt SAN File System metadata.
- If a client cannot access any data
- Perform the following steps until the problem is resolved:
- Verify that the client can access the cluster:
From the client, sign
on to the SAN File System console. If you cannot sign on, suspect one of the
following conditions:
- One or more metadata servers in the cluster are down. See Troubleshooting the cluster.
Tip: The administrative agent automatically
attempts to restart a metadata server if it goes down for any reason. Therefore,
if you cannot access a metadata server, wait a few minutes and try again to
ensure that it is not in the process of restarting.
- An IP network problem occurred between the client and the cluster. See Isolating problems with the SAN File System.
- View the system logs and look for messages that might indicate I/O
errors. Attempt to resolve these errors.
- From a client running Windows, use the Event Viewer to view
the Event Log or use the sanfstrace utility to identify errors.
- From a client running on any supported UNIX platform,
you can view logging with the syslog facility and tracing
output using the sanfstrace utility.
- From the CLI on the master metadata server, run the lslun command
to verify that the LUNs are available. To see the LUNs that are visible to
each client, use the command lslun -client <client_name>
for each client.
If the LUNs are not available, you can rediscover all
LUNs by running the following commands from the master metadata server:
- Run the stopserver command from the CLI to stop SAN File System.
- Run the rmmod qla2300 command.
- Run the modprobe qla2300 command.
- Run the /etc/rc.d/init.d/sanfs start command to start SAN File
System.
- Run the startserver command from the CLI to start the metadata
server.
- Run the lsclient command on the server to see
if this client is recognized and has an active session with the server.
- Verify that the client can access the storage device:
- If the client is running DPO, use the datapath query device command
to ensure that the operating system can access the storage device.
- On a client running AIX, use the stfsdisk command
to determine which disks can be accessed. Then, from the CLI, use the lsvol
-l command to correlate the disk with the device. You can also
use the lsdev -C command to see the physical disks.
- On a Linux client, enter cat /proc/fs/sanfs/client/<client name>/disk/access
to determine which LUNs you can access. Then,
run the lsvol -l command from the CLI to correlate
the LUN with the device.
To rediscover all LUNs on a client running AIX, run the client command stfsdisk
-discover.
To rediscover SAN File System LUNs on a Linux
client, run the following command:
$ echo 1 > /proc/fs/sanfs/client/<client name>/disk/discover
The LUNs should automatically
be rediscovered on a client running Windows. If they are not, use the following
steps:
- Right-click My Computer.
- Click Manage.
- Click .
- Click .
- Restart clients.
- For clients running on UNIX, use the following steps:
- Run the rmstclient command to unmount the global
namespace, remove the virtual client, and unload the file-system driver.
- Run the setupclient command to load the file-system
driver, create the virtual client, and mount the global namespace.
- On clients running Windows, restart the system.
- If the client cannot access user data, suspect the SAN route from the
client. See Isolating problems with the SAN File System.
- Suspect a problem with corrupt SAN File System metadata.
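The Linux client steps above (reading disk/access and writing 1 to disk/discover) can be wrapped in two small shell functions. This is a minimal sketch: the /proc paths come from this topic, while the function names, the SANFS_PROC_BASE variable, and the error message are hypothetical. The client name is passed as the first argument.

```shell
#!/bin/sh
# Base of the SAN File System client tree under /proc, as given in the steps above.
# It can be overridden, for example to try the helpers on a scratch directory.
SANFS_PROC_BASE=${SANFS_PROC_BASE:-/proc/fs/sanfs/client}

# Hypothetical helper: print the LUNs that the named client can access.
list_accessible_luns() {
    cat "$SANFS_PROC_BASE/$1/disk/access"
}

# Hypothetical helper: ask the named client to rediscover its LUNs
# (equivalent to: echo 1 > /proc/fs/sanfs/client/<client name>/disk/discover).
rediscover_luns() {
    target="$SANFS_PROC_BASE/$1/disk/discover"
    if [ ! -e "$target" ]; then
        echo "no such client entry: $target" >&2
        return 1
    fi
    echo 1 > "$target"
}
```

A typical sequence is to run list_accessible_luns, then rediscover_luns, then list_accessible_luns again with the same client name and compare the two listings. Writing to the discover file normally requires root authority on the client.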
- If a client running Windows receives delayed write failure
errors
- A delayed write failure error might appear as a message box on the client
desktop or in the Event Log. This message indicates that an error occurred
when writing data from the local file system cache to a storage device. Perform
the following steps until you resolve the problem:
- The message includes the name of the file where the error occurred. Note
the name of this file because the data it contains might have been corrupted,
and the application using this file might encounter problems using this file's
data.
- If the file is not part of SAN File System, refer to your system documentation
for resolving file system errors. If the file is part of SAN File System,
view the Event Log and resolve errors that might relate to this problem.
- Suspect a communication problem with either the SAN or between the client
and the metadata server.
- If a client running UNIX receives delayed write failure errors
- A delayed write failure error appears in the System Log. These errors
are usually preceded in the log by SCSI or other input/output errors.
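Because the delayed write failure is usually preceded by SCSI or other input/output errors, a quick scan of the log for such messages can help locate the cause. The sketch below is one way to do that with grep. The function name find_io_errors is hypothetical, and the search patterns are assumptions, because the exact message text depends on the platform and drivers.

```shell
#!/bin/sh
# Hypothetical helper: print log lines (with line numbers) that mention SCSI
# or I/O errors. The search patterns are assumptions; adjust them to the
# messages that your platform actually writes.
find_io_errors() {
    # -i: ignore case, -n: show line numbers, -E: extended regular expression
    grep -inE 'scsi|i/o error|input/output error' "$1"
}
```

Pass the path of the system log file as the argument. The location of that file varies by platform and syslog configuration, so it is left to the caller.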