TCI Systems, Inc.

15-07 132nd Street   College Point, NY  11356-2441

718-888-8898      www.tcisystems.com      Fax 718-888-8945

 

The following information is “Confidential Work Product” of TCI Systems, Inc.  It is made available on this Web Site as a service to TCI’s customers.  This information is not for distribution, in any form, to any third party without TCI Systems, Inc.’s written permission.

 

Although this information is believed to be accurate and has been tested in the TCI laboratory, all recipients of this information are advised to use their own judgment in utilizing this information.  TCI Systems, Inc. will not be responsible for any damages that may result from the use of the procedures or information presented on this Web Site.  By using the information contained herein, the customer agrees to hold TCI Systems, Inc. harmless.

 

 

ARCSERVE Warning

 

Do Not Use the “FULL Erase” command!

 

“Full Erase” is a long SCSI command. The server can be unresponsive until the command completes, which for a full erase can take hours.

 

TCI has evaluated the workarounds and has determined that the most reasonable course of action is to avoid the “FULL ERASE” command.

 

Here is an excerpt from a CA Support Document:

During "long SCSI commands", you will experience some or all of the following symptoms:

  1. Server utilization will fluctuate to as high as 100% until the process is done; the resulting CPU hog may cause the server to abend
  2. Server will appear sluggish to end users that are using the file system
  3. Users may be disconnected from the server
  4. Server console will not respond normally, i.e., you enter a command and the command is not processed until the SCSI operation is finished. In the case of a full erase this can be hours
  5. Server may abend if the device manager is used from a workstation

What is a "long SCSI command"?
These are SCSI commands that take some time to complete, such as LOAD, full ERASE, REWIND from the end of tape, and changer-specific commands like MOVE MEDIUM and INIT ELEMENT STAT. They are considered "long" because they can take minutes or hours to run, as opposed to fractions of a second for more regularly used commands like READ and WRITE.

Why does this happen?
This occurs on a NetWare server due to the use of Real Mode Interrupts. Because NetWare sits on top of a DOS kernel (a 16-bit OS), it must use Real Mode interrupts, instead of Protected Mode interrupts, to accomplish certain time slicing of the CPU. For instance, when a Read command is sent to the SCSI bus, the command is executed and an interrupt is returned to signal completion of that command, which in turn allows the execution of the next command queued for CPU access. Since any single Read command takes very little time to complete, the server and the user do not notice the wait. If, however, a Rewind command is sent to the SCSI bus and the tape is at the End Of Media mark, this command can take as long as 5 to 10 minutes, depending on the tape and drive used. While NetWare is waiting to receive such an interrupt, the server hangs. After the command is completed, the interrupt is returned, as with the completion of a Read command, and the server begins to respond again. With Protected Mode Interrupts, control of the CPU would be shared and the server would not hang.
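
The queuing behavior described above can be illustrated with a small Python model. This is only a sketch under stated assumptions: the Command class, the completion_times function, and the durations are invented for illustration (0.01 seconds for a Read, and 600 seconds for a Rewind, taken from the upper end of the 5 to 10 minute range quoted above). It is not NetWare or ARCserve code and does not talk to any SCSI device.

```python
from dataclasses import dataclass


@dataclass
class Command:
    name: str
    seconds: float  # illustrative duration, not a measured value


def completion_times(queue):
    """Serve commands strictly one at a time; return (name, finish_time) pairs."""
    clock = 0.0
    finished = []
    for cmd in queue:
        # Nothing else is serviced until this command signals completion.
        clock += cmd.seconds
        finished.append((cmd.name, clock))
    return finished


# Hypothetical workload: a short Read, a long Rewind, then another short Read.
queue = [
    Command("READ", 0.01),
    Command("REWIND from end of tape", 600.0),
    Command("READ", 0.01),
]

for name, finish in completion_times(queue):
    print(f"{name}: done at {finish:.2f}s")
```

In this model the second Read is not serviced until roughly 600 seconds have passed, even though it needs only a fraction of a second itself. That is the stall that users see as a hung server.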

 

Here is an excerpt from a Novell Document that further explains some of the issues:

The problem occurs because the server switches to real mode during a long SCSI command. This results in high utilization on the server with the loss of all I/O to the workstations. This is done so that requests made in protected mode are preserved until the server switches back to protected mode. The call NPA_Squelch_All_IO in the nwpa.nlm is what does this. Usually this does not constitute a problem except when devices are used that use long SCSI commands, such as a tape library. Users will lose access to the server, and their sessions will appear to hang.

The fix is to cause the server to switch to real mode less often. The most effective means is the dosfat.nss contained in this file. This will mount the server's local drives as NetWare volumes. Since the local drives are no longer in the DOS environment, the server does not have to switch to real mode to access them.

If there is software on the server that causes the switch to real mode, then you will still have problems.

 

Here is an excerpt that is the basis for our recommendation:

 

Dosfat.nss recommends setting Auto Restart After Abend = 0 to prevent corruption of the DOS Partition. To clarify just what was meant by this: when auto restart after abend is set to just suspend a thread and continue with normal server operations, you run the risk of introducing corruption on a mounted DOS volume. (We are talking about corruption of the DOS files here, not NetWare files.) Traditional NetWare volumes are protected against abends by TTS, or corrected using VRepair. NSS volumes are protected because they're journaled. DOS, however, does not have that same protection. That is why the recommendation is in place. (Also, NSS has its own cache apart from DOS. When the server is in the debugger to get a coredump, the dump uses the DOS cache and not the NSS cache. If NSS happened to be in the middle of writing a file at the time the server abended, the coredump could overwrite the NSS file, causing corruption, because each cache system does not know about the other.)

 

Now that the C partition is mounted as a NetWare volume by DOSFAT.NSS, this chance of corruption extends to the DOS partition. In NetWare 5 and 6, the use of the C partition to store information has increased. Not only does the server write the Abend.log file there (as in the past), but that is also where the server registry is located. Additionally, third-party vendors utilize the C partition for their files as well. These vendors include but are not limited to QLogic and Compaq. Based on this information, it was the engineers' recommendation to set Auto Restart After Abend = 0.

 

In the event of file corruption on NetWare / NSS volumes, utilities exist to help correct problems with files should any be found. However, there aren't utilities / checks in place to repair or prevent that corruption from affecting the files on the DOS partition.
Other precautions should be taken if you choose to set Auto Restart After Abend to 2. These precautions include making a periodic copy of the existing startup directory on the C partition (nwserver) to an alternate directory or CD, to ensure that a file or the entire directory can be restored quickly in the event of disk corruption. In addition to this, the ScanDisk utility can be added to the DOS partition if it is not already there, and it can be launched out of the AUTOEXEC.BAT file at server boot. Since the amount of data on any server's DOS partition should be quite low, the utility can correct any issues that may pop up, and it would do so quickly.
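
The periodic copy of the startup directory could be scripted along the following lines. This is a minimal Python sketch, not a TCI, CA, or Novell tool. It assumes the script runs on a machine that can see both the server's C partition and the backup location as ordinary directories. The function name backup_startup_dir and the example paths are hypothetical.

```python
import shutil
import time
from pathlib import Path


def backup_startup_dir(source, backup_root):
    """Copy the startup directory into a new timestamped folder under backup_root."""
    source = Path(source)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = Path(backup_root) / f"{source.name}-{stamp}"
    # copytree raises an error if the target folder already exists,
    # so an earlier copy is never silently overwritten.
    shutil.copytree(source, target)
    return target


# Hypothetical usage (both paths are assumptions, adjust to your site):
# backup_startup_dir(r"C:\NWSERVER", r"D:\STARTUP-BACKUPS")
```

Keeping each copy in its own timestamped folder means a single file, or the whole directory, can be restored from a known date.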