smartd - SMART Disk Monitoring Daemon
smartd will attempt to enable SMART monitoring on ATA devices (equivalent to smartctl -s on) and polls these and SCSI devices every 30 minutes (configurable), logging SMART errors and changes of SMART Attributes via the SYSLOG interface. The default location for these SYSLOG notifications and warnings is /var/log/messages. To change this default location, please see the '-l' command-line option described below.
In addition to logging to a file, smartd can also be configured to send email warnings if problems are detected. Depending upon the type of problem, you may want to run self-tests on the disk, back up the disk, replace the disk, or use a manufacturer's utility to force reallocation of bad or unreadable disk sectors. If disk problems are detected, please see the smartctl manual page and the smartmontools web page/FAQ for further guidance.
If you send a USR1 signal to smartd it will immediately check the status of the disks, and then return to polling the disks every 30 minutes. See the '-i' option below for additional details.
smartd can be configured at start-up using the configuration file /etc/smartd.conf (Windows: ./smartd.conf). If the configuration file is subsequently modified, smartd can be told to re-read the configuration file by sending it a HUP signal, for example with the command: killall -HUP smartd. (Windows: See NOTES below.)
On startup, if smartd finds a syntax error in the configuration file, it will print an error message and then exit. However, if smartd is already running, is then told with a HUP signal to re-read the configuration file, and finds a syntax error in this file, it will print an error message and then continue, ignoring the contents of the (faulty) configuration file, as if the HUP signal had never been received.
When smartd is running in debug mode, the INT signal (normally generated from a shell with CONTROL-C) is treated in the same way as a HUP signal: it makes smartd reload its configuration file. To exit smartd use CONTROL-\ (Cygwin: 2x CONTROL-C, Windows: CONTROL-Break).
On startup, in the absence of the configuration file /etc/smartd.conf, the smartd daemon first scans for all devices that support SMART. The scanning is done as follows:
smartd then monitors for all possible SMART errors (corresponding to the '-a' Directive in the configuration file; see CONFIGURATION FILE below).
Read smartd configuration Directives from FILE, instead of from the default location /etc/smartd.conf (Windows: ./smartd.conf). If FILE does not exist, then smartd will print an error message and exit with nonzero status. Thus, '-c /etc/smartd.conf' can be used to verify the existence of the default configuration file.
By using '-' for FILE, the configuration is read from standard input. This is useful for commands like:
echo /dev/hdb -m user@home -M test | smartd -c - -q onecheck
to perform quick and simple checks without a configuration file.
Windows only: The "debug" mode can be toggled by the command smartd sigusr2. A new console for debug output is opened when debug mode is enabled.
Note that the superuser can make smartd check the status of the disks at any time by sending it the SIGUSR1 signal, for example with the command:
kill -SIGUSR1 <pid>
where <pid> is the process id number of smartd. One may also use:
killall -USR1 smartd
for the same purpose. (Windows: See NOTES below.)
If you would like to have smartd messages logged somewhere other than the default /var/log/messages location, this can typically be accomplished with (for example) the following steps:
local3.* /var/log/smartd.log
This tells syslogd to log all the messages from facility local3 to the designated file: /var/log/smartd.log.
Cygwin: Support for syslogd as described above is available starting with Cygwin 1.5.15. On older releases or if no local syslogd is running, the '-l' option has no effect. In this case, all syslog messages are written to Windows event log or to file C:/CYGWIN_SYSLOG.TXT if the event log is not available.
Windows: Some syslog functionality is implemented internally in smartd as follows: If no '-l' option (or '-l daemon') is specified, messages are written to the Windows event log, or to the file ./smartd.log if the event log is not available (Win9x/ME or access denied). By specifying other values of FACILITY, log output is redirected as follows: '-l local0' to file ./smartd.log, '-l local1' to standard output (redirect with '>' to any file), '-l local2' to standard error, '-l local[3-7]' to file ./smartd[1-5].log.
When using the event log, the enclosed utility syslogevt.exe should be registered as an event message file to avoid error messages from the event viewer. Use 'syslogevt -r smartd' to register, 'syslogevt -u smartd' to unregister and 'syslogevt' for more help.
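The Windows '-l FACILITY' redirection rules described above can be summarized as a small lookup. The following Python sketch is illustrative only: the function name windows_log_target is hypothetical and not part of smartd, and the returned strings simply restate the mapping given in the text.

```python
import re


def windows_log_target(facility="daemon"):
    """Describe where smartd on Windows sends log output for '-l FACILITY'.

    Hypothetical helper: it only restates the mapping given in the text.
    """
    if facility == "daemon":
        return "event log (or ./smartd.log if the event log is unavailable)"
    fixed = {
        "local0": "./smartd.log",
        "local1": "standard output",
        "local2": "standard error",
    }
    if facility in fixed:
        return fixed[facility]
    match = re.fullmatch(r"local([3-7])", facility)
    if match:
        # local3 .. local7 correspond to ./smartd1.log .. ./smartd5.log
        return "./smartd%d.log" % (int(match.group(1)) - 2)
    raise ValueError("unsupported facility: %r" % facility)
```

For example, windows_log_target("local4") yields "./smartd2.log".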
nodev - Exit if there are no devices to monitor, or if any errors are found at startup in the configuration file. This is the default.
errors - Exit if there are no devices to monitor, or if any errors are found in the configuration file /etc/smartd.conf at startup or whenever it is reloaded.
nodevstartup - Exit if there are no devices to monitor at startup. But continue to run if no devices are found whenever the configuration file is reloaded.
never - Only exit if a fatal error occurs (no remaining system memory, invalid command line arguments). In this mode, even if there are no devices to monitor, or if the configuration file /etc/smartd.conf has errors, smartd will continue to run, waiting to load a configuration file listing valid devices.
onecheck - Start smartd in debug mode, then register devices, then check device's SMART status once, and then exit with zero exit status if all of these steps worked correctly.
This last option is intended for 'distribution-writers' who want to create automated scripts to determine whether or not to automatically start up smartd after installing smartmontools. After starting smartd with this command-line option, the distribution's install scripts should wait a reasonable length of time (say ten seconds). If smartd has not exited with zero status by that time, the script should send smartd a SIGTERM or SIGKILL and assume that smartd will not operate correctly on the host. Conversely, if smartd exits with zero status, then it is safe to run smartd in normal daemon mode. If smartd is unable to monitor any devices or encounters other problems then it will return with non-zero exit status.
showtests - Start smartd in debug mode, then register devices, then write a list of future scheduled self tests to stdout, and then exit with zero exit status if all of these steps worked correctly. Device's SMART status is not checked.
This option is intended to test whether the '-s REGEX' directives in smartd.conf will have the desired effect. The output lists the next test schedules, limited to 5 tests per type and device. This is followed by a summary of all tests of each device within the next 90 days.
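The 'onecheck' procedure described above for distribution writers (start smartd, wait about ten seconds, treat anything other than a zero exit status as failure) might be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function name smartd_usable is hypothetical, smartd is assumed to be on the PATH, and subprocess.run() kills the child itself when the timeout expires, which stands in for the SIGTERM or SIGKILL step.

```python
import subprocess


def smartd_usable(command=("smartd", "-q", "onecheck"), wait_seconds=10):
    """Return True only if `command` exits with status 0 within `wait_seconds`."""
    try:
        result = subprocess.run(
            list(command),
            timeout=wait_seconds,        # on expiry the child is killed
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except (subprocess.TimeoutExpired, OSError):
        # Timed out, or the binary could not be started at all.
        return False
    return result.returncode == 0
```

An install script could enable the daemon at boot only when smartd_usable() returns True.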
ioctl - report all ioctl() transactions.
ataioctl - report only ioctl() transactions with ATA devices.
scsiioctl - report only ioctl() transactions with SCSI devices.
Any argument may include a positive integer to specify the level of detail that should be reported. The argument should be followed by a comma and then the integer, with no spaces, for example 'ataioctl,2'. The default level is 1, so '-r ataioctl,1' and '-r ataioctl' are equivalent.
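The argument syntax just described (a report type, optionally followed by a comma and a positive integer level) can be illustrated with a short Python sketch. The function name parse_report_argument is hypothetical; this is not how smartd itself parses the option, only a restatement of the rule.

```python
def parse_report_argument(argument):
    """Split a '-r' argument such as 'ataioctl,2' into (type, level)."""
    name, separator, level_text = argument.partition(",")
    if name not in ("ioctl", "ataioctl", "scsiioctl"):
        raise ValueError("unknown report type: %r" % name)
    if not separator:
        return name, 1                   # the default level is 1
    if not level_text.isdecimal() or int(level_text) < 1:
        raise ValueError("level must be a positive integer: %r" % level_text)
    return name, int(level_text)
```

With this sketch, 'ataioctl' and 'ataioctl,1' both give ('ataioctl', 1).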
On Cygwin, this option simply prevents fork'ing into background mode to allow running smartd as service via cygrunsrv, see NOTES below.
On Windows, this option enables the built-in service support. The option must be specified in the service command line as the first argument. It should not be used from the console. See NOTES below for details.
smartd
Runs the daemon in forked mode. This is the normal way to run smartd. Entries are logged to SYSLOG (by default /var/log/messages).
smartd -d -i 30
Runs in foreground (debug) mode, checking the disk status every 30 seconds.
smartd -q onecheck
Registers devices, and checks the status of the devices exactly once. The exit status (the bash $? variable) will be zero if all went well, and nonzero if no devices were detected or some other problem was encountered.
Note that smartmontools provides a start-up script in /etc/rc.d/init.d/smartd which is responsible for starting and stopping the daemon via the normal init interface. Using this script, you can start smartd by giving the command:
/etc/rc.d/init.d/smartd start
and stop it by using the command:
/etc/rc.d/init.d/smartd stop
If you want smartd to start running whenever your machine is booted, this can be enabled by using the command:
/sbin/chkconfig --add smartd
and disabled using the command:
/sbin/chkconfig --del smartd
This can be annoying if you have an ATA or SCSI device that hangs or misbehaves when receiving SMART commands. Even if this causes no problems, you may be annoyed by the string of error log messages about block-major devices that can't be found, and SCSI devices that can't be opened.
One can avoid this problem, and gain more control over the types of events monitored by smartd, by using the configuration file /etc/smartd.conf. This file contains a list of devices to monitor, with one device per line. An example file is included with the smartmontools distribution. You will find this sample configuration file in /usr/share/doc/smartmontools-5.36/. For security, the configuration file should not be writable by anyone but root. The syntax of the file is as follows:
Here is an example configuration file. It's for illustrative purposes only; please don't copy it onto your system without reading to the end of the DIRECTIVES Section below!
################################################
# This is an example smartd startup config file
# /etc/smartd.conf for monitoring three
# ATA disks, three SCSI disks, six ATA disks
# behind two 3ware controllers and one SATA disk
#
# First ATA disk on two different interfaces. On
# the second disk, start a long self-test every
# Sunday between 3 and 4 am.
#
  /dev/hda -a -m [email protected],root@localhost
  /dev/hdc -a -I 194 -I 5 -i 12 -s L/../../7/03
#
# SCSI disks. Send a TEST warning email to admin on
# startup.
#
  /dev/sda
  /dev/sdb -m [email protected] -M test
#
# Strange device. It's SCSI. Start a scheduled
# long self test between 5 and 6 am Monday/Thursday
  /dev/weird -d scsi -s L/../../(1|4)/05
#
# Linux-specific: SATA disk using the libata
# driver. This requires a 2.6.15 or greater
# kernel. The device entry is SCSI but the
# underlying disk understands ATA SMART commands
  /dev/sda -a -d ata
#
# Four ATA disks on a 3ware 6/7/8000 controller.
# Start short self-tests daily between midnight and 1am,
# 1-2, 2-3, and 3-4 am. Starting with the Linux 2.6
# kernel series, /dev/sdX is deprecated in favor of
# /dev/tweN. For example replace /dev/sdc by /dev/twe0
# and /dev/sdd by /dev/twe1.
  /dev/sdc -d 3ware,0 -a -s S/../.././00
  /dev/sdc -d 3ware,1 -a -s S/../.././01
  /dev/sdd -d 3ware,2 -a -s S/../.././02
  /dev/sdd -d 3ware,3 -a -s S/../.././03
#
# Two ATA disks on a 3ware 9000 controller.
# Start long self-tests Sundays between midnight and
# 1am and 2-3 am
  /dev/twa0 -d 3ware,0 -a -s L/../../7/00
  /dev/twa0 -d 3ware,1 -a -s L/../../7/02
#
# The following line enables monitoring of the
# ATA Error Log and the Self-Test Error Log.
# It also tracks changes in both Prefailure
# and Usage Attributes, apart from Attributes
# 9, 194, and 231, and shows continued lines:
#
  /dev/hdd -l error \
           -l selftest \
           -t \      # Attributes not tracked:
           -I 194 \  # temperature
           -I 231 \  # also temperature
           -I 9      # power-on hours
#
################################################
If the first non-comment entry in the configuration file is the text string DEVICESCAN in capital letters, then smartd will ignore any remaining lines in the configuration file, and will scan for devices. DEVICESCAN may optionally be followed by Directives that will apply to all devices that are found in the scan. Please see below for additional details.
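To make the structure of the configuration file concrete, here is a rough Python sketch that splits a file into device entries. It assumes only what the example file and the DEVICESCAN paragraph show: '#' starts a comment, a trailing backslash (possibly followed by a comment) continues an entry on the next line, the first word of an entry is the device name, and a leading DEVICESCAN entry causes the remaining lines to be ignored. The function name parse_smartd_conf is hypothetical, and smartd's real parser may differ in corner cases.

```python
def parse_smartd_conf(text):
    """Return a list of (device, [directive words]) tuples."""
    entries = []
    pending = []                         # words collected from continued lines
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].rstrip()    # drop any comment
        continued = line.endswith("\\")
        if continued:
            line = line[:-1]
        pending.extend(line.split())
        if continued or not pending:
            continue                     # entry not finished, or blank line
        entries.append((pending[0], pending[1:]))
        pending = []
        if entries[0][0] == "DEVICESCAN":
            break                        # remaining lines are ignored
    if pending:                          # file ended on a continued line
        entries.append((pending[0], pending[1:]))
    return entries
```

Applied to the line '/dev/hdc -a -I 194 -I 5 -i 12 -s L/../../7/03', this returns a single entry whose device is '/dev/hdc' and whose directive words are everything after it.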
The following are the Directives that may appear following the device name or DEVICESCAN on any line of the /etc/smartd.conf configuration file. Note that these are NOT command-line options for smartd. The Directives below may appear in any order, following the device name.
For an ATA device, if no Directives appear, then the device will be monitored as if the '-a' Directive (monitor all SMART properties) had been given.
If a SCSI disk is listed, it will be monitored at the maximum implemented level: roughly equivalent to using the '-H -l selftest' options for an ATA disk. So with the exception of '-d', '-m', '-l selftest', '-s', and '-M', the Directives below are ignored for SCSI disks. For SCSI disks, the '-m' Directive sends a warning email if the SMART status indicates a disk failure or problem, if the SCSI inquiry about disk status fails, or if new errors appear in the self-test log.
If a 3ware controller is used then the corresponding SCSI (/dev/sd?) or character device (/dev/twe? or /dev/twa?) must be listed, along with the '-d 3ware,N' Directive (see below). The individual ATA disks hosted by the 3ware controller appear to smartd as normal ATA devices. Hence all the ATA directives can be used for these disks (but see note below).
If none of these three arguments is given, then smartd will first attempt to guess the device type by looking at whether the sixth character in the device name is an 's' or an 'h'. This will work for device names like /dev/hda or /dev/sdb, and corresponds to choosing ata or scsi respectively. If smartd can't guess from this sixth character, then it will simply try to access the device using first ATA and then SCSI ioctl()s.
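The sixth-character rule described above is simple enough to state in a few lines of Python. The sketch below is illustrative only (guess_device_type is a hypothetical name); returning None stands for the fallback case in which smartd tries ATA and then SCSI ioctl()s.

```python
def guess_device_type(device_name):
    """Guess 'ata' or 'scsi' from the sixth character of the device name."""
    sixth = device_name[5:6]             # empty string if the name is too short
    if sixth == "h":
        return "ata"                     # e.g. /dev/hda
    if sixth == "s":
        return "scsi"                    # e.g. /dev/sdb
    return None                          # no guess possible from the name
```

A name such as /dev/weird gives None, which is why the example configuration file adds '-d scsi' explicitly for that device.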
The valid arguments to this Directive are:
ata - the device type is ATA. This prevents smartd from issuing SCSI commands to an ATA device.
scsi - the device type is SCSI. This prevents smartd from issuing ATA commands to a SCSI device.
marvell - Under Linux, interact with SATA disks behind Marvell chip-set controllers (using the Marvell rather than libata driver).
3ware,N - the device consists of one or more ATA disks connected to a 3ware RAID controller. The non-negative integer N (in the range from 0 to 15 inclusive) denotes which disk on the controller is monitored. In log files and email messages this disk will be identified as 3ware_disk_XX with XX in the range from 00 to 15 inclusive.
This Directive may at first appear confusing, because the 3ware controller is a SCSI device (such as /dev/sda) and should be listed as such in the configuration file. However when the '-d 3ware,N' Directive is used, then the corresponding disk is addressed using native ATA commands which are 'passed through' the SCSI driver. All ATA Directives listed in this man page may be used. Note that while you may use any of the 3ware SCSI logical devices /dev/sd? to address any of the physical disks (3ware ports), error and log messages will make the most sense if you always list the 3ware SCSI logical device corresponding to the particular physical disks. Please see the smartctl man page for further details.
ATA disks behind 3ware controllers may alternatively be accessed via a character device interface /dev/twe0-15 (3ware 6000/7000/8000 controllers) and /dev/twa0-15 (3ware 9000 series controllers). Note that the 9000 series controllers may only be accessed using the character device interface /dev/twa0-15 and not the SCSI device interface /dev/sd?. Please see the smartctl man page for further details.
Note that older 3w-xxxx drivers do not pass the 'Enable Autosave' (-S on) and 'Enable Automatic Offline' (-o on) commands to the disk, if the SCSI interface is used, and produce these types of harmless syslog error messages instead: '3w-xxxx: tw_ioctl(): Passthru size (123392) too big'. This can be fixed by upgrading to version 1.02.00.037 or later of the 3w-xxxx driver, or by applying a patch to older versions. See http://smartmontools.sourceforge.net/ for instructions. Alternatively use the character device interfaces /dev/twe0-15 (3ware 6/7/8000 series controllers) or /dev/twa0-15 (3ware 9000 series controllers).
cciss,N - the device consists of one or more SCSI disks connected to a cciss RAID controller. The non-negative integer N (in the range from 0 to 15 inclusive) denotes which disk on the controller is monitored. In log files and email messages this disk will be identified as cciss_disk_XX with XX in the range from 00 to 15 inclusive.
3ware and cciss controllers are currently ONLY supported under Linux.
removable - the device or its media is removable. This indicates to smartd that it should continue (instead of exiting, which is the default behavior) if the device does not appear to be present when smartd is started. This Directive may be used in conjunction with the other '-d' Directives.
ATA disks have five different power states. In order of increasing power consumption they are: 'OFF', 'SLEEP', 'STANDBY', 'IDLE', and 'ACTIVE'. Typically in the OFF, SLEEP, and STANDBY modes the disk's platters are not spinning. But usually, in response to SMART commands issued by smartd, the disk platters are spun up. So if this option is not used, then a disk which is in a low-power mode may be spun up and put into a higher-power mode when it is periodically polled by smartd.
Note that if the disk is in SLEEP mode when smartd is started, then it won't respond to smartd commands, and so the disk won't be registered as a device for smartd to monitor. If a disk is in any other low-power mode, then the commands issued by smartd to register the disk will probably cause it to spin-up.
The '-n' (nocheck) Directive specifies if smartd's periodic checks should still be carried out when the device is in a low-power mode. It may be used to prevent a disk from being spun-up by periodic smartd polling. The allowed values of POWERMODE are:
never - smartd will poll (check) the device regardless of its power mode. This may cause a disk which is spun-down to be spun-up when smartd checks it. This is the default behavior if the '-n' Directive is not given.
sleep - check the device unless it is in SLEEP mode.
standby - check the device unless it is in SLEEP or STANDBY mode. In these modes most disks are not spinning, so if you want to prevent a laptop disk from spinning up each time that smartd polls, this is probably what you want.
idle - check the device unless it is in SLEEP, STANDBY or IDLE mode. In the IDLE state, most disks are still spinning, so this is probably not what you want.
When a periodic test is skipped, smartd normally writes an informational log message. The message can be suppressed by appending the option ',q' to POWERMODE (like '-n standby,q'). This prevents a laptop disk from spinning up due to this message.
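The effect of the POWERMODE values on a periodic check can be summarized as a small decision table. The Python sketch below restates the rules above; the function name check_decision and the tuple it returns are hypothetical. The OFF state is left out because the text does not say how it is handled.

```python
# For each POWERMODE value, the power states in which the check is skipped.
SKIPPED_STATES = {
    "never": set(),
    "sleep": {"SLEEP"},
    "standby": {"SLEEP", "STANDBY"},
    "idle": {"SLEEP", "STANDBY", "IDLE"},
}
KNOWN_STATES = ("SLEEP", "STANDBY", "IDLE", "ACTIVE")


def check_decision(powermode, current_state):
    """Return (run_check, log_skip_message) for a '-n POWERMODE[,q]' value."""
    mode, _, flag = powermode.partition(",")
    if mode not in SKIPPED_STATES or flag not in ("", "q"):
        raise ValueError("invalid POWERMODE: %r" % powermode)
    if current_state not in KNOWN_STATES:
        raise ValueError("unknown power state: %r" % current_state)
    run_check = current_state not in SKIPPED_STATES[mode]
    # A skipped check is logged unless ',q' was appended.
    log_skip_message = (not run_check) and flag != "q"
    return run_check, log_skip_message
```

For a laptop disk configured with '-n standby,q', check_decision("standby,q", "STANDBY") gives (False, False): the check is skipped and nothing is logged.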
normal - do not try to monitor the disk if a mandatory SMART command fails, but continue if an optional SMART command fails. This is the default.
permissive - try to monitor the disk even if it appears to lack SMART capabilities. This may be required for some old disks (prior to ATA-3 revision 4) that implemented SMART before the SMART standards were incorporated into the ATA/ATAPI Specifications. This may also be needed for some Maxtor disks which fail to comply with the ATA Specifications and don't properly indicate support for error- or self-test logging.
[Please see the smartctl -T command-line option.]
The delay between tests is vendor-specific, but is typically four hours.
Note that SMART Automatic Offline Testing is not part of the ATA Specification. Please see the smartctl -o command-line option documentation for further information about this feature.
error - report if the number of ATA errors reported in the ATA Error Log has increased since the last check.
selftest - report if the number of failed tests reported in the SMART Self-Test Log has increased since the last check, or if the timestamp associated with the most recent failed test has increased. Note that such errors will only be logged if you run self-tests on the disk (and it fails a test!). Self-Tests can be run automatically by smartd: please see the '-s' Directive below. Self-Tests can also be run manually by using the '-t short' and '-t long' options of smartctl and the results of the testing can be observed using the smartctl '-l selftest' command-line option.
[Please see the smartctl -l and -t command-line options.]
To schedule a short Self-Test between 2-3am every morning, use:
-s S/../.././02
To schedule a long Self-Test between 4-5am every Sunday morning, use:
-s L/../../7/04
To schedule a long Self-Test between 10-11pm on the first and fifteenth day of each month, use:
-s L/../(01|15)/./22
To schedule an Offline Immediate test after every midnight, 6am, noon, and 6pm, plus a Short Self-Test daily at 1-2am and a Long Self-Test every Saturday at 3-4am, use:
-s (O/../.././(00|06|12|18)|S/../.././01|L/../../6/03)
Scheduled tests are run immediately following the regularly-scheduled device polling, if the current local date, time, and test type, match REGEXP. By default the regularly-scheduled device polling occurs every thirty minutes after starting smartd. Take caution if you use the '-i' option to make this polling interval more than sixty minutes: the poll times may fail to coincide with any of the testing times that you have specified with REGEXP, and so the self tests may not take place as you wish.
Before running an offline or self-test, smartd checks to be sure that a self-test is not already running. If a self-test is already running, then this running self test will not be interrupted to begin another test.
smartd will not attempt to run any type of test if another test was already started or run in the same hour.
Each time a test is run, smartd will log an entry to SYSLOG. You can use these or the '-q showtests' command-line option to verify that you constructed REGEXP correctly. The matching order (L before S before C before O) ensures that if multiple test types are all scheduled for the same hour, the longer test type has precedence. This is usually the desired behavior.
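The way a '-s REGEXP' value selects a test for a given hour can be explored with a short Python sketch. It rests on several assumptions that should be checked against '-q showtests': the string being matched has the form T/MM/DD/d/HH implied by the examples above, with d running from 1 (Monday) to 7 (Sunday); the whole string must match; and Python's re module stands in for the POSIX extended regular expressions that smartd uses, which is adequate only for simple patterns like these. The function names are hypothetical.

```python
import re
from datetime import datetime

# Matching order given in the text: L before S before C before O.
TEST_TYPES = "LSCO"


def schedule_string(test_type, when):
    """Build 'T/MM/DD/d/HH' for a local time (d: 1 = Monday ... 7 = Sunday)."""
    return "%s/%02d/%02d/%d/%02d" % (
        test_type, when.month, when.day, when.isoweekday(), when.hour)


def due_test(regexp, when):
    """Return the first test type whose schedule string matches, else None."""
    pattern = re.compile(regexp)
    for test_type in TEST_TYPES:
        if pattern.fullmatch(schedule_string(test_type, when)):
            return test_type
    return None
```

For instance, due_test("L/../../7/04", datetime(2024, 1, 7, 4, 30)) returns "L", because 7 January 2024 is a Sunday and the time falls in the 4-5am hour.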
Unix users: please beware that the rules for extended regular expressions [regex(7)] are not the same as the rules for file-name pattern matching by the shell [glob(7)]. smartd will issue harmless informational warning messages if it detects characters in REGEXP that appear to indicate that you have made this mistake.
To prevent your email in-box from getting filled up with warning messages, by default only a single warning will be sent for each of the enabled alert types, '-H', '-l', '-f', '-C', or '-O' even if more than one failure or error is detected or if the failure or error persists. [This behavior can be modified; see the '-M' Directive below.]
To send email to more than one user, please use the following "comma separated" form for the address: user1@add1,user2@add2,...,userN@addN (with no spaces).
To test that email is being sent correctly, use the '-M test' Directive described below to send one test email message on smartd startup.
By default, email is sent using the system mail command. In order that smartd find the mail command (normally /bin/mail) an executable named 'mail' must be in the path of the shell or environment from which smartd was started. If you wish to specify an explicit path to the mail executable (for example /usr/local/bin/mail) or a custom script to run, please use the '-M exec' Directive below.
Note that under Solaris, 'mailx' and '/bin/mailx' are used by default in place of the 'mail' and '/bin/mail' named in the previous paragraph, since Solaris '/bin/mail' does not accept a '-s' (Subject) command-line argument.
On Windows, the 'Blat' mailer (http://blat.sourceforge.net/) is used by default. This mailer uses a different command-line syntax; see '-M exec' below.
Note also that there is a special argument <nomailer> which can be given to the '-m' Directive in conjunction with the '-M exec' Directive. Please see below for an explanation of its effect.
If the mailer or the shell running it produces any STDERR/STDOUT output, then a snippet of that output will be copied to SYSLOG. The remainder of the output is discarded. If problems are encountered in sending mail, this should help you to understand and fix them. If you have mail problems, we recommend running smartd in debug mode with the '-d' flag, using the '-M test' Directive described below.
The following extension is available on Windows: By specifying 'msgbox' as a mail address, a warning "email" is displayed as a message box on the screen. Using both 'msgbox' and regular mail addresses is possible, if 'msgbox' is the first word in the comma separated list. With 'sysmsgbox', a system modal (always on top) message box is used. If running as a service, a service notification message box (always shown on current visible desktop) is used.
Multiple -M Directives may be given. If more than one of the following three -M Directives are given (example: -M once -M daily) then the final one (in the example, -M daily) is used.
The valid arguments to the -M Directive are (one of the following three):
once - send only one warning email for each type of disk problem detected. This is the default.
daily - send additional warning reminder emails, once per day, for each type of disk problem detected.
diminishing - send additional warning reminder emails, after a one-day interval, then a two-day interval, then a four-day interval, and so on for each type of disk problem detected. Each interval is twice as long as the previous interval.
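The difference between the three reminder policies is easiest to see as a timeline. The Python sketch below computes, for a given policy, how many days after the first warning each reminder would be sent, following the intervals described above. The function name reminder_days is hypothetical and the sketch ignores the exact time of day at which smartd sends mail.

```python
def reminder_days(policy, count):
    """Days after the first warning on which the next `count` reminders fall."""
    if policy not in ("once", "daily", "diminishing"):
        raise ValueError("unknown -M policy: %r" % policy)
    if policy == "once":
        return []                        # no reminders at all
    days = []
    elapsed, interval = 0, 1
    for _ in range(count):
        elapsed += interval
        days.append(elapsed)
        if policy == "diminishing":
            interval *= 2                # each interval is twice the previous one
    return days
```

With 'diminishing', the first four reminders fall 1, 3, 7 and 15 days after the original warning; with 'daily' they fall on days 1, 2, 3 and 4.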
In addition, one may add zero or more of the following Directives:
test - send a single test email immediately upon smartd startup. This allows one to verify that email is delivered correctly.
exec PATH - run the executable PATH instead of the default mail command, when smartd needs to send email. PATH must point to an executable binary file or script.
By setting PATH to point to a customized script, you can make smartd perform useful tricks when a disk problem is detected (beeping the console, shutting down the machine, broadcasting warnings to all logged-in users, etc.) But please be careful. smartd will block until the executable PATH returns, so if your executable hangs, then smartd will also hang. Some sample scripts are included in /usr/share/doc/smartmontools-5.36/examplescripts/.
The return status of the executable is recorded by smartd in SYSLOG. The executable is not expected to write to STDOUT or STDERR. If it does, then this is interpreted as indicating that something is going wrong with your executable, and a fragment of this output is logged to SYSLOG to help you to understand the problem. Normally, if you wish to leave some record behind, the executable should send mail or write to a file or device.
Before running the executable, smartd sets a number of environment variables. These environment variables may be used to control the executable's behavior. The environment variables exported by smartd are:
EmailTest: this is an email test message.
Health: the SMART health status indicates imminent failure.
Usage: a usage Attribute has failed.
SelfTest: the number of self-test failures has increased.
ErrorCount: the number of errors in the ATA error log has increased.
CurrentPendingSector: one or more disk sectors could not be read and are marked to be reallocated (replaced with spare sectors).
OfflineUncorrectableSector: during off-line testing, or self-testing, one or more disk sectors could not be read.
FailedHealthCheck: the SMART health status command failed.
FailedReadSmartData: the command to read SMART Attribute data failed.
FailedReadSmartErrorLog: the command to read the SMART error log failed.
FailedReadSmartSelfTestLog: the command to read the SMART self-test log failed.
FailedOpenDevice: the open() command to the device failed.
Sun Feb 9 14:58:19 2003 CST
If the '-m ADD' Directive is given with a normal address argument, then the executable pointed to by PATH will be run in a shell with STDIN receiving the body of the email message, and with the same command-line arguments:
-s "$SMARTD_SUBJECT" $SMARTD_ADDRESS
that would normally be provided to 'mail'. Examples include:
-m user@home -M exec /bin/mail
-m admin@work -M exec /usr/local/bin/mailto
-m root -M exec /Example_1/bash/script/below
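As an illustration of the calling convention just described (message body on STDIN, arguments of the form -s SUBJECT ADDRESS), here is a hedged Python sketch of the core of a custom '-M exec' handler that appends each warning to a file instead of mailing it. The function names and the log path argument are hypothetical. Because smartd treats any STDOUT/STDERR output as a sign of trouble, a real handler should stay silent; note that argparse itself prints to STDERR if the arguments are malformed.

```python
import argparse


def format_record(argv, body):
    """Turn mail-style arguments plus a message body into one log record."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("-s", dest="subject", default="")
    parser.add_argument("addresses", nargs="*")
    args = parser.parse_args(argv)
    return "To: %s\nSubject: %s\n\n%s" % (
        ",".join(args.addresses), args.subject, body)


def main(argv, stdin, log_path):
    """Append the warning to log_path; write nothing to STDOUT or STDERR."""
    with open(log_path, "a") as log:
        log.write(format_record(argv, stdin.read()))
    return 0

# When installed as a script, one would call something like:
#     sys.exit(main(sys.argv[1:], sys.stdin, "/path/chosen/by/the/admin"))
```

Since smartd blocks until the executable returns, a handler like this should avoid anything that can hang, such as waiting on a network service.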
Note that on Windows, the syntax of the 'Blat' mailer is used:
- -q -subject "$SMARTD_SUBJECT" -to "$SMARTD_ADDRESS"
If the '-m ADD' Directive is given with the special address argument <nomailer> then the executable pointed to by PATH is run in a shell with no STDIN and no command-line arguments, for example:
-m <nomailer> -M exec /Example_2/bash/script/below
If the executable produces any STDERR/STDOUT output, then smartd assumes that something is going wrong, and a snippet of that output will be copied to SYSLOG. The remainder of the output is then discarded.
Some EXAMPLES of scripts that can be used with the '-M exec' Directive are given below. Some sample scripts are also included in /usr/share/doc/smartmontools-5.36/examplescripts/.
This is useful, for example, if you have a very old disk and don't want to keep getting messages about the hours-on-lifetime Attribute (usually Attribute 9) failing. This Directive may appear multiple times for a single device, if you want to ignore multiple Attributes.
This is useful, for example, if one of the device Attributes is the disk temperature (usually Attribute 194 or 231). It's annoying to get reports each time the temperature changes. This Directive may appear multiple times for a single device, if you want to ignore multiple Attributes.
A common use of this Directive is to track the device Temperature (often ID=194 or 231).
If this Directive is given, it automatically implies the '-r' Directive for the same Attribute, so that the Raw value of the Attribute is reported.
A common use of this Directive is to track the device Temperature (often ID=194 or 231). It is also useful for understanding how different types of system behavior affect the values of certain Attributes.
A pending sector is a disk sector (containing 512 bytes of your data) which the device would like to mark as "bad" and reallocate. Typically this is because your computer tried to read that sector, and the read failed because the data on it has been corrupted and has inconsistent Error Checking and Correction (ECC) codes. This is important to know, because it means that there is some unreadable data on the disk. The problem of figuring out what file this data belongs to is operating system and file system specific. You can typically force the sector to reallocate by writing to it (translation: make the device substitute a spare good sector for the bad one) but at the price of losing the 512 bytes of data stored there.
An offline uncorrectable sector is a disk sector which was not readable during an off-line scan or a self-test. This is important to know, because if you have data stored in this disk sector, and you need to read it, the read will fail. Please see the previous '-C' option for more details.
none - Assume that the device firmware obeys the ATA specifications. This is the default, unless the device has presets for '-F' in the device database.
samsung - In some Samsung disks (example: model SV4012H Firmware Version: RM100-08) some of the two- and four-byte quantities in the SMART data structures are byte-swapped (relative to the ATA specification). Enabling this option tells smartd to evaluate these quantities in byte-reversed order. Some signs that your disk needs this option are (1) no self-test log printed, even though you have run self-tests; (2) very large numbers of ATA errors reported in the ATA error log; (3) strange and impossible values for the ATA error log timestamps.
samsung2 - In more recent Samsung disks (firmware revisions ending in "-23") the number of ATA errors reported is byte swapped. Enabling this option tells smartd to evaluate this quantity in byte-reversed order.
Note that an explicit '-F' Directive will over-ride any preset values for '-F' (see the '-P' option below).
[Please see the smartctl -F command-line option.]
This Directive may appear multiple times. Valid arguments to this Directive are:
9,minutes - Raw Attribute number 9 is power-on time in minutes. Its raw value will be displayed in the form 'Xh+Ym'. Here X is hours, and Y is minutes in the range 0-59 inclusive. Y is always printed with two digits, for example '06' or '31' or '00'.
9,seconds - Raw Attribute number 9 is power-on time in seconds. Its raw value will be displayed in the form 'Xh+Ym+Zs'. Here X is hours, Y is minutes in the range 0-59 inclusive, and Z is seconds in the range 0-59 inclusive. Y and Z are always printed with two digits, for example '06' or '31' or '00'.
9,halfminutes - Raw Attribute number 9 is power-on time, measured in units of 30 seconds. This format is used by some Samsung disks. Its raw value will be displayed in the form 'Xh+Ym'. Here X is hours, and Y is minutes in the range 0-59 inclusive. Y is always printed with two digits, for example '06' or '31' or '00'.
9,temp - Raw Attribute number 9 is the disk temperature in Celsius.
192,emergencyretractcyclect - Raw Attribute number 192 is the Emergency Retract Cycle Count.
193,loadunload - Raw Attribute number 193 contains two values. The first is the number of load cycles. The second is the number of unload cycles. The difference between these two values is the number of times that the drive was unexpectedly powered off (also called an emergency unload). As a rule of thumb, the mechanical stress created by one emergency unload is equivalent to that created by one hundred normal unloads.
194,10xCelsius - Raw Attribute number 194 is ten times the disk temperature in Celsius. This is used by some Samsung disks (example: model SV1204H with RK100-13 firmware).
194,unknown - Raw Attribute number 194 is NOT the disk temperature, and its interpretation is unknown. This is primarily useful for the -P (presets) Directive.
198,offlinescanuncsectorct - Raw Attribute number 198 is the Offline Scan UNC Sector Count.
200,writeerrorcount - Raw Attribute number 200 is the Write Error Count.
201,detectedtacount - Raw Attribute number 201 is the Detected TA Count.
220,temp - Raw Attribute number 220 is the disk temperature in Celsius.
Note: a table of hard drive models, listing which Attribute corresponds to temperature, can be found at: http://www.guzu.net/linux/hddtemp.db
N,raw8 - Print the Raw value of Attribute N as six 8-bit unsigned base-10 integers. This may be useful for decoding the meaning of the Raw value. The form 'N,raw8' prints Raw values for ALL Attributes in this form. The form (for example) '123,raw8' only prints the Raw value for Attribute 123 in this form.
N,raw16 - Print the Raw value of Attribute N as three 16-bit unsigned base-10 integers. This may be useful for decoding the meaning of the Raw value. The form 'N,raw16' prints Raw values for ALL Attributes in this form. The form (for example) '123,raw16' only prints the Raw value for Attribute 123 in this form.
N,raw48 - Print the Raw value of Attribute N as a 48-bit unsigned base-10 integer. This may be useful for decoding the meaning of the Raw value. The form 'N,raw48' prints Raw values for ALL Attributes in this form. The form (for example) '123,raw48' only prints the Raw value for Attribute 123 in this form.
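The display formats described above can be sketched with a few lines of shell arithmetic. These helpers are hypothetical and are not taken from the smartmontools sources. In particular, the order in which raw16 prints its three words (most significant first) is an assumption of this sketch.

```shell
#!/bin/sh
# Hypothetical helpers illustrating the '-v' display formats.

# Attribute 9 given in minutes -> 'Xh+Ym'
fmt_minutes() {
    printf '%dh+%02dm\n' $(( $1 / 60 )) $(( $1 % 60 ))
}

# Attribute 9 given in seconds -> 'Xh+Ym+Zs'
fmt_seconds() {
    printf '%dh+%02dm+%02ds\n' \
        $(( $1 / 3600 )) $(( ($1 % 3600) / 60 )) $(( $1 % 60 ))
}

# Attribute 9 given in units of 30 seconds -> 'Xh+Ym'
fmt_halfminutes() {
    fmt_minutes $(( $1 / 2 ))
}

# 48-bit Raw value -> three 16-bit unsigned integers
# (most significant word first; the order is an assumption)
raw16() {
    printf '%d %d %d\n' \
        $(( $1 / 4294967296 % 65536 )) \
        $(( $1 / 65536 % 65536 )) \
        $(( $1 % 65536 ))
}
```

For example, `fmt_minutes 366` prints 6h+06m and `fmt_seconds 3661` prints 1h+01m+01s.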
use - use any presets that are available for this drive. This is the default.
ignore - do not use any presets for this drive.
show - show the presets listed for this drive in the database.
showall - show the presets that are available for all drives and then exit.
[Please see the smartctl -P command-line option.]
Note that -a is the default for ATA devices. If none of these other Directives is given, then -a is assumed.
If you are not sure which Directives to use, I suggest experimenting for a few minutes with smartctl to see what SMART functionality your disk(s) support(s). If you do not like voluminous syslog messages, a good choice of smartd configuration file Directives might be:
    -H -l selftest -l error -f
If you want more frequent information, use: -a.
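To make the two suggestions concrete, the following sketch prints a candidate configuration file entry for a given device. The function name conf_line and its "quiet"/"verbose" keywords are invented for this illustration. Only the Directives themselves come from the text above.

```shell
#!/bin/sh
# conf_line DEVICE MODE: print one candidate smartd.conf entry.
# Hypothetical helper; MODE is 'quiet' or 'verbose'.
conf_line() {
    case "$2" in
        quiet)   printf '%s -H -l selftest -l error -f\n' "$1" ;;
        verbose) printf '%s -a\n' "$1" ;;
        *)       echo "usage: conf_line DEVICE quiet|verbose" >&2
                 return 1 ;;
    esac
}

# Example: conf_line /dev/hda quiet
```

The printed line could then be reviewed and copied into the configuration file by hand.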
If DEVICESCAN is not followed by any Directives, then smartd will scan for both ATA and SCSI devices, and will monitor all possible SMART properties of any devices that are found.
DEVICESCAN may optionally be followed by any valid Directives, which will be applied to all devices that are found in the scan. In the examples below, ADDRESS stands for the email address to be notified. For example
    DEVICESCAN -m ADDRESS
will scan for all devices, and then monitor them. It will send one email warning per device for any problems that are found.
    DEVICESCAN -d ata -m ADDRESS
will do the same, but restricts the scan to ATA devices only.
    DEVICESCAN -H -d ata -m ADDRESS
will do the same, but only monitors the SMART health status of the devices (rather than the default -a, which monitors all SMART properties).
Example 1: This script is for use with '-m ADDRESS -M exec PATH'. It appends the output of smartctl -a to the output of the smartd email warning message and sends it to ADDRESS.
    #! /bin/bash

    # Save the email message (STDIN) to a file:
    cat > /root/msg

    # Append the output of smartctl -a to the message:
    /usr/sbin/smartctl -a -d $SMARTD_DEVICETYPE $SMARTD_DEVICE >> /root/msg

    # Now email the message to the user at address ADDRESS:
    /bin/mail -s "$SMARTD_SUBJECT" $SMARTD_ADDRESS < /root/msg
Example 2: This script is for use with '-m <nomailer> -M exec PATH'. It warns all users about a disk problem, waits 30 seconds, and then powers down the machine.
    #! /bin/bash

    # Warn all users of a problem
    wall 'Problem detected with disk: ' "$SMARTD_DEVICESTRING"
    wall 'Warning message from smartd is: ' "$SMARTD_MESSAGE"
    wall 'Shutting down machine in 30 seconds... '

    # Wait half a minute
    sleep 30

    # Power down the machine
    /sbin/shutdown -hf now
Some example scripts are distributed with the smartmontools package, in /usr/share/doc/smartmontools-5.36/examplescripts/.
Please note that these scripts typically run as root, so any files that they read or write should not be writable by ordinary users. Nor should those files reside in directories such as /tmp that ordinary users can write to, since this may expose your system to symlink attacks.
As previously described, if the scripts write to STDOUT or STDERR, this is interpreted as indicating that there was an internal error within the script, and a snippet of STDOUT/STDERR is logged to SYSLOG. The remainder is flushed.
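Given the two points above, one possible pattern for a '-M exec' script is to send everything to a log file and discard anything else. This is only a sketch: the function name notify_quietly and the LOGFILE path are hypothetical, and the log file should live in a directory that only root can write to.

```shell
#!/bin/sh
# Sketch of a '-M exec' helper that never writes to STDOUT or STDERR.
# LOGFILE is a hypothetical path; choose a root-owned location.
LOGFILE="${LOGFILE:-/var/log/smartd-notify.log}"

notify_quietly() {
    {
        {
            # One summary line built from variables set by smartd
            printf '%s device=%s message=%s\n' \
                "$(date '+%Y-%m-%d %H:%M:%S')" \
                "$SMARTD_DEVICESTRING" "$SMARTD_MESSAGE"
            # Append the warning text that smartd supplies on STDIN
            cat
        } >> "$LOGFILE"
    } >/dev/null 2>&1   # swallow any stray output, including errors
}

# In a real script the last line would simply be:
#   notify_quietly
```

The outer redirection is applied before the log file is opened, so even a failure to open LOGFILE should not reach smartd as script output.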
'Device: /dev/hda, SMART Attribute: 194 Temperature_Celsius changed from 94 to 93'
Note that in this message, the value given is the 'Normalized' not the 'Raw' Attribute value (the disk temperature in this case is about 22 Celsius). The '-R' and '-r' Directives modify this behavior, so that the information is printed with the Raw values as well, for example:
'Device: /dev/hda, SMART Attribute: 194 Temperature_Celsius changed from 94 [Raw 22] to 93 [Raw 23]'
Here the Raw values are the actual disk temperatures in Celsius. The way in which the Raw values are printed, and the names under which the Attributes are reported, is governed by the various '-v Num,Description' Directives described previously.
Please see the smartctl manual page for further explanation of the differences between Normalized and Raw Attribute values.
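Because these log messages follow a fixed pattern, they are easy to pick apart with standard tools. The sketch below extracts the device, Attribute number, Attribute name, and the old and new Normalized values from such lines. The function name attr_changes is invented here, and the pattern is based only on the two example messages shown above.

```shell
#!/bin/sh
# attr_changes: read log lines on stdin and print
#   DEVICE ID NAME OLD NEW
# for every 'SMART Attribute ... changed from ... to ...' message.
# OLD and NEW are the Normalized values; any '[Raw N]' part is skipped.
attr_changes() {
    sed -n 's/.*Device: \([^,]*\), SMART Attribute: \([0-9][0-9]*\) \([^ ]*\) changed from \([0-9][0-9]*\).* to \([0-9][0-9]*\).*/\1 \2 \3 \4 \5/p'
}

# Example (log location as given at the top of this page):
#   attr_changes < /var/log/messages
```

Lines that do not match the pattern, such as 'Failed SMART Attribute' messages, are ignored.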
smartd will make log entries at loglevel LOG_CRIT if a SMART Attribute has failed, for example:
'Device: /dev/hdc, Failed SMART Attribute: 5 Reallocated_Sector_Ct'
Under Solaris with the default /etc/syslog.conf configuration, messages below loglevel LOG_NOTICE will not be recorded. Hence all smartd messages with loglevel LOG_INFO will be lost. If you want to use the existing daemon facility to log all messages from smartd, you should change /etc/syslog.conf from:
    ...;daemon.notice;... /var/adm/messages
to read:
    ...;daemon.info;... /var/adm/messages
Alternatively, you can use a local facility to log messages: please see the smartd '-l' command-line option described above.
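The syslog.conf change from daemon.notice to daemon.info can be previewed before touching the real file. The sketch below prints a modified copy of a given file and leaves the original alone. The function name propose_syslog_conf and the output path in the example are hypothetical.

```shell
#!/bin/sh
# propose_syslog_conf FILE: print FILE with 'daemon.notice' lowered to
# 'daemon.info' on lines that log to /var/adm/messages.
# The input file itself is not modified.
propose_syslog_conf() {
    sed '/\/var\/adm\/messages/ s/daemon\.notice/daemon.info/' "$1"
}

# Example:
#   propose_syslog_conf /etc/syslog.conf > /root/syslog.conf.new
#   diff /etc/syslog.conf /root/syslog.conf.new
```

After reviewing the difference, the new file can be installed by hand. The syslog daemon must then re-read its configuration before the change takes effect.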
On Cygwin and Windows, the log messages are written to the event log or to a file. See documentation of the '-l FACILITY' option above for details.
On Windows, the following built-in commands can be used to control smartd, if running as a daemon:
'smartd status' - check status
'smartd stop' - stop smartd
'smartd reload' - reread config file
'smartd restart' - restart smartd
'smartd sigusr1' - check disks now
'smartd sigusr2' - toggle debug mode
On WinNT4/2000/XP, smartd can also be run as a Windows service:
The Cygwin Version of smartd can be run as a service via the cygrunsrv tool. The start-up script provides Cygwin-specific commands to install and remove the service:
    /etc/rc.d/init.d/smartd install [options]
    /etc/rc.d/init.d/smartd remove
The service can be started and stopped by the start-up script as usual (see EXAMPLES above).
The Windows Version of smartd has built-in support for services:
'smartd install [options]' installs a service named "smartd" (display name "SmartD Service") using the command line '/installpath/smartd.exe --service [options]'.
'smartd remove' can later be used to remove the service entry from the registry.
Upon startup, the smartd service changes the working directory to its own installation path. If smartd.conf and blat.exe are stored in this directory, neither the '-c' option nor the '-M exec' directive is needed.
The debug mode ('-d', '-q onecheck') does not work if smartd is running as a service.
The service can be controlled as usual with Windows commands 'net' or 'sc' ('net start smartd', 'net stop smartd').
Pausing the service ('net pause smartd') sets the interval between disk checks ('-i N') to infinite.
Continuing the paused service ('net continue smartd') resets the interval and rereads the configuration file immediately (like SIGHUP).
Continuing a still running service ('net continue smartd' without preceding 'net pause smartd') does not reread configuration but checks disks immediately (like SIGUSR1).
When smartd makes log entries, these are time-stamped. The time stamps are in the computer's local time zone, which is generally set using either the environment variable 'TZ' or using a time-zone file such as /etc/localtime. You may wish to change the timezone while smartd is running (for example, if you carry a laptop to a new time-zone and don't reboot it). Due to a bug in the tzset(3) function of many unix standard C libraries, the time-zone stamps of smartd might not change. For some systems, smartd will work around this problem if the time-zone is set using /etc/localtime. The work-around fails if the time-zone is set using the 'TZ' variable (or a file that it points to).
Casper Dik (Solaris SCSI interface)
Christian Franke (Windows interface and Cygwin package)
Douglas Gilbert (SCSI subsystem)
Guido Guenther (Autoconf/Automake packaging)
Geoffrey Keating (Darwin ATA interface)
Eduard Martinescu (FreeBSD interface)
Frederic L. W. Meunier (Web site and Mailing list)
Keiji Sawada (Solaris ATA interface)
Sergey Svishchev (NetBSD interface)
David Snyder and Sergey Svishchev (OpenBSD interface)
Phil Williams (User interface and drive database)
Many other individuals have made smaller contributions and corrections.
If you would like to understand better how SMART works, and what it does, a good place to start is with Sections 4.8 and 6.54 of the first volume of the 'AT Attachment with Packet Interface-7' (ATA/ATAPI-7) specification. This documents the SMART functionality which the smartmontools utilities provide access to. You can find Revision 4b of this document at http://www.t13.org/docs2004/d1532v1r4b-ATA-ATAPI-7.pdf . Earlier and later versions of this Specification are available from the T13 web site http://www.t13.org/ .
The functioning of SMART was originally defined by the SFF-8035i revision 2 and the SFF-8055i revision 1.4 specifications. These are publications of the Small Form Factors (SFF) Committee. Links to these documents may be found in the References section of the smartmontools home page at http://smartmontools.sourceforge.net/#references .