Sunday, 8 June 2014

How to read LCC of your hard drive

Many hard drives use power-saving features such as head parking (moving the heads off the platters) after a period of inactivity. The problem is that a hard drive is usually rated for only about 300,000 head-parking cycles. To extend the life expectancy of your hard drive (and stop the clicking noise), you want to either increase the idle time before the heads park from the default of 8 seconds to 300 seconds, or switch head parking off completely. First we need to learn how to read out the LCC (Load Cycle Count) of a hard drive.
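To get a feel for what the timer change means, here is a small back-of-the-envelope sketch in Python. It assumes the worst case, where the drive is woken up again right after every parking, so the timer length sets an upper bound on parkings per hour. The function name is my own, and real drives will usually park far less often than this bound.

```python
def max_parkings_per_hour(idle_timer_seconds):
    """Upper bound: the heads cannot park more often than once per idle timer period."""
    return 3600 / idle_timer_seconds


RATED_CYCLES = 300_000  # typical rating mentioned above

for timer in (8, 300):
    rate = max_parkings_per_hour(timer)
    hours_to_rating = RATED_CYCLES / rate
    print(f"{timer} s timer: at most {rate:.0f} parkings/hour, "
          f"rating reached after at least {hours_to_rating:.0f} hours")
```

With the 8 second default the bound is 450 parkings per hour; with 300 seconds it drops to 12.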

1 - download the program: S.M.A.R.T. Monitoring Tools
2 - install it
3 - open a command prompt in the folder that contains smartctl.exe
4 - type: smartctl -a d: (d: is the drive letter of my hard drive) to read the S.M.A.R.T. attributes
5 - look for the line with ID# 193:

These are the attributes of my new WD30EFRX-68EUZN0 (manufactured: March 20 2014) which has only been running for a few hours.
At #193 "Load_Cycle_Count" you can see a "RAW_VALUE" of 7. It means the hard drive has parked its heads 7 times by now. If we divide the number of head parkings (LCC) by the number of hours the hard drive has been in operation (#9 Power_On_Hours), we get the number of head parkings per hour.
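If you would rather not do the division by hand, the following Python sketch pulls the RAW_VALUE of attributes 9 and 193 out of the attribute table text and divides them. The function names are my own, and the sample table is only illustrative: the Load_Cycle_Count of 7 is from my drive, the other numbers are made up. It also assumes that Power_On_Hours really is reported in hours on your drive.

```python
def parse_raw_values(table_text):
    """Return {attribute ID: RAW_VALUE} for rows of a smartctl attribute table."""
    raw = {}
    for line in table_text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns:
        # ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            raw[int(fields[0])] = int(fields[9])
    return raw


def parkings_per_hour(raw):
    """Load_Cycle_Count (193) divided by Power_On_Hours (9)."""
    hours = raw[9]
    if hours == 0:
        return None  # drive too new to compute a rate
    return raw[193] / hours


# Illustrative sample rows (made-up values except the raw 7 for attribute 193).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       5
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       7
"""

raw = parse_raw_values(SAMPLE)
print(parkings_per_hour(raw))  # 7 / 5 = 1.4
```

To use it on a real drive, save the smartctl output to a text file and pass the file's contents to parse_raw_values.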

If your number of head parkings per hour is too high, use this method to modify the power profile of certain WD hard drives.

S.M.A.R.T. = Self-Monitoring, Analysis and Reporting Technology

ID# = The attribute number, usually a decimal number between 1 and 255. The only ones you can count on seeing are: 1, 3, 4, 5, 7, 9, 10, 187, 190 or 194, 193, 195, 197, 198, and 199.

ATTRIBUTE NAME =  The relatively standardized attribute name. There are a few that seem only used by a single manufacturer. 

FLAG =  Of no interest to us - ignore it. 

VALUE = One of the most important values in the table. It is stored in a single byte on the drive for each SMART attribute, so its range is from 0 to 255. The values 0, 254, and 255 are reserved for internal use, so you never see them. The value 253 usually means "Not Used Yet", so when you see it, you are probably looking at a brand new drive. Sometimes, though, a few attributes take a while before they are used, so they may stay at 253 for longer.

VALUE is almost always used as a normalized scale from perfectly good to perfectly bad, usually starting at VALUE=100 and dropping toward a worst case of VALUE=1. You can generally think of it as a scale starting at 100% good, then slowly dropping until failure at some predetermined number, given in the THRESH column. Someone realized that if the values only run from 100 to 1, the possible values from 101 to 252 are wasted, so some SMART programmers decided to stretch the scale for certain attributes to start at 200 instead of 100, providing twice the data points. Unfortunately, which attributes are scaled from 200 to 1 is completely inconsistent, with almost all SMART reports showing some attributes starting at 100 and others starting at 200. In general, you can think of a 200-type scale as the 100-type scale times 2.

The temperature attributes 190 and 194 are exceptions to the scaling. They are either temperatures or forms of the temperature, and they don't scale.

The error rate attributes 1 and 7 are also exceptions, although of a different kind. Raw read and seek errors are a natural part of normal operation, so even in a brand new and perfect drive, there is a factory-determined optimal rate of read and seek errors. They are nothing to worry about; they're the natural result of thermal expansion and other things, and they are used to help the drive constantly recalibrate itself. Because these error rates are non-zero, you essentially cannot have a perfect error rate of zero that you declare to be a VALUE of 100. So manufacturers determine what an optimal error rate should be and call it 100. But drives often achieve an error rate (especially when they are new) that is even better than the optimal one set by the manufacturer, which results in a VALUE that is HIGHER than 100.

WORST = The lowest VALUE ever recorded

THRESH = The manufacturer-determined lowest value that WORST should be allowed to fall to before the attribute is reported as FAILED. Not every attribute has a meaningful threshold: some are plain counters, and some are purely informational, such as temperature or hours of use.

TYPE = Can either be Pre-fail or Old_age.

If it is Pre-fail, then the attribute is considered a critical attribute, one that participates in the overall SMART health assessment (PASSED/FAILED) of the drive. If the value of WORST falls below THRESH, then the drive FAILS the overall SMART health test, and complete failure may be imminent. The Pre-fail term means that if this attribute fails, then the drive is considered 'about to fail'. 

If it is Old_age, then the attribute is considered (for SMART purposes) a noncritical attribute, one that does not fail the drive. The Old_age term means that the attribute is related to normal aging, normal wear and tear of the drive. 

When new attributes are introduced, they may seem like a critical item, perhaps even with an appropriate THRESH set. But if they are marked as Old_age, then they do NOT fail the drive, even if WORST falls below THRESH. Naturally, this could be highly concerning, but there is no authoritative interpretation available, so no definitive conclusions can be made. These attributes should be considered experimental.
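The Pre-fail / Old_age rule described above can be written down as a few lines of Python. This is only a sketch of the rule as worded here, not how any particular SMART tool implements it; the function name and the returned strings are my own, and whether a WORST exactly equal to THRESH already counts as failed is an assumption (this sketch says it does not).

```python
def attribute_status(attr_type, worst, thresh):
    """Classify one attribute row using the TYPE, WORST and THRESH columns."""
    if worst >= thresh:
        return "ok"
    # WORST has fallen below THRESH: only Pre-fail attributes fail the drive.
    if attr_type == "Pre-fail":
        return "FAILED"
    return "below threshold, but Old_age (does not fail the drive)"


print(attribute_status("Pre-fail", 100, 36))  # ok
print(attribute_status("Pre-fail", 30, 36))   # FAILED
print(attribute_status("Old_age", 30, 36))    # below threshold, but Old_age (does not fail the drive)
```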

UPDATED = Supposedly, this is an indicator when the attribute is updated, Always or Offline. If Always, then it is assumed that the attribute is updated whenever a relevant event occurs. In other words, it is always 'live'. If Offline, then supposedly the attribute is only updated when offline tests are being performed. 

WHEN_FAILED = Usually and thankfully blank! If not blank, then it indicates the last operational hour (from attribute 9 Power_On_Hours) that this attribute failed. 

RAW_VALUE = A manufacturer controlled raw number, which may or may not be of interest to us.

1 Raw_Read_Error_Rate = This is an indicator of the current rate of errors of the low level physical sector read operations. In normal operation, there are ALWAYS a small number of errors when attempting to read sectors, but as long as the number remains small, there is NO issue with the drive. Error correction information and retry mechanisms are in place to catch and fix these errors. Manufacturers therefore determine an optimal level of errors for each drive model, and set up an appropriate scale for monitoring the current error rate. For example, if 3 errors per 1000 read operations seems near perfect to the manufacturer, then an error rate of 3 per 1000 ops might be set to an attribute VALUE of 100. Please completely ignore the RAW_VALUE, as it is not meaningful as a decimal number.

3 Spin-Up Time = Average time of spindle spin up from zero RPM to fully operational [milliseconds]. 

4 Start/Stop Count = A tally of spindle start/stop cycles. The spindle starts, and the count therefore increases, both when the hard disk is powered on after having been turned off entirely (disconnected from its power source) and when it wakes up from sleep mode.

5 Reallocated Sectors Count = When the hard drive finds a read/write/verification error, it marks that sector as "reallocated" and transfers data to a special reserved area (spare area). This process is also known as remapping, and reallocated sectors are called "remaps". The raw value normally represents a count of the bad sectors that have been found and remapped. Thus, the higher the raw value, the more sectors the drive has had to reallocate. This allows a drive with bad sectors to continue operation; however, a drive which has had any reallocations at all is significantly more likely to fail in the near future. While primarily used as a metric of the life expectancy of the drive, this number also affects performance. As the count of reallocated sectors increases, the read/write speed tends to become worse because the drive head is forced to seek to the reserved area whenever a remap is accessed.

7  Seek Error Rate = Rate of seek errors of the magnetic heads. If there is a partial failure in the mechanical positioning system, then seek errors will arise. Such a failure may be due to numerous factors, such as damage to a servo, or thermal widening of the hard disk. The raw value has different structure for different vendors and is often not meaningful as a decimal number.

9 Power-On Hours = Count of hours in power-on state. The raw value of this attribute shows total count of hours (or minutes, or seconds, depending on manufacturer) in power-on state.
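Because the raw unit differs between manufacturers, it helps to normalize it to hours before dividing the Load Cycle Count by it. A minimal Python sketch, assuming you already know which unit your drive uses (the code cannot detect it; the function name and unit labels are my own):

```python
SECONDS_PER_UNIT = {"hours": 3600, "minutes": 60, "seconds": 1}


def power_on_hours(raw_value, unit="hours"):
    """Convert the attribute 9 raw value to hours for the given unit."""
    return raw_value * SECONDS_PER_UNIT[unit] / 3600


print(power_on_hours(120, "minutes"))  # 2.0
```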

10 Spin Retry Count = Count of spin start retries. This attribute stores the total number of additional attempts needed to reach full operational speed after the first attempt was unsuccessful. An increase in this attribute is a sign of problems in the hard disk's mechanical subsystem.

11 Calibration_Retry_Count = The number of times recalibration was requested again after the first attempt was unsuccessful. An increase in this attribute is a sign of problems in the hard disk's mechanical subsystem.

12 Power Cycle Count = This attribute indicates the count of full hard disk power on/off cycles.

192  Power-off Retract Count = Count of times the heads are loaded off the media. Heads can be unloaded without actually powering off.

193 Load_Cycle_Count = Count of load/unload cycles into head landing zone position. Western Digital rates their VelociRaptor drives for 600,000 load/unload cycles, and WD Green drives for 300,000 cycles; the latter are designed to unload heads often to conserve power. Some laptop drives and "green power" desktop drives are programmed to unload the heads whenever there has not been any activity for a very short period of time, such as about five seconds. Many Linux installations write to the file system a few times a minute in the background. As a result, there may be 100 or more load cycles per hour, and the load cycle rating may be exceeded in less than a year.
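The "less than a year" figure is easy to check. The Python sketch below projects how many power-on hours remain until a drive reaches its rated load cycle count, assuming the parking rate stays constant (it rarely does exactly). The function name is my own; the 300,000 and 100 per hour figures are the ones quoted above.

```python
def hours_until_rating(current_cycles, cycles_per_hour, rated_cycles=300_000):
    """Power-on hours left before the load cycle rating is reached."""
    if cycles_per_hour <= 0:
        return None  # heads never park, rating is never reached
    remaining = max(rated_cycles - current_cycles, 0)
    return remaining / cycles_per_hour


hours = hours_until_rating(0, 100)
print(hours, hours / 24)  # 3000.0 hours, i.e. 125.0 days of continuous operation
```

Reaching the rating does not mean the drive dies at that moment; it only means the drive is past the number of cycles the manufacturer specifies.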

194 Temperature_Celsius = Current internal temperature.

196  Reallocation_Event_Count = Count of remap operations. The raw value of this attribute shows the total count of attempts to transfer data from reallocated sectors to a spare area. Both successful & unsuccessful attempts are counted.

197 Current_Pending_Sector = Count of "unstable" sectors (waiting to be remapped, because of unrecoverable read errors). If an unstable sector is subsequently read successfully, the sector is remapped and this value is decreased. Read errors on a sector will not remap the sector immediately (since the correct value cannot be read and so the value to remap is not known, and also it might become readable later); instead, the drive firmware remembers that the sector needs to be remapped, and will remap it the next time it's written.[33] However some drives will not immediately remap such sectors when written; instead the drive will first attempt to write to the problem sector and if the write operation is successful then the sector will be marked good (in this case, the "Reallocation Event Count" (0xC4) will not be increased). This is a serious shortcoming, for if such a drive contains marginal sectors that consistently fail only after some time has passed following a successful write operation, then the drive will never remap these problem sectors.

198 Offline_Uncorrectable =  The total count of uncorrectable errors when reading/writing a sector. A rise in the value of this attribute indicates defects of the disk surface and/or problems in the mechanical subsystem.

199 UltraDMA_CRC_Error = The count of errors in data transfer via the interface cable as determined by ICRC (Interface Cyclic Redundancy Check).

200 Multi-Zone_Error_Rate = The count of errors found when writing a sector. The higher the value, the worse the disk's mechanical condition is.

