Monday, March 15, 2010

Western Digital "green" drives' Load_Cycle_Count on Linux

Western Digital claim their "green" hard disk drives are very power efficient. One of the methods they use to reduce power consumption is to aggressively park, or "unload", the disk heads. Unloading the heads reduces drag over the platters and, with it, the power draw.

About 2 years ago I bought 4 of these drives for a media server. Using Linux software RAID5, I set them up as a 3TB RAID array running XFS:

root@orbit:~# for dev in /dev/sd[a-d]; do smartctl -a $dev | grep Device\ Model; done;
Device Model: WDC WD10EACS-00C7B0
Device Model: WDC WD10EACS-00C7B0
Device Model: WDC WD10EACS-00C7B0
Device Model: WDC WD10EACS-00C7B0
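
For reference, an array like this would typically have been created along the following lines. This is a sketch only, not the original setup commands; the partition names are illustrative, since these drives also carry the swap partitions shown later:

# Illustrative only: build a 4-disk software RAID5 array and format it with XFS.
mdadm --create /dev/md0 --level=5 --raid-devices=4 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
mkfs.xfs /dev/md0
mount /dev/md0 /srv/media    # mount point is an assumption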

Several months later a forum post drew my attention to the fact that some WD hard disks load/unload their heads very frequently. A quick check of the SMART data using smartctl (sudo apt-get install smartmontools) revealed the issue:

root@orbit:/# for dev in /dev/sd[a-d]; do smartctl -a $dev | grep Load_Cycle_Count; done;
193 Load_Cycle_Count 0x0032 120 120 000 Old_age Always - 240530
193 Load_Cycle_Count 0x0032 122 122 000 Old_age Always - 236124
193 Load_Cycle_Count 0x0032 119 119 000 Old_age Always - 244179
193 Load_Cycle_Count 0x0032 119 119 000 Old_age Always - 244882

root@orbit:/# for dev in /dev/sd[a-d]; do smartctl -a $dev | grep Power_On_Hours; done;
9 Power_On_Hours 0x0032 080 080 000 Old_age Always - 7306
9 Power_On_Hours 0x0032 080 080 000 Old_age Always - 7306
9 Power_On_Hours 0x0032 080 080 000 Old_age Always - 7306
9 Power_On_Hours 0x0032 080 080 000 Old_age Always - 7306

These WD10EACS drives are rated for only 300,000 load cycles in their lifetime. After only 10 months of use they had averaged 33 load cycles per hour... and used up 80% of their rated lifetime maximum!
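
Those figures come straight from dividing the SMART counters above. A throwaway shell calculation along these lines reproduces them (the 300,000 figure is WD's published load/unload rating for these drives; this snippet is not from the original post):

# Rough load cycles per hour and percentage of the 300,000-cycle rating for one drive.
dev=/dev/sda
lcc=$(smartctl -a $dev | awk '/Load_Cycle_Count/ {print $10}')
poh=$(smartctl -a $dev | awk '/Power_On_Hours/ {print $10}')
echo "$dev: $((lcc / poh)) cycles/hour, $((100 * lcc / 300000))% of rating"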

The drives can be switched into a low-noise mode using hdparm's "Automatic Acoustic Management". This slows down the head movements, making the drives quieter and potentially longer-lived, at the cost of slightly slower seeks:

root@orbit:~# for dev in /dev/sd[a-d]; do hdparm -M128 $dev; done;

In this mode the only noise I can hear from the WD10EACS drives is the actual clicking of the load cycles.
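
The -M setting does not survive a reboot on its own. One way to re-apply it at boot, sketched below, assumes the machine still runs the traditional /etc/rc.local hook; Debian/Ubuntu's /etc/hdparm.conf can achieve the same thing:

# /etc/rc.local -- re-apply Automatic Acoustic Management before the final "exit 0"
for dev in /dev/sd[a-d]; do
    hdparm -M128 $dev
done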

Over several weeks of testing and experimenting I was unable to halt the rapidly increasing load cycle count. I was slightly concerned; RAID5 arrays can only survive a single disk failure and are not very safe if multiple drives are likely to fail at any moment. In an attempt to allow the drives to remain in their low-power state for longer, I bought a 64GB SSD and migrated the Ubuntu OS to it.

I checked the drives about 5 months later and was alarmed to find that the load cycle count had accelerated even further and was now at nearly double the manufacturer's design limit:

root@orbit:~# for dev in /dev/sd[a-d]; do smartctl -a $dev | grep Load_Cycle_Count; done;
193 Load_Cycle_Count 0x0032 001 001 000 Old_age Always - 611032
193 Load_Cycle_Count 0x0032 002 002 000 Old_age Always - 595279
193 Load_Cycle_Count 0x0032 001 001 000 Old_age Always - 604626
193 Load_Cycle_Count 0x0032 003 003 000 Old_age Always - 593278

The drives were now averaging 84 load cycles per hour. At this point I did more tests and more googling. One such test confirmed that no data was being read from or written to the drives /dev/sd[a-d], or to the RAID arrays /dev/md0 and /dev/md1 built on them, over a 60 second period:

root@orbit:~# iostat 60
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 0.00 0.00 0.00 0 0
sdb 0.00 0.00 0.00 0 0
sdc 0.00 0.00 0.00 0 0
sdd 0.00 0.00 0.00 0 0
sde 6.90 0.93 96.00 56 5760
md0 0.00 0.00 0.00 0 0
md1 0.00 0.00 0.00 0 0

Here only the SSD /dev/sde was being accessed; the zeros (no data read or written) should have meant the drives stayed in low-power mode. Yet over a few minutes the drives still load cycled 13 times.

In the middle of this forum post I found the hint that identified the problem: querying the WD hard disks' SMART data brings the drives out of low-power mode. My proactive hard drive monitoring was in fact wearing the drives out faster! Since the initial load cycle issues were discovered I had started using hddtemp to monitor the hard drive temperatures via their SMART data and collectd to log the readings into RRD databases.
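
If you still want occasional SMART readings without waking sleeping drives, smartmontools has a --nocheck option that skips the query when a drive reports it is in standby, and hdparm can report the power state without spinning anything up. A quick sketch (assuming a smartmontools version new enough to support -n; not part of the original monitoring setup):

# Report the drive's current power state; CHECK POWER MODE does not wake the drive:
hdparm -C /dev/sda

# Only read the SMART attributes if the drive is not already in standby:
smartctl -n standby -a /dev/sda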

To confirm that hddtemp was causing the excessive load cycles, I stopped the daemon:

root@orbit:~# /etc/init.d/hddtemp stop
* Stopping disk temperature monitoring daemon hddtemp [ OK ]

root@orbit:~# tail /var/log/syslog
Nov 24 22:43:31 orbit collectd[3032]: hddtemp plugin: connect (127.0.0.1, 7634) failed: Connection refused
Nov 24 22:43:31 orbit collectd[3032]: hddtemp plugin: Could not connect to daemon.
Nov 24 22:43:31 orbit collectd[3032]: read-function of plugin `hddtemp' failed. Will suspend it for 10 seconds.

Monitoring the load cycles confirmed that this was part of the problem, and the drives were now staying in low-power mode for longer. The only viable solution at this point was to disable the hard disk temperature monitoring.
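
Making that permanent is straightforward; a sketch, assuming the stock Ubuntu init scripts and collectd configuration paths:

# Stop hddtemp from starting at boot (on older sysv-rc, "update-rc.d -f hddtemp remove" instead):
update-rc.d hddtemp disable

# And stop collectd complaining about the missing daemon by commenting out
# the "LoadPlugin hddtemp" line in /etc/collectd/collectd.conf, then:
/etc/init.d/collectd restart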

Another 4 months later and the load cycling had definitely slowed down:

root@orbit:~/# for dev in /dev/sd[a-d]; do smartctl -a $dev | grep Load_Cycle_Count; done;
193 Load_Cycle_Count 0x0032 001 001 000 Old_age Always - 713744
193 Load_Cycle_Count 0x0032 001 001 000 Old_age Always - 682175
193 Load_Cycle_Count 0x0032 001 001 000 Old_age Always - 691961
193 Load_Cycle_Count 0x0032 001 001 000 Old_age Always - 675499

Calculating (713744-611032)/(14807-11688), where the hour figures are the drives' Power_On_Hours at the two readings, gives 33 load cycles per hour. This is back at the rate the counters were advancing when the drives were first installed. It's better, but still a concern. All 4 drives are now at about 2.5 times the manufacturer's rating and appear to be working fine. Higher values have been reported in various forums, so I'm not panicking just yet.

Update June 2010:

To monitor the load cycles periodically, the counters can be output in CSV format:

root@orbit:~/# A=`date +\%s` && for dev in /dev/sd[a-d]; do A=$A,`smartctl -a $dev | grep Load_Cycle_Count | awk '{print $10}'`; done && echo $A
1277268248,816341,776081,786689,769881
root@orbit:~/# A=`date +\%s` && for dev in /dev/sd[a-d]; do A=$A,`smartctl -a $dev | grep Load_Cycle_Count | awk '{print $10}'`; done && echo $A
1277268315,816342,776082,786690,769882

The first number is the current Unix timestamp (change the '+%s' for a more human-readable format, e.g. '+%F %R' gives '2010-06-23 16:10'). The next 4 numbers are the load cycle counts for sd[a-d]. To record the values once every hour, on the hour, I added the following line to the end of the /etc/crontab file:

00 * * * * root A=`date +\%s` && for dev in /dev/sd[a-d]; do A=$A,`smartctl -a $dev | grep Load_Cycle_Count | awk '{print $10}'`; done && echo $A >> /var/log/hdd.csv

Note the backslash-escaping of the '%' in the date format: cron treats an unescaped '%' in a command as a newline. The CSV readings are appended to the file /var/log/hdd.csv every hour. Note that running this will itself take the drives out of low-power mode and increment the cycle counter... I can hear all 4 drives clicking when this job runs.
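
The hourly log also makes it easy to spot-check the current rate. For example, a rough cycles-per-hour figure for sda from the newest two samples (a quick awk sketch, not part of the original cron job):

# Cycles per hour for sda, from the last two lines of the CSV log:
tail -n 2 /var/log/hdd.csv | \
    awk -F, 'NR==1 {t=$1; c=$2} NR==2 {printf "%.1f cycles/hour\n", ($2-c)/(($1-t)/3600)}'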

Update July 2010:

Further investigation revealed that the swap partitions were causing the rest of the excessive load cycling:

root@orbit:~# cat /proc/swaps
Filename Type Size Used Priority
/dev/sda2 partition 979960 229844 0
/dev/sdb2 partition 979960 230644 0
/dev/sdc2 partition 979960 234002 0
/dev/sdd2 partition 979960 230882 0
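
The swap activity itself can be watched directly: vmstat's si/so columns show pages swapped in and out per second, so non-zero values while the array is otherwise idle point straight at swap. (This command is an illustration; it was not part of the original tests.)

# Watch swap-in (si) and swap-out (so) activity over 60-second intervals:
vmstat 60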

The pictures below say it all. The server was only being used lightly during this period:

The excessive load cycles were stopped by flushing the swap partitions:

swapoff -a && swapon -a
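
Flushing swap like this is only a temporary fix, since the kernel will gradually push pages back out. Two longer-term options are to make the kernel less eager to swap, or to move swap off these drives entirely (which is what eventually happened here: the comments below mention swap living on the SSD). A sketch of the first option, not something the original post itself tuned:

# Reduce the kernel's tendency to swap (the default is 60); takes effect immediately:
sysctl -w vm.swappiness=10

# Make it persistent across reboots:
echo "vm.swappiness=10" >> /etc/sysctl.conf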


4 comments:

  1. Great article, I was having the same problem. I ended up using WDIdle3 on another computer to disable head parking.

  2. I agree with Brian. I just learned about Load_Cycle_Count this morning and checked it on a drive I had been running for almost two years. The drive is a WD 2.5" that I use in my Linux router machine (it runs 24 hours a day) and it has always made the bzzz-tick noise. I RMA'd the first drive to Newegg right after I bought it because of the noise; when I got another drive that did the same thing I figured it was normal (but very annoying). This morning, using WDIdle3, I was able to disable the bzzz-tick, which is, apparently, the noise the drive makes when it parks or unparks the heads (a load cycle). I used smartctl to check my Load_Cycle_Count and it is almost 3 million! My drive was set up for load-cycle at 800ms, and it performed an average of 185 load cycles an hour over the last ~2 years! :( It appears to still be working and, hopefully, it will continue to work.

  3. I would love to know where you stand at the moment. I'm in the same place you are and have the drives at about 700k.

    This also might help http://forums.storagereview.com/index.php/topic/29253-newer-western-digital-hdd-head-parking-and-you/

  4. Sure, 2.5 years later and these 4 WD drives are still running fine. For the last 2 years I have had the main Ubuntu OS running from a separate SSD, including swap.

    Since I assumed these rarely used drives had been idling in low-power mode for the last 2 years, I was very surprised to see:

    # for dev in /dev/sd[b-e]; do smartctl -a $dev | grep Load_Cyc | awk '{print $10}'; done;
    2627999
    2604655
    2619279
    2670247

    About 2.6 million cycles each, or 8.6 times their rating! I now have the RAID5 array backed up to an external 3TB drive.
