Disk Drive Considerations: Buying a new system or upgrading? Here are some new insights into a "commodity product" most of us (even ebay technical directors?) thought we understood .,. well at least the physicists can probably explain it .,.
Last Updated 11/25/06
Disk storage is often overlooked and might not be that well understood. [1] From an Architectural, Computer Science Engineering and/or Materials Science perspective things can get complicated (choke - probably a big understatement). Luckily there are some simple things any ebay-er can do, perhaps even ebay's operations and engineering staff. [2][5] Alternatively, ask a staff physicist to explain what's behind Disk Drive OEM[8] thin film platter patents and quantum magnetic 1/2 spin (yikes!!!). Simply put, Entropy is not our friend .,.
Heat is a big enemy of disk drives. Every Disk Drive needs at
least one fan blowing directly on it and should never become hot to the
touch. This can even be done for a laptop by using a three fan
USB cooling plate underneath (Search for "Laptop Cooling Pad",
"Notebook Cooler Pad" or my favorite "Super Quiet 3Fan Notebook
Cooler"). Best Laptop/Notebook Money you will ever spend.
You will need a good AC power supply and/or batteries. Also, if
you have to run a USB hub, make sure it supplies 5 volts; some of the
newer ones (Adaptec 1:4) output 3 volts and that's just not enough to
drive the fans.
Most of us know an 8MB cache is faster than a 2MB one and that a 10000 rpm drive is faster than a 5400 rpm drive. However, bigger and/or faster drives tend to put out more heat such that cooling becomes a very big issue when data is consistently moved on or off the drive for a period longer than several minutes.
Disk drives are most likely to go bad during their first six months of service (or after about five years), so buying new drives can create a lot of trouble if they are put to use before their probation burn-in period is up (plan on running systems in parallel for six months if possible). The consequence of ignoring this can be losing everything. But take heart; there are ways to recover most data even from this type of failure (TestDisk can fix Partition/MFT Corruption to some extent, as can a number of other tools).
For example, my last “lightly used” drive off ebay had been in service for only a month. This may not be as good a thing as one might think. As it turns out, a drive with 3-6 months of use is much more reliable, and that is what normally comes back from the OEM when a drive goes bad under warranty.
Large Sites can recycle OEM drives (with 3 year warranties). Not a bad idea but probably not an optimal one for most serious ebay-ers, small businesses and professionals. Most of us just want them to work without unpleasant surprises including loss of performance (like cpu speed cut because of excess disk drive heat).
The type of system selected makes a big difference. All non-server Microsoft Systems are known to be prone to things like not reporting disk errors, dysfunctional chkdsk behavior and other strange bit rot issues. Almost all other Operating Systems (Unix, Linux, Mac OS X and other variants) do not have this problem.
Planning disk drive needs and layouts is usually educated guess work. Typical Desktops need 20GB Bootable Partitions and 40GB Data Partitions with the expectation that their nominal usage should be about 50% full. Around 85% full can be a bad thing with Windows because corruption is much more likely since defrag and chkdsk will not run. About 10GB of hard disk data will fit nicely on a DVD with light (but still fast) compression. Backing up a 400GB data drive may best be done to another (if not cheaper) drive (or drives) that might be slower, older and have had more than its fair share of sector re-allocations and disk wipes. The backup drive(s) can even be stored off-line or near-line. Off-line disk failure is possible, so a second disk or tape drive may be a good idea for really important data (redundancy is our friend .,.).
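The 50% and 85% figures above can be turned into a quick check. This is only a minimal sketch: the function names and the "ok / above nominal / risky" labels are made up for illustration, and the thresholds are simply the rule-of-thumb numbers from this guide, not anything an OS or drive maker publishes.

```python
import shutil

NOMINAL_FULL = 0.50  # this guide's nominal target: about 50% full
RISKY_FULL = 0.85    # this guide's warning level for Windows partitions


def classify_usage(used_bytes, total_bytes):
    """Label a partition as 'ok', 'above nominal' or 'risky' (hypothetical labels)."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    fraction = used_bytes / total_bytes
    if fraction >= RISKY_FULL:
        return "risky"
    if fraction > NOMINAL_FULL:
        return "above nominal"
    return "ok"


def check_path(path):
    """Classify the partition that holds 'path' using the standard library."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return classify_usage(usage.used, usage.total)
```

For example, `check_path("C:\\")` on Windows or `check_path("/")` elsewhere would report on the boot partition.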
A new system with an internal 16x DVD can probably back up 100GB data partitions in a reasonable amount of time across about 5 optical disks (assuming 50% usage and data that compresses well). As partition size grows there is an increased risk that an error will occur on the Hard Disk or even on the DVD set (unless it is archival, which can be as expensive as the Hard Drive). Backing up important data redundantly is as good a practice now as it was twenty years ago.
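The arithmetic behind "about 5 optical disks" can be sketched as follows. The function name and its defaults are illustrative assumptions; the 10GB-per-DVD figure is this guide's own estimate for lightly compressed data and will vary with how well the data compresses.

```python
import math


def dvds_needed(partition_gb, used_fraction=0.5, data_gb_per_dvd=10.0):
    """Estimate how many DVDs a partition backup needs.

    partition_gb    -- partition size in GB
    used_fraction   -- how full the partition is (guide assumes 50%)
    data_gb_per_dvd -- GB of hard disk data per DVD after light
                       compression (guide's estimate, an assumption)
    """
    used_gb = partition_gb * used_fraction
    return math.ceil(used_gb / data_gb_per_dvd)


print(dvds_needed(100))  # 100GB partition, half full
print(dvds_needed(400))  # 400GB data drive, half full
```

Under the same assumptions a half-full 400GB drive works out to 20 DVDs, which is why backing up to another hard drive starts to look better as sizes grow.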
Redundancy is our friend.
Using a single IDE or SATA drive (or even SCSI) is reckless since (in the event of a disk failure) the system will be useless until a second physical drive is installed and booted from. If there is no image backup of the down drive to restore, everything on the new drive (including the OS) will need to be re-installed. If two drives are being used and the boot drive fails, then booting from the second one is all that is needed to have the system working again.
Drives without a fan blowing directly on them have been known to fail in short order and to fail at higher rates over time. In geek-speak this means more sector re-allocation as a minimum and more "clicks of death". Drives should never become hot to the touch. Having two or more bootable physical drives that are running hot can be worse than just one because twice the heat is confined in the same space.
Bootable and Data partitions need to be backed up to a secondary drive about once a month (or sooner if important data has been added but not backed up). Trying to use non-server Microsoft software to do this is risky since it has never worked (nor probably ever will). SCSI drives have better error reporting but are more expensive than IDE (PATA) and SATA. [3] Both IDE and SATA can be used with a RAID PCI HBA card and be Mirrored. RAID, although safer, is probably going to cost as much as or more than a SCSI setup.
Fibre Channel drives are even better in terms of performance and
reliability but are very expensive and out of scope here. Their
HBAs are typically part of large HPC (High Performance Computing)
installations. Some favorites are Emulex, Mylex, Infiniband and
others. Not for the typical ebay-er, but it can be useful to understand
that CPU offloading is very important when performing high speed/high
volume DMA and/or DISK data transfer.
Unfortunately Controller Cards (or HBAs) with CPU offloading tend to be a lot more expensive than those that do not. [4]
A welcome side effect of the DiskWipe procedure is that it may increase performance. This is the lowest level of maintenance recommended (perhaps with the exclusion of Disk and/or Partition Table Recovery and Forensics or factory low level prep). Some specific examples are mentioned here but that's not to say there aren't many other options that will work just as well.
Quick Formatting has its place along with other utilities and tools like FreeDOS, TestDisk and Bootmanagers. FreeDOS is mentioned for two reasons (and only as an example):
1.) Many Microsoft DOS versions are known to be defective as boot disks
2.) FreeDOS is known to work, is Open Source, and hosts
- TestDisk (Disk/Partition recovery program)
- Ghost (Fast Partition to Partition and Archiving Program, 10 Minutes or faster for 10GB of data)
- Gdisk (Low Level (DoD selectable) DISKWIPE program and MBR manager that comes with Ghost)
After a DiskWipe is performed on 100% of the disk,
destroying all information, the disk may need to be prepared for
formatting by the OS. In some cases only OEM utilities (and/or
developer libraries) that are drive specific can do this. However,
anything not certified for that purpose and that specific hardware can
destroy the disk.
Once a disk is Wiped, OEM prepared for file systems, has its Disk Signature written and is Formatted by the OS, Spinrite can be used to scan for and remove all sectors that might go bad in the immediate future (if OEM diagnostics are not available to do the job). Subsequent to this, a quick check with SMART[6] to see if the sector reallocation count has changed is a good idea (if it has, the drive is not ready for prime time and may need further exercising and scanning or even need to be replaced).
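The "has the reallocation count changed" check can be sketched in a few lines. This assumes the SMART attribute table is available as text in the layout printed by `smartctl -A` from the smartmontools package (a tool this guide does not otherwise cover); the two sample tables below are made up for illustration, and the function names are hypothetical.

```python
# Made-up sample attribute tables in the smartctl -A layout (assumption).
SAMPLE_BEFORE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       812
"""

SAMPLE_AFTER = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       8
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       830
"""


def reallocated_count(attribute_table):
    """Return the raw Reallocated_Sector_Ct value (attribute 5), or None."""
    for line in attribute_table.splitlines():
        fields = line.split()
        # Attribute rows have 10 columns; the raw value is the last one.
        if len(fields) >= 10 and fields[0] == "5":
            return int(fields[9])
    return None


def count_changed(before_table, after_table):
    """True if the reallocation count differs between two readings."""
    return reallocated_count(before_table) != reallocated_count(after_table)


if count_changed(SAMPLE_BEFORE, SAMPLE_AFTER):
    print("Reallocation count changed: drive is not ready for prime time")
```

Take one reading before the scan and one after; if the counts differ, the drive needs more exercising and scanning, as described above.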
Some Drives (Travelstar) may not need pre-formatting initialization after low level wiping. The Format option in Disk Manager will be grayed out so right click on the red dot and select Initialize Disk to write a Disk Signature. Now you can right click in the newly Windows initialized disk to format it.
Using a Long Format (not a quick one) is a good idea because
it covers the entire disk and works in the background while Windows
and/or some other operating system is running from another
drive. Again, use SMART[6] to check for any more sector
reallocation.
Any time reallocation of sectors starts to occur in significant
amounts it's a good idea to consider disk wiping, long formatting and
media scanning (perhaps with Spinrite). However, once the drive exceeds
the maximum recommended SMART limit (or has over 4-5 years of heat dependent spin time) it
might be a good idea to consider replacing the drive and/or contacting
the OEM and asking if it might have reached the End of its Useful Life.
When a drive is going bad quickly it will (except for non-server
Microsoft Systems) be reported in the System Error Log which should
always be monitored on a regular basis. If under warranty it can
be sent back, but only if you have two or more drives.
Since TWO DRIVES ARE THE
RECOMMENDED MINIMUM it might make sense to update the DVD
backups and Diskwipe the bad drive after doing a complete low level
media scan (with Spinrite). Removing Bad Sectors from the good
sector pool should be done as soon as possible. These bad
magnetic domains should never be used again.
By doing Media Scanning and Disk Wipe maintenance you will have your
second drive available in a day or two instead of waiting for it to
show up in the mail. It also makes sense to do this if the drive
has already been spinning for six months but less than half of its warranty
period. If the drive is sent back it is a good time to think
about adding more fans.
Enough good things cannot be said for Defragging and CheckDisk, except under non-server Windows. If they are run frequently (as they should be) there is a likelihood the machine will experience an Under or Over Voltage condition while one of them is running (in the US because of storms, heat waves and sub-standard equipment). If chkdsk or defrag is running when this happens the target disk will most likely be made unusable. Drive data can be restored (using TestDisk) but an OS restored in this manner should not be trusted, just as any OS compromised by a Virus or exhibiting strange behavior should have its disk wiped 100%.
It is always better to be safe than sorry. Wiping the Disk Drive every year or so should allow Magnetic Domains (or the now somewhat ill termed grains) to be formed on areas that are still performing well after a year's worth of service (perhaps under less than ideal conditions). Varying high frequency flux is known to hasten Magnetic Entropy in thin films. The certified OEM utilities should be able to test the suitability of the on-the-metal magnetic domains and their GMR directly before preparing the platters to accept the higher level of formatting needed by Unix/Linux and Windows file systems.
There is something in the industry referred to as "Clearing a format degraded condition" which is more like discarding thin film areas on the drive that for whatever reason (perhaps some of the same things that contribute to poor chip yields) have too many electrons that can no longer spin in the correct direction on cue.[7]
Other types of Maintenance for Disk Drives are:
- Blow the dust out (of fans, box, etc.) once a year (it's conductive and can clog the disk drive air filter, perhaps resulting in a head crash).
- Replace the CMOS battery with a Lithium one and mark with a sharpie the month and year it was installed. If the battery dies during a chkdsk the disk may get trashed.
- Boot into a FreeDOS diskette and check all Disk Drive cables by gently wiggling them (only while in FreeDOS). If you hear a clicking sound the cable is probably bad. Avoid strain when possible and dump any ribbon cable (IDE/PATA) for the round type.
- Use SMART to monitor any and all errors. Small increases in errors over the years are not cause for alarm. But a sudden increase in errors may warrant contacting the factory for more information or (if max values are exceeded) perhaps replacing the drive early. For laptops there is a metric for bumps that have caused a re-seek, perhaps the laptop drives have an internal accelerometer.
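The difference between a slow creep in errors and a sudden jump can be sketched as a comparison of successive SMART readings. The function name and the `max_step` threshold of 10 are arbitrary assumptions for illustration only; the drive maker's own limits should take priority where they are known.

```python
def sudden_increase(history, max_step=10):
    """Return True if any two consecutive readings differ by more than max_step.

    history  -- cumulative error counts, oldest first (e.g. one per month)
    max_step -- largest jump treated as normal wear (arbitrary assumption)
    """
    steps = [later - earlier for earlier, later in zip(history, history[1:])]
    return any(step > max_step for step in steps)


slow_creep = [0, 1, 1, 2, 3]   # small increases over the years
big_jump = [0, 1, 2, 40]       # sudden increase between two readings

print(sudden_increase(slow_creep))
print(sudden_increase(big_jump))
```

A slow creep like the first list is not cause for alarm, while a jump like the one in the second list is the kind of change that may warrant a call to the factory.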
Notes
[1] Pesky sub-atomic electron-spin-1/2 (sputtered
materials modeled with quantum physics and other nano/pico meter scale)
details without which the thin-film platters of today (starting around
20GB and up) would not be possible. Uncertainty from gamma rays, unstable solid state quantum scale structures, electrical storms that only Tesla would like (among other things) can make undetected/uncorrected/unreported bit errors a system safety hazard.
[2] An ideal HPC storage site would be well advised to use
drives that can have the electronics separated from the motor drive
housing by a few inches so that the maximum amount of surface area
would be exposed to the forced air cooling fans. Stable
Temperature, Humidity, RF noise, transport (LVDS) are some critical considerations.
[3] One advantage SATA BIOS has over IDE BIOS is that if
Physical Drive 0 blows up the box does not have to be opened and
another drive placed in the master position on the cable. This assumes
the IDE drives are auto-detect jumpered, the user knows how to specify
a PCI scan on boot and a preferred boot drive in the BIOS (this is
actually pretty easy these days).
More information will hopefully be made available later but would need to be in another guide since this one is about to exceed its 20K character limit. Begin Sidebar [eBay is really tight when it comes to storage space, For more see source. ] End Sidebar
Besides creating Disk Image Backups on a regular basis (to
another HDD and DVD), SMART[6] monitoring is required (utilities make
this easy). Running Spinrite and Disk Wiping (including the MBR, MPT
and MFT) once a year (with the subsequent Low and High level format) might be discussed further (in another guide).
[4] The Microsoft 2003 Scalable Network Pack (TCP Chimney Offload, NetDMA) seems to be the latest effort to maintain parity with Unix/Linux in terms of HPC capabilities.
[5] Disk Drive
"Partner or System Integrator" OEM specific utilities in the right
hands can be very useful to large installations and/or obtaining high
performance/reliability. About 75% of Disk Drives sent back to
the OEMs are re-deployed. Wiping, Low Level Initializing, Writing a
new Disk Signature and Formatting will help keep most drives healthy
(and statistically dispatch Pareto's red x[9]).
[6] Self Monitoring Analysis & Reporting Technology
[7] Several magnetic failures (from heat, excessive polarity reversals, poorly understood quantum shotgun failure distributions in the worst possible places) can render directories, operating systems and drives inoperable while the actual stored data is fine.
Many utilities and diagnosis systems cannot determine these extremely random and intermittent failures. The fix is to wipe the disk totally clean, re-initialize and re-format the drive allowing questionable magnetic domains to be permanently marked as bad and basically removed from the system.
Other failures may include (but are not limited to) incorrect operation during shutdown procedures, power failure or spikes, under voltage, poor ground, excessive media failure and/or fragmentation, interrupted checkdisk or defrag procedure, disk full and random write corruption, poor quality software, misconfiguration, atmospheric and solar storms and, last but not least, excessive heat.
[8] Hitachi, Western Digital, Maxtor, etc.
[9] 80-20 rule: 80% of the problems are caused by 20% of the defects. If we only have 5 defects then fixing the really bad one will dispatch 80% of the heart burn from the infamous "Degraded Format".