(Topic ID: 164195)

OT, anyone really good with computers?

By balboarules

7 years ago



Topic Stats

  • 48 posts
  • 28 Pinsiders participating
  • Latest reply 7 years ago by Tickerguy
  • Topic is favorited by 1 Pinsider



    You're currently viewing posts by Pinsider Tickerguy.

    #44 7 years ago

    Avast is arguably the best "free" AV program out there, with minimal (but not zero) nag-screen intrusion. Norton is not only trash, it's harmful. Get that crap away from any computer you care about.

    NEVER, EVER use Combofix unless you (1) know what you're doing and (2) are desperate. It *can* fix some nasty problems, but it can also trash your system's integrity, and if it does the latter there's no coming back. It's a last-ditch attempt to save you from a reload. Malwarebytes is *much* safer.

    Never, ever trust your data if it's only in one place. Devise, *test* (to make sure it works) and use a backup system of some sort. If you care about your information at all, that system must include off-site storage on some regular basis (e.g. rotating physical media.) I've got data on my systems from the 1980s and I've yet to lose it despite having suffered multiple malfunctions over the years, and this is why. Either do this right or you *will* eventually lose something you care about.
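The "test it" part is the step people skip. A minimal sketch, assuming the backup is a plain directory copy (not an archive or a deduplicated store): hash every file in the source tree and compare it with its counterpart in the backup. `verify_backup` and `sha256_of` are hypothetical helper names, and this only checks the copy; it is no substitute for actually doing a test restore.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files don't need to fit in RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(source: Path, backup: Path) -> list[str]:
    """Return a sorted list of files missing from, or different in, the backup."""
    problems = []
    for src in sorted(p for p in source.rglob("*") if p.is_file()):
        rel = src.relative_to(source)
        dst = backup / rel
        if not dst.is_file():
            problems.append(f"missing: {rel.as_posix()}")
        elif sha256_of(src) != sha256_of(dst):
            problems.append(f"differs: {rel.as_posix()}")
    return problems
```

An empty list means every source file has a byte-identical copy; run it against the off-site copy too, not just the local one.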

    An SSD is an excellent option and should be first on virtually anyone's list, but NEVER defrag one. SSDs gain nothing from defragmentation, and it costs you a material percentage of the drive's life. In addition, TRIM has to be on, and it should be if you do the move to the SSD properly (or a fresh install onto it.)
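On Linux you can at least check whether the kernel sees a disk as an SSD and whether the device advertises TRIM (discard) support, via the sysfs attributes `queue/rotational` and `queue/discard_max_bytes`. A sketch, assuming a Linux sysfs layout; note that a device *advertising* discard doesn't mean TRIM is actually being issued, which depends on the filesystem's `discard` mount option or a periodic `fstrim` run. `ssd_trim_report` is a hypothetical name.

```python
from pathlib import Path


def ssd_trim_report(sys_block: Path = Path("/sys/block")) -> dict[str, dict[str, bool]]:
    """For each block device, report whether the kernel treats it as
    non-rotational (SSD-like) and whether it advertises discard/TRIM."""
    report = {}
    if not sys_block.is_dir():
        return report  # not Linux, or sysfs not mounted
    for dev in sorted(sys_block.iterdir()):
        queue = dev / "queue"
        try:
            rotational = (queue / "rotational").read_text().strip() == "1"
            discard_max = int((queue / "discard_max_bytes").read_text().strip())
        except (OSError, ValueError):
            continue  # attributes missing or unreadable for this entry
        report[dev.name] = {"ssd": not rotational, "discard": discard_max > 0}
    return report
```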

    Be aware that "consumer" SSDs (that's most of them) are NOT power-protected. If you lose power unexpectedly, and that includes hitting the switch without shutting down first, they can be (read that as "usually will be") corrupted silently, *including* data that was not being written at the time. The reasons for this are a bit complex but have to do with the fact that to write a block that once held data you must read/erase/write, and the units in which this is done are MUCH bigger than what you usually want to write (typically 4MB or more.) This means that data that was not being written has to be read and then re-written, so "static, at-rest" data is ALWAYS at risk when a write is being done.
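A toy model of the read/erase/write cycle described above, using the post's simplification that the drive rewrites an erase block in place. Real controllers usually write the new copy to an already-erased block and remap, so treat this as an illustration of why at-rest data is exposed, not as any particular drive's firmware. `ToyEraseBlock` and `PAGES_PER_BLOCK` are hypothetical.

```python
PAGES_PER_BLOCK = 4  # toy value; the post cites real erase units of 4MB or more


class ToyEraseBlock:
    """A toy erase block: a page can only be changed by erasing the whole block."""

    def __init__(self, pages):
        assert len(pages) == PAGES_PER_BLOCK
        self.pages = list(pages)

    def rewrite_page(self, index, new_data, power_fails_after_erase=False):
        snapshot = list(self.pages)             # 1. read the whole block into drive RAM
        snapshot[index] = new_data              #    patch the one page being written
        self.pages = [None] * PAGES_PER_BLOCK   # 2. erase: every page is now blank
        if power_fails_after_erase:
            return                              # the RAM copy dies with the power
        self.pages = snapshot                   # 3. write the whole block back
```

Only page 1 is being written, yet an interruption between steps 2 and 3 takes the unrelated pages down with it:

```python
blk = ToyEraseBlock(["dir-entry", "file-A", "superblock", "file-B"])
blk.rewrite_page(1, "file-A v3", power_fails_after_erase=True)
print(blk.pages)  # [None, None, None, None]
```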

    But... the Intel 730 series SSDs, which are reasonably priced, DO include power protection, and I KNOW it works because I have software that can load-test it "in anger" and have verified that it in fact loses nothing on a cord-pull. IMHO that makes them worth the (small) extra scratch and the (small) loss in performance compared with the fastest SSDs. They are suitable not only for home use but also for light-duty professional applications (e.g. data centers and similar), and as such I won't buy anything lesser even for a personal desktop machine. The 240GB model is about $150 on Amazon, which is a very nice price for a fast and *safe* device. There is also a 480GB model for about $250.

    1 week later
    #46 7 years ago
    Quoted from markmon:

    @Tickerguy: the SSD data should only be at risk as the drive fills up. The OS should use free sectors before doing a read-erase-write cycle, since it's so much faster to avoid the erase.

    This is NOT true and assuming it is will leave you crying sooner rather than later.

    First, the OS doesn't have any idea what the drive is doing behind its back. It used to be that a block #12345 was easily mapped by cylinder/head/sector to a given location on a disk. This stopped being necessarily true for spinning rust in some cases but it was never true for any solid-state device.
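To make the contrast concrete: on an old disk, a block number mapped to a physical location by pure arithmetic, while an SSD keeps a table that moves every time the block is rewritten. A sketch assuming a classic 16-head, 63-sectors-per-track geometry (an illustrative choice) and a hypothetical, append-only `ToyFTL` that ignores garbage collection.

```python
def chs_to_lba(c, h, s, heads=16, sectors_per_track=63):
    """Fixed geometry mapping; sector numbers start at 1."""
    return (c * heads + h) * sectors_per_track + (s - 1)


def lba_to_chs(lba, heads=16, sectors_per_track=63):
    c, rem = divmod(lba, heads * sectors_per_track)
    h, s0 = divmod(rem, sectors_per_track)
    return c, h, s0 + 1


class ToyFTL:
    """SSD-style indirection: logical block -> wherever it was last written."""

    def __init__(self):
        self.table = {}          # logical block number -> physical page number
        self.next_free_page = 0  # naive append-only allocator (no garbage collection)

    def write(self, lba):
        self.table[lba] = self.next_free_page  # every rewrite lands somewhere new
        self.next_free_page += 1
        return self.table[lba]
```

`lba_to_chs(12345)` always gives `(12, 3, 61)`, but on the toy FTL block 12345 lives at a different physical page after every write, and only the drive's table knows where.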

    NAND flash has a certain number of erase/write cycles that it can take before it fails to be reliable. For this reason the drive "levels" writes; it intentionally will write to a different place each time you change a given individual block and then update its internal mapping tables. If you change data in units smaller than the drive's allocation size (which is MUCH larger than your filesystem allocation size) this forces what is called "write amplification"; that is, the impact on drive life of a 5-byte write is exactly the same as that of a 4MB write, since blocks can only be erased (and thus re-written) in 4MB chunks. The reason for this restriction is that NAND flash can only be written to "0"; you cannot write a "1". When you erase a block it is set to all "1"s and you physically write the "0"s.
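The "can only write 0s" rule and the amplification arithmetic can both be shown in a few lines. A sketch in which one byte stands in for eight NAND cells, and `write_amplification` uses the post's worst-case assumption that every write costs a whole 4MB erase block. Real drives program in smaller pages and batch writes, so measured amplification is usually lower than this bound. `program`, `ERASED` and `write_amplification` are hypothetical names.

```python
ERASED = 0xFF  # an erased cell reads as 1; one byte stands in for 8 cells


def program(old: int, new: int) -> int:
    """Programming can only pull bits from 1 to 0, so the result is old AND new."""
    return old & new


def write_amplification(bytes_requested: int, erase_block_bytes: int = 4 * 1024 * 1024) -> float:
    """Bytes physically rewritten per byte requested, assuming (worst case)
    that every write rewrites whole erase blocks."""
    if bytes_requested <= 0:
        raise ValueError("bytes_requested must be positive")
    blocks = -(-bytes_requested // erase_block_bytes)  # ceiling division
    return blocks * erase_block_bytes / bytes_requested
```

`program(0b1010_0101, 0b0101_1010)` yields `0`, not the new value: without an erase back to all 1s, the old bits can't be restored. And under the worst-case model a 5-byte write costs about 840,000 times its own size.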

    That in turn means that the premise that you're "safe" until the drive gets close to being full is flat-out false. In fact, because of the amplification issue above, read/relocate/write cycles start happening when only a small percentage of the SSD has been allocated, and directory updates (access and modification times on files, for example) happen very frequently, yet those writes are tiny in size.

    For this reason *all* writes are dangerous on an SSD that is not power-fail protected. Because the drive has no knowledge of filesystem organization, it can make no attempt to group or separate different types of data (e.g. file data itself, file-level metadata such as directory entries, and filesystem metadata such as superblocks, block free lists, etc.) The drive concerns itself with leveling the wear on its NAND, nothing more or less, since reading one block of NAND is just as fast as reading any other (there's no physical head to move or rotational latency to wait for.) But since the drive has no idea what a given 512-byte write is (directory entry, filesystem metadata or file data itself) and cares about wear leveling rather than keeping block X physically next to block X+1 (or X-1), what else sits in that same drive-level allocation block could literally be anything.

    If power is lost when you are updating a block in a file and the drive is not power-fail protected this means that you may lose a material chunk of *unrelated* file data, an *unrelated* directory entry (!!) or even worse, *filesystem superblock data* rendering the entire filesystem unusable. The worst part of this is that the operating system has no idea it happened, and it will only be discovered later when you try to access that which is either corrupted or just flat-out gone.

    That firms sell drives with this design "feature" is an outrage, but they do. What's particularly dangerous is that simply holding the power button on a laptop or desktop machine that appears to have hung (forcing a hard power-off) is enough to screw you, with no way to know that it happened. Depending on exactly what sort of corruption occurs, the consequences can stay hidden for a very long time. For example, a block (or worse, LOTS of blocks, potentially millions of filesystem *blocks*, not bytes!) that is actually in use from the OS perspective may not be marked as allocated in the filesystem's on-disk allocation tables. If that happens the system may blithely scribble all over existing file data from that point forward, and you are unlikely to discover the destruction of your stored information for weeks, months or even years.
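The lost-allocation failure can be illustrated with a toy allocator. This is a hypothetical model, not any real filesystem's on-disk format: the free-space bitmap loses its "in use" marks for blocks that still belong to a file, and the next allocation silently hands them out again. Nothing raises an error; the damage only shows up if you look for blocks claimed by two files, which is roughly what fsck-style checkers search for.

```python
class ToyFilesystem:
    """Toy filesystem: a free-space bitmap plus a file -> block list map."""

    def __init__(self, n_blocks):
        self.in_use = [False] * n_blocks
        self.files = {}

    def allocate(self, name, count):
        free = [i for i, used in enumerate(self.in_use) if not used][:count]
        if len(free) < count:
            raise OSError("no space left")
        for i in free:
            self.in_use[i] = True
        self.files.setdefault(name, []).extend(free)
        return free

    def cross_linked(self):
        """Blocks claimed by more than one file, i.e. silent overwrites."""
        owners = {}
        for name, blocks in self.files.items():
            for b in blocks:
                owners.setdefault(b, []).append(name)
        return {b: names for b, names in owners.items() if len(names) > 1}
```

```python
fs = ToyFilesystem(8)
fs.allocate("photos.jpg", 3)   # blocks 0, 1, 2
fs.in_use[0:3] = [False] * 3   # bitmap update lost at power-off
fs.allocate("new.log", 2)      # blocks 0 and 1 handed out again
print(fs.cross_linked())       # {0: ['photos.jpg', 'new.log'], 1: ['photos.jpg', 'new.log']}
```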

    Some SSDs advertise "partial" power-fail protection. Those devices hold enough stored energy to make sure their internal mapping tables can be written if the power goes off unexpectedly, but not enough to flush file data. Such drives are still unsafe; they protect against a catastrophic "lose everything on the drive" failure in the event of an unanticipated power loss but not against silent filesystem corruption.

    Even block-level checksummed filesystems (e.g. ZFS on Unix machines) do not defend against this sort of problem. They are more likely than filesystems without block-level checksums to detect it reasonably soon after the damage occurs, but detection doesn't help you if the data has already been destroyed. A parity-style RAID arrangement with multiple physical volumes *might* save you, since it's unlikely the same file or metadata blocks will be damaged on more than one physical device at the same time, but few people use that sort of arrangement on personal systems, and relying on it to detect and rebuild from such a failure is foolish.
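The difference between detecting and repairing: a checksum tells you *which* block is bad, but only redundancy can put it back. A sketch with CRC32 standing in for whatever checksum the filesystem actually uses, and byte-wise XOR parity as in a RAID-5 style stripe. `xor_blocks` and `repair` are hypothetical helpers; with single parity, two damaged blocks in the same stripe are unrecoverable.

```python
import zlib
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks (the RAID-5 style parity operation)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def repair(data_blocks, checksums, parity):
    """Use per-block CRC32 to find a single bad block, then rebuild it from parity."""
    bad = [i for i, blk in enumerate(data_blocks) if zlib.crc32(blk) != checksums[i]]
    if not bad:
        return list(data_blocks)
    if len(bad) > 1:
        raise ValueError("more than one bad block; single parity cannot rebuild")
    i = bad[0]
    survivors = [blk for j, blk in enumerate(data_blocks) if j != i]
    fixed = list(data_blocks)
    fixed[i] = xor_blocks(survivors + [parity])  # A ^ C ^ (A ^ B ^ C) == B
    return fixed
```

Take away the parity block and the checksum still flags the damage, but there is nothing to rebuild it from.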

    Buy SSDs that have full power-fail protection, or make damn sure you have multi-level backups that you can restore from even if the destruction of your data is not detected for days, weeks or even months after it occurs, and that you're willing to suffer the rollback to the previous stable state (which could be days, weeks or months in the past) if it happens.
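One common way to get "multi-level" backups is a grandfather-father-son rotation: keep recent dailies, plus one snapshot per week and one per month going further back, so there is still a clean copy from before damage that went unnoticed for weeks. The post doesn't prescribe a scheme; this is a sketch of that common rotation, with arbitrary default counts and a hypothetical `snapshots_to_keep` function.

```python
from datetime import date


def snapshots_to_keep(snapshots, daily=7, weekly=4, monthly=12):
    """Grandfather-father-son retention over a list of snapshot dates.

    Keeps the newest `daily` snapshots, the newest snapshot in each of the
    last `weekly` ISO weeks, and the newest in each of the last `monthly` months.
    """
    ordered = sorted(set(snapshots), reverse=True)  # newest first
    keep = set(ordered[:daily])
    seen_weeks, seen_months = [], []
    for d in ordered:
        week = tuple(d.isocalendar()[:2])  # (ISO year, ISO week number)
        if week not in seen_weeks and len(seen_weeks) < weekly:
            seen_weeks.append(week)
            keep.add(d)  # first hit for a week is that week's newest snapshot
        month = (d.year, d.month)
        if month not in seen_months and len(seen_months) < monthly:
            seen_months.append(month)
            keep.add(d)
    return sorted(keep)
```

Anything not returned is a candidate for deletion; with twelve monthlies you can still roll back to a state from nearly a year earlier.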

    #48 7 years ago

    Nope, because the drive cannot erase a single physical/logical sector. NAND is only erasable in (much) larger drive allocation blocks, typically 4MB in size (expect this to get much larger as capacities go up, by the way, because it is a function of how the die is constructed and as density goes up it becomes easier to make the erase block size larger too.)

    This means that any time you write data in less than the drive's internal erase block size you are inherently reading and rewriting unrelated information that is not in the file you have open for writing. Said data is likely "at rest" and may have been at rest for a very long time. The drive has no way to know what it is; it might be directory information, a file's blocks, the filesystem free space bitmap or something else.

    In addition, the drive has a mapping table between physical (on-disk) allocation blocks and offsets and logical (as seen by the OS) sectors. That has to be updated too. If the power goes off while the mapping table and the data on the drive disagree, you're screwed. Remember that the *mapping table* is subject to the same problem; it too is on NAND flash and can only be rewritten in 4MB chunks! This set of interdependencies means that it is entirely possible for the mapping table to be damaged during an update (e.g. the power goes off while it's being written!), resulting in the destruction of huge amounts of data, quite possibly (and frequently) including everything on the device.
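The mapping-table problem comes down to ordering: the data and the table live in separate places and can't both change atomically. A toy sketch with a deliberately naive step order, chosen to show the failure; real firmware may order these steps differently, and the point is that you can't see from outside whether it does so safely. `update_block`, `nand` and `table` are hypothetical.

```python
def update_block(nand, table, lba, new_data, crash_at=None):
    """Toy, deliberately naive FTL update for one logical block.

    Step 1 programs the new copy into a fresh page, step 2 erases the old
    page, step 3 points the mapping table at the new page. `crash_at`
    (1, 2 or 3) stops after that step, as an unexpected power loss would.
    """
    old_page = table[lba]
    nand.append(new_data)          # step 1: new copy lands in a fresh page
    new_page = len(nand) - 1
    if crash_at == 1:
        return
    nand[old_page] = None          # step 2: old page erased (None = blank)
    if crash_at == 2:
        return
    table[lba] = new_page          # step 3: mapping table updated
```

A crash after step 2 leaves the table pointing at an erased page: the old data is gone and the new data is unreachable, even though both steps "worked."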

    Finally, the drives lie. They tell the OS that an operation is complete when the data has been changed in on-drive RAM, *not* when it has been committed and all the metadata updates in the drive are complete. fsync() is not supposed to return until the drive has committed everything. Spinning-rust drives sometimes honor this and sometimes don't; SSDs almost universally do *not* flush all their buffer memory, including their internal mapping table, to a consistent state before returning "complete" in this situation.
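For reference, this is what an application does to ask for durability on a POSIX-style system: write, then `os.fsync()` the descriptor. A minimal sketch (`durable_write` is a hypothetical name). It can only ask; if the drive acknowledges early, as described above, fsync() returns and the data still isn't safe. For a newly created file, the parent directory also needs an fsync for the directory entry to be durable, and on macOS fsync() does not force the drive to flush its cache (`fcntl.F_FULLFSYNC` is the call that requests that).

```python
import os


def durable_write(path, data: bytes):
    """Write `data` to `path`, then ask the OS to push it to the device."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:                      # os.write may write less than asked
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)  # request that file data and metadata reach the device;
                      # whether the drive truly persisted them is up to the drive
    finally:
        os.close(fd)
```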

    There's a little program running around called "diskchecker.pl" (it's a perl script) that will test all this on a pair of machines. The writer process runs on a machine that you leave powered; the I/O process runs on the machine you cord-pull. You start both and then while the writer is writing you yank the power cord on the I/O machine, then reboot it and restart the program on that box.

    When it comes back up the writer process gets the restart notification. It knows what the I/O machine *says* the drive had committed (because the I/O process had received a "complete" from the drive and passed it back to the writer) and therefore supposedly was complete; it then goes back, starting at the beginning of what it wrote, and verifies that every byte it wrote and got confirmation on is actually there.
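The verification step boils down to comparing what was *acknowledged* against what survived. This is not diskchecker.pl itself and doesn't follow its actual record format; it's a hypothetical Python re-creation of the idea, assuming fixed-size records at their own offsets, where a later write to the same offset supersedes the earlier one.

```python
def find_lost_writes(acked, read_back):
    """Return offsets whose last acknowledged payload did not survive.

    acked: list of (offset, payload bytes) in the order the drive confirmed them.
    read_back: the file contents as read after the reboot.
    """
    latest = {}
    for offset, payload in acked:
        latest[offset] = payload  # a later confirmed write supersedes an earlier one
    return [
        offset
        for offset, payload in sorted(latest.items())
        if read_back[offset:offset + len(payload)] != payload
    ]
```

Any offset in the result is data the drive said was committed but wasn't; a non-empty list on a cord-pull is exactly the failure being described.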

    Nearly *all* SSDs fail this test, and most fail it dramatically with data corruption *far* from where the I/O was when the cord was yanked. If you can run this thing a half-dozen times and see no corruption you're odds-on to be ok. If you can run it a hundred times you can be very confident. Most SSDs will fail on the very first attempt and a good part of the time they won't even come back up with a coherent filesystem on them after the cord is replugged.
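Why a hundred clean runs beats six: standard binomial reasoning (not from the post) says that if you see zero failures in n independent cord-pulls, the largest per-pull failure rate you can't yet rule out at 95% confidence solves (1 − p)^n = 0.05. This assumes each pull is independent and equally likely to expose a fault, which real timing-dependent failures may not be. `max_failure_rate` is a hypothetical name.

```python
def max_failure_rate(clean_runs: int, confidence: float = 0.95) -> float:
    """Largest per-pull failure probability still consistent, at the given
    confidence, with zero failures in `clean_runs` independent pulls.
    Solves (1 - p) ** n == 1 - confidence for p."""
    if clean_runs < 1:
        raise ValueError("need at least one clean run")
    return 1 - (1 - confidence) ** (1 / clean_runs)
```

Six clean runs only rule out failure rates above roughly 39% per pull; a hundred clean runs push that bound to about 3%.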

    The Intel 730s are pretty much the only "consumer" drives that I've seen pass this test. Intel's DC S3500/S3700 series pass as well, but those are a LOT more expensive and are marketed as data center devices.

    If you use an SSD as an "operating system and program" device only, have no personal and irretrievable data on it (that is all on a server somewhere, etc.) and thus don't care if it gets corrupted because you can simply reload it, then consumer-style SSDs are fine. Most people, however, keep a LOT of personal and irreplaceable data on their system and neither segregate it onto a robustly backed-up file server nor have some other system devised to keep them from being hosed by a corrupted boot volume. If you're one of those "most," then using a "consumer"-style SSD is literally playing with a device that may destroy your data without warning.

    BTW if you think the machine being on a UPS makes it "safe" you're wrong. This is what one of my production systems, which is on a UPS and has *never* taken an unclean power loss (the UPS notifies it when the battery gets low and it does a controlled shutdown in that instance) says when I ask the drive about its history when it comes to unsafe shutdowns:

    Model Family: Intel 730 and DC S35x0/3610/3700 Series SSDs
    .....
    ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE

    174 Unsafe_Shutdown_Count -O--CK 100 100 000 - 19
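If you want to track that counter over time rather than eyeball it, here is a sketch that splits one row of smartctl's attribute table in the compact layout shown above. The column order is taken from the excerpt; attribute 174's name and meaning are vendor-specific (this is Intel's labeling), so the ID-to-meaning mapping is an assumption on other drives. `parse_smart_attribute` is a hypothetical name.

```python
def parse_smart_attribute(line: str) -> dict:
    """Split one attribute row laid out as
    'ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE'."""
    fields = line.split()
    return {
        "id": int(fields[0]),
        "name": fields[1],
        "flags": fields[2],
        "value": int(fields[3]),
        "worst": int(fields[4]),
        "thresh": int(fields[5]),
        "fail": fields[6],
        "raw": " ".join(fields[7:]),  # some raw values contain spaces
    }
```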

    Now, that doesn't mean I would have gotten screwed 19 times had the drive not had power protection, but it does mean that I *might* have, and this is on a system *with* 100% UPS coverage that has never failed during the time that drive has been installed. The machine has, however, been shut down for maintenance and such, and on 19 of those shutdowns the drive had power removed before it had managed to commit everything to stable storage and verify that it was there. This is what the drive firmware itself tells me, not what the operating system believes.

    This is why I buy SSDs with functional power protection and if a manufacturer claims their drives have it I *verify* that claim before I trust them.




    This page was printed from https://pinside.com/pinball/forum/topic/ot-anyone-really-good-with-computers?tu=Tickerguy and we tried optimising it for printing. Some page elements may have been deliberately hidden.
