Wednesday, February 18, 2015

Data Domain Compression

I currently manage seven Data Domains (890s and 670s), and none of them are seeing compression above 5x. We obviously need to do some cleanup to get rid of data that is a bad candidate for dedupe, but the question is "where to start?" Have any of you successfully increased your dedupe ratio through cleanup? If so, what steps did you take?

6 comments:

  1. Hi Chad,

    I can understand that you want a higher dedupe rate, especially as EMC's marketing says you can get up to a 50x dedupe rate. In my experience, dedupe rates above 10x are very rare; if you are around 8-9x, you are already doing very well.
    But besides that, the question is what the dedupe rate actually stands for. A 5:1 ratio, for example, means you store only one fifth of the data - in percentage terms, only 20% - so you are already saving 80% of the space compared to storing the data undeduplicated. A 10:1 rate means storing only one tenth, i.e. only 10% of the data.
    So the incremental savings from 5:1 to 10:1, or even to 50:1, get smaller and smaller.
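    Put as a formula, space saved = 1 - 1/ratio, so:

         5:1  ->  1 - 1/5  = 80% saved
        10:1  ->  1 - 1/10 = 90% saved
        50:1  ->  1 - 1/50 = 98% saved

    Doubling the rate from 5x to 10x only gains another 10 points of savings, and going all the way to 50x gains just 8 more.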
    Now the question is whether it is really worth it to squeeze out another 10% of dedupe rate if you have to limit your data to only certain types, or do some special sorting, just so the dedupe rate looks better.
    IMHO, once you reach a dedupe rate around 5x it is good; the rest is then a plus.

    Cheers,
    Wolfgang

  2. You are correct, a 5:1 ratio is good. But when EMC salespeople claim you will get 10:1 ratios and what you actually get is less than forecast, issues arise. Especially since your architects, who don't confer with or listen to those who actually have experience with the product, purchase based on the higher compression ratio. This leaves you scrambling to buy more disk when you run out of space more quickly than was predicted.

  3. Great topic!

    We too are struggling with poor dedupe rates on our DD890. Although the majority of the data stored is Exchange and DB2, we have tried to get rid of other types of data. Still, the rates remain poor.

  4. Chad,
    With FalconStor VTL, I am getting a range of deduplication numbers, 7:1-9:1, across the entire solution. I search through the admin GUI for a virtual tape volume that is deduping at 1:1 (not at all), then in dsmadmc I run q content against that volume.
    That shows me the TSM node and contents such as .tif and .jpg files, videos, or MS SQL backups with a .dat file extension. Then I move that TSM node into a domain that stores on physical tape, and move its data off of the VTL onto tape (roughly the commands sketched below).
    Many of my virtual tapes that store Exchange backups do not deduplicate; to compensate, we keep only 37 days of backups.
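    Roughly, the dsmadmc side of that looks like the following - the volume, node, domain, and pool names are placeholders, substitute your own:

        /* list the files (and the owning node) on the poorly-deduping virtual tape */
        query content VT0042 count=100

        /* rebind the node to a policy domain whose copy group points at physical tape */
        /* (the domain must already exist; the change applies to new backups) */
        update node BADNODE domain=TAPEDOMAIN

        /* push the node's existing data off the VTL pool onto the tape pool */
        move nodedata BADNODE fromstgpool=VTLPOOL tostgpool=TAPEPOOL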

  5. SANAdminNewbie

    The problem is that there are no tools I know of that can identify the "bad" files for dedupe and help you set up appropriate rules to tell TSM to send them to alternate "cheap disk" or tape. So I would have to do what you are doing on a client-by-client basis, and it becomes a time killer (the closest workaround I know of is sketched below). I wonder if Exchange is a bad candidate for dedupe due to server-side compression, or due to the attachments it is maintaining?
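    For completeness, the per-client version of those rules would be include statements in the client's include-exclude list, binding the known-bad extensions to a management class whose copy group points at tape or cheap disk. A sketch (the management class name here is made up, and it has to exist in the node's policy domain first):

        * inclexcl file on a UNIX client - send poor dedupe candidates to a tape-bound class
        include /.../*.jpg  TAPECLASS
        include /.../*.tif  TAPECLASS
        include /.../*.dat  TAPECLASS

    Of course, maintaining that on every client is exactly the time killer described above.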

  6. Chad,

    We use a DD990 Data Domain appliance. We get 40:1 dedupe for EPIC database data, which is not compressed.

    We get below 4:1 for data that is already compressed - DB files that are copied and compressed before TSM picks them up.
