De-dup Q & A
Monday, March 17th, 2008
The InfoStor posting "Data de-duplication: Questions and answers":
Eight questions that every IT organization should ask about data de-duplication before deploying or upgrading.
Just what is data de-duplication?
Data de-duplication is arguably one of the most important new technologies to hit the storage market in years, a game-changer that can have an immediate impact on end-user environments.
By storing only one copy of repeated data and replacing the duplicates with references to it, data de-duplication reduces the physical disk capacity needed to store information. That lets organizations keep more information on disk-based systems, making it more accessible to the people and applications that need it.
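To make the idea concrete, here is a minimal Python sketch that assumes the simplest possible scheme: fixed-size chunks fingerprinted with SHA-256. Commercial products use their own (often variable-size) chunking, so treat this as an illustration only; the names `dedupe`, `restore`, and `dedupe_ratio` are hypothetical. Each unique chunk is stored once, and an ordered "recipe" of fingerprints records how to rebuild the original data.

```python
import hashlib
from typing import Dict, List, Tuple

def dedupe(data: bytes, chunk_size: int = 4096) -> Tuple[Dict[str, bytes], List[str]]:
    """Illustrative fixed-size-chunk de-duplication (an assumed scheme, not a product's).

    Returns (store, recipe): store maps a SHA-256 hex digest to one copy of the chunk,
    and recipe is the ordered list of digests needed to rebuild the data.
    """
    store: Dict[str, bytes] = {}
    recipe: List[str] = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # only the first copy consumes capacity
        recipe.append(digest)            # every occurrence becomes a reference
    return store, recipe

def restore(store: Dict[str, bytes], recipe: List[str]) -> bytes:
    """Rebuild the original data by following the recipe of references."""
    return b"".join(store[digest] for digest in recipe)

def dedupe_ratio(data: bytes, store: Dict[str, bytes]) -> float:
    """Logical bytes presented for storage divided by physical bytes kept."""
    physical = sum(len(chunk) for chunk in store.values())
    return len(data) / physical if physical else 1.0

data = b"A" * 4096 * 10  # ten identical 4 KiB blocks
store, recipe = dedupe(data)
print(len(store), dedupe_ratio(data, store))
```

Here ten identical 4 KiB blocks collapse to a single stored chunk, a 10:1 reduction. Real-world ratios depend on how much the backed-up data actually repeats.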
What data de-duplication ratios can I expect?

De-duplication trade-offs
Currently, there are two distinct types of data de-duplication available: inline and post-process. What separates them is when the backup data is de-duplicated. If it is de-duplicated before it is written to the target, that is inline de-duplication; if it is de-duplicated after it has been written, that is post-process de-duplication.
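The difference in ordering can be sketched in Python. This is an illustrative model only, assuming fixed 4 KiB chunks and an in-memory dict standing in for the target's chunk store; `inline_ingest` and `post_process_ingest` are hypothetical names, not any vendor's API.

```python
import hashlib
from typing import Dict, Iterator, Tuple

CHUNK_SIZE = 4096  # assumption: fixed-size chunks

def _chunks(data: bytes) -> Iterator[bytes]:
    for i in range(0, len(data), CHUNK_SIZE):
        yield data[i:i + CHUNK_SIZE]

def _stored_bytes(store: Dict[bytes, bytes]) -> int:
    return sum(len(chunk) for chunk in store.values())

def inline_ingest(data: bytes, store: Dict[bytes, bytes]) -> int:
    """Inline model: fingerprint each chunk BEFORE it is written to the target.

    Returns the bytes actually written; duplicate chunks never land on disk.
    """
    written = 0
    for chunk in _chunks(data):
        digest = hashlib.sha256(chunk).digest()  # hashing sits in the ingest path
        if digest not in store:
            store[digest] = chunk
            written += len(chunk)
    return written

def post_process_ingest(data: bytes, store: Dict[bytes, bytes]) -> Tuple[int, int]:
    """Post-process model: land the raw backup first, de-duplicate it afterward.

    Returns (bytes landed at ingest time, new bytes kept after de-duplication).
    """
    staging = bytes(data)            # stands in for the full raw copy on disk
    landed = len(staging)
    before = _stored_bytes(store)
    for chunk in _chunks(staging):   # later pass, off the ingest path
        store.setdefault(hashlib.sha256(chunk).digest(), chunk)
    del staging                      # in this model, the staging copy is released
    return landed, _stored_bytes(store) - before

backup = bytes(range(256)) * 64      # 16 KiB made of four identical 4 KiB chunks
inline_store: Dict[bytes, bytes] = {}
post_store: Dict[bytes, bytes] = {}
print(inline_ingest(backup, inline_store), inline_ingest(backup, inline_store))
print(post_process_ingest(backup, post_store), post_process_ingest(backup, post_store))
```

Running the same 16 KiB backup twice, the inline path writes 4,096 bytes and then nothing. The post-process path lands all 16,384 bytes both times before trimming them down to the one unique chunk, which is the up-front capacity cost discussed below.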
Inline approaches can degrade performance while data is being ingested, whereas post-process approaches require extra disk capacity up front. The performance impact of the inline approach depends on a number of variables, including the de-duplication technology itself, the size of the backup volume, the granularity of the de-duplication process, the aggregate throughput of the architecture, and the scalability of the solution. Inline de-duplication can run at the server or as a “bump in the wire,” but most often it takes place at the target itself.
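Granularity is one of those variables. The Python sketch below, again assuming fixed-size chunking and using a hypothetical `dedupe_stats` helper, de-duplicates two 64 KiB backups that differ by a single byte at several chunk sizes. Smaller chunks isolate the change and store less, but they require more fingerprints to be computed and looked up during ingest.

```python
import hashlib
from typing import List, Tuple

def dedupe_stats(backups: List[bytes], chunk_size: int) -> Tuple[int, int]:
    """Return (unique bytes stored, fingerprints computed) for one chunk size."""
    seen = set()
    stored = 0
    hashed = 0
    for data in backups:
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            hashed += 1  # every chunk must be fingerprinted and looked up
            digest = hashlib.sha256(chunk).digest()
            if digest not in seen:
                seen.add(digest)
                stored += len(chunk)
    return stored, hashed

# Two 64 KiB "backups" built from deterministic, non-repeating content,
# differing only in one byte near the start.
first = b"".join(hashlib.sha256(i.to_bytes(4, "big")).digest() for i in range(2048))
second = bytearray(first)
second[100] ^= 0xFF
backups = [first, bytes(second)]

for size in (512, 4096, 65536):
    stored, hashed = dedupe_stats(backups, size)
    print(f"chunk={size:>6}  stored={stored:>7}  fingerprints={hashed}")
```

With 512-byte chunks the second backup adds only 512 new bytes (66,048 stored) at the cost of 256 fingerprints. With 64 KiB chunks, the one-byte change forces the whole second backup to be stored again (131,072 bytes), but only 2 fingerprints are needed.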
With the post-process approach, more disk capacity is needed up front to hold the raw backup volume. The size of this capacity reserve also depends on a number of variables, including the amount of data being backed up and how long the de-duplication technology holds onto that capacity before releasing it. Solutions that wait for the entire backup process to complete before releasing capacity carry a greater “capacity overhead” than solutions that start de-duplicating earlier, while backup data is still being stored.
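A rough way to see that capacity-overhead difference is to model only the raw, not-yet-de-duplicated data sitting in staging. In this hypothetical Python model, a backup arrives in equal segments. One policy waits for the whole backup before de-duplicating; the other processes each segment a fixed number of steps after it lands and then releases its raw copy. The segment sizes and the `lag` parameter are illustrative assumptions, and the de-duplicated store itself is not counted.

```python
from typing import List, Optional

def peak_staging(segments: List[int], lag: Optional[int] = None) -> int:
    """Peak raw capacity held in staging before de-duplication releases it.

    segments: size of the backup data arriving at each time step (e.g. in GB).
    lag: steps a segment waits before it is de-duplicated and released;
         None models waiting for the entire backup to finish.
    """
    if lag is None:
        # Nothing is released until the last segment has landed.
        return sum(segments)
    peak = 0
    for t in range(len(segments)):
        # Segments that arrived in steps t-lag..t are still held in staging.
        held = sum(segments[max(0, t - lag):t + 1])
        peak = max(peak, held)
    return peak

backup = [100] * 10  # hypothetical 1,000 GB backup arriving as ten 100 GB segments
print(peak_staging(backup))         # wait for the whole backup
print(peak_staging(backup, lag=1))  # de-duplicate one step behind ingest
```

With these assumed numbers, waiting for the full backup peaks at 1,000 GB of staging, while starting one step behind ingest peaks at 200 GB.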
Read the article for the rest of the Qs and As.
…John

