Binary Diff/patch For Large Files On Linux?
Answer :
You should probably take a look at the rsync-related tools: rdiff and rdiff-backup. The rdiff command lets you produce a patch file and apply it to some other file. The rdiff-backup command uses this approach to deal with entire directories, but I'm guessing you're working with single-file disk images, so rdiff will be the one to use.
xdelta can do everything you want. Fair warning though, if your images aren't very similar, you can end up with a very large patch, because xdelta uses half of the defined memory buffer for finding differences. More information is available at the TuningMemoryBudget wiki page. Increasing the buffer size may help out quite a bit.
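As a rough sketch of the xdelta workflow described above, the two functions below wrap the encode and decode steps. This assumes the xdelta3 command-line tool is installed; the function names, the file arguments, and the 64 MiB fallback passed to -B (the source window size) are placeholders chosen for this illustration, not recommended settings.

```shell
# Hypothetical wrappers around xdelta3; names and defaults are illustrative only.

# make_xdelta_patch OLD NEW PATCH [SOURCE_WINDOW_BYTES]
# Encode (-e) the differences between OLD (the source, -s) and NEW into PATCH.
# -B sets the source window size in bytes; a larger window lets xdelta3
# look further through OLD for matching data.
make_xdelta_patch() {
    xdelta3 -e -B "${4:-67108864}" -s "$1" "$2" "$3"
}

# apply_xdelta_patch OLD PATCH OUT
# Decode (-d) PATCH against OLD to recreate the new file as OUT.
apply_xdelta_patch() {
    xdelta3 -d -s "$1" "$2" "$3"
}
```

For example, `make_xdelta_patch old.img new.img images.xdelta 1073741824` would try a 1 GiB source window, and `apply_xdelta_patch old.img images.xdelta rebuilt.img` would rebuild the new image from the old one plus the patch.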
bsdiff is another option, but it's very RAM hungry and completely inappropriate for anything the size of a disk image.
bsdiff is quite memory-hungry. It requires max(17*n,9*n+m)+O(1) bytes of memory, where n is the size of the old file and m is the size of the new file. bspatch requires n+m+O(1) bytes.
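To make the formula concrete, here is a small shell sketch that evaluates max(17*n, 9*n+m) for given file sizes. The function name bsdiff_mem_bytes is made up for this illustration, the O(1) term is ignored, and the sketch assumes the shell does 64-bit integer arithmetic.

```shell
# Hypothetical helper: estimate bsdiff's memory requirement in bytes,
# using max(17*n, 9*n+m) and ignoring the O(1) term.
bsdiff_mem_bytes() {
    n=$1   # size of the old file in bytes
    m=$2   # size of the new file in bytes
    a=$((17 * n))
    b=$((9 * n + m))
    if [ "$a" -ge "$b" ]; then
        echo "$a"
    else
        echo "$b"
    fi
}
```

For two 10 GiB images, `bsdiff_mem_bytes 10737418240 10737418240` prints 182536110080, which is 170 GiB of memory. That is why bsdiff is a poor fit for anything the size of a disk image.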
Canonical Answer
Regarding rdiff, the post librsync 2.0.1 is a good read for clarifying what the command does, so I've quoted it below to preserve the content in this answer if nothing else.
It's important to get a good understanding of the three rdiff steps for updating a file (signature, delta, and patch), as described on the rdiff man page. I've also found a helpful rdiff command example script on GitHub, which I'll reference and quote.
Essentially...
- Start with a "starting" or base file [file1] and create a signature file from it.
  - The signature is usually much smaller than the base/original file itself.
- Compare the signature file against another file [file2] that is similar to your base file but different (e.g. recently updated), and create a delta file containing just the differences between the two files.
- Apply the "differences only" or delta file to your base file [file1] to generate a new file that matches the other file [file2].
Quick Commands (per rdiff-example.sh)
    rdiff signature file1 signature-file         ## signature base file1
    rdiff delta signature-file file2 delta-file  ## delta differences file2
    rdiff patch file1 delta-file gen-file        ## compare delta to file1 to create matching file2
rdiff-example.sh
    # $ rdiff --help
    # Usage: rdiff [OPTIONS] signature [BASIS [SIGNATURE]]
    #              [OPTIONS] delta SIGNATURE [NEWFILE [DELTA]]
    #              [OPTIONS] patch BASIS [DELTA [NEWFILE]]
    # Options:
    #   -v, --verbose             Trace internal processing
    #   -V, --version             Show program version
    #   -?, --help                Show this help message
    #   -s, --statistics          Show performance statistics
    # Delta-encoding options:
    #   -b, --block-size=BYTES    Signature block size
    #   -S, --sum-size=BYTES      Set signature strength
    #       --paranoia            Verify all rolling checksums
    # IO options:
    #   -I, --input-size=BYTES    Input buffer size
    #   -O, --output-size=BYTES   Output buffer size

    # create signature for old file
    rdiff signature old-file signature-file
    # create delta using signature file and new file
    rdiff delta signature-file new-file delta-file
    # generate new file using old file and delta
    rdiff patch old-file delta-file gen-file
    # test
    diff -s gen-file new-file
    # Files gen-file and new-file are identical
Introduction
rdiff is a program to compute and apply network deltas. An rdiff delta is a delta between binary files, describing how a basis (or old) file can be automatically edited to produce a result (or new) file.
Unlike most diff programs, librsync does not require access to both of the files when the diff is computed. Computing a delta requires just a short "signature" of the old file and the complete contents of the new file. The signature contains checksums for blocks of the old file. Using these checksums, rdiff finds matching blocks in the new file, and then computes the delta.
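To illustrate the idea of a signature made of per-block checksums, here is a deliberately simplified shell sketch. It is not rdiff's actual signature format or algorithm: it uses cksum on fixed, aligned blocks, whereas the rsync algorithm uses a rolling checksum so it can find matching blocks at any offset in the new file. The function names and the 2048-byte default block size are assumptions made for this example.

```shell
# Toy illustration only; NOT the real rdiff signature format.

# block_checksums FILE [BLOCK_SIZE]
# Print one "checksum length" line per fixed-size block of FILE.
block_checksums() {
    file=$1
    block_size=${2:-2048}
    tmpdir=$(mktemp -d) || return 1
    # Cut the file into pieces of block_size bytes (the last may be shorter).
    split -a 6 -b "$block_size" "$file" "$tmpdir/blk."
    for blk in "$tmpdir"/blk.*; do
        [ -e "$blk" ] || continue   # empty input produces no pieces
        cksum < "$blk"
    done
    rm -rf "$tmpdir"
}

# count_matching_blocks OLD NEW [BLOCK_SIZE]
# Count the aligned blocks of NEW whose checksum also appears in OLD's "signature".
count_matching_blocks() {
    old_sums=$(mktemp) || return 1
    block_checksums "$1" "$3" | sort -u > "$old_sums"
    block_checksums "$2" "$3" | grep -c -x -F -f "$old_sums"
    rm -f "$old_sums"
}
```

Note that only the checksum list of the old file is consulted when scanning the new file, which mirrors why the old file itself does not need to be present when a delta is computed.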
rdiff deltas are usually less compact and also slower to produce than xdeltas or regular text diffs. If it is possible to have both the old and new files present when computing the delta, xdelta will generally produce a much smaller file. If the files being compared are plain text, then GNU diff is usually a better choice, as the diffs can be viewed by humans and applied as inexact matches.
rdiff comes into its own when it is not convenient to have both files present at the same time. One example of this is that the two files are on separate machines, and you want to transfer only the differences. Another example is when one of the files has been moved to archive or backup media, leaving only its signature.
Symbolically
    signature(basis-file) -> sig-file
    delta(sig-file, new-file) -> delta-file
    patch(basis-file, delta-file) -> recreated-file
Use patterns
A typical application of the rsync algorithm is to transfer a file A2 from a machine A to a machine B which has a similar file A1. This can be done as follows:
- B generates the rdiff signature of A1. Call this S1. B sends the signature to A. (The signature is usually much smaller than the file it describes.)
- A computes the rdiff delta between S1 and A2. Call this delta D. A sends the delta to B.
- B applies the delta to recreate A2. In cases where A1 and A2 contain runs of identical bytes, rdiff should give a significant space saving.
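The exchange above can be sketched locally, with two directories standing in for machines A and B and cp standing in for the network transfer. This assumes rdiff is installed; the function name and the directory arguments are hypothetical, while the file names A1, A2, S1, and D follow the steps above. In a real setup the two cp calls would be replaced by whatever transport you use between the machines.

```shell
# Hypothetical local simulation of the A/B exchange.
# transfer_with_rdiff DIR_A DIR_B
#   DIR_A plays machine A and must contain the new file A2.
#   DIR_B plays machine B and must contain the similar old file A1.
transfer_with_rdiff() {
    dir_a=$1
    dir_b=$2

    # Step 1: B generates the signature S1 of A1 and sends it to A.
    rdiff signature "$dir_b/A1" "$dir_b/S1" || return 1
    cp "$dir_b/S1" "$dir_a/S1" || return 1

    # Step 2: A computes the delta D between S1 and A2 and sends it to B.
    rdiff delta "$dir_a/S1" "$dir_a/A2" "$dir_a/D" || return 1
    cp "$dir_a/D" "$dir_b/D" || return 1

    # Step 3: B applies D to A1 to recreate A2.
    rdiff patch "$dir_b/A1" "$dir_b/D" "$dir_b/A2" || return 1

    # Sanity check: B's recreated A2 should be byte-identical to A's copy.
    cmp "$dir_a/A2" "$dir_b/A2"
}
```

Only S1 and D cross between the two directories, so comparing their sizes with the size of A2 gives a feel for how much transfer the approach saves for a given pair of files.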