Just when I thought I had done a pretty complete tuneup on the storage path from Veeam backup to an Iomega IX2-200 NAS, two things came up I wanted to test. The first one (why didn’t I think of that) is to set compression to “low”, saving CPU cycles and hopefully getting more throughput. The second one was starting a second job on the same Veeam VM to the same target storage.
Starting a second job to the same target storage
Veeam backups are running ok in my home lab now. The large file server is still a little slow, backing up at 6MB/s. Once the full backup is complete, Veeam will start to leverage CBT, which will put all problems concerning performance in the past (CBT is cool!).
But I wondered: Veeam only uses a single backup stream per job, but a second job will start a parallel stream (and unfortunately a second dedup store). So what would happen to the throughput once I started the second stream? Would speeds sink, and render the same total throughput as a single stream, or would the total throughput rise?
Since my full backup of the file server was still running, testing was simple.
Configuring a second job in Veeam
I configured a second job on the Veeam VM. I used my trusty testing VM to create a second reverse-incremental backup to the same target store the primary backup stream was using. The throughput I got out of it was quite amazing:
As you can see in the VI client performance graph above, the write speed to the IX2-200 actually got a lot better during the parallel backup. With a single VM backing up, the throughput was 4MB/s (Veeam showed a throughput of 6MB/s due to compression). As soon as the second VM backup kicked in, the total throughput to the IX2-200 went up to 6MB/s (Veeam showed 9MB/s summed up, once again because of compression). That’s a significant gain. The downside is that Veeam creates a second dedup store for the second job.
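To put those numbers in perspective, here is a quick back-of-the-envelope check in Python. It is only a sketch: the function names are made up for this post, and the inputs are the rounded values I read off the graphs and the Veeam job window, so treat the results as rough indications.

```python
def compression_ratio(processed_mb_s, written_mb_s):
    """Ratio between the rate Veeam reports and the rate actually written to the NAS."""
    return processed_mb_s / written_mb_s


def relative_gain(before_mb_s, after_mb_s):
    """Relative throughput gain; 0.5 means 50% faster."""
    return (after_mb_s - before_mb_s) / before_mb_s


# Rounded values from the measurements above (MB/s)
single_ratio = compression_ratio(6, 4)    # one job: Veeam rate vs. NAS write rate
parallel_ratio = compression_ratio(9, 6)  # two jobs in parallel
nas_gain = relative_gain(4, 6)            # write rate to the IX2-200, one job vs. two

print(f"compression ratio, single job:    {single_ratio:.2f}")
print(f"compression ratio, parallel jobs: {parallel_ratio:.2f}")
print(f"gain in NAS write throughput:     {nas_gain:.0%}")
```

The compression ratio comes out the same in both cases, which suggests the extra throughput really comes from running the second stream and not from one VM simply compressing better than the other.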
Looking at the CPU graphs, we see these:
As you can see, there is not much change in CPU usage; the CPU remains at an all-time high. Ready time spikes, but does not show much difference during the parallel backup either. But when you look at the Wait and System times of the vCPUs, you clearly see that the Wait time drops even further during the parallel backup (saturating the CPU even more), while the System time on vCPU0 (my guess is core0) increases just a bit, probably due to the higher amount of I/O performed.
Setting compression to “low” in Veeam
The next thing to try was setting the Veeam compression to “Low”. This should spare CPU cycles (at the cost of a somewhat larger backup set), so it might boost performance on systems with weak CPUs. I did indeed see lower CPU usage: the CPUs were now loaded to a much lower level (approx. 1200MHz per core versus the previous approx. 1800MHz per core). The speed increase was about 1MB/s, which is not that much. But looking at the resulting output file (the full backup file), the size had increased from 4.4GB to 6.9GB (!!). So think twice before changing this compression setting: the speed increase is modest, but the resulting backups are much, much larger.
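The trade-off is easier to judge as percentages. The snippet below is a minimal sketch using only the figures mentioned above; the helper name is my own, and the MHz values are approximate readings from the graph.

```python
def percent_change(old, new):
    """Percentage change going from old to new (negative means a decrease)."""
    return (new - old) / old * 100


# Full backup file size in GB, default compression vs. "Low"
size_change = percent_change(4.4, 6.9)

# Approximate CPU load per core in MHz, default compression vs. "Low"
cpu_change = percent_change(1800, 1200)

print(f"backup size change: {size_change:+.1f}%")
print(f"CPU load change:    {cpu_change:+.1f}%")
```

In other words: about a third less CPU load per core, paid for with a full backup file that is more than half as large again.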
Watching zero blocks
When I was looking at my file server backup, near the end it suddenly shot up by about 100GB within the hour. I suspected that it had found a large number of zero blocks. Browsing through the VI client performance graphs, I found this graph:
As you can see in the graph above, the file server encountered a massive amount of null blocks. This jump is nowhere else to be seen in the performance graphs; this “feature” only popped up when I looked at the read rate under the “virtual disk” counters (not the default latency graphs). It is safe to assume that at over 200MB/sec read speed vSphere “just knows” it’s a null block and does not bother to get it from disk (I l-o-v-e CBT).
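A rough calculation shows why that jump stands out so much. This is just a sketch with rounded numbers from this post; the function name is hypothetical and I assume 1GB = 1024MB.

```python
MB_PER_GB = 1024  # assumption: binary units


def hours_to_process(size_gb, rate_mb_s):
    """Hours needed to process size_gb at a sustained rate of rate_mb_s."""
    return size_gb * MB_PER_GB / rate_mb_s / 3600


# About 100GB of zero blocks at the 200MB/s read rate seen in the graph
zero_block_hours = hours_to_process(100, 200)

# The same 100GB at the 6MB/s the file server backup was running at before
normal_hours = hours_to_process(100, 6)

print(f"at 200MB/s: {zero_block_hours * 60:.1f} minutes")
print(f"at 6MB/s:   {normal_hours:.1f} hours")
```

So a stretch of data that would have taken most of an afternoon at the normal backup rate fits comfortably within the hour once the blocks turn out to be empty.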
Looking at the measurements above, it becomes clear that running multiple backup streams in parallel can help, at least in an environment as tiny as this one. I guess you’ll have to decide what is more important: increased backup speeds or a single dedup store (meaning less disk space required). No matter what you prefer, it is important to at least know that performance might benefit from using multiple backup streams in parallel (even on very weak CPUs).
When you have weak CPUs, setting the compression to “low” in Veeam will help as well, possibly even more so when you run multiple jobs in parallel. But do not forget that there is quite a big increase in backup size involved (my full backup grew from 4.4GB to 6.9GB, an increase of roughly 57%!).