Determining the size of your Provisioning Services Write-Cache

I have been preparing a new XenDesktop 5.6 FP1 / XenApp 6.5 environment to run with Provisioning Services 6.1. Over the last few weeks, I have been collecting lots of information on best practices. In the past I have worked with PVS versions 5.0, 5.1, 5.2 and 5.6, yet I always ask myself and others, “What is the correct size for my write-cache?” It is a good question, and the answer is, well, “it depends.” Below is some useful information for determining the write-cache drive size.

What is Provisioning Services, a write-cache, and why does it matter?

Before we begin, a little background on Citrix Provisioning Services (PVS) and how it works. Provisioning Services gives administrators the ability to virtualize a hard disk or workload and then stream it back out to multiple devices. The workloads, which can be server or desktop, are ripped from a physical or virtual disk into Microsoft’s virtual hard disk (VHD) format and treated as a golden master image called a vDisk. This master image is then streamed over the network from a Windows server running the Stream Service to multiple PXE-booted target devices.

Drivers (network and disk) installed on the target devices (physical or virtual) redirect disk read requests over the network to the PVS streaming servers, which in turn return the requested data. The entire vDisk is not streamed across the network; only the parts of the vDisk that the operating system actually reads are. This means that a 30 GB Windows Server 2008 R2 workload that boots off a streamed vDisk may only see 200 MB of data transferred across the network.

When a vDisk is in private mode, it can be edited: all reads and writes go directly to the vDisk, and only one target device may access it at a time. When a vDisk is in standard mode, it is read-only and no changes can be made to it. Instead, all disk write operations are redirected to what is referred to as a write-cache file. The PVS device drivers redirect writes to the write-cache file and, when necessary, read newly written data from the write-cache file instead of from the server. Also, whenever a target device is rebooted, the write-cache is deleted and recreated, so the device boots in a pristine state.
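The redirect-on-write behavior described above can be illustrated with a toy model. This is a minimal Python sketch, assuming block-level granularity and an in-memory dictionary standing in for the write-cache; the class name `StandardModeDisk` is hypothetical, and the real PVS drivers are far more involved.

```python
from typing import Dict


class StandardModeDisk:
    """Toy model of a standard-mode vDisk plus its write-cache (hypothetical)."""

    def __init__(self, vdisk_blocks: Dict[int, bytes]) -> None:
        self._vdisk = dict(vdisk_blocks)      # read-only golden image
        self._cache: Dict[int, bytes] = {}    # blocks written since the last boot

    def read(self, block: int) -> bytes:
        # Data written since boot is served from the write-cache;
        # everything else is streamed from the read-only vDisk.
        if block in self._cache:
            return self._cache[block]
        return self._vdisk[block]

    def write(self, block: int, data: bytes) -> None:
        # Writes never modify the vDisk; they are redirected to the write-cache.
        self._cache[block] = data

    def reboot(self) -> None:
        # The write-cache is discarded, so the device boots in a pristine state.
        self._cache.clear()

    def cache_size(self) -> int:
        # Bytes currently held in the write-cache.
        return sum(len(d) for d in self._cache.values())


disk = StandardModeDisk({0: b"kernel", 1: b"config"})
disk.write(1, b"patched config")
print(disk.read(1), disk.read(0))   # b'patched config' b'kernel'
disk.reboot()
print(disk.read(1))                 # b'config'
```

Note that overwriting an existing block still grows the cache, which is why the write-cache can fill up even when free space on the vDisk barely changes.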
What factors are important in determining the write-cache file size?
When using Citrix Provisioning Services with the vDisk in standard mode, you have a write-cache drive location that holds all the writes for the operating system. If the write-cache file fills up unexpectedly, the operating system behaves as if its drive ran out of space without any warning; in other words, it will blue screen.

The optimum size of the write-cache drive depends on several factors:

Frequency of server reboots. The write-cache file is reset upon each server boot, so it only needs to be large enough to handle the volume of writes between reboots.

Amount of free space available on the c: drive. Free space on the c: drive is the space available for new files written to it, which makes it a key value when determining the write-cache drive size.

Amount of data being saved to the c: drive. Data written to the c: drive during operation is automatically stored in the write-cache. New files are stored in the write-cache file and decrease the amount of available space. Replacements for existing files are also written to the write-cache file but have little effect on the amount of free space. For instance, installing a service pack on a standard-mode disk will leave the write-cache file holding all the updated files, with very little change in available space.

Size and location of the pagefile. When a local NTFS-formatted drive is found, Provisioning Services moves the Windows pagefile off of the c: drive to the first available NTFS drive, which is also the location of the write-cache file. Therefore, in the default configuration, the write-cache drive will end up holding both the write-cache file and the pagefile. To learn more about correctly sizing your pagefile, see Nick Rintalan’s blog, “The Pagefile Done Right!”.

Location of the write-cache file. The location of the write-cache file is also a factor in determining its size. The write-cache file can be held on the target device’s local disk, the target device’s RAM, or on the streaming server.

  • Target device disk: If the write-cache file is held on the target device’s disk, that disk could be local to the client (a physical target), local to the hypervisor, network storage attached to the hypervisor, or SAN storage attached to the hypervisor.
  • Target device RAM: If the write-cache file is held in the target device’s RAM, response times will be faster, and in some cases the additional RAM is less expensive than SAN disk.
  • Streaming Server: If the write-cache file is on the server, no preset size is necessary. However, when using a server-side write-cache file, the Provisioning Services streaming server must have enough disk space to hold the write-cache files for all the target devices it manages.
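The difference between new files and replaced files (the "Amount of data being saved" factor above) is easy to miss, so here is a small Python sketch of the accounting. It assumes, for simplicity, that a replacement file is the same size as the file it replaces; the `Write` type, the `apply_writes` function and the service-pack numbers are hypothetical and only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Write:
    size_gb: float
    replaces_existing: bool  # True if the write overwrites an existing file


def apply_writes(free_space_gb: float, writes: Iterable[Write]) -> Tuple[float, float]:
    """Return (write-cache growth, remaining free space on c:) in GB."""
    cache_growth = 0.0
    free = free_space_gb
    for w in writes:
        # In standard mode every write lands in the write-cache file.
        cache_growth += w.size_gb
        # Only new files reduce the free space reported on c:;
        # a same-size replacement leaves it unchanged.
        if not w.replaces_existing:
            free -= w.size_gb
    return cache_growth, free


# Hypothetical service pack: 3 GB of replaced files plus 0.5 GB of new ones.
print(apply_writes(14.0, [Write(3.0, True), Write(0.5, False)]))  # (3.5, 13.5)
```

The write-cache grew by 3.5 GB while free space dropped by only 0.5 GB, which is why sizing the write-cache purely from free space can fall short when large updates land on a standard-mode disk.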

Determining the correct write-cache drive size is mostly a logical exercise once you understand the relationship of the write-cache file and the pagefile with the write-cache drive.

Guidelines for determining write-cache size
In the old days, we recommended running with server-side write-cache for the duration of the pilot project and then finding the largest write-cache file on the server before the target devices were rebooted. From there, we would just double or triple that size and make it the default size for a write-cache file. That approach works most of the time, but it is not very efficient with disk space.
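The old-school approach is easy to script. The Python sketch below assumes the server-side write-cache files sit in a single directory that you pass in and that you supply a glob pattern matching them; the function names and the pattern are hypothetical, so adjust them to how your store is actually laid out, and run the scan before the target devices are rebooted.

```python
from pathlib import Path
from typing import Union


def largest_cache_file_bytes(cache_dir: Union[str, Path], pattern: str = "*") -> int:
    """Size in bytes of the largest file in cache_dir matching pattern (0 if none)."""
    sizes = [p.stat().st_size for p in Path(cache_dir).glob(pattern) if p.is_file()]
    return max(sizes, default=0)


def old_school_cache_size_bytes(largest_bytes: int, multiplier: int = 2) -> int:
    """Double (or triple) the largest write-cache file seen during the pilot."""
    if multiplier < 1:
        raise ValueError("multiplier must be at least 1")
    return largest_bytes * multiplier


# Example with a hypothetical peak of 6 GiB observed during the pilot:
peak = 6 * 1024**3
print(old_school_cache_size_bytes(peak, multiplier=3) // 1024**3)  # 18
```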

Below are a few guidelines I use when recommending a size for the client-side write-cache drive.

  1. Write-cache drive = write-cache file + pagefile (if pagefile is stored on the write-cache drive)
  2. Write-cache file size should be equal to the amount of free space left on the vDisk image. This will work in most situations, except those where servers receive large file updates immediately after booting. As a rule, your vDisk should not be getting updated while running in standard-mode.
  3. Always account for the pagefile location and size. If it is configured to reside on the c: or d: drive, include it in all size calculations.
  4. Set the pagefile to a predetermined size to make it easier to account for. When Windows manages the pagefile size, it starts at 1x RAM but can vary; manually setting it to a known value provides a static number to use in calculations.
  5. During the pilot, use server-side write caching to get an idea of the maximum size a write-cache file might reach between server reboots. Obviously, the servers should carry a full load and follow the normal production reboot cycle for this to be of value.
  6. If a blue screen is absolutely unacceptable (say, people's lives depend on these servers), set the write-cache drive to the size of the vDisk plus the pagefile size.
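Guideline 4 boils down to giving the pagefile identical initial and maximum sizes. Windows stores this in the `PagingFiles` value (REG_MULTI_SZ) under `HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management`, with each entry written as `path initialMB maximumMB`. The Python sketch below builds such an entry and, on Windows only, writes it with the standard `winreg` module. It assumes automatic pagefile management is already turned off, that you run it with administrative rights, and that you reboot afterwards; the D: drive and 8192 MB (8GB) size are example values. The System Properties dialog or your usual configuration tooling works just as well.

```python
def paging_files_entry(drive: str, size_mb: int) -> str:
    """Build a PagingFiles entry with initial size == maximum size (a fixed pagefile)."""
    letter = drive.rstrip(":\\").upper()
    if len(letter) != 1 or not letter.isalpha():
        raise ValueError(f"not a drive letter: {drive!r}")
    if size_mb <= 0:
        raise ValueError("size_mb must be positive")
    return f"{letter}:\\pagefile.sys {size_mb} {size_mb}"


def set_fixed_pagefile(drive: str, size_mb: int) -> None:
    """Windows only: replace PagingFiles with a single fixed-size pagefile entry."""
    import winreg  # standard library, available on Windows only

    key_path = r"SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management"
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path, 0, winreg.KEY_SET_VALUE) as key:
        winreg.SetValueEx(key, "PagingFiles", 0, winreg.REG_MULTI_SZ,
                          [paging_files_entry(drive, size_mb)])


print(paging_files_entry("D:", 8192))  # D:\pagefile.sys 8192 8192
```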

In most situations, the recommended write-cache drive size will be free space available on vDisk image plus the pagefile size. For instance, if you have a 30GB Windows Server 2008 R2 vDisk with 16GB used (14GB free) and are running with an 8GB pagefile, I would recommend using a write-cache drive of 22GB calculated as 14GB free space + 8GB for the pagefile. If space doesn’t permit, you have a few options, not all of which may be available to you.

  1. If the storage location for the write-cache drive supports thin provisioning, configure thin-provisioned drives for the write-cache drive to save space.
  2. Use dynamic VHDs (instead of fixed VHDs) though this approach is generally only recommended for XenDesktop workloads. If you choose this approach, you will probably need to periodically reset the size of the dynamic VHD, which can be done with a PowerShell script.
  3. Reboot the servers more frequently, which in turn will reduce the maximum size of the write-cache file.
  4. Move the pagefile to a different drive or run without a pagefile.
  5. Use the old-school method mentioned earlier to select a write-cache file size equal to or larger than the largest write-cache file recorded during the pilot stage. However, this option may still result in blue screens.
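Option 3 trades reboot frequency against cache size. If you assume the write-cache grows roughly linearly at a rate measured during the pilot (a simplifying assumption; real growth is often bursty, for example during updates), the longest safe interval between reboots can be estimated as in this Python sketch. The function name, the safety margin and the example numbers are hypothetical.

```python
def max_hours_between_reboots(cache_file_gb: float,
                              growth_gb_per_hour: float,
                              safety_margin: float = 0.2) -> float:
    """Hours until the write-cache file would fill, keeping `safety_margin` of it spare.

    Assumes linear growth at the rate observed during the pilot.
    """
    if growth_gb_per_hour <= 0:
        raise ValueError("growth rate must be positive")
    if not 0 <= safety_margin < 1:
        raise ValueError("safety_margin must be in [0, 1)")
    usable_gb = cache_file_gb * (1 - safety_margin)
    return usable_gb / growth_gb_per_hour


# Hypothetical: a 14 GB write-cache file growing 0.5 GB per hour, 25% kept spare.
print(max_hours_between_reboots(14, 0.5, safety_margin=0.25))  # 21.0
```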

Of course, if you require 100% uptime and you have the disk space available, the sure-fire approach is to set the write-cache drive to the size of the vDisk plus the pagefile size (when the pagefile is placed on the write-cache drive). In other words, if the Windows Server 2008 R2 vDisk image is 30GB and you have an 8GB pagefile configured, setting the write-cache drive size to 38GB will protect against any unforeseen blue screens. However, not everyone has that kind of space available, especially when using expensive SAN storage for the write-cache drives.
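The sizing rules above reduce to a few lines of arithmetic. This Python sketch encodes guidelines 1, 2 and 6: the write-cache file equals the free space on the vDisk (or the whole vDisk in the sure-fire case), and the drive equals the file plus the pagefile when the pagefile shares the drive. The function and parameter names are hypothetical, and the sketch does not account for large updates applied after boot.

```python
def write_cache_drive_gb(vdisk_size_gb: float,
                         vdisk_used_gb: float,
                         pagefile_gb: float = 0.0,
                         pagefile_on_cache_drive: bool = True,
                         sure_fire: bool = False) -> float:
    """Recommended write-cache drive size in GB."""
    if not 0 <= vdisk_used_gb <= vdisk_size_gb:
        raise ValueError("used space must be between 0 and the vDisk size")
    # Guideline 2: size the write-cache file to the free space on the vDisk image.
    # Sure-fire case (guideline 6): size it to the whole vDisk instead.
    cache_file_gb = vdisk_size_gb if sure_fire else vdisk_size_gb - vdisk_used_gb
    # Guideline 1: add the pagefile when it lives on the write-cache drive.
    extra_gb = pagefile_gb if pagefile_on_cache_drive else 0.0
    return cache_file_gb + extra_gb


# The 30GB Windows Server 2008 R2 examples from the text:
print(write_cache_drive_gb(30, 16, pagefile_gb=8))                  # 22
print(write_cache_drive_gb(30, 16, pagefile_gb=8, sure_fire=True))  # 38
```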

Scalability implications
Just a quick note: in large-scale environments, the best-practice recommendation is to place the write-cache on the client hard disk rather than on the server. Generally speaking, you get about 40-60% more target devices on a single Provisioning Server with client-side write-cache than with server-side write-cache. In addition, failover works better, because the client target device has its write-cache available no matter which server is streaming the vDisk.

The use of client-side write-cache provides the maximum scalability of the Provisioning Services streaming server because the server does not need to perform both reads and writes for all target devices; rather the server is only required to read the vDisk once, cache the contents, and then stream it out over the network. This saves both CPU and network bandwidth on the streaming server allowing it to manage more target devices.
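As a rough worked example of the 40-60% figure, the Python snippet below turns a server's target-device count with server-side write-cache into the range you might expect with client-side write-cache. It uses integer arithmetic to avoid floating-point rounding; the 40-60% range is the rule of thumb quoted above, not a guarantee, and the 200-device input is a hypothetical number.

```python
from typing import Tuple


def client_side_target_range(server_side_targets: int) -> Tuple[int, int]:
    """Estimated target devices per PVS server with client-side write-cache."""
    low = server_side_targets * 140 // 100   # about 40% more
    high = server_side_targets * 160 // 100  # about 60% more
    return low, high


print(client_side_target_range(200))  # (280, 320)
```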

Citrix Provisioning 6.1 and ESX 4.1 / 5.0 VMXNET3 NIC driver issue

Before you roll out Citrix Provisioning 6.1, make sure you install the latest Citrix PVS 6.1 hotfixes. I had some really weird behavior when attempting to boot VMs on both ESX 4.1 and 5.0.

Issue:

I was not able to get target VMs working on ESX 4.1 / 5.0 using the VMXNET3 NIC driver. As most of you know, this is the recommended NIC in a VMware virtual infrastructure. Unfortunately, the VMs would not boot, and I had to roll back to the E1000 driver and run the 5.6 target agent (what?). After successfully booting the VM, I decided to dig further into the VMXNET3 issue.

So, after a few hours of pulling my hair out, this is what I found:

Install the latest PVS hotfixes, in my case CPVS61E010 for PVS 6.1 (basically it is PVS 6.2). However, the following steps need to be followed:

  • Completely remove the existing install and reinstall the core software.
  • Upgrade the database (the install process takes care of this for you), so make sure you have a good backup first.
  • Install the new target agent.
  • This all works under ESX 5.x. For some reason, if you are running ESX 4.1 like a lot of people, you will also need to apply hotfix CPVS56SP1E011 (what the heck?). Once this was done, I was able to use the VMXNET3 NIC driver on my provisioned VM.

Windows 2008 R2 / Windows 7 with Citrix PVS – System reserved partition issue

Now that I am done with a new XenApp 6.5 Farm, it is time to set up Citrix Provisioning 6.1 and convert the environment to a PVS managed farm.

Unfortunately, after converting a Windows 2008 R2 XenApp 6.5 server to a PVS disk with XenConvert, I received a “Please insert System Disk” error when booting. After doing some research, I realized that the System Reserved partition is not compatible with a provisioned image. Note that this applies to Windows 7 as well.

Some facts:
If you install Windows 7 or Windows Server 2008 R2 on a clean disk with no existing partitions, Setup creates a System Reserved partition at the beginning of the disk and uses the remainder of the unallocated space to create your system drive. That small partition isn’t assigned a drive letter, so you won’t even know it exists unless you look in the Disk Management console or use a low-level utility, such as Diskpart, to inspect the disk structure.

Solutions:

1. Set up your Windows 7/ Windows 2008 R2 servers without the System Reserved partition:

Once Setup is loaded, press Shift+F10 at the first Setup screen (the one that allows selection of language, keyboard and locale). A Command Prompt window will open.

Run diskpart.

Type the following:

list disk (shows the ID number of the hard disk to partition; normally Disk 0)

select disk 0 (change 0 to another number if applicable)

clean

create partition primary

select partition 1

active

format fs=ntfs quick

exit

Continue installation

2. Remove the System Reserved partition – in my case this is what I had to do, since the server I wanted to convert to a PVS image was already finalized.

I found this solution in an article from TeraByte; however, Carlo from VMwareinfo.com distilled the information into simple-to-follow instructions.

Assign a drive letter to the System Reserved partition. I used S.

Unload the BCD registry hive by running the following command:

reg  unload  HKLM\BCD00000000

Copy the bootmgr file from the System Reserved partition to the C: partition.

robocopy  S:\  C:\  bootmgr

Copy the Boot folder from the System Reserved (boot) partition to the C: partition.

robocopy  S:\Boot  C:\Boot  /s

To update the copied BCD file so it will boot correctly, run the following command:

bcdedit  /store  c:\boot\bcd  /set  {bootmgr}  device  partition=C:

Remove the drive letter (S:) in Disk Management and reboot.

Once the system restarts (booting from the C: drive now), you are free to delete the System Reserved partition.

With only one active volume now, you can assign a drive letter to the vDisk (V: in my case) and proceed with your XenConvert imaging. Be sure to use Volume to Volume.

Lastly, as noted by MWaler in his post (thanks for the pointer), make sure to set the partition as active; otherwise, you will run into issues when booting.
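If you need to repeat the System Reserved removal on several images, the command-line portion can be scripted. This is a minimal Python sketch, assuming the System Reserved partition has already been given the letter S: and the system volume is C:, that it runs from an elevated prompt on the server being fixed, and that assigning and removing the drive letter, rebooting, deleting the partition and marking the partition active stay manual. The function names are hypothetical. Note that robocopy reports success with exit codes 0 through 7, so a non-zero code from it is not necessarily an error.

```python
import subprocess
import sys
from typing import List


def boot_move_commands(reserved: str = "S:", system: str = "C:") -> List[List[str]]:
    """The reg/robocopy/bcdedit commands from the steps above, in order."""
    return [
        # Unload the BCD hive so the BCD store can be copied.
        ["reg", "unload", r"HKLM\BCD00000000"],
        # Copy bootmgr and the Boot folder to the system volume.
        ["robocopy", reserved + "\\", system + "\\", "bootmgr"],
        ["robocopy", reserved + r"\Boot", system + r"\Boot", "/s"],
        # Point {bootmgr} in the copied store at the system volume.
        ["bcdedit", "/store", system + r"\Boot\BCD",
         "/set", "{bootmgr}", "device", "partition=" + system],
    ]


def succeeded(cmd: List[str], returncode: int) -> bool:
    # robocopy: 0-7 mean success (copied, skipped, extra files...); 8+ means failure.
    if cmd[0].lower() == "robocopy":
        return returncode < 8
    return returncode == 0


def run_all(commands: List[List[str]]) -> None:
    for cmd in commands:
        rc = subprocess.run(cmd).returncode
        if not succeeded(cmd, rc):
            raise RuntimeError(f"{' '.join(cmd)} failed with exit code {rc}")


if __name__ == "__main__" and sys.platform == "win32":
    run_all(boot_move_commands())
```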


Injecting drivers to your PVS image

OK… so if you have been working with XenApp for a while, you know that PVS is an awesome way of distributing virtual apps to internal and external users.

I am proud to have worked in a very complex and demanding PVS environment, where I learned a ton. To give you some history, we started with an ESX 3.5 XenApp environment and decided to go with PVS images, as we were hosting over 400 XenApp apps. Because of the success of, and demand for, the apps, we decided to go physical rather than virtual in order to get more users per XenApp server. (I did try XenServer, but it just did not compare to a physical server.)

One of the best messaging engineers I have had the pleasure of working with drilled into my brain that I should always consider running a hybrid virtualization environment: use virtualization as much as you can, but always think about what would happen if your virtualization layer goes down (thank you, Mike). For a company where XenApp was the ONLY way of working remotely, and very crucial business units ONLY used XenApp for their everyday work, downtime was not an option.

Well, with that thought in mind… how the hell can we make virtual XenApp VMs boot on physical servers? Let me tell you, this process can be tedious.

The procedure below assumes you already have a PVS image. If your process includes updating the PVS target device software, please read this good article from Citrix first.

Let's get started.

1) Boot the existing VM in standard mode with the image that will be updated.

2) Add a drive to the VM (this can be done while the VM is powered on). The drive must be larger than the used capacity of the vDisk to be imaged. Below is a sample screenshot of a new F: drive added to the VM.

[Screenshot: new F: drive added to the VM]

3) Run BNImage.exe from C:\Program Files\Citrix\Provisioning Server. Select the new drive as the destination for the image.


4) When imaging completes, set the partition as active in Disk Management. Shut down the VM.

5) On the Provisioning Server, set the machine to boot from its hard drive.

6) Change the order of the disks on the VM so the new drive is SCSI 0:0. The cache drive should be SCSI 0:1.

7) Boot the machine from the local disk.

8) Add SCSI drivers to the image for the target device (not necessary for G6 – G7, since the SCSI controller is the same).

9) Remove the current version of the Provisioning Services target device software.

10) Upgrade VMware Tools to the latest version.

11) Remove the VMware Tools, VMware User Process and SunJavaUpdateSched entries from the Windows Run key (HKLM\Software\Microsoft\Windows\CurrentVersion\Run).
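Step 11 can also be scripted when you repeat it across images. This Python sketch uses the standard `winreg` module (Windows only); the value names in `UNWANTED` are assumptions based on the entries named in this step, so check what your image actually registers, and the function names are hypothetical. On 64-bit Windows, entries from 32-bit installers such as Java's updater may live in the 32-bit registry view, which you can open by passing `winreg.KEY_WOW64_32KEY` as `extra_access`.

```python
from typing import Iterable, List

RUN_KEY = r"Software\Microsoft\Windows\CurrentVersion\Run"
# Assumed value names; verify them in regedit on your own image.
UNWANTED = ("VMware Tools", "VMware User Process", "SunJavaUpdateSched")


def values_to_remove(value_names: Iterable[str],
                     unwanted: Iterable[str] = UNWANTED) -> List[str]:
    """Pick the Run-key values to delete (case-insensitive, like the registry)."""
    targets = {name.lower() for name in unwanted}
    return [name for name in value_names if name.lower() in targets]


def clean_run_key(extra_access: int = 0) -> List[str]:
    """Windows only: delete the unwanted values from the HKLM Run key; return them."""
    import winreg  # standard library, Windows only

    access = winreg.KEY_READ | winreg.KEY_SET_VALUE | extra_access
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, RUN_KEY, 0, access) as key:
        names = []
        index = 0
        while True:
            try:
                names.append(winreg.EnumValue(key, index)[0])
            except OSError:  # raised when there are no more values
                break
            index += 1
        removed = values_to_remove(names)
        for name in removed:
            winreg.DeleteValue(key, name)
    return removed
```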

12) Upgrade the VM to hardware version 7 (this requires shutting down the VM). Right-click the VM in the VI Client for this option.

13) While the VM is shut down, change the VM's NIC to VMXNET3. Update the Provisioning Server with the new MAC address.

14) While the VM is shut down, temporarily add an additional disk to the VM on SCSI bus 1:xx. This will add an additional SCSI controller. Select LSI SAS.

15) Power the virtual machine back on. The VMXNET3 NIC and the LSI SAS controller will be detected. Add the drivers for the LSI SAS controller – http://www.lsi.com/storage_home/products_home/standard_product_ics/sas_ics/lsisas1068/index.html

16) Shut down the VM and remove the temporary drive on SCSI bus 1:00. Change the first SCSI controller to LSI SAS.

17) Boot the VM with the Acronis CD – [CTXStore_UnProtected_01] AcronisBootCD.iso. Create an Acronis image of the VM on a network location.

18) Boot the target physical machine with the Acronis CD. Copy the image created from the VM to the server's local drive.

19) Set the physical machine to PXE boot from the Provisioning Server, and set it to boot from its hard drive on the Provisioning Server. Attach a blank PVS VHD disk to the machine so the image can be copied to it.

20) Boot the physical machine into the OS on the local drive. The machine will hang for a few minutes at the login screen while drivers load in the background. If the machine is not yet accepting keyboard input, just wait a few minutes. Once the drivers have finished loading, log in.

21) Install the Provisioning Services target device software (5.6 SP2), PVS_Device.exe.

Once it is installed, the new PVS volume should show up right away without a reboot (as long as the server was PXE booted).

22) Run BindCfg.exe from C:\Program Files\Citrix\Provisioning Services\. Select the additional NICs to bind PVS to.

23) Verify the registry keys listed in the Citrix KB article after reinstalling the PVS target software – http://support.citrix.com/article/CTX117374

24) Run BNImage.exe to image the local disk to the Provisioning Server drive.

25) Switch the server to boot from the vDisk and confirm that it boots correctly.