Be careful when deploying SQL Server on SoFS if you use DPM… even if you're running DPM 1807

January 2, 2019

Even though DPM since version 1801 can back up VMs on SoFS, that doesn't mean it can back up SQL Server with databases located on SoFS. If you are building a SQL Server VM cluster, pass-through disk access is not an option for you, and your budget doesn't stretch to a SQL Server Enterprise Edition license, please use a VHD Set and place the VHD Set files on SoFS.

I still cannot figure out how Microsoft can manufacture SQL Server yet fail to make sure all of its deployment options are supported by their own backup tool.


DPM 1807 – Replica Inconsistent Fix (sort of)

January 1, 2019

Hi there.  Haven't been here for quite a while… all my projects have been driving me crazy, particularly the Windows Server 2016 Storage Spaces Direct (S2D) hyper-converged ones, which are pretty challenging, but the results are amazing.

After all, some customers are still using DPM, and thanks to their Microsoft Software Assurance (SA), their DPM 2016 is still upgradeable to 1807.  So I had a chance to give it a try – on S2D volumes.

In fact, that customer had old VMs migrated from a traditional SAN to the new S2D HCI cluster.  Strangely, some VMs could not be backed up.  The symptom: when the initial replica is created, the job stops pretty quickly, and in the DPM console I can see only ~100MB of storage being used:

[screenshot: DPM console showing only ~100MB of replica storage used]

When you check the Monitoring tab, you will find the job has transmitted 0MB of data.

As I fed back to the Microsoft product group face-to-face 2 years ago, a very key reason why DPM cannot get famous is the bad quality of its help system.  In fact, that has not improved so far.  That's why I have switched new customers to Veeam.  However, that cannot be a sharp cut-over, and Veeam has its own problems too.  Maybe we can look later at how to deliver smooth Veeam operations in a complex Hyper-V environment.  For now, let's focus on some old shxt – DPM.

I found there are primarily 3 reasons in my customer's environment that produce this error:

  1. A particular VM cannot create a snapshot (this is a Hyper-V level error)
  2. There are "ghost" snapshots for a VM that cannot be removed (this is potentially a DPM level error)
  3. A VM has been migrated via Hyper-V Replica (the reason is still unknown)

For scenario 1, the fix is pretty simple:

  • Power off the VM
  • Take a snapshot (create a checkpoint) while the VM is off
  • Delete the checkpoint and allow the VHD/VHDX files to merge
  • Power on the VM again
  • Try taking a snapshot again while the VM is on; it should now succeed.  If it does, delete the checkpoint and allow the VHD/VHDX to merge again
  • Go back to DPM and run a consistency check again.  The backup should now return to normal – you will see data being transmitted.
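The steps above can be sketched in PowerShell – a hedged sketch only, assuming the standard Hyper-V module and a placeholder VM name:

```powershell
# Assumption: run elevated on the Hyper-V host; "MyVM" is a placeholder VM name
Stop-VM -Name "MyVM"                                                     # power off the VM
Checkpoint-VM -Name "MyVM" -SnapshotName "offline-test"                  # checkpoint while off
Get-VMSnapshot -VMName "MyVM" -Name "offline-test" | Remove-VMSnapshot   # delete it; the VHD/VHDX merges
Start-VM -Name "MyVM"                                                    # power on again
Checkpoint-VM -Name "MyVM" -SnapshotName "online-test"                   # checkpoint while on
Get-VMSnapshot -VMName "MyVM" -Name "online-test" | Remove-VMSnapshot    # clean up and merge again
```

Wait for each merge to finish before the next step, then run the consistency check in DPM.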

For scenario 2, the fix is simple too:

  • Remove the "ghost" checkpoint – but how?  Most likely it cannot be done from the Hyper-V Manager UI, Failover Cluster Manager, or the SCVMM console.  Here's the procedure:
    • Start Windows PowerShell with Administrator privileges
    • Run the following command:

Get-VMCheckpoint -VMName xxxxx | Remove-VMCheckpoint

  • Done

For scenario 3, you can try moving the VM's storage from the default Hyper-V Replica destination folder to somewhere else.  I can't figure out the reason, but after the move the backup returns to normal.  I suspect this is because a "VM storage move" actually clones the VHD/VHDX into a new file instead of just copying it; there may be some sort of configuration or binary corruption in the old VHD/VHDX file that blocks the backup from executing normally.
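If you prefer to script the move, a minimal sketch with Move-VMStorage – the VM name and destination path are assumptions:

```powershell
# Assumption: D:\VMs\MyVM is on a volume outside the Hyper-V Replica destination folder
Move-VMStorage -VMName "MyVM" -DestinationStoragePath "D:\VMs\MyVM"
```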

Wish this can help you guys.

Exchange Server 2013 Blank Page when accessing OWA

January 6, 2016

I want to share this troubleshooting experience with all of you; hopefully it can help save your day.

Yesterday I was logging into one of my test mailboxes and found that the mailbox could not be opened.  It kept prompting for the password again and again.  When I remoted to the backend and logged in using the RUNAS command with that user account, everything worked like a charm.  I wondered whether it was an Exchange-specific problem.  The most interesting point: other users and accounts had no problem at all.

So, I decided to reboot the Exchange Server during an allocated downtime window last night.  I was still thinking: rebooting can resolve everything in the Microsoft world.  In fact, I realized that this is very old-fashioned thinking that no longer applies to modern Microsoft infrastructure.

After the reboot, nobody could access Exchange.  I couldn't even access the Exchange Control Panel!

I examined the event log in detail; there were no Application log errors.  However, I found some System log errors with source HttpEvent and event ID 15021:

[screenshot: System event log showing HttpEvent error 15021]

I also noticed that the endpoint is 0.0.0.0:444.  This is the Exchange backend web site.  I started to examine the SSL endpoint to see what was happening, using the following command:

netsh http show sslcert ipport=0.0.0.0:444

The following output was generated:

[screenshot: netsh output showing the SSL certificate binding for 0.0.0.0:444]

I guessed this was the source of the problem, and that I needed to delete and re-add the certificate binding to resolve it.  So, I used the following commands to perform the task:

netsh http delete sslcert ipport=0.0.0.0:444
netsh http add sslcert ipport=0.0.0.0:444 certhash=your_certificate_thumbprint appid="{your_application_guid}"

However, I had no luck; I faced another problem during the import:

SSL Certificate add failed, Error: 1312
A specified logon session does not exist. It may already have been terminated.

I searched the Internet again; someone said exporting the certificate with the private key and re-importing it could help.  So I tried, but the same problem persisted.

Finally, I resolved the problem by binding the port to a self-signed certificate and rebooting the server again.  After that, I used the following cmdlet to import the certificate from the CA (not from the export) again:

Import-ExchangeCertificate
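For reference, a hedged sketch of that re-import in the Exchange Management Shell – the file path and the IIS service binding are assumptions for illustration:

```powershell
# Assumption: C:\certs\mail.cer is the certificate file issued by the CA
$cert = Import-ExchangeCertificate -FileData ([System.IO.File]::ReadAllBytes('C:\certs\mail.cer'))
# Bind the certificate to IIS so OWA/ECP use it
Enable-ExchangeCertificate -Thumbprint $cert.Thumbprint -Services IIS
```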

I hope sharing this experience helps.

Categories: Exchange

Reschedule SCDPM Scheduled Maintenance Tasks

December 27, 2015

Did you know that SCDPM has system default scheduled maintenance tasks?

Recently I received a customer request to move the tape detailed inventory task away from a possible backup-to-tape time slot.  SCDPM actually has 2 types of system default maintenance tasks:

  • "CatalogPruning" – This system scheduled maintenance task removes index entries for expired tapes.
  • "DetailedInventory" – This system scheduled maintenance task identifies new tapes and recognizes tapes SCDPM has seen before by reading the on-media identifier (OMID) on each tape.

By default, SCDPM performs the "DetailedInventory" job every morning at 9:00am.  You can get this information with the following PowerShell command:

Get-DPMMaintenanceJobStartTime -DPMServerName YourDPMServerName -MaintenanceJob DetailedInventory

As you might have guessed, you can change a system scheduled maintenance task via the PowerShell cmdlet "Set-DPMMaintenanceJobStartTime".

The following PowerShell command changes the scheduled maintenance task "DetailedInventory" from the default of 9:00am every day to 2:00am every day:

Set-DPMMaintenanceJobStartTime -DPMServerName YourDPMServerName -MaintenanceJob DetailedInventory -StartTime 02:00

Remember, this cmdlet changes the existing schedule; it does not add a new one.
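After changing the start time, it's worth reading the schedule back to confirm (the server name is a placeholder):

```powershell
# Confirm the DetailedInventory maintenance job now starts at 02:00
Get-DPMMaintenanceJobStartTime -DPMServerName YourDPMServerName -MaintenanceJob DetailedInventory
```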

Categories: SCDPM, System Center 2012

Troubleshooting DPM 2012 Tape Verification Problem (ID 30129)

September 29, 2014

A few days ago I had a customer facing a DPM 2012 (not yet R2) tape verification problem.  They found the verification process could not complete.  When the "show detail" link was clicked, the following dialog appeared:

[screenshot: DPM Error ID 30129 dialog]

The error reads: "DPM failed to generate the file list to complete the task (ID 30129)".  The "Recommended Action" says: "On the Libraries tab in the Management task area, select the tape and then click View Contents in the Actions pane".  As you'd expect, the "Recommended Action" did nothing to resolve the problem.

Of course, the "More Information" link in the warning message brings you online – to a "link not found" page.  This is very common for DPM.  What a shame, Microsoft.

When you search for this error ID on the web, you will see some questions posted but no one answering them.  I can see how frustrated people facing this problem must be.

In fact, DPM is not a bad tool.  However, you need to be very competent in Microsoft backend server and storage technology and theory to keep it trouble-free.  If you cannot resolve a problem by yourself and are expecting someone on the web to help you out, that's not very likely.  You might need to pay for Microsoft Premier support and let them play around with the scenario for several weeks (reproducing the problem, log collection, engineer vacations, delays, etc.) before the case is escalated to the product group and resolved.

Here is how I managed to resolve the problem; it might not exactly match your case.  However, the fix is not destructive, so it is worth trying even though it takes time.

  1. Perform the "Recommended Action" first.  For me it had no effect, but I cannot rule out that it contributed to the final resolution, so better to perform it first.
  2. Perform a tape full inventory.
  3. Find out which data source is having the problem.
  4. Create a long-term (tape) recovery point.
  5. Dismiss the warning.

After this, my error was resolved.  Check whether the verification can now complete successfully.  Please give me a response if this helps resolve your problem too 🙂

Categories: SCDPM

Virtual Fibre Channel or Shared VHDX Cluster?

September 19, 2014

One major enhancement of Windows Server 2012 R2 is the "Shared VHDX" guest cluster.  But how do you choose between virtual Fibre Channel and Shared VHDX guest clusters?  Here are some personal comments; see if they are useful for making your own choice:

Virtual Fibre Channel

Pros

  • Familiar – this is what you have been doing so far for physical hosts and SAN: WWNs, zoning, etc.
  • Native performance – one less layer is one less layer.  Virtual FC delivers native SAN performance.
  • Migration – possible to fail over a cluster between physical and virtual nodes

Cons

  • Live Migration trick – tricky steps are needed to avoid drives going missing after a Live Migration
  • Backup – host-based VSS snapshot backup is not supported
  • Snapshot – snapshots and snapshot rollback are not supported
  • WWN hell – creates a great deal of WWNs and messy administration
  • LUN provisioning – requires LUNs to be provisioned; more storage operations
  • Hyper-V Replica is not supported

Shared VHDX

Pros

  • Shields the guest from direct LUN operations
  • Can leverage advanced Windows storage features – CSV cache, SSD tiering, etc.
  • No virtual SAN switch or virtual FC NIC – easier Hyper-V farm hardware design
  • Simple to configure
  • Easy to manage – you handle a VHDX file instead of a LUN

Cons

  • Hyper-V Replica is not supported
  • Backup – host-based VSS snapshot backup is not supported
  • Snapshot – snapshots and snapshot rollback are not supported

Then, what’s your choice?  I choose shared VHDX 😉

Categories: Uncategorized

Is SCVMM an optional item? A more practical way of provisioning Hyper-V 2012/R2 and Hyper-V cluster – Part 1

September 19, 2014

Many customers have asked me, "Is SCVMM a must for a Hyper-V environment?"  The answer is definitely "no".  However, you will find yourself in great pain without SCVMM.  And the most important point is: you won't know you are suffering; you'll just yell about the "limited capability" of Hyper-V 2012 compared to VMware.

If I heard your yell, I will ask you the following questions:

  • “Did you use SCVMM before?”
  • "What was the last version of SCVMM you used?"
  • “Do you know what SCVMM 2012 SP1/R2 can do for you over SCVMM 2008 R2?”
  • “Are you a PowerShell Guru?”

In Hyper-V 2012/R2, there is a whole bunch of new features.  A large number of them are not exposed in the built-in Hyper-V Manager UI, so you need to configure them using PowerShell.  In fact, SCVMM 2012 SP1/R2 is your shortcut to bypass PowerShell as much as you can.

Unfortunately, Microsoft did not provide effective market awareness and training to their partners and customers about how much SCVMM 2012 SP1/R2 can do in a structured way, making your Hyper-V infrastructure capable of even more than VMware + vSphere, but a lot more easily.

I witnessed a so-called "regional architect" from a hardware vendor who, when submitting an SCVMM 2012 SP1 design to a customer, just copied the corporate "stock SCVMM 2008 R2 design" and did a search-and-replace of "2008 R2" with "2012 SP1".  That was a shame.  That guy even declared the design followed "Microsoft best practices".  Of course the design was rejected right away by a local Microsoft guy and me, the 2 guest reviewers XD

SCVMM 2012 SP1/R2 and SCVMM 2008 R2 are 2 different products, not just a simple version upgrade.

The primary reason is that SCVMM 2012 SP1/R2 is a cloud-ready hypervisor management tool, anticipating network fabric, storage fabric, service delivery, hypervisor lifecycle management and more.

I will leave the SCVMM 2012 SP1/R2 introduction to a later post.  This version of SCVMM is such a complex and powerful product that I cannot cover it in 1 single post without destroying my fingertips.  Let me pull myself back to "a more practical way of provisioning Hyper-V 2012/R2 and a Hyper-V cluster".

Traditionally, I have seen people deploy Hyper-V before SCVMM.  This is totally wrong in 2012/R2.

You need to deploy SCVMM 2012 SP1/R2 before your first Hyper-V host if you don't want to tear down your whole infrastructure and rebuild it in the future: reverse the order and you will miss out on some very key SCVMM and Hyper-V capabilities, and this cannot be easily fixed after Hyper-V is in production.

It starts with hardware design.  How many NICs will you use for Hyper-V?  For me, I use 2+ GbE for the host and 2+ 10GbE for VM traffic.  Of course, if you prefer 10GbE over GbE, go ahead and make it 2+ 10GbE for the host and 2+ 10GbE for VM traffic.

Why 10GbE?  Simple mathematics.  It's common to have 20+ VMs on today's hardware.  4x GbE, allowing for 20% Ethernet overhead, gives you 3.2Gbps total throughput in the ideal case.  That translates to 400MB/s of total throughput for 20 VMs.  Are you OK with 20MB/s (roughly 2x 100Mbps) average throughput for a production VM at today's traffic volumes?
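Spelled out (assuming 4x 1GbE, 20% overhead and 20 VMs):

```powershell
$usableGbps = 4 * 1 * (1 - 0.2)       # 4x 1GbE minus 20% overhead = 3.2 Gbps
$totalMBps  = $usableGbps * 1000 / 8  # 3.2 Gbps = 400 MB/s
$perVmMBps  = $totalMBps / 20         # 20 MB/s average per VM
```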

The next thing: I team these NICs, putting them into 2 teams – the host team and the guest team.

Here comes the 1st trick: configure the host team and its virtual interfaces first, and don't touch the guest team before the Hyper-V host is managed by SCVMM.

When NICs are teamed before the host is managed by SCVMM, SCVMM treats the team as a single NIC.  In this case you have no way to create virtual interfaces via SCVMM.  That means you cannot use the SCVMM fabric management features to sculpt your virtual network; you can only use the Windows NIC Teaming utility (the LBFOADMIN.EXE console) to partition your virtual network in an abstract way.

If you team your NICs in SCVMM, you can create virtual network segments in SCVMM and synchronize those settings back to the Windows OS.  In this case, you no longer need to touch LBFOADMIN.EXE for virtual network segments.
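For comparison, creating the host team by hand looks like this – a sketch only; the NIC names, team name and teaming mode are assumptions for your environment:

```powershell
# Assumption: "GbE1" and "GbE2" are the NICs reserved for host traffic
New-NetLbfoTeam -Name "HostTeam" -TeamMembers "GbE1","GbE2" `
    -TeamingMode SwitchIndependent -LoadBalancingAlgorithm Dynamic
```

Leave the guest-traffic NICs un-teamed and let SCVMM create and partition that team.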

I will continue this topic in part 2.