Looks like we are dealing with faulty HBAs. Before concluding, can you try installing 6.0 U3 and check the status?
Cheers,
Supreet
Well, it turns out that my Win/7-Pro desired & recommended monitor settings were the culprit. I had the recommended resolution settings but had my text and other items set to 125% size. Once adjusted back to 100%, the pop-up menu displays fully and normally.
Now, I just gotta accept (and get used to) these smaller settings. Ugh!
The dump you provided is for a datastore named datastore_local, which appears to be a freshly formatted volume with no directories or VMs stored on it.
To send larger files, use Dropbox or a free file hoster.
Do not change anything on the disk from the screenshot yet; it should contain the 20 VMs we are looking for.
Discussion will be continued via Skype.
We will post updates here for everyone else interested in this case.
Hello
Some time ago I noticed a VERY large number of messages in vmkernel.log similar to:
Partition: 648: Read from primary gpt table failed on "naa.600a...."
Almost all datastore devices (completely different datastores and LUNs) are listed in the log.
The output of the "partedUtil getptbl" command is:
Error: The primary GPT table is corrupt, but the backup appears OK, so that will be used. Fix primary table ? diskPath (/dev/disks/naa.600a...) diskSize (23622320128) AlternateLBA (1) LastUsableLBA (23622320094)
I have tried the "partedUtil fixGpt" command and it fixes the GPT.
However, I have questions:
1. How safe is it to use in a production environment?
2. What are the unpredictable consequences of this command?
3. What can happen if you ignore these messages?
4. How can I see what exactly is damaged in the Primary GPT?
Output of the "partedUtil getptbl" command before fixing and after is the same:
gpt
1229833 255 63 19757268992
PS: ESXi 6.5 U2; the datastores are connected to the hosts via FC.
Right, looks like I am going to have to buy a replacement RAID card to extract the data.
The screwed firmware of the Dell host is another matter entirely.
When it rains, it pours.
1. How safe is it to use in a production environment?
If you receive the error message "The primary GPT table is corrupt, but the backup appears OK, so that will be used."
as opposed to the message "The primary GPT table is corrupt / missing",
then this fix is the best thing you can do. Even better if you create a backup first by dumping the first MB of the volume to another datastore.
This will allow you to revert the fix in the improbable case something goes wrong.
2. What are the unpredictable consequences of this command?
In some really rare cases the size of the datastore is reported incorrectly. If you hit such a case you would not be able to mount the datastore again after a reboot.
In this case you would use the partedUtil commands that show the max size and should be able to adjust the size accordingly.
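For that case, the relevant commands look roughly like this (the device name is a placeholder carried over from above; the usable-sector range is what you would compare against the table):

```shell
# Show the first and last usable sectors of the device
partedUtil getUsableSectors /vmfs/devices/disks/naa.600a...
# Show the current partition table for comparison
partedUtil getptbl /vmfs/devices/disks/naa.600a...
```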
I would not recommend running the command while the datastore is highly active (with backups, for example), but other than that I am not aware of further unpredictable consequences.
3. What can happen if you ignore these messages?
In the worst case the backup GPT table gets lost too - in that case you would have to create the partition from scratch, which is way less desirable but still manageable.
If both tables are bad and you reboot, you will not be able to mount the datastore without recreating the partition table first.
4. How can I see what exactly is damaged in the Primary GPT?
You can run
hexdump -C /dev/disks/device | less
This will not be really helpful unless you eat hexdumps for supper.
A GPT table uses a strict syntax, and if only a few bits are wrong, partedUtil will not display anything at all.
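If you want a quick sanity check rather than reading the whole dump, a sketch (device name is a placeholder): the GPT header sits in LBA 1, so with 512-byte sectors it starts at byte offset 512 and should begin with the ASCII signature "EFI PART".

```shell
# Dump the start of the primary GPT header;
# a healthy header begins with the bytes "EFI PART"
hexdump -C -s 512 -n 96 /vmfs/devices/disks/naa.600a...
```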
If you ask this because you are surprised that a modern OS would corrupt the partition table at all, consider that ESXi tries to keep info like the partition table in RAM most of the time.
So unpredictable events like power failures have more severe consequences than you are used to with an OS like Windows, for example.
Summary:
I regard replacing the bad primary table with the healthy backup table as one of the few well-documented and safe options you have when dealing with VMFS problems.
Ulli
Thank you for the detailed answer
Even better if you create a backup first by dumping the first MB of the volume to another datastore.
This will allow you to revert the fix in the improbable case something goes wrong.
Can you give an example of how I can dump and then load back the first megabyte from the partition?
(May be this? For dump: dd if=/vmfs/devices/disks/naa.ID of=/vmfs/volumes/otherDatastore/dump.bin bs=1M count=1)
If both tables are bad and you reboot, you will not be able to mount the datastore without recreating the partition table first.
Do I understand correctly that when the host is rebooted, the problem with the datastore will only affect this host? Other hosts will continue to work with the datastore without any problems until they are rebooted?
You got it already !
(May be this? For dump: dd if=/vmfs/devices/disks/naa.ID of=/vmfs/volumes/otherDatastore/dump.bin bs=1M count=1)
that creates the backup. To revert use
dd of=/vmfs/devices/disks/naa.ID if=/vmfs/volumes/otherDatastore/dump.bin bs=1M count=1 conv=notrunc
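If you want to convince yourself the dump/restore pair is symmetric before touching a real device, you can rehearse it on an ordinary file (all paths here are made up for the rehearsal, not real datastore paths):

```shell
# Create a 4 MB fake "disk", back up its first MB, clobber that MB, then restore it
dd if=/dev/urandom of=/tmp/fake_disk bs=1M count=4 2>/dev/null
dd if=/tmp/fake_disk of=/tmp/dump.bin bs=1M count=1 2>/dev/null       # backup first MB
cp /tmp/fake_disk /tmp/fake_disk.orig                                 # reference copy
dd if=/dev/zero of=/tmp/fake_disk bs=1M count=1 conv=notrunc 2>/dev/null   # simulate corruption
dd of=/tmp/fake_disk if=/tmp/dump.bin bs=1M count=1 conv=notrunc 2>/dev/null  # restore
cmp /tmp/fake_disk /tmp/fake_disk.orig && echo "restore OK"
```

conv=notrunc is the important part on restore: without it, dd would truncate the target to 1 MB instead of overwriting only the first MB in place.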
> Do I understand correctly that when the host is rebooted, the problem with the datastore will only be on this host?
> Other hosts will continue to work with the datastore without any problems until they are rebooted?
So you have a VMFS-volume on shared storage in a cluster ?
This can sometimes have strange effects in a cluster. The situation may look infectious and appear to be deteriorating across the cluster.
Keep cool: try to isolate the datastore if possible to a single host. Then do the fix there and reboot that single host. If that is not possible do the fix and reboot each host as soon as production allows.
But I have not seen such issues in quite a while - I saw them more frequently with ESXi 5.x.
Basically, an ESXi host should be able to continue operating if the partition table gets lost after the host has finished booting.
Yes, the VM used 2 virtual disks.
The multiple vmdks are from snapshots, I guess?
Yeah, starting the VM that many times was not good; a bit of miscommunication with my colleague there.
Similar issues are seen while upgrading via VUM or an ISO image on hosts that have RDMs mapped. Can you try upgrading using the offline bundle via the command line?
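A sketch of the offline-bundle upgrade from the ESXi shell (the depot path and profile name below are placeholders; list the profiles contained in your bundle first):

```shell
# List the image profiles contained in the offline bundle
esxcli software sources profile list -d /vmfs/volumes/datastore1/update-bundle.zip
# Update the host to one of those profiles (hypothetical profile name)
esxcli software profile update -d /vmfs/volumes/datastore1/update-bundle.zip -p ESXi-6.0.0-update03-standard
```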
Please consider marking this answer as "correct" or "helpful" if you think your questions have been answered.
Cheers,
Supreet
"More details":
Power On VM
Key: haTask-4-vim.VirtualMachine.powerOn-79278534
Description: Power On this virtual machine
Virtual machine: web3
State: Failed - Insufficient resources.
Errors: Out of resources
I am guessing it is out of disk space somewhere, but I'm not quite sure where to check, as the 2 existing datastores have about 25+ GB free each!
The VM was working OK the previous day. No configuration changes were made to the erroring VM; however, another VM was added, which I suspect took up the "resources". I subsequently removed the new VM, but the old one still would not power on.
I've searched the forums and around but not found anything related to this specific error.
Any suggestions on how I can get this machine started again?
This message does not only indicate a resource crunch on the datastore; it can also relate to memory, CPU, etc.
Do you see any warning/message on the summary page of the host where this VM resides, or on the VM's summary page?
This may be an issue with HA admission control as well.
If admission control is enabled, it checks the available capacity against the configured constraints and decides whether VMs can be powered on.
Check out this blog for more understanding on this: http://geekswing.com/geek/vmware-cpu-and-ram-reservations-fixing-insufficient-resources-to-satisfy-configured-failover-l…
NOTE:
Are you facing this issue with only one host? What happens if you migrate this VM to another host and try to power it on?
Please consider marking this answer as "correct" or "helpful" if you think your questions have been answered.
regards
Gayathri
Not sure, but you may want to check whether Alt-F12 is already available at this stage, to see what's going on.
André
@ancechou thank you!
This had wasted much time after upgrading from free ESXi 6.0 to the vSphere Essentials Kit ESXi 6.7.
Now I can build and code sign my software releases using my Aladdin Knowledge Token JC. Thank you! Thank you!
By default, ESXi >= 6.5 will not permit pass-through connection of CCID USB devices, such as the Aladdin Knowledge Token JC, to the guest VM.
CCID (chip card interface device):
*** Error displayed by ESXi on connect attempt:
usb.generic.allowCCID TRUE
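For reference, the line above goes into the VM's advanced configuration; a sketch of the .vmx entry (the option name is the one from the post, written in standard .vmx key/value syntax, added while the VM is powered off):

```
usb.generic.allowCCID = "TRUE"
```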
Thanks for the prompt response GayathriS.
HA is not currently enabled, and no other configuration changes were made to the VM since it was last seen working. No obvious warnings or indications are shown on the summary page (CPU and RAM are nominal). With no other VMs running, this still doesn't power on and produces the error.
Something from the error log may help:
- Swap: 3683: Failed to create swap file '/vmfs/volumes/52d71515-0f264c58-2f2f-38eaa7abfa2e/web3/vmx-web3-2638557098-1.vswp' : Not found
Interestingly I've also noticed I can't make configuration changes either.
- Failed to reconfigure virtual machine web3. Unable to write VMX file: /vmfs/volumes/52d71515-0f264c58-2f2f-38eaa7abfa2e/web3/web3.vmx.
I can see that the VMX file is there (looking via bash and datastore browser).
So again I suspect disk space, but not sure exactly where to check or free it up. There are 2x 1TB datastores both with approx 25 GB free. Specifically, approx 930GB capacity, 905GB provisioned, and 25GB free. Could it be that within the provisioned storage we're out of space?
I can try the suggestion of using another host, however, as last resort as the vmdk's are huge - it should work on the existing host, and did at least recently.
Any further suggestions are welcome!
Right-click on the VM, go to Edit Settings, and take a screenshot (which shows how many disks are added).
Highlight each disk and you should be able to see on which datastore the respective VMDK is stored; this will help you check the respective datastore's available space.
Yes, it can be related to storage, but you need to check.
On this message : - Swap: 3683: Failed to create swap file '/vmfs/volumes/52d71515-0f264c58-2f2f-38eaa7abfa2e/web3/vmx-web3-2638557098-1.vswp' : Not found
Check whether you can see the .vswp file in the datastore; if you don't find it, that means it's not getting created.
If it is available but the VM is still throwing this message, then it is something to do with a lock / access.
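From the ESXi shell, both checks can be done directly (the volume path is the one from the error message above; output columns may vary by build):

```shell
# Capacity and free space of all mounted datastores
esxcli storage filesystem list
# Check whether the .vswp file exists in the VM's directory
ls -lh /vmfs/volumes/52d71515-0f264c58-2f2f-38eaa7abfa2e/web3/
```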
regards
Gayathri
It should be possible to recreate the descriptor files.
However, in order to see whether the virtual disks have been thin or thick provisioned, please run ls -lisa > files.txt and attach the files.txt to a reply post.
Also let us know how much free disk space you currently have on that datastore, to determine whether it is safe to delete the snapshots.
André