Don't Panic! (or when things go wrong...)
During package updates
%pre, %post, %preun, %postun errors
%pre, %post, %preun, and %postun are scriptlets used by rpm: %pre and %post run before and after installation; %preun and %postun run before and after uninstallation. These errors occur when the corresponding scriptlet fails.
Investigating scriptlet errors
Running rpm with -vv (very verbose) might give us more clues as to what is going on.
Fix
To uninstall the package safely, first yank it out with rpm -e --noscripts, reinstall it with yum, and then uninstall it again with yum. It also doesn't hurt to update everything else first, then try updating the problem package again to see whether the error persists. See also RemovingPackages
Example:
[root@ahat ~]# rpm -e --nodeps --allmatches avahi-0.6.17-1.fc7.i386
error: %postun(avahi-0.6.17-1.fc7.i386) scriptlet failed, exit status 1
[root@ahat ~]# rpm -e --noscripts avahi-0.6.17-1.fc7
[root@ahat ~]# yum install avahi
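Before yanking a package out with --noscripts, it can help to read the scriptlet that is failing. The following is a minimal sketch, not a tested recipe: parse_scriptlet_error and show_scriptlets are hypothetical helper names of ours, and the sed pattern assumes the error line looks like the one in the example above.

```shell
# Hypothetical helper: pull "<scriptlet> <package>" out of an rpm error line
# such as: error: %postun(avahi-0.6.17-1.fc7.i386) scriptlet failed, exit status 1
parse_scriptlet_error() {
    printf '%s\n' "$1" |
        sed -n 's/^error: %\([a-z]*\)(\(.*\)) scriptlet failed.*$/\1 \2/p'
}

# Hypothetical helper: print the scriptlets rpm has recorded for a package,
# so the failing one can be read before deciding to skip it.
show_scriptlets() {
    rpm -q --scripts "$1"
}
```

For the example above, parse_scriptlet_error would report the postun scriptlet of avahi-0.6.17-1.fc7.i386, and show_scriptlets avahi would then list what that scriptlet tries to do.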
Transaction errors
Possible fix: yum clean all
If that doesn't work, try: rpm -e --nodeps --allmatches <packagename>
During the update of the curl package (using yum) I experienced this error. Googling the error turned up the possible fix listed above.
See also RemovingPackages
Example:
Transaction Check Error:
file /usr/lib/libcurl.so.4.1.0 from install of libcurl-7.18.2-6.fc9.i386 conflicts with file from package curl-7.18.2-6.fc8.i386
Error Summary
-------------
[root@ahat ~]# rpm -e --nodeps --allmatches curl-7.18.2-6.fc8.i386
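When yum prints many conflict lines, it is easy to lose track of which installed package is in the way. As a rough sketch (conflicting_package is a hypothetical name, and the pattern assumes the "conflicts with file from package" wording shown in the example above), the package name can be filtered out of saved yum output like this:

```shell
# Hypothetical helper: read yum output on stdin and print the installed
# package named at the end of each "conflicts with file from package" line.
conflicting_package() {
    sed -n 's/^.* conflicts with file from package \(.*\)$/\1/p'
}
```

Something like conflicting_package < yum-output.txt | sort -u would then give the list of candidates for rpm -e --nodeps --allmatches (yum-output.txt being whatever file you saved the error output to).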
Regaining access to your Server
Ok, so you have been following all of our security guidelines (ManagingSecurity) and you have locked yourself out of your server. Now what?
- Reboot your server, hitting the magic key (it varies depending on your BIOS; use Google to find out what it is) to enter the boot loader (in our case, grub) menu. Edit the entry for the kernel you want to boot (you will almost always have more than one kernel entry; choose the most recent).
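- If you have password-protected grub, you'll need to enter the password. If you forgot it, note that motherboard jumpers only reset a BIOS password, not the grub one; the grub password lives in the grub configuration file on disk, so you would have to boot from other media (such as a Live CD) and change or remove it there.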
- The line you need to edit is the one that starts with kernel /vmlinuz. Add a 1 to the end of the line to boot into single-user mode. Save your changes (press Enter), then hit b to finish booting your server.
- Fix the problem, then reboot again to restore normal conditions. Tada!
Server Recovery - Help! My server won't boot!
If your server won't boot, you need to gather all evidence of its current dysfunctional behaviour before proceeding. It is important to explore what kind of error you are dealing with, software or hardware.
Stay calm, and take your time to think about the problem and any possible solutions you might pursue. Avoid trying something that is irreversible unless you have really thought about all the alternatives.
Hardware Error
Generally, as long as the hard drive(s) are still functioning, you can either replace the malfunctioning system component or move the drive to a functioning system. If the hard drive itself is malfunctioning, then at least you have your backups, right? If you don't have backups, are really desperate, and are willing to spend some money (hundreds of dollars per drive), there are data recovery services available.
Software Error
Listed below are various situations where we experienced software errors that prevented our servers from booting up normally.
Post System Update/Upgrade - Missing core system software
Background info: I had just finished updating/upgrading all the packages on one of our servers. I also took the liberty of removing various packages I felt were just bloat we didn't need. However, I used rpm --erase --nodeps PACKAGENAME for many of them, disregarding the dependencies. Even though I took extreme care not to delete any clearly system-critical packages, I deleted one too many packages that some system-critical packages depended on. Woe is me...
The system on startup started hanging in certain places. Eventually you got a hopeless login prompt if you waited about 4 minutes: after you typed in your username, you would just see the login prompt again - with no way to log in.
Next, we tried to step through the boot procedure interactively, but pressing "I" at the right time wasn't working. Error messages clearly indicated that some core services were malfunctioning (e.g. mount).
Next, we tried to interact with the bootloader. We wanted to edit the kernel arguments to boot into a different run level, or into recovery mode, for example. Mainly we were trying to gain some kind of access to the hard disk. Look in your /etc/init.d directory: it holds the scripts for the daemons that can be run at the different run levels. On your system you can type:
chkconfig --list | grep ":on"
to see what daemons will run at various run levels (this shows you daemons that are set to run for at least one run level). Here is an example from one of our machines...
[rsajan@maja init.d]$ chkconfig --list | grep ":on"
acpid 0:off 1:off 2:off 3:on 4:on 5:on 6:off
auditd 0:off 1:off 2:on 3:on 4:on 5:on 6:off
cpuspeed 0:off 1:on 2:on 3:on 4:on 5:on 6:off
crond 0:off 1:off 2:on 3:on 4:on 5:on 6:off
denyhosts 0:off 1:off 2:on 3:on 4:on 5:on 6:off
haldaemon 0:off 1:off 2:on 3:on 4:on 5:on 6:off
httpd 0:off 1:off 2:on 3:on 4:on 5:on 6:off
iptables 0:off 1:off 2:on 3:on 4:on 5:on 6:off
irqbalance 0:off 1:off 2:off 3:on 4:on 5:on 6:off
kudzu 0:off 1:off 2:off 3:on 4:on 5:on 6:off
mdmonitor 0:off 1:off 2:on 3:on 4:on 5:on 6:off
messagebus 0:off 1:off 2:off 3:on 4:on 5:on 6:off
network 0:off 1:off 2:on 3:on 4:on 5:on 6:off
postfix 0:off 1:off 2:on 3:on 4:on 5:on 6:off
postgresql 0:off 1:off 2:on 3:on 4:on 5:on 6:off
restorecond 0:off 1:off 2:on 3:on 4:on 5:on 6:off
rsyslog 0:off 1:off 2:on 3:on 4:on 5:on 6:off
sshd 0:off 1:off 2:on 3:on 4:on 5:on 6:off
udev-post 0:off 1:off 2:off 3:on 4:on 5:on 6:off
yum-updatesd 0:off 1:off 2:on 3:on 4:on 5:on 6:off
Now peruse the listing above and notice, for each run level, which services are on: none at run level 0, only cpuspeed at run level 1, more at run level 2, and every listed service at run levels 3, 4, and 5.
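Rather than reading the columns by eye, the listing can be filtered per run level. This is a small sketch of ours (services_on_at is a hypothetical name) that assumes the chkconfig --list layout shown above: a service name followed by level:state fields.

```shell
# Hypothetical helper: read `chkconfig --list` output on stdin and print the
# services that are switched on at the run level given as the first argument.
services_on_at() {
    awk -v lvl="$1" '{
        for (i = 2; i <= NF; i++)
            if ($i == (lvl ":on")) { print $1; break }
    }'
}
```

Used as chkconfig --list | services_on_at 1, it should print only cpuspeed for the machine listed above.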
Here you see some short descriptions of the various run levels:
| runlevel | description |
| 0 | shut down the system |
| 1 | single-user mode |
| 2 | local multiuser with networking, but without network service (like NFS) |
| 3 | full multiuser with networking, console login only |
| 4 | not used / user defined |
| 5 | full multiuser with networking, console login and X Windows GUI |
| 6 | reboot the system |
We were trying to set run level 1 to enter single user mode (no users, minimal setup).
Alas, when we tried to alter any settings, the changes didn't take effect. There seemed to be no way to write changes to the disk. At this point it became clear that we needed to boot in an alternate manner, not via the hard drive, and then gain access to the hard drive from there. We didn't know if we were missing one strategic package, or a host of them.
But we did write down the device path of the disk (where it shows up in the /dev hierarchy, e.g. /dev/sda), which on our machine was listed as (this may be different on your machine):
/dev/VolGroup00/LogVol00
/VolGroup00/LogVol00 indicates that our system is using LVM (Logical Volume Management), which basically allows you to partition the drives in a more flexible manner. The details are beyond the scope of this discussion.
So one possible fix at this point is to make a Fedora Live installation CD. Pop it into the CD drive if you have one, and boot off of the install CD. This is the path we followed and which we will describe hereinafter...
Once you load up via the Live CD, you will be placed in X windows. Memory is used to provide some amount of writing capability, limited to some portion of the available RAM on your system.
How do we access the hard drive? At this point we had loaded Fedora 10 via the Live CD. The hard drive was not readily visible, so we had to perform a mount operation. It had been a while since we last did this, so we read the man pages on mount and lvm.
We created /ahathome as the directory to mount the hard drive at.
fstab, the file system table, tells you how things are mounted. We could tell that the Live CD system could see the hard drive: in the /dev directory we saw an entry for /VolGroup00/LogVol00, so at this point we were pretty sure it wasn't a hardware problem.
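On our machine the logical volume already showed up under /dev. If it does not on yours, the volume group may simply not be activated yet. The following is a hedged sketch, assuming the LVM tools (vgscan, vgchange, lvscan) are present on the Live CD and that you are root; activate_lvm is a hypothetical wrapper name, and we did not need this step ourselves.

```shell
# Hypothetical helper: make LVM volumes on the local disks visible under /dev.
activate_lvm() {
    vgscan &&         # scan the disks for volume groups
    vgchange -ay &&   # activate every volume group that was found
    lvscan            # list the logical volumes and their device paths
}
```

After running activate_lvm, the lvscan output should include the path you wrote down earlier (/dev/VolGroup00/LogVol00 in our case).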
Note that the format for using mount is as follows:
mount [-fnrsvw] [-t vfstype] [-o options] device dir
We then ran mount with -f to test the mount command:
mount -nvf -t ext3 -o defaults /dev/VolGroup00/LogVol00 /ahathome
| -f | causes everything to be done except for the actual system call; this "fakes" mounting the file system. |
So this is a good way to see what it will do, before you do it for real.
The output from the above mount command was acceptable, so we then ran the command:
mount -nv -t ext3 -o defaults /dev/VolGroup00/LogVol00 /ahathome
| -n | mount without writing to /etc/mtab |
| -v | verbose mode |
| -t | allows you to indicate the file system type |
| ext3 | the file system type on our system. We saw this originally listed in the output from the bootloader for this hard drive device. |
| -o | allows you to specify options |
| defaults | the default options which are: rw, suid, dev, auto, nouser, and async (see man mount for more info on these). |
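The two mount steps above (fake first, then for real) can be wrapped together. This is only a sketch under the same assumptions as our example: safe_mount is a hypothetical name, the options mirror the commands we ran, and it needs to be run as root.

```shell
# Hypothetical helper: dry-run the mount with -f, and only mount for real
# if the dry run succeeds. Arguments: file system type, device, mount point.
safe_mount() {
    fstype=$1
    device=$2
    dir=$3
    mkdir -p "$dir" || return 1
    # -f fakes the mount, so problems show up before anything is changed
    mount -nvf -t "$fstype" -o defaults "$device" "$dir" || return 1
    mount -nv -t "$fstype" -o defaults "$device" "$dir"
}
```

For our disk this would be called as safe_mount ext3 /dev/VolGroup00/LogVol00 /ahathome.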
Now we have access to our drive. Awesome.
But now for yum. We want to reinstall packages that are missing with yum. But yum was using config files from the CD. We read the man pages on yum and yum shell, and used yum -c to state which config file to use. But it was using the wrong rpm and the wrong db and producing a ton of errors.
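One thing we did not try, and so cannot vouch for on a system this broken, is pointing rpm and yum at the mounted disk explicitly. Both tools have an option for this (rpm --root and yum --installroot). The sketch below assumes the disk is mounted at /ahathome as described above; reinstall_into_root is a hypothetical name and PACKAGENAME stands for whatever you removed.

```shell
# Hypothetical helper: query and install packages relative to a mounted
# root directory instead of the Live CD's own root.
# Arguments: root directory, then one or more package names.
reinstall_into_root() {
    root=$1
    shift
    # Ask the rpm database under $root which of the packages it still knows
    rpm --root "$root" -q "$@"
    # Install relative to $root rather than into the Live CD's file system
    yum --installroot="$root" install "$@"
}
```

A call would look like reinstall_into_root /ahathome PACKAGENAME. Whether yum then picks up usable repository settings is something you would have to check on your own system.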
At this point we were faced with either painstakingly reinstalling all the missing pieces or just grabbing the data we needed off the drive and installing a fresh new build. We chose to reinstall.
If you have a similar experience and you were able to reinstall all the packages with yum, please feel free to share your story.

We will try this experiment again sometime, deleting fewer packages this time, and see if we can reinstall the missing links, thus avoiding a complete reinstall of Linux.
Post Fedora Upgrade (8 to 9) - boot error hang
We tried to resuscitate a non-booting server via a Fedora 9 Live CD. The server would begin to boot up, but very quickly it would just appear to hang. At the bottom of the screen a single inauspicious line was displayed:
BUG int 14 CR2 ffb41000
Googling this error revealed that it was a bug in the kernel included on that Live CD, which was incompatible with a few ASUS motherboards. The bug was fixed in subsequent kernels.
The fix? We downloaded a Fedora 10 Live CD and the system booted up normally.
[see also UpgradingFedoraWithYum]