Best practices for pocket research and development labs in informatics. Integrates agile industrial practices with ethnomethodology, interaction design, sense-making and more.
Guests are welcome to view our materials. To subscribe, edit, view raw markup, etc., you'll need to register for an account. Accounts are free (and will always be free) - your involvement helps us directly and indirectly (by demonstrating that our work matters to our funders...) StartingPoints has more info.
ServerAdminGuidelines is an all-in-one presentation of the topics covered in the LabPrimer project. The topics covered here can also be accessed individually from the links on the left navigation bar.
Please feel free to join our effort and contribute your thoughts, ideas, experience and questions. See the Content Model topic (below) for an explanation of how we organize topics. Anyone can register for an account and get involved.
Establishing and maintaining a successful academic computing research lab requires a different infrastructure design than those used for industrial computing research labs or academic computing teaching labs. The approach that has worked for us is an agile approach best described as managed chaos. Our infrastructure design prioritizes:
interests of lab personnel
flexible use of lab resources sufficient to support a dizzying array of projects
interests of university IT
a sense-making process about the computing infrastructure, status of various projects, and project methodologies
efficient management of a core architecture
interests of other university stakeholders
(Note that other aspects of how a lab is managed may have other priorities - these are just the priorities for infrastructure design.)
I think the best way to explain how these priorities affect the design is to point out where they come into play in the particulars. Once I've done that, I'll collect those notes and come back here and write up a summary of each.
Laziness, Impatience, Hubris, Diligence, Patience and Humility
One of the most difficult balancing acts for an academic researcher running a research lab is striking the right balance between (1) getting things done, (2) professional development, and (3) materials development. In Diligence, Patience, and Humility, Larry Wall expands his well-known discussion of the three virtues of a programmer to three virtues of an agile (my word, not his) community. The key is, as Larry says, to recognize that these seemingly opposing needs can actually be met at the same time.
Tools like this wiki are a crucial part of the process. Using them in an agile manner is another crucial part of the process. Letting go of the need to be perfect is yet another - you have to develop the willingness to put interim versions of your work out there for others to see. And be ready for some people to not understand and be fairly nasty about it.
Once you develop the willingness, then everything else follows. Instead of just doing things the way you already know how to do them, always take some time to explore alternatives. Track however far you get on those alternatives in the wiki. When you get a chance, you can keep going. In the meantime, someone else may pick up where you left off. No matter what, folks who are newer in the lab (or in another lab somewhere else) will learn from your notes and thought process. Everybody wins. You lose the vast majority of the benefit of exploring alternatives, though, if you don't put your explorations out somewhere public, because computing systems move too fast for the individual 30-minute to 3-hour chunks of time you get to add up without community. -- HilaryHolz - 10 May 2008
Work and play well with others
Our labs live in a larger university environment. What's more, we tend to comprise a very small portion of that university computing environment. Nonetheless, so-called 'pocket labs' have historically been responsible for many of the worst university computing catastrophes.
Our relationship with university IT is truly collaborative (Hilary sits on the UIT committee.) I (Hilary) believe that we've been able to evolve that relationship based on mutual respect and an excellent lab security record.
In general, the lab follows the Center for Internet Security Standards. Details in various places. -- HilaryHolz - 10 May 2008
Less is more
It's tempting to think that a lab machine is just a research machine, so it isn't really critical to worry about issues of security, optimizing performance, etc. However, lab machines are favorite targets of hackers precisely because that mindset is so common. Leave aside concerns about loss of your precious work for a moment. You would be astonished what portion of a machine's resources are wasted simply by fending off attacks using the default profile of a distribution - that is what a denial of service attack is, after all. You'll also be (pleasantly) astonished at the speedup resulting from disabling (or even better, removing) all the bloatware that even Linux now seems to think we all just cannot do without. (Do I really need 4 or 5 competing solutions for wireless connectivity on my server, which has no wireless card? Fellas, I'll download it if/when I need it...) -- HilaryHolz - 10 May 2008
Hanging on to all that extra stuff that you aren't using wastes your resources in every sense of the word:
it slows down the machine, especially daemons, which enlarge process tables, virtual memory, etc.
it makes it harder to maintain the systems you do value (dependency chains, more packages to track, ...)
it provides avenues of attack for hackers
it adds points of failure to the system
when you upgrade, you have to re-evaluate the unused systems all over again, because you tend to forget them
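One way to start acting on the list above is to see which daemons are actually switched on. The following is a minimal sketch, assuming a Fedora box of this era where chkconfig --list prints one service per line followed by its on/off state per runlevel; the function name services_on_at_3 is our own, for illustration only.

```shell
#!/bin/sh
# Sketch: read `chkconfig --list` style lines on stdin and print the names
# of services that are switched on at runlevel 3.
# Expected line shape (an assumption about the output format):
#   sshd  0:off 1:off 2:on 3:on 4:on 5:on 6:off
services_on_at_3() {
    awk '{ for (i = 2; i <= NF; i++) if ($i == "3:on") print $1 }'
}

# Typical use (not run here):
#   chkconfig --list | services_on_at_3
#   chkconfig SERVICENAME off    # keep it from starting at boot
#   service SERVICENAME stop     # stop the running daemon now
```

Reviewing that list service by service, and asking what each one is for, is usually more instructive than disabling things in bulk.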
This topic presents our evolving ContentModel, integrated with a number of best practice recommendations. The content model is grounded in Sense-Making. The current presentation reflects the Sense-Making foundation and the fact that I haven't had a chance to rewrite this description.
How we got here (methodology)
Initially, we reviewed the existing pages in our TWiki as well as several other TWikis, held discussions among our existing membership, reviewed our own journal entries, and proposed a simpler model (which can be viewed in earlier revisions of this topic.)
After living with that model for a while, we conducted some contextual interviews with members, reviewed entries from the initial content model, and refined the model to this model. One thing that is clear is that those community members who have been most aware of the earlier model have been most successful in their use of the TWiki. AhatSkin is still fairly young at this time, but we hope that it will help in promoting visibility of the model.
Joining theory and practice (MetaTheory)
Authorship (Ideology)
Authorship of topics lies on a spectrum ranging from
community information, e.g., the pocket lab Best Practices. Community information is info that is intended to have shared authorship, although it is often hard to get there
to
shared information, in which the locus of control remains largely with the person who initiates the conversation, e.g., JennysRandomThoughts.
In a number of cases, a topic or discussion will start out as collaborative sense-making of information with shared info and become community info over time. This topic is such a case in point.
Types of pages (Epistemology)
content - a page that focuses on a single topic.
If the topic is a community topic, then it's a really good idea to refactor the topic any time the amount of information exceeds a monitor's worth of information. For example, this topic really needs refactoring! While some folks happily read on past the initial screen, the tendency of a large chunk of the menus-and-mice world to miss any information 'below the fold' was established long before the web existed.
This rule is clearly less important for information that is being shared, but remains primarily with the author. Still, it is a good thought to keep in mind - using a generated table of contents would be a good idea, for example.
organizational - pages that exist primarily to present relationships between other pages, whether those are content pages or other organizational pages.
In Rob's first pass at writing this topic, he said that these pages, which he called 'indices' were 'intelligently arranged' to 'foster easy and efficient access of information in linked to content pages.' The problem with that explanation is that, since the arranger has not encountered the contextualized information need of the page user, search could equally well meet that criterion, and many students immediately head off to Google for just that reason.
Thus an arranged index simply isn't enough. Don't bother. What else can you do? Explain your arrangement. I know that seems odd, and like it would not make a difference, but it does. Try it for yourself - look at pages of links that you find useful, blogs that you find useful and ones you don't. See what the style differences are. What's going on? Well, the arranger (that's you) does not have access to the contextualized information need of the page user (that's the person looking at your organizational page) when choosing which relationships to present between the information present in the content, let alone how to present those relationships. However, if the user has access to the contextualized information need the arranger had in mind when constructing their arrangement, the user has a fighting chance of mapping their own needs to the arrangement presented. That's why annotated bibliographies, particularly authentic personalized annotated bibliographies, are so useful. Make sense? I'm planning on getting all my materials from my research methods course up on this TWiki this summer ('08). Also, hopefully, we can get some of the Sense Making specialists to come play with us.
Finally, don't forget that this is hypertext! You can link info from multiple pages, indices, etc. We tried linking some stuff from multiple palettes, though, and that was not a good idea.
search - search pages are useful, too, especially in the context of a good interaction design, content model, information architecture, etc. The better the rest of the site is, the more useful search is in my experience, because search not only finds what you are looking for, but gives you insights into the structure of the site as a whole. Also, search is far more useful for an expert who understands how people organize info than for someone who always falls back on search. My students are always astonished at how quickly I find things. Heck, guys, you should meet my Dad. He's amazing! And he's a retired lawyer....
Designing Information over time (aka Digital Inclusion aka Ontology)
[... add a quick explanation of how digital inclusion relates to issues in access to understanding agile methods and the free/libre software technologies used in TWiki ...]
some specific refactoring recommendations
content pages - we now have template topics to help with ease of managing leaf, node and roots of related materials. Feel free to improve, modify, etc., from sense-making, interaction, content, information architecture, look-and-feel, etc., points of view.
organizational pages - so, how do you get access to users' contextualized information needs, arranger? Start by looking at your own motivation to make that page and being clear, open and honest about it. Next, come up with something that will benefit your users right now, today. Be willing to cede a certain amount of control in return for them working with you. Finally, listen. They will tell you what they need. In turn, tell them what you understood them to say while you are creating your organizational page. It's a process, not an end product. Fundamentally, though, there's nothing wrong with multiple organizational pages, so long as they stay maintained.
Having problems? See DontPanic. Need to RtFM? We can help.
The Ahat lab, which hosts this wiki, uses a rich set of platforms. Our professional work, however, tends towards a blend of Unixes. The big servers are all on Fedora, mostly on core 9. We tend towards bleeding edge webserver/language/interaction research and development rather than Unix kernel development itself, so walking the knife-edge of being on the most recent truly stable Fedora core (i.e., lagging one "production" core behind) has given us the best trade-off of bleeding edge agility vs. bleeding edge instability. We've played with Debian (and some others), but find ourselves back with Fedora. It's not perfect, but it's home...
Personally, many of us use Darwin, Apple's BSD-style Unix OS, as Mac laptops are the laptop of choice (for those with the funding to afford them.) Folks with Windows laptops tend to start by installing Ubuntu on their laptops. As a starter Linux, Ubuntu is the clear current winner, and it's an easy knowledge migration from Ubuntu to Fedora. When we need to test things on Windows, we tend to rely on VirtualizatioN. It's a digital inclusion thing; we find that the payoff from putting our limited resources into FOSS has a better return.
Fedora is the free software project companion to Red Hat Enterprise Linux. The information here tends towards Fedora, largely due to the tools available for managing servers for scientific programming. We would welcome a companion set of materials about how to achieve the same or similar effects with other platforms.
Installing Fedora
There is a rich variety of ways to install Fedora (as well as many other distributions of Linux.) Essentially they all break down to the following steps:
Get the files needed in a bootable configuration
Prepare your system for installation
Boot your computer (see step 1) and install
Reboot your computer and configure.
Fedora's New Users Chapter of the install guide does a pretty good job of discussing the options and when you would use which. We discuss installing from a live CD in detail, simply because we captured lots of notes on the topic since we started keeping this primer. As we progress we promise to expand our set of pocket lab-oriented info.
Installing Fedora 10 via Live CD
One of the ways to install some Linux distributions is via a live CD. A live CD is a bootable CD from which you can install the distribution. Why install from a live CD? It comes with less crud. But it is harder to install. See Fedora install guide for f10.
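Before burning anything, it is worth confirming that the downloaded image matches the SHA1 checksum published alongside it. A minimal sketch of that check, assuming the sha1sum tool from coreutils is available; the function name verify_sha1 and the file names in the usage comment are hypothetical.

```shell
#!/bin/sh
# Sketch: verify_sha1 FILE EXPECTED_HEX
# Succeeds (exit status 0) only if FILE's SHA1 digest equals EXPECTED_HEX.
verify_sha1() {
    actual=$(sha1sum "$1" | awk '{ print $1 }')
    [ "$actual" = "$2" ]
}

# Typical use (file names are placeholders, not real download names):
#   verify_sha1 live-cd.iso "PUBLISHED_SHA1_HEX" && echo "image OK" || echo "image BAD, download again"
```

A mismatch almost always means a truncated or corrupted download, so fetch the image again rather than burning it.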
Download the Fedora 10 Live CD from the Fedora website. Check the SHA1 checksum of the iso file, then burn it to a CD.
Pop in the CD. When you are presented a message similar to "Loading in 9 seconds" and the numbers count down, press any key and select Verify and Boot to verify the CD and then boot from it.
Let it boot up. When you are in X windows, double-click on the icon Install to Hard Drive to start the Fedora Installer.
partitioning the hard drives & installing fedora:
select remove all partitions on selected drives and create default layout, and click at the bottom to choose Review and modify partitioning layout for a custom partition set up. The UI is hard to work with, so we actually had to do this twice.
why should you partition by hand? there are many good reasons, including increased security, building a more robust system, and a system that can handle new fedora installations without losing or needing to reinstall /home or /var directories, for example.
look in the CIS documents to see recommendations for partitions and their respective sizes
the CIS recommendations may be overkill for some. On the installation we ran during this write-up we chose to set partitions for:
boot: 100MB (not part of the volume group)
/: 16512MB
swap: 2048MB (this one isn't actually labeled swap in the installer; it's the one without a mount point)
recommended swap size: 2x the size of your memory for the first 2GB, plus 1x the size of any memory above 2GB
example: if your total memory is 2GB, the recommendation is 4GB swap; if your total memory is 3GB, the recommendation is 5GB swap.
note: depending on what you are running on your server, these are mainly guidelines. You may want more or less.
when you are ready, click on edit, and shrink down the boot to 100MB.
click on VolGroup00 and click on edit, and a new window will pop up
edit the big one ... take off the final digit to make room for the new partitions
then add and adjust the numbers as you need them for your system setup
when ready you will be able to begin the formatting process, and fedora will reflect formatting on each partition.
when the formatting is complete, you will see a window with...
a checked checkbox "Install boot loader on /dev/sda"
an unchecked checkbox "Use boot loader password"
a menu "Boot Loader operating system list" and one entry that is checked "Fedora /dev/VolGroup00/LogVol00". This may be a little confusing because you haven't installed the OS yet, but this is the default, and if you try to delete it, the installer will inform you. We left this menu as we have described it here, and clicked next.
the fedora installer will begin to install the OS. This can take a while. Don't panic if the screen saver goes on and you want to go back in X and windows look like they've frozen. Just give it some time.
you will be prompted when it is done, and you will need to reboot or shutdown. You can then eject the CD.
post installation:
turn the computer back on if you shut it down
after booting up, fedora will make you go through some setup screens
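The swap sizing guideline from the partitioning step can be written as a small calculation. This is a sketch that follows the worked example in the text (2GB of memory gives 4GB of swap, 3GB gives 5GB); the function name swap_mb is our own, and the result is a starting point, not a rule.

```shell
#!/bin/sh
# Sketch: swap_mb RAM_MB
# Prints a suggested swap size in MB: twice the memory for the first 2GB,
# plus one-for-one for any memory above 2GB.
swap_mb() {
    ram=$1
    if [ "$ram" -le 2048 ]; then
        echo $(( ram * 2 ))
    else
        echo $(( 4096 + (ram - 2048) ))
    fi
}

# Typical use on a running Linux system (not run here):
#   swap_mb "$(awk '/MemTotal/ { print int($2 / 1024) }' /proc/meminfo)"
```

As the note in the partitioning step says, what the server actually runs should win over the guideline.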
Software is distributed on Linux distributions in packages managed by tools appropriately called package managers. Once upon a time those package managers were also used to distribute the packages (ask an old-timer about urpmi some day and enjoy the rant), but now we have separate tools, sometimes called updaters, that manage the distribution of packages rather than the packages themselves. For example, the Red Hat package manager (RPM) is a free software package manager used by Red Hat Enterprise, Fedora Core, SuSE, and Mandrake distributions, among others. A variety of updaters are available for packages produced by RPM (.rpm files), including yum (command line), pirut (GUI) and PackageKit (GUI; see MaintainingDistributionsWithPackageKit).
Package managers and updaters play a critical role in CollaborativeToolBuilding. For end-users, these tools automate the process of letting you know when updates of the software you use are available, installing those updates, etc. For developers, these tools serve a much more important role; they help us expose and track the relations between our work, refining and refactoring those as we go.
What's in a package?
Package contents vary a lot by Linux distribution, type of package, and best practices of the community to which the developers belong (not to mention the degree to which the developers adhere to those best practices!) A package includes (in a perfect world) everything needed to install, remove, and upgrade the software in question on the distribution in question. More specifically, any one package may include:
software modules in one of a dizzying array of languages;
header files;
configuration files;
other resource files;
a list of packages on which this package depends (called dependencies);
a variety of scripts run at specific times (triggers) that automate the install, upgrade, uninstall processes.
You can use the package manager used to create a package to inquire into a package at varying levels of detail. For .rpm files, rpm -q --filesbypkg PACKAGENAME will tell you what files are installed by a package, but not what scripts are included in it; rpm -q --scripts PACKAGENAME lists those.
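To make the list of package contents above concrete, here is a small sketch that asks rpm about an installed package at several levels of detail. The query options used (--filesbypkg, --scripts, --triggers, --requires) are standard rpm query options as far as we know; the wrapper function inspect_package is our own name.

```shell
#!/bin/sh
# Sketch: inspect_package NAME
# Queries the rpm database about an installed package.
inspect_package() {
    rpm -q --filesbypkg "$1"   # files owned by the package, prefixed with its name
    rpm -q --scripts "$1"      # install/upgrade/uninstall scriptlets
    rpm -q --triggers "$1"     # trigger scripts, if the package has any
    rpm -q --requires "$1"     # capabilities the package depends on
}

# Typical use (not run here):
#   inspect_package yum | less
# For an .rpm file that is not installed yet, add -p and give the file path:
#   rpm -qp --scripts ./SOME-PACKAGE.rpm
```

Reading the scriptlets before installing an rpm from outside your repositories is a cheap habit that pays off.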
Maintaining distributions with Yum
Yum (Yellowdog Updater, Modified) is the update layer for distributions using RPM packages (see also PackageManagers for a general discussion of package managers vs. updaters.)
The main configuration file for yum (on fedora) is /etc/yum.conf. To add something to the excludes list (or to start one), see this example YumDotConf. Updates should be run regularly (whether nightly or weekly), although probably not via a cron job.
In general, use yum in upgrade mode rather than update mode, so as not to leave older versions of packages lying about, as this causes real problems. You can also use package-cleanup (see below.)
In addition to yum itself, there are the yum-utils (separate package), and a variety of useful plugins. Make sure you enable plugins in /etc/yum.conf.
yum-utils: includes package-cleanup, e.g., run package-cleanup --orphans to find 'orphaned' packages. An orphaned package is a package installed from an rpm that no repository currently in your repository list has knowledge of. Usually these are older, redundant versions of packages, but not always. Not all orphans are problems, because not all rpms are in repositories.
yum-fastestmirror: sorts the mirror list for the fastest mirrors.
yum-merge-conf: provides --merge-conf command line option.
yum-remove-with-leaves: removes any unused dependencies brought in by an install 'but not normally removed' (I guess someone was sure that you would want them?) Keeps your system cleaner!
yum-skip-broken: provides --skip-broken command line option.
yum-upgrade-helper: "allows yum to erase specific packages on install/update based on an additional metadata file in repositories. It is used to simplify distribution upgrade hangups."
yum-versionlock: Lets you specify packages as 'locked' - protected from upgrade. Can be used to help yum play well with other package managers.
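None of the plugins above do anything unless plugins are switched on in the [main] section of /etc/yum.conf. A minimal sketch of a check for that, assuming the usual plugins=1 directive; the function name yum_plugins_enabled is our own, and the exclude line in the comment is purely a hypothetical example.

```shell
#!/bin/sh
# Sketch: yum_plugins_enabled [CONF]
# Succeeds if the given yum configuration file (default /etc/yum.conf)
# contains a line setting plugins=1.
yum_plugins_enabled() {
    conf=${1:-/etc/yum.conf}
    grep -Eq '^[[:space:]]*plugins[[:space:]]*=[[:space:]]*1[[:space:]]*$' "$conf"
}

# The relevant part of the config file looks roughly like this:
#   [main]
#   plugins=1
#   exclude=gnuchess*     # hypothetical exclude entry, for illustration
#
# Typical use (not run here):
#   yum_plugins_enabled || echo "plugins are disabled in /etc/yum.conf"
```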
PackageKit is a cross-distribution, cross-architecture updater. There's a yum interface to PackageKit and a command line tool called pkcon so you don't have to dork around learning yet another slow, transient GUI tool. At the date of this update (-- HilaryHolz - 29 Mar 2009), although the situation had improved over the last six months, PackageKit remains too new (version 0.3.14) and sketchy to be worth the work in all but the most polyglot environments, run by the most intrepid administrators. Getting to the actual meaningful documentation (PackageKit Reference Manual) took a significant amount of work, which really concerns me for the future viability of the project. Buy a clue, people: transparency, otherwise known as the ability to look under the hood, is make or break for FOSS projects.
I really do like the design, and I will say that I saw a distinct improvement in transparency.
Upgrading Fedora With Yum
First, a quick comment on why? Why do a major (core) upgrade of your workstations and/or servers with yum (or analog), rather than starting with a clean install each time?
Gee, I would have thought that was obvious, in all honesty. We've got our servers configured and tuned perfectly for our needs. A major upgrade comes out. That upgrade is, of necessity, one-size-fits-all. Telling us to start all over is, in essence, telling us that someone (or some team) who has never met us, never seen our situation, knows literally nothing about our particular needs knows better than us. To get back to where we were will take some amount of looking through what has changed. I would much, much rather assume that we know what we need, and phase in the new configuration, than assume that someone else knows what we need and load a whole bunch of what is almost surely dancing bear ware on our servers. I am more than happy to serve as a beta deployment platform for certain select packages, but not for all and sundry, including gnuchess (yes, that's GNU Chess). So I'm conservative. I install the base package and the packages we know we need, and leave out the rest until they come to my attention.
Some general comments:
Read the release notes! Yeah, yeah, yeah, I know, I just took those cheap shots at the folks who developed the release, so why start with this? Listen, this is lovely stuff, and you should always approach new software by trying to understand what the folks who wrote it had in mind. You are installing it on your precious boxes, yes? Trusting your precious work to it, yes? Still think that taking a few minutes to skim through a document that distills the major changes and highlights, nifty new stuff and deprecated (on the chopping block and in danger of disappearing in the future if people don't speak up for it) features is that unreasonable? Glad we had this talk.
use yum upgrade whenever possible, not yum update, so that you don't accumulate cruft
These instructions assume that you are at least at FC5 (fedora core 5). If you aren't at FC5, you won't have package-cleanup (bummer). See Brandon Hutchinson's Notes to claw your way to FC5, then follow these instructions
Read RemovingPackages before resorting to rpm --nodeps to remove any packages!
For the upgrade from Fedora 8 to Fedora 9, make sure to add the fedora-updates-newkey.repo file to /etc/yum.repos.d
There's also a bug in dircolors in Fedora 9, not fixed as of August 5, 2008, which can be worked around by commenting out the line in /etc/DIR_COLORS that starts with CAPABILITY
please add any additional notes to this list!
if you are relying on 3rd party repositories (livna, adobe, etc.), make sure that they are ready for the upgrade, or the process may fail.
What about backups? Manual backups are a fairy tale told to excuse one of the most inexcusable oversights in design history; namely that all filesystems don't come with versioning and mirroring built in. Only do live upgrades on systems with mirrored, versioned file systems or development systems with disposable data.
What about PreUpgrade? Good question. Who knows? Our info is oriented towards servers on which:
we do not run any GUI packages, and in an environment in which
we need to be very cost-sensitive, so we tend not to use GUI tools for systems administration (duh.) PreUpgrade has what seems to be a purely gratuitous GUI (it appears to piggy-back in some way on anaconda, so it may have seemed natural at the time), but our rule is to avoid GUI whenever possible for sys admin, and always offer a text alternative. So PreUpgrade is thus far irrelevant. Too bad, as it looked useful.
Prep your system.
Review, consolidate information, and remove all .rpmsave and .rpmnew files before and after upgrading. This step is a major part of the benefit you get from doing this process, as it gets you to look at systems you might otherwise not spend time with. Modify the following script for your use to review the changes captured by the .rpmsave and .rpmnew files, then delete them (you may want to look in places other than /etc and /var, for example.)
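# show how each .rpmsave/.rpmnew file differs from the live file it shadows (bash; copes with spaces in names)
find /etc /var -name '*.rpm?*' -print0 | while IFS= read -r -d '' a; do diff -u "$a" "${a%.rpm?*}"; done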
Take a few moments (I do this until I can't stand it anymore...) to review all the packages that have accumulated on your system and work on RemovingPackages
Clean out the local cache that yum uses:
yum clean all
Switch repositories:
install the packages (rpms) that tell fedora which release you want. These packages are architecture independent, so you can get both the release and the release notes by using a wildcard, for example, for Fedora 9:
Use runlevel 3. If you are not already working at runlevel 3 (and all servers should live at runlevel 3; see the Center for Internet Security Benchmark for one set of reasons why, and DesignPhilosophy for another), now is the time to switch to runlevel 3. At runlevel 3, networking and user accounts are fully enabled, but X Windows (and hence all GUI stuff) is not.
If you are running a graphical desktop environment, log out of it.
Switch to a text console
ctrl + alt + F1
log in as root
Go to runlevel 3
telinit 3
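After switching, it does no harm to confirm where you ended up before starting the upgrade. A minimal sketch, assuming the SysV-style runlevel command, which prints the previous and current runlevel separated by a space; the function names are our own.

```shell
#!/bin/sh
# Sketch: report the current runlevel and check that it is 3.
# `runlevel` prints "PREVIOUS CURRENT", for example "N 3" or "5 3".
current_runlevel() {
    runlevel | awk '{ print $2 }'
}

at_runlevel_3() {
    [ "$(current_runlevel)" = "3" ]
}

# Typical use (not run here):
#   at_runlevel_3 || echo "not at runlevel 3 yet; run telinit 3 first"
```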
Do the upgrade: yum, kernel & Base. You could do the entire upgrade all at once, but that's very dangerous, as you will trash your system if it dies in the middle. I do it in stages, as follows:
Upgrade yum and all its bits 'n pieces
yum upgrade "yum*"
This is also a good time to review and install any nifty new yum plugins, such as the yum-upgrade-helper
Upgrade the kernel
yum upgrade kernel
if you get any 'missing dependencies' errors, read this excellent post so that you really understand how dependencies work and what these errors mean. Then you'll not only be able to confidently untangle and rebuild the relevant parts of your system, but you'll develop your understanding of package managers, which is a very important free software concept.
Use the Base group to design the base of your system
yum groupupdate Base
yum upgrade <base-packages-you-want>
yum install <new-base-packages-you-want>
Unfortunately, fedora throws all sorts of stuff into the base that many of us neither want nor need. Review the packages in the Base group to see if you want them, otherwise you'll just be introducing a huge headache.
Do the upgrade: chunks. Now upgrade the rest of the system in bite-sized chunks. Groups are one good way to do that if you have defined your own; upgrading by wildcard, letter by letter, is another.
yum grouplist
will get you a list of installed and available groups. Just be careful how many you upgrade at a time. A couple hundred packages at a time is ok, but 800 is playing without a safety net.
Questions:
1. I received errors during installation of some package(s) during a groupupdate. Is this a big deal? Discussion
2. When you do yum groupupdate GROUPNAME on a group that is listed as installed via grouplist and you see only a couple or few packages installed, shouldn't you install those packages by hand to avoid package installation bloat (i.e., installing more packages than we really need, thus causing a mess again)? Discussion
yum groupinfo GROUPNAME
Provides description and list of mandatory packages, default packages, optional packages, and conditional packages.
yum groupupdate GROUPNAME
Upgrade a group
yum groupremove GROUPNAME
groupremove will remove an entire group of packages, along with any other packages that depend on them. See also RemovingPackages.
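For the wildcard-by-letter approach mentioned above, a dry run helps you see the chunks before committing to any of them. The sketch below only prints the commands; it does not run yum. The function name letter_chunks is our own, and package names that do not start with a lowercase letter may be missed by these patterns, which is one reason the straggler pass afterwards matters.

```shell
#!/bin/sh
# Sketch: print one `yum upgrade` command per starting letter (dry run only).
letter_chunks() {
    for letter in a b c d e f g h i j k l m n o p q r s t u v w x y z; do
        printf 'yum upgrade "%s*"\n' "$letter"
    done
}

# Typical use (not run here): print the plan, then run the lines one at a
# time, checking the size of each transaction before answering yes.
#   letter_chunks
```

Yum pulls in dependencies regardless of their first letter, so the real size of each chunk only shows up in the transaction summary.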
Do the upgrade: package stragglers. Finally, we upgrade any leftover packages. By stragglers we mean any packages that weren't upgraded by the groupupdate procedure above. Upgrade your stragglers as follows:
Let's take a look at what stragglers are left
yum check-update
this will show you which packages you have installed that can be updated/upgraded. These should be packages not part of a group, since you just upgraded the groups in the previous steps. You may be surprised to see how many packages are not upgraded by the groupupdate procedure discussed above.
Get info on a package
yum info PACKAGENAME
If that doesn't give you enough info, follow the link to the package homepage.
Upgrade a package
yum upgrade PACKAGENAME
Removing packages you don't want with yum
yum remove PACKAGENAME or yum erase PACKAGENAME
remove will remove a package, along with any other packages that depend on it. Take a very careful look at the list of packages yum proposes to remove. See also RemovingPackages.
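If the straggler list is long, it can help to reduce the check-update output to bare package names before deciding what to do with each. This is a rough sketch that assumes the usual three-column output (name.arch, version, repository); the function name straggler_names and the sample lines in the usage comment are illustrative only.

```shell
#!/bin/sh
# Sketch: read `yum check-update` style output on stdin and print package
# names with the architecture suffix stripped. Lines that do not look like
# "name.arch  version  repo" (headers, blank lines) are skipped.
straggler_names() {
    awk 'NF == 3 && $1 ~ /\.[A-Za-z0-9_]+$/ { sub(/\.[^.]*$/, "", $1); print $1 }'
}

# Typical use (not run here). Note that yum check-update exits with
# status 100 when updates are available, so a non-zero status here is
# not necessarily an error.
#   yum check-update | straggler_names
#   yum info PACKAGENAME      # then look at each one in turn
```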
Clean up. At some point, you need to research and dispose of orphans. RemovingPackages discusses orphans and what to do about them.
Restarting the server. Wait! Did you create any new files, logs, etc. during the upgrade procedure that you would like to back up? When you are ready, restart the server:
shutdown -r now
If your server starts up fine, pat yourself on the back and take a bow! If your server doesn't start up correctly, DontPanic. Grab your favorite towel, take a deep breath, and take a look at DontPanic.ServerRecovery.
Remember where we started - you are trusting your work, your time, your precious code/homework/recipes/finances/etc to this box. All of us who know these systems well got comfortable with them not by being born with the knowledge, nor on the job, nor (unfortunately) by going to university classes, but by learning in stages with and from each other, when we had some need.
See also the Fedora Live Upgrade Special Interest Group, the Yum project wiki and our own YuM topic in the new Free / Open Source Software Web.
Removing Extraneous Packages Will Keep Your Server Happy and Healthy!
In the natural course of things, linux servers accumulate both orphans (packages installed from rpms that are not associated with any current repository) and cruft (packages that are not relevant to the mission of the server.) Periodically reviewing and, in most cases, removing, such packages is a good thing (tm). Seriously, though, a review will:
improve system performance and response times (e.g., removes unneeded daemons, conserves system resources, ...)
save disk space
result in fewer and faster updates
improve system stability (problems with cruft won't affect you)
Orphans
What is an Orphan?
Technically, an orphan (yum calls these extras) is a package installed from an rpm that is not associated with any current repository.
What should I do with Orphans?
That depends on the type of orphan:
Packages initially installed from a repository (look at the rpm and yum installation logs, usually located in /var/log). These packages have effectively become orphaned, hence the name. You should research what has happened with the project that is maintaining the package.
Sometimes there's a different repository from which you can now get the rpm. If so, add the new repository to your yum configuration. If the package is central to your mission, however, you will probably benefit from learning more about what is going on with the project.
Sometimes the package has become deprecated in favor of another package, or included in another package. Continuing to run your production environment using the orphaned package is probably fine for the moment; however, migrating to the newer package should become your top priority.
Packages installed from an rpm which did not come from a repository. These packages are effectively extras, hence the name used by yum. Some projects have gotten to the point of providing an rpm but not to the point of providing a repository, and the rpm is not available from any major repository yet (did you check livna?)
It's worth taking a few minutes to check to see if the project has started to provide a repository, or if the rpm has gotten picked up by one of the major repository servers. That way, you'll get any updates as they are scheduled.
Failing that, you might consider providing your own internal repository for your lab and any collaborators. It will greatly ease working with VirtualizatioN, etc.
As a last resort, make sure that any true extras are well documented, as well as the results of the previous steps, so that the next time any members of your team conduct this review, they can pick up where you left off.
Finding Orphans
Depending on which version of which Linux distribution you are using, you may have one or more of the following tools available.
yum list extras
list the packages installed on the system that are not available in any configured yum repository.
package-cleanup --orphans
(package-cleanup is in the yum-utils package, which you may need to install.)
yum list installed "*fc#*"
Sometimes, especially when using multiple repositories or doing live upgrades, a package will get stuck and won't upgrade. Then you'll need to delete the version from the old core and install the version from the new core. (Replace fc# with the correct value for the old core, e.g., fc8 when upgrading to core 9.) You can temporarily set showdupesfromrepos to 1 in your yum.conf file to help with this, because a package with an old core number in its name may still be part of the current core (argh). Conversely, not every orphaned package is detected by package-cleanup, so you need both commands we list to catch all orphans.
Cruft
Cruft, like orphans, comes in several flavors. Many kinds of cruft can actually be prevented from accumulating by careful configuration of yum.
Preventing Cruft
Configuring yum to prevent cruft (would love to have similar info contributed for apt-get, etc...)
obsoletes=1 will prevent obsolete packages from accumulating on your system. If you need to hunt down existing obsolete packages, yum list obsoletes will find them.
multilib_policy=best will install only the best option for 64 bit machines.
plugins=1 turns on a variety of plugins that we also use to tune how yum behaves
we've tried working with the exclude list, but have found that module authors need the flexibility with the dependencies, and it generally works better to let things install and then pull them back out, rather than prevent the installation at all.
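To check whether a given yum.conf already carries the three settings above, something like the following sketch may help. The function names yum_conf_has and audit_yum_conf are hypothetical helpers of ours, not part of yum, and the check is a plain text match that ignores which section a setting lives in:

```shell
# Report (via exit status) whether a yum.conf-style file sets KEY to VALUE.
# Usage: yum_conf_has FILE KEY VALUE
yum_conf_has() {
  grep -Eq "^[[:space:]]*$2[[:space:]]*=[[:space:]]*$3[[:space:]]*\$" "$1"
}

# Print each cruft-prevention setting from the list above that FILE lacks.
# Usage: audit_yum_conf FILE
audit_yum_conf() {
  for setting in obsoletes=1 multilib_policy=best plugins=1; do
    key=${setting%%=*}
    value=${setting#*=}
    yum_conf_has "$1" "$key" "$value" || echo "missing: $setting"
  done
}

# On a real server (not run here):
#   audit_yum_conf /etc/yum.conf
```

An empty result means all three settings are present. Anything printed is a candidate for a (documented!) edit to the file.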
Leaves
Leaves are packages installed from rpms that are not depended upon by any other rpms. While being a leaf does not make a package cruft, leaf packages are likely candidates to be cruft, particularly if those leaves are libraries. package-cleanup, in yum-utils, is particularly helpful in identifying leaves that are cruft.
Dancing Bear Ware
Review all the packages that have accumulated on your system:
yum list installed
Research the unfamiliar ones. Do you really want those? Do you need them? Do you have a clue what they do? De-cruftify your OS a bit. Snuggle up to it and get to know it, take loving care of it. Don't let anyone tell you that you are unqualified to know what all that stuff is - it's your computer, not theirs.
yum info < packagename >
Provides a minimal set of information about a package, which is often enough to help you decide. If that doesn't give you enough info, there's a link in the provided info to the package homepage. Follow the link and read more. You can also try running yum erase < packagename > and see if the results are terrifying.
Understanding Dependencies
The dependency system is at the heart of the collaborative development of FOSS. PackageManagers provide a relatively easy way for package developers to specify which package(s) their package depends on. The package manager generates dependency chains from this dependency information, resolving dependency issues as it progresses.
Pay attention to dependency information while removing packages, or you may very easily end up with a dead or malfunctioning server. You may have to override dependency info from time to time (see Circular dependencies), so here are some general guidelines:
start by running package-cleanup --problems to identify any existing dependency problems in your local rpm database and fixing any problems encountered.
never, ever override dependencies related to the package manager system itself (rpm, yum, and python, in which the system is implemented.) The exception would be the rare occasion when there's a known problem and a published work-around that is carefully documented as a step-by-step process. You'll know when this situation has occurred because the information will come from official sources.
be very, very wary of overriding any dependencies related to core system functionality such as mount (mounts filesystems!), etc. Again, if it seems like that is what is needed, spend some time googling to see if you are looking at a known issue. If not, you have probably done something wrong earlier in the process and should consult an expert.
Circular dependencies
Circular dependencies occur when package A depends on package B, but package B depends on package A. You can have a longer loop as well (package A depends on Package B depends on package C which depends on package A). You can remove circular dependencies with rpm --erase --nodeps < packagename >, but you must remove all the packages in the circle.
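To make the idea of a dependency circle concrete, here is a toy sketch using tsort, the standard topological sort utility, which complains on stderr when its input contains a loop. The input is a hand-written list of "package dependency" pairs. On a real system you would have to build that list yourself (e.g., from rpm queries), which we don't attempt here. The function name has_cycle is ours:

```shell
# Read "package dependency" pairs on stdin, one pair per line.
# Succeeds (exit 0) if the pairs contain a circular dependency.
has_cycle() {
  # tsort writes its ordering to stdout and loop complaints to stderr;
  # keep only stderr and test whether anything was reported.
  [ -n "$(tsort 2>&1 >/dev/null)" ]
}

# Hypothetical packages A, B and C forming the longer loop described above.
if printf 'A B\nB C\nC A\n' | has_cycle; then
  echo "circular dependency found"
fi
```

A sorted order with no complaints means the packages can be removed one at a time. A reported loop means the whole circle has to go together.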
Removing Packages with Yum
yum erase < whateveryoufound >
(Don't use yum -y here!) If yum wants to delete all sorts of stuff for dependencies, look carefully. It doesn't hurt to google the package and any error messages you may be getting with yum erase or yum remove before continuing. Sometimes, it's ok. For example, you may be removing orphans, and what it wants to delete may also be orphans. If not, you may want to let yum delete the packages and then re-install them, or you may want to fall back to rpm.
Removing Packages with RPM
You should only attempt to use rpm to erase packages if yum can not remove the package due to various errors. Do NOT use this command lightly!
rpm --erase --nodeps < whateveryoufound >
This will erase a package while ignoring dependencies. It is NOT recommended to use rpm in this manner, unless you have checked for dependencies (at least with yum) and there are NO dependencies listed or you have adequately researched the dependencies. You are on your own if you get some wild ideas to just yank packages out with no regard for dependencies.
mea culpa...
Yes, I'm guilty of yanking out packages with no regard for dependencies. It's true.
But the internet told me I could do it!?!
There might be situations where using rpm --erase --nodeps < packagename > might be warranted, but beware. I used it wholesale once for an upgrade on various packages that didn't seem related to the core install for a server. Needless to say, the server didn't restart after reboot.
What if I've removed too much?
If you haven't restarted the server yet, you should be able to just reinstall the packages you removed. Please let us know if you went through this experience, and the server did not start.
If you have restarted your server, and it doesn't boot up correctly, you will probably need to boot off of a CD or recovery disk, and then try to reinstall whatever it is you removed. See Server Recovery - Help! My server won't boot! for more info.
Perl has its own package managers (see BootstrappingPerl), which should be used in preference to rpm or yum. There are two interfaces (modules) in use right now - a lower layer of abstraction, the CPAN module (MCPAN) and a higher layer of abstraction, the CPANPLUS module (MCPANPLUS) to which the community is migrating, slowly. It's part of a larger migration to 'pure perl' based on a new module abstraction layer (Module::Build). In any case, you should use cpanp for most things, however, for some things you will still need to fall back to perl -MCPAN -e shell.
Some hints about using cpanp.
use 's reconfigure' to configure cpanp to your taste. I like setting the interface to 'classic', which makes it behave a lot like MCPAN, which is what I am used to.
you will need to use 'l Whatever::Module' to check if a module is installed, as 'm Whatever::Module' does not have the cool = marker to show whether a module is installed or not like the listing on MCPAN.
if you are having trouble installing a package, it may be that header files needed to compile the packages are missing from the system. Some linux distributions (redhat/fedora does this) distinguish between production and development versions of packages. If there's an error message that says can't find foobar.h, or no such file foobar.h, install the -devel package that provides foobar.h
Security is a huge subject. The information we provide here is based on our experience, and you follow it at your own risk, we make no promises, warranties, etc., implied or otherwise. The Center for Internet Security publishes best practice guidelines tailored to all the major platforms as well as many important services: apache, various SQL database servers, etc. The Internet Storm Center collects and disseminates information about internet security threats on an ongoing basis, as well as providing useful tools for combating a variety of threats.
Security isn't secure if you won't use it! The initial configuration sequence for our minimally reasonable, responsible, and usable configuration is given here. Note that all the instructions are command-line, as GUI interfaces are much slower to use, and not generally appropriate for servers.
On initial install, your distribution will ask you to create a user account that you will use to administer the machine.
Each administrator needs their own account, as they will customize it to their own working style, and as the security system will log their activity. The user account is important because you should really never log in to your machine as root unless doing disaster recovery or for some other specialized purpose. So go ahead and create the account immediately. Let's call this account rosta (for sentimental reasons.)
You need to add rosta to group wheel.
Get your spiffy new linux box installed and configured per the distribution's instructions, and boot it into the OS. The configuration instructions may already have had you add rosta to wheel, but if not, here goes:
Log in to root (this is one of those times.)
add rosta to group wheel
usermod -a -G wheel rosta
(don't forget to take a minute or two now to read the usermod man page, as per RtFM)
follow the instructions under UseSudo to set up sudo
follow the instructions under SshD to set up sshd on your server and ssh clients for any workstations that will access the server.
Using sudo
sudo is a system that (1) provides you with a log of everything you do as superuser, and (2) allows finer grained control over who can do what than simply putting trusted users into group wheel. Really, the only time you should ever use the su command to act as root is to set up sudo.
Getting started with sudo
We provide a crash course in setting up sudo. However, don't forget to RtFM as well, as it is your security that is on the line, not ours!!
The configuration file for sudo is /etc/sudoers. You have to use a special command, visudo, to edit /etc/sudoers. You can make visudo use your favorite editor (rather than vi) by setting the environment variable EDITOR in your shell before invoking visudo, but that's properly the subject of another topic.
Log in to your privileged administrator account (we're calling it rosta for sentimental reasons.) Enter
su
at the prompt, followed by the root password to become the superuser. This is marginally better than logging in to the root account to work on your server...
type visudo at the prompt. An editor will open with the default configuration of /etc/sudoers. Work your way down to the line below the comment that says "Allows people in group wheel to run all commands" (don't choose the one that says "Same thing without a password" - you can do that later, if you decide to configure SshD to disable password logins and opt for certificate or public-key logins instead.)
make a copy of the existing line (much easier IMNSHO when scanning diff reports of configuration files later), then remove the comment # and space at the beginning of the line. Save your work. Log out and back in and check to be sure that you did things correctly by using sudo to acquire superuser privileges:
sudo -s
Most of the time, you'll want to just prefix each command with sudo, as working in a long-lived superuser shell is a dangerous habit.
Better use of sudo
It's really better not to just put folks in wheel and let them have full privileges. One of the wonderful things about sudo is its ability to confer heightened privileges within a limited scope. So you can give privileges to mount and unmount media, but not other things, or to restart certain servers, for example.
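As a rough sketch of what limited scope can look like, the snippet below drafts a sudoers rule in a scratch file and, where visudo is installed, syntax-checks it with visudo -c -f. The group name labops and the /bin/mount and /bin/umount paths are assumptions for illustration, so check where those commands live on your system. The real /etc/sudoers should still only ever be edited through visudo:

```shell
# Draft a candidate rule in a scratch file, never directly in /etc/sudoers.
fragment=$(mktemp)
cat > "$fragment" <<'EOF'
# Members of the (hypothetical) group labops may mount and unmount media,
# and nothing else.
%labops ALL = /bin/mount, /bin/umount
EOF

# Syntax-check the draft if visudo is available on this machine.
if command -v visudo >/dev/null 2>&1; then
  visudo -c -f "$fragment" || echo "syntax check failed, do not install"
fi
```

Once the draft parses cleanly, paste the rule into /etc/sudoers from within visudo.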
denyhosts
denyhosts is a python daemon that detects and thwarts brute-force ssh attacks. The configuration file, by the way, needs to be /etc/denyhosts.conf, but seems to be installed as /etc/denyhosts.cfg in some versions. Just read and follow the comments in the configuration file.
Ports & iptables
On any new server, you'll need to ask to have any ports you want accessible from off campus explicitly entered into the university's firewall setup. Send email to the designated contact (this should really be a link to the appropriate entry in the IT directory, you know..., currently, for us, that's richard.uhler@csueastbayHYRUQAPZ.edu) with the request. You'll need to explain what ports you want open, and why.
One way to learn what we usually have open is to look at a sample IpTables configuration file with lots of comments and figure out what services apply to your server. Always ask for ports 50000 - 50500, as those are the Math and Computer Science research ports - ports on which we are free to take services up and down at will, but are not preallocated to anything in particular. If you reserve a particular port on a particular server for some long term use, you should add that info to the relevant entry in the LabData topic.
In combination with good security practices set out by the CIS guidelines (as well as those discussed in this twiki) such as using DenyHosts, ModSecurity, SpamAssassin, ..., having some externally open ports, even the research ports, has not proven to be a major issue.
Speaking of iptables (a firewall that runs on linux servers), we run iptables on all our machines. Don't open any ports unless you know what/why, even if a program wants you to do so. Ask for help. /etc/services lists all the port numbers with an incredibly terse description of the related service, but it will give you a key word to Google for.
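For the /etc/services lookup, a small helper along these lines may save some squinting. It is only a sketch: the function name port_info is ours, and it assumes the usual layout of "name port/protocol" at the start of each entry:

```shell
# Print the service name and port/protocol for a given port number.
# Usage: port_info PORT [FILE]   (FILE defaults to /etc/services)
port_info() {
  awk -v port="$1" '
    /^#/ { next }                      # skip comment lines
    {
      split($2, parts, "/")            # "22/tcp" -> parts[1] = "22"
      if (parts[1] == port) print $1, $2
    }
  ' "${2:-/etc/services}"
}

# Example (not run here):
#   port_info 22
```

The name it prints is the key word to Google before deciding whether that port has any business being open.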
sshd, the secure shell daemon
The CIS guidelines cover most of the important ssh configuration information.
We provide some more detail to help you get going, as well as explain how to prevent remote connections from getting dropped all the time!
Configuring ssh
Password authentication is inherently unsafe, not to mention a pain-in-the-you-know-what. Other forms of authentication, such as public key or certificate authentication may be a much better fit, depending on your lab setup.
The configuration we show here is for public key authentication (no passwords), with PAM (pluggable authentication modules) account and session checks. We show only the sshd_config settings we change from the default settings, and group them a bit differently than the order they appear in the file to make it easier to explain.
# HJH, duh!
PermitRootLogin no
# HJH, per CIS_RHEL_Benchmark
IgnoreUserKnownHosts yes
# HJH, no password authentication
PasswordAuthentication no
# HJH, disable s/key passwords
ChallengeResponseAuthentication no
# HJH, enable PAM session processing
UsePAM yes
# HJH, poll client every 30 secs, so router doesn't
# drop connection. Settings give 25 minutes of inactivity.
ClientAliveInterval 30
ClientAliveCountMax 50
# HJH, as per CIS_RHEL5_Benchmark
Banner /etc/issue.net
Be sure to review the pam related packages to be sure that you have the appropriate ones installed on your system. We'd love to give you a canonical list, but they get re-factored all the time.
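The keep-alive numbers in the configuration above multiply out as follows: sshd polls every ClientAliveInterval seconds and gives up after ClientAliveCountMax unanswered polls, so 30 x 50 = 1500 seconds, i.e., 25 minutes. A small sketch that does the same arithmetic on an sshd_config-style file (the function name keepalive_window is ours, and it assumes both settings are present and uncommented):

```shell
# Print how many seconds an unresponsive client is tolerated:
# ClientAliveInterval * ClientAliveCountMax.
# Usage: keepalive_window FILE
keepalive_window() {
  awk '
    $1 == "ClientAliveInterval" { interval = $2 }
    $1 == "ClientAliveCountMax" { count = $2 }
    END { print interval * count }
  ' "$1"
}

# Example (not run here):
#   keepalive_window /etc/ssh/sshd_config
```

If your router drops idle connections faster than every 30 seconds, lower the interval and raise the count to keep the same overall window.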
Configuring public-key authentication
The specifics vary somewhat according to the specific protocol, but the general outline is:
generate the private/public key pair on the laptop/workstation
upload the public key to the account on the server, and put it in the target account's .ssh directory with the appropriate name
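For OpenSSH, the "appropriate name" is authorized_keys, and sshd is picky about the permissions on it. The server-side half of the outline can be sketched as below. The function name install_pubkey is ours, and we assume the public key file has already been copied up to the server (e.g., with scp) after being generated on the workstation with ssh-keygen:

```shell
# Append a public key to an account's authorized_keys file with the
# restrictive permissions sshd expects.
# Usage: install_pubkey PUBKEY_FILE HOME_DIR
install_pubkey() {
  pubkey=$1
  home=$2
  mkdir -p "$home/.ssh"
  chmod 700 "$home/.ssh"
  cat "$pubkey" >> "$home/.ssh/authorized_keys"
  chmod 600 "$home/.ssh/authorized_keys"
}

# Example, run as the target user on the server (not run here):
#   install_pubkey /tmp/id_lab.pub "$HOME"
```

Where it is available, ssh-copy-id does much the same thing from the workstation side in one step.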
Pocket labs just don't fit in to the networking scheme of larger IT. With Cyberinfrastructure all the rage (there's a vaporware term, goes right along with Computational Thinking, another of my favorites), we have a very real, very growing need to come up with better systems administration turnkey solutions to minimize our impact on our host institutions. That's a focus area for me, and one of the reasons I'm leaving academia for industry (because it seems to be a necessary step for me to be able to work on the problem....) In any case, for the moment, here's our older manual setup, with more automated info to follow the moment we have some breathing space to document what we're using:
DHCP and pocket research lab servers
DHCP (dynamic host configuration protocol) is a system that helps with efficient and secure management of a network of computers. The networking information is kept on a central server (the DHCP server), and the individual computers run DHCP client programs which contact the DHCP servers on startup. This plan works fine when either (a) all the machines are administered by some central organization (the case for computers in classroom laboratories, for example, which are administered by university IT), or (b) the DHCP server and the client computers are fully decentralized (the case with faculty and student laptops, for example.)
The system breaks down, however, for pocket research lab servers, which are, by necessity, neither fully integrated into the UIT structure, nor always easily accessible by research staff. For example, in an unplanned university-wide power outage, pocket research lab servers reboot before the DNS servers, end up with incorrect network information, and become unreachable, unless each one has its own UPS. Hardware that provides the ability to remotely power cycle the servers is available, but costs even more than a UPS.
A much cheaper answer is to collaborate with UIT on an initial configuration for each server via DHCP, and then convert the server to a static configuration, documenting the relationship to the dynamic configuration meticulously in the relevant configuration files on the server itself. That way, desired changes to the networking setup can be coordinated with UIT, the servers are robust with respect to the vagaries of life in the big city, the local control necessary for research and learning is retained, and we save a bunch of $$$$. This is a perfect example of a situation in which the fix becomes obvious when you view the mediating artifact as an integrated whole, rather than focusing exclusively on the programmable components.
Speaking of focusing on the programmable components, the situation is further complicated by the fact that most Unix distributions are just sure that you want to run DHCP, and will revert you back to that default configuration at every major upgrade, unless you take explicit action to prevent that from happening. So be sure to add dhclient and *dhcp* to the excludes list in YumDotConf once you have converted to a static networking setup. In time we'll get more surgical about just what needs to be excluded.
static networking setup
(If you have not read DHCPornotDHCP, read that first.) In addition to erasing the dhclient and various dhcp packages, you should also erase NetworkManager. Manually configure:
/etc/sysconfig/network-scripts/ifcfg-eth0 (for the first ethernet card, possibly additional scripts for additional cards)
/etc/sysconfig/network
/etc/resolv.conf
/etc/hosts
see examples in the serverfiles subversion project.
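Until you have the serverfiles examples in front of you, here is a rough sketch of what a static ifcfg-eth0 might contain. The ifcfg files are just shell variable assignments, so we draft one in a scratch directory and source it as a sanity check. All addresses are placeholders from the reserved documentation range 192.0.2.0/24. Use the values agreed with UIT instead:

```shell
# Draft the file in a scratch directory, not in /etc/sysconfig directly.
scratch=$(mktemp -d)
cat > "$scratch/ifcfg-eth0" <<'EOF'
# Static setup. Record here how these values relate to the DHCP
# configuration originally worked out with UIT.
DEVICE=eth0
BOOTPROTO=none
ONBOOT=yes
IPADDR=192.0.2.10
NETMASK=255.255.255.0
GATEWAY=192.0.2.1
EOF

# Sanity check: the file must be sourceable by a shell.
( . "$scratch/ifcfg-eth0" && echo "$DEVICE $BOOTPROTO $IPADDR" )
```

BOOTPROTO=none is what tells the network scripts not to go looking for a DHCP server at boot.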
. . .
We combine subversion (online book, project home) and SVK (online book, project home) to get a distributed versioned file system that can store all kinds of data, not just software projects.
Subversion cheat sheet/20 minute tutorial: Subversion Basic Work Cycle
trunk is the main development line, branches and tags are used as usual. snapshots are for versions frozen in time of things other than software, e.g., submitted versions of papers complete with relevant data sets, etc.
The repository is backed up (daily incrementals, weekly differentials, monthly fulls)
We no longer back up user directories on acc.csueastbay.edu. Therefore, you should keep files you care about in the versioned system, not on individual servers.
Pocket Labs and Mirrored Repositories
As a research lab, it's important to not have the same files on all the servers in your account, but to have the option to have different files on different machines. It's also important for us to be able to share code on many of our projects. We have separated the abstractions of authentication and authorization from the abstraction of a file system, and are using a state-of-the-art answer to the question of file system. (We'll get back to you about state-of-the-art answers to authentication and authorization. ) We use subversion (online book, project home) + SVK (online book, project home), a distributed authoring and versioning file system, aka mirrored repository, aka DAV_FS. Subversion is the versioned file system (repository), while SVK is the distributed (mirrored) component of that technical mumbo-jumbo.
Because we're a research lab and our needs are less structured and predictable than an academic department or an ISP, you essentially organize your files as you please, checking them out and in to whatever machine you are working on at the moment. Data in the versioned file system is organized (conceptually) into projects. Projects can be shared or private, depending on the authorization settings. What's more you can mirror part or all of a repository on a machine with SVK, so you don't need network access to check your changes in. You can synchronize with the central repository when you get network access. Once you have your own private project, you don't need any further authorization to create subprojects, etc., that are private.
Getting started with Subversion
If you've never used a version control system, you'll want to read (at least) Fundamental Concepts & Basic Usage. Don't panic, though, they are quick reads.
If you have used a version control system such as RCS, or an older distributed authoring system such as CVS, you can get started with a quick skim of Basic Usage. You're cheating yourself, however, if you don't read more. Subversion is much, much more than simply friendlier CVS or RCS for groups.
When working with group project files and/or server configuration project files, you'll also need to read branching and merging.
Getting started with SVK
The SVK documentation is less mature than the subversion documentation, so the best approach is to go through the subversion docs, then our SVKCheatSheet.
Backups
Mirroring provides backup capability as well as version control, so long as all the important information is kept in the mirrored file system. That leaves the question of backing up the repository itself, which subversion and SVK both support.
A mirror case study
The online subversion book gives all sorts of great info about using a readonly mirror with svnsync as a continuous backup of all your data. Actually doing the configuration and setup is left as an exercise for the reader, as it were.
Managing Server Configuration Files
Any server configuration file that is not 'boilerplate' should live in the serverfiles project in the lab's file system. When you want to change a configuration file, check first to see if the configuration file is part of the serverfiles project for the server in question. If not (or if you are adding a new system to an Ahatlab server), please add the file to the appropriate project and update the LabData topic.
When altering configuration files, for each change, comment out the line with the default value, make a copy of the line with the default value immediately below the commented out line and make changes to the copy. Initial and date the change. This process helps a lot with retaining and sharing knowledge - as researchers, we tend to delve into a server or OS topic related to our interests, eventually developing a lot of expertise in an area, then not needing that expertise for weeks, months, or even years. It's frustrating to have to relocate and/or relearn information later. So document things at the moment you do them. The specific process described also optimizes tracking changes with diff reports when doing version upgrades, whether of a specific subsystem or an entire OS (see UpgradingFedoraWithYum.)
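The comment-out, copy and annotate routine described above is easy to do by hand, but for the curious, here is a sketch of it as a shell function. The name change_setting is ours, and it assumes a simple "Key value" file (sshd_config style) in which the default line is present and not already commented out:

```shell
# Comment out the line that sets KEY, add a changed copy right below it,
# and initial and date the change.
# Usage: change_setting FILE KEY NEWVALUE INITIALS
change_setting() {
  file=$1 key=$2 value=$3 who=$4
  stamp=$(date +%Y-%m-%d)
  awk -v key="$key" -v value="$value" -v who="$who" -v stamp="$stamp" '
    $1 == key {
      print "# " who ", " stamp ": changed " key   # initial and date
      print "#" $0                                 # default, commented out
      print key " " value                          # the changed copy
      next
    }
    { print }
  ' "$file" > "$file.new" &&
    cat "$file.new" > "$file" &&   # overwrite in place to keep permissions
    rm "$file.new"
}

# Example on a working copy from the serverfiles project (not run here):
#   change_setting sshd_config PermitRootLogin no HJH
```

Because the original line survives as a comment, a later diff against the distribution's new default shows both what the default was and what we changed it to.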
Puppet and Cft combine to form a much more agile approach to configuration management.
Using Puppet Configuration Recipes
Puppet is a declarative language designed to
automate systems administration (duh...)
expose the relationships between systems emphasizing stable configuration data, not the transient fine details involved in each build
document both the stable data and the transient fine details, rendering the outcome reliable and repeatable.
As such, Puppet offers a general protocol layer that can be used to radically stabilize managing the ever changing set of clients and servers used in a pocket research lab. Specialized deployment tools such as Capistrano may be used in combination with Puppet.
A particular set of relationships and files designed to accomplish a particular task is called a recipe. A recipe may contain supporting files as well as describing sets of changes appropriate for various types of servers. Very clever. Puppet is written in RuBy, btw.
Puppet uses Facter to profile each host. Facter is also written in Ruby, and can be extended to include custom 'facts', which Puppet can then take into account in the design of recipes.
A central server called the PuppetMaster stores and serves templates called manifests.
64 bit systems
On a 64bit machine, you occasionally need both the x86_64 packages and the i386 (i.e., 32 bit) packages. As best we currently understand it (folks who are more into this, please pitch in, or at least dig up good references for more reading and link those in), the difficulty arises when you start out needing a package which is a 32 bit package. That package may break if it doesn't have other 32 bit packages to depend on. Yes, of course these things really ought to be interchangeable, but let's remember that this is open source software. There's a real, meaningful trade-off between hard encapsulation and rate of progress. Allowing for greater flexibility encourages a much richer growth pattern. And you aren't paying for this stuff, so cope and install both sets of packages....
Some packages to be aware of specific to 64 bit systems
mcelog - a daemon that collects and decodes Machine Check Exception data on x86-64 machines
microcode_ctl - Tool to update x86/x86-64 CPU microcode
Virtualization is an important tool for web 2.0 and science 2.0 research labs. Through virtualization, you can test the effects of your changes on your end users in a rich approximation of their environment in an automated fashion, without destabilizing them. Much like Selenium, but at the next level down into the system.
A variety of commercial products for specific uses now have virtualization built in to them. For lab uses, though, you generally need to use one of the major virtualization platforms.
VMWare
VMWare is one of the oldest of the virtualization platforms. It runs on pretty much any OS, and a student version is available for free (yay, VMWare!)
Xen
Xen began as a University of Cambridge research project and was adopted early on as the fedora project's virtualization framework, but has since graduated first to a wide variety of Linux distributions, and then to other Unix distributions. Xen uses modified kernels, so is much less resource intensive than VMWare.
Xen's documentation is, shall we say, in flux, as Xen moves from being tightly linked with fedora/Red Hat to being a more widely used system. We offer the following set of resources to help out in the transition...
Any Unix distribution will come with perl installed. Hilary has a bunch of perl info she needs to move over here, where it will be much easier to play with and reorg, but until then, here's the critical info...
Bootstrapping Perl
We use yum to bootstrap a perl installation on a fedora machine, as follows (these are just brief notes, you are welcome to do this on your own fedora machine, however if doing this for the first time on a lab machine and unfamiliar with the tools, ask for help.)
use yum to install perl, perl-libs, and perl-CPAN (the CPAN module, or MCPAN). You may have to install some dependencies to install these modules.
switch to MCPAN, 'perl -MCPAN -e shell' (not 'sudo perl -MCPAN -e shell'), configure to run make under sudo
use MCPAN to install the CPANPLUS module, or MCPANPLUS
now you should use CPANPLUS as the default package manager for perl modules ('cpanp'). Only use yum when CPANPLUS fails.
Edit /etc/yum.conf and add any modules that you installed using yum in step 1 to the (space-delimited) exclude list, so that yum does not update them and overwrite newer versions installed by cpanp. (It should be possible to use yum-versionlock to get around this better...) Example exclude line in yum.conf:
exclude=perl-* httpd*
You can still use yum to list, etc., a module that depends on modules managed by cpan or cpanplus using the --disableexcludes option. However, you will not be able to update them using yum because yum has no --nodeps option. You can update them from source or, if the module is not a perl module, you can use rpm --nodeps to update it. (If the module is a perl module, you cannot use rpm --nodeps as it would not update the perl package inventory.)
Juggling multiple package managers is frustrating. mcpan recognizes when another package manager has touched a file, but yum did not as of 3.2.8. I've just upgraded to fedora 9 and am exploring the yum plugins to see what I can do with them.
As above, invoke MCPANPLUS as 'cpanp', not 'sudo cpanp'. This ability of MCPANPLUS is very helpful, as many test suites cannot be effectively run with superuser privileges. cpanp also has the ability to uninstall modules.
Optional modules you will want to install include: Storable, YAML::Tiny, Bundle::CPANPLUS::Test::Reporter, Crypt::OpenPGP
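The yum exclude step above can be scripted. This is a minimal sketch, not our actual tooling: the function name add_yum_excludes is made up for illustration, and it assumes the file has a single [main] section so that appending a line at the end is safe.

```shell
#!/bin/sh
# add_yum_excludes CONF_FILE GLOB...
# Hypothetical helper: appends an exclude line listing the given package
# globs. If an exclude line already exists it refuses, so that nothing is
# overwritten silently.
add_yum_excludes() {
    conf=$1
    shift
    if grep -q '^exclude=' "$conf"; then
        echo "exclude line already present in $conf; edit it by hand" >&2
        return 1
    fi
    # "$*" joins the globs with spaces, the delimiter yum expects.
    printf 'exclude=%s\n' "$*" >> "$conf"
}

# Example (needs root on a real machine; quote the globs):
#   add_yum_excludes /etc/yum.conf 'perl-*' 'httpd*'
```

Quoting the globs matters: unquoted, the shell may expand perl-* against files in the current directory before yum.conf ever sees it.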
Perl's package managers: cpan and cpanp
Perl has its own package managers (see BootstrappingPerl), which should be used in preference to rpm or yum. There are two interfaces (modules) in use right now - a lower layer of abstraction, the CPAN module (MCPAN), and a higher layer of abstraction, the CPANPLUS module (MCPANPLUS), to which the community is migrating, slowly. It's part of a larger migration to 'pure perl' based on a new module abstraction layer (Module::Build). In any case, you should use cpanp for most things; for some things, however, you will still need to fall back to perl -MCPAN -e shell.
Some hints about using cpanp.
use 's reconfigure' to configure cpanp to your taste. I like setting the interface to 'classic', which makes it behave a lot like MCPAN, which is what I am used to.
you will need to use 'l Whatever::Module' to check if a module is installed, as 'm Whatever::Module' does not have the cool = marker to show whether a module is installed or not like the listing on MCPAN.
if you are having trouble installing a package, it may be that header files needed to compile the packages are missing from the system. Some linux distributions (redhat/fedora does this) distinguish between production and development versions of packages. If there's an error message that says can't find foobar.h, or no such file foobar.h, install the -devel package that provides foobar.h
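The missing-header hint can be turned into a quick check. The sketch below assumes gcc-style "No such file or directory" messages in a saved build log; the function name missing_headers and the log file name are hypothetical.

```shell
#!/bin/sh
# missing_headers [LOG_FILE...]
# Prints each header file that the compiler reported as missing, once.
# Reads standard input when no file is given.
missing_headers() {
    sed -n 's/.*[: ]\([^: ]*\.h\): No such file or directory.*/\1/p' "$@" |
        sort -u
}

# Example: ask yum which package ships each missing header, then install
# the -devel package it names.
#   missing_headers build.log | while read -r h; do
#       yum provides "*/$h"
#   done
```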
We don't provide email accounts (and I recommend against doing so), but pocket lab servers still need to run a variety of mail services to meet the needs of other applications. That will require interfacing with your university (or other provider) mailhosts, as well as thinking through and configuring your own services. As we need to maintain our configuration, we'll add more detail to these notes.
Mailservers (postfix)
postfix is our preferred mailserver (not sendmail - yuck!). When you install postfix, make sure you not only stop sendmail, but actually remove it from the system and add it to the yum excludes list (YumDotConf). Why use postfix rather than sendmail? The reasons are legion - it's just so much easier, cleaner, safer, more stable. But why trust me? Read for yourself or, best of all, try it and see.
We use Mailman for our email lists. Mailman is written in Python, and is easy to use. It has built-in web-based archiving via pipermail archives, which are pretty ugly, but has hooks for use with MHonArc. MHonArc is quite nice.
Managing Apache
Why Apache rather than other webservers? There are lots of reasons, but I suppose, in the end, it has to do with working on the platform that w3c works on. The same reason that I prefer to use Firefox as my default browser. -- HilaryHolz - 23 May 2008
Designing an Apache install - Danger, Will Robinson!
The key to a happy and healthy Apache installation is a good design - that's a hard learned lesson. What's more, for pocket labs, that design will change steadily over time, as Apache continues to develop. What we throw out here is what works quite well for us, but we're very open to collaboration.
Folks, this is a really key area for us right now due to the current thrust towards virtual labs. Now, virtual labs are really useful, but what our lab has learned from experience is that anyone working seriously in our areas of Digital Arts and Sciences is going to run into trouble quite quickly trying to use virtual labs for our work. Our work is simply too close to the OS for VMWare to support. Given that we're not an OS research group, that's rather revealing. The Digital Arts and Sciences folks need to work as a community to collaborate with IT and university administration to help them sort out what sorts of uses virtual labs are good for and what sorts of uses they aren't - not fight the whole trend, or work in isolation, duplicating effort and wasting time that could be put towards our research and teaching.
Some major recommendations:
install Apache from source (not yum, and not an rpm, not even a source rpm)
you can use apxs (usually in /usr/sbin/apxs, run 'which apxs' to find it) once you've installed apache to install related systems from tarballs with real ease, so this first installation is, in a sense, what drives the rest of the process
only install Apache 2, not Apache 1 unless you have an absolutely mission critical reason to do so.
if you absolutely must have Apache 1 on a system, install only Apache 1 on that system and do everything you can to convert asap.
Apache 2 is very deeply embedded in how the larger computing community thinks about programming itself. As a result, it takes more and more work to keep Apache 1 viable on a system. It isn't intentional, it just is.
keep that Apache installation up to date. I realize that it's harder to do when doing the install by hand, but that's what mailing lists are for.
don't use threading. threading is a super idea, and it's great that the developer community is working away at it, and we're all looking forward to the day it works, but it's a snake pit. What's more, you'll need to keep your eyes out yourself to be sure you are aware of all the places threading pokes its nasty head up, because so much focus in the Apache 2 world is on threading. Select 'prefork' packages whenever there's an option.
use mod_security, get a stable configuration, then make sure you set aside a lot of time to sort things out each time you upgrade the rule set. mod_sec is a wonder, but each rule set upgrade is a lot of work to get stable, even if you are pretty fluent in apache and perl and pcre.
make sure you get both the httpd package and the libapreq2 packages and install them. You can use apxs to install libapreq2.
if you are going to be using mod_perl, then you use mod_perl to configure and install apache, and use apxs to install everything else.
Step by step (although look at the mod_perl info if going that route):
If you have apache installed from an rpm, remove that version first. Version 1 is listed as apache, you would use 'rpm -qa | grep apache' to check for it. Version 2 is listed as httpd, you would use, well, you get the idea, ...
You get apache2 and libapreq2 source from the Apache HTTP Server Project. Unpack the most recent stable source release into a directory in /usr/src.
It's still best to install apache 2 according to the fedora default layout. We ended up creating a new entry in config.layout which we called Fedora (duh). (FedoraHttpLayout) We did this before we understood much about the automake, etc., system, so no doubt could do a better job now...
in a production environment, the current recommendation is against using threads, so you also need to turn those off. So the final configuration command to use the Fedora layout, no threads, and our default module set is
Follow the default configuration files given in the linked TWiki pages, documenting as noted above any needed changes. Make sure that you come back and update the default config files when new releases of packages necessitate updating our default config. We may not have the time (or the need) to update all the machines all the time, but it's very important to keep these guidelines up to date!
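The lab's exact configure command is not reproduced in these notes, so what follows is only a sketch of its general shape. It assumes the config.layout entry is named Fedora, as described above, and uses the prefork MPM as the no-threads choice. The function name httpd_configure_args is made up, and the default module set is deliberately left out.

```shell
#!/bin/sh
# Prints the configure arguments for a non-threaded Apache 2 build, one
# per line.
#   --enable-layout=Fedora  uses the custom entry added to config.layout
#   --with-mpm=prefork      selects the non-threaded prefork MPM
# Our default module set is NOT reproduced here; add the --enable-* flags
# for the modules you need (for example --enable-ssl for mod_ssl).
httpd_configure_args() {
    printf '%s\n' \
        '--enable-layout=Fedora' \
        '--with-mpm=prefork'
}

# From the unpacked source directory:
#   ./configure $(httpd_configure_args)
#   make
#   sudo make install
```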
Once you have stabilized your configuration, add the server to the services system so that it will be restarted at boot time. To do so, you'll need administrator privileges, but here are the steps (for Fedora, it varies a bit from distribution to distribution):
customize this HttpdService script as necessary, then copy it to /etc/init.d. Make sure you make the script executable!
add your new service. If you named your file httpd, you would use 'chkconfig --add httpd'
test your file, by running 'sudo service httpd start' (or whatever you decided to call your server) and then visiting the server in a browser
once it works, set it to start automatically: 'chkconfig httpd on'
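The service steps above can be collected into one script. This is a sketch under assumptions: the helper names run and register_service are hypothetical, and the DRY_RUN switch is our own addition so that the commands can be reviewed before anything touches the system.

```shell
#!/bin/sh
# run CMD...: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# register_service SCRIPT [NAME]
# Installs a customized init script under /etc/init.d, registers it,
# starts it, and enables it at boot. Stops at the first step that fails.
register_service() {
    script=$1
    name=${2:-httpd}
    run cp "$script" "/etc/init.d/$name" &&
    run chmod 755 "/etc/init.d/$name" &&
    run chkconfig --add "$name" &&
    run service "$name" start &&
    run chkconfig "$name" on
}

# Review first, then run for real as root:
#   DRY_RUN=1; register_service ./HttpdService httpd
```

The script only checks that the service command succeeded. Visiting the server in a browser is still a manual step.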
Config files are generally in /etc/httpd/conf and /etc/httpd/conf.d; however, some 3rd party modules have additional configuration files (e.g., modsecurity has a whole set it adds to /etc/modsecurity.d).
Our design is agile, i.e., the only configuration information that should go in the main configuration file, /etc/httpd/conf/httpd.conf, is that which applies to the base configuration of apache. Configuration information for dynamically loaded modules ('dso's) should go in separate configuration files in /etc/httpd/conf.d. The main configuration file automatically loads all the rest of the configuration files. Make sure that all configuration information needed for a particular module or application is self-contained in a config file in /etc/httpd/conf.d so that only that config file need be changed if/when the module changes.
HttpdConf - the main server config file. This is a composite, with extra info to help you configure a server. It won't work alone - it works with the related files also listed here! In general, there's a lot of stuff turned off that you are probably used to seeing turned on. Leave it off - Apache is the dominant server and thus the high profile target. You'd be amazed at the resources lost simply repelling attacks. (Remember? Denial of Service attacks?) To sort out exactly what modules you need for your particular server:
customize the template, adding the various configuration information you need
try to start the server, it will die
check the error log for directives that Apache could not find
look those directives up in the Directive Quick Reference. The linked documentation will tell you which module a directive belongs to. Turn on the module.
consider reading up a bit about the module in question. You will only truly understand what a module does when you need its functionality, so now is the time to read about what it does, and the Server Project pages are the primary sources.
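Picking the unknown directives out of the server's complaints is easy to script. The sketch below assumes the usual Apache 2 wording, "Invalid command 'X', perhaps misspelled or defined by a module not included in the server configuration". The function name unknown_directives is hypothetical.

```shell
#!/bin/sh
# unknown_directives [FILE...]
# Prints each directive name that Apache rejected as an invalid command,
# once. Reads standard input when no file is given.
unknown_directives() {
    sed -n "s/.*Invalid command '\([^']*\)'.*/\1/p" "$@" | sort -u
}

# Example: check the configuration without starting the server.
#   httpd -t 2>&1 | unknown_directives
```

Apache stops at the first directive it cannot find, so expect to repeat the check after turning on each module.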
No PHP. Ever. Turn it off. Making and keeping php secure takes more resources than we have. You must get explicit permission from Hilary to turn it on, even for a microsecond, and it will be really tough to talk her into it, as she has better things to do with her time than deal with the resulting continual barrage of attacks.
mod_ssl
Comes as part of the source install, although you do have to enable it explicitly. We use openssl as our ssl package.
(Get info about how to get certificates signed...)
mod_security
Install and use mod_security (keep up to date). Mod_security is a third-party module (not part of the apache project), so you get it from the modsecurity homepage. You'll need to sign up for an account at the breach security network to actually download the source.
installing mod_security on fedora
Documentation for mod_security is generally pretty good.
Note that the fedora yum rpms are generally a release behind on the libxml2 rpms, and you usually do need the latest release, so check the link to xmlsoft.
There's no need to stop your apache install when you unpack the mod_security archive (at least on linux, who knows what goes on with windows...)
in the apache2 directory, run ./configure, make, and make install
check your ruleset against our sample configuration files in ConfiguringApache
Jonathan Zdziarski's mod_evasive module is also a fundamental protection module to install on your Apache server. Download it from his site and follow the ridiculously clear and easy directions in the README file. Thanks, fella!
mod_perl
You'll also be best off installing mod_perl from source (and it is easier than trying to use a package install of mod_perl). The source comes from the mod_perl homepage. If you use the configure command listed above, you can configure mod_perl using 'perl Makefile.PL MP_APXS=/usr/sbin/apxs' and then follow the standard install process. If, for some reason, you decide you need more complex configuration options, you should store those options in an options file and add the argument MP_OPTIONS_FILE to the configure step. The first time one of us finds a need to do this, think through a good name and location for a file and come back here and document it.
gdbm
As of April 1st, 2008, the rpm install of gdbm had a bug that affected mod_perl, so you'll need to install gdbm from source. You can download it from any GNU mirror; you want version 1.8.3 (which dates from 2002!).
. . .
05 Aug 2008: Feel free to dive right in, but you may find this quick foreword very helpful in using these materials. Ahat (the lab) has been maintaining a collaborative set of notes since its inception, in 2001. Initially these were jointly owned and authored webpages, to which we added wiki pages on a kwiki server. Kwiki did not work out; however, we were hooked on wikiservers. As we continued to add collaboration tools to our toolkit (such as Subversion), we explored a number of alternative wikiservers, including several exploratory shakedown deployments, finally settling on twiki. We deployed our initial production version twiki server in May 2008. As such, the information in ManagingTWiki represents the oldest set of topics since our migration to TWiki. Moving to TWiki resulted in an explosion of collaborative activity in the lab, which is wonderful - the truest affirmation we could have had of our decision, as well as a pretty great endorsement of TWiki.
We've brought on some new interns, Nate and Charles, and are now getting our feet back under ourselves. Phew! Currently, we're reviewing all the materials on the Ahatwiki. We're also upgrading to 4.2.1, so I'm working on bringing these topics up to date, but be aware that both the style and content of some of the ManagingTWiki stuff may lag behind our current information design best practices.
TWiki has a variety of configuration files, more than just those that the TWiki folks themselves consider to be configuration files. We list and discuss those files here. As we go along, we'll try to keep our notes up to date. The main benefit of this discussion, though, is to make you aware of where the possible pitfalls are. When upgrading and/or writing TWiki extensions or working on the TWiki core, make sure that you always research the TWiki developer documentation thoroughly for any new information regarding these configuration files before proceeding.
Do the rest of us a favor, too, ok? Take a few moments and do a core dump here with your notes and info. Even the raw stuff is great if you don't have time to be coherent. It'll help the next developer immensely. Just mark what level of editing you've had a chance to do when you add your info to the pile...
twiki configuration files (as of 05 Aug 2008)
twiki/lib/LocalSite.cfg -- the main one, where you enable plugins, etc.
twiki/bin/LocalLib.cfg -- auxiliary
twiki/bin/logos/favicon.ico - replace with your favicon on installation to ensure yours is used with some generated pages
apache twiki configuration files (as of 05 Aug 2008)
/etc/httpd/conf.d/twiki.full-server - contains configuration directives that are common to both virtual hosts: http (non-encrypted) and https (ssl/encrypted). It is included by twiki.conf twice, once within each <VirtualHost> section. Configures mod_rewrite and the twiki 'bin' directory. At present, when you install a plugin, if it adds a script to the bin directory, you will need to add a line manually to this file. We can change this later by using a <Perl> section. Note that this file includes hardcoded references to both People and TWiki as we are using Shorter URLs.
/etc/httpd/conf.d/twiki.conf -- main twiki apache configuration file. Has one reference to People hardcoded into it as we are using Shorter URLs. Includes twiki.full-server (twice) and mod_perl_startup.pl.
twiki/tools/mod_perl_startup.pl -- used for mod perl preloading, called from twiki.conf. (use ModPerl::RegistryPrefork)
in one web only: follow the instructions on PatternSkin Logo customization under "using a new filename." Edit the WebPreferences topic for the web in question.
A cheat-sheet for annoying TWiki admin tasks
Changing a user's password under template authentication. If you run htpasswd directly on the password file, you will erase the user's email address. Run 'htpasswd -n <username>', enter the password, and cut 'n paste the output into the password file, replacing the old password.
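The paste step can be done with a small script instead of an editor. The sketch below assumes each line of the password file has the form username:hash:email, which is why overwriting the whole line loses the address. The function name set_hash and the file path in the example are placeholders.

```shell
#!/bin/sh
# set_hash FILE USER HASH
# Replaces only the second colon-separated field (the password hash) on
# USER's line and leaves every other field and line as it was.
set_hash() {
    file=$1
    user=$2
    hash=$3
    awk -F: -v OFS=: -v u="$user" -v h="$hash" \
        '$1 == u { $2 = h } { print }' "$file" > "$file.new" &&
    # Write back through the original file so its owner and mode survive.
    cat "$file.new" > "$file" &&
    rm -f "$file.new"
}

# Example: take the hash from the output of 'htpasswd -n JohnSmith'
# (the part after the first colon) and quote it, since it contains $ signs.
#   set_hash /path/to/.htpasswd JohnSmith 'HASH-FROM-HTPASSWD'
```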
Making TWikiGuest (or JohnSmith) a TWiki Admin: To add the first administrator, follow the instructions in the box, below. A similar box appears on TWikiAdminGroup, but disappears after you follow these instructions, adding the first administrator.
How to add the first administrator If you haven't previously set up an administrator, follow these steps:
Authenticate as the internal TWiki administrator: internal admin login (use the username suggested and the password set in configure).
Verify that new members show up properly in the group listing at TWikiGroups
Make sure always to keep this topic write protected by keeping the already defined ALLOWTOPICCHANGE setting
The ALLOWTOPICCHANGE and ALLOWTOPICRENAME settings in TWikiPreferences and TWikiPreferences have already been set to this group (TWikiAdminGroup), restricting edit of site-wide preferences to the TWiki Administrator Group.
We have a strong preference for PostgreSQL, both for the driver itself and for the community around it. Due to current issues with yum, it's best to install it from source (easy, too.) That way, you get pg_config, for example.
MySQL:
Has gotten pretty commercial, but also needs to be installed from source.
Copyright 1999-2009 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.