After an outage lasting 3 weeks, we are back online.
Further details are now available
as to the cause of the downtime. A system reboot was planned
and executed, after which the machine refused to boot. Booting
from an installation CD, several other technicians and I
looked over the system. The initial suspected cause was a software
upgrade that had been performed several days earlier on various
programs, but these turned out to be unrelated to the problem. For at least two
weeks, the focus was on fixing a problem with what seemed like an
initialization
file. This file instructs the operating system to load and
start critical programs, such as logging, the web server, and
other services, after the file system has been mounted. Over this
past weekend, as we tried to correct the problem,
we found that the root cause was that several
critical system directories were missing. Among these directories
were the /etc/ directory, which holds system configuration files and
startup scripts, and the /home/ directory, which held all of the user
data. How a whole system directory can just "go missing" is
a total mystery to me. One day they were there, the next
they weren't. It was determined that a system rebuild was in
order.
In short, the system was completely
rebuilt from scratch, and all data on it was lost. Compounding this
were hard drive problems on my own machine at home, which
forced a reformat and reinstall of that drive and cost me
every backup made from the 16th of December, 2002 onward.
Regrettably, the latest backups available are from December
15th, 2002.
As a result, I have decided
to start from scratch. The server has been rebuilt and has a brand
new firewall and intrusion detection system. It also has all
current software updates available, an updated FTP server,
and an updated Apache web server with more features available.
The latest beta Activeworlds servers are also installed.
However, much has been learned
from this whole ordeal. First, I am going to be much more strict
with access, firewalls, and passwords. Second, backups, backups,
backups. The most obvious lesson is that backups must be made
regularly, and stored in multiple places. I now have a backup
Linux server, which will be used to test all software changes/updates.
This server will also mirror all of the primary server's
configurations. In other words, your world will be loaded on
both servers, but will only be started on the primary server.
Therefore, if the primary server goes down for any reason,
the backup server can be brought online immediately with
up-to-date (at most 1 day old) data. To facilitate this, I
will be implementing several pieces of new software. Part of the
scheme involves the software I have been working on for months,
parts of which have been released as "Log Save" and
the soon-to-be-released "AW backup". The other pieces
are open source backup/administration programs. On top of the
multiple servers, I am trying to arrange for tape backups of the
entire machine within the data center.
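To illustrate the "regular backups, stored in multiple places" idea, here is a minimal Python sketch. It is not the Log Save or AW backup code, just a rough example of the approach. The function names, the archive naming scheme, and the directories are all made up for illustration.

```python
import shutil
import tarfile
import time
from pathlib import Path


def make_backup(source_dir, dest_dirs, stamp=None):
    """Archive source_dir once, then copy the archive to every destination.

    All names here are hypothetical; 'stamp' defaults to today's date.
    """
    if not dest_dirs:
        raise ValueError("at least one destination directory is required")

    source = Path(source_dir)
    stamp = stamp or time.strftime("%Y-%m-%d")
    name = f"{source.name}-{stamp}.tar.gz"

    dests = [Path(d) for d in dest_dirs]
    for dest in dests:
        dest.mkdir(parents=True, exist_ok=True)

    # Build the compressed archive in the first destination.
    first = dests[0] / name
    with tarfile.open(first, "w:gz") as archive:
        archive.add(source, arcname=source.name)

    # Copy the same archive to every other destination.
    copies = [first]
    for dest in dests[1:]:
        copies.append(Path(shutil.copy2(first, dest / name)))
    return copies


def newest_backup_age_hours(dest_dir, now=None):
    """Return the age in hours of the newest archive in dest_dir, or None."""
    archives = list(Path(dest_dir).glob("*.tar.gz"))
    if not archives:
        return None
    newest = max(a.stat().st_mtime for a in archives)
    now = time.time() if now is None else now
    return (now - newest) / 3600.0
```

Run once a day (for example from a scheduled job), something like this would keep the newest copy in each location at most about a day old, and newest_backup_age_hours could be used to warn when a location has fallen behind.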
So, to start getting worlds back online, here's how it's
going to work:
If you intend to make the transition
with JTech to the new fee-based system, I will need an
e-mail from you saying so (JerMe@nc.rr.com). Please include
in your e-mail your world name, password, and CT list, the
size (in MB) you would like for your object path (if you would
like one), and any other pertinent information. I will load the
most recent backup for your world and, if you previously had an
OP with JTech, upload a backup of your object path data.
There have been many questions about
the new pricing. Please see http://www.JTechWebSystems.com/prices.shtml for
pricing information. Please feel free to ask
questions if you need any clarification on the exact terms
of the new payment plans. Obviously, the implementation of
the new fee-based services has been delayed again. Right now,
it looks as if I will begin charging for the month of May,
and payment for that month will be due by late April.