OK, first let me say a couple of things in my defense.
I did this in 1998 (I think. It might have been earlier. SQL 6.5 was the new version of MSSQL at the time…). Replication, the version that Microsoft created and supports, was not yet available. And, we had a deadline to meet. I had one afternoon to create and test some process for moving changes made in the Order Entry/Customer Service database on a SQL 6.0 server into the Accounting/Billing database on another (It was running MSSQL 6.5). Finally, I was still pretty inexperienced with SQL (I had only been working with it for a couple of years at the time). I did the best I could at the time and within a couple of weeks I learned some of the most important lessons of my career…
Conceptually, the system was simple enough.
The publishing server had triggers on the tables that needed to be replicated. Every insert, update, or delete caused a SQL statement to be generated that would be inserted into a “Transaction Log”. The “Transaction Log” was just another table. It had two columns: TranSequence integer and CmdText varchar (500) -- I think. The Transaction Log had an insert trigger defined on it as well. Every time a row was inserted, it used xp_sendmail to send a mail message to the other box. The Subject of the email was a special phrase (so the other DB would recognize that it was not SPAM) and the TranSequence. The body of the email contained the command.
Every minute the subscriber used xp_readmail to read any messages that arrived. If it found the correct phrase in the subject, it executed the command found in the message body then sent a message (using a similar process back to the publisher which informed it that the replication was complete. Every message, after it was read and acted on (or not, depending on the phrase) , was deleted from the exchange server using xp_deletemail.
I created this process on a Friday afternoon. It was tested (the requirement was that it should be able to manage about 400 messages / per day and that the changes would migrate to the next machine within about 2 minutes. So, testing wasn’t all that rigorous). We had it in Production on the following Monday. It worked precisely as advertised. Everyone was happy…
What Went Wrong
After a week or so, the sent items folder on the Exchange server reached its’ size limit. This caused the Exchange server to lock. The company was now without email. Once Exchange stopped responding, xp_sendmail, xp_readmail, and xp_deletemail stopped working. They didn’t fail. Instead, they simply went into a wait.
These Extended Stored Procedures were being executed from within a transaction. So, locks on the tables involved quickly escalated into Table Locks. System tables did the same thing. Within a few minutes, the server was down and there was not even a way to log on to it. We had to guess at what the problem might be. It took quite a while to extrapolate a cause/effect relationship between Exchange going down, then the SQL Server.
Worse yet, by the time this went wrong, the company was quite dependent on my “Homegrown” replication. There was no way back. For the next week or so, I started every day by manually deleting the messages from the sent mail folder. Then several times each day, I did it again. The company was growing quickly and as it grew, so did the number of messages to delete. Soon, deleting messages consumed a large part of each day. Eventually, I managed to get an intern to do this for me (I still feel just a little guilty about that). When I left the company a year later, this was still going on.
Lessons Learned
As I promised, I will now highlight the most important things I learned from this experience. I hope you can generalize these lessons to be useful in the situations you encounter. I know I wasn’t the only one who needed to learn these things because I still see the same mistakes repeated (in only slightly different ways) on a regular basis.
First, never make an external system (like Exchange) a critical part of your SQL process (if you can help it). You have no control over that system and any number of things external to your process can (and sooner or later will) cause it to quit working. Also, your process might affect that system. Others who depend on it will be affected. (Trust me. There is nothing as embarrassing as shutting down the entire company, order entry, accounting, and email. You only do it once…)
Second, avoid using triggers to move data around, execute stored procedures, or perform important business functions. They are intended to help insure transaction integrity. Try to resist the temptation to use them for other things.
Third, remember that everything you do has two sets of consequences, those that you intended and those that you did not intend. The unintended consequences are usually the ones hurt. Take longer than one afternoon to think about the problem you need to solve. I should have explained that Friday afternoon (two days before go live) was not the appropriate time to begin adding an important new piece to the system. Yes, it would probably have been painful to delay the launch. But, it cost the company a lot more money and pain to have email, accounting and order entry shut down for a day while we tried to figure out what went wrong.
Finally, remember this. No matter what you do. Regardless of how stupid it is. Once it makes it into a production environment, it will probably stay there to haunt you for a long time. And, even when it is gone, if it was dumb, it won’t be forgotten…