Reposting, as I thought this was a fairly good issue that any web developer can run into.
Josh Breckman worked for a company that landed a contract to build a fairly large government website. Much of the project involved developing a content management system so that employees could build and maintain the ever-changing content of their site.
Because they already had a website with a lot of content, the customer wanted to take the opportunity to reorganize it all and upload it into the new site before it went live. As you might imagine, this was a fairly time-consuming process. But after a few months, they had finally put all the content into the system and opened it up to the Internet.
Things went pretty well for a few days after going live. But, on day six, things went not-so-well: all of the content on the website had completely vanished and all pages led to the default “please enter content” page. Whoops.
Josh was called in to investigate and noticed that one particularly troublesome external IP had gone in and deleted *all* of the content on the system. The IP didn’t belong to some overseas hacker bent on destroying helpful government information. It resolved to googlebot.com, Google’s very own web crawling spider. Whoops.
After quite a bit of research (and scrambling around to find a non-corrupt backup), Josh found the problem. A user had copied and pasted some content from one page to another, "edit" hyperlink and all. Normally, this wouldn't be an issue, since an outside user would need to enter a name and password first. But the CMS authentication subsystem didn't take into account the sophisticated hacking techniques of Google's spider. Whoops.
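For anyone wondering how a crawler "hacks" past a login: Googlebot doesn't keep cookies and doesn't execute JavaScript, so a login check built on either one never fires for it. It just follows every plain hyperlink it finds, including ones that delete content. Below is a minimal sketch of the anti-pattern and one possible fix, written with Flask. The route names, `delete_page`, and the session check are all hypothetical; the story doesn't describe the CMS's actual stack.

```python
# A sketch of the bug class, not the original CMS. Everything here
# (routes, delete_page, session layout) is illustrative.
from flask import Flask, abort, request, session

app = Flask(__name__)
app.secret_key = "change-me"  # required for Flask's session cookie

def delete_page(page_id):
    """Stand-in for the CMS's real delete logic (hypothetical)."""
    print(f"page {page_id} deleted")

# ANTI-PATTERN: a destructive action behind a plain GET link, with login
# enforced only by cookies and JavaScript redirects on the client side.
# A crawler that uses neither will happily follow
# <a href="/delete?page=42">edit</a> and wipe the page.
@app.get("/delete")
def delete_via_link():
    delete_page(request.args["page"])
    return "deleted"

# SAFER: verify the session on the server for every request, and reserve
# state-changing operations for POST, which well-behaved crawlers never send.
@app.post("/delete")
def delete_via_form():
    if "user" not in session:
        abort(401)  # refuse unauthenticated callers outright
    delete_page(request.form["page"])
    return "deleted"
```

The broader lesson holds regardless of stack: enforce authentication on the server for every request, and never hang a destructive operation off a GET link, because GET is exactly what every crawler (and link-prefetching browser) issues.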