Scaring you silly

“I crashed the New Zealand stock exchange”: 13 terrifyingly true developer horror stories

JAX Editorial Team

Pull up a chair and gaze into the glow of the warm monitor, as JAXenter shares with you our favourite Halloween chillers.

It’s October 31st – officially the scariest day of the year. And what better way to honour this celebration of ghouls and gremlins, we thought, than sharing thirteen terrifying tales of developer horror gathered at JAX London?

Read on, dear thrill-seeker. But don’t come crying to us if this gives you goosebumps… and hey, if you’ve got some spine-chillers of your own, please feel free to share them in the comment section!

 

1. Say rm -rf and die

 

This must have been, what, early noughties time? So I used to work for a company that owned another company, and I knew a guy who worked there. He was convinced that their site was being hacked, and he wanted help.

So I ssh’d into their box, had a look around, spotted some kind of problem, and fixed it for him. And then I accidentally ran an rm -rf command from the wrong directory and wiped out their entire website.

And then I pinged him, and I was like, “dude, there’s a bit of a problem. Where are your backups?”

To which he replies: “What backups?”

 

2. Beware of the stock exchange

 

So…I crashed the New Zealand stock exchange. I shut down the
servers in the wrong window.

We were sitting in the Ops room, and we had a police siren set
up for when the stock exchange goes down, because we need to know
about it. So I enter my command on the keyboard, and all of a
sudden, the siren kicks off.

I didn’t put two and two together to start with. I thought,
“Cool! Our siren’s gone off.” So I started looking at various
systems, and then I thought: oh, hang on a second.

And that’s the day we discovered our disaster recovery did actually work. It had never been tested before.

 

3. Diary of a mad tester

 

One of our tests says, “Get me a random number. Get me another one. Are these two numbers the same? No? Good. Pass.”
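
For the curious, here’s a minimal sketch – in Java with JUnit 5, purely as an illustration, with names invented by us – of what a “test” like that might look like. It verifies nothing about any actual code, and it will even fail on the rare occasion the two draws collide.

    import java.util.Random;

    import org.junit.jupiter.api.Test;
    import static org.junit.jupiter.api.Assertions.assertNotEquals;

    class RandomNumberTest {

        // Draws two random ints and asserts they differ. It exercises no
        // production code at all, and collides (fails) roughly once in
        // four billion runs.
        @Test
        void twoRandomNumbersAreDifferent() {
            Random random = new Random();
            assertNotEquals(random.nextInt(), random.nextInt());
        }
    }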

 

4. Legend of the lost operating system

 

I managed to delete an entire operating system.

A long time ago, we had a different setup of computers than we have today. Everything was on removable disks – they’re all hidden away these days, but back then you had great big cabinets. We had something like £25m worth of developed software on the disks, and I was doing the rotating backups. Somebody had set a switch on the front of the computer the wrong way – the auto-start switch.

So what happened was, I put the disk in, we had a little power glitch, and it auto-started halfway through the backup. I corrupted the first £25m worth of software, and then I put the next disk in and corrupted that one. And then I put the next one in, because I was doing the full rotation! And in the end, I had none left.

I spent the next three weeks patching together a nearly-complete
operating system and software. Never quite got there. I lost about
two months’ work in the end. Oh my god, was I slaving away for
hours and hours.

 

5. Print fever

 

We asked another department in the company to update an Excel
sheet and edit the cells. They handed it back to us printed out,
and someone had written the corrections on it! Because it was so
big, they’d stuck it all together with sellotape, on pieces of A4,
up on a whiteboard. Priceless, that.

 

6. Monster laptop

 

This is a company laptop – they don’t let you install anything on it unless you’re an admin. So I spent most of the weekend building a live CD, making it persistent as well, so I could use an external drive.

I came here to a workshop yesterday, booted it up once and it worked. I’d forgotten to set the persistence flag, so I reset it, set the persistence flag, and then got a read error on the initrd Linux RAM disk.

So that was it, all that work gone out the window. I had no
laptop for the workshops. That was a horror story, thanks to
Windows and their permissions.

 

7. Night of the living exploit

 

We released a service that allowed people to download software, which had a huge security flaw in it – which we realised about fifteen minutes after we deployed it. We then frantically fixed the vulnerability and pushed the fix out there. We didn’t get exploited or anything, so everything was fine, but it was quite a good example of “when the shit hits the fan, how good are you at getting a fix out?”

 

8. The curse of the cleaners

 

A lot of companies forget to lock the rooms where their server farms are, and the cleaners will just go in and clean, right? So they’ll dust the computers, and switches will get knocked off – Chaos Monkey style. That’s happened at a major, major investment bank here in London.

They let the cleaners into the server rooms, and all of a sudden major production systems started going down. Everyone’s going, “what’s going on?”, and ringing down to security thinking they were being hacked – and the cleaner’s there hoovering, headphones in, elbows smacking up against the servers.

 

9. The £100k coding horror

 

I used to work for a really famous bank, and we were doing some work for one of our clients who takes credit card payments. I had to make a change, and accidentally committed something without a test, and then six months later they found out there was a bug, and it cost them £100,000.

That’s not the worst bit. So I patched it, and we went to work on a couple of branches – and no-one pulled the fix in. I was there for another eighteen months, and by the time I left it still hadn’t been patched.

And the client – we had no way of identifying which version of the code they had, because they were all physical sites somewhere. It came down to an engineer going out with a USB pen, whenever he felt like it, just plugging it in and upgrading their software. So that bug is probably still out there…

 

10. Be careful what you look for…

 

I came across, in our code, the perfect way to find out if a number is negative: convert it to a string and see if the first character is a minus sign. It was in actual production code.
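
In case you’d like to see that horror spelled out, here’s a rough Java sketch of the anti-pattern described above – the class and method names are ours, not from the original code – alongside the boring alternative.

    class SignCheck {

        // The anti-pattern from the story: check the sign by converting
        // the number to a string and inspecting the first character.
        static boolean isNegative(int n) {
            return String.valueOf(n).charAt(0) == '-';
        }

        // The comparison nobody would tell a horror story about.
        static boolean isNegativeTheBoringWay(int n) {
            return n < 0;
        }
    }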

 

11. Deep trouble

 

I’m at a conference today, and I got a call an hour ago saying a number of clients’ websites had gone down. And of course, there’s no-one back in the office to fix it, because all of the guys are down here at JAX London. So we’ve got a very worried-sounding manager phoning to say “help”.

So we tried to get on the wifi network here, and of course
everyone else is trying to use the wifi here, so we’re having to do
it via mobile phones to connect in and sort out the problem. If we
can get 3G in here…

We managed to fix it in the end, so at least there’s a happy
ending.

 

12. My hairiest adventure

 

I used to work in a hospital, and it was about one month after I’d started working in IT. I was still a student. So what I did was, I thought I was doing a fix and deploying it to test. What really happened was we had access to the live system, and it went live.

So, the next morning I came in at like 9 o’clock. Everybody was looking at me: the information system of one of the biggest hospitals in the country isn’t working, what did you do? Turn it back!

 

13. Attack of the mutant asynchronous system

 

Two systems: one doing the application logic and the other doing the data logic. So if we have a cancellation of some kind, basically the application system works on the information, and it saves up a bunch of data before it sends off an email – like, “we’re sorry to lose you”, that kind of stuff.

Because the subscription-cancelling flow between the two systems is asynchronous, it sends some data off, waits for the exact amount to come back, receives it, and then sends an email with all the information combined. So this intermediary data lives on the application system while it waits for the other system to complete its work asynchronously. And that buffer – which was, and still is actually, in production – is a list of 100 entries maximum. The data sort of sits there and waits, and then we pop it off the list.

So if at any point, for some reason, there are 100 requests all still being processed, we will start losing data, because of that hard limit of 100 – for no apparent reason, other than that, I guess, 100 was enough. And if the server gets restarted, we lose the lot.

We’ve had that for about eight years now. Nobody really knows
about it.
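
To make that failure mode a bit more concrete, here’s a hypothetical Java sketch of a hard-capped, in-memory buffer of the kind described – the class and method names are invented, but the behaviour is the horror in question: the 101st entry is silently dropped, and a restart wipes whatever is still waiting.

    import java.util.ArrayDeque;
    import java.util.Deque;

    class PendingCancellations {

        private static final int MAX_ENTRIES = 100; // the arbitrary hard limit

        // In-memory only: a restart loses everything still waiting here.
        private final Deque<String> pending = new ArrayDeque<>();

        synchronized void add(String cancellationId) {
            if (pending.size() >= MAX_ENTRIES) {
                return; // silently dropped - this is where the data goes missing
            }
            pending.addLast(cancellationId);
        }

        // Popped off once the data system finishes its asynchronous work,
        // so the combined email can be sent.
        synchronized String next() {
            return pending.pollFirst();
        }
    }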
