[specref] Downtime post-mortem

Tobie Langel-4
Hi all,

Sorry, Specref was down for a number of hours overnight: I deployed a
buggy app refactor and failed to notice that the app was crashing, despite
checking the logs.

Lessons learned:
- the app needs tests, not only the data,
- Papertrail logs don't distinguish clearly enough between failed and
served requests, so check the app too,
- don't push code late at night.

Apologies,

--tobie


Re: [specref] Downtime post-mortem

Tobie Langel-4
As another mitigation strategy for such issues, I've given access to the
app to Dom, Shane and Marcos.

We should pretty much have all timezones covered with these three.

--tobie

On Fri, 4 Mar 2016, at 12:59, Tobie Langel wrote:

> Hi all,
>
> Sorry, Specref was down for a number of hours overnight: I deployed a
> buggy app refactor and failed to notice that the app was crashing, despite
> checking the logs.
>
> Lessons learned:
> - the app needs tests, not only the data,
> - Papertrail logs don't distinguish clearly enough between failed and
> served requests, so check the app too,
> - don't push code late at night.
>
> Apologies,
>
> --tobie


Re: [specref] Downtime post-mortem

Shane McCarron-6
In particular since most of us never sleep.

On Fri, Mar 4, 2016 at 7:53 AM, Tobie Langel wrote:
> As another mitigation strategy for such issues, I've given access to the
> app to Dom, Shane and Marcos.
>
> We should pretty much have all timezones covered with these three.
>
> --tobie

--
Shane McCarron
Projects Manager, Spec-Ops

Re: [specref] Downtime post-mortem

Tobie Langel-4
On Fri, 4 Mar 2016, at 14:53, Tobie Langel wrote:
> As another mitigation strategy for such issues, I've given access to the
> app to Dom, Shane and Marcos.

I've now also set up email alerts for when errors show up in the logs.

--tobie