Things in production do not always work as they should. In general, we don't care who made the mistake that led to an emergency or incident. The entire team is responsible for letting something slip into production. We see any emergency as an opportunity to learn and to improve how we handle things.
During an emergency
In general, try to keep a cool head. A correct fix is better than a quick and potentially wrong one.
When you are working on fixing an emergency in production, these tips might help:
- ask a colleague to review any changes you make, big or small
- make only the minimal code changes needed to fix the problem at hand. Don't make any unrelated changes or performance improvements
- should bugs expose any data publicly or to unauthorized persons, immediately bring the application down with
After an emergency
When the dust has settled, schedule a meeting with the relevant team members to discuss:
- what actually went wrong
- how this could have been prevented
- which changes we will make to the app to prevent a similar incident
- how we could change our procedures to prevent similar incidents