item 31273759

tomkwong | 3 years ago

First, I want to say that this is a great post. You always grow stronger when you make mistakes, and writing it up solidifies your understanding of the lesson.

This story resonates with many people here because many experienced engineers have done something similar before. For me, a destructive batch operation like this would be two distinct steps:

1. Identify the files that need to be deleted.

2. Loop through the list and delete them one by one.

These steps are decoupled so that the list can be validated and each step can be tested independently. The scripts are also idempotent and can be reused.
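A minimal sketch of that two-step approach (the directory layout, the `.tmp` suffix rule, and all function names here are hypothetical examples, not anything from the original incident): step 1 writes a reviewable manifest and never deletes anything; step 2 runs only after the manifest has been validated, and can be rerun safely because already-deleted entries are skipped.

```python
import os

# Step 1: identify candidates and write them to a manifest for review.
# The deletion rule (files ending in ".tmp") is a hypothetical example.
def find_stale_files(root, suffix=".tmp"):
    """Collect files matching a deletion rule without touching them."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffix):
                matches.append(os.path.join(dirpath, name))
    return matches

def write_manifest(paths, manifest="to_delete.txt"):
    """Persist the candidate list so it can be reviewed before deletion."""
    with open(manifest, "w") as f:
        f.write("\n".join(paths))

# Step 2 (run only after the manifest has been reviewed): delete one by one.
# Idempotent: entries that are already gone are skipped, so the script
# can be rerun after a partial failure without error.
def delete_from_manifest(manifest="to_delete.txt"):
    deleted, skipped = 0, 0
    with open(manifest) as f:
        for line in f:
            path = line.strip()
            if not path:
                continue
            if os.path.exists(path):
                os.remove(path)
                deleted += 1
            else:
                skipped += 1  # already deleted on a previous run
    return deleted, skipped
```

Because the manifest is an ordinary text file, it can be diffed, spot-checked, or trimmed by hand before step 2 ever runs.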

Production operations are always risky. A good practice is to prepare an execution plan with detailed steps, a validation plan, and a rollback plan, and to review the plan with peers before the operation.

notyourday | 3 years ago

> 1. Identify files that need to be deleted; 2. Loop through the list and delete them one by one.
>
> These steps are decoupled so that the list can be validated. Each step can be tested independently. And the scripts are idempotent and can be reused.

This is the most underrated comment.

I'm saying this as someone who had ultimate oversight of deleting hundreds of TBs per day, spread across billions of files, on different clouds and local storage.

spiffytech | 3 years ago

I've never regretted treating tasks like this as a pipeline of discrete steps with explicit outputs and inputs. Sending output to a file, viewing it, then having something process the file is such a great safety net.
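That pipeline pattern can be sketched as stages that each read an input file and write an output file (the stage names, JSON format, and `expired` field below are all hypothetical): every intermediate artifact lands on disk where it can be inspected, or even edited, before the next stage consumes it.

```python
import json

# Each stage reads a file and writes a file, so intermediate results
# can be reviewed before the next stage runs. All names are illustrative.
def stage_collect(out_path, records):
    """Write raw candidate records to a human-reviewable file."""
    with open(out_path, "w") as f:
        json.dump(records, f, indent=2)

def stage_filter(in_path, out_path):
    """Keep only records flagged as expired; write the result to a new file."""
    with open(in_path) as f:
        records = json.load(f)
    kept = [r for r in records if r.get("expired")]
    with open(out_path, "w") as f:
        json.dump(kept, f, indent=2)

def stage_report(in_path):
    """Summarize a stage's output; a sanity check before acting on it."""
    with open(in_path) as f:
        return len(json.load(f))
```

The explicit files between stages are the safety net: a surprising count from `stage_report` stops the pipeline before anything destructive happens.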