Moving large amounts of data
The problem
Scenario: you're transferring a large number of files from your computer to another disk or to cloud storage.
The transfer fails midway because one of these happens:
- your device crashes
- the internet connection goes down
- the service has a hiccup
You're now in a situation where you have to figure out which directories:
- transferred fully
- transferred partially
- did not transfer at all
Then you have to come up with a plan for how to continue from there. Ugh.
The solution
Varasto's design fits this use case well: each volume has its own asynchronous replication queue where all data transfers are queued atomically, and the queue workers are resilient to temporary errors and to software or hardware crashes.
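To illustrate why a persistent queue helps here, the following is a minimal Python sketch. It is not Varasto's actual implementation: `DurableQueue`, `TransientError` and `run_worker` are hypothetical names, and the sketch assumes a single worker and a queue stored in a JSON file. The idea it shows is that a job leaves the queue only after its transfer succeeds, so after a crash or an outage the remaining work is simply whatever is still queued.

```python
import json
import os
import tempfile


class TransientError(Exception):
    """Hypothetical error for failures worth retrying (network down, service hiccup)."""


class DurableQueue:
    """Toy persistent queue: pending jobs live in a JSON file on disk."""

    def __init__(self, path):
        self.path = path
        if not os.path.exists(path):
            self._save([])

    def _load(self):
        with open(self.path) as f:
            return json.load(f)

    def _save(self, jobs):
        # Write to a temp file and rename it over the old one, so a crash
        # never leaves a half-written queue file behind.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump(jobs, f)
        os.replace(tmp, self.path)

    def enqueue(self, jobs):
        self._save(self._load() + list(jobs))

    def pending(self):
        return self._load()

    def run_worker(self, transfer, max_attempts=3):
        """Process jobs in order; a job is removed only after transfer() succeeds."""
        while True:
            jobs = self._load()
            if not jobs:
                return
            for _ in range(max_attempts):
                try:
                    transfer(jobs[0])
                    break
                except TransientError:
                    continue
            else:
                # All attempts failed: stop for now, the job stays queued.
                return
            self._save(jobs[1:])
```

If the worker stops partway (or the process dies), creating a new `DurableQueue` on the same file and calling `run_worker` again continues from the first job that was not confirmed, without any manual bookkeeping of what was already transferred.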
Example case of moving data
Required reading
You should be familiar with replication policies first.
Let's say that this content exists only on your local disk (named Volume A):
- 20 TB of movies and TV series
- 500 GB of work files
- 500 GB of miscellaneous files
You now decide that you want to back up all of this content to your cloud storage. The cloud storage is empty in the beginning, so you've got 21 TB of data to transfer.
Before you decided that you need a cloud backup, these were your replication policies:
Name | New data goes to |
---|---|
Default | Volume A |
We'll transfer data by making changes to replication policies.
Screenshot of replication policies UI
No need for prioritization
If you don't need to prioritize sending the data, you can just change the above policy to:
Name | New data goes to |
---|---|
Default | Volume A, Cloud |
Varasto's replication reconciliation process will notice the conflicts (between how you want things to be and how they currently are), confirm them with you, and then start replicating the data.
You're done. You just need to wait for the queue backlog to reach realtime.
Explain the conflict resolution
The policy's desired replica count (derived from New data goes to) applies to existing data as well, not just to new data (new data is always written in compliance with the policy). Because you just changed the policy, the reconciliation process finds policy conflicts with your existing data:
Policy change | Policy's replica count | Existing data's replica count | Conflict |
---|---|---|---|
Before | 1 | 1 | ☐ |
After | 2 | 1 | ☑ |
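The check in the table can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Varasto's reconciliation code: `find_conflicts` is a hypothetical helper, and the sketch assumes each item records the set of volumes it is currently stored on.

```python
def find_conflicts(desired_volumes, replicas_by_item):
    """Return {item: missing volumes} for items not yet on every desired volume."""
    conflicts = {}
    for item, current in replicas_by_item.items():
        missing = [v for v in desired_volumes if v not in current]
        if missing:
            conflicts[item] = missing
    return conflicts


# Existing data lives only on Volume A (replica count 1).
existing = {
    "movies": {"Volume A"},
    "work": {"Volume A"},
    "misc": {"Volume A"},
}

before = find_conflicts(["Volume A"], existing)          # policy before the change
after = find_conflicts(["Volume A", "Cloud"], existing)  # policy after the change

print(before)  # {}
print(after)   # every item is reported as missing "Cloud"
```

Each reported conflict corresponds to one replication job that would bring the item up to the policy's replica count.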
Using prioritization
If you want your data to be in both Volume A and Cloud, but you want to transfer it in prioritized batches, you can create a temporary "better policy" and gradually extend it to cover more directories until it covers everything.
You create a new policy. Here are your policies now:
Name | New data goes to |
---|---|
Default | Volume A |
Increased resiliency | Volume A, Cloud |
Here's the content in order of importance (= step order):
- Work files
- Everything else that is not movies or TV series
- Movies
- TV series
We'll do these steps:
Starting point:

Directory | Policy, explicit | Policy, inherited | Cloud |
---|---|---|---|
/ | Default | Default | ☐ |
/media/movies | | Default | ☐ |
/media/series | | Default | ☐ |
/work | | Default | ☐ |
/misc | | Default | ☐ |

Step 1, work files:

Directory | Policy, explicit | Policy, inherited | Cloud |
---|---|---|---|
/ | Default | Default | ☐ |
/media/movies | | Default | ☐ |
/media/series | | Default | ☐ |
/work | Increased resiliency | Increased resiliency | ☑ |
/misc | | Default | ☐ |

What you did:
- Assign Increased resiliency to /work

Effect:
- Work files will be transferred

Step 2, everything else that is not movies or TV series:

Directory | Policy, explicit | Policy, inherited | Cloud |
---|---|---|---|
/ | Increased resiliency | Increased resiliency | ☑ |
/media/movies | Default | Default | ☐ |
/media/series | Default | Default | ☐ |
/work | | Increased resiliency | ☑ |
/misc | | Increased resiliency | ☑ |

What you did:
- Assign Increased resiliency to root
- Assign Default to /media/movies
- Assign Default to /media/series
- Remove Increased resiliency from /work
    - This has no effect on the transfer, but the explicit policy would now be a redundant exception

Effect:
- Everything except movies and TV series will be transferred

Step 3, movies:

Directory | Policy, explicit | Policy, inherited | Cloud |
---|---|---|---|
/ | Increased resiliency | Increased resiliency | ☑ |
/media/movies | | Increased resiliency | ☑ |
/media/series | Default | Default | ☐ |
/work | | Increased resiliency | ☑ |
/misc | | Increased resiliency | ☑ |

What you did:
- Remove Default from /media/movies (it'll now inherit the root policy)

Effect:
- Movies will be transferred

Step 4, TV series:

Directory | Policy, explicit | Policy, inherited | Cloud |
---|---|---|---|
/ | Increased resiliency | Increased resiliency | ☑ |
/media/movies | | Increased resiliency | ☑ |
/media/series | | Increased resiliency | ☑ |
/work | | Increased resiliency | ☑ |
/misc | | Increased resiliency | ☑ |

What you did:
- Remove Default from /media/series (it'll now inherit the root policy)

Effect:
- TV series will be transferred

After the transfer is done, all your content exists both locally and in the cloud, and all new content will automatically also be written to the cloud in real time.
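The relationship between the explicit and inherited policy columns in the steps above can be modeled as a walk up the directory tree. The following Python sketch is an illustration only: `effective_policy` is a hypothetical helper, and it assumes that a directory without an explicit policy uses the policy of its nearest ancestor that has one.

```python
import posixpath


def effective_policy(directory, explicit):
    """Walk up from `directory` until a directory with an explicit policy is found."""
    current = directory
    while True:
        if current in explicit:
            return explicit[current]
        if current == "/":
            raise KeyError("root directory has no policy")
        current = posixpath.dirname(current)


directories = ["/", "/media/movies", "/media/series", "/work", "/misc"]

# Explicit assignments after the second step: root gets the new policy,
# movies and TV series are held back with explicit exceptions.
step2 = {
    "/": "Increased resiliency",
    "/media/movies": "Default",
    "/media/series": "Default",
}

# Third step: removing the exception lets /media/movies inherit from root.
step3 = {k: v for k, v in step2.items() if k != "/media/movies"}

for d in directories:
    print(d, "->", effective_policy(d, step2), "|", effective_policy(d, step3))
```

Modeled this way, each step is a small change to the set of explicit assignments, and the directories whose effective policy changes are the ones that get queued for transfer.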
Tip
Now, optionally, delete the old Default policy (clearly, it's no longer the default) and rename Increased resiliency to be the new Default.