Helpers

These helper functions provide the functionality that FastDownload relies on. Most users should use FastDownload rather than calling these helpers directly.
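
For typical use, the FastDownload class documented below does everything in one call; a minimal sketch:

from fastdownload import FastDownload

# Download the archive (if needed) and extract it, returning the extracted path
path = FastDownload().get('https://s3.amazonaws.com/fast-ai-sample/mnist_tiny.tgz')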

from pathlib import Path
import fastdownload
from fastdownload import *

dest = Path('tmp')
url = 'https://s3.amazonaws.com/fast-ai-sample/mnist_tiny.tgz'

download_url[source]

download_url(url, dest=None, timeout=None, show_progress=True)

Download url to dest and show progress

dest.mkdir(exist_ok=True)
fpath = download_url(url, dest)
fpath
100.54% [344064/342207 00:00<00:00]
Path('tmp/mnist_tiny.tgz')

path_stats[source]

path_stats(fpath)

Size and hash of fpath, returned as a (size, hash) tuple

path_stats(fpath)
(342207, '56143e8f24db90d925d82a5a74141875')

checks_module[source]

checks_module(module)

Location of download_checks.py

The download_checks.py file containing sizes and hashes will be located next to module:

mod = checks_module(fastdownload)
mod
Path('git/fastdownload/fastdownload/download_checks.py')

read_checks[source]

read_checks(fmod)

Evaluated contents of download_checks.py

assert read_checks({}) == {}

check[source]

check(fmod, url, fpath)

True if the size and hash of fpath match the stored data for url, or if no data is stored for url
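
For example (an illustrative call; the result depends on what, if anything, is stored for url in mod), check compares the file downloaded earlier against the checks file found by checks_module:

# True if fpath matches the stored size/hash for url, or if nothing is stored yet
check(mod, url, fpath)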

update_checks[source]

update_checks(fpath, url, fmod)

Store the hash and size of fpath for url in download_checks.py

if mod.exists(): mod.unlink()
update_checks(fpath, url, mod)
read_checks(mod)
{'https://s3.amazonaws.com/fast-ai-sample/mnist_tiny.tgz': (342207,
  '56143e8f24db90d925d82a5a74141875')}

download_and_check[source]

download_and_check(url, fpath, fmod, force)

Download url to fpath, unless fpath already exists and passes check (a download is always performed if force)
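
FastDownload.download (below) wraps this helper; as a minimal sketch of calling it directly, reusing url, fpath and mod from above:

# Skips the download if fpath already exists and matches the checks stored in mod;
# otherwise downloads a fresh copy and verifies it
download_and_check(url, fpath, mod, force=False)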

class FastDownload[source]

FastDownload(cfg=None, base='~/.fastdownload', archive=None, data=None, module=None)

d = FastDownload(module=fastdownload)
d.module
Path('git/fastdownload/fastdownload/download_checks.py')

The config.ini file will be created (if it doesn't exist) in {base}/config.ini:

d.cfg.config_file
Path('.fastdownload/config.ini')
print(d.cfg.config_file.read_text())
[DEFAULT]
data = /home/jhoward/.fastdownload/data
archive = /home/jhoward/.fastdownload/archive
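
The storage locations can be customised when constructing FastDownload; a minimal sketch (the ~/.myproject base and the directory names are just illustrative):

# Keep the config, downloaded archives and extracted data under a custom base directory
d2 = FastDownload(base='~/.myproject', archive='downloads', data='extracted')
d2.cfg.config_file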


FastDownload.download[source]

FastDownload.download(url, force=False)

Download url to the archive path, unless the archive already exists and passes self.check (a download is always performed if force)

If there is no stored hash and size for url, or the size and hash of the existing file match the stored checks, then download will only download the URL if the destination file does not exist. The destination path will be returned.

if d.module.exists(): d.module.unlink()
arch = d.download(url)
arch
100.54% [344064/342207 00:00<00:00]
Path('.fastdownload/archive/mnist_tiny.tgz')

FastDownload.update[source]

FastDownload.update(url)

Store the hash and size in download_checks.py

d.update(url)
eval(d.module.read_text())
{'https://s3.amazonaws.com/fast-ai-sample/mnist_tiny.tgz': (342207,
  '56143e8f24db90d925d82a5a74141875')}

Calling download will now just return the existing file, since the checks match:

d.download(url)
Path('.fastdownload/archive/mnist_tiny.tgz')

If the checks file doesn't match the size or hash of the archive, then a new copy of the file will be downloaded.
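
For instance (an illustrative sketch, not taken from the library's tests), corrupting the local archive makes the next download fetch a fresh copy:

arch.write_bytes(b'not the real archive')   # size/hash no longer match the stored checks
arch = d.download(url)                       # so a new copy is downloaded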

FastDownload.extract[source]

FastDownload.extract(url, extract_key='data', force=False)

Extract archive already downloaded from url, overwriting existing if force

extr = d.extract(url, force=True)
extr
Path('.fastdownload/data/mnist_tiny')
extr.ls()
(#5) [Path('.fastdownload/data/mnist_tiny/models'),Path('.fastdownload/data/mnist_tiny/train'),Path('.fastdownload/data/mnist_tiny/labels.csv'),Path('.fastdownload/data/mnist_tiny/valid'),Path('.fastdownload/data/mnist_tiny/test')]

Pass extract_key to use a key other than data from your config file when choosing where to extract the archive:

d.cfg['model_path'] = 'models'
d.extract(url, extract_key='model_path')
Path('.fastdownload/models/mnist_tiny')

FastDownload.rm[source]

FastDownload.rm(url, rm_arch=True, rm_data=True, extract_key='data')

Delete downloaded archive and extracted data for url

d.rm(url)
extr.exists(),arch.exists()
(False, False)

FastDownload.get[source]

FastDownload.get(url, extract_key='data', force=False)

Download and extract url, overwriting existing if force

res = d.get(url)
res,extr.exists()
100.54% [344064/342207 00:00<00:00]
(Path('.fastdownload/data/mnist_tiny'), True)

If the archive doesn't exist, but the extracted data does, then the archive is not downloaded again.

d.rm(url, rm_data=False)
res = d.get(url)
res,extr.exists()
(Path('.fastdownload/data/mnist_tiny'), True)

extract_key works the same way as in FastDownload.extract:

res = d.get(url, extract_key='model_path')
res,res.exists()
(Path('.fastdownload/models/mnist_tiny'), True)