These helper functions provide the functionality that FastDownload relies on. Most users should use FastDownload rather than calling these helpers directly.
from pathlib import Path
import fastdownload
from fastdownload import download_url, path_stats, checks_module, read_checks, update_checks, FastDownload

dest = Path('tmp')
url = 'https://s3.amazonaws.com/fast-ai-sample/mnist_tiny.tgz'
dest.mkdir(exist_ok=True)
fpath = download_url(url, dest)
fpath
path_stats(fpath)
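path_stats is what the checks machinery uses to fingerprint a file. A minimal sketch, assuming it returns a (size, hash) pair:

size,hashed = path_stats(fpath)
size  # size in bytes; hashed is a digest of the file contents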
The download_checks.py file containing sizes and hashes will be located next to the module:
mod = checks_module(fastdownload)
mod
if mod.exists(): mod.unlink()
assert read_checks(mod) == {}  # with no checks file present, read_checks returns an empty dict
update_checks(fpath, url, mod)
read_checks(mod)
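Presumably the checks file maps each URL to the stats returned by path_stats; a quick sanity check under that assumption (a sketch, not a documented guarantee):

assert read_checks(mod)[url] == path_stats(fpath)  # assumed structure: {url: (size, hash)}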
FastDownload wraps the helpers above into a single class. Pass module to control where the checks file is stored:
d = FastDownload(module=fastdownload)
d.module
The config.ini file will be created (if it doesn't already exist) at {base}/config.ini:
d.cfg.config_file
print(d.cfg.config_file.read_text())
If there is no stored hash and size for url, or if the size and hash match the stored checks, then download will only download the URL if the destination file does not exist. The destination path will be returned.
if d.module.exists(): d.module.unlink()
arch = d.download(url)
arch
d.update(url)
eval(d.module.read_text())
Calling download will now just return the existing file, since the checks match:
d.download(url)
If the checks file doesn't match the size or hash of the archive, then a new copy of the file will be downloaded.
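For instance, here is a minimal sketch (assuming the archive file is writable): corrupting the local copy makes its size and hash disagree with the stored checks, so the next call fetches a fresh copy:

arch.write_bytes(b'')   # corrupt the local archive so its stats no longer match the checks
arch = d.download(url)  # mismatch detected, so the file is downloaded again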
extract extracts an archive already downloaded with download; force=True overwrites any existing extracted data:
extr = d.extract(url, force=True)
extr
extr.ls()
Pass extract_key to select the archive extraction location using a key other than data from your config file:
d.cfg['model_path'] = 'models'
d.extract(url, extract_key='model_path')
rm deletes the downloaded archive and, by default, the extracted data as well:
d.rm(url)
extr.exists(),arch.exists()
get downloads and extracts url if needed, returning the path to the extracted data:
res = d.get(url)
res,extr.exists()
If the archive doesn't exist, but the extracted data does, then the archive is not downloaded again.
d.rm(url, rm_data=False)
res = d.get(url)
res,extr.exists()
extract_key works the same way as in FastDownload.extract:
res = d.get(url, extract_key='model_path')
res,res.exists()
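All of the locations above hang off the base directory passed to FastDownload. A hypothetical example (the '~/.mydata' path is illustrative only):

d2 = FastDownload(base='~/.mydata', module=fastdownload)
d2.cfg.config_file  # config.ini now lives under ~/.mydata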