File analytics download
The following script downloads the analytics of previously uploaded files using parallel threads. If you still need to upload your files, follow the how-to guide Batch files upload.
Note
If your browser displays the Python script as text instead of downloading it, press Ctrl+S after the file opens to save it.
Script parameters
workdir
: Absolute path to the work directory. For instance: /home/user/workdir_batch_01
output_dir
: Absolute path of the directory where all analytic files are downloaded. For instance: /home/user/workdir_batch_01/analytics_files
uploaded_file_ids_csv_filename
: Filename of the previously generated CSV containing the IDs of the uploaded files. For instance: uploaded_file_ids_20241026_002611.csv
max_concurrency
: The number of concurrent threads to use.
max_download_timeout
: Timeout in seconds that the script will wait for each file in case it is not processed yet.
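The parameters above are passed as `key=value` tokens. As a rough illustration of how such tokens could be validated, here is a minimal sketch. The helper `parse_params` and its checks are hypothetical; the actual script's argument handling is not shown in this guide and may differ.

```python
from pathlib import PurePosixPath

# Hypothetical helper for illustration only; not taken from the actual script.
REQUIRED_KEYS = {
    "workdir",
    "output_dir",
    "uploaded_file_ids_csv_filename",
    "max_concurrency",
    "max_download_timeout",
}


def parse_params(argv):
    """Parse key=value tokens into a dict and run basic sanity checks."""
    params = {}
    for token in argv:
        key, sep, value = token.partition("=")
        if not sep or not value:
            raise ValueError(f"expected key=value, got {token!r}")
        params[key] = value

    missing = REQUIRED_KEYS - params.keys()
    if missing:
        raise ValueError(f"missing parameters: {sorted(missing)}")

    # Both directories are documented as absolute paths.
    for key in ("workdir", "output_dir"):
        if not PurePosixPath(params[key]).is_absolute():
            raise ValueError(f"{key} must be an absolute path")

    # The two numeric parameters are converted to integers.
    params["max_concurrency"] = int(params["max_concurrency"])
    params["max_download_timeout"] = int(params["max_download_timeout"])
    return params
```

In a script, this could be called as `parse_params(sys.argv[1:])`.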
How to run the script
(If not yet done) Follow the Prerequisites instructions to set up the required environment
Ensure that the CSV file uploaded_file_ids_YYYYMMDD_HHMMSS.csv, containing the IDs of the previously uploaded files, is in the work directory /home/user/workdir_batch_01
Create a new directory to store all the analytic files that you plan to download, for instance /home/user/workdir_batch_01/analytics_files
Finally, run the script:
python3 batch_file_analytics_download.py \
workdir=/home/user/workdir_batch_01 \
output_dir=/home/user/workdir_batch_01/analytics_files \
uploaded_file_ids_csv_filename=uploaded_file_ids_20241026_002611.csv \
max_concurrency=50 \
max_download_timeout=100
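To make the roles of max_concurrency and max_download_timeout more concrete, the following is a minimal sketch of a thread-pool download loop. It is not the actual script: `fetch` is a hypothetical callable standing in for the real download request (assumed to return `None` while a file is not processed yet), and `poll_interval_s` is an assumed polling interval.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def download_with_timeout(file_id, fetch, timeout_s, poll_interval_s=1.0):
    """Poll fetch(file_id) until it returns data or timeout_s has elapsed."""
    deadline = time.monotonic() + timeout_s
    while True:
        data = fetch(file_id)  # hypothetical: None means "not processed yet"
        if data is not None:
            return file_id, "DOWNLOAD_DONE", data
        if time.monotonic() >= deadline:
            return file_id, "DOWNLOAD_ERROR", None
        time.sleep(poll_interval_s)


def download_all(file_ids, fetch, max_concurrency, timeout_s, poll_interval_s=1.0):
    """Run up to max_concurrency downloads at a time; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        futures = [
            pool.submit(download_with_timeout, fid, fetch, timeout_s, poll_interval_s)
            for fid in file_ids
        ]
        return [future.result() for future in futures]
```

In this sketch the timeout applies per file, so a file that is still unprocessed when its deadline passes is reported as DOWNLOAD_ERROR without blocking the other threads.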
The script will download and store the analytic files in the output_dir folder. The analytic files are named using the following format: <original_base_filename>_<original_file_extension>_analytics.json. For instance: file_01_abc_analytics.json
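The naming format above can be illustrated with a small sketch. The function `analytics_filename` is hypothetical and only mirrors the documented pattern; how the real script handles files without an extension or with several dots is not covered here.

```python
from pathlib import PurePosixPath


def analytics_filename(original_path):
    """Build <base>_<extension>_analytics.json from an uploaded file path."""
    path = PurePosixPath(original_path)
    extension = path.suffix.lstrip(".")  # "abc" for "file_01.abc"
    return f"{path.stem}_{extension}_analytics.json"
```

For example, `analytics_filename("/home/user/files_to_upload/file_01.abc")` yields `file_01_abc_analytics.json`, matching the example in the text.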
The script will also generate an output CSV file download_result_%Y%m%d_%H%M%S.csv with the following values:
file_id
: File identifier that can be used in future requests to download or delete files.
download_status
: Status of the download. It can be DOWNLOAD_DONE or DOWNLOAD_ERROR.
original_absolute_file_path
: The absolute path of the uploaded file.
Example of the file download_result_20241026_003611.csv:
4C303FEB0B384EEB882FAF927D4F1961,DOWNLOAD_DONE,/home/user/files_to_upload/file_01.txt
3BDBA5EBA34A4A65817954E3559476BB,DOWNLOAD_DONE,/home/user/files_to_upload/file_02.pdf
F6FCC64ABAD64D52AC8A6864AE5F7C40,DOWNLOAD_ERROR,/home/user/files_to_upload/file_03.csv
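To check quickly which files failed, the result file can be inspected with a few lines of Python. This is a sketch that assumes the CSV has no header row and uses the column order shown in the example (file_id, download_status, original_absolute_file_path); the helper name `failed_file_ids` is hypothetical.

```python
import csv
import io


def failed_file_ids(csv_text):
    """Return the file IDs whose download_status is DOWNLOAD_ERROR."""
    reader = csv.reader(io.StringIO(csv_text))
    return [
        row[0]
        for row in reader
        if len(row) >= 2 and row[1] == "DOWNLOAD_ERROR"
    ]


# Sample rows copied from the example result file.
sample = """\
4C303FEB0B384EEB882FAF927D4F1961,DOWNLOAD_DONE,/home/user/files_to_upload/file_01.txt
3BDBA5EBA34A4A65817954E3559476BB,DOWNLOAD_DONE,/home/user/files_to_upload/file_02.pdf
F6FCC64ABAD64D52AC8A6864AE5F7C40,DOWNLOAD_ERROR,/home/user/files_to_upload/file_03.csv
"""

failed = failed_file_ids(sample)
```

To read a real result file, pass the output of `Path(...).read_text()` for that file instead of the sample string.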
If the file contains any DOWNLOAD_ERROR entries, you can run the script again, this time passing download_result_20241026_003611.csv in the parameter uploaded_file_ids_csv_filename. The script will then try to download all file IDs with the status DOWNLOAD_ERROR:
python3 batch_file_analytics_download.py \
workdir=/home/user/workdir_batch_01 \
output_dir=/home/user/workdir_batch_01/analytics_files \
uploaded_file_ids_csv_filename=download_result_20241026_003611.csv \
max_concurrency=50 \
max_download_timeout=100