File analytics download

The following script downloads the analytics of previously uploaded files using parallel threads. If you still need to upload your files, follow the how-to guide Batch files upload.

batch_file_analytics_download.py

Note

If your browser displays the Python source instead of downloading it, press Ctrl+S after the file opens to save it.

Script parameters

workdir: Absolute path to the work directory. For instance /home/user/workdir_batch_01
output_dir: Absolute path of the directory to download all analytic files. For instance: /home/user/workdir_batch_01/analytics_files
uploaded_file_ids_csv_filename: Filename of the previously generated CSV containing the IDs of the uploaded files. For instance: uploaded_file_ids_20241026_002611.csv
max_concurrency: The number of concurrent threads to use
max_download_timeout: Maximum time, in seconds, that the script waits for each file if it has not been processed yet.
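
To illustrate how max_concurrency and max_download_timeout might interact, here is a minimal sketch. It is not the script's actual implementation. It assumes a hypothetical fetch_once(file_id) callable that returns the analytics content, or None while the file is still being processed; the poll_interval parameter and both function names are also assumptions made for this example.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def wait_for_analytics(fetch_once, file_id, max_download_timeout, poll_interval=1.0):
    """Poll fetch_once(file_id) until it returns data or the timeout expires.

    fetch_once is a hypothetical callable: it returns the analytics content,
    or None while the file has not been processed yet.
    """
    deadline = time.monotonic() + max_download_timeout
    while True:
        result = fetch_once(file_id)
        if result is not None:
            return file_id, "DOWNLOAD_DONE", result
        if time.monotonic() >= deadline:
            return file_id, "DOWNLOAD_ERROR", None
        time.sleep(poll_interval)


def download_all(fetch_once, file_ids, max_concurrency, max_download_timeout,
                 poll_interval=1.0):
    """Process all file IDs using at most max_concurrency worker threads."""
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        futures = [
            pool.submit(wait_for_analytics, fetch_once, file_id,
                        max_download_timeout, poll_interval)
            for file_id in file_ids
        ]
        # Collect the results in submission order.
        return [future.result() for future in futures]
```

In this sketch a higher max_concurrency lets more files wait in parallel, while max_download_timeout bounds how long any single file can hold a worker thread.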

How to run the script

(If not yet done) Follow the Prerequisites instructions to set up the required environment
Ensure that the CSV file uploaded_file_ids_YYYYMMDD_HHMMSS.csv, containing the IDs of the previously uploaded files, is in the work directory /home/user/workdir_batch_01
Create a new directory to store all analytic files that you plan to download, for instance /home/user/workdir_batch_01/analytics_files
Finally, run the script:

python3 batch_file_analytics_download.py                                       \
    workdir=/home/user/workdir_batch_01                                        \
    output_dir=/home/user/workdir_batch_01/analytics_files                     \
    uploaded_file_ids_csv_filename=uploaded_file_ids_20241026_002611.csv       \
    max_concurrency=50                                                         \
    max_download_timeout=100
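
The command above passes each parameter as a key=value token. As a rough sketch of how such arguments could be read in Python (the script's real parsing code is not shown in this guide, and parse_key_value_args is a hypothetical helper name):

```python
def parse_key_value_args(argv):
    """Turn tokens such as 'max_concurrency=50' into a dict of strings."""
    params = {}
    for arg in argv:
        key, sep, value = arg.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {arg!r}")
        params[key] = value
    return params


# In a real script the tokens would come from sys.argv[1:].
example = parse_key_value_args([
    "workdir=/home/user/workdir_batch_01",
    "max_concurrency=50",
    "max_download_timeout=100",
])
# Values arrive as strings, so numeric parameters need an explicit conversion.
max_concurrency = int(example["max_concurrency"])
max_download_timeout = int(example["max_download_timeout"])
```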

The script will download and store the analytic files in the output_dir folder. The analytic files will have the following format:

<original_base_filename>_<original_file_extension>_analytics.json. For instance file_01_abc_analytics.json
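
The naming pattern can be reproduced with a few lines of Python, which may help when matching downloaded analytics back to the original files. This is a sketch based only on the pattern stated above; analytics_filename is a hypothetical helper, and how the script names files that have no extension is not documented here.

```python
from pathlib import Path


def analytics_filename(original_path):
    """Build <base>_<extension>_analytics.json from an original file path."""
    path = Path(original_path)
    extension = path.suffix.lstrip(".")  # ".txt" becomes "txt"
    return f"{path.stem}_{extension}_analytics.json"


name = analytics_filename("/home/user/files_to_upload/file_01.txt")
```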

The script will also generate an output CSV file download_result_%Y%m%d_%H%M%S.csv with the following values:

file_id: File identifier that we can use in future requests to download or delete files
download_status: Status of the download. It can be DOWNLOAD_DONE or DOWNLOAD_ERROR
original_absolute_file_path: The absolute path of the uploaded file

Example of the file download_result_20241026_003611.csv

4C303FEB0B384EEB882FAF927D4F1961,DOWNLOAD_DONE,/home/user/files_to_upload/file_01.txt
3BDBA5EBA34A4A65817954E3559476BB,DOWNLOAD_DONE,/home/user/files_to_upload/file_02.pdf
F6FCC64ABAD64D52AC8A6864AE5F7C40,DOWNLOAD_ERROR,/home/user/files_to_upload/file_03.csv
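
To inspect which downloads failed, the result file can be read with Python's csv module. The sketch below assumes the file has no header row and exactly the three columns listed above, as in the example; failed_rows is a hypothetical helper name.

```python
import csv


def failed_rows(lines):
    """Return (file_id, original_path) for rows with status DOWNLOAD_ERROR."""
    failed = []
    for row in csv.reader(lines):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        file_id, status, path = row[0], row[1], row[2]
        if status == "DOWNLOAD_ERROR":
            failed.append((file_id, path))
    return failed


sample = [
    "4C303FEB0B384EEB882FAF927D4F1961,DOWNLOAD_DONE,/home/user/files_to_upload/file_01.txt",
    "3BDBA5EBA34A4A65817954E3559476BB,DOWNLOAD_DONE,/home/user/files_to_upload/file_02.pdf",
    "F6FCC64ABAD64D52AC8A6864AE5F7C40,DOWNLOAD_ERROR,/home/user/files_to_upload/file_03.csv",
]
errors = failed_rows(sample)
```

To read a real result file, pass an open file object instead of the list, for example open(path, newline="").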

If the file contains any DOWNLOAD_ERROR entries, you can run the script again, this time passing download_result_20241026_003611.csv as the uploaded_file_ids_csv_filename parameter. The script will then try to download all file IDs with the status DOWNLOAD_ERROR.

python3 batch_file_analytics_download.py                                       \
    workdir=/home/user/workdir_batch_01                                        \
    output_dir=/home/user/workdir_batch_01/analytics_files                     \
    uploaded_file_ids_csv_filename=download_result_20241026_003611.csv         \
    max_concurrency=50                                                         \
    max_download_timeout=100