File analytics download

The following script downloads the analytics of previously uploaded files using parallel threads. If you still need to upload your files, follow the how-to guide Batch files upload.

Note

If your browser displays the Python source instead of downloading it, press Ctrl+S once the file opens to save it.

Script parameters

  • workdir: Absolute path to the work directory. For instance: /home/user/workdir_batch_01

  • output_dir: Absolute path to the directory where the analytics files are downloaded. For instance: /home/user/workdir_batch_01/analytics_files

  • uploaded_file_ids_csv_filename: Filename of the previously generated CSV containing the IDs of the uploaded files. For instance: uploaded_file_ids_20241026_002611.csv

  • max_concurrency: Number of concurrent threads to use.

  • max_download_timeout: Maximum time, in seconds, that the script waits for each file if it has not been processed yet.
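The script receives these parameters as key=value pairs on the command line. As an illustration only, here is a minimal sketch of how such arguments could be parsed and converted to the right types; the ScriptParams class and the parse_params helper are hypothetical and are not taken from batch_file_analytics_download.py.

```python
from dataclasses import dataclass, fields


@dataclass
class ScriptParams:
    # Hypothetical container mirroring the documented parameters.
    workdir: str
    output_dir: str
    uploaded_file_ids_csv_filename: str
    max_concurrency: int
    max_download_timeout: int


def parse_params(argv):
    """Parse key=value arguments into a ScriptParams instance (hypothetical helper)."""
    raw = {}
    for arg in argv:
        key, sep, value = arg.partition("=")
        if not sep:
            raise ValueError(f"Expected key=value, got {arg!r}")
        raw[key] = value

    # Report every missing parameter at once instead of failing on the first.
    missing = [f.name for f in fields(ScriptParams) if f.name not in raw]
    if missing:
        raise ValueError(f"Missing parameters: {', '.join(missing)}")

    return ScriptParams(
        workdir=raw["workdir"],
        output_dir=raw["output_dir"],
        uploaded_file_ids_csv_filename=raw["uploaded_file_ids_csv_filename"],
        max_concurrency=int(raw["max_concurrency"]),
        max_download_timeout=int(raw["max_download_timeout"]),
    )
```

In a script, you would call parse_params(sys.argv[1:]) after importing sys.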

How to run the script

  1. (If not done yet) Follow the Prerequisites instructions to set up the required environment.

  2. Ensure that the CSV file uploaded_file_ids_YYYYMMDD_HHMMSS.csv, which contains the IDs of the previously uploaded files, is in the work directory, for instance /home/user/workdir_batch_01.

  3. Create a new directory to store the analytics files you plan to download, for instance /home/user/workdir_batch_01/analytics_files.

  4. Run the script:

python3 batch_file_analytics_download.py                                       \
    workdir=/home/user/workdir_batch_01                                        \
    output_dir=/home/user/workdir_batch_01/analytics_files                     \
    uploaded_file_ids_csv_filename=uploaded_file_ids_20241026_002611.csv       \
    max_concurrency=50                                                         \
    max_download_timeout=100
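Steps 2 and 3 can also be checked from Python before launching the download. The following is a minimal sketch, assuming the absolute paths used in the example above; the preflight helper is hypothetical and is not part of the script.

```python
from pathlib import Path


def preflight(workdir, output_dir, uploaded_file_ids_csv_filename, create_output_dir=True):
    """Return a list of problems; create output_dir only if everything else is fine.

    Hypothetical helper that mirrors steps 2 and 3 of this guide.
    """
    problems = []

    # Both directory parameters are documented as absolute paths.
    for name, value in (("workdir", workdir), ("output_dir", output_dir)):
        if not Path(value).is_absolute():
            problems.append(f"{name} must be an absolute path: {value}")

    # Step 2: the input CSV must be inside the work directory.
    csv_path = Path(workdir) / uploaded_file_ids_csv_filename
    if not csv_path.is_file():
        problems.append(f"input CSV not found: {csv_path}")

    # Step 3: create the output directory (no error if it already exists).
    if not problems and create_output_dir:
        Path(output_dir).mkdir(parents=True, exist_ok=True)

    return problems
```

An empty list means the script can be run with the same parameters.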

The script downloads the analytics files and stores them in the output_dir folder. Each analytics file is named as follows:

  • <original_base_filename>_<original_file_extension>_analytics.json. For instance: file_01_abc_analytics.json
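The naming rule can be expressed as a small function. This sketch assumes the extension is the text after the last dot of the original filename (so file_01.abc becomes file_01_abc_analytics.json); how the real script handles filenames with several dots or with no extension is not documented here, so treat analytics_filename as a hypothetical helper.

```python
from pathlib import PurePath


def analytics_filename(original_path):
    """Build <original_base_filename>_<original_file_extension>_analytics.json.

    Assumption: the extension is the last suffix, without its leading dot.
    """
    p = PurePath(original_path)
    extension = p.suffix.lstrip(".")
    return f"{p.stem}_{extension}_analytics.json"
```

For example, analytics_filename("/home/user/files_to_upload/file_02.pdf") returns file_02_pdf_analytics.json.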

The script also generates an output CSV file, download_result_%Y%m%d_%H%M%S.csv, with the following columns:

  • file_id: File identifier that you can use in future requests to download or delete files.

  • download_status: Status of the download, either DOWNLOAD_DONE or DOWNLOAD_ERROR.

  • original_absolute_file_path: Absolute path of the uploaded file.
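To illustrate how max_concurrency and max_download_timeout relate to the download_status values, here is a minimal sketch built on Python's concurrent.futures.ThreadPoolExecutor. It is not the script's actual implementation: fetch stands in for a hypothetical function that requests and stores the analytics of one file, and the one-second polling interval is an assumption.

```python
import csv
import time
from concurrent.futures import ThreadPoolExecutor

DOWNLOAD_DONE = "DOWNLOAD_DONE"
DOWNLOAD_ERROR = "DOWNLOAD_ERROR"


def download_one(file_id, fetch, max_download_timeout, poll_interval=1.0):
    """Return DOWNLOAD_DONE or DOWNLOAD_ERROR for a single file ID.

    fetch is a hypothetical callable: it returns True once the analytics file
    has been downloaded and stored, False if the file is not processed yet,
    and raises on any other failure.
    """
    deadline = time.monotonic() + max_download_timeout
    while True:
        try:
            if fetch(file_id):
                return DOWNLOAD_DONE
        except Exception:
            # Any unexpected failure is reported as an error for this file only.
            return DOWNLOAD_ERROR
        # Not processed yet: give up once the per-file timeout has elapsed.
        if time.monotonic() >= deadline:
            return DOWNLOAD_ERROR
        time.sleep(poll_interval)


def download_all(rows, fetch, max_concurrency, max_download_timeout):
    """Process (file_id, original_absolute_file_path) pairs in parallel.

    Returns rows in the download_result layout:
    [file_id, download_status, original_absolute_file_path].
    """
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        # pool.map keeps the results in the same order as the input rows.
        statuses = list(
            pool.map(lambda row: download_one(row[0], fetch, max_download_timeout), rows)
        )
    return [[file_id, status, path] for (file_id, path), status in zip(rows, statuses)]


def write_result_csv(result_csv_path, result_rows):
    """Write the result rows without a header row."""
    with open(result_csv_path, "w", newline="") as fh:
        csv.writer(fh).writerows(result_rows)
```

Because each worker thread waits on its own file, a slow file delays only its thread, while the others continue with the remaining IDs.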

Example of the file download_result_20241026_003611.csv:

4C303FEB0B384EEB882FAF927D4F1961,DOWNLOAD_DONE,/home/user/files_to_upload/file_01.txt
3BDBA5EBA34A4A65817954E3559476BB,DOWNLOAD_DONE,/home/user/files_to_upload/file_02.pdf
F6FCC64ABAD64D52AC8A6864AE5F7C40,DOWNLOAD_ERROR,/home/user/files_to_upload/file_03.csv

If the file contains any DOWNLOAD_ERROR rows, you can run the script again, passing download_result_20241026_003611.csv as the uploaded_file_ids_csv_filename parameter. The script then tries to download all file IDs with the status DOWNLOAD_ERROR.
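Selecting the rows to retry amounts to filtering the result CSV on its second column. A minimal sketch, assuming the CSV has no header row (as in the example above); file_ids_to_retry is a hypothetical helper that you could use, for instance, to check whether a rerun is needed at all.

```python
import csv


def file_ids_to_retry(lines):
    """Return (file_id, original_absolute_file_path) pairs with status DOWNLOAD_ERROR.

    lines is any iterable of CSV lines, for example an open download_result_*.csv file.
    """
    retry = []
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        file_id, status, path = row
        if status == "DOWNLOAD_ERROR":
            retry.append((file_id, path))
    return retry
```

If the function returns an empty list for the latest download_result file, every analytics file has been downloaded.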

python3 batch_file_analytics_download.py                                       \
    workdir=/home/user/workdir_batch_01                                        \
    output_dir=/home/user/workdir_batch_01/analytics_files                     \
    uploaded_file_ids_csv_filename=download_result_20241026_003611.csv         \
    max_concurrency=50                                                         \
    max_download_timeout=100