Batch file upload

You can upload files to Bigdata.com to analyse them and search them for insights.

The following script uploads the files listed in a text file inside a work directory to Bigdata.com using parallel threads, which makes it well suited to batch uploads.
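An upload of this kind can be sketched as a bounded thread pool, assuming that max_concurrency caps the number of uploads running at the same time and that each finished upload yields one row of the CSV file described below. This is an illustration, not the script's code: `upload_file` and `upload_all` are hypothetical names, `upload_file` only fakes an ID instead of calling Bigdata.com, and the empty file_id on failed rows is an assumption.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor, as_completed


def upload_file(path: str) -> str:
    """Hypothetical stand-in for the real Bigdata.com upload request.

    It does not contact any server: it returns a random 32-character
    uppercase hex string as a fake file ID, and fails on an empty path
    so that the error branch can be exercised.
    """
    if not path:
        raise ValueError("empty path")
    return uuid.uuid4().hex.upper()


def upload_all(paths: list[str], max_concurrency: int) -> list[tuple[str, str, str]]:
    """Upload files with at most max_concurrency threads at a time.

    Returns (file_id, upload_status, original_absolute_file_path) rows
    in completion order, not input order.
    """
    rows = []
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        # Map each future back to the path it is uploading.
        futures = {pool.submit(upload_file, p): p for p in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                rows.append((future.result(), "UPLOAD_DONE", path))
            except Exception:
                # Assumption: failed uploads get an empty file_id.
                rows.append(("", "UPLOAD_ERROR", path))
    return rows
```

Because one failed file only produces an UPLOAD_ERROR row, the rest of the batch keeps going, which is what makes a per-file status column useful.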

Note

If your browser displays the Python source instead of downloading it, press Ctrl+S after the file opens to save it.

Script parameters

  • workdir: Absolute path to the work directory, for instance /home/user/workdir_batch_01

  • upload_txt_filename: Name of a text file listing the absolute paths of the files to upload, one per line. This file must be inside the work directory. For instance, file_list.txt containing:

/home/user/files_to_upload/file_01.txt
/home/user/files_to_upload/file_02.pdf
/home/user/files_to_upload/file_03.csv

  • max_concurrency: Number of concurrent threads used to upload files
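The parameters are passed on the command line as key=value pairs. A minimal sketch of how such arguments and the list file could be read, assuming one absolute path per line and ignoring blank lines; `parse_args` and `read_upload_list` are hypothetical helpers, not functions taken from batch_file_upload.py:

```python
from pathlib import Path


def parse_args(argv: list[str]) -> dict[str, str]:
    """Parse arguments like workdir=/home/user/workdir_batch_01 into a dict."""
    params = {}
    for arg in argv:
        # Split on the first "=" only, so values may themselves contain "=".
        key, sep, value = arg.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {arg!r}")
        params[key] = value
    return params


def read_upload_list(workdir: str, upload_txt_filename: str) -> list[str]:
    """Read the list file inside workdir and return its non-blank lines."""
    list_file = Path(workdir) / upload_txt_filename
    paths = []
    for line in list_file.read_text().splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        if not Path(line).is_absolute():
            raise ValueError(f"not an absolute path: {line!r}")
        paths.append(line)
    return paths
```

For the command in the next section, `parse_args(sys.argv[1:])` would return `{"workdir": "/home/user/workdir_batch_01", "upload_txt_filename": "file_list.txt", "max_concurrency": "5"}`; max_concurrency still has to be converted with `int()` before it is used as a thread count.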

How to run the script

  1. Follow the Prerequisites instructions to set up the required environment

  2. Put all the files you want to upload in a directory, for instance /home/user/files_to_upload

  3. Create the work directory, for instance /home/user/workdir_batch_01

  4. In the work directory, create a text file containing the absolute paths of all the files to upload, for instance file_list.txt:

/home/user/files_to_upload/file_01.txt
/home/user/files_to_upload/file_02.pdf
/home/user/files_to_upload/file_03.csv

  5. Finally, run the script:

python3 batch_file_upload.py                                 \
    workdir=/home/user/workdir_batch_01                      \
    upload_txt_filename=file_list.txt                        \
    max_concurrency=5
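Writing file_list.txt by hand gets tedious for large batches. Steps 2 to 4 can be automated with a short helper such as the sketch below, which assumes that every regular file directly inside the source directory should be uploaded (subdirectories are skipped); `write_upload_list` is a hypothetical name, not part of the script:

```python
from pathlib import Path


def write_upload_list(source_dir: str, workdir: str,
                      upload_txt_filename: str = "file_list.txt") -> Path:
    """Write the absolute path of every regular file directly inside
    source_dir to workdir/upload_txt_filename, one path per line."""
    source = Path(source_dir).resolve()  # make the listed paths absolute
    paths = sorted(str(p) for p in source.iterdir() if p.is_file())
    workdir_path = Path(workdir)
    workdir_path.mkdir(parents=True, exist_ok=True)  # create the work directory if needed
    list_file = workdir_path / upload_txt_filename
    list_file.write_text("".join(f"{p}\n" for p in paths))
    return list_file


# Example with the directories used above:
# write_upload_list("/home/user/files_to_upload", "/home/user/workdir_batch_01")
```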

The script generates two files:

  • Logging file: Contains details about the upload process, for instance bigdata_processing_20241026_002610.log

  • CSV file with IDs: Lists the IDs of the uploaded files so that you can manage them later (delete, download, etc.). Each row contains the following values:
    • file_id: File identifier to use in future requests to download or delete the uploaded file

    • upload_status: Status of the upload, either UPLOAD_DONE or UPLOAD_ERROR

    • original_absolute_file_path: Absolute path of the uploaded file

Example of the CSV file uploaded_file_ids_20241026_002611.csv:

4C303FEB0B384EEB882FAF927D4F1961,UPLOAD_DONE,/home/user/files_to_upload/file_01.txt
3BDBA5EBA34A4A65817954E3559476BB,UPLOAD_DONE,/home/user/files_to_upload/file_02.pdf
F6FCC64ABAD64D52AC8A6864AE5F7C40,UPLOAD_DONE,/home/user/files_to_upload/file_03.csv
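After a batch finishes, it can help to split the CSV into successful IDs (to keep for later delete or download requests) and failed paths (to retry). A minimal sketch, assuming the column order listed above and no header row, as in the example; `split_by_status` is a hypothetical helper:

```python
import csv
import io

# Column order as listed above; the example file has no header row.
FIELDS = ["file_id", "upload_status", "original_absolute_file_path"]


def split_by_status(csv_text: str) -> tuple[list[str], list[str]]:
    """Return (file IDs of successful uploads, paths of failed uploads)."""
    done_ids, failed_paths = [], []
    # Passing fieldnames stops DictReader from treating row 1 as a header.
    reader = csv.DictReader(io.StringIO(csv_text), fieldnames=FIELDS)
    for row in reader:
        if row["file_id"] == "file_id":
            continue  # skip a header row, in case the file has one
        if row["upload_status"] == "UPLOAD_DONE":
            done_ids.append(row["file_id"])
        elif row["upload_status"] == "UPLOAD_ERROR":
            failed_paths.append(row["original_absolute_file_path"])
    return done_ids, failed_paths


# Example:
# done, failed = split_by_status(open("uploaded_file_ids_20241026_002611.csv").read())
```

Writing the failed paths to a new list file in a fresh work directory and running the script again is one way to retry only the files that failed.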