Transcription API¶
To use transcription you must be an authorized user. Please refer to the API introduction for more information.
We have two transcription providers: AWS and Google. Both are adapted to expose APIs that are as similar as possible, so once you can handle one of them, handling the other is easy.
The transcription flow can be divided into the following steps:
1. Upload the file by pre-signed URL
2. Start the transcription job
3. Get the transcription status
4. When the status is completed, save the transcription to the profile
Let’s start from the beginning!
Upload file by pre-signed URL (AWS)¶
In order to support large files (up to 200 MB for now), we avoid uploading the whole file to the server, or even loading it fully in the browser. To do that we use a multipart upload via pre-signed URLs. It sounds complicated, but it is not. Let's see how it works.
First of all we have to initialize the uploading process. Please call https://rythmex.com/api/schema/public/swagger-ui/#/Transcription/v2_transcription_aws_create_multipart_upload_create for that. As a result, you will get an upload id. This id is required to determine which object the uploaded parts belong to. Then we split the file, generate a pre-signed URL (https://rythmex.com/api/schema/public/swagger-ui/#/Transcription/v2_transcription_aws_generate_upload_url_create) for each part, and upload each part to AWS S3. For each part we get back an etag, which is a hash of the uploaded part. We need the etags to complete the uploading process (https://rythmex.com/api/schema/public/swagger-ui/#/Transcription/v2_transcription_aws_complete_multipart_upload_create). The implementation of this process depends on your platform, but a general example for JavaScript is provided below.
// Get file metadata
export async function getAudioData(file) {
  return new Promise((resolve, reject) => {
    const audio = new Audio();
    const objectUrl = URL.createObjectURL(file);
    audio.onloadedmetadata = () => {
      URL.revokeObjectURL(objectUrl);
      resolve({
        duration: audio.duration,
        bitrate: Math.floor((file.size * 0.008) / audio.duration)
      });
    };
    audio.onerror = () => {
      URL.revokeObjectURL(objectUrl);
      reject(new Error('Failed to load file'));
    };
    audio.src = objectUrl;
  });
}

// Upload file to S3 by chunks
export async function uploadFileToS3ByChunks(file, fileMeta, chunkSize = 5 * 1024 * 1024) {
  const numChunks = Math.ceil(file.size / chunkSize);
  const fileName = file.name;
  const uploadIdResponse = await this.$axios.$post('/aws/create-multipart-upload/', {
    file_name: fileName,
    file_duration: fileMeta.duration
  });
  const uploadId = uploadIdResponse.upload_id;
  const parts = [];
  for (let i = 0; i < numChunks; i++) {
    const start = i * chunkSize;
    const end = Math.min(start + chunkSize, file.size);
    const chunk = file.slice(start, end);
    const partNumber = i + 1;
    const presignedUrlResponse = await this.$axios.$post('/aws/generate-upload-url/', {
      file_name: fileName,
      part_number: partNumber,
      upload_id: uploadId,
    });
    const presignedUrl = presignedUrlResponse.url;
    const response = await this.$axios.put(presignedUrl, chunk, {
      headers: {'Content-Type': 'binary/octet-stream'},
      transformRequest: (data, headers) => {
        // Remove Authorization header to avoid conflict on AWS
        delete headers.common['Authorization'];
        delete headers['Authorization'];
        return data;
      },
      onUploadProgress: (progressEvent) => {
        console.log(`Chunk ${partNumber}: ${(progressEvent.loaded / progressEvent.total) * 100}%`);
      },
    });
    const {etag} = response.headers;
    parts.push({
      part_number: partNumber,
      etag: etag,
    });
  }
  parts.sort((a, b) => a.part_number - b.part_number);
  const completeUploadResponse = await this.$axios.$post('/aws/complete-multipart-upload/', {
    file_name: fileName,
    upload_id: uploadId,
    parts,
  });
  return {
    location: completeUploadResponse.location,
    etag: completeUploadResponse.etag,
    checksum: completeUploadResponse.checksum,
  };
}
We are done with uploading. Now we can start the transcription job.
Start transcription job (AWS)¶
To start a transcription job, please call the https://rythmex.com/api/schema/public/swagger-ui/#/Transcription/v2_transcription_aws_start_transcription_job_create method. As a result you will get the transcription job name.
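The end-to-end example at the bottom of this page calls a startTranscriptionJob helper. Below is a minimal sketch of what such a helper could look like; the /aws/start-transcription-job/ path, the file_name / language_code request fields and the job_name response field are assumptions made by analogy with the other AWS endpoints on this page, so verify them against the Swagger schema.
// Minimal sketch: start an AWS transcription job for an already uploaded file.
// NOTE: the '/aws/start-transcription-job/' path, the file_name / language_code
// fields and the job_name response field are assumptions; check the Swagger schema.
export async function startTranscriptionJob(fileName, languageCode) {
  const response = await this.$axios.$post('/aws/start-transcription-job/', {
    file_name: fileName,
    language_code: languageCode,
  });
  // The job name returned here is used for polling the status below.
  return response.job_name;
}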
With this job name you can get the transcription status (https://rythmex.com/api/schema/public/swagger-ui/#/Transcription/v2_transcription_aws_get_transcription_job_retrieve). When the status is completed, you can save the results to the profile (https://rythmex.com/api/schema/public/swagger-ui/#/Transcription/v2_transcription_aws_save_results_create).
Please call https://rythmex.com/api/schema/public/swagger-ui/#/Transcription/v2_transcription_aws_get_transcription_job_retrieve every 2-10 seconds while the status is IN_PROGRESS. Polling like this keeps the event loop on the client side and keeps the setup simple by avoiding websockets. Below is an example of how it can be done in JavaScript.
export async function getTranscriptionJob(job_name) {
  let status = 'in_progress';
  while (true) {
    ({status} = await this.$axios.$get(`aws/get-transcription-job/${job_name}/`));
    if (status !== 'in_progress') {
      break;
    }
    await new Promise(r => setTimeout(r, 5000));
  }
  return status;
}
At this point we have a completed job object. Let's save it to the profile.
Save transcription (AWS)¶
In order to save the transcription to the profile, please call the https://rythmex.com/api/schema/public/swagger-ui/#/Transcription/v2_transcription_aws_save_results_create method. It finalizes the whole process and makes the transcription readable and editable for the user. Here is an example of how it can be done in JavaScript.
export async function saveAWSTranscriptionResults(jobName, file, fileMeta = null) {
  fileMeta = fileMeta || await getAudioData(file);
  return await this.$axios.$post(`${API_V2_PREFIX}transcription/aws/save-results/`, {
    job_name: jobName,
    bitrate: fileMeta.bitrate,
    duration: fileMeta.duration
  });
}
Congratulations! You have completed the transcription process. Now you can get the transcription results from the profile.
Here is an example of the whole process in JavaScript.
const fileMeta = await getAudioData(file);
await uploadFileToS3ByChunks(file, fileMeta);
const transcriptionJobName = await startTranscriptionJob(file.name, selectedCountryCode);
const status = await getTranscriptionJob(transcriptionJobName);
if (status === 'completed') {
  await saveAWSTranscriptionResults(transcriptionJobName, file, fileMeta);
}
Google Speech to Text¶
For the Google Speech to Text service we have to do almost the same steps (a hedged sketch of the flow follows the list):
1. Upload the file by pre-signed URL (https://rythmex.com/api/schema/public/swagger-ui/#/Transcription/v2_transcription_gc_generate_resumable_session_create) (please read the description in the endpoint)
2. Start the transcription job (https://rythmex.com/api/schema/public/swagger-ui/#/Transcription/v2_transcription_gc_start_transcription_job_create)
3. Get the transcription status (https://rythmex.com/api/schema/public/swagger-ui/#/Transcription/v2_transcription_gc_get_transcription_job_retrieve)
4. When the status is completed, save the transcription to the profile (https://rythmex.com/api/schema/public/swagger-ui/#/Transcription/v2_transcription_gc_save_results_create)
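The sketch below mirrors the AWS helpers above. The concrete REST paths (/gc/generate-resumable-session/ and the other /gc/... paths), the session_url and job_name response fields, and the single-PUT upload to the resumable session URL are assumptions made by analogy with the AWS flow, not the documented contract; please check the endpoint descriptions in Swagger before relying on them.
// Hedged sketch of the Google Speech to Text flow; paths and field names are assumptions.
export async function transcribeWithGoogle(file, fileMeta, languageCode) {
  // 1. Ask the backend for a resumable upload session (assumed path and response field).
  const session = await this.$axios.$post('/gc/generate-resumable-session/', {
    file_name: file.name,
    file_duration: fileMeta.duration,
  });
  // 2. Upload the whole file to the resumable session URL (single PUT, assuming the
  //    file fits in one request; see the endpoint description for the exact protocol).
  await this.$axios.put(session.session_url, file, {
    headers: {'Content-Type': 'binary/octet-stream'},
    transformRequest: (data, headers) => {
      // Remove Authorization header so it is not forwarded to Google storage.
      delete headers.common['Authorization'];
      delete headers['Authorization'];
      return data;
    },
  });
  // 3. Start the transcription job (assumed path and body).
  const job = await this.$axios.$post('/gc/start-transcription-job/', {
    file_name: file.name,
    language_code: languageCode,
  });
  // 4. Poll the status every few seconds, like in the AWS example.
  let status = 'in_progress';
  while (status === 'in_progress') {
    ({status} = await this.$axios.$get(`gc/get-transcription-job/${job.job_name}/`));
    if (status === 'in_progress') {
      await new Promise(r => setTimeout(r, 5000));
    }
  }
  // 5. Save the results to the profile (assumed path and body).
  if (status === 'completed') {
    await this.$axios.$post('/gc/save-results/', {
      job_name: job.job_name,
      bitrate: fileMeta.bitrate,
      duration: fileMeta.duration,
    });
  }
  return status;
}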
Next steps¶
All remaining steps are very straightforward. We only need a few endpoints (a short usage sketch follows the list).
1. Get the transcription history list (https://rythmex.com/api/schema/public/swagger-ui/#/Transcription/transcription_get_my_transcript_history_list)
2. Get a specific transcription (https://rythmex.com/api/schema/public/swagger-ui/#/Transcription/transcription_get_transcript_retrieve)
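Below is a minimal sketch of reading transcriptions back. The REST paths transcription/get-my-transcript-history/ and transcription/get-transcript/<id>/ are guesses derived from the Swagger operation names, so verify the real paths and response shapes in the schema.
// Hedged sketch: read transcriptions back from the profile (paths are assumptions).
export async function getMyTranscriptHistory() {
  // List all transcriptions saved to the current user's profile.
  return await this.$axios.$get('transcription/get-my-transcript-history/');
}

export async function getTranscript(transcriptId) {
  // Fetch a single transcription by the id taken from the history list.
  return await this.$axios.$get(`transcription/get-transcript/${transcriptId}/`);
}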
That’s it. Good luck!