This API endpoint creates a new document information extraction template for a specific document type. Templates structure the extraction of key data fields from documents so that the required information is captured accurately. When creating a template, pay particular attention to the prompt parameter, which instructs the system on how to extract data from the provided documents.
By creating custom templates with detailed prompts, you can ensure that the API accurately extracts the required data from documents, even when dealing with multilingual, distorted, or culturally specific text.
Note: It's essential to tailor the prompt to the unique characteristics of the documents you're processing to achieve the best results.
Take a look at this prompt example:
{
"prompt": "From the provided JSON input , extract the fields {{fields}}. The text can be in multiple languages and may contain minor distortions due to OCR. In Colombian culture, individuals may have two last names, both of which must be included. People also normally have 2 first names, like Juan Miguel or Jose Manuel, so when you see a 4 words name, extract it that way. For the driver restrictions, its usually like driver needs glasses or another detail about his/her negative observations Please avoid translation, simply extract the information as provided. The output must be in a {{format}}. Only consider fields with a confidence level of 0.5 or higher, disregard fields that do not have a confidence score."
}
In essence, the prompt is a series of instructions and considerations about the document. It should define the document's most important aspects; the body details provide more information on how to write the prompt correctly.
The prompt must include the {{fields}} and {{format}} placeholders, which tell the AI which data to extract and in which format to return it.
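Because both placeholders are required, it can help to check a prompt locally before sending it. The following is a minimal sketch, not part of the API: `missingPlaceholders` and `previewPrompt` are hypothetical helper names, and the preview only approximates the substitution, which is assumed to happen on the server and may format the values differently.

```javascript
// Placeholders that every prompt is expected to contain.
const REQUIRED_PLACEHOLDERS = ['{{fields}}', '{{format}}'];

// Returns the required placeholders that are absent from the prompt.
// An empty array means the prompt contains both.
function missingPlaceholders(prompt) {
  return REQUIRED_PLACEHOLDERS.filter((placeholder) => !prompt.includes(placeholder));
}

// Hypothetical local preview of the substitution.
// ASSUMPTION: the real rendering is done server-side and may differ.
function previewPrompt(prompt, fields, format) {
  return prompt
    .split('{{fields}}').join(fields.join(', '))
    .split('{{format}}').join(format);
}

const prompt =
  'From the provided JSON input, extract the fields {{fields}}. The output must be in a {{format}}.';

// Both placeholders are present, so nothing is reported as missing.
console.log(missingPlaceholders(prompt));

// This prompt lacks {{format}}, so that placeholder is reported.
console.log(missingPlaceholders('Extract the fields {{fields}}.'));

// Shows roughly what the instructions might look like after substitution.
console.log(previewPrompt(prompt, ['firstName', 'lastName'], 'json'));
```

Running a check like this before the request catches a missing placeholder early, instead of discovering it through poor extraction results.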
import axios from 'axios';
const options = {
method: 'POST',
url: 'https://api.verifik.co/v2/ocr/scan-prompt/template',
params: {
},
headers: {
Accept: 'application/json',
Authorization: 'jwt <your_token>'
},
data: {
code: 'unique_identifier_2',
name: 'Example template name',
fields: ['firstName', 'lastName', 'fullName', 'documentNumber'],
format: 'json',
prompt: 'From the provided JSON input , extract the fields {{fields}}. The text can be in multiple languages and may contain minor distortions due to OCR. In Colombian culture, individuals may have two last names, both of which must be included. People also normally have 2 first names, like Juan Miguel or Jose Manuel, so when you see a 4 words name, extract it that way. For the driver restrictions, its usually like driver needs glasses or another detail about his/her negative observations Please avoid translation, simply extract the information as provided. The output must be in a {{format}}. Only consider fields with a confidence level of 0.5 or higher, disregard fields that do not have a confidence score.',
description: 'Example Prompt description',
documentTypes: ['CCCR']
}
};
try {
const { data } = await axios.request(options);
console.log(data);
} catch (error) {
console.error(error);
}
{
"data": {
"__v": 0,
"_id": "653bdcc2ff7de0cee3b0760a",
"code": "unique_identifier",
"name": "Colombian ID card scan",
"active": true,
"client": "623b6317fe5fd1774be9f566",
"fields": [
"firstName",
"lastName",
"fullName",
"documentNumber"
],
"format": "json",
"prompt": "From the provided JSON input , extract the fields {{fields}}. The text can be in multiple languages and may contain minor distortions due to OCR. In Colombian culture, individuals may have two last names, both of which must be included. People also normally have 2 first names, like Juan Miguel or Jose Manuel, so when you see a 4 words name, extract it that way. For the driver restrictions, its usually like driver needs glasses or another detail about his/her negative observations Please avoid translation, simply extract the information as provided. The output must be in a {{format}}. Only consider fields with a confidence level of 0.5 or higher, disregard fields that do not have a confidence score.",
"system": false,
"deleted": false,
"createdAt": "2023-10-27T15:52:34.058Z",
"updatedAt": "2023-10-27T15:52:34.058Z",
"description": "Prompt for extracting Colombian ID cards",
"documentTypes": [
"CC"
]
}
}
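Note that the created template sits under the response body's own `data` key, so with axios it is reached as `response.data.data`. As a rough sketch of reading the response, the hypothetical helper below (`summarizeTemplate` is not part of the API) pulls out the values a client would most likely keep, such as the `_id`. It assumes the body has the shape of the sample response above.

```javascript
// `body` is the parsed JSON response body (with axios: `response.data`).
// ASSUMPTION: the template is nested under the body's `data` key,
// as in the sample response.
function summarizeTemplate(body) {
  const template = body && body.data;
  if (!template || !template._id) {
    throw new Error('Unexpected response shape: missing data._id');
  }
  return {
    id: template._id,
    code: template.code,
    active: template.active === true,
    fields: Array.isArray(template.fields) ? template.fields : [],
  };
}

// Trimmed copy of the sample response.
const sampleBody = {
  data: {
    _id: '653bdcc2ff7de0cee3b0760a',
    code: 'unique_identifier',
    active: true,
    fields: ['firstName', 'lastName', 'fullName', 'documentNumber'],
  },
};

const summary = summarizeTemplate(sampleBody);
console.log(summary.id, summary.code, summary.fields.length);
```

Failing loudly on an unexpected shape is a deliberate choice here: it surfaces error payloads immediately instead of letting an undefined `_id` propagate into later calls.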
This API endpoint allows you to update a specific OCR template. OCR templates are used to extract structured data from documents. When you update a template, you can modify its settings, such as the fields to extract and the document types it applies to.