User's guide

Read barcodes

scancode

POST https://tesseractor.com/api/v1/scancode?login=&password=

login	Your identification code.
password	Your password.
multipart/form-data
file	Binary content of the PDF, JPG, PNG or GIF file.
yolo	Detect barcodes (experimental).
firstpage	First page to process in a PDF.
lastpage	Last page to process in a PDF.
resolution	Resolution in dpi of the image generated for each page of a PDF.
images	Directly extract just the images in a PDF.
rotate	Rotate images.
crop	Crop images.
reframe	Reframe images on a background.
unborder	Remove border lines.
resize	Resize images.
normalize	Add contrast to the colors.
colorspace	Convert to grayscale.
unsharp	Sharpen the contours.
dots	Remove white dots.

yolo : 1 - every image is cropped around barcodes using the object detection system. IMPORTANT: For now, this only works on QR codes.

Specify the extraction mode of each page of a PDF:

resolution : resolution in dpi of the image generated for each page - 50, 75, 100, 125, 150 or 200. IMPORTANT: If a page contains only one image and no text, the image is always extracted directly from the document.
images : 1 - directly extract only the images.
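As an illustration, a program can check a resolution against the values listed above before sending a request. This is a minimal sketch: the function resolution_field is hypothetical and is not part of the service or of the iZend library.

```php
<?php
// Hypothetical helper, not part of the service API.
// Returns the "resolution" form field for a PDF, or throws an exception
// if the value is not one of the resolutions listed in this guide.
function resolution_field(int $dpi): array {
    $allowed = array(50, 75, 100, 125, 150, 200);
    if (!in_array($dpi, $allowed, true)) {
        throw new InvalidArgumentException('resolution must be one of ' . implode(', ', $allowed));
    }
    return array('resolution' => (string) $dpi);
}

print_r(resolution_field(125));
```

The returned array can be merged with the other fields of the POST.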

Activate the processing options of each image before analysis:

rotate : 180 to flip the image, 90 or -90 to rotate it to the right or to the left,
crop : cut the image to the size specified by a width and a height separated by an x, from a position specified by x and y coordinates each preceded by a +, e.g. 640x200+50+80,
reframe : reframe the image on a background with a blur level between 1 and 20, e.g. 5,
unborder : remove the border lines, given the maximum width and height of a text, separated by an x, each between 10 and 1000 pixels, e.g. 30x30,
resize : resize the image by 50, 75, 125, 150 or 200 %,
normalize : 1 - add contrast to the colors,
colorspace : 1 - convert the image to grayscale,
unsharp : 1 - sharpen the contours,
dots : 1 - remove white dots.

IMPORTANT: Image processing options are run in the above order.
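The crop format and the processing order can be sketched in PHP as follows. Both functions are hypothetical helpers written for this example. They only prepare the form fields; the order in which the service applies the options is the one stated above, whatever the order of the fields in the request.

```php
<?php
// Hypothetical helper: formats a crop rectangle as WIDTHxHEIGHT+X+Y,
// the form shown in this guide (e.g. 640x200+50+80).
function crop_geometry(int $width, int $height, int $x, int $y): string {
    return sprintf('%dx%d+%d+%d', $width, $height, $x, $y);
}

// Hypothetical helper: keeps only the documented option names and lists
// them in the order in which the guide says they are run.
function ordered_options(array $options): array {
    $order = array('rotate', 'crop', 'reframe', 'unborder', 'resize',
                   'normalize', 'colorspace', 'unsharp', 'dots');
    $sorted = array();
    foreach ($order as $name) {
        if (array_key_exists($name, $options)) {
            $sorted[$name] = (string) $options[$name];
        }
    }
    return $sorted;
}

$options = ordered_options(array(
    'unsharp' => 1,
    'crop'    => crop_geometry(640, 200, 50, 80),
    'rotate'  => 180,
));
print_r($options);
```

Listing the options in processing order makes it easier to predict the result: here the image is flipped, then cropped, then sharpened.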

To understand the effects of these parameters, test them in the interface of your personal space.

barcode39.jpg

ean128.gif

qr.png

barcodes.pdf

$ curl -s --fail --show-error -X POST "https://tesseractor.com/api/v1/scancode?login=abcdef&password=ABCDEF" -F "file=@barcode39.jpg" -o -
WIKIPEDIA
$ curl -s --fail --show-error -X POST "https://tesseractor.com/api/v1/scancode?login=abcdef&password=ABCDEF" -F "file=@ean128.gif" -o -
010123456789012815057072
$ curl -s --fail --show-error -X POST "https://tesseractor.com/api/v1/scancode?login=abcdef&password=ABCDEF" -F "file=@qr.png" -o -
https://www.wikipedia.org

With the PDF which contains the 3 images, one per page:

$ curl -s --fail --show-error -X POST "https://tesseractor.com/api/v1/scancode?login=abcdef&password=ABCDEF" -F "file=@barcodes.pdf" -o -
WIKIPEDIA
010123456789012815057072
https://www.wikipedia.org
$ curl -s --fail --show-error -X POST "https://tesseractor.com/api/v1/scancode?login=abcdef&password=ABCDEF" -F "file=@barcodes.pdf" -F "firstpage=3" -o -
https://www.wikipedia.org

3qr.pdf

$ curl -s --fail --show-error -X POST "https://tesseractor.com/api/v1/scancode?login=abcdef&password=ABCDEF" -F "images=1" -F "file=@3qr.pdf" -o -
210934
NEWDOC
https://www.tesseractor.com
https://www.tesseractor.com
NEWDOC
210934
210934
NEWDOC
https://www.tesseractor.com

The PDF has 3 pages:
1 • 2 images with 1 QR https://www.tesseractor.com + the 2 QR NEWDOC and 210934,
2 • 3 images with 1 QR https://www.tesseractor.com + 1 QR NEWDOC + 1 QR 210934,
3 • 1 single image with the 3 QR.

The option images=1 directly extracts the images in their original sizes, without the text. Try with the option resolution=125.
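To check that each of the 3 QR codes was read once per page, the lines returned by the request on 3qr.pdf can be tallied. The sketch below works on a copy of the output shown above and does not call the service.

```php
<?php
// Output of the request on 3qr.pdf with images=1, copied from this guide.
$output = <<<TXT
210934
NEWDOC
https://www.tesseractor.com
https://www.tesseractor.com
NEWDOC
210934
210934
NEWDOC
https://www.tesseractor.com
TXT;

// One decoded value per line; blank lines are ignored.
$values = array_filter(array_map('trim', explode("\n", $output)), 'strlen');

// Number of times each value was decoded.
$counts = array_count_values($values);
foreach ($counts as $value => $n) {
    echo $value, ' x', $n, PHP_EOL;
}
```

Each value appears 3 times, once for each page of the PDF.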

YOLO (You Only Look Once)

dqr.jpg

$ curl -s --fail --show-error -X POST "https://tesseractor.com/api/v1/scancode?login=abcdef&password=ABCDEF" -F "file=@dqr.jpg" -o -
210968

Only 1 QR is read by ZBar.

$ curl -s --fail --show-error -X POST "https://tesseractor.com/api/v1/scancode?login=abcdef&password=ABCDEF" -F "yolo=1" -F "file=@dqr.jpg" -o -
210968
NEWDOC

Using YOLO, the image is analyzed and cropped into 2 distinct images which ZBar can easily decode.

Batch processing

Download the program scancode2csv.php:

scancode2csv.php

IMPORTANT: Edit the constants LOGIN and PASSWORD in the program before trying it and select in the array $passes the different operations carried out on each file.

Copy all the PDF, JPG, PNG or GIF files to process into a folder and type the following command:

$ php -f scancode2csv.php dir file.csv

Open the file file.csv in your spreadsheet to display the results.

define('USAGE', 'php -f %s dir file.csv');

// Prints a message and stops the program.
function abort($msg, $code=1) {
    echo $msg, PHP_EOL;
    exit($code);
}

// Prints the syntax of the command and stops the program.
function usage() {
    global $argv;
    abort(sprintf(USAGE, basename($argv[0])), 1);
}

if ($argc != 3) {
    usage();
}

$dir=$argv[1];  // folder containing the files to process
$csv=$argv[2];  // CSV file to write

define('LOGIN', 'abcdef');
define('PASSWORD', 'ABCDEF');

define('DELIMITER', ',');
define('ENCLOSURE', '"');
define('ESCAPE', '\\');

define('URL', 'https://tesseractor.com/api/v1/scancode' . '?' . 'login=' . urlencode(LOGIN) . '&' . 'password=' . urlencode(PASSWORD));
// URL is inserted in a sprintf format: every % added by urlencode must be doubled.
define('SCANCODE', 'curl -s --fail --show-error -X POST "' . str_replace('%', '%%', URL) . '" -F "file=@%s" %s -o -');

// Operations carried out on each file: label of the column => options passed to curl.
$passes=array(
    'YOLO_0_IMAGES_1' => '-F "yolo=0" -F "images=1"',
//  'YOLO_1_IMAGES_1' => '-F "yolo=1" -F "images=1"',
//  'YOLO_0_IMAGES_1_RESIZE_150' => '-F "yolo=0" -F "images=1" -F "resize=150"',
//  'YOLO_0_DPI_125' => '-F "yolo=0" -F "resolution=125"',
);

$filelist=@scandir($dir, SCANDIR_SORT_NONE);
if (!$filelist) {
    abort($dir . '?');
}
sort($filelist, SORT_NATURAL);

$csvout = @fopen($csv, 'w');
if ($csvout === false) {
    abort($csv . '?');
}

// Header line: an empty cell for the file name, then 3 columns for each pass.
$headline=array(false);
foreach ($passes as $label => $arg) {
    $headline[]=$label;
    $headline[]=''; // count
    $headline[]=''; // secs
}
if (fputcsv($csvout, $headline, DELIMITER, ENCLOSURE, ESCAPE) === false) {
    abort($csv . '?');
}

foreach ($filelist as $file) {
    if ($file == '.' || $file == '..') {
        continue;
    }

    echo $file, PHP_EOL;

    $line=array($file);
    foreach ($passes as $label => $arg) {
        $cmdline=sprintf(SCANCODE, $dir . DIRECTORY_SEPARATOR . $file, $arg);

        // Runs curl and measures the time taken by the request.
        $output=false;
        $stime=microtime(true);
        @exec($cmdline, $output, $ret);
        $etime=microtime(true);

        // Decoded values, number of values, elapsed time in seconds.
        $line[]=$ret == 0 && $output ? implode("\n", $output) : false;
        $line[]=$ret == 0 && $output ? count($output) : 0;
        $line[]=$ret == 0 ? sprintf('%0.2f', round($etime-$stime, 2)) : false;
    }
    if (fputcsv($csvout, $line, DELIMITER, ENCLOSURE, ESCAPE) === false) {
        abort($csv . '?');
    }
}

exit(0);
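The CSV file can also be read back by a program instead of a spreadsheet. The following sketch assumes the layout written by scancode2csv.php: a header line with the labels of the passes, then for each file its name followed by 3 columns per pass (decoded values, number of values, time in seconds). The function read_scancode_csv is a hypothetical helper, not part of the downloaded program.

```php
<?php
// Hypothetical reader for a file written by scancode2csv.php.
// Returns an array indexed by file name, then by pass label.
function read_scancode_csv(string $path): array {
    if (!is_readable($path)) {
        return array();
    }
    $in = fopen($path, 'r');
    if ($in === false) {
        return array();
    }
    // The first line holds the labels of the passes.
    $header = fgetcsv($in, 0, ',', '"', '\\');
    if ($header === false) {
        fclose($in);
        return array();
    }
    $rows = array();
    while (($fields = fgetcsv($in, 0, ',', '"', '\\')) !== false) {
        if ($fields === array(null)) {
            continue;   // blank line
        }
        $file = array_shift($fields);
        // Each pass occupies 3 consecutive columns.
        foreach (array_chunk($fields, 3) as $i => $pass) {
            $label = $header[1 + 3 * $i] ?? 'pass' . $i;
            $text = $pass[0] ?? '';
            $rows[$file][$label] = array(
                'codes' => $text === '' ? array() : explode("\n", $text),
                'count' => (int) ($pass[1] ?? 0),
                'secs'  => $pass[2] ?? '',
            );
        }
    }
    fclose($in);
    return $rows;
}

// Prints the number of values decoded in each file for each pass.
foreach (read_scancode_csv('file.csv') as $file => $passes) {
    foreach ($passes as $label => $result) {
        echo $file, ' ', $label, ' ', $result['count'], PHP_EOL;
    }
}
```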

Download the code of the sendpost and file_mime_type functions from the iZend library. Copy the files into the folder of your application.

sendhttp.php

filemimetype.php

NOTE: See the page Call the service API for a description of the sendpost and file_mime_type functions.

Add the file scancode.php with the following content:

scancode.php

require_once 'sendhttp.php';
require_once 'filemimetype.php';

Loads the code of the sendpost and file_mime_type functions.

function scancode($login, $password, $file, $yolo=false, $output='file.txt', $params=false) {

Defines the function scancode. $login is your identification code. $password is your password. $file is the pathname of the PDF, JPEG, PNG or GIF file to analyze. If $yolo is true, every image is cropped around barcodes using the object detection system. $output is the pathname of the text file which will contain the result of the analysis of $file. $params is an associative array containing the names and the values of the parameters specifying the extraction mode of each page of a PDF and the processing options of each image before analysis, e.g. array('images' => true).

$curl = 'https://tesseractor.com/api/v1/scancode' . '?' . 'login=' . urlencode($login) . '&' . 'password=' . urlencode($password);

Sets $curl to the URL of the scancode action with the identification code and the password of the user's account. $login and $password are escaped with urlencode.

$args = array(
    'yolo' => $yolo ? '1' : '0',
);
// $params is false by default: merge it only if it is an array.
if (is_array($params)) {
    $args = array_merge($args, $params);
}

Prepares the list of arguments of the POST.

$files=array('file' => array('name' => basename($file), 'tmp_name' => $file, 'type' => file_mime_type($file)));

Prepares the list of files attached to the POST: file - the PDF, JPEG, PNG or GIF to analyze with the name of the file, the pathname of the file and its MIME type.

$response=sendpost($curl, $args, $files);

Sends the HTTP request with sendpost. The arguments login and password are already in $curl.

if (!$response or $response[0] != 200) {
    return false;
}

If $response is false, the server is unreachable. If $response[0] doesn't contain the HTTP return code 200 OK, an execution error has occurred. In case of error, scancode returns false.

return @file_put_contents($output, $response[2]);
}

Returns the number of bytes written if the text returned by the request could be written to the output file, false otherwise.

EXAMPLE

Assuming you have saved the files sendhttp.php, filemimetype.php and scancode.php in the current directory, run PHP in interactive mode, load the scancode function and call it with your identification code, your password and the pathname of a PDF, JPEG, PNG or GIF file as arguments:

$ php -a
php > require_once 'scancode.php';
php > scancode('abcdef', 'ABCDEF', 'qr.png');
php > quit

Display the result:

$ cat file.txt
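To use the result in a program rather than display it, the output file can be read back into an array. This is a sketch which assumes that the service returns one decoded value per line, as in the curl examples of this guide. The function read_codes is hypothetical.

```php
<?php
// Hypothetical helper: reads the text file written by scancode and
// returns the decoded values, one per array entry.
function read_codes(string $path): array {
    if (!is_readable($path)) {
        return array();
    }
    $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    return $lines === false ? array() : $lines;
}

foreach (read_codes('file.txt') as $code) {
    echo $code, PHP_EOL;
}
```

An empty array means that the file is missing or that no barcode was decoded.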
