Read barcodes
scancode
POST https://tesseractor.com/api/v1/scancode?login=&password=
Query string parameters:

Parameter | Description |
---|---|
login | Your identification code. |
password | Your password. |

multipart/form-data fields:

Field | Description |
---|---|
file | Content of the PDF or JPG, PNG or GIF image in binary. |
yolo | Detect barcodes (experimental). |
firstpage | First page to process in a PDF. |
lastpage | Last page to process in a PDF. |
resolution | Resolution in dpi of the image generated for each page of a PDF. |
images | Directly extract just the images in a PDF. |
rotate | Rotate images. |
crop | Crop images. |
reframe | Reframe images on a background. |
unborder | Remove border lines. |
resize | Resize images. |
normalize | Add contrast to the colors. |
colorspace | Convert to grayscale. |
unsharp | Sharpen the contours. |
dots | Remove white dots. |
yolo: 1 - every image is cropped around barcodes using the object detection system.
IMPORTANT: For now, this only works on QR codes.
Specify the extraction mode of each page of a PDF:
resolution: resolution in dpi of the image generated for each page - 50, 75, 100, 125, 150 or 200.
IMPORTANT: If a page contains only one image and no text, the image is always extracted directly from the document.
images: 1 - directly extract only the images.
Activate the processing options of each image before analysis:
rotate: 180 to flip the image, -90 to rotate it to the left or to the right,
crop: cut the image to a size given as a width and a height separated by an x, starting at a position given as x and y coordinates each preceded by a +, e.g. 640x200+50+80,
reframe: reframe the image on a background with a blur level between 1 and 20, e.g. 5,
unborder: remove the border lines, given the maximum width and height of a text, each between 10 and 1000 pixels and separated by an x, e.g. 30x30,
resize: resize the image by 50, 75, 125, 150 or 200%,
normalize: 1 - add contrast to the colors,
colorspace: 1 - convert the image to grayscale,
unsharp: 1 - sharpen the contours,
dots: 1 - remove white dots.
IMPORTANT: Image processing options are run in the above order.
To understand the effects of these parameters, test them in the interface of your personal space.
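The value ranges listed above can be checked locally before a request is sent. The following is a minimal sketch, not part of the service: the function name check_scancode_options is hypothetical, and its rules are only a transcription of the values documented on this page. The options yolo, firstpage, lastpage and rotate are deliberately left unchecked.

```php
<?php
// Hypothetical helper: returns the names of the options whose values fall
// outside the ranges documented on this page (an empty array if none do).
function check_scancode_options(array $params) {
    $errors = array();
    foreach ($params as $name => $value) {
        $value = (string) $value;   // true becomes '1', 125 becomes '125'
        switch ($name) {
            case 'resolution':
                $ok = in_array($value, array('50', '75', '100', '125', '150', '200'), true);
                break;
            case 'resize':
                $ok = in_array($value, array('50', '75', '125', '150', '200'), true);
                break;
            case 'crop':
                // width x height + x offset + y offset, e.g. 640x200+50+80
                $ok = preg_match('/^\d+x\d+\+\d+\+\d+$/', $value) === 1;
                break;
            case 'reframe':
                // blur level between 1 and 20
                $ok = preg_match('/^\d+$/', $value) === 1
                    && (int) $value >= 1 && (int) $value <= 20;
                break;
            case 'unborder':
                // maximum width x height of a text, each between 10 and 1000
                $ok = preg_match('/^(\d+)x(\d+)$/', $value, $m) === 1
                    && (int) $m[1] >= 10 && (int) $m[1] <= 1000
                    && (int) $m[2] >= 10 && (int) $m[2] <= 1000;
                break;
            case 'images':
            case 'normalize':
            case 'colorspace':
            case 'unsharp':
            case 'dots':
                $ok = $value === '1';
                break;
            default:
                // yolo, firstpage, lastpage, rotate: not checked in this sketch
                $ok = true;
        }
        if (!$ok) {
            $errors[] = $name;
        }
    }
    return $errors;
}

// resize=100 is not in the documented list, so 'resize' is reported.
$params = array('crop' => '640x200+50+80', 'reframe' => 5, 'unborder' => '30x30', 'resize' => 100);
print_r(check_scancode_options($params));
```

The check only covers the syntax of the values. Whether a given combination actually improves decoding is still best judged in the interface of your personal space.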
$ curl -s --fail --show-error -X POST "https://tesseractor.com/api/v1/scancode?login=abcdef&password=ABCDEF" -F "file=@barcode39.jpg" -o -
WIKIPEDIA
$ curl -s --fail --show-error -X POST "https://tesseractor.com/api/v1/scancode?login=abcdef&password=ABCDEF" -F "file=@ean128.gif" -o -
010123456789012815057072
$ curl -s --fail --show-error -X POST "https://tesseractor.com/api/v1/scancode?login=abcdef&password=ABCDEF" -F "file=@qr.png" -o -
https://www.wikipedia.org
On the PDF which contains the 3 images, one per page:
$ curl -s --fail --show-error -X POST "https://tesseractor.com/api/v1/scancode?login=abcdef&password=ABCDEF" -F "file=@barcodes.pdf" -o -
WIKIPEDIA
010123456789012815057072
https://www.wikipedia.org
$ curl -s --fail --show-error -X POST "https://tesseractor.com/api/v1/scancode?login=abcdef&password=ABCDEF" -F "file=@barcodes.pdf" -F "firstpage=3" -o -
https://www.wikipedia.org
$ curl -s --fail --show-error -X POST "https://tesseractor.com/api/v1/scancode?login=abcdef&password=ABCDEF" -F "images=1" -F "file=@3qr.pdf" -o -
210934
NEWDOC
https://www.tesseractor.com
https://www.tesseractor.com
NEWDOC
210934
210934
NEWDOC
https://www.tesseractor.com
The PDF has 3 pages:
1 • 2 images: one with the QR https://www.tesseractor.com, one with the 2 QR NEWDOC and 210934,
2 • 3 images with one QR each: https://www.tesseractor.com, NEWDOC and 210934,
3 • 1 single image with the 3 QR.
The option images=1 directly extracts the images at their original size, without the text.
Try with the option resolution=125.
YOLO (You Only Look Once)
$ curl -s --fail --show-error -X POST "https://tesseractor.com/api/v1/scancode?login=abcdef&password=ABCDEF" -F "file=@dqr.jpg" -o -
210968
Only 1 QR is read by ZBar.
$ curl -s --fail --show-error -X POST "https://tesseractor.com/api/v1/scancode?login=abcdef&password=ABCDEF" -F "yolo=1" -F "file=@dqr.jpg" -o -
210968
NEWDOC
Using YOLO, the image is analyzed and cropped into 2 distinct images which ZBar can easily decode.
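Since the yolo option is experimental, one possible strategy is to send a plain request first and retry with yolo=1 only when fewer codes than expected come back. The following is a minimal sketch of that idea under stated assumptions: the function scan_with_yolo_fallback and its $scan callback are hypothetical. $scan stands for whatever code actually sends the request and is assumed to return an array of decoded values.

```php
<?php
// Hypothetical helper: $scan is a callable taking ($file, $yolo) and
// returning an array of decoded values, one per barcode found.
function scan_with_yolo_fallback(callable $scan, $file, $expected = 1) {
    $codes = $scan($file, false);       // first pass, yolo=0
    if (count($codes) >= $expected) {
        return $codes;
    }
    $retry = $scan($file, true);        // second pass, yolo=1
    // keep whichever pass decoded more codes
    return count($retry) > count($codes) ? $retry : $codes;
}

// Stand-in for a real request, mimicking the dqr.jpg example above.
$fake = function ($file, $yolo) {
    return $yolo ? array('210968', 'NEWDOC') : array('210968');
};

print_r(scan_with_yolo_fallback($fake, 'dqr.jpg', 2));
```

Each pass is a separate request, so the retry is only attempted when the first result falls short of the expected count.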
Batch processing
Download the program scancode2csv.php:
IMPORTANT: Edit the constants LOGIN and PASSWORD in the program before trying it and select in the array $passes the different operations carried out on each file.
Copy all the PDF, JPG, PNG or GIF files to process into a folder and type the following command:
$ php -f scancode2csv.php dir file.csv
Open the file file.csv in your spreadsheet to display the results.
- define('USAGE', 'php -f %s dir file.csv');
- function abort($msg, $code=1) {
- echo $msg, PHP_EOL;
- exit($code);
- }
- function usage() {
- global $argv;
- abort(sprintf(USAGE, basename($argv[0])), 1);
- }
- if ($argc != 3) {
- usage();
- }
- $dir=$argv[1];
- $csv=$argv[2];
- define('LOGIN', 'abcdef');
- define('PASSWORD', 'ABCDEF');
- define('DELIMITER', ',');
- define('ENCLOSURE', '"');
- define('ESCAPE', '\\');
- define('URL', 'https://tesseractor.com/api/v1/scancode' . '?' . 'login=' . urlencode(LOGIN) . '&' . 'password=' . urlencode(PASSWORD));
- define('SCANCODE', 'curl -s --fail --show-error -X POST "' . URL . '" -F "file=@%s" %s -o -');
- $passes=array(
- 'YOLO_0_IMAGES_1' => '-F "yolo=0" -F "images=1"',
- // 'YOLO_1_IMAGES_1' => '-F "yolo=1" -F "images=1"',
- // 'YOLO_0_IMAGES_1_RESIZE_150' => '-F "yolo=0" -F "images=1" -F "resize=150"',
- // 'YOLO_0_DPI_125' => '-F "yolo=0" -F "resolution=125"',
- );
- $filelist=@scandir($dir, SCANDIR_SORT_NONE);
- if (!$filelist) {
- abort($dir . '?');
- }
- sort($filelist, SORT_NATURAL);
- $csvout = @fopen($csv, 'w');
- if ($csvout === false) {
- abort($csv . '?');
- }
- $headline=array(false);
- foreach ($passes as $label => $arg) {
- $headline[]=$label;
- $headline[]=''; // count
- $headline[]=''; // secs
- }
- if (fputcsv($csvout, $headline, DELIMITER, ENCLOSURE, ESCAPE) === false) {
- abort($csv . '?');
- }
- foreach ($filelist as $file) {
- if ($file == '.' || $file == '..')
- continue;
- echo $file, PHP_EOL;
- $line=array($file);
- foreach ($passes as $label => $arg) {
- $cmdline=sprintf(SCANCODE, $dir . DIRECTORY_SEPARATOR . $file, $arg);
- $output=false;
- $stime=microtime(true);
- @exec($cmdline, $output, $ret);
- $etime=microtime(true);
- $line[]=$ret == 0 && $output ? implode("\n", $output) : false;
- $line[]=$ret == 0 && $output ? count($output) : 0;
- $line[]=$ret == 0 ? sprintf('%0.2f', round($etime-$stime, 2)) : false;
- }
- if (fputcsv($csvout, $line, DELIMITER, ENCLOSURE, ESCAPE) === false) {
- abort($csv . '?');
- }
- }
- exit(0);
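The CSV written by the program has one row per file: the file name, then for each pass the decoded text, the number of lines returned and the elapsed time in seconds. As an alternative to a spreadsheet, the totals per pass can be computed with a short script. This is a minimal sketch: the function name summarize_scancode_csv is hypothetical, and it assumes the delimiter, enclosure and escape characters defined in the program above.

```php
<?php
// Hypothetical helper: for each pass label found in the header row, totals
// the number of decoded codes and counts the files which returned none
// (a failed request also counts as none, since the program writes 0).
function summarize_scancode_csv($csv) {
    $in = @fopen($csv, 'r');
    if ($in === false) {
        return false;
    }
    $header = fgetcsv($in, 0, ',', '"', '\\');
    if (!is_array($header)) {
        fclose($in);
        return false;
    }
    $summary = array();
    while (($row = fgetcsv($in, 0, ',', '"', '\\')) !== false) {
        // columns: file, then (text, count, secs) for each pass
        for ($i = 1; $i < count($header) && $i + 1 < count($row); $i += 3) {
            $label = $header[$i];
            if (!isset($summary[$label])) {
                $summary[$label] = array('codes' => 0, 'empty' => 0);
            }
            $count = (int) $row[$i + 1];
            $summary[$label]['codes'] += $count;
            if ($count == 0) {
                $summary[$label]['empty']++;
            }
        }
    }
    fclose($in);
    return $summary;
}

// Usage: php -f summary.php file.csv
if (isset($argv[1])) {
    print_r(summarize_scancode_csv($argv[1]));
}
```

Comparing the totals of two passes, for example with and without yolo, gives a quick indication of which options suit a batch of documents.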
Download the code of the sendpost and file_mime_type functions from the iZend library.
Copy the files into the space of your application.
NOTE: See the page Call the service API for a description of the sendpost and file_mime_type functions.
Add the file scancode.php with the following content:
- require_once 'sendhttp.php';
- require_once 'filemimetype.php';
Loads the code of the sendpost and file_mime_type functions.
- function scancode($login, $password, $file, $yolo=false, $output='file.txt', $params=false) {
Defines the function scancode.
$login is your identification code. $password is your password.
$file is the pathname of the PDF, JPEG, PNG or GIF file to analyze.
If $yolo is true, every image is cropped around barcodes using the object detection system.
$output is the pathname of the text file which will contain the result of the analysis of $file.
$params is an associative array containing the names and the values of the parameters specifying the extraction mode of each page of a PDF and the processing options of each image before analysis, e.g. array('images' => true).
- $curl = 'https://tesseractor.com/api/v1/scancode' . '?' . 'login=' . urlencode($login) . '&' . 'password=' . urlencode($password);
Sets $curl to the URL of the scancode action with the identification code and the password of the user's account.
$login and $password must be escaped, which is done here with urlencode.
- $args = array(
- 'yolo' => $yolo ? '1' : '0',
- );
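- if (is_array($params)) {
- $args = array_merge($args, $params);
- }
Prepares the list of arguments of the POST. $params is merged only when it is an array because array_merge doesn't accept the default value false.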
- $files=array('file' => array('name' => basename($file), 'tmp_name' => $file, 'type' => file_mime_type($file)));
Prepares the list of files attached to the POST: file - the PDF, JPEG, PNG or GIF to analyze with the name of the file, the pathname of the file and its MIME type.
- $response=sendpost($curl, $args, $files);
Sends the HTTP request with sendpost.
The arguments login and password are already in $curl.
- if (!$response or $response[0] != 200) {
- return false;
- }
If $response is false, the server is unreachable.
If $response[0] doesn't contain the HTTP return code 200 OK, an execution error has occurred.
In case of error, scancode returns false.
- return @file_put_contents($output, $response[2]);
- }
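Returns the number of bytes written if the text returned by the request could be written to the output file, false otherwise.
Compare the return value with false using !== because an empty result writes 0 bytes.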
EXAMPLE
Assuming you have saved the files sendhttp.php, filemimetype.php and scancode.php in the current directory, run PHP in interactive mode, load the scancode function and call it with your identification code and password and the pathname of a PDF, JPEG, PNG or GIF file as arguments:
$ php -a
php > require_once 'scancode.php';
php > scancode('abcdef', 'ABCDEF', 'qr.png');
php > quit
Display the result:
$ cat file.txt
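To process the result in a program rather than display it, the output file can be loaded into an array. This is a minimal sketch: the function name read_scancode_output is hypothetical, and it assumes that the service returns one decoded value per line, as in the curl examples on this page.

```php
<?php
// Hypothetical helper: returns the decoded values saved in the output file,
// one array entry per non-empty line, or false if the file can't be read.
function read_scancode_output($output = 'file.txt') {
    return @file($output, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
}

// Usage: list the values with their position in the result.
$codes = read_scancode_output('file.txt');
if ($codes !== false) {
    foreach ($codes as $i => $code) {
        echo $i + 1, ': ', $code, PHP_EOL;
    }
}
```

A barcode whose content itself contains a line break would be split into several entries by this approach.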