Reports of the Academy of Sciences of the USSR
CYBERNETICS
Submitted 1966-01-01 | RussiaRxiv: ru-196601.54315 | Translated from Russian

1966. Volume 167, No. 5

UDC 519.95

CYBERNETICS
AND CONTROL THEORY

B. N. KOZINETS, R. M. LANZMAN, V. A. YAKUBOVICH

FORENSIC EXAMINATION OF SIMILAR HANDWRITINGS BY MEANS OF ELECTRONIC COMPUTERS

(Presented by Academician V. I. Smirnov on 20 VII 1964)

1. Handwriting examination is known to be one of the most difficult and theoretically least studied types of forensic examination. The graphic examination of similar handwritings presents particular difficulty: expert practice knows many cases in which the expert either refuses to give an opinion or expresses, in cautious form, only an assumption about the possible writer. The present communication is devoted mainly to the problem of differentiating similar handwritings.

We note that a computer operating according to the program described below is an example of a learning machine: in recognition it does not use any features laid down in it in advance (taken, for example, from expert practice) but, as it were, develops them itself, learning to recognize from the training sequence shown to it according to the algorithm described below.

2. We introduce the following terminology: a graphic object is a concrete graphic image (this may be a word, signature, numeral, letter, part of a letter, etc.); metrization is a concrete method of converting a graphic object into a sequence of numbers; a handwriting variant is a sequence of numbers corresponding (under a fixed metrization) to a given graphic object; handwriting space is the set of all possible handwriting variants corresponding to all possible graphic objects under a fixed metrization; a training sequence is a finite collection of graphic objects or handwriting variants with an indication of the person who executed each object; a recognition sequence is a finite collection of objects (handwriting variants) with respect to which, generally speaking, it is unknown by which person the objects were executed.

After metrization has been carried out, the problem of differentiating similar handwritings is as follows. There are: a) a recognition sequence \(N=\{x_1,x_2,\ldots,x_p\}\), for which it is known that each of its handwriting variants was executed by one of the persons \(A_1,A_2,\ldots,A_k\); b) training sequences \(R_i=\{x_1^{(i)},x_2^{(i)},\ldots,x_{m_i}^{(i)}\}\) \((i=1,2,\ldots,k)\), where the variants \(x_h^{(i)}\) were executed by person \(A_i\). It is assumed that all handwriting variants \(x_j, x_h^{(i)}\) are of the same type (the corresponding graphic objects are different renderings of one word, one signature, including forgeries, etc.). It is required to establish by which person \(A_i\) each handwriting variant \(x_j\) of the recognition sequence was executed. Without loss of generality, in what follows we take \(k=2\).

Let us describe the adopted method of metrization.* On each graphic object, in the order of execution of the graphic tracing, \(n\) points \((\zeta_1,\zeta_2,\ldots,\zeta_n)\) were placed (with \(n\) the same for all graphic objects). The points were either uniformly distributed over the entire graphic object or (in other experiments) certain quite definite points were always located at the corresponding characteristic places, while the remaining ones were uniformly distributed between them.** Thus, a handwriting variant is a set of \(2n\) coordinates of the points \(\zeta_1,\zeta_2,\ldots,\zeta_n\), and the handwriting space is \(2n\)-dimensional Euclidean space.

* The most natural metrization would be the following. Each graphic object is projected onto a screen divided into a large number \(n\) of small squares. The handwriting variant would then be a set of numbers \((\xi_1,\xi_2,\ldots,\xi_n)\), where \(\xi_i\) is the degree of blackening of the \(i\)-th small square. However, this method of metrization was not used for the following reasons: first, it is an extremely uneconomical recording of information, since, for the required accuracy, a very fine partition of the screen is needed; secondly, the sequence of execution of a real graphic tracing is not taken into account; and, finally, what is most essential, in the handwriting space the structure of the set of handwriting variants corresponding to the graphic objects of one person is apparently rather complex, and therefore this set is difficult to restore from the training sequence, i.e. it is difficult to find out to which set the handwriting variant under study belongs.

** Characteristic points were points of local maximum, minimum, corner points, and others.

3. Let \(S_1\) and \(S_2\) be the sets of handwriting variants corresponding to all possible graphic objects of the same type executed by two different persons \(A_1\) and \(A_2\). Consider all possible handwriting variants \(x=(\zeta_1,\ldots,\zeta_n)\in S_1\) (or \(\in S_2\)). The presence of a stable writing stereotype means a certain regularity in the arrangement of the points \(\zeta_1,\zeta_2,\ldots,\zeta_n\), which, in imprecise terms, means the following. If the point \(\zeta_1\) is located arbitrarily on the plane, then \(\zeta_2\) lies in a certain region depending on \(\zeta_1\), the point \(\zeta_3\) lies in a certain region depending on \(\zeta_1\) and \(\zeta_2\), and so on. The totality of these regions is precisely what characterizes the writing stereotype of the given person for the given type of graphic material (signature, fixed word, etc.).

We now formulate rigorously the adopted hypothesis on the character of the writing stereotype of a given person. Let \(M_j\) \((j=1,2,\ldots,n(n+1)/2)\) be a collection of convex sets on the plane. For a set \(M=\{\zeta\}\) of points \(\zeta\) on the plane, denote by \(M(\zeta_0)\) the set shifted by the vector \(\zeta_0\), i.e. \(M(\zeta_0)=\{\zeta+\zeta_0\}\).

Hypothesis I. To each writing stereotype of a fixed person and a fixed type of graphic objects there corresponds a collection of convex sets \(M_j\) \((j=1,2,\ldots,n(n+1)/2)\) such that, for the points \(\zeta_j\) of graphic objects of the same type executed by one and the same person, the relations
\[ \zeta_1\in M_1,\quad \zeta_2\in M_2(\zeta_1)\cap M_3,\quad \zeta_3\in M_4(\zeta_2)\cap M_5(\zeta_1)\cap M_6,\ldots \]
hold. (It is possible that some of the sets \(M_j\) coincide with the whole plane or with the entire screen.)

From Hypothesis I it follows easily (see (1)) that the sets \(S_1\) and \(S_2\) are convex.* In addition, the following natural assumption was adopted.

Hypothesis II. The sets \(S_1\) and \(S_2\) are open, and the distance between them is greater than zero.**

* For what follows, it is sufficient that the closed convex hulls of the sets \(S_1\) and \(S_2\) do not intersect.

** It is assumed that the graphic object has reasonable dimensions, i.e. cannot be arbitrarily small; otherwise, of course, the distance \(\rho(S_1,S_2)=0\). The case of a “technical forgery,” when, for example, one signature is copied from another (in this case also \(\rho(S_1,S_2)=0\)), is not of interest, since the expert will easily detect such a forgery from the slowed movements, stopping points of the writing instrument, etc.

The idea of the algorithm used for recognition is to find, with some accuracy, the shortest distance between the convex hulls of the training sequences \(R_1\) and \(R_2\) and to draw, through the midpoint of the segment \(\delta\) realizing this distance, a plane \(P\) orthogonal to \(\delta\). For \(x\in N\) the algorithm gives the answer \(x\in S_1\) if the handwriting variant \(x\) lies on the same side of the plane as \(R_1\), and \(x\in S_2\) otherwise. In addition, for convenience in training, economy of memory, and also convenience in modeling the algorithm on specialized machines, an additional requirement of recurrence is imposed on the machines under consideration: the machine reproducing this algorithm is given the training sequence not all at once, but element by element. The recurrent algorithm for finding the shortest distance (the learning algorithm) is as follows.

Denote by \(L\) the infinite sequence obtained by repeating the sequence \(x_1^{(1)}, \ldots, x_{m_1}^{(1)}, x_1^{(2)}, \ldots, x_{m_2}^{(2)}\) over and over, and denote its elements by \(z_1, z_2, \ldots\). Here and below \((a,b)\) denotes the scalar product in the handwriting space. Let \(x_1=x_1^{(1)}\), \(y_1=x_1^{(2)}\). Suppose that \(x_k, y_k\) have been constructed. Take \(z_k\). For definiteness let \(z_k\in S_1\). (If \(z_k\in S_2\), then \(x\) and \(y\) are interchanged everywhere in the following formulas.) Then set \(y_{k+1}=y_k\). If
\[ (z_k-x_k,\, y_k-x_k)<(1/T)(y_k-x_k,\, y_k-x_k), \]
where \(T>0\) is a prescribed number, then \(x_{k+1}=x_k\); otherwise
\[ x_{k+1}=\begin{cases} z_k, & \text{if } (y_k-z_k,\, x_k-z_k)\leq 0,\\[4pt] z_k+\dfrac{(z_k-y_k,\, z_k-x_k)}{(z_k-x_k,\, z_k-x_k)}\,(x_k-z_k), & \text{if } (y_k-z_k,\, x_k-z_k)>0. \end{cases} \]
The stopping conditions for the algorithm are: either
\[ (x_j-y_j,\, x_j-y_j)<\varepsilon^2, \]
where \(\varepsilon\) is a prescribed small number,* or
\[ x_j=x_{j-(m_1+m_2)},\qquad y_j=y_{j-(m_1+m_2)}. \]
The above-mentioned plane \(P\) is drawn through the midpoint of the segment \(x_jy_j\), obtained after the algorithm has stopped. The stopping condition in the second case means that this plane certainly separates the convex closed hulls of the sets \(R_1\) and \(R_2\).
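The recurrence and the resulting decision rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' program: the function names (`train`, `move_towards`, `classify`), the default values of `T` and `eps`, and the `max_passes` safeguard are hypothetical and do not come from the paper. Handwriting variants are taken to be plain sequences of floats.

```python
def dot(a, b):
    """Scalar product (a, b)."""
    return sum(p * q for p, q in zip(a, b))


def sub(a, b):
    """Componentwise difference a - b."""
    return [p - q for p, q in zip(a, b)]


def move_towards(own, z, other):
    """Update step: the point of the segment [own, z] given by the two-case formula."""
    own_z = sub(own, z)        # x_k - z_k
    other_z = sub(other, z)    # y_k - z_k
    num = dot(other_z, own_z)  # (y_k - z_k, x_k - z_k)
    if num <= 0:
        return list(z)
    t = num / dot(own_z, own_z)
    return [zi + t * d for zi, d in zip(z, own_z)]


def train(r1, r2, T=2.0, eps=1e-9, max_passes=10_000):
    """Return (x, y, separated) after cycling through r1 followed by r2.

    separated is False when the first stopping condition (distance below eps)
    fires, which the paper interprets as a refusal to conduct the examination.
    max_passes is an assumed safeguard, not part of the described algorithm.
    """
    x, y = list(r1[0]), list(r2[0])
    labelled = [(z, 1) for z in r1] + [(z, 2) for z in r2]
    for _ in range(max_passes):
        x_prev, y_prev = x, y
        for z, label in labelled:
            own, other = (x, y) if label == 1 else (y, x)
            d = sub(other, own)
            dd = dot(d, d)
            if dd < eps * eps:
                return x, y, False
            if dot(sub(z, own), d) >= dd / T:
                new = move_towards(own, z, other)
                if label == 1:
                    x = new
                else:
                    y = new
        # Second stopping condition: nothing changed over a full pass.
        if x == x_prev and y == y_prev:
            return x, y, True
    raise RuntimeError("no stopping condition reached within max_passes")


def classify(v, x, y):
    """Return 1 if v lies strictly on the x side of the plane P, else 2."""
    mid = [(a + b) / 2 for a, b in zip(x, y)]
    return 1 if dot(sub(v, mid), sub(x, y)) > 0 else 2
```

As a toy check in the plane, take `r1 = [(0.0, 2.0), (0.0, -2.0)]`, `r2 = [(4.0, 0.0)]` and `T = 4.0`. The second element of `r1` triggers one update that moves `x` from `(0, 2)` to `(0, 0)`, the point of the segment nearest to `y = (4, 0)`; the next pass changes nothing, so training stops with the plane through `(2, 0)`.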

Theorem. Suppose that hypotheses I and II are satisfied and that the elements of the training sequence and the element \(x\) of the recognition sequence are chosen randomly and independently in accordance with some probability density prescribed on the handwriting space. Then the algorithm converges in a finite number of steps for any \(T\geq 2\),** and for any \(\varepsilon>0\) one can indicate an \(m_0\) such that, when \(m_1+m_2\geq m_0\), the probability of correct recognition of the element \(x\) is greater than \(1-\varepsilon\).

In addition to the above, an algorithm not using hypothesis I was applied for recognition; it gives the answer \(x\in S_1\) if
\[ \min_{R_1}\lvert x-x_j^{(1)}\rvert<\min_{R_2}\lvert x-x_j^{(2)}\rvert, \]
and \(x\in S_2\) otherwise. The experiments showed that this algorithm gives a worse result (by approximately 7–16%) than the one described.
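The comparison rule above is a nearest-neighbour decision over the two training sequences. A minimal Python sketch follows; the function name `nearest_neighbour_class` is hypothetical, and the Euclidean distance is assumed for \(\lvert\cdot\rvert\), in line with the Euclidean handwriting space used in the paper.

```python
import math


def nearest_neighbour_class(v, r1, r2):
    """Return 1 if v is strictly closer to some element of r1 than to every element of r2, else 2."""
    d1 = min(math.dist(v, z) for z in r1)
    d2 = min(math.dist(v, z) for z in r2)
    return 1 if d1 < d2 else 2
```

Ties go to the second class, matching the strict inequality in the rule. Unlike the separating plane, this rule needs the whole training sequence at recognition time.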

4. For the experiments, genuine and forged signatures of employees of the Vilnius Scientific Research Institute of Forensic Examination were taken. (The forgery was carried out after training by handwriting experts A. Dombrauskaitė and Ya. Ignat’eva.) The training sequences consisted of 17–20 genuine signatures and approximately the same number of forged signatures; the sequence for recognition contained, written in unknown order, 50–60 genuine and forged signatures not included in the training sequences. For comparison with the work of the machine this material was given to experts of the following forensic institutions: the Leningrad Scientific Research Laboratory of Forensic Examination, the scientific-technical department of the Administration of Internal Affairs of the Regional Executive Committee, and the scientific-technical group of the traffic police department of the Ministry for the Protection of Public Order of the RSFSR.***

* Such a case may be interpreted as the machine’s refusal to conduct the examination. In this case the distance between the convex closed hulls of the sets \(R_1\) and \(R_2\) is very small (for sufficiently small \(\varepsilon\)), for example, when the sets \(S_1\) and \(S_2\) intersect and, consequently, hypothesis II is false. In the experiments conducted, when \(\varepsilon=10^{-10}d\), where \(d=10\) cm, this case did not occur.

** The larger \(T\) is, generally speaking, the smaller the training sequence may be, but the greater the computing time on the electronic computer.

*** We note that the examination was conducted for experimental purposes, with the obligatory condition that a conclusion be given in every case. In expert practice, in all doubtful cases the expert refuses to give a conclusion.

Table 1

| Signature | Recognition percentage: experts | Recognition percentage: machine |
|---|---|---|
| Medzyavichyus | 58.3; 68.3; 70 | 88 |
| Shtromas | 75.4; 78.9; 80.7 | 91.2 |
| Chyapas | 75; 80 | 84.2 |
| Poshkyavichyus | 90; 92 | 100 |

The results of the experiments presented in Table 1 attest to the fundamental possibility of using electronic computers for the forensic examination of similar handwritings.

The authors express their gratitude to the experts of the institutions listed above. The results of the experiments, using a larger amount of graphic material, will be published in Collection No. 2 of the Lithuanian Scientific Research Institute of Forensic Examination, under whose plan this work is being carried out jointly with the Computing Center of Leningrad University.

Lithuanian Scientific Research
Institute of Forensic Examination
Vilnius

Received
17 VII 1964

REFERENCES

  1. V. A. Yakubovich, Machines that learn to recognize patterns, in: Collection. Methods of Computation, Leningrad State University Press, issue 2, 1963.
