Comunitatea PHP Romania
 

 
levenshtein

levenshtein

(PHP 3>= 3.0.17, PHP 4 >= 4.0.1, PHP 5)

levenshtein --  Calculate Levenshtein distance between two strings

Description

int levenshtein ( string str1, string str2)

int levenshtein ( string str1, string str2, int cost_ins, int cost_rep, int cost_del)

int levenshtein ( string str1, string str2, function cost)

This function returns the Levenshtein-Distance between the two argument strings or -1, if one of the argument strings is longer than the limit of 255 characters (255 should be more than enough for name or dictionary comparison, and nobody serious would be doing genetic analysis with PHP).

The Levenshtein distance is defined as the minimal number of characters you have to replace, insert or delete to transform str1 into str2. The complexity of the algorithm is O(m*n), where n and m are the length of str1 and str2 (rather good when compared to similar_text(), which is O(max(n,m)**3), but still expensive).

In its simplest form the function will take only the two strings as parameter and will calculate just the number of insert, replace and delete operations needed to transform str1 into str2.

A second variant will take three additional parameters that define the cost of insert, replace and delete operations. This is more general and adaptive than variant one, but not as efficient.

The third variant (which is not implemented yet) will be the most general and adaptive, but also the slowest alternative. It will call a user-supplied function that will determine the cost for every possible operation.

The user-supplied function will be called with the following arguments:

  • operation to apply: 'I', 'R' or 'D'

  • actual character in string 1

  • actual character in string 2

  • position in string 1

  • position in string 2

  • remaining characters in string 1

  • remaining characters in string 2

The user-supplied function has to return a positive integer describing the cost for this particular operation, but it may decide to use only some of the supplied arguments.

The user-supplied function approach offers the possibility to take into account the relevance of and/or difference between certain symbols (characters) or even the context those symbols appear in to determine the cost of insert, replace and delete operations, but at the cost of losing all optimizations done regarding cpu register utilization and cache misses that have been worked into the other two variants.

See also soundex(), similar_text(), and metaphone().

Ultimele discutii in forum RSS Forum

Ultimele articole Ultimele articole

Top membri

Pirahna Pirahna
la birou
carco carco
Bucuresti
Birkoff Birkoff
Bucuresti
Mascka Mascka
Braila
raul_ raul_
whooper whooper
Toronto ON
mihaitha mihaitha
Sibiu
gabysolomon gabysolomon
Bacau
oriceon oriceon
Constanta
garlicinicolae garlicinicolae

Newsletter


Email:
 inscriere
 renuntare
 
 Arhiva newsletter

Parteneriat

Copyright © 2001-2008 PHP Romania Add PHPRomania to Google Add PHPRomania to Del.icio.us Add PHPRomania to Stumbleupon Add PHPRomania to Yahoo! Add PHPRomania to Digg Add PHPRomania to Blink Free Advertising | Masini | Western Union Money Order | Consolidation | Personal Loans
Ads: Partener Way2Web Nework: gazduire web | inregistrare domenii | web design | imobiliare | web hosting
Powered by Simplis