首页 > 开发 > Java > 正文

Java滚动数组计算编辑距离操作示例

2020-07-28 13:53:33
字体:
来源:转载
供稿:网友

本文实例讲述了Java滚动数组计算编辑距离操作。分享给大家供大家参考,具体如下:

编辑距离(Edit Distance),也称Levenshtein距离,是指由一个字符串转换为另一个字符串所需的最少编辑次数。

下面的代码摘自org.apache.commons.lang.StringUtils

用法示例:

StringUtils.getLevenshteinDistance(null, *)       = IllegalArgumentExceptionStringUtils.getLevenshteinDistance(*, null)       = IllegalArgumentExceptionStringUtils.getLevenshteinDistance("","")        = 0StringUtils.getLevenshteinDistance("","a")       = 1StringUtils.getLevenshteinDistance("aaapppp", "")    = 7StringUtils.getLevenshteinDistance("frog", "fog")    = 1StringUtils.getLevenshteinDistance("fly", "ant")    = 3StringUtils.getLevenshteinDistance("elephant", "hippo") = 7StringUtils.getLevenshteinDistance("hippo", "elephant") = 7StringUtils.getLevenshteinDistance("hippo", "zzzzzzzz") = 8StringUtils.getLevenshteinDistance("hello", "hallo")  = 1

Java代码:

public static int getLevenshteinDistance(String s, String t) {  if (s == null || t == null) {    throw new IllegalArgumentException("Strings must not be null");  }  int n = s.length(); // length of s  int m = t.length(); // length of t  if (n == 0) {    return m;  } else if (m == 0) {    return n;  }  if (n > m) {    // swap the input strings to consume less memory    String tmp = s;    s = t;    t = tmp;    n = m;    m = t.length();  }  int p[] = new int[n+1]; //'previous' cost array, horizontally  int d[] = new int[n+1]; // cost array, horizontally  int _d[]; //placeholder to assist in swapping p and d  // indexes into strings s and t  int i; // iterates through s  int j; // iterates through t  char t_j; // jth character of t  int cost; // cost  for (i = 0; i<=n; i++) {    p[i] = i;  }  for (j = 1; j<=m; j++) {    t_j = t.charAt(j-1);    d[0] = j;    for (i=1; i<=n; i++) {      cost = s.charAt(i-1)==t_j ? 0 : 1;      // minimum of cell to the left+1, to the top+1, diagonally left and up +cost      d[i] = Math.min(Math.min(d[i-1]+1, p[i]+1), p[i-1]+cost);    }    // copy current distance counts to 'previous row' distance counts    _d = p;    p = d;    d = _d;  }  // our last action in the above loop was to switch d and p, so p now   // actually has the most recent cost counts  return p[n];}

实际上,上述代码的空间复杂度还可以进一步简化,使用一维数组替换滚动数组。

Java代码:

public int minDistance(String s, String t) {  if (s == null || t == null) {    throw new IllegalArgumentException("Strings must not be null");  }  int n = s.length(); // length of s  int m = t.length(); // length of t  if (n == 0) {    return m;  } else if (m == 0) {    return n;  }  if (n > m) {    // swap the input strings to consume less memory    String tmp = s;    s = t;    t = tmp;    n = m;    m = t.length();  }  int d[] = new int[n+1]; // cost array, horizontally  // indexes into strings s and t  int i; // iterates through s  int j; // iterates through t  char t_j; // jth character of t  int cost; // cost  for (i = 0; i<=n; i++) {    d[i] = i;  }  for (j = 1; j<=m; j++) {    t_j = t.charAt(j-1);    int pre = d[0];    d[0] = j;    for (i=1; i<=n; i++) {      int temp = d[i];      cost = s.charAt(i-1)==t_j ? 0 : 1;      // minimum of cell to the left+1, to the top+1, diagonally left and up +cost      d[i] = Math.min(Math.min(d[i-1]+1, d[i]+1), pre+cost);      pre = temp;    }    }  return d[n];}
发表评论 共有条评论
用户名: 密码:
验证码: 匿名发表