Bug #53113

Java Regex Limitation

Added by raztoki about 6 years ago. Updated about 6 years ago.

Status: Closed
Start date: 09/29/2014
Priority: Normal
Due date:
Assignee: jiaz
% Done: 100%
Category: General
Target version: 020 - Next Release 2.0
Resolution:

Description

This is in relation to #52896.

I think this
crawledPackageName = crawledPackageName.replaceAll("[^a-zA-Z0-9]+", "");
is a bad idea, because you are effectively stripping Unicode as well. Some package names could consist entirely of non-ASCII Unicode characters. Java regex does not support Unicode well by default (shorthand classes such as \w match only ASCII unless told otherwise); please see http://stackoverflow.com/questions/4304928/unicode-equivalents-for-w-and-b-in-java-regular-expressions (a somewhat old thread/question, but I believe it is still relevant to this day).
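To make the problem concrete, here is a minimal sketch of what the ASCII-only pattern does to a few sample names. The class name `AsciiStripDemo` and the sample strings are made up for illustration and are not taken from the JDownloader code; non-ASCII characters are written as Unicode escapes so the file compiles regardless of source encoding.

```java
final class AsciiStripDemo {

    // Same pattern as in the report: every run of characters that are not
    // ASCII letters or digits is removed.
    static String stripNonAscii(String name) {
        return name.replaceAll("[^a-zA-Z0-9]+", "");
    }

    public static void main(String[] args) {
        // Word separators disappear, so the words run together.
        System.out.println("[" + stripNonAscii("My Package 01") + "]");

        // "\u65e5\u672c\u8a9e" is a three-character Japanese string;
        // nothing of it survives the replacement.
        System.out.println("[" + stripNonAscii("\u65e5\u672c\u8a9e") + "]");

        // "F\u00fc\u00dfe" loses its two non-ASCII letters.
        System.out.println("[" + stripNonAscii("F\u00fc\u00dfe") + "]");
    }
}
```

A name made up entirely of non-ASCII characters ends up as an empty string, which is the case described above.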

My fix for the (no) space issue in package names was
crawledPackageName = crawledPackageName.replaceAll("[^a-zA-Z0-9]+", " ").replaceAll("\\s{2,}", " ");
which puts spaces in and makes sure there is only one whitespace character between words. In the end I also commented out the line because it's bad code, for the Unicode reasons above.
// crawledPackageName = crawledPackageName.replaceAll("[^a-zA-Z0-9]+", " ").replaceAll("\\s{2,}", " ");
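One possible Unicode-aware variant, as a sketch only and not the fix that was actually applied for this ticket: match on Unicode general categories instead of ASCII ranges. The class `PackageNameCleaner` and the method `clean` are hypothetical names. It assumes that letters (`\p{L}`), combining marks (`\p{M}`) and numbers (`\p{N}`) should be kept and that everything else counts as a separator.

```java
final class PackageNameCleaner {

    // Hypothetical helper: replaces each run of characters that are not
    // Unicode letters, combining marks or numbers with a single space,
    // then trims leading and trailing whitespace.
    static String clean(String name) {
        return name.replaceAll("[^\\p{L}\\p{M}\\p{N}]+", " ").trim();
    }

    public static void main(String[] args) {
        System.out.println(clean("Foo.Bar_baz"));
        // "\u65e5\u672c\u8a9e" is a three-character Japanese string.
        System.out.println(clean("\u65e5\u672c\u8a9e.pack-01"));
    }
}
```

Because the `+` quantifier already collapses a whole run of separators into one space, the second `replaceAll("\\s{2,}", " ")` pass should not be needed in this variant. Whether punctuation that is meaningful in a package name should be preserved is a separate decision this sketch does not address.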


Related issues

Related to Bug #52896: Package Name: no spaces/separators between words, unreada... Closed 09/26/2014
