Author Topic: Need Help Parsing a String  (Read 4098 times)

Bill Tillman

  • Guest
Need Help Parsing a String
« on: April 13, 2015, 12:42:35 PM »
I have a bunch of data I'm working with and one of the fields is in this format:

   ABCxxXxxABCjklMZ

Or basically, the first three characters will always be letters, then the next one, two or possibly three characters will be digits from 0 to 9, then an upper case X followed by one, two or three digits from 0 to 9. From that point I want to discard everything up to and including the last digit and grab anything and everything that's to the right of it. I'm pretty sure I can do this with some code, but this looks like it might be a good time for a regular expression. Not being too familiar with them, I was hoping to get some advice here.

There may be times when a user puts an upper case X in one of the first three characters so if I do key in on that upper case X between the sets of integers, I'd have to take that into account. But most of the time, that upper case X will occur only between the sets of integers which may contain 1 to 3 digits each. And one set will not always be the same length as the other.
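
The layout described above can be written down as an anchored pattern. What follows is only a minimal sketch, assuming the field always has exactly three leading characters (of any kind, so a stray X there does no harm), then 1 to 3 digits, an upper case X, and 1 to 3 more digits. The names `FieldParser` and `TailOf` are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FieldParser
{
    // Assumed layout: 3 leading characters of any kind, 1-3 digits,
    // an upper-case X, 1-3 digits, then the tail captured in group 1.
    private static readonly Regex FieldPattern =
        new Regex(@"^.{3}[0-9]{1,3}X[0-9]{1,3}(.*)$");

    // Returns the text after the second digit group, or null when the
    // input does not follow the assumed layout.
    public static string TailOf(string input)
    {
        Match m = FieldPattern.Match(input);
        return m.Success ? m.Groups[1].Value : null;
    }

    public static void Main()
    {
        foreach (var s in new[] { "ABC12X21ABCjklMZ", "XXX1X21ABCjklMZ", "no-match" })
        {
            Console.WriteLine("{0} -> {1}", s, TailOf(s) ?? "(no match)");
        }
    }
}
```

Because the pattern is anchored and skips the first three characters blindly, an X typed into the prefix is never mistaken for the separator. If the tail itself can start with a digit, the split becomes ambiguous and no pattern can resolve that without more rules.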

ChrisCarlson

  • Guest
Re: Need Help Parsing a String
« Reply #1 on: April 13, 2015, 01:00:23 PM »
Code:
/[A-Z]\D+$/ig

Potentially?

Should match

ABC11X11ABCjklMZ
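
For anyone who wants to try that pattern from C#, here is a rough sketch. It assumes the JavaScript-style literal maps to `RegexOptions.IgnoreCase` for the `i` flag, and that the `g` flag can be dropped because `Regex.Match` only returns the first match. The class and method names are invented for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TrailingNonDigits
{
    // [A-Z]\D+$ : a letter followed by one or more non-digit characters
    // running to the end of the input.
    public static string Find(string input)
    {
        Match m = Regex.Match(input, @"[A-Z]\D+$", RegexOptions.IgnoreCase);
        return m.Success ? m.Value : null;
    }

    public static void Main()
    {
        Console.WriteLine(Find("ABC11X11ABCjklMZ"));               // ABCjklMZ
        Console.WriteLine(Find("ABC11X11AB3cd"));                  // cd
        Console.WriteLine(Find("ABC11X11A") ?? "(no match)");      // (no match)
    }
}
```

The second and third inputs show the limits of this approach: a digit inside the tail cuts the result short, and a tail of a single character is not matched at all because the pattern needs at least two trailing characters.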

Jeff H

  • Needs a day job
  • Posts: 6150
Re: Need Help Parsing a String
« Reply #2 on: April 13, 2015, 05:53:50 PM »
If I read it correctly, and you'd rather not use regex, you can chain some string functions.
Here is a simple console program that has a one-liner and a step-by-step function.

one-liner
Code - C#:
// SkipWhile needs: using System.Linq;
String.Concat(input.Substring(input.IndexOf('X', 3) + 1).SkipWhile(Char.IsNumber));

 
one-liner function with a step-by-step function
Code - C#:
using System;
using System.Linq;

class Program
{
    static void Main(string[] args)
    {
        string[] inputs = new[] { "XxC12X6ABCjklMZ", "ABC12X21ABCjklMZ", "XXX1X21ABCjklMZ" };

        foreach (var input in inputs)
        {
            Console.WriteLine(ParseField(input));
        }

        foreach (var input in inputs)
        {
            ParseFieldbyStep(input);
        }
        Console.ReadKey();
    }

    // One-liner: find the first 'X' at or after index 3, then skip the digits that follow it.
    static string ParseField(string input)
    {
        return String.Concat(input.Substring(input.IndexOf('X', 3) + 1).SkipWhile(Char.IsNumber));
    }

    // Same logic as ParseField, printing each intermediate step.
    static void ParseFieldbyStep(string input)
    {
        Console.WriteLine();
        Console.WriteLine(input);
        var removeFirst3chars = input.Substring(3);
        Console.WriteLine(removeFirst3chars + "  - First 3 char removed");

        var indexOfFirstX = removeFirst3chars.IndexOf('X');
        Console.WriteLine(indexOfFirstX + "  = Index of First X in \"" + removeFirst3chars + "\"");

        var indexAfterFirstX = indexOfFirstX + 1;
        var stringAfterFirstX = removeFirst3chars.Substring(indexAfterFirstX);
        Console.WriteLine(stringAfterFirstX + "  - string after X ");

        int numberIndex = 0;
        // The length check keeps an all-digit tail from running past the end of the string.
        while (numberIndex < stringAfterFirstX.Length && Char.IsNumber(stringAfterFirstX[numberIndex]))
        {
            Console.WriteLine(stringAfterFirstX[numberIndex] + "  - is a number");
            numberIndex++;
        }

        if (numberIndex < stringAfterFirstX.Length)
        {
            Console.WriteLine(stringAfterFirstX[numberIndex] + "  - first non-number");
        }

        var result = stringAfterFirstX.Substring(numberIndex);
        Console.WriteLine("Result = " + result);
    }
}

CADbloke

  • Bull Frog
  • Posts: 342
  • Crash Test Dummy
Re: Need Help Parsing a String
« Reply #3 on: April 13, 2015, 10:13:25 PM »
Quote
I have a bunch of data I'm working with and one of the fields is in this format:

   ABCxxXxxABCjklMZ

... I want to discard everything to the left side of the last integer and grab anything and everything that's to the right of it. I'm pretty sure I can do this with some code, but this looks like it might be a good time for a regular expression.

Code - C#:
// needs: using System.Text.RegularExpressions;
string resultString = null;
try {
        resultString = Regex.Replace(subjectString, @"(?:.*)(\d)(?!.*\d)(.*)", "$1$2", RegexOptions.IgnoreCase | RegexOptions.Multiline);
} catch (ArgumentException ex) {
        // Syntax error in the regular expression
}

Do you want to keep the last integer? If yes, keep the $1 in the replacement; otherwise your replacement is just $2.
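
To make the difference between the two replacement strings concrete, here is a small sketch that runs the same pattern with "$1$2" and with "$2". The wrapper names (`LastDigitSplit`, `KeepLastDigit`, `DropLastDigit`) are hypothetical, and the regex options from the snippet above are left out because, as far as I can tell, they do not change the result for single-line input like this.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LastDigitSplit
{
    // Group 1 is the last digit in the input, group 2 is everything after it.
    private const string Pattern = @"(?:.*)(\d)(?!.*\d)(.*)";

    // "$1$2" keeps the last digit in front of the tail.
    public static string KeepLastDigit(string s)
    {
        return Regex.Replace(s, Pattern, "$1$2");
    }

    // "$2" returns only what follows the last digit.
    public static string DropLastDigit(string s)
    {
        return Regex.Replace(s, Pattern, "$2");
    }

    public static void Main()
    {
        Console.WriteLine(KeepLastDigit("ABC12X21ABCjklMZ")); // 1ABCjklMZ
        Console.WriteLine(DropLastDigit("ABC12X21ABCjklMZ")); // ABCjklMZ
    }
}
```

Note that this keys on the last digit anywhere in the string, so it only gives the wanted tail when the tail itself contains no digits. Input with no digits at all comes back unchanged because nothing matches.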

See the screengrab for an explanation, that's http://www.regexbuddy.com/ and it's worth every cent. See also http://www.regexbuddy.com/regexmagic.html

edit: yay regex, of course I got it wrong on the 1st attempt. Also, added a test in AutoCAD (no, you haven't seen that plugin before) ;)

moar edit: this may be a clearer explanation (RegexBuddy generated this)...
Code - C#:
@"
(?:        # Match the regular expression below
  .          # Match any single character that is NOT a line break character (line feed)
     *          # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
)
(          # Match the regex below and capture its match into backreference number 1
  \d         # Match a single character that is a “digit” (any decimal number in any Unicode script)
)
(?!        # Assert that it is impossible to match the regex below starting at this position (negative lookahead)
  .          # Match any single character that is NOT a line break character (line feed)
     *          # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
  \d         # Match a single character that is a “digit” (any decimal number in any Unicode script)
)
(          # Match the regex below and capture its match into backreference number 2
  .          # Match any single character that is NOT a line break character (line feed)
     *          # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
)
"
« Last Edit: April 14, 2015, 04:30:57 AM by CADbloke »

Bill Tillman

  • Guest
Re: Need Help Parsing a String
« Reply #4 on: April 17, 2015, 07:49:17 AM »
Thanks. You guys are awesome. I'm reading lots more about regex stuff this week and it's really cool.

ChrisCarlson

  • Guest
Re: Need Help Parsing a String
« Reply #5 on: April 17, 2015, 07:57:24 AM »
I just can't understand regex but this site http://www.regexr.com/ has helped me quite a bit figuring individual ones out

CADbloke

  • Bull Frog
  • Posts: 342
  • Crash Test Dummy
Re: Need Help Parsing a String
« Reply #6 on: April 19, 2015, 08:48:50 AM »
Quote
this site http://www.regexr.com/ has helped me quite a bit figuring individual ones out
FYI, there are differences between the different regex engines. That one uses the JavaScript parser; the .NET engine is slightly different in places. http://www.regular-expressions.info/ is a great site by the guy who wrote RegexBuddy. http://stackoverflow.com/questions/22937618/reference-what-does-this-regex-mean/22944075#22944075 has more information than you could probably ever read.
« Last Edit: April 19, 2015, 08:53:55 AM by CADbloke »

Tuoni

  • Gator
  • Posts: 3032
  • I do stuff, and things!
Re: Need Help Parsing a String
« Reply #7 on: April 20, 2015, 06:11:24 AM »
Quote
Some people, when confronted with a problem, think “I know, I'll use regular expressions.”  Now they have two problems.

 :whistling:

CADbloke

  • Bull Frog
  • Posts: 342
  • Crash Test Dummy
Re: Need Help Parsing a String
« Reply #8 on: April 28, 2015, 06:12:20 AM »
Quote
Some people, when confronted with a problem, think “I know, I'll use regular expressions.”  Now they have two problems.

 :whistling:
heheh, for more on that ... http://blog.codinghorror.com/regular-expressions-now-you-have-two-problems/

BTW, it's a lot more than 2 - http://blog.codinghorror.com/regex-performance/ and http://www.regular-expressions.info/catastrophic.html

TLDR; it is entirely feasible that a badly done Regex can take days to compute. Whups.

.NET 4.5 has a timeout property because of this. Don't let that scare you, I have found them to be fast and I still have all my fingers.
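
As a rough illustration of that timeout, the sketch below uses the `Regex.IsMatch` overload that takes a `TimeSpan`, available from .NET 4.5, and treats a `RegexMatchTimeoutException` as "gave up". The helper name `IsMatchWithin`, the 250 ms budget and the nested-quantifier pattern are all just picked for the example. Whether that pattern actually hits the timeout depends on the runtime and the input, so no output is promised for it.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SafeRegex
{
    // Returns true or false for a completed match attempt,
    // or null when the engine gave up because the timeout elapsed.
    public static bool? IsMatchWithin(string input, string pattern, TimeSpan timeout)
    {
        try
        {
            return Regex.IsMatch(input, pattern, RegexOptions.None, timeout);
        }
        catch (RegexMatchTimeoutException)
        {
            return null;
        }
    }

    public static void Main()
    {
        TimeSpan budget = TimeSpan.FromMilliseconds(250);

        // A well-behaved pattern finishes long before the budget runs out.
        Console.WriteLine(IsMatchWithin("ABC12X21ABCjklMZ", @"\d+X\d+", budget));

        // Nested quantifiers like this are the classic catastrophic case;
        // on a backtracking engine this call may return null (timed out).
        bool? risky = IsMatchWithin(new string('a', 40) + "!", @"^(a+)+$", budget);
        Console.WriteLine(risky.HasValue ? risky.Value.ToString() : "timed out");
    }
}
```

Returning a nullable bool keeps the "timed out" case separate from "did not match", which matters if the input comes from users and a slow pattern should be reported rather than silently treated as a miss.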

BlackBox

  • King Gator
  • Posts: 3770
Re: Need Help Parsing a String
« Reply #9 on: June 26, 2015, 12:29:11 PM »
Disclaimer: Newest post I could find that seemed relevant.



Just wanted to share this site, which yielded a usable RegEx 'match' Method in seconds - highly recommend :

http://txt2re.com/index-csharp.php3