Perl - need regex help
iDxMan
05-01-2001, 10:45 PM
I've suddenly gone brain dead. Please help! :D
Here's what I need out of this string.
sample data:
Identifier: Some extra crap in the line: More crap: STANZA1=WORD1 WORD2 WORD3 STANZA2=STUFF123 STANZA3=MORE STUFF STANZA4=4327780917801 STANZA5=03/01/2001
I need words 1-3 in stanza1 -- although there might be only 1 or 2 at times.
The stanzas are the same, but they aren't always there, so I can't depend on #2 being there in order to grab everything in between.
There are only 5 possibilities for the stanzas, so I could include them all, but who wants that?
So far this works *IF* #2 is there:
($var) = /.+STANZA1=(.+)STANZA2=/;
so $var is "WORD1 WORD2 WORD3"
If #2 isn't there I don't get the data. In various other tests I get the data, but the next stanza name is in there too.
Any ideas?
Thanks...
Mikey123
05-01-2001, 11:31 PM
how about:
(@match)= split(/STANZA\d+=/);
it leaves you the text before the first stanza as the first element, but splits it up correctly after that, I believe.
[ 01 May 2001: Message edited by: Mikey123 ]
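A minimal sketch of what that split returns on the sample line from the first post. The variable names are only for illustration, and the trailing-space trim is an addition, not part of the suggestion above.

```perl
use strict;
use warnings;

# Sample line from the first post.
my $line = 'Identifier: Some extra crap in the line: More crap: '
         . 'STANZA1=WORD1 WORD2 WORD3 STANZA2=STUFF123 STANZA3=MORE STUFF '
         . 'STANZA4=4327780917801 STANZA5=03/01/2001';

my @match    = split /STANZA\d+=/, $line;
my $preamble = shift @match;    # everything before STANZA1=
s/\s+$// for @match;            # each value keeps the space before the next key

print "$_\n" for @match;        # first line printed: WORD1 WORD2 WORD3
```

This only works while every key really has the form STANZA followed by digits.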
iDxMan
05-02-2001, 07:00 AM
That would work if the stanzas were actually named accordingly. ack! Guess I should have mentioned that..
They are something like:
NAME=
DOB=
SSN=
PROVIDER ID=
MEMBER ID=
The NAME= always seems to be first and that's the one I need. So far, depending on the situation, either MEMBER ID or SSN is next..
-r
[ 02 May 2001: Message edited by: iDxMan ]
iDxMan
05-02-2001, 07:30 AM
This seems to work, but I would rather have it look for the next stanza after NAME -- not hard code ssn and member id in.. Who knows if the return data will change..
($var) = /.+NAME=(.+)(SSN=|MEMBER ID=)/;
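One way to avoid scattering key names through the regex is to keep them in a single list and build the alternation from it, so a format change means editing one array. This is only a sketch: it assumes the five keys listed earlier are the complete set, and `name_of` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Assumption: these five keys (from the earlier post) are the only ones that occur.
my @keys   = ('NAME', 'DOB', 'SSN', 'PROVIDER ID', 'MEMBER ID');
my $key_re = join '|', map { quotemeta } @keys;

# Hypothetical helper: return the NAME value, or undef if there is no NAME=.
sub name_of {
    my ($line) = @_;
    # Lazy capture, ended by the next known key or by the end of the line.
    return $line =~ /\bNAME=(.+?)\s*(?=(?:$key_re)=|\z)/ ? $1 : undef;
}

print name_of('Id: junk: NAME=FIRST MID LAST MEMBER ID=123 DOB=01/01/1970'), "\n";
print name_of('Id: junk: NAME=FIRST LAST SSN=000000000'), "\n";
print name_of('Id: junk: NAME=FIRST'), "\n";
```

The lookahead also accepts end of line, so NAME still comes back when nothing follows it.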
YaRness
05-02-2001, 09:07 AM
Originally posted by iDxMan:
Identifier: Some extra crap in the line: More crap: STANZA1=WORD1 WORD2 WORD3 STANZA2=STUFF123 STANZA3=MORE STUFF STANZA4=4327780917801 STANZA5=03/01/2001
I need words1-3 in stanza1
did this with perl:
$_ = 'Identifier: Some extra crap in the line: More crap: STANZA1=WORD1 WORD2 WORD3 STANZA2=STUFF123...';
/STANZA1=((?:[^\s]+\s+){1,3})/ and $var = $1;
print $var;
that will match if there is at least one word (a word in this case being any run of non-whitespace characters followed by one or more whitespace characters) in STANZA1, and will pull out up to three words.
iDxMan
05-02-2001, 10:02 AM
Thanks YaRness... But it's a bit too greedy.
$_ = 'Identifier: Some extra crap in the line: More crap: STANZA1=WORD1 WORD2 STANZA2=STUFF123...';
/STANZA1=((?:[^\s]+\s+){1,3})/ and $var = $1;
Now $var = "WORD1 WORD2 STANZA2"
or if stanza1 only has 1 value then $var is "WORD1 STANZA2=STUFF123"
One other problem is that stanza2 can be in the form of:
STANZA2=DATA
or
STANZA2 ID=DATA
I'll play around with that for a bit.. It's a start. :D
-r
[ 02 May 2001: Message edited by: iDxMan ]
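The over-capture happens because [^\s]+ also matches the "=" in the next key. A small sketch on a made-up line (the helper names are hypothetical); the second pattern excludes "=" from a word, which helps only while keys are single words:

```perl
use strict;
use warnings;

# The {1,3} pattern from the post above.
sub first_three {
    my ($line) = @_;
    return $line =~ /STANZA1=((?:[^\s]+\s+){1,3})/ ? $1 : undef;
}

# Variant: a "word" may not contain "=", so KEY=VALUE tokens are not counted.
sub words_only {
    my ($line) = @_;
    return $line =~ /STANZA1=((?:[^\s=]+\s+){1,3})/ ? $1 : undef;
}

my $line = 'junk: STANZA1=WORD1 WORD2 STANZA2=STUFF123 STANZA3=MORE STUFF';
print '[', first_three($line), "]\n";   # [WORD1 WORD2 STANZA2=STUFF123 ]
print '[', words_only($line),  "]\n";   # [WORD1 WORD2 ]
```

With a key like "STANZA2 ID=", the variant would still pick up STANZA2 as a third word.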
Mikey123
05-02-2001, 10:08 AM
(@match)= split(/\w+=/);
shift @match;
split on a series of word characters followed by an equals sign, then shift off the first element (whatever precedes the first key). If there is a space in your identifier you will have to modify the above regex a little. It might be easier to ensure you don't use spaces, i.e. Member_ID= instead of Member ID=
Thinking a little more about the "MEMBER ID" thing, I think you will either have to ensure no spaces are used for identifiers or separate each block with a comma or something. Otherwise there is no way of knowing if MEMBER is part of the previous stanza or part of the upcoming identifier.
[ 02 May 2001: Message edited by: Mikey123 ]
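A short sketch of the ambiguity described above, on made-up lines. `split_values` is a hypothetical wrapper around the split, with a trailing-space trim added:

```perl
use strict;
use warnings;

# Hypothetical helper: split on KEY= and drop whatever precedes the first key.
sub split_values {
    my ($line) = @_;
    my @match = split /\w+=/, $line;
    shift @match;
    s/\s+$// for @match;    # trim the space left before each following key
    return @match;
}

# Key with a space: MEMBER ends up glued to the name.
my @spaced = split_values('Id: junk: NAME=FIRST LAST MEMBER ID=12345 DOB=01/01/1970');
print join('|', @spaced), "\n";    # FIRST LAST MEMBER|12345|01/01/1970

# Same data with an underscore in the key.
my @joined = split_values('Id: junk: NAME=FIRST LAST MEMBER_ID=12345 DOB=01/01/1970');
print join('|', @joined), "\n";    # FIRST LAST|12345|01/01/1970
```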
YaRness
05-02-2001, 10:22 AM
yup, sure did goof that one up. replace the regex with
/STANZA1=((?:[^\s]+\s+)+)STANZA2=/
it will get all the words, or fail to match if there are none (tested it too!). you can substitute the final + quantifier in ((?:[^\s]+\s+)+) for something like {1,3} if you want between 1 and 3 words, or {2,} i think if you want 2 or more words. see the perlre man page for more info ('perldoc perlre' or 'man perlre', or browse to it through your activestate documentation... whichever works for ya).
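For the case from the first post where STANZA2 may be missing, one alternative (not the pattern above) is a lazy capture that stops at whatever KEY= comes next, or at the end of the line. A sketch, assuming keys are single words; `stanza1` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: value of STANZA1 up to the next one-word KEY= or end of line.
sub stanza1 {
    my ($line) = @_;
    return $line =~ /STANZA1=(.+?)\s*(?=\w+=|\z)/ ? $1 : undef;
}

print stanza1('junk: STANZA1=WORD1 WORD2 WORD3 STANZA2=STUFF123'), "\n";  # WORD1 WORD2 WORD3
print stanza1('junk: STANZA1=WORD1 STANZA4=4327780917801'), "\n";         # WORD1
print stanza1('junk: STANZA1=WORD1 WORD2'), "\n";                         # WORD1 WORD2
```

A two-word key such as "STANZA2 ID=" breaks the assumption: its first word is returned as part of the value.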
iDxMan
05-02-2001, 10:24 AM
Originally posted by Mikey123:
(@match)= split(/\w+=/);
shift @match;
split on a series of word characters followed by an equals sign, then shift off the first element (whatever precedes the first key). If there is a space in your identifier you will have to modify the above regex a little. It might be easier to ensure you don't use spaces, i.e. Member_ID= instead of Member ID=
Thanks for the ideas.. I'll poke around..
For other parts of the data I have been converting space to an underscore for ease of use.. But this part is a bit different.
data:
Date of Birth: 01/01/1970
storage:
$key{DATE_OF_BIRTH} = "01/01/1970";
-r
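The space-to-underscore storage described above could look roughly like this. It is a sketch only: the "Label: value" line format is taken from the single example given, and `store_field` is a hypothetical name.

```perl
use strict;
use warnings;

my %key;

# Hypothetical helper: turn "Some Label: value" into $key{SOME_LABEL} = "value".
sub store_field {
    my ($text) = @_;
    my ($label, $value) = $text =~ /^\s*([^:]+?)\s*:\s*(.*?)\s*$/ or return;
    (my $name = uc $label) =~ s/\s+/_/g;    # upper-case, spaces to underscores
    $key{$name} = $value;
    return $name;
}

store_field('Date of Birth: 01/01/1970');
print $key{DATE_OF_BIRTH}, "\n";    # 01/01/1970
```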
Mikey123
05-02-2001, 10:34 AM
I edited my previous post to say this, but if you do not either delimit the blocks or use _ for items like MEMBER_ID, it will be impossible to know whether "previousdata Stanza2" is an identifier or whether "previousdata" belongs to the previous group.
iDxMan
05-02-2001, 10:37 AM
Originally posted by Mikey123:
Thinking a little more about the "MEMBER ID" thing, I think you will either have to ensure no spaces are used for identifiers or separate each block with a comma or something. Otherwise there is no way of knowing if MEMBER is part of the previous stanza or part of the upcoming identifier.
Exactly.. Now the real fun begins when they start to change the format of the return data. :)
(`they` being the company who returns result data for us to update our system with.)
Boy I love last minute hacks due to new software that doesn't have the same features as the old version. lol
-r
iDxMan
06-07-2001, 12:02 PM
Old post, but I finally found time to look at it again, so here's an update.. This seems to work best so far..
I kept running into the problem of matching one but not the other.
ie:
NAME=FIRST MID LAST STANZA2=THIS
but not
NAME=FIRST MID LAST STANZA1 IDNUM=THIS
or it being too greedy..
ie: NAME=FIRST LAST STANZA1 IDNUM=THIS
would grab "FIRST LAST STANZA1" as the value..
anyways:
Probably could combine this into 1 regex, but this will have to do..
-r
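# Try a two-word key (e.g. "STANZA1 IDNUM=") first, then fall back to a one-word key.
# Caveat: "LAST STANZA2=" also fits the two-word pattern, so a multi-word name
# followed by a one-word key loses its last word in the first branch.
if (/NAME=((?:\w+\s){1,3})\w+\s\w+=/) {
    $name = $1;
} else {
    ($name) = /NAME=((?:\w+\s){1,3})\w+=/;
}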
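If hard-coding the key names turns out to be acceptable, the whole line can be split into key/value pairs in one pass, which sidesteps the one-word versus two-word guess. A sketch under that assumption; the key list comes from the earlier post and `parse_stanzas` is a hypothetical name.

```perl
use strict;
use warnings;

# Assumption: the full set of keys is known in advance.
my @keys   = ('NAME', 'DOB', 'SSN', 'PROVIDER ID', 'MEMBER ID');
my $key_re = join '|', map { quotemeta } @keys;

# Hypothetical helper: return a hash ref of KEY => value for one line.
sub parse_stanzas {
    my ($line) = @_;
    # With a capturing group, split returns: preamble, key, value, key, value, ...
    my ($preamble, @pairs) = split /\b($key_re)=/, $line;
    my %field;
    while (my ($k, $v) = splice @pairs, 0, 2) {
        $v = '' unless defined $v;    # key at end of line with no value
        $v =~ s/^\s+|\s+$//g;         # trim surrounding whitespace
        $k =~ tr/ /_/;                # MEMBER ID -> MEMBER_ID
        $field{$k} = $v;
    }
    return \%field;
}

my $rec = parse_stanzas('Id: junk: NAME=FIRST MID LAST MEMBER ID=12345 DOB=01/01/1970');
print "$rec->{NAME}\n";         # FIRST MID LAST
print "$rec->{MEMBER_ID}\n";    # 12345
```

If the return format changes, only the @keys list needs updating.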