First newbie code project - coding a simple search engine w/ Perl.


njcajun
06-07-2001, 01:45 PM
Hi folks,

I got hold of some really buggy search engine code written in Perl, and I have just a few really basic questions to help me weed through a couple of pieces of code. I'm not really a Perl programmer, but it's a really simple search engine that just searches a colon-delimited text-file database.

I don't have a Perl reference here at the office, otherwise I'd consult it instead of wasting the precious time of everyone here, but maybe others can benefit from the answers...

-- What is '$_' ?
-- What do the 'pack' and 'hex' functions do ?
-- How is this translated into English: 's/%(...)/ /'
-- What is '/i' as in 'if ("$string" =~ /$url/i)...' and what's the '=~' do in that line?

I got the regex parts, and the loops, logical operators, scope and all that stuff looks pretty much like C or Korn. There is some other different stuff, but I've found most of it. The explanation I saw of '$_' sucked though.

Links are welcome - I'm really not so lazy that I need all the questions answered - I'll RTFM, just point me in the right direction.

Thanks.

EyesWideOpen
06-07-2001, 02:29 PM
Originally posted by njcajun:
-- What is '$_' ?

The default input and pattern-searching space.
http://dmitry.dn.ua/library/oreilly/link/perl/prog/ch02_09.htm

-- What do the 'pack' and 'hex' functions do ?

pack (http://www.perl.com/pub/doc/manual/html/pod/perlfunc/pack.html)
hex (http://www.perl.com/pub/doc/manual/html/pod/perlfunc/hex.html)

-- How is this translated into English: 's/%(...)/ /'

This means: find the first match of the pattern (a '%' followed by any three characters) and replace it with a space. The parentheses capture those three characters so they can be used later as $1.

-- What is '/i' as in 'if ("$string" =~ /$url/i)...' and what's the '=~' do in that line?

'/i' means to match the pattern case-insensitively, i.e., to treat uppercase and lowercase characters alike.

The '=~' (pattern-binding operator) tells Perl to look for a match of the regular expression '$url' in the variable '$string'.
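Here is a minimal sketch of that test in action. The values of $string and $url are made up for illustration. One thing worth knowing: because $url is interpolated into the pattern, any '.' or '?' in it acts as a regex metacharacter, so wrapping it in \Q...\E (the same as quotemeta) is the safer choice when $url is meant as plain text.

```perl
use strict;
use warnings;

# Hypothetical values standing in for the search engine's variables
my $string = "Visit WWW.Example.COM/docs for details";
my $url    = "www.example.com";

# =~ binds $string to the match; /i ignores case
my $plain = ($string =~ /$url/i) ? 1 : 0;

# Inside the pattern, each '.' in $url matches ANY character...
my $loose = ("wwwXexampleXcom" =~ /$url/i) ? 1 : 0;

# ...so \Q...\E (quotemeta) is safer when $url is literal text
my $strict = ("wwwXexampleXcom" =~ /\Q$url\E/i) ? 1 : 0;

print "plain=$plain loose=$loose strict=$strict\n";
# prints: plain=1 loose=1 strict=0
```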

[ 07 June 2001: Message edited by: EyesWideOpen ]

Gotenks
06-07-2001, 02:31 PM
Damn, you beat me to it, EyesWideOpen :)

:cool: Perl :cool:

Gotenks

[ 07 June 2001: Message edited by: Gotenks ]

njcajun
06-07-2001, 02:38 PM
Awesome!

Haven't checked out the links yet - but it's gotta be better than what I have so far.

After reviewing the code, I think I'm actually gonna have to pretty much rewrite the damn thing. Unfortunately, it's the only SIMPLE search engine I could find that didn't have all kinds of crazy header files, agents and all that stuff that I'm not really ready for.

I would much rather have done this in C. Can you write CGI in C, or is it an SSI at that point, or what?

TheLinuxDuck
06-07-2001, 02:52 PM
Originally posted by njcajun:
-- What is '$_' ?

$_ is a default perl variable that is designed to be used in place of defining/declaring your own. Many of the functions in perl will use this variable if no variable is specified. It's kinda like a lazy man's dream..

Take this loop:

for (@array) {
    print "Array item: ";
    print;
    # same as:
    # print $_;
    print "\n";
    # more concisely: print "Array item: $_\n";
}


The for(@array) steps through each item of the array @array and aliases $_ to it in turn (so changing $_ inside the loop changes the array element itself). The second print isn't given a variable, so it prints what is in $_.

Or:

$_="Big brown bags of blue bile";
my(@array)=split /\s/;
# same as:
# my(@array)=split /\s/, $_;


split is given a regexp to split the string on (\s matches a single whitespace character), but since it isn't given a string, it assumes $_.
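A slightly longer sketch of the same idea, with made-up data: chomp, lc and a bare pattern match all fall back to $_ when given nothing to work on. Because for aliases $_ to each array element rather than copying it, the chomp actually modifies @lines.

```perl
use strict;
use warnings;

my @lines = ("Big Brown Bags\n", "red apples\n", "blue bile\n");
my @seen;

for (@lines) {            # $_ is aliased to each element in turn
    chomp;                # chomp($_): strips the newline from the element itself
    next unless /b/i;     # same as: next unless $_ =~ /b/i;
    push @seen, lc;       # lc($_)
}

print join(",", @seen), "\n";   # big brown bags,blue bile
print "[$lines[0]]\n";          # [Big Brown Bags]  (newline gone: $_ was an alias)
```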

-- What do the 'pack' and 'hex' functions do ?

pack is a way to push values of different types (ints, longs, single bytes, bits, hex strings, etc.) into a single scalar string, according to a template. The packed items will not be numbers written out as text, but the actual bytes of each item's binary representation. For example:

my($string)=pack "I", 19000;
print $string, "\n";

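prints (on a little-endian machine such as x86, because "I" packs a native unsigned int in the machine's own byte order)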

8J



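That's because 19000 is 0x4A38, and the low byte is stored first:

ascii('J') = 74     74 * 256 = 18944
ascii('8') = 56              +    56
                               -----
                               19000

The other two bytes of the 4-byte int are zero and print as invisible NUL characters.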
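The same byte arithmetic can be checked in code. This sketch swaps "I" for "V" (a 32-bit unsigned int that is always little-endian) so the result does not depend on the machine, and uses unpack, pack's inverse, to read the bytes back.

```perl
use strict;
use warnings;

# "V" = 32-bit unsigned, always little-endian ("I" uses the native format)
my $packed = pack "V", 19000;

# unpack "C*" returns each byte as a number: 19000 = 0x4A38
my @bytes = unpack "C*", $packed;
print "bytes: @bytes\n";          # bytes: 56 74 0 0  ('8', 'J', NUL, NUL)

# Rebuild the number by hand: low byte first
my $rebuilt = $bytes[0] + $bytes[1] * 256;
print "rebuilt: $rebuilt\n";      # rebuilt: 19000

# unpack with the same template reverses pack
my ($n) = unpack "V", $packed;
print "unpacked: $n\n";           # unpacked: 19000
```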


hex converts hexadecimal values into decimal values:

~> perl -e "print hex \"1A9BB9\", \"\n\""
1743801


So, 0x1A9BB9 is 1743801.
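A few more hex cases, as a quick sketch: hex also accepts a leading 0x and lowercase digits, and sprintf with %X goes the other way, from a number back to a hex string.

```perl
use strict;
use warnings;

my $dec1 = hex "1A9BB9";      # 1743801
my $dec2 = hex "0x1a9bb9";    # also 1743801: leading 0x and lowercase are fine
my $ff   = hex "ff";          # 255

# The reverse direction: format a number as uppercase hex
my $back = sprintf "%X", 1743801;   # "1A9BB9"

print "$dec1 $dec2 $ff $back\n";    # 1743801 1743801 255 1A9BB9
```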

-- How is this translated into English: 's/%(...)/ /'

By itself, that line would be saying "Find the first occurrence of a percent sign followed by any three characters (other than a newline) in the default variable $_, and replace it with a space." The () are used to remember (capture) what was matched, here the three characters after the %, in the variable $1. Am I safe in assuming that a line coming next in the script uses the variable $1?
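The rest of that line isn't shown, but a very common CGI idiom combines all three things asked about here: a s/%(..)/.../ substitution captures the two hex digits after each % into $1, hex turns them into a number, and pack "C" turns that number back into a character. That the search engine does exactly this is an assumption; the sketch below (with a hypothetical url_decode helper) shows the idiom on its own, using the tighter [0-9A-Fa-f]{2} in place of (..), since URL escapes are a % plus two hex digits.

```perl
use strict;
use warnings;

# Hypothetical helper: decode one URL-encoded query-string value
sub url_decode {
    my ($s) = @_;
    $s =~ tr/+/ /;                                      # '+' stands for a space
    $s =~ s/%([0-9A-Fa-f]{2})/pack("C", hex($1))/eg;    # %XX -> that character
    return $s;
}

my $decoded = url_decode("blue+bile%2C%20big%21");
print "$decoded\n";   # blue bile, big!
```

The /e modifier makes Perl evaluate the replacement as code rather than text, and /g repeats it for every escape in the string.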

-- What is '/i' as in 'if ("$string" =~ /$url/i)...' and what's the '=~' do in that line?

=~ is the binding operator: it applies a regexp match (or substitution) to the string on its left instead of to $_. The i tells the match to ignore case.

I hope this helps!

TheLinuxDuck
06-07-2001, 02:55 PM
Guess I spent too much time typing that out.. I done got beat to it!!

Sorry for the double answer post!

EyesWideOpen
06-07-2001, 03:06 PM
Originally posted by TheLinuxDuck:
Guess I spent too much time typing that out.. I done got beat to it!!

Sorry for the double answer post!

Never can have too much info. ;)

TheLinuxDuck
06-07-2001, 03:12 PM
Originally posted by EyesWideOpen:
Never can have too much info. ;)

True.. true.. the funniest part is that I recently ranted in the offtopic forum about people who post the same info that someone else posted in a thread.. but I just found out how that can happen.

::sigh::

TheLinuxDuck
06-07-2001, 03:15 PM
njcajun:

Yes, you can use C to code CGIs. Many people do. A compiled C program is still a CGI (just an executable rather than a script), not an SSI. The main reason that a lot of people use perl is that it is very easy to write a fairly complicated CGI in perl, whereas C has its nuances that must be catered to, which make it less ideal for quick projects and the like. C programs also have to be compiled for the same type of system that the web server runs on, but a perl script will generally work unchanged on most systems.

Now, don't get me wrong: a C CGI can be written quickly once you know what you're doing.. perl just makes it easier.

By using C, though, you gain some speed in places where perl can't keep up: for instance, there is no interpreter load time (not an issue if the web server uses mod_perl), and C code generally runs faster (from what I understand).

There's my .001 cents.

njcajun
06-07-2001, 03:30 PM
I know perl is a more popular language for creating CGIs, but I HATE fscking Perl! No offense to those who use it all the time and love it, I just personally can't stand it.

No matter how simple a thing you're doing, Perl always seems to make it look way sloppier than it needs to be, or sloppier than it would be in another language. I know some of the sloppiness is the regex, but some of it isn't.

In my eyes, code is easier to read when variables are declared before they are used and must be called by name in order to reference their values. This is apparently not the case in Perl. That's great if you are already good at Perl and you're lazy, but it makes it a bit harder to dig into if you're a newbie.

I like the more structured world of C. Yeah, there are a lot of things that are a pain in the *** in C, but at least (to me), the code does more to explain itself, which benefits the less experienced a lot.

You guys have been a great help, and I appreciate it.

TheLinuxDuck
06-07-2001, 03:49 PM
njcajun:

I can totally relate to what you're saying. Perl does have some things you can do to force better habits.


#!/usr/bin/perl -w
use strict;


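Those two should be used in every perl script. They are like an English teacher: use strict forces you to declare variables (with my) before you use them (the exception to this would be filehandles), and -w warns you about questionable code, such as using a variable that has no value yet.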
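A small sketch of what use strict actually catches. The undeclared variable is compiled inside a string eval so the script can print the error instead of refusing to run; both variable names are made up.

```perl
#!/usr/bin/perl -w
use strict;

my $declared = 42;        # fine: declared with my

# A string eval compiles its code at run time, so the compile error
# that strict raises for an undeclared variable can be caught here
my $ok  = eval 'use strict; $undeclared = $declared; 1';
my $err = $@;

print $ok ? "compiled\n" : "strict caught it: $err";
```

Without use strict, the same line would quietly create a global $undeclared, which is exactly the kind of typo-driven bug strict is there to stop.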

Of course, just because perl LETS you be sloppy, it doesn't mean you SHOULD be sloppy.

Approach the perl code as you would C code. The main difference you'll find is the way that functions are defined and handled. But you can do some things similar to C.

For example:


int testAdd(int x1, int x2);

int testAdd(int x1, int x2)
{
    int sum;
    sum = x1 + x2;
    return sum;
}

in perl:

sub testAdd($$);

sub testAdd($$)
{
    my ($x1, $x2) = @_;
    my $sum;
    $sum = $x1 + $x2;
    return $sum;
}


Certainly this isn't a great example, but it just shows you how you can format your perl code to be more structured.
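To show what the ($$) prototype buys you, here is a small sketch that calls testAdd both correctly and with a missing argument. For direct calls, perl checks the argument count at compile time, a little like a C compiler checking a call against its prototype; the bad call is compiled in a string eval so the error can be printed. Prototypes are weaker than C's, though: they are skipped for method calls and for &testAdd(...) style calls.

```perl
use strict;
use warnings;

# Declare first, define later, as in the C version
sub testAdd($$);

my $sum = testAdd(2, 3);

# Compiled at run time so the "Not enough arguments" error can be caught
my $ok  = eval 'testAdd(1); 1';
my $err = $@;

sub testAdd($$)
{
    my ($x1, $x2) = @_;
    return $x1 + $x2;
}

print "sum=$sum\n";                     # sum=5
print $ok ? "no error\n" : "error: $err";
```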

I'm in the same boat as you, in that I come from a very structured C background into perl. Now, I love perl, because I've gotten to know some of its nuances and tricks. And, yes, occasionally I slip and don't structure it very well...

But as far as declaring variables up front goes: C++ and Java both let you declare them anywhere in the code, so that isn't as taboo as it might have been in pre-C++ days.

Anyhow, I'd say to you: don't give up on perl just because it appears ugly. It's powerful and flexible, and can be used to do some things that C isn't very ... nice ... about, regexps for example. And regexps are going to be ugly in any language... (^=

M2CW

Ben Briggs
06-07-2001, 08:05 PM
njcajun:

You smelly piece of rat-infested swap goo, I just answered this over at LNB!


NO DOUBLE POST
NO DOUBLE POST


I'm just kidding, but I did answer over at LNB.

njcajun
06-07-2001, 11:20 PM
I'm SOOOOO Busted (note capital 'B'). :eek:
:o
I posted at LNB, and after a FEW HOURS of not even getting VIEWED, I posted it here, because I never have to wait that long. Sure enough, my questions were answered in less than ONE hour.

At least I didn't cross post forums at either site. That would really be horrible. HEY - YOU CROSS ANSWERED!!! :rolleyes:

[ 07 June 2001: Message edited by: njcajun ]

jemfinch
06-08-2001, 01:29 AM
Originally posted by njcajun:
I know perl is a more popular language for creating CGIs, but I HATE fscking Perl! No offense to those who use it all the time and love it, I just personally can't stand it.

No matter how simple a thing you're doing, Perl always seems to make it look way sloppier than it needs to be, or sloppier than it would be in another language. I know some of the sloppiness is the regex, but some of it isn't.


I agree. That's why I use Python, not Perl. I'm sure if you look into it, you'll find that it gives you the speedy development of perl without the sloppiness or ugly code that perl generally entails. Take a look at it, you won't regret it.

Jeremy