A blog to help novice Factor developers (including myself) and those more familiar with other programming languages with Factor.

Tuesday, February 27, 2007

Practical Factor - writing a reddit reader and some other general factor comments

For the last month or so, I have wanted a cleaner way to read the reddit feeds than opening up a browser and browsing to the various reddit locations. I knew a simple reader wouldn't take more than a couple of hours to write, and I hadn't really worked with Factor's socket IO libraries, so choosing Factor to implement it wasn't even up for debate. (Actually, I thought for a second I might use Python, but I knew the Factor code would turn out to be much cleaner.)

I had my first little unit test in about one line of code (i.e. "http://theurl.com" http-get); development time = 20 secs, hehe. I am being a little dishonest, because this didn't work right off the bat: there was a small issue with the Windows http/io library in 0.87, which is fixed in the code in the repository (a non-issue that I will discuss a little bit later).

In all honesty, 'http-get' wasn't far from being all that I needed to write the reader. The http-get word takes a URL string as input and returns the status code, the returned HTTP headers, and the content as a string. So if you wanted to connect to Google at the Factor prompt, you would just do "http://www.google.com" http-get, and the following might end up on the stack: "<html>...google content...</html>" H{ # # # } 200
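
To make the stack picture concrete, here is a small sketch in the same 0.87-era dialect used in this post. It assumes the http-client vocabulary is already loaded, and that http-get leaves the content string on top of the stack, which is what the reader code relies on when it calls 2nip.

```factor
! Sketch only: assumes the 0.87-era http-get ( url -- code headers content )
"http://www.google.com" http-get   ! stack: code headers content
2nip                                ! stack: content (code and headers dropped)
print                               ! write the page source to the console
```

If you only care about the status code, replacing the last two lines with 2drop . should drop the content and headers and print what is left.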

Not complicated at all. That pretty much covers the actual URL request; parsing the reddit RSS feed is about as simple. I based the following implementation on the 'yahoo-search' library (discussed in an earlier Factor blog), so the XML parser implementation came to about 5 lines of code.

The URL request code:

: feed-redditer ( url name -- )
#! Print the feed name, fetch the URL, then parse and print the results
"\n...reading feed [ " write write " ]" print
http-get 2nip string>xml parse-reddit print-reddit ;

: redditer ( -- )
#! Reddit call without loop and without support for proxy
"http://reddit.com/new" "home.new" feed-redditer ;

Parsing the RSS XML String returned from the get request into an array of tuples:

TUPLE: reddit-result title link description ;

: parse-reddit ( xml -- seq )
#! Parse the XML tags and output to the result tuple
"item" tags-named* [
{ "title" "link" "description" }
[ tag-named children>string ] map-with
first3 <reddit-result>
] map ;

Let's take these few lines apart. First, identify what looks familiar: xml, and the strings "item", "title", "link" and "description".

Assume an XML string will be on the stack as input to the 'parse-reddit' word definition. item, title, link, description: those look like the RSS tags for each item in a feed. For example:

<item>

<title>Major Bush Scandal. Bush admin funneling money to Sunni groups linked to Al Qaeda.</title>
<link>http://reddit.com/goto?rss=true&id=16v5k</link>
<dc:date>2007-02-27T12:38:47.802517-05:00</dc:date>
<description><a href="http://skeletonproject.com/2007/02/27/major-bush-scandal-bush-admin-funneling-money-to-sunni-groups-linked-to-al-qaeda/">[link]</a><a href="http://reddit.com/info/16v5k/comments">[more]</a></description>
</item>

I don't know what the word tags-named* is used for, but I can easily look it up with the help search (personally I like \ tags-named* help).

Word Description for tags-named*

returns a sequence of all tags of a matching name, recursively searching children and children of children

tags-named* ( tag name/string -- tags-seq )

And similarly, look up tag-named and children>string.

The map-with combinator (a variant of map) is something you might have seen in Lisp. Here is the Lisp definition for map (mapcar), which works similarly in Factor:

Given a function and one or more lists, mapcar applies the function successively to the lists' elements in order, collecting the results in a new list.
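
For a feel of how map-with differs from plain map, here is a small sketch in the same 0.87-era dialect as the code above (later Factor releases spell this differently). The extra object is passed to the quotation along with each element, which is how parse-reddit hands the same item tag to tag-named for each of the three tag names.

```factor
! map-with ( obj seq quot -- newseq )
! The quotation is called with the extra object and one element: 10 elt
10 { 1 2 3 } [ + ] map-with .   ! prints { 11 12 13 }

! The same result with plain map, where the quotation only sees the element
{ 1 2 3 } [ 10 + ] map .        ! prints { 11 12 13 }
```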

Practical Usage

The image below shows the output of the reader. It reads four of the reddit feeds and updates every 30 seconds. And because the feeds are output to the console, it at least looks like you are working on something important.

The code provided below is a little more verbose than it needs to be: because I work at Fort Knox, I had to modify the existing http-client code in order to bypass my web proxy. So if you are behind a firewall or some other impediment to information, the added code is needed to support HTTP/1.1 and proxy connections.

Figure A: Output of the reddit reader, which updates the reddit feeds every 30 seconds and outputs them to the display

: print-reddit ( seq -- )
#! Pretty print the sequence into a small table
"==== Printing Reddit Feed ====" print
[ " Title:" write reddit-result-title print ] each
"==== End Feed ====\n" print ;
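
The reddit-result-title word used in print-reddit is one of the accessors that the TUPLE: line defines. As a quick way to see what that line gives you, the sketch below builds one result by hand and reads two slots back. It assumes the 0.87-era convention, where TUPLE: defines a <reddit-result> constructor taking the slots in order and accessors named reddit-result-title, reddit-result-link and reddit-result-description; the title and URL values are made up for illustration.

```factor
! Sketch only: hand-built tuple with made-up values (not real feed data)
"Some title" "http://example.com/story" "Some description" <reddit-result>
dup reddit-result-title print   ! prints: Some title
reddit-result-link print        ! prints: http://example.com/story
```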

Full RSS Reader Client Source (extra code may not be needed if not behind some web proxy)

http://docs.google.com/Doc?id=dq6cjjg_33vz4qh3

Factor is just cool

I remember my first coding experiences with Basic when I was about 10 (I started out doing this in daycare). While working in Basic is not overly exciting, it is exciting to type in some commands and have those commands turn into flashing graphics, beeps and sounds. Going from zero exposure to computers to making the machine actually do something was an eye-opening experience, especially as a kid. I later moved on to the Logo programming language, also for the purpose of creating graphics. It felt a little cleaner, and as opposed to the simple block graphics of Basic, I could actually create pretty sophisticated images with Logo. Anyway, working with both of them was still more rewarding than playing with my lackluster Lego set.

Back to the present: working with Factor brings back some of those memories. I am working with an exciting new tool that encourages a new way of thinking to accomplish a particular task, working with clean, concise code, and interacting with a REPL that is hands down more innovative than a lot of the environments that are out there now. It is just cool. Practical too: so far it seems to have been ported to most of the major platforms, and from reading the blogs, it seems like each port only takes a couple of days. Hell, it takes a couple of days to weeks to port a J2EE/Java application to a different server, let alone a different platform. If you are asking, "What large applications have been written with Factor?", that is an easy one: the Factor code base and compiler are mostly written in Factor, so there is a large code base to use as a case study.

Figure 1: get help by just typing in the particular word and help (i.e. \ dup help will display information on dup)

And because Google blogs can't handle the formatting, here is the Google Docs version. (You guys really need to work together.)

http://docs.google.com/Doc?id=dq6cjjg_28gdxt8t

4 comments:

Berlin Brown said...

Google blogs just toasted my formatting. But docs.google outputs just fine. Sorry for the bad formatting.

http://docs.google.com/Doc?id=dq6cjjg_28gdxt8t

smoothtalker said...

Great article; it was pretty surprising to see someone write (in part) about a library I wrote. Two comments: for processing XML, Slava and I were planning on actually making a query language to automate the creation of XML processing code. Even if it's relatively terse the way things are, there is a good deal of repetition and boilerplate, as you can see in your reddit parser, which was basically the same as the Yahoo parser.

On another topic, I saw in your reddit code the fragment "3 [ 2dup ] times". This works, but it's generally considered bad style. The code is simpler to understand if (in this case) you just put 2dup at the beginning of each of the next 3 lines. In other similar cases, 2keep is useful.

Daniel Ehrenberg said...

oops, that last comment was actually me, logged into the wrong account.

Berlin Brown said...

Thanks for the comments; yea, I liked the xml library, it is pretty easy to follow. A Java approach would be so verbose: getDocument, getTag, getAttribute, iterate...; it is always a mess.

I am sure the next approach will be just as easy.

On the dup thing, that was the first thing that came to mind.

About Me

Berlin Brown
He is a software developer with a diverse background in a multitude of different environments. He has worked with the CDC/SAIC and Geographic Information Systems (GIS), and now works for a Financial Services firm. You can find him on freenode as blbrown, on twitter, and at botlist and botnode.com. Berlin can be found in Atlanta, Georgia. Please copyright any work to me (Berlin Brown) but you are free to do anything you like with it. All text is placed under a Creative Commons license. All code is placed under a New BSD license (unless noted otherwise).