Journler development has ended. Sprouted is shutting down. User support will continue indefinitely.

Introducing Parsers 1.0b1 - text parsing and data mining
phildow
Site Admin


Joined: 11 Dec 2005
Posts: 3407
Location: Berkeley

Posted: Fri May 23, 2008 4:19 pm    Post subject: Introducing Parsers 1.0b1 - text parsing and data mining
Download Parsers 1.0b1
http://www.getsprouted-mirror.com/downloads/Parsers1.0b.zip
[[ Mac OS 10.5 Leopard Only ]]

Screenshot
http://www.getsprouted-mirror.com/screenshots/ParsersScreenshot.png

Alright, I’m pretty excited about this! In my free time I’ve been working on another small application, Parsers. I believe it’s finally ready for open beta testing.

Parsers is “drag and drop text parsing and data mining”. The app is designed to help you preprocess data so that you're left with only the information you need, in a format you can easily apply elsewhere. In a visual environment you create and test reusable, rules-based documents for parsing well-formatted text.

For example, let's say you want to download your local weather information from an online feed. The temperature, conditions and forecast change by the hour, but the HTML in which that information is contained otherwise stays the same. That is, the relevant information or variables change but the format remains constant. Parsers helps you extract the temperature, conditions and forecast from the feed so that you can use the information elsewhere: in a widget, on your own webpage or in a spreadsheet, for example.
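To make the "constant format, changing variables" idea concrete, here is a minimal Python sketch of the same kind of extraction done by hand with regular expressions. It is not how Parsers works internally; the feed markup, the class names and the `extract_fields` helper are all made up for illustration.

```python
import re

# Hypothetical feed snippet. The markup and class names are assumptions
# made for this example, not taken from any real weather feed.
SAMPLE_FEED = """
<div class="weather">
  <span class="temp">68 F</span>
  <span class="conditions">Partly Cloudy</span>
  <span class="forecast">Clearing by evening</span>
</div>
"""

# One pattern per variable: the surrounding markup is the constant part,
# the captured group is the part that changes from hour to hour.
FIELD_PATTERNS = {
    "temperature": re.compile(r'<span class="temp">(.*?)</span>'),
    "conditions": re.compile(r'<span class="conditions">(.*?)</span>'),
    "forecast": re.compile(r'<span class="forecast">(.*?)</span>'),
}


def extract_fields(html):
    """Return a dict of field name -> extracted text (None if not found)."""
    result = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(html)
        result[name] = match.group(1).strip() if match else None
    return result


if __name__ == "__main__":
    print(extract_fields(SAMPLE_FEED))
```

The resulting dictionary is the "information you need" part; writing it out as XML or a property list would be a separate step.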

Parsers is a utility application targeted at a specific class of users. While it won’t be for everyone, those who do integrate Parsers into their workflow may find that it saves hours of work.

The application should be straightforward. Try one of the tutorials and you’ll understand immediately what the application does.

Feature-wise, Parsers sports an AppleScript interface, bulk file processing and support for regular expressions. It can read text, rich text, Word and PDF documents and output XML or native property lists. Error checking occurs at every step. When something goes wrong you should be informed in a helpful manner.

This is flat out some of the best code I’ve ever written, and unlike Cork (which I posted as an R&D project a while back), Parsers is pretty much finished. There are some finishing touches to take care of and it needs thorough testing, but once that’s done I fully intend to release the application.

Feel free to check it out. Feedback is appreciated!


Last edited by phildow on Fri May 23, 2008 6:25 pm; edited 1 time in total
phildow
Site Admin


Joined: 11 Dec 2005
Posts: 3407
Location: Berkeley

Posted: Fri May 23, 2008 4:41 pm
Ah, here's a small copy of the screenshot:

phildow
Site Admin


Joined: 11 Dec 2005
Posts: 3407
Location: Berkeley

Posted: Fri May 23, 2008 6:24 pm
Ah, I failed to mention: Parsers requires Mac OS 10.5 Leopard.
NovaScotian



Joined: 18 Feb 2007
Posts: 2072

Posted: Fri May 23, 2008 10:49 pm
Now, as the magic next step, Parsers needs a replace function so it can be a powerful find/replace app too.
phildow
Site Admin


Joined: 11 Dec 2005
Posts: 3407
Location: Berkeley

Posted: Sat May 24, 2008 3:35 am
NovaScotian wrote:
Now, as the magic next step, Parsers needs a replace function so it can be a powerful find/replace app too.

Hmm, that's not quite what it's for though.
NovaScotian



Joined: 18 Feb 2007
Posts: 2072

Posted: Sat May 24, 2008 1:28 pm
Of course that's not what it's for -- who among us ever uses software for only the purposes intended by the developer? ;-)
jkr



Joined: 14 Sep 2006
Posts: 27
Location: New York, NY

Posted: Sat May 24, 2008 3:59 pm
And another winner - this is VERY useful for my research data-mining purposes. I will give this a more serious try later this summer when I am working on some datasets. I was trying to figure out how to do this exact task with my statistics software (STATA). It can be done there, but this is soooo much more Mac-like. Most of all I am grateful that you included the help part in the app even in the beta! Thanks again for doing this, Phil!

What exactly is this "spare time" you are talking about you supposedly have? That's impossible ;-)
justG



Joined: 21 Jan 2007
Posts: 542
Location: LI, NY, US

Posted: Sun May 25, 2008 1:08 pm
jkr wrote:
What exactly is this "spare time" you are talking about you supposedly have? That's impossible ;-)

Seriously. We Journler users are doing something wrong if Phil's got spare time. =]
NovaScotian



Joined: 18 Feb 2007
Posts: 2072

Posted: Sun May 25, 2008 2:52 pm
Ahh --- but don't you think that this might be part of an integrated search tool for Lex and Journler 2.6, which are to be highly integrated? I do.
justG



Joined: 21 Jan 2007
Posts: 542
Location: LI, NY, US

Posted: Sun May 25, 2008 8:44 pm
Oh, I agree with you, NovaScotian. I was just jokingly objecting to the use of the word "spare." =)
phildow
Site Admin


Joined: 11 Dec 2005
Posts: 3407
Location: Berkeley

Posted: Sun May 25, 2008 10:28 pm
Hey, trust me, *I'm* surprised I have spare time. =)
NovaScotian



Joined: 18 Feb 2007
Posts: 2072

Posted: Sun May 25, 2008 10:43 pm
We do trust you, Phil; it's just that what you consider "spare time" and what we consider "spare time" aren't in the same constellation (although I shouldn't speak for G).
psummerill



Joined: 17 Apr 2007
Posts: 40

Posted: Mon May 26, 2008 6:46 am    Post subject: Only the first instance?
Does Parsers only pull the first instance in a long document with multiple instances? E.g., a really long document with repeated "Applicant Name: " entries, followed by Able, Ben, Carter and Doug as the applicant names?

I tried to run it that way, but it would only pull the first instance, even when using a restart rule at the end.

pete
phildow
Site Admin


Joined: 11 Dec 2005
Posts: 3407
Location: Berkeley

Posted: Tue May 27, 2008 2:10 am    Post subject: Re: Only the first instance?
psummerill wrote:
Does Parsers only pull the first instance in a long document with multiple instances? E.g., a really long document with repeated "Applicant Name: " entries, followed by Able, Ben, Carter and Doug as the applicant names?


Yes, that is a limitation of the program that I've given some thought to for later versions. Right now you can't control the flow of execution of the rules, which is what you're asking for: for example, a "repeat these rules x times or until no more matches" or an "if this rule doesn't match, do this instead".

That kind of behavior would multiply the complexity of the program exponentially, both in terms of programming on my end and in the user interface. But eventually that's where I'd like to see Parsers go.

This points out something that I hope the application does a good job of hiding. When you use Parsers you are writing a program. You're just doing it visually, with only four commands and with linear execution. As soon as you add flow control (repeat, go to, etc) and conditional expressions (if ... then) things get complex fast. Imagine writing a visual programming language. Beats me if I know how to do that.

So far so good though. And how cool would it be to visually *and simply* build complex programs for extracting variable information from pseudo-consistently formed data? Yeah, that'd be sweet. And let's not beat around the bush, it'd be worth some serious cash.

In the meantime you might be able to find a way to use AppleScript in conjunction with another application (or none at all) to break up the text you're working with into smaller chunks and then send those to Parsers.

For example you can set the text item delimiters in AppleScript and use them to split text into a list of text, each chunk of which you can then send to Parsers. If you post a sample of the text here we might be able to whip up something for you. I don't imagine it would be too difficult.
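The split-then-parse workaround can be sketched roughly as follows. This is in Python rather than AppleScript, and it does not call Parsers at all: the thread never shows Parsers' scripting commands, so `first_name_in` is a hypothetical stand-in for "run the single-match parser on one chunk". The sample text is invented to match the "Applicant Name: " question above.

```python
DELIMITER = "Applicant Name: "

# Invented sample document with repeated records.
SAMPLE_TEXT = (
    "Applications received\n"
    "Applicant Name: Able\nAge: 30\n"
    "Applicant Name: Ben\nAge: 41\n"
    "Applicant Name: Carter\n"
    "Applicant Name: Doug\n"
)


def split_into_records(text, delimiter=DELIMITER):
    """Split text into chunks that each start with the delimiter."""
    pieces = text.split(delimiter)
    # pieces[0] is whatever precedes the first delimiter (a header, or
    # an empty string), so it is not a record and is skipped.
    return [delimiter + piece for piece in pieces[1:]]


def first_name_in(chunk, delimiter=DELIMITER):
    """Hypothetical stand-in for one parser run that only finds the
    first match: return the rest of the line after the delimiter."""
    start = chunk.find(delimiter)
    if start == -1:
        return None
    rest = chunk[start + len(delimiter):]
    return rest.splitlines()[0].strip() if rest else ""


if __name__ == "__main__":
    # Because every chunk holds exactly one record, a first-match-only
    # parser is enough to collect all of the names.
    for record in split_into_records(SAMPLE_TEXT):
        print(first_name_in(record))
```

The point of the design is that the looping lives outside the parser: the splitter handles repetition, and the rules only ever see one record at a time.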
NovaScotian



Joined: 18 Feb 2007
Posts: 2072

Posted: Tue May 27, 2008 2:03 pm
It would, in fact, be quite straightforward -- I'll work up an example.
All times are GMT + 1 Hour