<?xml version="1.0" encoding="utf-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: SWISH-E</title>
	<atom:link href="http://mjtsai.com/blog/2003/01/19/swish_e/feed/" rel="self" type="application/rss+xml" />
	<link>http://mjtsai.com/blog/2003/01/19/swish_e/</link>
	<description></description>
	<pubDate>Tue, 06 Jan 2009 15:33:47 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.5.1</generator>
		<item>
		<title>By: Robert</title>
		<link>http://mjtsai.com/blog/2003/01/19/swish_e/#comment-315231</link>
		<dc:creator>Robert</dc:creator>
		<pubDate>Sun, 01 Jun 2008 21:16:23 +0000</pubDate>
		<guid isPermaLink="false">/?p=177#comment-315231</guid>
		<description>Where can I find an easy-to-follow tutorial on swish-e? I just want to use it to spider my site and give me only my metadata. From what I've seen it can do that; I just haven't a clue how to go about it.</description>
		<content:encoded><![CDATA[<p>Where can I find an easy-to-follow tutorial on swish-e? I just want to use it to spider my site and give me only my metadata. From what I've seen it can do that; I just haven't a clue how to go about it.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: p</title>
		<link>http://mjtsai.com/blog/2003/01/19/swish_e/#comment-77001</link>
		<dc:creator>p</dc:creator>
		<pubDate>Tue, 08 May 2007 15:08:45 +0000</pubDate>
		<guid isPermaLink="false">/?p=177#comment-77001</guid>
		<description>Don't bother with swish-e if you're using Windows. I've been trying to get it running for the last two days and am giving up now. It's easy to get it working from the command line, but impossible to get it running from a web site. When I finally got it running, it returned no results, and there's no help in the docs or on the site. delete *.*</description>
		<content:encoded><![CDATA[<p>Don't bother with swish-e if you're using Windows. I've been trying to get it running for the last two days and am giving up now. It's easy to get it working from the command line, but impossible to get it running from a web site. When I finally got it running, it returned no results, and there's no help in the docs or on the site. delete *.*</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: dharmbhsi</title>
		<link>http://mjtsai.com/blog/2003/01/19/swish_e/#comment-110</link>
		<dc:creator>dharmbhsi</dc:creator>
		<pubDate>Wed, 31 Dec 1969 16:00:00 +0000</pubDate>
		<guid isPermaLink="false">/?p=177#comment-110</guid>
		<description>Dear Michael,

I am also struggling with Swish-e and PDF indexing.

I am not a Linux programmer, though I can look around on the net a bit and configure things in Linux.

My problem with Swish-e is that I don't know how to get started. I have read the docs and looked in the prog-bin directory for examples. What I would ask is that you please provide sample config files for PDF indexing, plus any small tidbits of info you could offer, so that I can index my site as well.

Please help.

Regards

dharmesh</description>
		<content:encoded><![CDATA[<p>Dear Michael,</p>
<p>I am also struggling with Swish-e and PDF indexing.</p>
<p>I am not a Linux programmer, though I can look around on the net a bit and configure things in Linux.</p>
<p>My problem with Swish-e is that I don't know how to get started. I have read the docs and looked in the prog-bin directory for examples. What I would ask is that you please provide sample config files for PDF indexing, plus any small tidbits of info you could offer, so that I can index my site as well.</p>
<p>Please help.</p>
<p>Regards</p>
<p>dharmesh</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Michael</title>
		<link>http://mjtsai.com/blog/2003/01/19/swish_e/#comment-111</link>
		<dc:creator>Michael</dc:creator>
		<pubDate>Wed, 31 Dec 1969 16:00:00 +0000</pubDate>
		<guid isPermaLink="false">/?p=177#comment-111</guid>
		<description>Here's my spider.conf file:

# Adapted from the example spider configuration file that indexes
# the split version of the swish-e documentation

@servers = (
    {
        base_url       =&gt; 'http://www.atpm.com',
        same_hosts     =&gt; [ 'atpm.com' ],   # host names only, no scheme or path
        email          =&gt; 'swish-e@atpm.com',
        delay_min      =&gt; .0001,

        # Define call-back functions to fine-tune the spider

        test_url       =&gt; sub {
            my $uri = shift;
            return 1;  # ok to spider every URL
        },

        # Only index text/html, text/plain, or application/pdf
        test_response  =&gt; sub {
            my ( $uri, $server, $response ) = @_;
            return $response-&gt;content_type =~ m[(?:text/html|text/plain|application/pdf)];
        },

        # Convert PDF responses to HTML before indexing
        filter_content =&gt; [ \&#038;pdf ],
    },
);

use pdf2html;  # included example pdf converter module

sub pdf {
    my ( $uri, $server, $response, $content_ref ) = @_;

    return 1 unless $response-&gt;content_type eq 'application/pdf';

    # for logging counts
    $server-&gt;{counts}{'PDF transformed'}++;

    $$content_ref = ${pdf2html( $content_ref, 'title' )};
    $$content_ref =~ tr/ / /s;  # squeeze runs of spaces
    return 1;
}

1;
</description>
		<content:encoded><![CDATA[<p>Here's my spider.conf file:</p>
<pre><code># Adapted from the example spider configuration file that indexes
# the split version of the swish-e documentation

@servers = (
    {
        base_url       => 'http://www.atpm.com',
        same_hosts     => [ 'atpm.com' ],   # host names only, no scheme or path
        email          => 'swish-e@atpm.com',
        delay_min      => .0001,

        # Define call-back functions to fine-tune the spider

        test_url       => sub {
            my $uri = shift;
            return 1;  # ok to spider every URL
        },

        # Only index text/html, text/plain, or application/pdf
        test_response  => sub {
            my ( $uri, $server, $response ) = @_;
            return $response->content_type =~ m[(?:text/html|text/plain|application/pdf)];
        },

        # Convert PDF responses to HTML before indexing
        filter_content => [ \&#038;pdf ],
    },
);

use pdf2html;  # included example pdf converter module

sub pdf {
    my ( $uri, $server, $response, $content_ref ) = @_;

    return 1 unless $response->content_type eq 'application/pdf';

    # for logging counts
    $server->{counts}{'PDF transformed'}++;

    $$content_ref = ${pdf2html( $content_ref, 'title' )};
    $$content_ref =~ tr/ / /s;  # squeeze runs of spaces
    return 1;
}

1;</code></pre>
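<p>A minimal sketch of how a spider config like this is usually wired into an indexing run, using swish-e's "prog" document source. It assumes spider.pl (from swish-e's prog-bin directory) is in the current directory and that pdf2html.pm is on Perl's module path; the file names swish.conf and spider.conf are only examples. The swish-e config tells swish-e which program to run and which argument to pass it:</p>
<pre><code># swish.conf (example name): run spider.pl and pass it spider.conf
IndexDir ./spider.pl
SwishProgParameters spider.conf</code></pre>
<p>Then build the index and run a test search from the shell:</p>
<pre><code>swish-e -S prog -c swish.conf
swish-e -w pdf</code></pre>
<p>By default swish-e writes the index to index.swish-e in the current directory, which is also where the search command looks for it.</p>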
]]></content:encoded>
	</item>
</channel>
</rss>
