{"id":177,"date":"2003-01-19T23:10:56","date_gmt":"2003-01-20T04:10:56","guid":{"rendered":"\/?p=177"},"modified":"-0001-11-30T00:00:00","modified_gmt":"-0001-11-30T04:00:00","slug":"swish_e","status":"publish","type":"post","link":"https:\/\/mjtsai.com\/blog\/2003\/01\/19\/swish_e\/","title":{"rendered":"SWISH-E"},"content":{"rendered":"<p>I&rsquo;ve been looking for a new search engine for <a href=\"http:\/\/www.atpm.com\">ATPM<\/a>, and right now the leading candidate is <a href=\"http:\/\/www.swish-e.org\/\">SWISH-E<\/a>. Here are some of the reasons I like it:<\/p>\n\n\n\n<ul>\n\n\t<li>It can index PDF documents if you have Xpdf installed. This is especially important for ATPM because our older content is not available in HTML format.<\/li>\n\n\t<li>It can build its index using a Web spider rather than by scanning the local file system. That way, it can also index the dynamic content of the pages.<\/li>\n\n\t<li>I can tell it not to index certain parts of the pages, e.g. the table of contents in the navigation bar.<\/li>\n\n\t<li>I can tell it not to index URLs that match particular patterns. For instance, the printer-friendly versions of the pages have URLs that end with &ldquo;?print&rdquo; and should not be indexed.<\/li>\n\n\t<li>It has good documentation and examples.<\/li>\n\n\t<li>It can be installed without logging in as root or writing to any directories outside $HOME.<\/li>\n\n<\/ul>\n\n\n\n\n\n<p>There are many free search engines, but ones that have this combination of features are rare (judging from my quick search). Installation took a while, as I had to first install <a href=\"http:\/\/xmlsoft.org\/\">libxml2<\/a> and <a href=\"http:\/\/www.foolabs.com\/xpdf\/\">Xpdf<\/a>, and then work out why SWISH-E couldn&rsquo;t find pdftotext or pdfinfo even though they were in the path. Once installed, it seems to work well. 
By tomorrow it should be done indexing, and then I can try some real tests.<\/p>","protected":false},"excerpt":{"rendered":"<p>I&rsquo;ve been looking for a new search engine for ATPM, and right now the leading candidate is SWISH-E. Here are some of the reasons I like it: It can index PDF documents if you have Xpdf installed. This is especially important for ATPM because our older content is not available in HTML format. It can [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"apple_news_api_created_at":"","apple_news_api_id":"","apple_news_api_modified_at":"","apple_news_api_revision":"","apple_news_api_share_url":"","apple_news_coverimage":0,"apple_news_coverimage_caption":"","apple_news_is_hidden":false,"apple_news_is_paid":false,"apple_news_is_preview":false,"apple_news_is_sponsored":false,"apple_news_maturity_rating":"","apple_news_metadata":"\"\"","apple_news_pullquote":"","apple_news_pullquote_position":"","apple_news_slug":"","apple_news_sections":"\"\"","apple_news_suppress_video_url":false,"apple_news_use_image_component":false,"footnotes":""},"categories":[2],"tags":[],"class_list":["post-177","post","type-post","status-publish","format-standard","hentry","category-technology"],"apple_news_notices":[],"_links":{"self":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts\/177","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/comments?post=177"}],"version-history":[{"count":0,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts\/177\/revisions"}],"wp:attachment":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/m
edia?parent=177"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/categories?post=177"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/tags?post=177"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}