Common Search and Replace Patterns
May 11, 2009
We get a lot of questions on how to perform various search and replace operations in AutoBlogged. Because not everyone is an expert with regular expressions, we thought we would share some of the most common patterns we have found useful.
Search and Replace Overview
The Search and Replace feature allows you to modify the content of a feed based on regular expression searches. With this feature you can rewrite words, enforce naming standards, insert affiliate IDs, correct non-standard feeds, create unique content, or make just about any other change you can imagine. Search and Replace uses the PCRE syntax.
- Search for – The search term or regular expression to find.
- Replace with – The expression to use for replacing the search term.
Note that the Search and Replace feature uses regular expressions, so you must escape any special characters in your search pattern with a backslash (\) character. The special characters that need escaping are \ ^ . $ | ( ) [ ] as well as the quantifier characters * + ? and {.
Note that if your search expression contains multiple grouped matches, the Replace operation will only be performed on the primary match. For more precise control over replacement, you can use back references in your replace expression. Back references are special variables that return a portion of your match. Groups are numbered according to the portions of your pattern that are in parentheses, counted from left to right. The variable $1 refers to the first group, $2 refers to the second, and so on. $0 refers to the entire matched string.
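As a rough illustration of back references, here is a minimal sketch written in Python rather than in AutoBlogged itself. Python's `re` module is not PCRE and writes back references as `\g<1>` instead of `$1`, so the hypothetical helper `to_python_template` translates the `$n` form used on this page. Both function names are our own assumptions and are not part of AutoBlogged.

```python
import re


def to_python_template(replace_with):
    # Double any backslashes so re.sub treats them as literal characters,
    # then turn $0, $1, $2, ... into Python's \g<0>, \g<1>, \g<2>, ... form.
    escaped = replace_with.replace("\\", "\\\\")
    return re.sub(r"\$(\d+)", r"\\g<\1>", escaped)


def search_and_replace(text, search_for, replace_with):
    # Hypothetical stand-in for a single Search and Replace rule.
    return re.sub(search_for, to_python_template(replace_with), text)


# $1 and $2 refer to the first and second parenthesized groups.
swapped = search_and_replace("Smith, John", r"(\w+), (\w+)", "$2 $1")

# $0 refers to the entire matched string.
wrapped = search_and_replace("see autoblog", r"autoblog", "[$0]")

print(swapped)
print(wrapped)
```

Swapping the two groups with `$2 $1` is the kind of change that a plain replacement string cannot express, which is why back references are worth learning.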
Common Patterns
| Description | Search For | Replace With |
| HTML Formatting – Some searches such as Google Blog Search will apply the <b> tags around any search keywords that appear in the results. This pattern removes those tags. | <\/?b> | (leave empty) |
| Feedburner Feed Flares – These are the links at the bottom of many Feedburner feeds. These links can be confusing since they aren’t from your site and can mess up your blog formatting. | <div class="feedflare">.*?<\/div> | (leave empty) |
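| Create Hyperlinks – You can automatically turn any plain-text URL into a clickable hyperlink using this pattern. The lookbehind at the start skips URLs that are already inside a quoted attribute without including the preceding character in $0. | (?<!\")(https?|ftp)://([-A-Z0-9.]+)(/[-A-Z0-9+&@#/%=~_|!:,.;]*)?(\?[-A-Z0-9+&@#/%=~_|!:,.;]*)? | <a href="$0">$0</a> |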
| Replace Words – Sometimes you want to fix common misspellings, ensure common terminology, fix capitalization, or just mix things up a bit to ensure you don’t run into any duplicate content penalties. | wordpress|word\spress|wrodpress | WordPress |
| Clickable Keywords – Sometimes you want certain words in your blog to automatically become hyperlinks to another site you want to promote. | auto\s?blog[^\s]* | <a href="http://autoblogged.com">$0</a> |
| Fix Bad Feeds – Sometimes a feed has the wrong encoding or just doesn’t follow standards, and you see HTML markup appear encoded in your page content. These next two patterns turn &gt; and &lt; back into their literal characters. | &gt; | > |
| | &lt; | < |
| Remove Hyperlinks – If you are using the %content% variable in your post template but want to remove any live hyperlinks, use this pattern. | <a\s[^>]*>([^<]*)<\/a> | $1 |
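To see how a few of the patterns in the table behave when applied in sequence, here is a minimal sketch in Python. It is not how AutoBlogged applies its rules: the `apply_rules` helper, the `RULES` list, the use of case-insensitive matching, and the straight quotes in the replacement text are all assumptions made for illustration, and Python's `re` module only approximates PCRE.

```python
import re

# (Search For, Replace With) pairs taken from the table, with straight quotes.
RULES = [
    (r"<\/?b>", ""),                                  # HTML Formatting
    (r"wordpress|word\spress|wrodpress", "WordPress"),  # Replace Words
    (r"auto\s?blog[^\s]*",
     '<a href="http://autoblogged.com">$0</a>'),      # Clickable Keywords
]


def apply_rules(text, rules):
    # Apply each rule in order to the output of the previous rule.
    for search_for, replace_with in rules:
        # Translate $0, $1, ... into Python's \g<0>, \g<1>, ... form.
        template = re.sub(r"\$(\d+)", r"\\g<\1>",
                          replace_with.replace("\\", "\\\\"))
        # Case-insensitive matching is an assumption, not a documented default.
        text = re.sub(search_for, template, text, flags=re.IGNORECASE)
    return text


result = apply_rules("Best <b>wrodpress</b> tips for autoblogging", RULES)
print(result)
```

Because each rule runs on the output of the one before it, the order of rules matters: a rule that inserts HTML should usually come after the rules that strip or rewrite plain text.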
If you develop any search and replace patterns that you think others might find helpful, please add them as comments here. Here are some resources for testing regular expressions:
…
Category: Search and Replace
Tags: autoblogged-regex patterns regex replace search
Working With Tags and Categories
May 16, 2008
Tags
Tags play an important role in improving the usability and findability of your web site. AutoBlogged uses tags in a number of ways. First, it can pull the tags that the original author set on the feed itself. Second, it can visit the original URL to find additional tags. Finally, you can also use the Yahoo! tagging API to extract additional tags.
Note that tag support in WordPress is fairly new so many older themes do not display a post’s tags. You can fix this by modifying the theme itself or finding a plugin such as Simple Tags that will do this for you.
Tag Clouds
Tag clouds are an excellent way to increase relevant keyword density on every page of your web site. WordPress has a sidebar widget to display a tag cloud for the most popular tags on your site. Also consider the Simple Tags plugin for a much more configurable sidebar widget.
When you first add feeds to AutoBlogged and run the script, you might notice that the tag cloud contains tags that are not relevant to your content. However, as time passes and your site content grows, the more relevant tags will appear more frequently and the off-topic tags will fall off the tag cloud.
Using Categories
Although you do want a large number of tags to help search engine ranking, you should be more selective with your use of categories. Use categories as a simple navigational aid on your web site and try not to create more than fifteen primary categories for your site. Every time AutoBlogged processes feeds, it goes through all blog categories to see if they appear in the feed content so a large number of categories will slow down feed processing.
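The cost described above can be pictured with a small sketch in Python. This is only an illustration under our own assumptions, not AutoBlogged's code: the function name `matching_categories` and the simple case-insensitive substring test are hypothetical.

```python
def matching_categories(content, categories):
    # Every category is tested against the content, so the work per post
    # grows in step with the number of categories on the blog.
    lowered = content.lower()
    return [name for name in categories if name.lower() in lowered]


categories = ["Linux", "Windows", "Security"]
matches = matching_categories("New Linux kernel fixes a security bug", categories)
print(matches)
```

With fifteen categories and a hundred new posts, a loop like this runs 1,500 checks per pass; with several hundred categories the same pass runs tens of thousands.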
Subcategories
One helpful use for categories is to use them as subcategories to group content into consistent main categories. For example, under a main category you might want to create subcategories for synonyms or alternate terminology.
AutoBlogged Tagging Engine
May 16, 2008
AutoBlogged can automatically identify tags from the original article based on frequency, importance, sentence structure, HTML formatting, and other factors. The purpose of this is to identify relevant and natural keywords and phrases that the author did not include as tags. This is important for search engine results and also to assist visitors in finding the content of your site.
The tagging engine will quickly fill your blog pages with relevant keywords that will help your site dominate the search engines for your topic.
Although the internal tagging engine is quite effective in most cases, it is still just a script and could never accomplish what a human could. Often you will encounter keyword phrases that are not relevant, consist of sentence fragments, or simply do not make sense.
Although you might want to manually delete tags that detract from your content, you should also consider that a search engine will see the tags differently than a human would. Often those sentence fragments will help to make your content unique and will help to diversify your page content while still keeping it on topic. It is not uncommon to find that some tags you would normally delete turn out to be the ones that bring in the most search engine traffic.
For example, you might have a blog that covers topics related to Microsoft Windows. Invariably you will pick up tags related to Linux operating systems, or software that is only relevant because it runs on Windows. These tags might turn out to be extremely helpful in search engine positions due to the unique mix of words.
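To give a feel for the frequency factor mentioned above, here is a toy sketch in Python. It is not the AutoBlogged tagging engine, which also weighs importance, sentence structure, and HTML formatting; the `frequent_tags` function, the stopword list, and the sample text are all hypothetical.

```python
import re
from collections import Counter

# A tiny, hypothetical stopword list; a real one would be much longer.
STOPWORDS = {"the", "and", "for", "with", "that", "this", "are", "was"}


def frequent_tags(text, limit=3):
    # Split the text into lowercase words and count the ones that are
    # neither stopwords nor very short.
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(limit)]


sample = ("Windows update fixes Windows security bug; "
          "the security update ships for Windows.")
tags = frequent_tags(sample, limit=3)
print(tags)
```

Even this crude counter surfaces terms the author never set as tags, and it also shows why off-topic or fragmentary tags slip through: frequency alone says nothing about whether a word makes sense as a tag.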
What is Autoblogging?
Oct 1, 2009
Autoblogging is the term we use for automatically creating content for blogs, as opposed to manually writing individual posts. In the case of AutoBlogged, you can create posts based on the contents of another RSS feed. Since you can get RSS feeds on just about anything, you can easily find content to automatically add to your WordPress blog. For example, you could get a feed from Google Blog Search that returns articles written by other bloggers on a particular topic. These articles will appear on your blog as short excerpts with a link attributing the original source.
Technorati.com and Google News are both examples of essentially what an autoblog looks like.
Autoblogs are useful for many things, and they are an especially good way to aggregate articles on a particular niche topic for your blog. By pulling feeds from multiple sources and using smart searches and filtering you can provide valuable portals to your niche topic. Autoblogs can also work well to build blogs from multiple affiliate feeds or to augment your own content.
Autoblogs ensure keyword-rich, fresh content that will greatly improve your search engine results.
Are Autoblogs the Same as Splogs?
While many people do use autoblogs as spam blogs (or splogs), autoblogging itself is not spamming. Splogs are spam in the sense that they are spamming search engines to build backlinks or drive traffic to affiliate links, increase PTC ad clicks, or even to spread malware. Splogs quickly get blacklisted on search engines, so they depend on volume and on rapidly creating new splogs. Splogs are a blackhat SEO technique that does not produce quality long-term results, and they are generally annoying for anyone using a search engine.
Are Autoblogs the Same as Scrapers?
Scraping is similar to autoblogging in the sense that it uses content from other web sites. Scraping, however, is different in that it uses a significant amount of content from targeted sites and often is combined with rewriting techniques to obscure the original content while maintaining the topical context.
Autoblogging is not about stealing content, but rather sifting through, aggregating, and linking to the world’s content to create added value.
Can I Get a Discount on AutoBlogged?
Sep 14, 2009
We occasionally distribute coupon codes on various forums or to our affiliates. We recommend hopping to your favorite search engine and seeing what you can find. If you still can’t afford to purchase AutoBlogged, you are welcome to make us a trade offer. We are generally looking for anything that will help promote our product, although please don’t make offers for advertising or links unless your site gets enough traffic or has enough rank to make a difference.
Copyright Considerations
Jul 11, 2009
The whole concept of an autoblog is using content written or owned by others. Because of that, copyrights are always an important consideration. We really can’t give legal advice on copyright and fair use, and even if we could those laws are very much up to interpretation. Fair use for the most part is decided on a case-by-case basis so there aren’t any fixed rules. Even if you are clearly in the realm of fair use, that doesn’t mean someone can’t try to sue, incurring substantial legal fees ……
Article Spinning
Jul 11, 2009
Some people ask us if AutoBlogged can rewrite or spin articles to avoid duplicate content penalties when adding posts from other RSS feeds. Although you can do some simple rewriting using the Search and Replace feature, it is pretty limited and we generally do not recommend it for anything other than simple substitutions.
Article spinning is a technique of rewriting words in an article to avoid duplicate content penalties in search engines while maintaining the basic meaning of the article. Most article spinning techniques involve randomly replacing certain words or phrases based on a database of synonyms.
We generally do not recommend article spinning. Although the content may look different to a search engine, a human can easily spot a spun article, and sometimes synonyms may produce unexpected results and actually hurt search engine placement. Furthermore, there is a fine line between autoblogging and plagiarizing. Using small content excerpts of someone else’s article to provide added value (as in the case of Google News or Technorati) is generally an accepted (or at least tolerable) practice.
However, spinning someone else’s content to plagiarize and avoid detection can quickly get your site flagged as spam and significantly (and sometimes permanently) penalized in the search engines. Furthermore, article spinning gives autobloggers in general a bad name and many are quick to label all autoblogs as spam sites.
For the most part, autoblogs are not affected by duplicate content penalties, especially if you only use fair excerpts and pull from a variety of sources. In fact, using small excerpts can improve your keyword density which often results in ranking higher than the original articles.
Having said that, if you still wish to avoid duplicate content detection, one tool we recommend is the WP Uniquefier Plugin, which does not affect the readability of an article. WP Spinner is another WordPress plugin, although we have not tested this.
Autoblogs and Duplicate Content Penalties
Jul 11, 2009
A common myth perpetuated in the SEO world is that you need to be careful with duplicate content to avoid penalties from search engines. The fact is that the Internet is full of duplicate content. Press releases, syndicated news stories, newsgroup and mailing list archives, and open source content all produce massive amounts of duplicated content. For example, take a popular Wikipedia entry and drop an excerpt into a search engine. You will see that many people use this content and often these pages rank higher than the original Wikipedia entry.
Consider that it would be a massive computing effort for any search engine to identify all pages on the Internet that are even 90% alike. When you consider each web site has unique headers, footers, sidebars, and comments, chances are your site would be more like 50% similar to any other site even if you copied most of the articles from that site. Furthermore, if your autoblog pulls from many sources your site really is not a duplicate of any single site anymore. You can rest assured that there is no automatic detection that your site contains partial duplicate content from multiple sources.
However, we have seen many autoblogs that would never pass a manual inspection. If your site looks like spam then chances are that it will be penalized or even banned from the search engines. The problem is that if you aren’t courteous to other webmasters, it is really easy for them to report your site as spam, triggering a manual review by the search engines. If your web site looks like it does nothing more than steal content from others, chances are they will penalize you for being nothing other than duplicate content.
Here are some tips to help avoid any search engine penalties:
Add Value – Keep in mind, that there are many big web sites that are nothing but duplicated content. Technorati, Google News, and many other sites are nothing but fancy autoblogs. If your site does nothing but repeat the content of a couple other blogs, you can expect to fail a manual review. We really don’t need more sites like that on the Internet. Create your autoblog with a purpose and give the user real value. A professional design and a personal touch can also make a big difference when it comes to manual reviews.
Be Courteous – The most important thing is to not anger other webmasters because they are the ones most likely to report you to the search engines. We find that it is best to not take every single article from another site without asking their permission. It is better to use the search-based feeds to pull your articles from a variety of different sites rather than directly pulling the feed from one or two specific blogs. You would be surprised to find that many bloggers are willing to let you pull excerpts from their feeds, especially if your site looks professional and has good page rank.
Still, you may find that some authors will complain even if you pull just one excerpt from their site. Although with just an excerpt you are probably safe from a legal perspective, it is always best to show courtesy by apologizing and placing their site on the URL Blacklist in AutoBlogged so their articles don’t appear.
Make sure your web site has a way to contact you so that other webmasters will come to you first rather than just reporting you as spam to the search engines.
Fair Usage – When pulling articles from another site, be sure to keep your excerpts short and respect the copyrights of others. Some feeds include copyright notices and you can even include those in your post template. Always give credit to your sources and link to the original article. We also like to include a footer and about page that explains that the site is an autoblog and that the content was written by others. It is best to not take full articles, attempt to rewrite or spin articles, or use other blackhat methods that make it look like you are hiding something.
The User Experience – Although there are many algorithms that search engines use to rank content, it ultimately comes down to user experience. The whole point of having penalties is to prevent spammers from manipulating the system. If you build your site with real users in mind, and provide real value, chances are you won’t ever face any search engine penalties.
Performance Tips
Jul 9, 2009
We have been receiving lots of feedback from our customers and we are surprised with the variety of sites people have built with AutoBlogged. Autoblogs have traditionally had a bad name as spammy or as content thieves but some of you have used AutoBlogged to build some very useful web sites.
One thing we are seeing is people using AutoBlogged in ways we really never considered. AutoBlogged works best when you give it two or three RSS feeds based on various searches. However, when you load up hundreds of feeds and have complex filtering requirements, you might notice a significant hit on your site’s performance.
Part of the problem is that PHP, as an interpreted language, will always require more overhead than a compiled application. Part of the problem is that WordPress is a complex platform that already does an enormous amount of processing for each page view. Part of it is that AutoBlogged does quite a bit of work to process and tag each post.
While much has been done to optimize the performance of AutoBlogged, how you configure the plugin can have a significant impact on the load it puts on the server, especially when the plugin adds a large number of posts each day.
If you want to get maximum performance from AutoBlogged, here are some things you can do:
1. Remote Filtering – Because all filtering tasks increase script processing load, try to limit your use of feed-level filters or search and replace operations. Filtering requires repetitive searching that could potentially have an impact on a busy site. Try to offload as much filtering as possible to the remote end.
For example, use advanced search options with Google Blog Search to filter out unwanted words, limit to a specific date range, or specify the language. Then under Filtering, clear all the words from the keywords blacklist. Also consider using Yahoo! Pipes, MySyndicaat, or another feed aggregator with filtering capabilities to fine-tune your source feed. Anything you can do to move the processing off your server means that much less work your server has to do.
The feed level filters and search and replace filters are useful for simple processing but can quickly slow down the script if you overuse that feature. If you need more advanced filtering capabilities, we suggest using Yahoo! Pipes, MySyndicaat, or another feed aggregator with filtering capabilities.
2. Limit the Number of Feeds – Although there is no specific limit to the number of feeds AutoBlogged can handle, adding too many feeds can slow down the process and possibly result in script timeouts. Again, an external feed aggregator is an excellent solution.
3. Do not Retrieve the Original Article – AutoBlogged by default will visit the URL of the original article in order to gather additional keywords to use as tags. Skipping this step will save a significant amount of CPU usage and reduce network traffic, but it will limit the effectiveness of the built-in tagging engine. To skip it, under Tag Options check only the box to use original tags from the feed. This saves a visit to the original URL and the subsequent parsing of tags. To make up for the tags that would have been parsed from the original URL, use the Additional Tags box under Tag Options as a random source of tags for each post.
4. Do not Search for Existing Categories – AutoBlogged has two options for dealing with existing blog categories that appear in an article: it can add that category to the post or it can add it as a tag. This is very useful for automatically categorizing each post but it also means that AutoBlogged must loop through each blog category to see if it exists in the post.
5. Limit Duplicate Matching – To prevent duplicate posts from appearing AutoBlogged will search for duplicates based on the post title or based on the original link. Filtering by title works best in some situations but filtering by link works better in others. We do not recommend using both at the same time because AutoBlogged must perform a database lookup for each one.
6. Limit Your Plugins – If you find that WordPress in general is slow, you should take a look at your plugins and consider only enabling the bare minimum. Remember that most of these plugins will run with every page load.
7. Limit the Features you Use – If you have a very busy blog and limited CPU resources, you may have to limit the AutoBlogged features you use. AutoBlogged can do quite a bit, but sometimes you may want it to do less to help performance. Image and video processing, checking to see if links already exist in your blog, checking to see if the author exists, and saving images locally all require extra processing that can slow things down. At some point you need to decide which is most important: features, performance, or the amount of money you spend on server equipment.
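The duplicate matching described in tip 5 can be sketched roughly as follows in Python. This is an assumption-laden illustration, not AutoBlogged's implementation: the `is_duplicate` function is hypothetical, the posts use placeholder example.com links, and an in-memory set stands in for the database lookup the plugin performs.

```python
def is_duplicate(post, seen, match_on="link"):
    # Check one field only (title or link). Matching on both fields would
    # mean two lookups per post, which is the extra cost tip 5 warns about.
    key = post[match_on].strip()
    if key in seen:
        return True
    seen.add(key)
    return False


posts = [
    {"title": "Hello World", "link": "http://example.com/a"},
    {"title": "Hello World (updated)", "link": "http://example.com/a"},
    {"title": "Another Post", "link": "http://example.com/b"},
]

seen_links = set()
kept = [p["title"] for p in posts
        if not is_duplicate(p, seen_links, match_on="link")]
print(kept)
```

In this sample, matching by link catches the retitled repost, while matching by title would have let it through. That is the sense in which one method works better in some situations and the other in others.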
Performing Meta Searches
May 11, 2009
To be proactive with our technical support, we monitor search engine results for AutoBlogged-related errors. By watching to see what errors the search engines are indexing we can identify bugs that haven’t yet been reported. And since we are always looking for new uses for AutoBlogged, we monitor these errors with our own internal autoblog site.
To get comprehensive results, we use several tools. First, we need to get RSS feeds from various search engines, which we build using one of these tools:
http://alp-uckan.net/free/monitorthis/
http://www.researchbuzz.org/tools/kebberfegg.pl
These tools let you enter some search terms and then generate the RSS feed URLs for various search engines in OPML format. Because AutoBlogged doesn’t yet allow importing OPML, we copy and paste each feed URL one at a time.
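If copying feed URLs by hand becomes tedious, a short script can list them for you. The following is a minimal sketch in Python, assuming the OPML file stores each feed address in the `xmlUrl` attribute of an `outline` element, which is the usual convention. The sample document and its example.com addresses are made up, and `feed_urls_from_opml` is a hypothetical helper of our own.

```python
import xml.etree.ElementTree as ET

# A made-up OPML document standing in for the output of one of the tools.
SAMPLE_OPML = """
<opml version="1.1">
  <head><title>Hypothetical search feeds</title></head>
  <body>
    <outline text="Blog search" type="rss"
             xmlUrl="http://example.com/blogsearch?q=autoblogged"/>
    <outline text="News search" type="rss"
             xmlUrl="http://example.org/news.rss"/>
    <outline text="Folder without a feed"/>
  </body>
</opml>
"""


def feed_urls_from_opml(opml_text):
    # Collect the xmlUrl attribute of every outline element that has one,
    # at any nesting depth, in document order.
    root = ET.fromstring(opml_text.strip())
    return [node.attrib["xmlUrl"]
            for node in root.iter("outline")
            if "xmlUrl" in node.attrib]


feed_urls = feed_urls_from_opml(SAMPLE_OPML)
for url in feed_urls:
    print(url)
```

The printed list can then be pasted into AutoBlogged one URL at a time, as described above.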
Another way to gather search results is to use a Yahoo! Pipes meta search such as this one here:
http://pipes.yahoo.com/pipes/pipe.info?_id=nHNB8TJm3BGumlGA9YS63A
Our internal support autoblog doesn’t just watch for errors; we also watch to see who is asking questions about our script, who is illegally sharing our script, who is writing about our script, and of course, who is praising it. Each group of feeds goes into its own category and it is all nicely organized, automatically tagged, and presented in a great-looking WordPress blog.


