Hi. This script finds news articles that allow commenting through something other than Disqus, Facebook, or another JavaScript-based commenting system.
== Background ==
This is the script I was planning to use for my Newspaper Links (NL) BST. After extensive testing, I concluded that the BST wouldn’t be profitable, so I’m releasing the script to the community.
== Functions ==
This Python script pulls one or more RSS feeds from Google Alerts, downloads the web pages, renders the JavaScript, removes pages with bad footprints, and emails you the resulting list of URLs.
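To make the "bad footprints" step concrete, here is a minimal sketch of how such a filter could look. The footprint strings and the function names are my own illustration, not taken from the released script, and the HTML is assumed to be already rendered.

```python
# Hypothetical footprints: substrings that suggest a JavaScript-based
# commenting system. The released script's actual list is not shown here.
BAD_FOOTPRINTS = (
    "disqus_thread",
    "disqus.com/embed.js",
    "fb-comments",
    "connect.facebook.net",
)


def has_bad_footprint(html, footprints=BAD_FOOTPRINTS):
    """Return True if any footprint appears in the (already rendered) HTML."""
    lowered = html.lower()
    return any(fp.lower() in lowered for fp in footprints)


def filter_pages(pages):
    """Split a {url: html} dict into (kept_urls, removed_urls), keeping order."""
    kept, removed = [], []
    for url, html in pages.items():
        (removed if has_bad_footprint(html) else kept).append(url)
    return kept, removed
```

A plain substring check is crude but cheap. It only works on the rendered page, because these widgets are usually injected by JavaScript after load.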
== Original Usage ==
The original use was to find niche-related news articles that allow commenting, where the commenting system is not Disqus, Facebook, or some other JavaScript-based commenter. The tricky thing with news articles is that commenting tends to close after a few days. This script solves that by sending daily URL lists to your inbox.
The issue is that the approval rate is about 10%, even with US IPs and legit comments. That’s not enough to scale efficiently, but enough to get some good links if you’re willing to put in the time.
== Documentation ==
All of the following scripts print usage information when you run them without arguments. Here is the usage information for each.
Usage: rssScraper.py FEED
Fetches urls from feed and appends them to ‘rss’ file in FEED directory
It will not add duplicates to ‘rss’ file
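The "no duplicates" behaviour described for rssScraper.py can be sketched roughly as follows. This assumes the ‘rss’ file holds one URL per line, which the usage text does not confirm; `append_new_urls` is a hypothetical name.

```python
import os


def append_new_urls(feed_dir, urls):
    """Append urls to feed_dir/rss, skipping ones already there.

    Assumes one URL per line. Returns the list of URLs actually added.
    """
    os.makedirs(feed_dir, exist_ok=True)
    rss_path = os.path.join(feed_dir, "rss")

    # Load what is already on disk so repeated runs stay idempotent.
    seen = set()
    if os.path.exists(rss_path):
        with open(rss_path, encoding="utf-8") as fh:
            seen = {line.strip() for line in fh if line.strip()}

    added = []
    with open(rss_path, "a", encoding="utf-8") as fh:
        for url in urls:
            url = url.strip()
            if url and url not in seen:
                fh.write(url + "\n")
                seen.add(url)
                added.append(url)
    return added
```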
Usage: jsTester.py FEED
Fetches urls found in FEED’s ‘rss’ file and
filters urls into ‘toKeep.csv’ and ‘toRemove.csv’
Ignores urls found in FEED’s ‘master.list’
After feed is processed, ‘master.list’ is updated with processed feed urls
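The jsTester.py flow above might look something like this sketch. The `is_good` callable is a stand-in for the download, render, and footprint check; the one-URL-per-line file layout and the choice to overwrite the CSV files on each run are assumptions on my part.

```python
import csv
import os


def read_lines(path):
    """Return the non-empty, stripped lines of a file, or [] if it is missing."""
    if not os.path.exists(path):
        return []
    with open(path, encoding="utf-8") as fh:
        return [line.strip() for line in fh if line.strip()]


def process_feed(feed_dir, is_good):
    """Sort unprocessed URLs into toKeep.csv / toRemove.csv.

    is_good(url) -> bool is a hypothetical hook for the real page test.
    """
    master_path = os.path.join(feed_dir, "master.list")
    already_done = set(read_lines(master_path))
    pending = [u for u in read_lines(os.path.join(feed_dir, "rss"))
               if u not in already_done]

    keep, remove = [], []
    for url in pending:
        (keep if is_good(url) else remove).append(url)

    for name, rows in (("toKeep.csv", keep), ("toRemove.csv", remove)):
        with open(os.path.join(feed_dir, name), "w", newline="",
                  encoding="utf-8") as fh:
            csv.writer(fh).writerows([u] for u in rows)

    # Record everything just processed so the next run skips it.
    with open(master_path, "a", encoding="utf-8") as fh:
        for url in pending:
            fh.write(url + "\n")
    return keep, remove
```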
Usage: eMailer.py FEED
Emails contents of a FEED’s toKeep.csv file
Shouldn’t need to be called directly as jsTester.py calls it after it’s done
processing a feed.
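For the emailing step, a minimal sketch using only the standard library could build the message like this. The subject format, the single-column CSV layout, and the function name are assumptions, not the script’s actual behaviour.

```python
import csv
from email.message import EmailMessage


def build_feed_email(feed_name, keep_csv_path, sender, recipient):
    """Build an email whose body lists the URLs in a feed's toKeep.csv.

    Assumes the URL is in the first column of each row.
    """
    with open(keep_csv_path, newline="", encoding="utf-8") as fh:
        urls = [row[0] for row in csv.reader(fh) if row]

    msg = EmailMessage()
    msg["Subject"] = f"{feed_name}: {len(urls)} new URLs"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("\n".join(urls))
    return msg
```

Sending is then a matter of passing the message to `smtplib.SMTP(host).send_message(msg)`, where the SMTP host and any login details depend on your own mail setup.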
Usage: check_links.py FEED
Checks links in emailed.csv file in FEED directory
INCOMPLETE
The following two scripts are the ones run by the cron scheduler. They take no arguments.
fetch_rss_feeds.py
runs rssScraper.py on each of the feeds in feeds.csv
test_rss_feeds.py
runs jsTester.py on each of the feeds in feeds.csv
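Both cron wrappers follow the same pattern: read feeds.csv and call a script once per feed. A rough sketch, assuming the feed name is the first column of feeds.csv (the real layout isn’t documented here) and using hypothetical helper names:

```python
import csv
import subprocess
import sys


def read_feeds(feeds_csv="feeds.csv"):
    """Return feed names from the first column of feeds.csv (assumed layout)."""
    with open(feeds_csv, newline="", encoding="utf-8") as fh:
        return [row[0].strip() for row in csv.reader(fh)
                if row and row[0].strip()]


def run_on_each_feed(script, feeds, runner=subprocess.call):
    """Run `python <script> <FEED>` for each feed; return {feed: exit_code}.

    runner is injectable so the loop can be exercised without spawning
    real processes.
    """
    return {feed: runner([sys.executable, script, feed]) for feed in feeds}


if __name__ == "__main__":
    # Mirrors what fetch_rss_feeds.py is described as doing.
    run_on_each_feed("rssScraper.py", read_feeds())
```

Swapping "rssScraper.py" for "jsTester.py" gives the equivalent of test_rss_feeds.py.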
== Installation ==
If you can’t figure it out from the above, you shouldn’t be getting newspaper links.