From 765ed07354c655210ee25586988bad98353ebeef Mon Sep 17 00:00:00 2001
From: Matthias Schiffer
Date: Sun, 8 Jan 2017 21:28:18 +0100
Subject: Initial commit

---
 README.md | 69 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 69 insertions(+)
 create mode 100644 README.md

(limited to 'README.md')

diff --git a/README.md b/README.md
new file mode 100644
index 0000000..98e3ac1
--- /dev/null
+++ b/README.md
@@ -0,0 +1,69 @@
+# Soup.io backup scripts
+
+## Usage
+
+This Soup.io backup solution consists of two scripts:
+
+## fetch-pages
+
+Will crawl through the Soup pages (which consist of 20 posts each) and download
+them to a given output directory.
+
+Usage: fetch-pages URL [ OUTDIR ]
+
+URL is the base domain of your Soup (e.g. 'kitchen.soup.io').
+
+OUTDIR defaults to the current directory. A directory called 'pages' will be
+created inside the output directory.
+
+
+## fetch-enclosures
+
+Tries to download all enclosed images and videos of the previously downloaded
+pages.
+
+Usage: fetch-enclosures [ OUTDIR ]
+
+OUTDIR defaults to the current directory. A directory called 'enclosures' will be
+created inside the output directory; the output of fetch-pages is expected in
+the 'pages' directory inside OUTDIR.
+
+
+## Bugs and missing features
+
+* A failed page download will interrupt fetch-pages. fetch-pages can't resume
+  the backup at the point it failed; either the base URL or LIMIT needs to be
+  adjusted in the script, or previously downloaded pages need to be removed so
+  the LIMIT calculation will allow downloading the missing pages.
+* fetch-enclosures could be adjusted to try multiple asset servers on failures.
+  Just re-running fetch-enclosures will work in case of transient failures; the
+  script will only attempt to retrieve missing files.
+* Adding a script to extract the HTML code of individual posts from the pages
+  might be interesting to allow mirroring Soups that aren't primarily made up of
+  images and videos to other blog systems.
+
+
+## LICENSE
+
+Copyright (c) 2017, Matthias Schiffer
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are met:
+
+  1. Redistributions of source code must retain the above copyright notice,
+     this list of conditions and the following disclaimer.
+  2. Redistributions in binary form must reproduce the above copyright notice,
+     this list of conditions and the following disclaimer in the documentation
+     and/or other materials provided with the distribution.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
+FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-- 
cgit v1.2.3