amuck-landowner

Download all MP3 in XML file

NodeBytes

Dedi Addict
Hello all,

I have an XML file with links to mp3 files. (RSS feed)

I need to download all the mp3 files that are linked to on this XML file.

Here's a sample item from the RSS feed.


<item>
<title>Simply Christmas, Part 3</title>
<itunes:author>Tom Hughes</itunes:author>
<itunes:summary/>
<enclosure url="http://cachurch.com//podcasttracking/667d51e5-828a-4091-bb86-e74ed94a8c75/3e3ad3c2-8df2-45c8-b155-d3bc89f96d51/SimplyChristmasPt3.mp3" length="51853488" type="audio/mpeg"/>
<guid>
http://cachurch.com//podcasttracking/667d51e5-828a-4091-bb86-e74ed94a8c75/3e3ad3c2-8df2-45c8-b155-d3bc89f96d51/SimplyChristmasPt3.mp3
</guid>
<pubDate>Sun, 22 Dec 2013 12:00:00 GMT</pubDate>
<itunes:duration>33:08</itunes:duration>
<itunes:keywords>
Tom, Hughes, Simply, Christmas, December, 21, 22, 2013, sermon
</itunes:keywords>
</item>

How would you scrape this for just the links to the mp3 files and then download them?

Thanks,

Brendan
 

dannix

New Member
The simplest would be to let wget do it for you. This will, however, download all listed files: wget -F -i your_xml_file (-i reads the links from the file, -F forces it to be parsed as HTML).
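One caveat with the wget approach: the mp3 links sit in the url= attribute of the <enclosure> tags rather than in plain href links, so HTML parsing may not pick them up. A stdlib-only Python sketch that pulls the enclosure URLs out first (the feed string here is just a trimmed-down version of the sample item from the first post; in practice you would read your real XML file):

```python
# Extract every <enclosure url="..."> from the feed with Python's
# standard library, then hand the URLs to wget (or urllib) to download.
import xml.etree.ElementTree as ET

# Trimmed-down sample feed for illustration only.
feed = """<rss><channel><item>
<enclosure url="http://cachurch.com//podcasttracking/667d51e5-828a-4091-bb86-e74ed94a8c75/3e3ad3c2-8df2-45c8-b155-d3bc89f96d51/SimplyChristmasPt3.mp3" length="51853488" type="audio/mpeg"/>
</item></channel></rss>"""

root = ET.fromstring(feed)
urls = [enc.get('url') for enc in root.iter('enclosure')]
print('\n'.join(urls))  # write these to a file, then: wget -i urls.txt
```

root.iter('enclosure') matches the tag at any depth, so the same loop works on a full feed with many items.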
 

fisle

Active Member
In Python:


# -*- coding: utf-8 -*-
import urllib
from bs4 import BeautifulSoup

url = 'insert_url_here'
data = urllib.urlopen(url).read()

soup = BeautifulSoup(data)
songs = soup.find_all('guid')
for song in songs:
    song = song.string.strip()
    filename = song.split('/')
    urllib.urlretrieve(song, filename[-1])
    print '{!s} downloaded'.format(filename[-1])

Needs BeautifulSoup4 (pip install beautifulsoup4)
 

texteditor

Premium Buffalo-based Hosting
Flexget is basically made from magic and is my go-to multitool for grabbing files from RSS. I use it for torrents, but the wiki has an example of scraping mp3s from HTML, and since FlexGet filters can be applied to RSS the same way, it should work just fine:

http://flexget.com/wiki/Plugins/rss

http://flexget.com/wiki/Plugins/html

Once you get your little YAML config set up correctly, just run flexget from cron.

It also keeps a database of files it has seen before so it doesn't accidentally grab the same one twice
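For the FlexGet route, a rough sketch of what such a YAML config might look like. The task name, feed URL, and download path are placeholders, and the exact plugin options should be checked against the wiki pages linked above:

```yaml
tasks:
  podcast-mp3s:
    rss: http://example.com/feed.xml   # placeholder; use the real feed URL
    accept_all: yes                    # grab every entry in the feed
    download: ~/podcasts/              # where the mp3 files end up
```

With accept_all every entry is downloaded; FlexGet's seen database is what keeps reruns from fetching the same file twice.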
 