But I soon found out that mplayer will play them. So I wrote a little script that rips the RealAudio stream to MP3, which is better anyway because then I can listen on my iPod. The quality isn't great (I think it's a 32 kbps stream), but that's roughly equivalent to AM radio quality. It's listenable. The script will run on any Unix-based system (like a Mac) with the following prerequisites:
- mplayer
- wget
- lame
- sox
- Perl 5.8.8+
- Perl Module: MP3::Tag
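A quick way to confirm the prerequisites are installed is to look each tool up on the PATH before running anything. This is only a minimal sketch, assuming a POSIX shell; the function name `missing_tools` is made up for illustration and is not part of the script.

```shell
# Print the name of every tool in the argument list that is not on the PATH.
# (missing_tools is a hypothetical helper name, not part of nextbigthing.pl.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Prints nothing if all of the command-line tools are found.
missing_tools mplayer wget lame sox perl

# Loading the module is the simplest check that MP3::Tag is installed.
perl -MMP3::Tag -e 1 2>/dev/null || echo "Perl module MP3::Tag is not installed"
```

Note that the script itself hard-codes the tools under /usr/bin, so if they live somewhere else the paths in its config section still need to be adjusted.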
One caveat: although mplayer can dump RealAudio streams very quickly (by passing a large value, like 5000000, to the -bandwidth switch), I found that for some reason a lot of audio in these streams was skipped. Not good. So I left out the -bandwidth switch and let it dump the stream in real time. It takes a lot longer, but you get all the audio.
I call it "nextbigthing.pl":
#!/usr/bin/perl
# 2009-06-21
# created to take the URL of a WNYC "The Next Big Thing" archive page as input (first argument on cmd line)
# ex: $ ./nextbigthing.pl http://www.wnyc.org/shows/tnbt/episodes/2002/05
# and create tagged MP3s of each show on the page as output
#
# PREREQs
# * mplayer with links to realaudio codecs (see here: http://www.macosxhints.com/article.php?story=20050130184054216)
# * wget
# * lame
# * sox
use strict;
use POSIX qw(strftime);
use MP3::Tag;
# config these
my $wget = "/usr/bin/wget";
my $mplayer = "/usr/bin/mplayer";
my $lame = "/usr/bin/lame";
my $sox = "/usr/bin/sox";
my $tmpfolder = "/home/user/tmp/tnbt/"; # create this if it doesn't exist
my $outfolder = "/home/user/Music/the_next_big_thing/"; # create this if it doesn't exist
my @rmfiles = ();
# get the url off the command line
if (!$ARGV[0]) {
    print "no URL found. put the url in single quotes as the first argument after calling this program on the command line\n";
    exit;
}
# wget the page
my $outfile = $tmpfolder . "tmpout.txt";
# the URL is single-quoted so the shell does not interpret characters such as & or ?
my $wout = `$wget -U 'Mozilla/5.0 (Windows; U; Windows NT 5.1) Gecko/20041001' --timeout=30 -t 2 -q --output-document=${outfile} '$ARGV[0]'`;
if (not(-e $outfile)) {
    myExit("nothing found. check url and try again",\@rmfiles);
} else {
    push(@rmfiles,$outfile);
}
# put file into array
open(my $fh, '<', $outfile) or myExit("could not open $outfile: $!",\@rmfiles);
my @lines = <$fh>;
close($fh);
# search the array for keywords and place our show info into a hash
my %SHOWS = ();
my $counter=0;
my ($showdate,$episode_name,$episode_date,$episode_description);
foreach my $line (@lines) {
    chomp($line);
    # there's a really specific format we're looking for here. if they change their template, this won't work at all
    # we're looking for show info here
    if ($line =~ /episodenamesmall/) {
        $lines[$counter] =~ /^.*?episodes\/(\d\d\d\d)\/(\d\d)\/(\d\d)">$/;
        $showdate = "${1}-${2}-${3}";
        $lines[$counter+1] =~ /^(.*?)</;
        $episode_name = $1;
        $lines[$counter+2] =~ /^<.*?>(.*?)<.*?>$/;
        $episode_date = $1;
        $lines[$counter+3] =~ /^<p>(.*?)<\/p>$/;
        $episode_description = $1;
        $SHOWS{$showdate}{'showdate'} = $showdate;
        $SHOWS{$showdate}{'episode_name'} = $episode_name;
        $SHOWS{$showdate}{'episode_date'} = $episode_date;
        $SHOWS{$showdate}{'episode_desc'} = $episode_description;
    } # end search for 'episodenamesmall'
    # we're looking for the url to the realmedia here
    if ($line =~ /<a class="listen"/) {
        my $rafiles;
        my @ramfiles = ();
        $lines[$counter+2] =~ /^\s+href="\/stream\/ram.py\?(.*?)" class.*?$/;
        $rafiles = $1;
        $rafiles =~ s/file[\d]{0,2}=//g;
        my @tmpfiles = split(/&/,$rafiles);
        foreach my $tmpfile (@tmpfiles) {
            my $fullurl = 'rtsp://raudio.wnyc.org/' . $tmpfile;
            push(@ramfiles,$fullurl);
        }
        my $print_ra = join("^",@ramfiles);
        @{ $SHOWS{$showdate}{'files'} } = @ramfiles;
    } # end search for 'listen'
    $counter++;
} # end foreach loop thru html file
# loop thru the hash, grab each show, and encode
for my $show ( sort keys %SHOWS ) {
    my @wavfiles = ();
    print "Episode $SHOWS{$show}{'episode_name'} - $SHOWS{$show}{'episode_date'}\n";
    # our filenames
    my $wavfilename = ${tmpfolder} . replaceSpace(stripChars($SHOWS{$show}{'episode_name'})) . '_' . $SHOWS{$show}{'showdate'} . '.wav';
    my $mp3 = $wavfilename;
    # \Q..\E quotes the folder path so regex metacharacters in it are matched literally
    $mp3 =~ s/^\Q${tmpfolder}\E/${outfolder}/;
    $mp3 =~ s/\.wav$/.mp3/;
    # if an mp3 already exists in the dest. dir, skip
    if (-e $mp3) {
        print "\nFilename $mp3 already exists, skipping...\n\n";
        next;
    }
    foreach my $file (@{ $SHOWS{$show}{'files'} }) {
        #print "File: $file\n";
        $file =~ /^rtsp:\/\/raudio.wnyc.org\/nbt\/(.*?)$/;
        my $ra_name = $1;
        # dump the raw .ra file
        my $raw = "${tmpfolder}${ra_name}";
        # removed the bandwidth switch because although it speeds up the dumping, it seems to skip a lot of audio.
        # so as of now it's just dumping in real-time
        #my $get_cmd = "$mplayer -bandwidth 5000000 -noframedrop -dumpfile $raw -dumpstream '$file'";
        my $get_cmd = "$mplayer -noframedrop -dumpfile $raw -dumpstream '$file'";
        print "MPLAYER DUMP COMMAND: $get_cmd\n";
        my $get_out = `$get_cmd`;
        # convert files to wav
        my $filename = "${tmpfolder}${ra_name}.wav";
        my $mplayer_cmd = "$mplayer $raw -vc dummy -vo null -af volume=0,channels=2 -ao pcm:waveheader:file=$filename";
        print "MPLAYER WAVE COMMAND: $mplayer_cmd\n";
        my $mp_out = `$mplayer_cmd`;
        if (-e $filename) {
            push(@rmfiles,$filename);
            push(@rmfiles,$raw);
            push(@wavfiles,$filename);
        } else {
            myExit("no wavfile was created from $mplayer_cmd",\@rmfiles);
        }
    } # end foreach thru wavfiles
    # cat this show's wavfiles together with sox, in the order they were dumped
    # (using the list instead of a *.wav glob keeps stray files in the tmp folder out of the mix)
    my $wavfilejoin = join(' ',@wavfiles);
    my $soxcmd = "(for wavfile in $wavfilejoin; do $sox \"\$wavfile\" -t .raw -r 44100 -sw -c 2 -; done) | $sox -t .raw -r 44100 -sw -c 2 - -t .wav $wavfilename";
    print "SOX CONCAT COMMAND: $soxcmd\n";
    my $soxout = `$soxcmd`;
    # encode the wav file
    if (-e $wavfilename) {
        push(@rmfiles,$wavfilename);
        my $title = "The Next Big Thing: \"" . stripChars($SHOWS{$show}{'episode_name'}) . "\" (" . $SHOWS{$show}{'episode_date'} . ")";
        my $author = "Dean Olsher";
        my $composer = "WNYC";
        my $album = "The Next Big Thing";
        $SHOWS{$show}{'showdate'} =~ /^(\d\d\d\d)-(\d\d)-(\d\d)$/;
        my $year = $1;
        my $info_url = $ARGV[0];
        my $genre = "Talk";
        # caused some problems adding the ID3v2 stuff on cmd line so we'll just have MP3::Tag do it
        #my $id3 = " --add-id3v2 --ignore-tag-errors --tt \"$title\" --ta \"$author $composer\" --tl \"$album\" --ty \"$year\" --tc \"$info_url\" --tg \"$genre\"";
        #my $encode = "$lame -V0 -h -b 128 --quiet --vbr-new$id3 $wavfilename $mp3";
        my $encode = "$lame -V0 -h -b 128 --quiet --vbr-new $wavfilename $mp3";
        print "LAME ENCODE COMMAND: $encode\n";
        my $enc_out = `$encode`;
        # add proper ID3 tags and clean up (only if lame actually produced the file)
        if (-e $mp3) {
            my $comments = stripChars($SHOWS{$show}{'episode_desc'});
            &id3v2tag($mp3,$title,$year,$author,$album,$info_url,$comments,$genre,"lame");
            myExit("SUCCESS! Encoding complete for $title",\@rmfiles,1);
        } else {
            myExit("the mp3 doesn't exist so I can't tag it",\@rmfiles);
        } # end if for mp3 exists
    } else {
        myExit("our resampled wav file doesn't exist so I can't encode an mp3",\@rmfiles);
    } # end if for wav exists
    print "\n\n";
} # end foreach thru shows
print "Done\n";
exit;
##################### SUBS ###########################
sub myExit {
    # removes the temp files, prints a message, and exits unless $donotexit is set
    my ($message,$rmfiles_ref,$donotexit) = @_;
    my @rms = @$rmfiles_ref;
    foreach my $remove (@rms) {
        if (-e $remove) {
            print "REMOVING $remove\n";
            unlink($remove);
        }
    }
    print "$message\n";
    if (!$donotexit) {
        exit;
    }
} # end sub myExit
sub stripChars {
    my($text) = @_;
    $text =~ s/\n/ /g; # strip newlines
    $text =~ s/\t/ /g; # strip tabs
    $text =~ s/\r/ /g; # strip carriage returns
    $text =~ s/"/'/g; # strip quotes and replace with single quotes
    $text =~ s/\s+/ /g; # strip repeating spaces and replace with one
    return ($text);
} # end sub stripchars
sub replaceSpace {
    my($text) = shift;
    $text =~ s/([^\w+\s+])//g;
    $text =~ s/^\s+//;
    $text =~ s/\s+$//;
    $text =~ s/([\s+])/_/g;
    return ($text);
} # end sub replacespace
sub id3v2tag {
    # adds id3v2 tag to mp3s
    my ($file,$title,$year,$author,$album,$info_url,$comments,$genre,$encoder) = @_;
    my $mp3 = MP3::Tag->new($file);
    $mp3->get_tags();
    $mp3->new_tag("ID3v2");
    $mp3->{ID3v2}->add_frame("TALB", "$album");
    $mp3->{ID3v2}->add_frame("TIT2", "$title");
    $mp3->{ID3v2}->add_frame("TPE1", "$author");
    $mp3->{ID3v2}->add_frame("TCON", "$genre");
    $mp3->{ID3v2}->add_frame("TSSE", "$encoder");
    $mp3->{ID3v2}->add_frame("TYER", "$year");
    $mp3->{ID3v2}->add_frame("COMM", "ENG", "", "$comments $info_url");
    $mp3->{ID3v2}->add_frame("TIT3", "$comments $info_url");
    $mp3->{ID3v2}->add_frame("TRSN", "$comments $info_url");
    $mp3->{ID3v2}->add_frame("TXXX", "$comments $info_url");
    $mp3->{ID3v2}->add_frame("WORS", "$comments $info_url");
    $mp3->{ID3v2}->add_frame("WXXX", "$comments $info_url");
    $mp3->{ID3v2}->write_tag;
    $mp3->close();
} # end sub id3v2tag
The neat thing about this is it will retrieve the show info from the archive web page and add it to the ID3 info, so the MP3s are named and dated properly. It's called with the URL of each archive page like this:
./nextbigthing.pl http://www.wnyc.org/shows/tnbt/episodes/2002/05
I created a wrapper script that sends a whole bunch of the URLs to the script like this:
#!/bin/bash
/home/user/bin/nextbigthing.pl http://www.wnyc.org/shows/tnbt/episodes/2002/06;
/home/user/bin/nextbigthing.pl http://www.wnyc.org/shows/tnbt/episodes/2002/07;
/home/user/bin/nextbigthing.pl http://www.wnyc.org/shows/tnbt/episodes/2002/08;
/home/user/bin/nextbigthing.pl http://www.wnyc.org/shows/tnbt/episodes/2002/09;
/home/user/bin/nextbigthing.pl http://www.wnyc.org/shows/tnbt/episodes/2002/10;
/home/user/bin/nextbigthing.pl http://www.wnyc.org/shows/tnbt/episodes/2002/11;
/home/user/bin/nextbigthing.pl http://www.wnyc.org/shows/tnbt/episodes/2002/12;
/home/user/bin/nextbigthing.pl http://www.wnyc.org/shows/tnbt/episodes/2003/01;
/home/user/bin/nextbigthing.pl http://www.wnyc.org/shows/tnbt/episodes/2003/02;
Then just let it run until all the streams are recorded!
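Since the wrapper repeats the same command once per month, the URL list could also be generated by a loop. What follows is only a sketch, assuming the archive pages always follow the .../episodes/YYYY/MM pattern seen in the URLs above. The function names `archive_urls` and `rip_range` are made up for illustration.

```shell
# Print one archive URL per month, from START_YEAR START_MONTH through
# END_YEAR END_MONTH inclusive. The body runs in a subshell "( ... )" so the
# variables it sets do not leak into the calling shell.
# (archive_urls and rip_range are hypothetical names, not part of the original wrapper.)
archive_urls() (
    year=$1
    month=${2#0}        # drop a leading zero so "08" is treated as 8
    end_year=$3
    end_month=${4#0}
    while [ "$year" -lt "$end_year" ] ||
          { [ "$year" -eq "$end_year" ] && [ "$month" -le "$end_month" ]; }; do
        printf 'http://www.wnyc.org/shows/tnbt/episodes/%d/%02d\n' "$year" "$month"
        if [ "$month" -eq 12 ]; then
            month=1
            year=$((year + 1))
        else
            month=$((month + 1))
        fi
    done
)

# Run the ripper on each month in turn, one after another like the wrapper does.
# A "for" loop is used rather than piping into "while read" so that mplayer,
# which is started by the Perl script, cannot swallow the URL list from stdin.
rip_range() {
    for url in $(archive_urls "$@"); do
        /home/user/bin/nextbigthing.pl "$url"
    done
}

# Example (same months as the wrapper): rip_range 2002 06 2003 02
```

Calling `archive_urls 2002 06 2003 02` on its own just prints the nine URLs, which is a cheap way to check the range before starting a run that takes hours.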

