Finally the series I've been waiting for so long! Thanks a lot for your brilliant screencasts! They're one of the best Rails resources I know and extraordinarily well prepared.
And, by the way, nice new intro!
Ryan,
Very nice article!
I believe the forking (&) won't work under Windows. Any ideas on how to accomplish it in this case?
I was just about to mention that this won't work on Windows. Backgrounding with the ampersand is a *nix-only thing.
I don't believe there is a simple way to do this on Windows, except perhaps the Ruby Process module with #fork. It's not as easy, but it gives similar functionality using just the Ruby library (note, though, that fork itself isn't implemented on Windows builds of Ruby).
http://www.ruby-doc.org/core-1.8.7/classes/Process.html#M000968
An interesting alternative to backgroundrb or Starling; it seems easier to set up.
I use rake tasks, launched with a cron job, for my background processing, but this is suitable only for periodic tasks. Your solution is fine for user-launched tasks.
Thanks!
For those asking about background processes on Windows: the best way to accomplish this from the command line is to use the "start" command.
http://www.ss64.com/nt/start.html has a pretty good reference.
@Cassiano
If you want to "spawn" a process under Windows, you can do so by typing "start rake ...", though you're better off giving the whole path to the rake command.
Start opens a new command shell.
Jean-Marc
@Jean-Marc (and others)
Thanks for the Windows tip.
It does work, except that the command shell window remains open after the task finishes, which causes subsequent executions to be silently ignored (the window needs to be closed manually before clicking 'Deliver' again).
Since it wasn't made fully explicit: the code in the screencast doesn't escape the env values (or keys); it only upcases the key and puts quotes around the value.
If regular users can pass in these parameters, they can exploit this to run arbitrary code.
It was hinted at, but could be misunderstood to mean that the current code does proper escaping.
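A minimal sketch of escaping both keys and values when building the rake command line, assuming a helper shaped like the screencast's call_rake. The names build_rake_command, RAKE_BIN and RAKE_LOG are hypothetical, and Shellwords escapes for a POSIX shell, not for Windows cmd:

```ruby
require "shellwords"

RAKE_BIN = "rake"         # hypothetical; could be a full path instead
RAKE_LOG = "log/rake.log" # hypothetical log location

# Hypothetical helper (not the screencast's code): builds a backgrounded rake
# command line. Option keys become NAME=value arguments and must be plain
# identifiers; values are shell-escaped so quotes, ";", "$(...)" and backticks
# reach rake as literal text instead of being run by the shell.
def build_rake_command(task, options = {})
  args = options.map do |key, value|
    name = key.to_s.upcase
    unless name =~ /\A[A-Z_][A-Z0-9_]*\z/
      raise ArgumentError, "unsafe option name: #{key.inspect}"
    end
    "#{name}=#{Shellwords.escape(value.to_s)}"
  end
  command = [RAKE_BIN, Shellwords.escape(task.to_s), *args, "--trace"].join(" ")
  # Send stdout to the log first, then point stderr at it; "&" backgrounds.
  "#{command} >> #{Shellwords.escape(RAKE_LOG)} 2>&1 &"
end

puts build_rake_command(:send_mailing, mailing_id: 42, rails_env: "production")
# rake send_mailing MAILING_ID=42 RAILS_ENV=production --trace >> log/rake.log 2>&1 &
```

The resulting string would be handed to system as before. Namespaced tasks such as backup:db:dump pass through unchanged, since ":" needs no escaping.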
@tony and Jean-Marc, thanks for pointing out how to accomplish this on Windows. I don't have access to a Windows machine at home, so I can't fully test these. I'll add this to the show notes.
@Henrik, thanks for clarifying this. Along that line, does anyone know of a good way to escape a string to be entered into a shell command?
This is great, but what about firing off processes whose status the user is interested in later? For example, whether it finished successfully, unsuccessfully, what percentage complete it is, etc.?
Perhaps you'll cover this in the upcoming background processing episodes, but it would be nice to have some sort of rails-model driven status mechanism that could be tied into the rake technique shown here.
@Reuben
Taking the Mailing as an example, you could use an attribute called status that the background task updates and your web application displays (you could refresh it periodically).
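A sketch of the status-attribute idea, assuming a plain Ruby class standing in for the Mailing model. The status/progress fields and the update_status! method are hypothetical; in Rails they would be columns saved to the database on each update:

```ruby
# Hypothetical stand-in for a model like Mailing with status/progress columns.
# In a Rails app each update_status! would save the row, so the web side can
# read the current values on a periodic refresh.
class Mailing
  attr_reader :status, :progress

  def initialize(recipients)
    @recipients = recipients
    @status = "pending"
    @progress = 0
  end

  # Runs inside the background rake task.
  def deliver
    update_status!("running", 0)
    @recipients.each_with_index do |address, i|
      yield address # send one e-mail here
      update_status!("running", (i + 1) * 100 / @recipients.size)
    end
    update_status!("finished", 100)
  rescue StandardError
    update_status!("failed", @progress) # record the failure, then re-raise
    raise
  end

  private

  def update_status!(status, progress)
    @status = status
    @progress = progress
    # Rails: write both columns to the database here.
  end
end

mailing = Mailing.new(%w[a@example.com b@example.com])
mailing.deliver { |address| puts "sending to #{address}" }
puts "#{mailing.status} #{mailing.progress}%" # prints "finished 100%"
```

Because the failure branch records where it stopped, the page can tell "failed at 50%" apart from "still running".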
Would 'rake thinking_sphinx:index' or delta indices make for good uses of the approach outlined in this screencast? Or, are we better off with backgroundrb and the convenience of its :trigger_args intervals?
I've got an Excel file as a view that takes ages to render. Can I generate it in the background and write it to disk so that a link to it can appear later?
Unreliable, suboptimal, fast and easy. Just the way the Rails community likes it ;]
I really don't mean to be an asshole, but how hard is it to set up backgroundrb and do this properly?
@Reuben, this can be accomplished by regularly updating the database record associated with the task. In this case it could be the given Mailing record. True, this could be extracted out into a generic jobs table, but at that point I think you're better off going with another solution (which I'll cover in future episodes).
@Neil, it depends on how often the indexing will be done. This approach is not optimal for frequent tasks, so if you're calling it on each record update then I'd say no.
@Amir, yep, that's definitely possible. The hardest part is notifying the user after it's done. One way to do that is just have the browser reload periodically (or ask them to come back later). You can update a database record at the end of the task so your app knows when it's done building and can display the link.
@AC, I plan to cover more "proper" ways of doing this kind of thing in the future. But as far as reliability is concerned, I haven't had any problems with this approach. Admittedly it is a bit of a hack, but it works okay (in my testing) and is much simpler than most alternatives.
@Amir,
How about sending an e-mail to the user (with an embedded link) when it's done?
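A sketch of the background Excel idea from this exchange: write the file under a temporary name, rename it when complete, then flag the record so the page (or a notification e-mail) can offer the link. Export, build_export and the .xls file name are hypothetical:

```ruby
require "tmpdir"

# Hypothetical record for a generated export (in Rails, a model/table row).
Export = Struct.new(:id, :path, :ready)

# Runs in the background task. The file is written under a temporary name and
# renamed when complete, so the app never links to a half-written file; only
# then is the record flagged as ready.
def build_export(export, dir)
  final_path = File.join(dir, "export-#{export.id}.xls")
  tmp_path = "#{final_path}.part"
  File.open(tmp_path, "wb") { |file| yield file } # slow rendering goes here
  File.rename(tmp_path, final_path) # atomic within one filesystem on POSIX
  export.path = final_path
  export.ready = true
  export
end

Dir.mktmpdir do |dir|
  export = build_export(Export.new(1, nil, false), dir) { |f| f.write("report") }
  puts export.ready # true
end
```

The reloading page only needs to check ready; the e-mail variant would send the link at the same point where ready is set.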
@Ryan: Sure, it works when it works. But if a task is killed for whatever reason (say, a power outage), you've lost it along with all the data it carried. That is bad enough. And unless you check your logs religiously, you won't even notice. Hell, the log might not even show that the task didn't complete successfully. Apart from that, there's the memory consumption and CPU time wasted. Seems ugly to me. I wouldn't dare rely on something like this in a serious production environment.
You probably want to add 2>&1 into that command somewhere so that you capture STDERR into your log file as well as STDOUT.
I've noticed that you tend to use Rake for all command line tasks in place of just a straight call to a Ruby script. Is there any reason for this other than the ability to easily load the Rails environment? Is there any overhead to running Rake as opposed to a straight Ruby script?
Hi Ryan,
Thanks for another great screencast. In your future ones about other ways to do background tasks, could you spend some time covering how/where to put the code that starts the process so that each time you redeploy your app it will automatically start up the background processes too? (and not wind up with a bunch of extra ones running either)
I've been using "spawn" for background stuff, but it requires a fair amount of manual starting and killing the way I'm doing it now. There must be a (much) better way!
thanks,
jp
@Jeff,
Daemons is pretty easy to use and will prevent the "more than one copy running at a time" problem (http://daemons.rubyforge.org/).
The Rails Way book goes into this a little bit (enough to get started, anyway), and you can avoid most of the problems listed above by updating the database as tasks complete. If you have 1,000 emails to send and you update each row after its email is sent, you'd know where it stopped, and with only a little more work you could have it pick up there again later if you had to.
I suppose this wouldn't work as well if the tasks take a very long time to execute, but I generally find that if that's the case, there are probably ways to make them faster: some code inside a loop that should be outside it, or a way to break the job down into smaller tasks rather than one giant one. Ask for help if you get stuck; sometimes you just can't see the answer yourself because you are too close to the problem and need a fresh perspective.
That's my 2 cents anyways.
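A sketch of the mark-as-you-go idea described above, assuming a hypothetical Delivery struct in place of a database table with a sent_at column:

```ruby
# Hypothetical row type; in Rails this would be a model with a sent_at column.
Delivery = Struct.new(:email, :sent_at)

# Sends only rows that are not marked yet, and marks each one right after it
# is sent. If the task dies halfway, running it again resumes at the first
# unsent row instead of starting over (at worst one e-mail is sent twice,
# if the crash lands between sending and marking).
def deliver_pending(deliveries)
  deliveries.reject(&:sent_at).each do |delivery|
    yield delivery.email        # send the message
    delivery.sent_at = Time.now # in Rails: save this row immediately
  end
end
```

Restarting after a crash is then just running the same task again.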
Ruby 1.8.7 introduced Shellwords.escape and String#shellescape.
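A short demonstration of Shellwords.escape (and the equivalent String#shellescape) on hostile input. It escapes for a POSIX shell such as /bin/sh, not for Windows cmd:

```ruby
require "shellwords"

user_input = %q{report'; rm -rf ~ #}

escaped = Shellwords.escape(user_input) # same result as user_input.shellescape
puts escaped

# The shell turns the escaped word back into exactly the original text,
# so nothing after the quote is run as a command.
round_trip = `printf '%s' #{escaped}`
puts round_trip == user_input # => true
```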
I believe the best way to protect yourself is to not escape at all, but instead use the multiple-argument syntax to system:
Rather than system("ls -l"), use:
system("ls", "-l")
I believe this will bypass the shell and therefore any risk of injected characters.
> system("ls", "`ps ax`")
ls: `ps ax`: No such file or directory
Clearly those back-ticks we so worry about were considered arguments, not parsed by the shell.
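The same check can be made from Ruby itself. IO.popen with an array follows the same no-shell rule as multi-argument system, but captures the output; echo is just a harmless stand-in program:

```ruby
# With more than one argument, Ruby starts the program directly instead of
# going through /bin/sh, so shell syntax inside an argument is just text.
suspicious = "`whoami`; echo pwned $(id)"

# IO.popen with an array behaves like multi-argument system(), but lets us
# capture what the program printed.
output = IO.popen(["echo", suspicious], &:read)
puts output # the string comes back unchanged; nothing was executed
```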
Warning, though: if you pass those to poorly written shell scripts, you are still risking a lot. Be careful, perhaps by converting known-integer arguments with params[:id].to_i.
Also, should you write an 'escape' routine, consider not doing that at all and instead using a whitelist (not a blacklist) to pass through only specific character types, such as A-Z, a-z, 0-9 and so on. Whitelisting is preferred because if you miss a character in a whitelist, the app merely won't work, but if you miss one in a blacklist you could leave a hole open.
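A sketch of both suggestions, a whitelist check and integer conversion; whitelisted! and integer_arg are hypothetical helper names:

```ruby
# Whitelist: describe what is allowed and reject everything else.
# \A and \z anchor the whole string (^ and $ would only anchor one line,
# so "ok\nrm -rf /" would slip through).
SAFE_WORD = /\A[A-Za-z0-9_]+\z/

def whitelisted!(value)
  value = value.to_s
  raise ArgumentError, "rejected argument: #{value.inspect}" unless value =~ SAFE_WORD
  value
end

# Known-integer parameters: to_i leaves only a number, whatever was sent.
def integer_arg(param)
  param.to_s.to_i.to_s
end

puts integer_arg("12; rm -rf /") # 12
puts whitelisted!("weekly_report") # weekly_report
```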
Totally awesome new intro!
Since we were just talking about optimizing Rails code I thought I'd post this:
http://lsrc2008.confreaks.com/02-james-edward-gray-ii-hidden-gems.html
The guys at Railsenvy.com just posted a video about the talks that were done at Lone Star Ruby Conf and Confreaks have their videos of the individual talks online now.
@Matthew, good point. I'll add that to the code in the notes. As for using Rake, I generally prefer it for two reasons: 1) it's an easy way to load the Rails environment, and 2) it's easy to organize and has built-in descriptions.
There are definitely times I prefer not using Rake, especially when needing to pass a lot of arguments/options.
@Michael, I originally planned to pass the arguments as separate parameters to system, but then realized this doesn't allow you to end it with "&" or direct the output to something else. Anyone know of a workaround?
Also, good points about security. It's a good idea to be extremely careful whenever passing web parameters to a shell command.
@Ryan (re system and background tasks)
You will probably need to end up writing your own background thing, using something like:
Kernel.fork { system(...) }
This is more or less what system() does internally. It will more than likely do a few more things (like closing specific file descriptors in the child and making sure things are set up correctly there).
Another option is to have a simple shell script (forker.sh?) which you run instead of directly running rake. This shell script could even have commands in it to run "rake ... &" based on input.
Of course, it need not be a shell script, it could be a Ruby script too. All this is more complicated, but in all actuality, I dislike calling programs directly from the Rails framework unless they are intended to display something to the user, or to prod other processes into action.
I have done background tasks before (not in Rails, but in roll-your-own PHP code) for a large DNS management system. Sometimes actions could take seconds and sometimes hours. We queued up the requests in a database table and used PostgreSQL's NOTIFY command to tell listeners there was something to do. When we had to support MySQL as well, we also added a periodic SQL query (even once a second). It wasn't a major delay.
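One answer to Ryan's question about "&" and redirection with separate arguments, assuming Ruby 1.9 or later (newer than the 1.8.7 discussed in this thread): Process.spawn takes the program and its arguments separately, so no shell parses them, yet it returns immediately and can do the redirection itself. run_in_background is a hypothetical helper name:

```ruby
# Ruby 1.9+: Process.spawn runs the program without a shell when given the
# program and its arguments as separate strings, returns at once (like a
# trailing "&"), and handles the redirection itself.
def run_in_background(log_path, *command)
  pid = Process.spawn(*command,
                      out: [log_path, "a"], # like >> log_path
                      err: [:child, :out])  # like 2>&1
  # Reap the child when it exits so no zombie is left behind; the returned
  # thread's #value waits for it and gives the Process::Status.
  Process.detach(pid)
end
```

A call might look like run_in_background("log/rake.log", "rake", "send_mailing", "MAILING_ID=#{mailing.id}"), where mailing is hypothetical. Each argument reaches rake as-is, so the quoting problem above disappears without any escaping.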
Loving the new intro, wonderfully useful tutorial too!
@8 Regarding the start command:
You can make sure your cmd windows close after the emails are sent like this:
start cmd /c "title Email@%TIME% & cd c:/www & rake -T > emails.log 2>&1"
/c closes the cmd window once the command finishes
2>&1 redirects STDERR into the same log file
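A sketch that picks the background command per platform, following the start cmd /c form from the comment above. background_command is a hypothetical helper, and the Windows branch has not been tested on Windows:

```ruby
# Hypothetical helper: wraps a command for running in the background.
# On Windows it uses start + cmd /c (so the window closes when the command
# finishes); elsewhere it appends "&". Untested on Windows.
def background_command(command, log, windows = !!(RUBY_PLATFORM =~ /mswin|mingw/))
  if windows
    # The first quoted argument to start is the window title.
    %(start "rake" cmd /c "#{command} >> #{log} 2>&1")
  else
    "#{command} >> #{log} 2>&1 &"
  end
end

puts background_command("rake send_mailing", "rake.log", true)
# start "rake" cmd /c "rake send_mailing >> rake.log 2>&1"
```

The result is meant to be passed to system unchanged, which runs it through the platform's shell.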
Hey Ryan, nice one! However, it's not working under Passenger here... Maybe it's the way Passenger processes react to the system call.
Any advice?
Thanks and keep up the good work!
Glad to note that the new intro is way better ;)
Thanks a lot for your work, keep it up.
@Asis
I had a similar problem using Passenger and system() calls. I was on a CentOS box, and the root user owned my Rails app. Passenger runs the Rails app as whoever owns the environment.rb file, unless that owner is root; in that case Passenger runs it as the PassengerDefaultUser, which is set to nobody. I changed that setting in my httpd.conf file to root and it worked fine. Obviously you don't want root running around doing things on a system exposed to the public. Try using chown to give your Rails app to a user who can run whatever command you want to run.
@Asis and @Justin
Passenger did not support background processes until the latest version 2.0.4. Just update your version of Passenger and all should work perfectly.
How would you call the call_rake method if your rake task was namespaced, e.g. my:cool:rake:task? Would you still pass the whole path as a symbol?
Thanks for the great video! One additional thing I had to do on my CentOS box to get this working was to specify the location of my Rakefile in the system call. I.e., I had to add:
system "/usr/bin/rake #{task} #{args.join(' ')} --trace --rakefile #{Rails.root}/Rakefile >> #{Rails.root}/log/rake.log 2>&1 &"
@wysRd, @ryanb: is there any way you can use a namespaced task in the call_rake method? I'm looking to do backup:db:dump
Thanks
@wysRd, @ryan
For namespaced rake tasks, it looks like call_rake "abc:def:ghi" should work.
It just gets fed into a command line call (system).
To really capture STDERR as well, I had to reverse these two:
>> log/rake.log 2>&1
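Not very intuitive, but the shell applies redirections from left to right, so 2>&1 has to come after the >> redirection for STDERR to end up in the log file too. I believe this is how it works on most Unix-like systems.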
I think that the 2>&1 must be at the end, just before the final &. If you put it before the >> file, it doesn't seem to work.
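The ordering point above can be checked directly, since the shell applies redirections from left to right. This sketch runs a small command that writes to both streams (NOISY and logged_output are hypothetical names) and shows what reaches the log file in each order:

```ruby
require "tmpdir"

# A command that writes one line to stdout and one to stderr.
NOISY = %q{sh -c 'echo to-stdout; echo to-stderr 1>&2'}

# Runs NOISY through the shell with the given redirections ("LOG" stands for
# a temporary log file) and returns what ended up in that file.
def logged_output(redirections)
  Dir.mktmpdir do |dir|
    log = File.join(dir, "rake.log")
    system("#{NOISY} #{redirections.sub('LOG', log)}")
    File.read(log)
  end
end

# stdout -> log, then stderr -> the same log
p logged_output(">> LOG 2>&1") # => "to-stdout\nto-stderr\n"
# stderr -> the *old* stdout, then only stdout -> log
p logged_output("2>&1 >> LOG") # => "to-stdout\n"
```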
Also, check the path of your rake. If you are using Passenger, it could be /usr/local/bin/rake.
I've also added the --rakefile option as a precaution.
The complete line on my setup looks like this:
system "/usr/local/bin/rake #{task} #{args.join(' ')} --rakefile #{Rails.root}/Rakefile >> #{Rails.root}/log/rake.log 2>&1 &"
Hello.
Is there any way to automatically close the shell window on Windows after the rake task has executed?
Thanks.
SERIOUS SECURITY PROBLEM
In the call_rake() function you have a shell escaping problem. The variable v must be escaped properly. This is a security vulnerability. Consider replacing the video, or adding an overlay to it with the correct solution!
/usr/bin/rake? Ugly enough. Replace it with `which rake` (some zsh settings break `which rake`, so use it with caution :) ).
Also, you may want to use "bundle exec rake".
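If you'd rather not hard-code /usr/bin/rake or /usr/local/bin/rake, and want to avoid depending on the which command and its shell quirks, a pure-Ruby lookup is one option. which here is a hypothetical helper, Unix-style only (it ignores Windows PATHEXT), and it searches the PATH of the Rails process, which under Passenger may differ from your login shell's:

```ruby
# Hypothetical pure-Ruby "which": returns the first executable file named
# `program` found on this process's PATH, or nil if there is none.
def which(program)
  ENV.fetch("PATH", "").split(File::PATH_SEPARATOR).each do |dir|
    candidate = File.join(dir, program)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

puts which("sh").inspect
```

For example: rake = which("rake") or raise "rake not found on PATH". With Bundler, "bundle exec rake" sidesteps the question.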
Something I was bitten by lately was the SIGHUP signal.
I put a long-running rake task in the background, making sure to redirect stderr to stdout. But when I logged out of my terminal, rake received a SIGHUP and aborted my processes.
I probably should have executed this with Resque or DelayedJob.
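For the SIGHUP problem specifically, running the command under nohup is one common fix; a job queue such as Resque or DelayedJob, as suggested above, remains the sturdier route. A minimal sketch, assuming Ruby 1.9+ (for Process.spawn) and a Unix system with nohup; spawn_nohup is a hypothetical name:

```ruby
# Starts a long-running command under nohup so it survives a hangup (for
# example, logging out of the terminal that started it). nohup sets SIGHUP
# to "ignored" and then execs the command, which keeps that setting.
def spawn_nohup(log_path, *command)
  pid = Process.spawn("nohup", *command,
                      out: [log_path, "a"], err: [:child, :out])
  # The returned thread reaps the child on exit; its #pid is the child's pid.
  Process.detach(pid)
end
```

In a plain shell string the equivalent is prefixing the command with nohup, e.g. "nohup rake ... >> log/rake.log 2>&1 &".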
I'm not getting any output in the log file. How can I get it to write to the file? I'm not sure what this tutorial is supposed to write to the log, but nothing is being written for me. My command is: system "rake reload --trace >> #{Rails.root}/log/rake.log &"