Object
Main daemon controller object. See the README for an introduction and tutorial.
daemon_controller, library for robust daemon management
Copyright © 2010, 2011, 2012 Phusion
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Create a new DaemonController object.
:identifier - A human-readable, unique name for this daemon, e.g. “Sphinx search server”. This identifier will be used in some error messages. On some platforms, it will be used for concurrency control: on such platforms, no two DaemonController objects will operate on the same identifier at the same time.
:start_command - The command to start the daemon. This must be a String, e.g. “mongrel_rails start -e production”, or a Proc which returns a String.
If the value is a Proc, and the before_start option is given too, then the start_command Proc is guaranteed to be called after the before_start Proc is called.
:ping_command - The ping command is used to check whether the daemon can be connected to. It is also used to ensure that #start only returns when the daemon can be connected to.
The value may be a command string. This command must exit with an exit code of 0 if the daemon can be successfully connected to, or exit with a non-0 exit code on failure.
The value may also be an Array which specifies the socket address of the daemon. It must be in one of the following forms: [:tcp, host_name, port] or [:unix, filename].
The value may also be a Proc, which returns an expression that evaluates to true (indicating that the daemon can be connected to) or false (failure). If the Proc raises Errno::ECONNREFUSED, Errno::ENETUNREACH, Errno::ETIMEDOUT, Errno::ECONNRESET, Errno::EINVAL or Errno::EADDRNOTAVAIL then that also means that the daemon cannot be connected to. NOTE: if the ping command returns an object which responds to #close, then that method will be called on it. This makes it possible to specify a ping command such as lambda { TCPSocket.new('localhost', 1234) }, without having to worry about closing it afterwards. Any exceptions raised by #close are ignored.
:pid_file - The PID file that the daemon will write to. Used to check whether the daemon is running.
:lock_file - The lock file to use for serializing concurrent daemon management operations. Defaults to “(filename of PID file).lock”.
:log_file - The log file that the daemon will write to. It will be consulted to see whether the daemon has printed any error messages during startup.
:stop_command - A command to stop the daemon with, e.g. “/etc/rc.d/nginx stop”. If no stop command is given (i.e. nil), then DaemonController will stop the daemon by killing the PID written in the PID file.
The default value is nil.
:before_start - This may be a Proc. It will be called just before running the start command. The before_start proc is not subject to the start timeout.
:start_timeout - The maximum amount of time, in seconds, that #start may take to start the daemon. Since #start also waits until the daemon can be connected to, that wait time is counted as well. If the daemon does not start in time, then #start will raise an exception.
The default value is 15.
:stop_timeout - The maximum amount of time, in seconds, that #stop may take to stop the daemon. Since #stop also waits until the daemon is no longer running, that wait time is counted as well. If the daemon does not stop in time, then #stop will raise an exception.
The default value is 15.
:log_file_activity_timeout - Once a daemon has gone into the background, it becomes difficult to know for certain whether it is still initializing or whether it has failed and exited, until it has written its PID file. Suppose that it failed with an error after daemonizing but before it has written its PID file; not many system administrators want to wait 15 seconds (the default start timeout) to be notified of whether the daemon has terminated with an error.
An alternative way to check whether the daemon has terminated with an error is to check whether its log file has been recently updated. If, after the daemon has started, the log file hasn’t been updated for the number of seconds given by the :log_file_activity_timeout option, then the daemon is assumed to have terminated with an error.
The default value is 7.
:daemonize_for_me - Normally daemon_controller will wait until the daemon has daemonized into the background, in order to capture any errors that it may print on stdout or stderr before daemonizing. However, if the daemon doesn’t support daemonization for some reason, then setting this option to true will cause daemon_controller to do the daemonization for the daemon.
The default is false.
:keep_ios - Upon spawning the daemon, daemon_controller will normally close all file descriptors except stdin, stdout and stderr. However, if there are any file descriptors you want to keep open, specify the IO objects here. This must be an array of IO objects.
# File lib/daemon_controller.rb, line 176
def initialize(options)
  [:identifier, :start_command, :ping_command, :pid_file, :log_file].each do |option|
    if !options.has_key?(option)
      raise ArgumentError, "The ':#{option}' option is mandatory."
    end
  end
  @identifier = options[:identifier]
  @start_command = options[:start_command]
  @stop_command = options[:stop_command]
  @ping_command = options[:ping_command]
  @ping_interval = options[:ping_interval] || 0.1
  @pid_file = options[:pid_file]
  @log_file = options[:log_file]
  @before_start = options[:before_start]
  @start_timeout = options[:start_timeout] || 15
  @stop_timeout = options[:stop_timeout] || 15
  @log_file_activity_timeout = options[:log_file_activity_timeout] || 7
  @daemonize_for_me = options[:daemonize_for_me]
  @keep_ios = options[:keep_ios] || []
  @lock_file = determine_lock_file(options, @identifier, @pid_file)
end
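As a rough illustration of the options described above, the following sketch builds an options hash and repeats the mandatory-option check from the constructor listing. The daemon name, command, paths and port are made-up placeholders, and MANDATORY_OPTIONS is a name introduced only for this example.

```ruby
# Options for a hypothetical daemon; the command, paths and port are
# placeholders, not values taken from the library.
options = {
  :identifier    => 'Example search server',
  :start_command => 'example-daemon --config /etc/example.conf',
  :ping_command  => [:tcp, '127.0.0.1', 3312],
  :pid_file      => '/var/run/example-daemon.pid',
  :log_file      => '/var/log/example-daemon.log',
  :start_timeout => 25 # optional; the constructor listing defaults it to 15
}

# Same mandatory-option check that the constructor listing performs.
MANDATORY_OPTIONS = [:identifier, :start_command, :ping_command, :pid_file, :log_file]
missing_options = MANDATORY_OPTIONS.reject { |name| options.has_key?(name) }
unless missing_options.empty?
  raise ArgumentError, "Missing options: #{missing_options.inspect}"
end

# With the daemon_controller library loaded, the hash would be passed as:
#   controller = DaemonController.new(options)
```

Omitting any of the five mandatory keys would make the constructor raise ArgumentError, as the listing shows.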
Connect to the daemon by running the given block, which contains the connection logic. If the daemon isn’t already running, then it will be started.
The block must return nil or raise Errno::ECONNREFUSED, Errno::ENETUNREACH, Errno::ETIMEDOUT, Errno::ECONNRESET, Errno::EINVAL or Errno::EADDRNOTAVAIL to indicate that the daemon cannot be connected to. It must return non-nil if the daemon can be connected to. Upon successful connection, the return value of the block will be returned by #connect.
Note that the block may be called multiple times.
Raises:
StartError - an attempt to start the daemon was made, but the start command failed with an error.
StartTimeout - an attempt to start the daemon was made, but the daemon did not start in time, or it failed after it has gone into the background.
ConnectError - the daemon wasn’t already running, but we couldn’t connect to the daemon even after starting it.
# File lib/daemon_controller.rb, line 231
def connect
  connection = nil
  @lock_file.shared_lock do
    begin
      connection = yield
    rescue *ALLOWED_CONNECT_EXCEPTIONS
      connection = nil
    end
  end
  if connection.nil?
    @lock_file.exclusive_lock do
      if !daemon_is_running?
        start_without_locking
      end
      connect_exception = nil
      begin
        connection = yield
      rescue *ALLOWED_CONNECT_EXCEPTIONS => e
        connection = nil
        connect_exception = e
      end
      if connection.nil?
        # Daemon is running but we couldn't connect to it. Possible
        # reasons:
        # - The daemon froze.
        # - Bizarre security restrictions.
        # - There's a bug in the yielded code.
        if connect_exception
          raise ConnectError, "Cannot connect to the daemon: #{connect_exception} (#{connect_exception.class})"
        else
          raise ConnectError, "Cannot connect to the daemon"
        end
      else
        return connection
      end
    end
  else
    return connection
  end
end
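The block contract described above (return nil or raise one of the listed exceptions on failure, return non-nil on success) can be sketched without the library. CONNECT_ERRORS and attempt_connection are hypothetical names used only here; they imitate a single connection attempt and do not start any daemon.

```ruby
# Exceptions that the documentation lists as meaning "cannot connect".
CONNECT_ERRORS = [
  Errno::ECONNREFUSED, Errno::ENETUNREACH, Errno::ETIMEDOUT,
  Errno::ECONNRESET, Errno::EINVAL, Errno::EADDRNOTAVAIL
]

# Hypothetical helper: runs the block once and maps the listed
# exceptions to nil, as each attempt in the connect listing does.
def attempt_connection
  yield
rescue *CONNECT_ERRORS
  nil
end

refused  = attempt_connection { raise Errno::ECONNREFUSED }
returned = attempt_connection { :fake_connection }
```

Because the block may run more than once, it should be safe to repeat, for example by opening a fresh socket on every call.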
Returns the daemon’s PID, as reported by its PID file. Returns the PID as an integer, or nil if there is no valid PID in the PID file.
This method doesn’t check whether the daemon’s actually running. Use #running? if you want to check whether it’s actually running.
Raises SystemCallError or IOError if something went wrong during reading of the PID file.
# File lib/daemon_controller.rb, line 300
def pid
  @lock_file.shared_lock do
    return read_pid_file
  end
end
Checks whether the daemon is still running. This is done by reading the PID file and then checking whether there is a process with that PID.
Raises SystemCallError or IOError if something went wrong during reading of the PID file.
# File lib/daemon_controller.rb, line 312
def running?
  @lock_file.shared_lock do
    return daemon_is_running?
  end
end
Start the daemon and wait until it can be pinged.
Raises:
AlreadyStarted - the daemon is already running.
StartError - the start command failed.
StartTimeout - the daemon did not start in time. This could also mean that the daemon failed after it has gone into the background.
# File lib/daemon_controller.rb, line 205
def start
  @lock_file.exclusive_lock do
    start_without_locking
  end
end
Stop the daemon and wait until it has exited.
Raises:
StopError - the stop command failed.
StopTimeout - the daemon didn’t stop in time.
# File lib/daemon_controller.rb, line 277
def stop
  @lock_file.exclusive_lock do
    begin
      Timeout.timeout(@stop_timeout, Timeout::Error) do
        kill_daemon
        wait_until do
          !daemon_is_running?
        end
      end
    rescue Timeout::Error
      raise StopTimeout, "Daemon '#{@identifier}' did not exit in time"
    end
  end
end
# File lib/daemon_controller.rb, line 386
def before_start
  if @before_start
    @before_start.call
  end
end
# File lib/daemon_controller.rb, line 457
def check_pid(pid)
  Process.kill(0, pid)
  return true
rescue Errno::ESRCH
  return false
rescue Errno::EPERM
  # We didn't have permission to kill the process. Either the process
  # is owned by someone else, or the system has draconian security
  # settings and we aren't allowed to kill *any* process. Assume that
  # the process is running.
  return true
end
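The signal-0 probe used by check_pid can be tried in isolation. The sketch below is a standalone illustration, assuming a Unix-like platform; process_alive? is a hypothetical name, and the child process exists only to provide a PID that is known to have exited.

```ruby
require 'rbconfig'

# Signal 0 delivers nothing; it only checks whether the PID exists.
def process_alive?(pid)
  Process.kill(0, pid)
  true
rescue Errno::ESRCH
  false # no such process
rescue Errno::EPERM
  true  # process exists but belongs to someone else
end

# Spawn a short-lived child and reap it, so its PID is no longer in use.
child_pid = Process.spawn(RbConfig.ruby, '-e', 'exit 0')
Process.wait(child_pid)

self_alive  = process_alive?(Process.pid)
child_alive = process_alive?(child_pid)
```

A PID can be reused by an unrelated process later on, so a positive answer is only a hint that the daemon itself is alive.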
# File lib/daemon_controller.rb, line 424
def daemon_is_running?
  begin
    pid = read_pid_file
  rescue Errno::ENOENT
    # The PID file may not exist, or another thread/process
    # executing #running? may have just deleted the PID file.
    # So we catch this error.
    pid = nil
  end
  if pid.nil?
    return false
  elsif check_pid(pid)
    return true
  else
    delete_pid_file
    return false
  end
end
This method does nothing and only serves as a hook for the unit test.
# File lib/daemon_controller.rb, line 508
def daemonization_timed_out
end
# File lib/daemon_controller.rb, line 452
def delete_pid_file
  File.unlink(@pid_file)
rescue Errno::EPERM, Errno::EACCES, Errno::ENOENT # ignore
end
# File lib/daemon_controller.rb, line 550
def determine_lock_file(options, identifier, pid_file)
  if options[:lock_file]
    return LockFile.new(File.expand_path(options[:lock_file]))
  else
    return LockFile.new(File.expand_path(pid_file + ".lock"))
  end
end
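The LockFile class itself is not shown on this page. As an assumption made purely for illustration, the sketch below uses File#flock to show how a lock file derived from the PID file name can serialize operations: while one handle holds an exclusive lock, a second non-blocking attempt is refused. The directory and file names are placeholders.

```ruby
require 'tmpdir'

second_attempt = nil
lock_path = nil

Dir.mktmpdir do |dir|
  pid_file  = File.join(dir, 'example-daemon.pid')
  # Default naming rule from the listing: PID file name plus ".lock".
  lock_path = File.expand_path(pid_file + '.lock')

  File.open(lock_path, 'w') do |holder|
    holder.flock(File::LOCK_EX) # exclusive lock, as for start/stop
    File.open(lock_path, 'r') do |other|
      # Non-blocking attempt from a second handle; returns false while held.
      second_attempt = other.flock(File::LOCK_EX | File::LOCK_NB)
    end
    holder.flock(File::LOCK_UN)
  end
end
```

A shared lock (File::LOCK_SH) would let several readers proceed together, which matches how the listings use shared_lock for #pid and #running?.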
# File lib/daemon_controller.rb, line 532
def differences_in_log_file
  if @original_log_file_stat
    File.open(@log_file, 'r') do |f|
      f.seek(@original_log_file_stat.size, IO::SEEK_SET)
      diff = f.read.strip
      if diff.empty?
        return nil
      else
        return diff
      end
    end
  else
    return nil
  end
rescue Errno::ENOENT
  return nil
end
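The technique in differences_in_log_file (remember the log size before starting, then read only what was appended) can be sketched on its own. The temporary file and the log lines below are invented for the example.

```ruby
require 'tempfile'

log = Tempfile.new('example-log')
log.write("old line from a previous run\n")
log.flush

# Remember how large the log was before the daemon was started.
original_size = File.stat(log.path).size

# Simulate the daemon appending an error message during startup.
log.write("ERROR: cannot bind to port\n")
log.flush

# Read only the part that was appended after the recorded size.
new_output = File.open(log.path, 'r') do |f|
  f.seek(original_size, IO::SEEK_SET)
  f.read.strip
end

log.close!
```

This is how the text appended during startup can end up as the message of a StartError or StartTimeout.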
# File lib/daemon_controller.rb, line 740
def interruptable_waitpid(pid)
  Process.waitpid(pid)
end
On Ruby 1.9, Thread#kill (which is called by timeout.rb) may not be able to interrupt Process.waitpid. So here we use a special version that’s a bit less efficient but is at least interruptable.
# File lib/daemon_controller.rb, line 748
def interruptable_waitpid(pid)
  result = nil
  while !result
    result = Process.waitpid(pid, Process::WNOHANG)
    sleep 0.01 if !result
  end
  return result
end
# File lib/daemon_controller.rb, line 400
def kill_daemon
  if @stop_command
    begin
      run_command(@stop_command)
    rescue StartError => e
      raise StopError, e.message
    end
  else
    kill_daemon_with_signal
  end
end
# File lib/daemon_controller.rb, line 412
def kill_daemon_with_signal(force = false)
  pid = read_pid_file
  if pid
    if force
      Process.kill('SIGKILL', pid)
    else
      Process.kill('SIGTERM', pid)
    end
  end
rescue Errno::ESRCH, Errno::ENOENT
end
# File lib/daemon_controller.rb, line 516
def log_file_has_changed?
  if @current_log_file_stat
    stat = File.stat(@log_file) rescue nil
    if stat
      result = @current_log_file_stat.mtime != stat.mtime ||
        @current_log_file_stat.size != stat.size
      @current_log_file_stat = stat
      return result
    else
      return true
    end
  else
    return false
  end
end
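The change detection above compares the modification time and size of two File::Stat snapshots. A minimal standalone sketch, with stat_changed? as a hypothetical helper and a temporary file standing in for the daemon's log:

```ruby
require 'tempfile'

# Compare a previous File::Stat snapshot with the file's current state.
def stat_changed?(previous, path)
  current = File.stat(path)
  previous.mtime != current.mtime || previous.size != current.size
end

log = Tempfile.new('example-log')
log.write("booting\n")
log.flush
snapshot = File.stat(log.path)

unchanged = stat_changed?(snapshot, log.path) # nothing written since snapshot
log.write("listening on port 1234\n")
log.flush
changed = stat_changed?(snapshot, log.path)   # size has grown

log.close!
```

Comparing the size as well as the mtime catches writes that land within the same timestamp resolution as the previous snapshot.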
Check whether no activity has been recorded during the past N seconds, where N is the value of the seconds argument.
# File lib/daemon_controller.rb, line 495
def no_activity?(seconds)
  return Time.now - @last_activity_time > seconds
end
# File lib/daemon_controller.rb, line 499
def pid_file_available?
  return File.exist?(@pid_file) && File.stat(@pid_file).size != 0
end
# File lib/daemon_controller.rb, line 443
def read_pid_file
  pid = File.read(@pid_file).strip
  if pid =~ /\A\d+\Z/
    return pid.to_i
  else
    return nil
  end
end
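The validation in read_pid_file accepts only content that consists entirely of digits once surrounding whitespace is stripped. The sketch below applies the same rule to plain strings; parse_pid is a hypothetical name for this example.

```ruby
# Return the PID as an Integer, or nil when the text is not a plain number.
def parse_pid(text)
  stripped = text.strip
  stripped =~ /\A\d+\Z/ ? stripped.to_i : nil
end

valid_pid   = parse_pid("1234\n") # trailing newline is stripped first
empty_file  = parse_pid("")       # PID file created but not yet written
garbage_pid = parse_pid("12ab")   # partially numeric content is rejected
```

An empty or half-written PID file therefore yields nil rather than a misleading PID.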
# File lib/daemon_controller.rb, line 490
def record_activity
  @last_activity_time = Time.now
end
# File lib/daemon_controller.rb, line 562
def run_command(command)
  # Create tempfile for storing the command's output.
  tempfile = Tempfile.new('daemon-output')
  tempfile_path = tempfile.path
  File.chmod(0666, tempfile_path)
  tempfile.close

  if self.class.fork_supported? || Process.respond_to?(:spawn)
    if Process.respond_to?(:spawn)
      options = {
        :in => "/dev/null",
        :out => tempfile_path,
        :err => tempfile_path,
        :close_others => true
      }
      @keep_ios.each do |io|
        options[io] = io
      end
      if @daemonize_for_me
        ruby_interpreter = File.join(
          Config::CONFIG['bindir'],
          Config::CONFIG['RUBY_INSTALL_NAME']
        ) + Config::CONFIG['EXEEXT']
        pid = Process.spawn(ruby_interpreter, SPAWNER_FILE,
          command, options)
      else
        pid = Process.spawn(command, options)
      end
    else
      pid = safe_fork(@daemonize_for_me) do
        ObjectSpace.each_object(IO) do |obj|
          if !@keep_ios.include?(obj)
            obj.close rescue nil
          end
        end
        STDIN.reopen("/dev/null", "r")
        STDOUT.reopen(tempfile_path, "w")
        STDERR.reopen(tempfile_path, "w")
        exec(command)
      end
    end

    # run_command might be running in a timeout block (like
    # in #start_without_locking).
    begin
      interruptable_waitpid(pid)
    rescue Errno::ECHILD
      # Maybe a background thread or whatever waitpid()'ed
      # this child process before we had the chance. There's
      # no way to obtain the exit status now. Assume that
      # it started successfully; if it didn't we'll know
      # that later by checking the PID file and by pinging
      # it.
      return
    rescue Timeout::Error
      daemonization_timed_out

      # If the daemon doesn't fork into the background
      # in time, then kill it.
      begin
        Process.kill('SIGTERM', pid)
      rescue SystemCallError
      end
      begin
        Timeout.timeout(5, Timeout::Error) do
          begin
            interruptable_waitpid(pid)
          rescue SystemCallError
          end
        end
      rescue Timeout::Error
        begin
          Process.kill('SIGKILL', pid)
          interruptable_waitpid(pid)
        rescue SystemCallError
        end
      end
      raise DaemonizationTimeout
    end
    if $?.exitstatus != 0
      raise StartError, File.read(tempfile_path).strip
    end
  else
    cmd = "#{command} >\"#{tempfile_path}\""
    cmd += " 2>\"#{tempfile_path}\"" unless PLATFORM =~ /mswin/
    if !system(cmd)
      raise StartError, File.read(tempfile_path).strip
    end
  end
ensure
  File.unlink(tempfile_path) rescue nil
end
# File lib/daemon_controller.rb, line 655
def run_ping_command
  if @ping_command.respond_to?(:call)
    begin
      value = @ping_command.call
      if value.respond_to?(:close)
        value.close rescue nil
      end
      return value
    rescue *ALLOWED_CONNECT_EXCEPTIONS
      return false
    end
  elsif @ping_command.is_a?(Array)
    type, *args = @ping_command

    case type
    when :tcp
      socket_domain = Socket::Constants::AF_INET
      hostname, port = args
      sockaddr = Socket.pack_sockaddr_in(port, hostname)
    when :unix
      socket_domain = Socket::Constants::AF_LOCAL
      sockaddr = Socket.pack_sockaddr_un(args[0])
    else
      raise ArgumentError, "Unknown ping command type #{type.inspect}"
    end

    begin
      socket = Socket.new(socket_domain, Socket::Constants::SOCK_STREAM, 0)
      begin
        socket.connect_nonblock(sockaddr)
      rescue Errno::ENOENT, Errno::EINPROGRESS, Errno::EAGAIN, Errno::EWOULDBLOCK
        if select(nil, [socket], nil, 0.1)
          begin
            socket.connect_nonblock(sockaddr)
          rescue Errno::EISCONN
          end
        else
          raise Errno::ECONNREFUSED
        end
      end
      return true
    rescue Errno::ECONNREFUSED, Errno::ENOENT
      return false
    ensure
      socket.close if socket
    end
  else
    return system(@ping_command)
  end
end
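The Proc form of the ping command can be exercised against a throwaway local server. This is a sketch under stated assumptions: ping_succeeds? is a hypothetical stand-in for the Proc branch of run_ping_command, and the TCPServer on an ephemeral loopback port plays the role of the daemon.

```ruby
require 'socket'

# Call the ping Proc, close whatever it returns if possible, and treat
# the documented connection errors as "cannot connect".
def ping_succeeds?(ping)
  value = ping.call
  if value.respond_to?(:close)
    value.close rescue nil
  end
  value ? true : false
rescue Errno::ECONNREFUSED, Errno::ENETUNREACH, Errno::ETIMEDOUT,
       Errno::ECONNRESET, Errno::EINVAL, Errno::EADDRNOTAVAIL
  false
end

# Stand-in "daemon": a server listening on a port chosen by the OS.
server = TCPServer.new('127.0.0.1', 0)
port   = server.addr[1]
ping   = lambda { TCPSocket.new('127.0.0.1', port) }

reachable_while_listening = ping_succeeds?(ping)
server.close
reachable_after_close = ping_succeeds?(ping)
```

The equivalent Array form for the same address would be [:tcp, '127.0.0.1', port], which the listing handles with a non-blocking connect.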
# File lib/daemon_controller.rb, line 706
def safe_fork(double_fork)
  pid = fork
  if pid.nil?
    begin
      if double_fork
        pid2 = fork
        if pid2.nil?
          Process.setsid
          yield
        end
      else
        yield
      end
    rescue Exception => e
      message = "*** Exception #{e.class} " <<
        "(#{e}) (process #{$$}):\n" <<
        "\tfrom " << e.backtrace.join("\n\tfrom ")
      STDERR.write(message)
      STDERR.flush
      exit!
    ensure
      exit!(0)
    end
  else
    if double_fork
      Process.waitpid(pid) rescue nil
      return pid
    else
      return pid
    end
  end
end
# File lib/daemon_controller.rb, line 511
def save_log_file_information
  @original_log_file_stat = File.stat(@log_file) rescue nil
  @current_log_file_stat = @original_log_file_stat
end
# File lib/daemon_controller.rb, line 392
def spawn_daemon
  if @start_command.respond_to?(:call)
    run_command(@start_command.call)
  else
    run_command(@start_command)
  end
end
This method does nothing and only serves as a hook for the unit test.
# File lib/daemon_controller.rb, line 504
def start_timed_out
end
# File lib/daemon_controller.rb, line 319
def start_without_locking
  if daemon_is_running?
    raise AlreadyStarted, "Daemon '#{@identifier}' is already started"
  end
  save_log_file_information
  delete_pid_file
  begin
    started = false
    before_start
    Timeout.timeout(@start_timeout, Timeout::Error) do
      done = false
      spawn_daemon
      record_activity

      # We wait until the PID file is available and until
      # the daemon responds to pings, but we wait no longer
      # than @start_timeout seconds in total (including daemon
      # spawn time).
      # Furthermore, if the log file hasn't changed for
      # @log_file_activity_timeout seconds, and the PID file
      # still isn't available or the daemon still doesn't
      # respond to pings, then assume that the daemon has
      # terminated with an error.
      wait_until do
        if log_file_has_changed?
          record_activity
        elsif no_activity?(@log_file_activity_timeout)
          raise Timeout::Error, "Daemon seems to have exited"
        end
        pid_file_available?
      end
      wait_until(@ping_interval) do
        if log_file_has_changed?
          record_activity
        elsif no_activity?(@log_file_activity_timeout)
          raise Timeout::Error, "Daemon seems to have exited"
        end
        run_ping_command || !daemon_is_running?
      end
      started = run_ping_command
    end
    result = started
  rescue DaemonizationTimeout, Timeout::Error => e
    start_timed_out
    if pid_file_available?
      kill_daemon_with_signal(true)
    end
    if e.is_a?(DaemonizationTimeout)
      result = :daemonization_timeout
    else
      result = :start_timeout
    end
  end
  if !result
    raise(StartError, differences_in_log_file ||
      "Daemon '#{@identifier}' failed to start.")
  elsif result == :daemonization_timeout
    raise(StartTimeout, differences_in_log_file ||
      "Daemon '#{@identifier}' didn't daemonize in time.")
  elsif result == :start_timeout
    raise(StartTimeout, differences_in_log_file ||
      "Daemon '#{@identifier}' failed to start in time.")
  else
    return true
  end
end
# File lib/daemon_controller.rb, line 470
def wait_until(sleep_interval = 0.1)
  while !yield
    sleep(sleep_interval)
  end
end
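The start and stop listings wrap this polling loop in Timeout.timeout so that waiting is bounded. The sketch below reproduces that combination in isolation; the intervals, the attempt counter and the never-true condition are arbitrary values chosen for the example.

```ruby
require 'timeout'

# Poll the block until it returns a truthy value.
def wait_until(sleep_interval = 0.1)
  sleep(sleep_interval) until yield
end

# Condition that becomes true on the third check.
attempts = 0
Timeout.timeout(2) do
  wait_until(0.01) { (attempts += 1) >= 3 }
end

# Condition that never becomes true: the surrounding timeout ends the wait.
timed_out = false
begin
  Timeout.timeout(0.05) { wait_until(0.01) { false } }
rescue Timeout::Error
  timed_out = true
end
```

In the library listings the rescued Timeout::Error is translated into StartTimeout or StopTimeout for the caller.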
# File lib/daemon_controller.rb, line 483
def wait_until_daemon_responds_to_ping_or_has_exited_or_log_file_has_changed
  while !(run_ping_command || !daemon_is_running? || log_file_has_changed?)
    sleep(@ping_interval)
  end
  return run_ping_command
end
# File lib/daemon_controller.rb, line 476
def wait_until_pid_file_is_available_or_log_file_has_changed
  while !(pid_file_available? || log_file_has_changed?)
    sleep 0.1
  end
  return pid_file_available?
end
Generated with the Darkfish Rdoc Generator 1.1.6.