Python Job Runner

For Amazon EMR versions 5.20.0 through 5.29.0, Python 2.7 is the system default. For Amazon EMR version 5.30.0 and later, Python 3 is the system default. To upgrade the Python version that PySpark uses, set the PYSPARK_PYTHON environment variable for the spark-env classification to the location where Python 3.4 or 3.6 is installed.

To schedule a script with python-crontab:

    from crontab import CronTab

    cron = CronTab(user='username')
    job = cron.new(command='python example1.py')
    job.minute.every(1)
    cron.write()

In the code above we first access cron for the given user, then create a job that runs a Python script named example1.py, and finally set the task to run every minute.

Alternatively, with Flask-RQ you simply use the @job decorator on your functions (note that the old flask.ext.rq namespace was removed in Flask 1.0; on current versions the extension is imported as flask_rq):

    from flask_rq import job

    @job
    def process(i):
        # long-running work to process
        ...

    process.delay(3)

Finally, start a worker with the rqworker command. See the RQ docs for more info. RQ is designed for simple long-running processes.

The file with the job class is sent to Hadoop to be run. Therefore, the job file cannot attempt to start the Hadoop job itself, or you would be recursively creating Hadoop jobs! The code that launches the job should only run outside of the Hadoop context. The if __name__ == '__main__': block is only run if you invoke the job file as a script.
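
A minimal sketch of this pattern (the class name and word-count logic are made up for illustration; a real job file would subclass mrjob.job.MRJob):

```python
# job.py -- the job class is defined at module level, so importing this
# file (as Hadoop does) only *defines* the job. The code that actually
# launches it lives inside the __main__ guard, which runs only when the
# file is invoked as a script, e.g. `python job.py`.

class WordCountJob:
    """Hypothetical stand-in for a job class."""

    def run(self, lines):
        # Count the words across the given lines.
        return sum(len(line.split()) for line in lines)


if __name__ == "__main__":
    # Only reached when executed directly -- never when imported.
    print(WordCountJob().run(["hello world", "one two three"]))  # prints 5
```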

Advanced Python Scheduler (APScheduler) is a Python library that lets you schedule your Python code to be executed later, either just once or periodically. You can add new jobs or remove old ones on the fly as you please. If you store your jobs in a database, they will also survive scheduler restarts and maintain their state. When the scheduler is restarted, it will then run all the jobs it should have run while it was offline.¹

Among other things, APScheduler can be used as a cross-platform, application-specific replacement for platform-specific schedulers, such as the cron daemon or the Windows task scheduler. Please note, however, that APScheduler is not a daemon or service itself, nor does it come with any command line tools. It is primarily meant to be run inside existing applications. That said, APScheduler does provide some building blocks for you to build a scheduler service or to run a dedicated scheduler process.

APScheduler has three built-in scheduling systems you can use:

  • Cron-style scheduling (with optional start/end times)

  • Interval-based execution (runs jobs on even intervals, with optional start/end times)

  • One-off delayed execution (runs jobs once, on a set date/time)

You can mix and match scheduling systems and the backends where the jobs are stored any way you like. Supported backends for storing jobs include:

  • Memory

  • SQLAlchemy (any RDBMS supported by SQLAlchemy works)

APScheduler also integrates with several common Python frameworks, like:

  • asyncio (PEP 3156)

  • Qt (using either PyQt, PySide2 or PySide)

There are third party solutions for integrating APScheduler with other frameworks:

¹ The cutoff period for this is also configurable.

Job-Runner Worker

Project description

This package contains the Job-Runner Worker, which is responsible for executing the scheduled jobs managed by the Job-Runner.

Installation

Requirements (depending on your distro, the naming might be a bit different):

  • python-dev
  • build-essential
  • libevent-dev

Then you should be able to install this package with pip install job-runner-worker.

If you want to install this package in development mode, clone this repository and then execute python setup.py develop. In the latter case, you might want to install the testing requirements by executing pip install -r test-requirements.txt.

See the getting started section in the Job-Runner documentation (in the job-runner repo) for setting up the whole project.

Configuration

Example with required settings:
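
The example itself is missing from this page; below is a plausible minimal config, assuming the INI layout implied by the CONFIG_PATH variable mentioned in the changelog and the setting names documented below (the section name and all values are assumptions, not the project's actual example):

```ini
[job_runner_worker]
api_base_url = https://job-runner.example.com
api_key = worker1
secret = verysecret
broadcaster_server_hostname = job-runner.example.com
ws_server_hostname = job-runner.example.com
```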

All available settings

api_base_url
The base URL which will be used to access the API. This should start with http:// or https://.
api_key
Public-key to access the API.
secret
Private-key to access the API.
concurrent_jobs
The number of jobs to run concurrently. Default: 4.
log_level

The log level. Default: 'info'. Valid options are:

  • debug
  • info
  • warning
  • error
max_log_bytes
The maximum number of bytes of the log that is sent back to the API. This is to avoid 413 Request Entity Too Large errors. If the log is larger than this value, 20% of the allowed size will be taken from the top of the log and the remaining 80% from the bottom; everything in between will be truncated. Default: 819200 (800 kB).
ws_server_hostname
The hostname of the WebSocket Server.
ws_server_port
The port of the WebSocket Server. Default: 5555.
script_temp_path
The path where the scripts that are being executed through the Job-Runner are temporarily stored. Default: '/tmp'.
broadcaster_server_hostname
The hostname of the queue broadcaster server.
broadcaster_server_port
The port of the queue broadcaster server. Default: 5556.
reconnect_after_inactivity
Seconds after which the subscriber re-connects to the publisher when no data has been received. Default: 300. This is useful when you are load-balancing the publisher and it keeps the TCP connection open on the front-end after the connection on the back-end has been closed. Because of this, ZMQ does not detect that it is no longer connected, and jobs get stuck.
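
The max_log_bytes behavior described above (20% of the budget from the top, 80% from the bottom, truncating the middle) can be sketched as follows; truncate_log and its marker line are hypothetical, not part of the package:

```python
def truncate_log(log: str, max_bytes: int = 819200) -> str:
    """Keep 20% of the byte budget from the head of the log and 80% from
    the tail, dropping everything in between (a sketch of the documented
    max_log_bytes behaviour; the marker line is an assumption)."""
    data = log.encode("utf-8")
    if len(data) <= max_bytes:
        return log
    head = data[: max_bytes * 20 // 100]
    tail = data[len(data) - max_bytes * 80 // 100 :]
    return (head + b"\n... [truncated] ...\n" + tail).decode("utf-8", "replace")


print(len(truncate_log("x" * 1000, max_bytes=100)))  # prints 121 (100 + marker)
```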

Command-line usage

For starting the worker, you can use the job_runner_worker command:
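
The command itself was stripped from this page; a typical invocation might look like the following (the config file path is an example, and the changelog indicates the worker reads its configuration location from the CONFIG_PATH environment variable):

```shell
CONFIG_PATH=/etc/job-runner/worker.ini job_runner_worker
```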

Changes

v2.1.2

  • Rollback retry on 4xx errors. Instead, recover when an unexpected error occurs in execute_run, enqueue_actions, or kill_run. This will recover from the case where a run was claimed by two workers (e.g. when it was sent to worker a, which doesn't respond directly, then sent to worker b which claims it, after which a claims it too).

v2.1.1

  • Make sure a shebang does exist on scripts to be run. Use shlex to make Popen safer.
  • Retry request 5x when the response is in the 4xx range before raising an exception.

v2.1.0

  • On ping response, send back the version of the worker and the number of concurrent jobs. This version requires that you have job-runner>=3.4.0 running.

v2.0.3

  • Update error message when job does not start to be more verbose and specific.

v2.0.2

  • Fix the case where, on an exception, the run was marked as completed but not started.

v2.0.1

  • Make sure to only clean up runs that are assigned to the worker. This version is dependent on job-runner>=3.0.1.

v2.0.0

  • Make the worker compatible with the new worker-pool structure. IMPORTANT: This version is dependent on job-runner>=2.0.0!
  • Change the SETTINGS_PATH environment variable to CONFIG_PATH for better naming consistency.
  • Make sure that when a run already has log output, it is updated (before, it would hang on a database integrity error).

v1.2.1

  • Make the worker crash early instead of hanging on errors happening before the actual job starts, to give the user a visible cue that something went wrong.

v1.2.0

  • The worker will now terminate gracefully when receiving the TERM signal. This means that all pending jobs will be completed, but that it will not accept any new jobs. After finishing the last pending job, the worker will terminate.

v1.1.4

  • Set the reconnect_after_inactivity default to 10 minutes. This is 2 x the JOB_RUNNER_WORKER_PING_INTERVAL default setting in Job-Runner.

v1.1.2

  • Add and implement reconnect_after_inactivity setting.

v1.1.1

  • Run scripts by reading their shebang, so the executable (x) bit is no longer needed.

v1.1.0

  • Handle separate run log-output resource. This requires Job-Runner >= v1.3.0.

v1.0.7

  • Fix killing job-runs. Where v1.0.5 killed child processes, it did not kill children of children, and so on. This should kill the full tree of child processes.

v1.0.6

  • Freeze the requests library version, since 1.0.0 contains backwards-incompatible changes.

v1.0.5

  • Fix killing job-runs. When the process had sub-processes, only the parent process was killed and the worker was waiting for the child processes to complete.

v1.0.4

  • Add config variable max_log_bytes to limit the amount of log data that will be sent back to the API (to avoid 413 Request Entity Too Large errors).

v1.0.3

  • Send pid back to the REST API when a job has been started.
  • Kill a job-run when a kill action is received.

v1.0.1

  • Make the timestamps sent to the REST API timezone-aware.

v0.7.1

  • Fix encoding issue when writing the file.

v0.7.0

  • Refactor to make the worker compatible with the 0.7 version of the job-runner package.
  • Make it consume runs from the queue broadcaster instead of hitting the REST interface every x seconds.
  • Add retry on error to recover from temporary REST interface errors.

v0.6.1

  • Merge fixes v0.5.1 and v0.5.2 into v0.6.x version.

v0.6.0

  • Refactor to make use of separate WebSocket Server.

v0.5.2

  • Make temporary path for scripts configurable.

v0.5.0

  • Initial release.

