Core library¶
The core library is a set of classes that wrap API calls and handle messages. All of these classes can be used without extra dependencies.
livy.LivyClient¶
- class livy.client.LivyClient¶
Client that wraps requests to the Livy server.
This implementation follows Livy API v0.7.0 spec.
- __init__(url, verify=True, timeout=30.0)¶
- Parameters
url (str) – URL to the livy server
verify (Union[bool, ssl.SSLContext]) – Whether to verify SSL certificates; alternatively, a customized SSL context to use
timeout (float) – Timeout seconds for the connection.
- Raises
TypeError – If an argument has an invalid data type
OperationError – If the URL scheme is not supported
- Return type
Note
This package is designed to depend on as few third-party libraries as possible, so it does not use a high-level interface to handle requests. As a result, it cannot offer features as rich as those of requests.
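As a hedged illustration of the verify parameter, the sketch below builds a customized ssl.SSLContext with the standard library only. The build_context helper and its ca_file argument are hypothetical, and the final client call appears only as a comment because it needs the package and a reachable server.

```python
import ssl


def build_context(ca_file=None):
    """Create an SSL context that verifies server certificates.

    ca_file is a hypothetical path to a private CA bundle; when it is
    omitted, the system's default trust store is used.
    """
    context = ssl.create_default_context(cafile=ca_file)
    # Refuse legacy protocol versions (an illustrative hardening choice).
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


context = build_context()

# Assumed usage, following the signature documented above:
# client = livy.client.LivyClient("https://livy.example.com:8998", verify=context)
```

Passing verify=False instead would skip certificate verification entirely, which is usually only acceptable on a trusted network.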
- check(capture=True)¶
Check if server is up.
- Parameters
capture (bool) – Capture the exception and return a boolean. Set to False to raise livy.exception.RequestError on error.
- Return type
- create_batch(file, proxy_user=None, class_name=None, args=None, jars=None, py_files=None, files=None, driver_memory=None, driver_cores=None, executor_memory=None, executor_cores=None, num_executors=None, archives=None, queue=None, name=None, conf=None)¶
Request to create a batch.
- Parameters
file (str) – File containing the application to execute
proxy_user (str) – User to impersonate when running the job
class_name (str) – Application Java/Spark main class
args (List[str]) – Command line arguments for the application
jars (List[str]) – Java dependencies to be used in this batch
py_files (List[str]) – Python dependencies to be used in this batch
files (List[str]) – Files to be used in this batch
driver_memory (str) – Amount of memory to use for the driver process
driver_cores (int) – Number of cores to use for the driver process
executor_memory (str) – Amount of memory to use per executor process
executor_cores (int) – Number of cores to use for each executor
num_executors (int) – Number of executors to launch for this batch
archives (List[str]) – Archives to be used in this batch
queue (str) – The name of the YARN queue to submit the batch to
name (str) – The session name to execute this batch
- Returns
batch – Created batch object from the Livy server
- Return type
- Raises
TypeError – If an input parameter does not match the expected data type
RequestError – On connection error
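The parameters above mirror the fields of the request body for Livy's POST /batches endpoint, which uses camelCase keys. The sketch below illustrates that correspondence; to_request_body is a hypothetical helper, not part of this package, and the library's actual serialization may differ.

```python
def to_request_body(**params):
    """Convert snake_case keyword arguments to a camelCase request body.

    Parameters left as None are omitted from the result.
    """
    body = {}
    for key, value in params.items():
        if value is None:
            continue
        head, *rest = key.split("_")
        body[head + "".join(word.capitalize() for word in rest)] = value
    return body


body = to_request_body(
    file="s3://bucket/app.py",  # hypothetical application path
    proxy_user="alice",
    py_files=["deps.zip"],
    executor_memory="2g",
    num_executors=4,
    queue=None,  # dropped because it is None
)
print(body)
```

Dropping None values keeps the request minimal, so the server applies its own defaults for anything that was not specified.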
- delete_batch(batch_id)¶
Kill the batch job.
- Parameters
batch_id (int) – Batch ID
- Raises
TypeError – If an input parameter does not match the expected data type
RequestError – On connection error
- Return type
- get_batch_information(batch_id)¶
Get summary information for a specific batch.
- Parameters
batch_id (int) – Batch ID
- Returns
batch – Batch information from the Livy server
- Return type
- Raises
TypeError – If an input parameter does not match the expected data type
RequestError – On connection error
- get_batch_log(batch_id, from_=None, size=None)¶
Get logs from the batch.
- get_batch_state(batch_id)¶
Get state of the batch.
- Parameters
batch_id (int) – Batch ID
- Returns
state – Current state; literally one of starting, running, dead, killed, or success.
- Return type
- Raises
TypeError – If an input parameter does not match the expected data type
RequestError – On connection error
Note
There is no complete list of batch states in the official documentation.
The state list given here is based on my observations in practice (perhaps it could be confirmed against Livy's source code). Since states from both the session list and the statement list have been observed, the complete list of possible states could be: not_started, starting, available, idle, waiting, busy, running, shutting_down, error, dead, killed, success, cancelling, and cancelled.
- is_batch_ended(batch_id)¶
Check batch state and return True if it is finished.
- Parameters
batch_id (int) – Batch ID
- Returns
finished – Task is over
- Return type
- Raises
TypeError – If an input parameter does not match the expected data type
RequestError – On connection error
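A common way to use is_batch_ended() is to poll it until the batch finishes. The sketch below relies only on the method documented above; wait_for_batch and the stand-in FakeClient are hypothetical names, introduced so that the example runs without a server.

```python
import time


def wait_for_batch(client, batch_id, interval=1.0, timeout=None):
    """Poll client.is_batch_ended() until it returns True.

    Returns True once the batch has ended, or False if timeout (in
    seconds) elapses first.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while not client.is_batch_ended(batch_id):
        if deadline is not None and time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True


class FakeClient:
    """Stand-in for LivyClient that reports the batch as ended after a few polls."""

    def __init__(self, polls_until_end):
        self.remaining = polls_until_end

    def is_batch_ended(self, batch_id):
        self.remaining -= 1
        return self.remaining <= 0


finished = wait_for_batch(FakeClient(polls_until_end=3), batch_id=1, interval=0.01)
```

With a real client, a longer interval would likely be preferable to avoid putting unnecessary load on the server.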
livy.LivyBatchLogReader¶
- class livy.logreader.LivyBatchLogReader¶
Read Livy batch logs and publish them to Python’s logging infrastructure.
- __init__(client, batch_id, timezone=datetime.timezone.utc, prefix=None)¶
- Parameters
client (livy.client.LivyClient) – Livy client that is pre-configured
batch_id (int) – Batch ID to be watched
timezone (datetime.tzinfo) – Server time zone
prefix (str) – Prefix to be added to logger name
- Raises
TypeError – If an argument has an invalid data type
- Return type
- add_parsers(pattern, parser)¶
Add log parser to this reader.
- Parameters
pattern (re.Pattern) – Regex pattern to match the log. Note that the pattern must match the line-start (^) symbol and be compiled with the multiline (re.M) flag.
parser (callable) – The parser that extracts fields from the re.Match object. It should take a re.Match as input and return a livy.logreader.LivyLogParseResult instance.
- Returns
- Return type
No return value. It raises an exception on any error.
Note
- The pattern must wrap the entire log
The re.Match object is also used to locate the position in the log. It might cause errors if the regex pattern does not match entire log lines.
- Do not emit logs directly from the parser
The fetched log might overlap with that of a previous read; this reader caches the processed results and prevents duplicated logs from being emitted.
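The requirements above can be illustrated with a minimal pattern and parser pair. This is a sketch under assumptions: the log format is a typical Spark-style line chosen for illustration, and ParseResult is a local stand-in with the same fields as livy.logreader.LivyLogParseResult so that the example runs without the package installed.

```python
import logging
import re
from collections import namedtuple
from datetime import datetime, timezone

# Local stand-in with the same fields as livy.logreader.LivyLogParseResult.
ParseResult = namedtuple("ParseResult", ["created", "level", "name", "message"])

# Anchored at line start (^), compiled with re.M, and covering the whole line.
PATTERN = re.compile(
    r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) (DEBUG|INFO|WARN|ERROR) ([\w.$]+): (.*)$",
    re.M,
)


def parse(match):
    """Extract fields from a match; it only returns data and never emits logs."""
    created = datetime.strptime(match.group(1), "%y/%m/%d %H:%M:%S")
    return ParseResult(
        created=created.replace(tzinfo=timezone.utc),  # assumes server time is UTC
        level=getattr(logging, match.group(2)),
        name=match.group(3),
        message=match.group(4),
    )


sample = (
    "21/05/01 12:00:00 INFO SparkContext: Running Spark\n"
    "21/05/01 12:00:01 WARN Utils: Port in use\n"
)
results = [parse(m) for m in PATTERN.finditer(sample)]
```

With the real package, such a pair would presumably be registered through add_parsers(PATTERN, parse), with parse returning a LivyLogParseResult instead of the stand-in.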
- read()¶
Read log once.
- Returns
- Return type
No data is returned. All logs are piped to Python’s logging.
Note
Livy does not split logs into separate objects. What can be retrieved from the server is a list of strings that mixes output from stdout and stderr.
This function is designed to match each known format via regex, and to fall back to stdout/stderr if a line cannot be parsed.
Parsers are pluggable. To go beyond the builtin parsers, read the instructions in the docstring of add_parsers().
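The match-then-fall-back behaviour described in this note can be sketched as follows. It is a simplified, line-by-line illustration rather than the reader's actual implementation; the log format, the publish function, and the fallback logger name are all assumptions.

```python
import logging
import re

# Hypothetical known format: "<LEVEL> <logger name>: <message>".
LOG_LINE = re.compile(
    r"^(?P<level>DEBUG|INFO|WARN|ERROR) (?P<name>[\w.]+): (?P<message>.*)$"
)


def publish(lines, fallback="stdout"):
    """Send each line to Python's logging, parsing it when the format is known."""
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            # Known format: route to the named logger at the parsed level.
            logger = logging.getLogger(match.group("name"))
            logger.log(getattr(logging, match.group("level")), match.group("message"))
        else:
            # Unknown format: pass the raw line to the fallback logger.
            logging.getLogger(fallback).info(line)


logging.basicConfig(level=logging.DEBUG)
publish(["INFO SparkContext: Running Spark", "Hello from print()"])
```

Routing unparsed lines to a dedicated logger keeps raw program output visible without mixing it into the structured records.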
- read_until_finish(block=True, interval=0.4)¶
Keep monitoring and read logs until the task is finished.
- Parameters
- Returns
- Return type
No data is returned. All logs are piped to Python’s logging.
See also
- stop_read()¶
Stop the background task created by read_until_finish(). Only takes effect after the task has been created.
- class livy.logreader.LivyLogParseResult¶
Log parse result.
- static __new__(_cls, created, level, name, message)¶
Create new instance of LivyLogParseResult(created, level, name, message)
- Parameters
created (datetime.datetime) –
level (int) –
name (str) –
message (str) –
- created: datetime.datetime¶
Timestamp at which this log was created. Can be None if the creation time cannot be determined; the system then fills it with the creation time of the last log.