The Agenty Jobs API lets you start new background jobs for a given agent_id, stop running jobs, check job status, fetch job results and logs, and export results in CSV format.
Start a job
This endpoint starts a new asynchronous job for the agent_id given in the request body.
Endpoint:
Method: POST
URL: https://api.agenty.com/v2/jobs/start
Headers:
Key | Value | Description |
---|---|---|
Content-Type | application/json | Format of the JSON request body |
Query params:
Key | Value | Description |
---|---|---|
apikey | {{API_KEY}} | Your api key |
Body:
{"agent_id":"{{AGENT_ID}}"}
Responses:
Status: OK | Code: 200
{
"job_id": 3689994,
"account_id": 59703,
"agent_id": "45l0wewqzk",
"type": "scraping",
"status": "submitted",
"priority": 5,
"pages_total": 0,
"pages_processed": 0,
"pages_succeeded": 0,
"pages_failed": 0,
"pages_credit": 0,
"created_at": "2022-01-10T11:32:18.2362494Z",
"started_at": null,
"completed_at": null,
"stopped_at": null,
"is_scheduled": false,
"queue_time": null,
"run_duration": null,
"error": null,
"ping_at": null,
"assigned_worker_id": null,
"running_worker_id": 1
}
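The following is a minimal Python sketch (standard library only) of calling this endpoint. It assumes only what the tables above show: the API key travels as the apikey query parameter and the agent id in the JSON body. The helper names build_start_job_request and start_job are hypothetical, not part of any Agenty SDK.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.agenty.com/v2"


def build_start_job_request(agent_id: str, api_key: str) -> urllib.request.Request:
    """Build, but do not send, a POST request to /jobs/start."""
    # The API key goes in the apikey query parameter, the agent id in the JSON body.
    query = urllib.parse.urlencode({"apikey": api_key})
    body = json.dumps({"agent_id": agent_id}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/jobs/start?{query}",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def start_job(agent_id: str, api_key: str) -> dict:
    """Send the request and return the decoded job object (requires network access)."""
    with urllib.request.urlopen(build_start_job_request(agent_id, api_key)) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, job = start_job("YOUR_AGENT_ID", "YOUR_API_KEY") should return an object like the response above; keep its job_id for the status, result and stop calls. Building the request separately from sending it makes it easy to inspect or test without network access.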
Stop a running job
This endpoint sends a stop request to the Agenty workers running the given job in the background.
Endpoint:
Method: GET
URL: https://api.agenty.com/v2/jobs/{{JOB_ID}}/stop
Headers:
Key | Value | Description |
---|---|---|
Content-Type | application/json |
Query params:
Key | Value | Description |
---|---|---|
apikey | {{API_KEY}} | Your api key |
Responses:
Status: OK | Code: 200
{
"job_id": 3687911,
"account_id": 59703,
"agent_id": "w9qmlp5475",
"type": "scraping",
"status": "stopped",
"priority": 5,
"pages_total": 0,
"pages_processed": 0,
"pages_succeeded": 0,
"pages_failed": 0,
"pages_credit": 0,
"created_at": "2022-01-10T09:47:48Z",
"started_at": null,
"completed_at": "2022-01-10T09:49:16.8233821Z",
"stopped_at": "2022-01-10T09:49:16.8233806Z",
"is_scheduled": false,
"queue_time": null,
"run_duration": null,
"error": null,
"ping_at": null,
"assigned_worker_id": null,
"running_worker_id": null
}
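A minimal Python sketch of sending the stop request, assuming the GET method and apikey query parameter shown above. The helper names stop_job_url, was_stopped and stop_job are hypothetical. Note that urllib.request.urlopen raises urllib.error.HTTPError for error status codes, so a rejected stop request surfaces as an exception rather than as a job object.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.agenty.com/v2"


def stop_job_url(job_id: int, api_key: str) -> str:
    """Return the URL of the stop endpoint (called with GET, as documented)."""
    query = urllib.parse.urlencode({"apikey": api_key})
    return f"{BASE_URL}/jobs/{job_id}/stop?{query}"


def was_stopped(job: dict) -> bool:
    """True when the returned job object reports the stopped state."""
    return job.get("status") == "stopped" and job.get("stopped_at") is not None


def stop_job(job_id: int, api_key: str) -> dict:
    """Send the stop request and return the job object (requires network access)."""
    with urllib.request.urlopen(stop_job_url(job_id, api_key)) as response:
        return json.loads(response.read().decode("utf-8"))
```

Checking both status and stopped_at mirrors the example response, where a stopped job has status "stopped" and a non-null stopped_at timestamp.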
Get job status by job id
Gets the status of a job along with its other properties, e.g. pages_credit and pages_processed.
Endpoint:
Method: GET
URL: https://api.agenty.com/v2/jobs/{{JOB_ID}}
Query params:
Key | Value | Description |
---|---|---|
apikey | {{API_KEY}} | Your api key |
Responses:
Status: OK | Code: 200
{
"job_id": 3689856,
"account_id": 59703,
"agent_id": "45l0wewqzk",
"type": "scraping",
"status": "completed",
"priority": 5,
"pages_total": 1,
"pages_processed": 1,
"pages_succeeded": 1,
"pages_failed": 0,
"pages_credit": 1,
"created_at": "2022-01-10T11:10:55Z",
"started_at": "2022-01-10T11:10:56Z",
"completed_at": "2022-01-10T11:10:58Z",
"stopped_at": null,
"is_scheduled": false,
"queue_time": null,
"run_duration": null,
"error": null,
"ping_at": null,
"assigned_worker_id": null,
"running_worker_id": 3
}
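Because jobs run asynchronously, a client typically polls this endpoint until the job finishes. The Python sketch below assumes that completed, stopped and failed are the final statuses; only completed and stopped appear in the examples on this page, so treat failed (and the set as a whole) as an assumption. fetch_job and wait_for_job are hypothetical helpers, and the fetch argument lets you swap in a fake for testing.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Callable

BASE_URL = "https://api.agenty.com/v2"

# Assumed final statuses: "completed" and "stopped" appear in this page's
# examples; "failed" is a guess and may not match the real API.
TERMINAL_STATUSES = {"completed", "stopped", "failed"}


def fetch_job(job_id: int, api_key: str) -> dict:
    """GET /jobs/{job_id} and decode the JSON body (requires network access)."""
    query = urllib.parse.urlencode({"apikey": api_key})
    with urllib.request.urlopen(f"{BASE_URL}/jobs/{job_id}?{query}") as response:
        return json.loads(response.read().decode("utf-8"))


def wait_for_job(
    job_id: int,
    api_key: str,
    fetch: Callable[[int, str], dict] = fetch_job,
    interval: float = 5.0,
    timeout: float = 600.0,
) -> dict:
    """Poll until the job reaches a terminal status; raise TimeoutError otherwise."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch(job_id, api_key)
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} is still {job.get('status')!r}")
        time.sleep(interval)
```

For example, wait_for_job(3689856, "YOUR_API_KEY", interval=10) returns the final job object, from which you can read pages_credit or pages_failed. A modest polling interval avoids sending unnecessary requests.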
Get job result by job id
This endpoint fetches the result of the given job.
Endpoint:
Method: GET
URL: https://api.agenty.com/v2/jobs/{{JOB_ID}}/result
Query params:
Key | Value | Description |
---|---|---|
apikey | {{API_KEY}} | Your api key |
offset | 0 | Number of rows to skip, used to fetch the next page. Must be an integer; use it to paginate when there are more than 2500 rows |
limit | 2500 | Maximum number of rows per page, between 1 and 2500. Must be an integer |
collection | 1 | The collection number you want to fetch. Default is 1 |
modified | 1 | Whether to fetch the modified result when a post-processing script is used. Default is 1, which fetches the modified version when available and the default result otherwise. Use 0 to force Agenty to fetch the default result only |
Responses:
Status: OK | Code: 200
{
"total": 2,
"limit": 1000,
"offset": 0,
"returned": 2,
"result": [
{
"name": "A Light in the ...",
"price": "£51.77",
"image": "http://books.toscrape.com/media/cache/2c/da/2cdad67c44b002e7ead0cc35693c0e8b.jpg",
"details_page_url": "http://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html"
},
{
"name": "Tipping the Velvet",
"price": "£53.74",
"image": "http://books.toscrape.com/media/cache/26/0c/260c6ae16bce31c8f8c95daddd9f4a1c.jpg",
"details_page_url": "http://books.toscrape.com/catalogue/tipping-the-velvet_999/index.html"
}
]
}
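When a job has more rows than one page can hold, combine offset and limit to walk through the whole result. This Python sketch advances offset by the returned count until it reaches total, both fields shown in the response above. The helper names fetch_result_page and iter_result_rows are hypothetical, and fetch_page can be replaced by a stub in tests.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator

BASE_URL = "https://api.agenty.com/v2"
MAX_LIMIT = 2500  # documented upper bound for the limit parameter


def fetch_result_page(job_id: int, api_key: str, offset: int, limit: int,
                      collection: int = 1, modified: int = 1) -> dict:
    """GET one page of /jobs/{job_id}/result (requires network access)."""
    query = urllib.parse.urlencode({
        "apikey": api_key, "offset": offset, "limit": limit,
        "collection": collection, "modified": modified,
    })
    with urllib.request.urlopen(f"{BASE_URL}/jobs/{job_id}/result?{query}") as response:
        return json.loads(response.read().decode("utf-8"))


def iter_result_rows(job_id: int, api_key: str, limit: int = MAX_LIMIT,
                     fetch_page: Callable[..., dict] = fetch_result_page) -> Iterator[dict]:
    """Yield every result row, moving offset forward by the rows each page returned."""
    if not 1 <= limit <= MAX_LIMIT:
        raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")
    offset = 0
    while True:
        page = fetch_page(job_id, api_key, offset=offset, limit=limit)
        rows = page.get("result", [])
        yield from rows
        offset += page.get("returned", len(rows))
        # Stop on an empty page or once every row reported by "total" has been read.
        if not rows or offset >= page.get("total", offset):
            break
```

For example, rows = list(iter_result_rows(3689856, "YOUR_API_KEY")) collects every row of collection 1. To read another collection, pass fetch_page=functools.partial(fetch_result_page, collection=2).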
Get job logs by job id
This endpoint fetches the logs of the given job.
Endpoint:
Method: GET
URL: https://api.agenty.com/v2/jobs/{{JOB_ID}}/logs
Query params:
Key | Value | Description |
---|---|---|
apikey | {{API_KEY}} | Your api key |
offset | 0 | Number of lines to skip, used to fetch the next page. Must be an integer; use it to paginate when there are more than 2500 rows |
limit | 2500 | Maximum number of rows per page, between 1 and 2500. Must be an integer |
Responses:
Status: OK | Code: 200
2022-01-10T07:46:08.145Z INFO Worker id: 2
2022-01-10T07:46:08.169Z INFO Job: {"job_id":3683463,"agent_id":"459ypoj1gk","account_id":59703,"type":"scraping","status":"started","priority":16,"created_at":"2022-01-10T07:45:57Z","pages_total":0,"pages_processed":0,"pages_succeeded":0,"pages_failed":0,"pages_credit":0,"is_scheduled":0,"assigned_worker_id":null,"running_worker_id":2,"running_server_ip":null,"error":null,"attempts":null}
2022-01-10T07:46:08.250Z INFO Input type: url
2022-01-10T07:46:08.268Z INFO Job id: 3683463, Type: scraping, Status: running
2022-01-10T07:46:08.268Z INFO Total inputs: 1
2022-01-10T07:46:08.367Z INFO Plan: Free
2022-01-10T07:46:08.368Z INFO Proxy type: Default
2022-01-10T07:46:08.395Z INFO Running page 1 of 1
2022-01-10T07:46:08.395Z INFO https://raw.githubusercontent.com/Agenty/Agenty.TestData/master/scraping/csv/top-usa-retailers-2011.csv
2022-01-10T07:46:10.221Z INFO Status: 200
2022-01-10T07:46:10.230Z INFO REGEX: (\d+)\,(.*?)\,(\d*),(\d+.\d+) extracted 100 match(s) for field Rank
2022-01-10T07:46:10.230Z INFO REGEX: (\d+)\,(.*?)\,(\d*),(\d+.\d+) extracted 100 match(s) for field Retailer Name
2022-01-10T07:46:10.230Z INFO REGEX: (\d+)\,(.*?)\,(\d*),(\d+.\d+) extracted 100 match(s) for field # Stores
2022-01-10T07:46:10.230Z INFO REGEX: (\d+)\,(.*?)\,(\d*),(\d+.\d+) extracted 100 match(s) for field Revenue
2022-01-10T07:46:10.246Z INFO Job 3683463 completed successfully
2022-01-10T07:46:17.758Z INFO Preparing files for backup...
2022-01-10T07:46:17.812Z INFO Gzip files...
2022-01-10T07:46:17.814Z INFO 4 files gziped successfully
2022-01-10T07:46:17.833Z INFO Uploading 4 files to S3...
2022-01-10T07:46:17.967Z INFO collection1.csv.gz (Bytes: 1881) uploaded successfully
2022-01-10T07:46:17.967Z INFO collection1.json.gz (Bytes: 2050) uploaded successfully
2022-01-10T07:46:17.967Z INFO collection1.tsv.gz (Bytes: 1881) uploaded successfully
2022-01-10T07:46:17.967Z INFO input.txt.gz (Bytes: 117) uploaded successfully
2022-01-10T07:46:17.967Z INFO Backup completed successfully
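The example response above is plain text with one entry per line in the form timestamp, level, message. The Python sketch below assumes that format holds for every line: it fetches the logs and splits each line into a named tuple, skipping lines that don't match. parse_log_line, fetch_logs and LogEntry are hypothetical names.

```python
import urllib.parse
import urllib.request
from datetime import datetime
from typing import List, NamedTuple, Optional


class LogEntry(NamedTuple):
    timestamp: datetime  # UTC, parsed as a naive datetime
    level: str           # e.g. "INFO"
    message: str


def parse_log_line(line: str) -> Optional[LogEntry]:
    """Split '<timestamp> <LEVEL> <message>'; return None if the line doesn't fit."""
    parts = line.strip().split(" ", 2)
    if len(parts) < 3:
        return None
    stamp, level, message = parts
    try:
        timestamp = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%fZ")
    except ValueError:
        return None
    return LogEntry(timestamp, level, message)


def fetch_logs(job_id: int, api_key: str, offset: int = 0, limit: int = 2500) -> List[LogEntry]:
    """GET /jobs/{job_id}/logs and parse each line (requires network access)."""
    query = urllib.parse.urlencode({"apikey": api_key, "offset": offset, "limit": limit})
    url = f"https://api.agenty.com/v2/jobs/{job_id}/logs?{query}"
    with urllib.request.urlopen(url) as response:
        text = response.read().decode("utf-8")
    return [entry for entry in map(parse_log_line, text.splitlines()) if entry is not None]
```

For example, keeping only entries whose level is anything other than INFO is a quick way to spot problems in a long run.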
Export job result by job id
This endpoint creates a download link for the job result or logs in CSV format.
Endpoint:
Method: GET
URL: https://api.agenty.com/v2/jobs/{{JOB_ID}}/export
Query params:
Key | Value | Description |
---|---|---|
apikey | {{API_KEY}} | Your api key |
type | result | The type of file to export. Must be result or logs |
collection | 1 | The collection number you want to export. Must be 1 or greater. Default is 1 |
modified | 1 | Whether to export the modified result when a post-processing script is used. Default is 1, which exports the modified version when available. Use 0 to download the default result |
filename | output | Custom name for the downloaded file. Default is export.csv |
Responses:
Status: Download job result by job id | Code: 200
{
"downloadlink": "https://server1.agenty.com/Job_12995/output1.csv?signature=sdlfjasoywerxvjsaldfkjpwqeroiiu9123e7"
}
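Exporting is a two-step process: this endpoint returns a JSON object with a downloadlink, and the file itself is fetched from that link. A minimal Python sketch, assuming the link can be downloaded with a plain GET and no extra authentication (the signature query string in the example suggests it is pre-signed, but this page does not say so). export_url and download_export are hypothetical helpers; filename is only sent when given, so the server default of export.csv applies otherwise.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

BASE_URL = "https://api.agenty.com/v2"


def export_url(job_id: int, api_key: str, export_type: str = "result",
               collection: int = 1, modified: int = 1,
               filename: Optional[str] = None) -> str:
    """Build the /jobs/{job_id}/export URL, enforcing the documented constraints."""
    if export_type not in ("result", "logs"):
        raise ValueError("type must be 'result' or 'logs'")
    if collection < 1:
        raise ValueError("collection must be 1 or greater")
    params = {"apikey": api_key, "type": export_type,
              "collection": collection, "modified": modified}
    if filename is not None:
        params["filename"] = filename  # when omitted, the server default applies
    return f"{BASE_URL}/jobs/{job_id}/export?{urllib.parse.urlencode(params)}"


def download_export(job_id: int, api_key: str, dest: str, **options) -> str:
    """Request a download link, save the file to dest, return the link (network)."""
    with urllib.request.urlopen(export_url(job_id, api_key, **options)) as response:
        link = json.loads(response.read().decode("utf-8"))["downloadlink"]
    # Assumption: the link is pre-signed and needs no API key of its own.
    with urllib.request.urlopen(link) as download, open(dest, "wb") as out:
        out.write(download.read())
    return link
```

For example, download_export(12995, "YOUR_API_KEY", "output.csv", filename="output") saves the result of collection 1 to output.csv and returns the link it used.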
Get all jobs
Gets all jobs for all agents under your account.
Endpoint:
Method: GET
URL: https://api.agenty.com/v2/jobs
Query params:
Key | Value | Description |
---|---|---|
apikey | {{API_KEY}} | Your api key |
Responses:
Status: OK | Code: 200
{
"total": 5,
"limit": 1000,
"offset": 0,
"returned": 5,
"result": [
{
"job_id": 3689527,
"account_id": 59703,
"agent_id": "lmqdjwd972",
"type": "scraping",
"status": "completed",
"priority": 0,
"pages_total": 1,
"pages_processed": 1,
"pages_succeeded": 1,
"pages_failed": 1,
"pages_credit": 1,
"created_at": "2022-01-10T10:16:41Z",
"started_at": "2022-01-10T10:19:59Z",
"completed_at": "2022-01-10T10:20:01Z",
"stopped_at": null,
"is_scheduled": false,
"queue_time": "00:03:18",
"run_duration": "00:00:02",
"error": null,
"ping_at": null,
"assigned_worker_id": null,
"running_worker_id": null
},
{
"job_id": 3689448,
"account_id": 59703,
"agent_id": "w9qmlp5475",
"type": "scraping",
"status": "completed",
"priority": 0,
"pages_total": 1,
"pages_processed": 1,
"pages_succeeded": 1,
"pages_failed": 0,
"pages_credit": 1,
"created_at": "2022-01-10T10:01:58Z",
"started_at": "2022-01-10T10:11:11Z",
"completed_at": "2022-01-10T10:11:13Z",
"stopped_at": null,
"is_scheduled": false,
"queue_time": "00:09:13",
"run_duration": "00:00:02",
"error": null,
"ping_at": null,
"assigned_worker_id": null,
"running_worker_id": null
},
{
"job_id": 3689064,
"account_id": 59703,
"agent_id": "w9qmlp5475",
"type": "scraping",
"status": "stopped",
"priority": 0,
"pages_total": 0,
"pages_processed": 0,
"pages_succeeded": 0,
"pages_failed": 0,
"pages_credit": 0,
"created_at": "2022-01-10T09:59:05Z",
"started_at": null,
"completed_at": "2022-01-10T09:59:37Z",
"stopped_at": "2022-01-10T09:59:37Z",
"is_scheduled": false,
"queue_time": null,
"run_duration": null,
"error": null,
"ping_at": null,
"assigned_worker_id": null,
"running_worker_id": null
}
]
}
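The list response wraps job objects in total, offset, returned and result fields. The Python sketch below summarizes one page: it counts jobs by status, totals pages_credit, finds the longest queue_time and reports whether total exceeds what has been returned so far. It assumes queue_time and run_duration always use the HH:MM:SS form seen in the examples; longer durations might be formatted differently. Whether this endpoint accepts offset and limit parameters is not documented on this page. parse_duration and summarize_jobs are hypothetical names.

```python
from collections import Counter
from datetime import timedelta
from typing import Optional


def parse_duration(value: Optional[str]) -> Optional[timedelta]:
    """Turn an 'HH:MM:SS' queue_time/run_duration string into a timedelta."""
    if value is None:
        return None
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def summarize_jobs(page: dict) -> dict:
    """Summarize one page of a /jobs list response."""
    jobs = page.get("result", [])
    waits = [d for d in (parse_duration(job.get("queue_time")) for job in jobs) if d is not None]
    returned = page.get("returned", len(jobs))
    return {
        "by_status": dict(Counter(job.get("status") for job in jobs)),
        "pages_credit": sum(job.get("pages_credit", 0) for job in jobs),
        "longest_queue": max(waits, default=None),
        # True when the account has more jobs than this page covers.
        "has_more": page.get("offset", 0) + returned < page.get("total", 0),
    }
```

A long queue_time relative to run_duration, as in the examples above, indicates a job spent most of its time waiting for a worker rather than running.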
Get jobs by agent id
Gets all historical jobs for the given agent_id.
Endpoint:
Method: GET
URL: https://api.agenty.com/v2/jobs
Query params:
Key | Value | Description |
---|---|---|
agent_id | {{AGENT_ID}} | Your agent id |
apikey | {{API_KEY}} | Your api key |
Responses:
Status: OK | Code: 200
{
"total": 2,
"limit": 1000,
"offset": 0,
"returned": 2,
"result": [
{
"job_id": 3689773,
"account_id": 59703,
"agent_id": "45l0wewqzk",
"type": "scraping",
"status": "completed",
"priority": 0,
"pages_total": 1,
"pages_processed": 1,
"pages_succeeded": 1,
"pages_failed": 0,
"pages_credit": 1,
"created_at": "2022-01-10T10:55:27Z",
"started_at": "2022-01-10T10:55:28Z",
"completed_at": "2022-01-10T10:55:30Z",
"stopped_at": null,
"is_scheduled": false,
"queue_time": "00:00:01",
"run_duration": "00:00:02",
"error": null,
"ping_at": null,
"assigned_worker_id": null,
"running_worker_id": null
},
{
"job_id": 3683466,
"account_id": 59703,
"agent_id": "45l0wewqzk",
"type": "scraping",
"status": "completed",
"priority": 0,
"pages_total": 1,
"pages_processed": 1,
"pages_succeeded": 1,
"pages_failed": 0,
"pages_credit": 1,
"created_at": "2022-01-10T07:47:49Z",
"started_at": "2022-01-10T07:48:18Z",
"completed_at": "2022-01-10T07:48:21Z",
"stopped_at": null,
"is_scheduled": false,
"queue_time": "00:00:29",
"run_duration": "00:00:03",
"error": null,
"ping_at": null,
"assigned_worker_id": null,
"running_worker_id": null
}
]
}
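Filtering by agent uses the same /jobs endpoint with an extra agent_id query parameter. This Python sketch builds that URL and picks the most recently completed job from the response, for example to decide which job's result to fetch. Timestamps are parsed with a small helper because some values on this page carry seven fractional digits, which datetime.fromisoformat rejects before Python 3.11. All function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from datetime import datetime, timezone
from typing import Optional

BASE_URL = "https://api.agenty.com/v2"


def jobs_by_agent_url(agent_id: str, api_key: str) -> str:
    """Build the GET /jobs URL filtered by agent_id."""
    query = urllib.parse.urlencode({"agent_id": agent_id, "apikey": api_key})
    return f"{BASE_URL}/jobs?{query}"


def parse_ts(value: str) -> datetime:
    """Parse '...T10:55:30Z' or '...T11:32:18.2362494Z' as an aware UTC datetime."""
    main, _, fraction = value.rstrip("Z").partition(".")
    parsed = datetime.strptime(main, "%Y-%m-%dT%H:%M:%S")
    # Keep at most 6 fractional digits (microseconds); extra digits are truncated.
    micros = int((fraction + "000000")[:6]) if fraction else 0
    return parsed.replace(microsecond=micros, tzinfo=timezone.utc)


def latest_completed_job(page: dict) -> Optional[dict]:
    """Return the most recently completed job in a /jobs response, or None."""
    done = [job for job in page.get("result", [])
            if job.get("status") == "completed" and job.get("completed_at")]
    return max(done, key=lambda job: parse_ts(job["completed_at"]), default=None)


def fetch_jobs_by_agent(agent_id: str, api_key: str) -> dict:
    """Send the request and decode the JSON page (requires network access)."""
    with urllib.request.urlopen(jobs_by_agent_url(agent_id, api_key)) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, latest_completed_job(fetch_jobs_by_agent("45l0wewqzk", "YOUR_API_KEY")) would return job 3689773 for the response shown above.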