Article Extractor API
Introduction
Ujeebu Extract converts a news or blog article into structured JSON data. It extracts the main text and HTML bodies, the author, the publish date, any embeddable media such as YouTube and Twitter cards, and the RSS or social feeds (Facebook/Twitter timelines or YouTube channels), among other relevant pieces of data.
To use the API, subscribe to a plan here and connect to:
GET https://api.ujeebu.com/extract
Parameters
Name | Type | Description | Default Value |
---|---|---|---|
url REQUIRED | string | URL of article to be extracted. | - |
raw_html | string | HTML of article to be extracted. When this is passed, article extraction is carried out on the value of this parameter (i.e. without fetching the article from url); however, the extractor still relies on url to resolve relative links and relatively referenced assets in the provided HTML. | |
js | boolean | indicates whether to execute JavaScript or not. Set to 'auto' to let the extractor decide. | false |
text | boolean | indicates whether API should return extracted text. | true |
html | boolean | indicates whether API should extract html. | true |
media | boolean | indicates whether API should extract media. | false |
feeds | boolean | indicates whether API should extract RSS feeds. | false |
images | boolean | indicates whether API should extract all images present in HTML. | true |
author | boolean | indicates whether API should extract article's author. | true |
pub_date | boolean | indicates whether API should extract article's publish date. | true |
partial | number | number of characters, or percentage (if a percent sign is present), of the text/html to be returned. 0 means all. | 0 |
is_article | boolean | when true, returns the probability [0-1] of the URL being an article. Anything scoring 0.5 and above is likely an article, though this may vary slightly from one site to another. | true |
quick_mode | boolean | when true, does a quick analysis of the content instead of the normal advanced parsing. Usually cuts down response time by about 30% to 60%. | false |
strip_tags | csv-string | indicates which tags to strip from the extracted article HTML. Expects a comma separated list of tag names/css selectors. | form |
timeout | number | maximum number of seconds before request timeout | 60 |
js_timeout | number | when js is enabled, indicates how many seconds the API should wait for the JS engine to render the supplied URL. | timeout/2 |
scroll_down | boolean | indicates whether to scroll down the page or not; applies only when js is enabled. | true |
image_analysis | boolean | indicates whether API should analyse images for minimum width and height (see parameters min_image_width and min_image_height for more details). | true |
min_image_width | number | minimum width of the images kept in the HTML (if image_analysis is false this parameter has no effect). | 200 |
min_image_height | number | minimum height of the images kept in the HTML (if image_analysis is false this parameter has no effect). | 100 |
image_timeout | number | image fetching timeout in seconds. | 2 |
return_only_enclosed_text_images | boolean | indicates whether to return only images that are enclosed within extracted article HTML. | true |
proxy_type | string | indicates type of proxy to use. Possible values: 'rotating', 'advanced', 'premium', 'residential', 'custom'. | rotating |
proxy_country | string | country ISO 3166-1 alpha-2 code to proxy from. Valid only when premium proxy type is chosen. | US |
custom_proxy | string | URI for your custom proxy in the following format: scheme://user:pass@host:port . Applicable and required only if proxy_type=custom. | null |
auto_proxy | boolean | when enabled, automatically falls back to a more advanced proxy when the rotating proxy fails. It moves on to the next proxy option until it gets the content, stopping only when content is available or all options have been exhausted. Note that you are billed only for the top option attempted. | false |
session_id | alphanumeric | alphanumeric identifier with a length between 1 and 16 characters, used to route multiple requests through the same proxy instance. Sessions remain active for 30 minutes. | null |
pagination | boolean | extract and concatenate multiple-page articles. | true |
pagination_max_pages | number | indicates the maximum number of pages to extract when pagination is enabled. | 30 |
UJB-headerName | string | indicates which headers to send to the target URL. This can be useful when the article is behind a paywall, for example, and you need to pass your authentication cookies. | null |
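To illustrate how the parameters above combine, the sketch below builds the query string for a /extract request using only Python's standard library. The parameter values (the target URL, the selector list, the partial percentage) are arbitrary examples, not defaults:

```python
from urllib.parse import urlencode

# Hypothetical parameter values for illustration; see the table above.
params = {
    "url": "https://example.com/some-article",
    "js": "auto",              # let the extractor decide whether to render JS
    "strip_tags": "form,.hidden",
    "partial": "50%",          # return only the first half of the text/html
}

endpoint = "https://api.ujeebu.com/extract"
request_url = endpoint + "?" + urlencode(params)
print(request_url)

# The request itself would be sent with your API key in the header, e.g.:
# requests.get(request_url, headers={"ApiKey": "<API Key>"})
```

Note that urlencode percent-encodes reserved characters, so the literal `%` in `50%` travels as `%25` on the wire.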
Responses
Status | Meaning | Description | Schema |
---|---|---|---|
200 | OK | successful operation | SuccessResponse |
400 | Bad Request | Invalid parameter value | APIResponseError |
Schemas
Article Schema
{
"url": "string",
"canonical_url": "string",
"title": "string",
"text": "string",
"html": "string",
"summary": "string",
"image": "string",
"images": ["string"],
"media": ["string"],
"language": "string",
"author": "string",
"pub_date": "string",
"modified_date": "string",
"site_name": "string",
"favicon": "string",
"encoding": "string"
}
Properties
Name | Type | Description |
---|---|---|
url | string | the URL parameter. |
canonical_url | string | the final (resolved) URL. |
title | string | the title of the article. |
text | string | the extracted text. |
html | string | the extracted html. |
summary | string | summary (if available) of the article text. |
image | string | main image of the article. |
images | [string] | all images present in article. |
media | [string] | all media present in article. |
language | string | language code of article text. |
author | string | author of article. |
pub_date | string | publication date of article. |
modified_date | string | last modified date of article. |
site_name | string | name of site hosting article. |
favicon | string | favicon of site hosting article. |
encoding | string | character encoding of article text. |
Success Response example
{
"article": {
"text": "I began learning German at the age of 13, and I\u2019m still trying to explain to myself why it was love at first sound. The answer must surely be: the excellence of my teacher. At an English public school not famed for its cultural generosity, Mr King was that rare thing: a kindly and intelligent man who, in the thick of the second world war, determinedly loved the Germany that he knew was still there somewhere.\nRather than join the chorus of anti-German propaganda, he preferred, doggedly, to inspire his little class with the beauty of the language, and of its literature and culture. One day, he used to say, the real Germany will come back. And he was right. Because now it has.\nWhy was it love at first sound for me? Well...",
"html": "<p><span>I<\/span> began learning German at the age of 13, and Iām still trying to explain to myself why it was love at first sound. The answer must surely be: the excellence of my teacher. At an English public school not famed for its cultural generosity, Mr King was that rare thing: a kindly and intelligent man who, in the thick of the second world war, determinedly loved the Germany that he knew was still there somewhere.<\/p><p>Rather than join the chorus of anti-German propaganda, he preferred, doggedly, to inspire his little class with the beauty of the language, and of its literature and culture. One day, he used to say, the real Germany will come back. And he was right. Because now it has....",
"media": [],
"images": [],
"author": "John le Carr\u00e9",
"pub_date": "2017-07-01 23:05:12",
"is_article": 1,
"url": "https:\/\/www.theguardian.com\/education\/2017\/jul\/02\/why-we-should-learn-german-john-le-carre",
"canonical_url": "https:\/\/www.theguardian.com\/education\/2017\/jul\/02\/why-we-should-learn-german-john-le-carre",
"title": "Why we should learn German | John le Carr\u00e9",
"language": "en",
"image": "https:\/\/i.guim.co.uk\/img\/media\/f19eff6f7e1751d88b38e725cfbe6687084d5f64\/0_235_9010_5405\/master\/9010.jpg?width=1200&height=630&quality=85&auto=format&fit=crop&overlay-align=bottom%2Cleft&overlay-width=100p&overlay-base64=L2ltZy9zdGF0aWMvb3ZlcmxheXMvdG8tb3BpbmlvbnMtYWdlLTIwMTcucG5n&enable=upscale&s=efeec857dffdb94cd84c4b652b4e287f",
"summary": "To help make the European debate decent and civilised, it is now more important than ever to value the skills of the linguist",
"modified_date": "2017-12-02 03:00:56",
"site_name": "the Guardian",
"favicon": "https:\/\/static.guim.co.uk\/images\/favicon-32x32.ico",
"encoding": "utf-8",
"time": 0.85
}
}
Error Response Schema
{
"url": "string",
"message": "string",
"error_code": "string",
"errors": ["string"]
}
Properties
Name | Type | Description |
---|---|---|
url | string | Given URL |
message | string | Error message |
error_code | string | Error code |
errors | [string] | List of all errors |
Response Codes
Code | Billed | Meaning | Suggestion |
---|---|---|---|
200 | Yes | Successful request | - |
400 | No | Some required parameter is missing (URL) | Set the missing parameter |
401 | No | Missing API-KEY | Provide API-KEY |
404 | Yes | Provided URL not found | Provide a valid URL |
408 | Yes | Request timeout | Increase the timeout parameter, use a premium proxy, or force JS |
429 | No | Too many requests | Upgrade your plan |
500 | No | Internal error | Retry the request or contact us |
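The table above suggests a simple client-side policy: fix parameters on 400s, retry 408s with a larger timeout, back off on 429, and retry (or escalate) on 500. A minimal sketch of such a dispatcher; the function name and return strings are illustrative, not part of the API:

```python
def suggest_action(status_code: int) -> str:
    """Map an Ujeebu response code to a suggested client action (per the table above)."""
    if status_code == 200:
        return "ok"
    if status_code == 400:
        return "fix-parameters"        # e.g. the required url parameter is missing
    if status_code == 401:
        return "provide-api-key"
    if status_code == 404:
        return "check-url"             # billed: the target URL was not found
    if status_code == 408:
        return "retry-higher-timeout"  # raise `timeout`, try a premium proxy, or force JS
    if status_code == 429:
        return "slow-down-or-upgrade"
    return "retry-or-contact-support"  # 500 and anything unexpected

print(suggest_action(408))
```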
Examples
Stripping tags
If you want to delete some HTML element(s) before extraction is carried out, use the strip_tags parameter
to pass a comma-separated list of CSS selectors of elements to delete.
The example below removes any meta, form and input tags, as well as any element with class hidden:
curl -i \
-H 'ApiKey: <API Key>' \
-X GET \
"https://api.ujeebu.com/extract?url=https://ujeebu.com/blog/how-to-extract-clean-text-from-html&strip_tags=meta,form,.hidden,input"
- cURL
- NodeJs
- Python
- Java
- PHP
- Go
curl --location --request GET 'https://api.ujeebu.com/extract?url=https://ujeebu.com/blog/how-to-extract-clean-text-from-html&js=0&strip_tags=aside,form' \
--header 'ApiKey: <API Key>'
var request = require('request');
var options = {
  'method': 'GET',
  'url': 'https://api.ujeebu.com/extract?url=https://ujeebu.com/blog/how-to-extract-clean-text-from-html&js=0&strip_tags=aside,form',
  'headers': {
    'ApiKey': '<API Key>'
  }
};
request(options, function (error, response) {
  if (error) throw new Error(error);
  console.log(response.body);
});
import requests

url = "https://api.ujeebu.com/extract?url=https://ujeebu.com/blog/how-to-extract-clean-text-from-html&js=0&strip_tags=aside,form"
payload = {}
headers = {
  'ApiKey': '<API Key>'
}
response = requests.request("GET", url, headers=headers, data=payload)
print(response.text)
OkHttpClient client = new OkHttpClient().newBuilder().build();
Request request = new Request.Builder()
  .url("https://api.ujeebu.com/extract?url=https://ujeebu.com/blog/how-to-extract-clean-text-from-html&js=0&strip_tags=aside,form")
  .method("GET", null)
  .addHeader("ApiKey", "<API Key>")
  .build();
Response response = client.newCall(request).execute();
<?php
$curl = curl_init();
curl_setopt_array($curl, array(
  CURLOPT_URL => 'https://api.ujeebu.com/extract?url=https://ujeebu.com/blog/how-to-extract-clean-text-from-html&js=0&strip_tags=aside,form',
  CURLOPT_RETURNTRANSFER => true,
  CURLOPT_ENCODING => '',
  CURLOPT_MAXREDIRS => 10,
  CURLOPT_TIMEOUT => 0,
  CURLOPT_FOLLOWLOCATION => true,
  CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
  CURLOPT_CUSTOMREQUEST => 'GET',
  CURLOPT_HTTPHEADER => array(
    'ApiKey: <API Key>'
  ),
));
$response = curl_exec($curl);
curl_close($curl);
echo $response;
package main

import (
  "fmt"
  "io/ioutil"
  "net/http"
)

func main() {
  url := "https://api.ujeebu.com/extract?url=https://ujeebu.com/blog/how-to-extract-clean-text-from-html&js=0&strip_tags=aside,form"
  method := "GET"

  client := &http.Client{}
  req, err := http.NewRequest(method, url, nil)
  if err != nil {
    fmt.Println(err)
    return
  }
  req.Header.Add("ApiKey", "<API Key>")

  res, err := client.Do(req)
  if err != nil {
    fmt.Println(err)
    return
  }
  defer res.Body.Close()

  body, err := ioutil.ReadAll(res.Body)
  if err != nil {
    fmt.Println(err)
    return
  }
  fmt.Println(string(body))
}
Passing custom headers
curl -i \
-H 'UJB-Username: username' \
-H 'UJB-Authorisation: Basic dXNlcm5hbWU6cGFzc3dvcmQ=' \
-H 'ApiKey: <API Key>' \
-X GET \
https://api.ujeebu.com/extract?url=https://ujeebu.com/blog/how-to-extract-clean-text-from-html
- cURL
- NodeJs
- Python
- Java
- PHP
- Go
curl --location --request GET 'https://api.ujeebu.com/extract?url=https://ujeebu.com/blog/how-to-extract-clean-text-from-html&js=0' \
--header 'UJB-User-Agent: Custom user agent' \
--header 'ApiKey: <API Key>'
var request = require('request');
var options = {
  'method': 'GET',
  'url': 'https://api.ujeebu.com/extract?url=https://ujeebu.com/blog/how-to-extract-clean-text-from-html&js=0',
  'headers': {
    'UJB-User-Agent': 'Custom user agent',
    'ApiKey': '<API Key>'
  }
};
request(options, function (error, response) {
  if (error) throw new Error(error);
  console.log(response.body);
});
import requests

url = "https://api.ujeebu.com/extract?url=https://ujeebu.com/blog/how-to-extract-clean-text-from-html&js=0"
payload = {}
headers = {
  'UJB-User-Agent': 'Custom user agent',
  'ApiKey': '<API Key>'
}
response = requests.request("GET", url, headers=headers, data=payload)
print(response.text)
OkHttpClient client = new OkHttpClient().newBuilder().build();
Request request = new Request.Builder()
  .url("https://api.ujeebu.com/extract?url=https://ujeebu.com/blog/how-to-extract-clean-text-from-html&js=0")
  .method("GET", null)
  .addHeader("UJB-User-Agent", "Custom user agent")
  .addHeader("ApiKey", "<API Key>")
  .build();
Response response = client.newCall(request).execute();
<?php
$curl = curl_init();
curl_setopt_array($curl, array(
  CURLOPT_URL => 'https://api.ujeebu.com/extract?url=https://ujeebu.com/blog/how-to-extract-clean-text-from-html&js=0',
  CURLOPT_RETURNTRANSFER => true,
  CURLOPT_ENCODING => '',
  CURLOPT_MAXREDIRS => 10,
  CURLOPT_TIMEOUT => 0,
  CURLOPT_FOLLOWLOCATION => true,
  CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
  CURLOPT_CUSTOMREQUEST => 'GET',
  CURLOPT_HTTPHEADER => array(
    'UJB-User-Agent: Custom user agent',
    'ApiKey: <API Key>'
  ),
));
$response = curl_exec($curl);
curl_close($curl);
echo $response;
package main

import (
  "fmt"
  "io/ioutil"
  "net/http"
)

func main() {
  url := "https://api.ujeebu.com/extract?url=https://ujeebu.com/blog/how-to-extract-clean-text-from-html&js=0"
  method := "GET"

  client := &http.Client{}
  req, err := http.NewRequest(method, url, nil)
  if err != nil {
    fmt.Println(err)
    return
  }
  req.Header.Add("UJB-User-Agent", "Custom user agent")
  req.Header.Add("ApiKey", "<API Key>")

  res, err := client.Do(req)
  if err != nil {
    fmt.Println(err)
    return
  }
  defer res.Body.Close()

  body, err := ioutil.ReadAll(res.Body)
  if err != nil {
    fmt.Println(err)
    return
  }
  fmt.Println(string(body))
}
The code above will return the following response:
{
"article": {
"author": "Sam",
"pub_date": "2019-08-09 12:42:25",
"is_article": 1,
"url": "https://ujeebu.com/blog/how-to-extract-clean-text-from-html",
"canonical_url": "https://ujeebu.com/blog/how-to-extract-clean-text-from-html/",
"title": "Extracting clean data from blog and news articles",
"site_name": "Ujeebu blog",
"favicon": "https://ujeebu.com/blog/favicon.png",
"encoding": "utf-8",
"pages": ["https://ujeebu.com/blog/how-to-extract-clean-text-from-html/"]
},
"time": 6.366053104400635,
"js": false,
"pagination": false
}
Using Proxies
We realize your scraping activities might be blocked once in a while. To help you achieve the most success, we developed a multi-tiered proxy offering that lets you select the proxy type that best fits your needs.
Your API calls go through our rotating proxy by default.
The default proxy uses top IPs that will get the job done most of the time.
If the default rotating proxy works well for your needs, there is no need to do anything.
For tougher URLs, set proxy_type to one of the following options:
- rotating: the default.
- advanced: US IPs only.
- premium: US IPs only. Premium proxies that work well with social media and shopping sites.
- residential: geo-targeted residential IPs that work on "tough" sites that aren't accessible with the other options. Please note that data scraped via non-US residential IPs is currently metered once a request exceeds 1MB; keep in mind that all assets associated with an HTML page count toward this total, not just the HTML itself. US residential IPs are not metered. Please refer to the Credits section for more details.
- custom: set your own proxy. See the custom proxy section.
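For example, geo-targeting through the premium tier only takes two extra query parameters. A small sketch (the target URL and country code are illustrative values):

```python
from urllib.parse import urlencode

params = {
    "url": "https://example.com/some-article",
    "proxy_type": "premium",   # one of the tiers listed above
    "proxy_country": "DE",     # ISO 3166-1 alpha-2; valid with the premium tier
}
query = urlencode(params)
print("https://api.ujeebu.com/extract?" + query)
```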
tip
We won't bill for failing requests that aren't 404s.
info
A request's length also includes assets downloaded with the page when JS rendering is on.
info
To use a premium proxy from a specific country, set the proxy_country parameter to the ISO 3166-1 alpha-2 country code of one of the following:
Supported countries
- Algeria: DZ
- Angola: AO
- Benin: BJ
- Botswana: BW
- Burkina Faso: BF
- Burundi: BI
- Cameroon: CM
- Central African Republic: CF
- Chad: TD
- Democratic Republic of the Congo: CD
- Djibouti: DJ
- Egypt: EG
- Equatorial Guinea: GQ
- Eritrea: ER
- Ethiopia: ET
- Gabon: GA
- Gambia: GM
- Ghana: GH
- Guinea: GN
- Guinea Bissau: GW
- Ivory Coast: CI
- Kenya: KE
- Lesotho: LS
- Liberia: LR
- Libya: LY
- Madagascar: MG
- Malawi: MW
- Mali: ML
- Mauritania: MR
- Morocco: MA
- Mozambique: MZ
- Namibia: NA
- Niger: NE
- Nigeria: NG
- Republic of the Congo: CG
- Rwanda: RW
- Senegal: SN
- Sierra Leone: SL
- Somalia: SO
- Somaliland: ML
- South Africa: ZA
- South Sudan: SS
- Sudan: SD
- Swaziland: SZ
- Tanzania: TZ
- Togo: TG
- Tunisia: TN
- Uganda: UG
- Western Sahara: EH
- Zambia: ZM
- Zimbabwe: ZW
- Afghanistan: AF
- Armenia: AM
- Azerbaijan: AZ
- Bangladesh: BD
- Bhutan: BT
- Brunei: BN
- Cambodia: KH
- China: CN
- East Timor: TL
- Hong Kong: HK
- India: IN
- Indonesia: ID
- Iran: IR
- Iraq: IQ
- Israel: IL
- Japan: JP
- Jordan: JO
- Kazakhstan: KZ
- Kuwait: KW
- Kyrgyzstan: KG
- Laos: LA
- Lebanon: LB
- Malaysia: MY
- Maldives: MV
- Mongolia: MN
- Myanmar: MM
- Nepal: NP
- North Korea: KP
- Oman: OM
- Pakistan: PK
- Palestine: PS
- Philippines: PH
- Qatar: QA
- Saudi Arabia: SA
- Singapore: SG
- South Korea: KR
- Sri Lanka: LK
- Syria: SY
- Taiwan: TW
- Tajikistan: TJ
- Thailand: TH
- Turkey: TR
- Turkmenistan: TM
- United Arab Emirates: AE
- Uzbekistan: UZ
- Vietnam: VN
- Yemen: YE
- Albania: AL
- Andorra: AD
- Austria: AT
- Belarus: BY
- Belgium: BE
- Bosnia and Herzegovina: BA
- Bulgaria: BG
- Croatia: HR
- Cyprus: CY
- Czech Republic: CZ
- Denmark: DK
- Estonia: EE
- Finland: FI
- France: FR
- Germany: DE
- Gibraltar: GI
- Greece: GR
- Hungary: HU
- Iceland: IS
- Ireland: IE
- Italy: IT
- Kosovo: XK
- Latvia: LV
- Liechtenstein: LI
- Lithuania: LT
- Luxembourg: LU
- Macedonia: MK
- Malta: MT
- Moldova: MD
- Monaco: MC
- Montenegro: ME
- Netherlands: NL
- Northern Cyprus: CY
- Norway: NO
- Poland: PL
- Portugal: PT
- Romania: RO
- Russia: RU
- San Marino: SM
- Serbia: RS
- Slovakia: SK
- Slovenia: SI
- Spain: ES
- Sweden: SE
- Switzerland: CH
- Ukraine: UA
- United Kingdom: GB
- Bahamas: BS
- Belize: BZ
- Bermuda: BM
- Canada: CA
- Costa Rica: CR
- Cuba: CU
- Dominican Republic: DO
- El Salvador: SV
- Greenland: GL
- Guatemala: GT
- Haiti: HT
- Honduras: HN
- Jamaica: JM
- Nicaragua: NI
- Panama: PA
- Puerto Rico: PR
- Trinidad And Tobago: TT
- United States: US
- Australia: AU
- Fiji: FJ
- New Caledonia: NC
- New Zealand: NZ
- Papua New Guinea: PG
- Solomon Islands: SB
- Vanuatu: VU
- Argentina: AR
- Bolivia: BO
- Brazil: BR
- Chile: CL
- Colombia: CO
- Ecuador: EC
- Falkland Islands: FK
- French Guiana: GF
- Guyana: GY
- Mexico: MX
- Paraguay: PY
- Peru: PE
- Suriname: SR
- Uruguay: UY
- Venezuela: VE
Using Ujeebu Extract with your own proxy
To use your custom proxy, set the proxy_type parameter to custom, then set the custom_proxy parameter to your own proxy in the following format: scheme://host:port, and set proxy credentials using the custom_proxy_username and custom_proxy_password parameters.
info
If you're using the GET HTTP verb and custom_proxy_password contains special characters, it's better to URL-encode it before passing it.
curl -i \
-H 'ApiKey: <API Key>' \
-X GET \
"https://api.ujeebu.com/scrape?url=https://ipinfo.io&response_type=raw&proxy_type=custom&custom_proxy=http://proxyhost:8889&custom_proxy_username=user&custom_proxy_password=pass"
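If the password does contain special characters, Python's standard urllib.parse.quote handles the encoding. The password below is made up for illustration:

```python
from urllib.parse import quote

password = "p@ss:w/rd"              # hypothetical password with special characters
encoded = quote(password, safe="")  # percent-encode everything, including '@', ':' and '/'
print(encoded)                      # p%40ss%3Aw%2Frd
# ...&custom_proxy_password=p%40ss%3Aw%2Frd
```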
Credits
Credit cost per request:
Proxy Type | No JS | w/ JS | Geo Targeting | Metered |
---|---|---|---|---|
rotating | 5 | 10 | US | No |
advanced | 10 | 15 | US | No |
premium | 12 | 17 | US | No |
residential (US) | 30 | 35 | US | No |
residential | 10 | 20 | Multiple countries | +10 credits per MB after 1MB |
custom | 5 | 10 | Custom | No |
info
Consumed credits are returned in the Ujb-credits header.
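Since the header travels with every response, consumed credits can be tracked per call. A sketch (the header name is as documented above; the plain dict here stands in for a response's headers, e.g. requests' case-insensitive headers mapping):

```python
def credits_used(headers: dict) -> int:
    """Read consumed credits from the Ujb-credits response header (0 if absent)."""
    # With requests, header lookup is case-insensitive; with a plain dict we normalize manually.
    for name, value in headers.items():
        if name.lower() == "ujb-credits":
            return int(value)
    return 0

# Stand-in headers dict; a real one would come from requests.get(...).headers:
print(credits_used({"Ujb-credits": "12", "Content-Type": "application/json"}))
```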
Article Preview API
Introduction
Extracts a preview of an article (an article card). This is faster than the extract endpoint because it doesn't do any in-depth analysis of the article's content; instead it mostly relies on its meta tags.
To use the API, subscribe to a plan here and connect to:
GET https://api.ujeebu.com/card
Parameters
Name | Type | Description | Default Value |
---|---|---|---|
url REQUIRED | string | URL of article to be extracted. | - |
js | boolean | indicates whether to execute JavaScript or not. Set to 'auto' to let the extractor decide. | false |
timeout | number | maximum number of seconds before request timeout. | 60 |
js_timeout | number | when js is enabled, indicates how many seconds the API should wait for the JS engine to render the supplied URL. | 60 |
proxy_type | string | indicates type of proxy to use. Possible values: 'rotating', 'advanced', 'premium', 'residential', 'custom'. | rotating |
proxy_country | string | country ISO 3166-1 alpha-2 code to proxy from. Valid only when premium proxy type is chosen. | US |
custom_proxy | string | URI for your custom proxy in the following format: scheme://user:pass@host:port . applicable and required only if proxy_type=custom | null |
auto_proxy | string | enable a more advanced proxy by default when rotating proxy is not working. It will move to the next proxy option until it gets the content and will only stop when content is available or none of the options worked. Please note that you are billed only on the top option attempted. | false |
session_id | alphanumeric | alphanumeric identifier with a length between 1 and 16 characters, used to route multiple requests from the same proxy instance. Sessions remain active for 30 minutes | null |
UJB-headerName | string | indicates which headers to send to the target URL. This can be useful when the article is behind a paywall, for example, and you need to pass your authentication cookies. | null |
Responses
Status | Meaning | Description | Schema |
---|---|---|---|
200 | OK | successful operation | SuccessResponse |
400 | Bad Request | Invalid parameter value | APIResponseError |
Schemas
Article Card Schema
{
"url": "string",
"lang": "string",
"favicon": "string",
"title": "string",
"summary": "string",
"author": "string",
"date_published": "string",
"date_modified": "string",
"image": "string",
"site_name": "string"
}
Properties
Name | Type | Description |
---|---|---|
url | string | the URL parameter. |
lang | string | the language of the article. |
favicon | string | Domain favicon. |
title | string | the title of the article. |
summary | string | the description of the article. |
author | string | the author of the article. |
date_published | string | the publish date of the article. |
date_modified | string | the modified date of the article. |
image | string | the main image of the article. |
site_name | string | the site name of the article. |
Error Response Schema
{
"url": "string",
"message": "string",
"errors": ["string"]
}
Properties
Name | Type | Description |
---|---|---|
url | string | Given URL |
message | string | Error message |
errors | [string] | List of all errors |
Code Example
This will get the meta info for the article https://ujeebu.com/blog/how-to-extract-clean-text-from-html/:
- cURL
- NodeJs
- Python
- Java
- PHP
- Go
curl --location --request GET 'https://api.ujeebu.com/card?url=https://ujeebu.com/blog/how-to-extract-clean-text-from-html/&js=0' \
--header 'ApiKey: <API Key>'
var request = require('request');
var options = {
  'method': 'GET',
  'url': 'https://api.ujeebu.com/card?url=https://ujeebu.com/blog/how-to-extract-clean-text-from-html/&js=0',
  'headers': {
    'ApiKey': '<API Key>'
  }
};
request(options, function (error, response) {
  if (error) throw new Error(error);
  console.log(response.body);
});
import requests

url = "https://api.ujeebu.com/card?url=https://ujeebu.com/blog/how-to-extract-clean-text-from-html/&js=0"
payload = {}
headers = {
  'ApiKey': '<API Key>'
}
response = requests.request("GET", url, headers=headers, data=payload)
print(response.text)
OkHttpClient client = new OkHttpClient().newBuilder().build();
Request request = new Request.Builder()
  .url("https://api.ujeebu.com/card?url=https://ujeebu.com/blog/how-to-extract-clean-text-from-html/&js=0")
  .method("GET", null)
  .addHeader("ApiKey", "<API Key>")
  .build();
Response response = client.newCall(request).execute();
<?php
$curl = curl_init();
curl_setopt_array($curl, array(
  CURLOPT_URL => 'https://api.ujeebu.com/card?url=https://ujeebu.com/blog/how-to-extract-clean-text-from-html/&js=0',
  CURLOPT_RETURNTRANSFER => true,
  CURLOPT_ENCODING => '',
  CURLOPT_MAXREDIRS => 10,
  CURLOPT_TIMEOUT => 0,
  CURLOPT_FOLLOWLOCATION => true,
  CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
  CURLOPT_CUSTOMREQUEST => 'GET',
  CURLOPT_HTTPHEADER => array(
    'ApiKey: <API Key>'
  ),
));
$response = curl_exec($curl);
curl_close($curl);
echo $response;
package main

import (
  "fmt"
  "io/ioutil"
  "net/http"
)

func main() {
  url := "https://api.ujeebu.com/card?url=https://ujeebu.com/blog/how-to-extract-clean-text-from-html/&js=0"
  method := "GET"

  client := &http.Client{}
  req, err := http.NewRequest(method, url, nil)
  if err != nil {
    fmt.Println(err)
    return
  }
  req.Header.Add("ApiKey", "<API Key>")

  res, err := client.Do(req)
  if err != nil {
    fmt.Println(err)
    return
  }
  defer res.Body.Close()

  body, err := ioutil.ReadAll(res.Body)
  if err != nil {
    fmt.Println(err)
    return
  }
  fmt.Println(string(body))
}
The code above will generate the following response:
{
"author": "Sam",
"title": "Extracting clean data from blog and news articles",
"summary": "Several open source tools allow the extraction of clean text from article HTML. We list the most popular ones below, and run a benchmark to see how they stack up against the Ujeebu API",
"date_published": "2019-08-09 12:42:25",
"date_modified": "2021-05-02 20:22:34",
"favicon": ":///blog/favicon.png",
"charset": "utf-8",
"image": "https://ujeebu.com/blog/content/images/2021/05/ujb-blog-benchmark.png",
"lang": "en",
"keywords": [],
"site_name": "Ujeebu blog",
"time": 1.501387119293213
}
Usage endpoint
Introduction
To keep track of how much credit you're using programmatically, call the /account endpoint from your program.
Calls to this endpoint do not count against your calls-per-second limit, but you can make at most 10 /account calls per minute.
To use the API:
GET https://api.ujeebu.com/account
Usage Endpoint Code Example
This will get the current usage of the account associated with the given ApiKey:
- cURL
- NodeJs
- Python
- Java
- PHP
- Go
curl --location --request GET 'https://api.ujeebu.com/account' \
--header 'ApiKey: <API Key>'
var request = require('request');
var options = {
  'method': 'GET',
  'url': 'https://api.ujeebu.com/account',
  'headers': {
    'ApiKey': '<API Key>'
  }
};
request(options, function (error, response) {
  if (error) throw new Error(error);
  console.log(response.body);
});
import requests

url = "https://api.ujeebu.com/account"
payload = {}
headers = {
  'ApiKey': '<API Key>'
}
response = requests.request("GET", url, headers=headers, data=payload)
print(response.text)
OkHttpClient client = new OkHttpClient().newBuilder().build();
Request request = new Request.Builder()
  .url("https://api.ujeebu.com/account")
  .method("GET", null)
  .addHeader("ApiKey", "<API Key>")
  .build();
Response response = client.newCall(request).execute();
<?php
$curl = curl_init();
curl_setopt_array($curl, array(
  CURLOPT_URL => 'https://api.ujeebu.com/account',
  CURLOPT_RETURNTRANSFER => true,
  CURLOPT_ENCODING => '',
  CURLOPT_MAXREDIRS => 10,
  CURLOPT_TIMEOUT => 0,
  CURLOPT_FOLLOWLOCATION => true,
  CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
  CURLOPT_CUSTOMREQUEST => 'GET',
  CURLOPT_HTTPHEADER => array(
    'ApiKey: <API Key>'
  ),
));
$response = curl_exec($curl);
curl_close($curl);
echo $response;
package main

import (
  "fmt"
  "io/ioutil"
  "net/http"
)

func main() {
  url := "https://api.ujeebu.com/account"
  method := "GET"

  client := &http.Client{}
  req, err := http.NewRequest(method, url, nil)
  if err != nil {
    fmt.Println(err)
    return
  }
  req.Header.Add("ApiKey", "<API Key>")

  res, err := client.Do(req)
  if err != nil {
    fmt.Println(err)
    return
  }
  defer res.Body.Close()

  body, err := ioutil.ReadAll(res.Body)
  if err != nil {
    fmt.Println(err)
    return
  }
  fmt.Println(string(body))
}
The code above will generate the following response:
{
"balance": 0,
"days_till_next_billing": 0,
"next_billing_date": null,
"plan": "TRIAL",
"quota": "5000",
"requests_per_second": "0",
"concurrent_requests": 1,
"total_requests": 14,
"used": 95,
"used_percent": 1.9,
"userid": "8155"
}
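Note that some numeric fields in this payload (e.g. quota) arrive as strings, so a little coercion is needed when computing remaining credits. A sketch against the sample response above:

```python
# Subset of the /account sample response above; "quota" is returned as a string.
account = {
    "plan": "TRIAL",
    "quota": "5000",
    "used": 95,
    "used_percent": 1.9,
}

quota = int(account["quota"])          # coerce the string field
remaining = quota - account["used"]
print(f"{remaining} of {quota} credits left ({account['used_percent']}% used)")
```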