Article Extractor API

Introduction

Ujeebu Extract converts a news or blog article into structured JSON data. It extracts the main text and html bodies, the author, publish date, any embeddable media such as YouTube and twitter cards, the RSS feeds or social feeds (Facebook/Twitter timelines or YouTube channels) among other relevant pieces of data.

To use API, subscribe to a plan here and connect to :

GET https://api.ujeebu.com/extract

Parameters

Name	Type	Description	Default Value
url REQUIRED	`string`	URL of article to be extracted.	-
raw_html	`string`	HTML of article to be extracted. When this is passed, article extraction is carried out on the value of this parameter (i.e. without fetching article from `url`), however the extractor still relies on `url` to resolve relative links and relatively referenced assets in the provided html.
js	`boolean`	indicates whether to execute JavaScript or not. Set to 'auto' to let the extractor decide.	false
text	`boolean`	indicates whether API should return extracted text.	true
html	`boolean`	indicates whether API should extract html.	true
media	`boolean`	indicates whether API should extract media.	false
feeds	`boolean`	indicates whether API should extract RSS feeds.	false
images	`boolean`	indicates whether API should extract all images present in HTML.	true
author	`boolean`	indicates whether API should extract article's author.	true
pub_date	`boolean`	indicates whether API should extract article's publish date.	true
partial	`number`	number of characters or percentage of text (if percent sign is present) of text/html to be returned. 0 means all.	0
is_article	`boolean`	when true returns the probability [0-1] of URL being an article. Anything scoring 0.5 and above should be an article, but this may slightly vary from one site to another.	true
quick_mode	`boolean`	when true, does a quick analysis of the content instead of the normal advanced parsing. Usually cuts down response time by about 30% to 60%.	false
strip_tags	`csv-string`	indicates which tags to strip from the extracted article HTML. Expects a comma separated list of tag names/css selectors.	form
timeout	`number`	maximum number of seconds before request timeout	60
js_timeout	`number`	when `js` is enabled, indicates how many seconds the API should wait for the JS engine to render the supplied URL.	`timeout`/2
scroll_down	`boolean`	indicates whether to scroll down the page or not, this applies only when `js` is enabled.	true
image_analysis	`boolean`	indicates whether API should analyse images for minimum width and height (see parameters `min_image_width` and `min_image_height` for more details).	true
min_image_width	`number`	minimum width of the images kept in the HTML (if `image_analysis` is `false` this parameter has no effect).	200
min_image_height	`number`	minimum height of the images kept in the HTML (if `image_analysis` is `false` this parameter has no effect).	100
image_timeout	`number`	image fetching timeout in seconds.	2
return_only_enclosed_text_images	`boolean`	indicates whether to return only images that are enclosed within extracted article HTML.	true
proxy_type	`string`	indicates type of proxy to use. Possible values: 'rotating', 'advanced', 'premium', 'residential', 'custom'.	rotating
proxy_country	`string`	country `ISO 3166-1 alpha-2` code to proxy from. Valid only when `premium` proxy type is chosen.	US
custom_proxy	`string`	URI for your custom proxy in the following format: `scheme://user:pass@host:port`. applicable and required only if `proxy_type=custom`	null
auto_proxy	`string`	enable a more advanced proxy by default when rotating proxy is not working. It will move to the next proxy option until it gets the content and will only stop when content is available or none of the options worked. Please note that you are billed only on the top option attempted.	false
session_id	`alphanumeric`	alphanumeric identifier with a length between 1 and 16 characters, used to route multiple requests from the same proxy instance. Sessions remain active for 30 minutes	null
pagination	`boolean`	extract and concatenate multiple-page articles.	true
pagination_max_pages	`string`	indicates the number of pages to extract when `pagination` is enabled.	30
UJB-headerName	`string`	indicates which headers to send to target URL. This can be useful when article is behind a paywall for example, and that you need to pass your authentication cookies.	null

Responses

Status	Meaning	Description	Schema
200	OK	successful operation	SuccessResponse
400	Bad Request	Invalid parameter value	APIResponseError

Schemas

Article Schema

{
  "url": "string",
  "canonical_url": "string",
  "title": "string",
  "text": "string",
  "html": "string",
  "summary": "string",
  "image": "string",
  "images": ["string"],
  "media": ["string"],
  "language": "string",
  "author": "string",
  "pub_date": "string",
  "modified_date": "string",
  "site_name": "string",
  "favicon": "string",
  "encoding": "string"
}

Properties

Name	Type	Description
url	string	the URL parameter.
canonical_url	string	the final (resolved) URL.
title	string	the title of the article.
text	string	the extracted text.
html	string	the extracted html.
summary	string	summary (if available) of the article text.
image	string	main image of the article.
images	[string]	all images present in article.
media	[string]	all media present in article.
language	string	language code of article text.
author	string	author of article.
pub_date	string	publication date of article.
modified_date	string	last modified date of article.
site_name	string	name of site hosting article.
favicon	string	favicon of site hosting article.
encoding	string	character encoding of article text.

Success Response example

{
    "article": {
        "text": "I began learning German at the age of 13, and I\u2019m still trying to explain to myself why it was love at first sound. The answer must surely be: the excellence of my teacher. At an English public school not famed for its cultural generosity, Mr King was that rare thing: a kindly and intelligent man who, in the thick of the second world war, determinedly loved the Germany that he knew was still there somewhere.\nRather than join the chorus of anti-German propaganda, he preferred, doggedly, to inspire his little class with the beauty of the language, and of its literature and culture. One day, he used to say, the real Germany will come back. And he was right. Because now it has.\nWhy was it love at first sound for me? Well...",
        "html": "<p><span>I<\/span> began learning German at the age of 13, and I’m still trying to explain to myself why it was love at first sound. The answer must surely be: the excellence of my teacher. At an English public school not famed for its cultural generosity, Mr King was that rare thing: a kindly and intelligent man who, in the thick of the second world war, determinedly loved the Germany that he knew was still there somewhere.<\/p><p>Rather than join the chorus of anti-German propaganda, he preferred, doggedly, to inspire his little class with the beauty of the language, and of its literature and culture. One day, he used to say, the real Germany will come back. And he was right. Because now it has....",
        "media": [],
        "images": [],
        "author": "John le Carr\u00e9",
        "pub_date": "2017-07-01 23:05:12",
        "is_article": 1,
        "url": "https:\/\/www.theguardian.com\/education\/2017\/jul\/02\/why-we-should-learn-german-john-le-carre",
        "canonical_url": "https:\/\/www.theguardian.com\/education\/2017\/jul\/02\/why-we-should-learn-german-john-le-carre",
        "title": "Why we should learn German | John le Carr\u00e9",
        "language": "en",
        "image": "https:\/\/i.guim.co.uk\/img\/media\/f19eff6f7e1751d88b38e725cfbe6687084d5f64\/0_235_9010_5405\/master\/9010.jpg?width=1200&height=630&quality=85&auto=format&fit=crop&overlay-align=bottom%2Cleft&overlay-width=100p&overlay-base64=L2ltZy9zdGF0aWMvb3ZlcmxheXMvdG8tb3BpbmlvbnMtYWdlLTIwMTcucG5n&enable=upscale&s=efeec857dffdb94cd84c4b652b4e287f",
        "summary": "To help make the European debate decent and civilised, it is now more important than ever to value the skills of the linguist",
        "modified_date": "2017-12-02 03:00:56",
        "site_name": "the Guardian",
        "favicon": "https:\/\/static.guim.co.uk\/images\/favicon-32x32.ico",
        "encoding": "utf-8",
        "time": 0.85
    }
}

Error Response Schema

{
  "url": "string",
  "message": "string",
  "error_code": "string",
  "errors": ["string"]
}

Properties

Name	Type	Description
url	string	Given URL
message	string	Error message
error_code	string	Error code
errors	[string]	List of all errors

Response Codes

Code	Billed	Meaning	Suggestion
200	Yes	Successful request	-
400	NO	Some required parameter is missing (URL)	Set
401	NO	Missing API-KEY	Provide API-KEY
404	YES	Provided URL not found	Provide a valid URL
408	YES	Request timeout	Increase timeout parameter, use premium proxy or force JS
429	NO	Too many requests	upgrade your plan
500	NO	Internal error	Try request or contact us

Examples

Stripping tags

If you want to delete some html element(s) before the extraction is carried out, use parameter strip_tags to pass a comma-separated list of css selectors of elements to delete. The example below will remove any meta, form and input tags as well as any element with class hidden

curl -i \
-H 'ApiKey: <API Key>' \
-X GET \
"https://api.ujeebu.com/extract?url=https://ujeebu.com/blog/how-to-extract-clean-text-from-html&strip_tags=meta,form,.hidden,input"


curl --location --request GET 'https://api.ujeebu.com/extract?url=https://ujeebu.com/blog/how-to-extract-clean-text-from-html&js=0&strip_tags=aside,form' \
--header 'ApiKey: <API Key>'


var request = require('request');
var options = {
  'method': 'GET',
  'url': 'https://api.ujeebu.com/extract?url=https://ujeebu.com/blog/how-to-extract-clean-text-from-html&js=0&strip_tags=aside,form',
  'headers': {
    'ApiKey': '<API Key>'
  }
};
request(options, function (error, response) {
  if (error) throw new Error(error);
  console.log(response.body);
});


import requests

url = "https://api.ujeebu.com/extract?url=https://ujeebu.com/blog/how-to-extract-clean-text-from-html&js=0&strip_tags=aside,form"

payload={}
headers = {
  'ApiKey': '<API Key>'
}

response = requests.request("GET", url, headers=headers, data=payload)

print(response.text)


OkHttpClient client = new OkHttpClient().newBuilder()
  .build();
Request request = new Request.Builder()
  .url("https://api.ujeebu.com/extract?url=https://ujeebu.com/blog/how-to-extract-clean-text-from-html&js=0&strip_tags=aside,form")
  .method("GET", null)
  .addHeader("ApiKey", "<API Key>")
  .build();
Response response = client.newCall(request).execute();


<?php

$curl = curl_init();

curl_setopt_array($curl, array(
  CURLOPT_URL => 'https://api.ujeebu.com/extract?url=https://ujeebu.com/blog/how-to-extract-clean-text-from-html&js=0&strip_tags=aside,form',
  CURLOPT_RETURNTRANSFER => true,
  CURLOPT_ENCODING => '',
  CURLOPT_MAXREDIRS => 10,
  CURLOPT_TIMEOUT => 0,
  CURLOPT_FOLLOWLOCATION => true,
  CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
  CURLOPT_CUSTOMREQUEST => 'GET',
  CURLOPT_HTTPHEADER => array(
    'ApiKey: <API Key>'
  ),
));

$response = curl_exec($curl);

curl_close($curl);
echo $response;


package main

import (
  "fmt"
  "net/http"
  "io/ioutil"
)

func main() {

  url := "https://api.ujeebu.com/extract?url=https://ujeebu.com/blog/how-to-extract-clean-text-from-html&js=0&strip_tags=aside,form"
  method := "GET"

  client := &http.Client {
  }
  req, err := http.NewRequest(method, url, nil)

  if err != nil {
    fmt.Println(err)
    return
  }
  req.Header.Add("ApiKey", "<API Key>")

  res, err := client.Do(req)
  if err != nil {
    fmt.Println(err)
    return
  }
  defer res.Body.Close()

  body, err := ioutil.ReadAll(res.Body)
  if err != nil {
    fmt.Println(err)
    return
  }
  fmt.Println(string(body))
}

Passing custom headers

The extract endpoint will forward any headers with the `UJB-` prefix to the target URL https://api.ujeebu.com/extract

curl -i \
-H 'UJB-Username: username' \
-H 'UJB-Authorisation: Basic dXNlcm5hbWU6cGFzc3dvcmQ=' \
-H 'ApiKey: <API Key>' \
-X GET \
https://api.ujeebu.com/extract?url=https://ujeebu.com/blog/how-to-extract-clean-text-from-html


curl --location --request GET 'https://api.ujeebu.com/extract?url=https://ujeebu.com/blog/how-to-extract-clean-text-from-html&js=0' \
--header 'UJB-User-Agent: Custom user agent' \
--header 'ApiKey: <API Key>'


var request = require('request');
var options = {
  'method': 'GET',
  'url': 'https://api.ujeebu.com/extract?url=https://ujeebu.com/blog/how-to-extract-clean-text-from-html&js=0',
  'headers': {
    'UJB-User-Agent': 'Custom user agent',
    'ApiKey': '<API Key>'
  }
};
request(options, function (error, response) {
  if (error) throw new Error(error);
  console.log(response.body);
});


import requests

url = "https://api.ujeebu.com/extract?url=https://ujeebu.com/blog/how-to-extract-clean-text-from-html&js=0"

payload={}
headers = {
  'UJB-User-Agent': 'Custom user agent',
  'ApiKey': '<API Key>'
}

response = requests.request("GET", url, headers=headers, data=payload)

print(response.text)


OkHttpClient client = new OkHttpClient().newBuilder()
  .build();
Request request = new Request.Builder()
  .url("https://api.ujeebu.com/extract?url=https://ujeebu.com/blog/how-to-extract-clean-text-from-html&js=0")
  .method("GET", null)
  .addHeader("UJB-User-Agent", "Custom user agent")
  .addHeader("ApiKey", "<API Key>")
  .build();
Response response = client.newCall(request).execute();


<?php

$curl = curl_init();

curl_setopt_array($curl, array(
  CURLOPT_URL => 'https://api.ujeebu.com/extract?url=https://ujeebu.com/blog/how-to-extract-clean-text-from-html&js=0',
  CURLOPT_RETURNTRANSFER => true,
  CURLOPT_ENCODING => '',
  CURLOPT_MAXREDIRS => 10,
  CURLOPT_TIMEOUT => 0,
  CURLOPT_FOLLOWLOCATION => true,
  CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
  CURLOPT_CUSTOMREQUEST => 'GET',
  CURLOPT_HTTPHEADER => array(
    'UJB-User-Agent: Custom user agent',
    'ApiKey: <API Key>'
  ),
));

$response = curl_exec($curl);

curl_close($curl);
echo $response;


package main

import (
  "fmt"
  "net/http"
  "io/ioutil"
)

func main() {

  url := "https://api.ujeebu.com/extract?url=https://ujeebu.com/blog/how-to-extract-clean-text-from-html&js=0"
  method := "GET"

  client := &http.Client {
  }
  req, err := http.NewRequest(method, url, nil)

  if err != nil {
    fmt.Println(err)
    return
  }
  req.Header.Add("UJB-User-Agent", "Custom user agent")
  req.Header.Add("ApiKey", "<API Key>")

  res, err := client.Do(req)
  if err != nil {
    fmt.Println(err)
    return
  }
  defer res.Body.Close()

  body, err := ioutil.ReadAll(res.Body)
  if err != nil {
    fmt.Println(err)
    return
  }
  fmt.Println(string(body))
}

The code above will return the following response:

{
  "article": {

    "author": "Sam",
    "pub_date": "2019-08-09 12:42:25",
    "is_article": 1,
    "url": "https://ujeebu.com/blog/how-to-extract-clean-text-from-html",
    "canonical_url": "https://ujeebu.com/blog/how-to-extract-clean-text-from-html/",
    "title": "Extracting clean data from blog and news articles",
    "site_name": "Ujeebu blog",
    "favicon": "https://ujeebu.com/blog/favicon.png",
    "encoding": "utf-8",
    "pages": ["https://ujeebu.com/blog/how-to-extract-clean-text-from-html/"]
  },
  "time": 6.366053104400635,
  "js": false,
  "pagination": false
}

Using Proxies

We realize your scraping activities might be blocked once in a while. In order to help you achieve the most success we developed a multi-tiered proxy offering which lets you select the proxy type that best fits your needs

Your API calls go through our rotating proxy by default. The default proxy uses top IPs that will get the job done most the time. If the default rotating proxy is working great for your needs, no need to do anything. For tougher URLs, you need to set proxy_type to one of the following options:

rotating: Default.
advanced: US IPs only.
premium: US IPs only. Premium proxies which work well with social media and shopping sites.
residential: Geo-targeted residential IPs which work on "tough" sites that aren't accessible with the other options. Please note that data scraped via non-US residential IPs is currently metered once requests exceeding 1MB. Keep in mind that all assets associated with an HTML page count toward this total, not just the HTML itself. Meanwhile, US Residential IPs are not metered. Please refer to the Credits section for more details.
custom: Set your own proxy. See custom proxy section

tip

We won't bill for failing requests that aren't 404s.

info

A request length also includes assets downloaded with the page when JS rendering is on.

info

To use premium proxy from a specific country, set the parameter proxy_country to the ISO 3166-1 alpha-2 country code of one of the following:

Supported countries

Algeria: DZ
Angola: AO
Benin: BJ
Botswana: BW
Burkina Faso: BF
Burundi: BI
Cameroon: CM
Central African Republic: CF
Chad: TD
Democratic Republic of the Congo: CD
Djibouti: DJ
Egypt: EG
Equatorial Guinea: GN
Eritrea: ER
Ethiopia: ET
Gabon: GA
Gambia: GM
Ghana: GH
Guinea: PG
Guinea Bissau: GN
Ivory Coast: CI
Kenya: KE
Lesotho: LS
Liberia: LR
Libya: LY
Madagascar: MG
Malawi: MW
Mali: SO
Mauritania: MR
Morocco: MA
Mozambique: MZ
Namibia: NA
Niger: NE
Nigeria: NE
Republic of the Congo: CG
Rwanda: RW
Senegal: SN
Sierra Leone: SL
Somalia: SO
Somaliland: ML
South Africa: ZA
South Sudan: SS
Sudan: SD
Swaziland: SZ
Tanzania: TZ
Togo: TG
Tunisia: TN
Uganda: UG
Western Sahara: EH
Zambia: ZM
Zimbabwe: ZW
Afghanistan: AF
Armenia: AM
Azerbaijan: AZ
Bangladesh: BD
Bhutan: BT
Brunei: BN
Cambodia: KH
China: CN
East Timor: TL
Hong Kong: HK
India: IN
Indonesia: ID
Iran: IR
Iraq: IQ
Israel: IL
Japan: JP
Jordan: JO
Kazakhstan: KZ
Kuwait: KW
Kyrgyzstan: KG
Laos: LA
Lebanon: LB
Malaysia: MY
Maldives: MV
Mongolia: MN
Myanmar: MM
Nepal: NP
North Korea: KP
Oman: OM
Pakistan: PK
Palestine: PS
Philippines: PH
Qatar: QA
Saudi Arabia: SA
Singapore: SG
South Korea: KR
Sri Lanka: LK
Syria: SY
Taiwan: TW
Tajikistan: TJ
Thailand: TH
Turkey: TR
Turkmenistan: TM
United Arab Emirates: AE
Uzbekistan: UZ
Vietnam: VN
Yemen: YE
Albania: AL
Andorra: AD
Austria: AT
Belarus: BY
Belgium: BE
Bosnia and Herzegovina: BA
Bulgaria: BG
Croatia: HR
Cyprus: CY
Czech Republic: CZ
Denmark: DK
Estonia: EE
Finland: FI
France: FR
Germany: DE
Gibraltar: GI
Greece: GR
Hungary: HU
Iceland: IS
Ireland: IE
Italy: IT
Kosovo: XK
Latvia: LV
Liechtenstein: LI
Lithuania: LT
Luxembourg: LU
Macedonia: MK
Malta: MT
Moldova: MD
Monaco: MC
Montenegro: ME
Netherlands: NL
Northern Cyprus: CY
Norway: NO
Poland: PL
Portugal: PT
Romania: OM
Russia: RU
San Marino: SM
Serbia: RS
Slovakia: SK
Slovenia: SI
Spain: ES
Sweden: SE
Switzerland: CH
Ukraine: UA
United Kingdom: GB
Bahamas: BS
Belize: BZ
Bermuda: BM
Canada: CA
Costa Rica: CR
Cuba: CU
Dominican Republic: DM
El Salvador: SV
Greenland: GL
Guatemala: GT
Haiti: HT
Honduras: HN
Jamaica: JM
Nicaragua: NI
Panama: PA
Puerto Rico: PR
Trinidad And Tobago: TT
United States: US
Australia: AU
Fiji: FJ
New Caledonia: NC
New Zealand: NZ
Papua New Guinea: PG
Solomon Islands: SB
Vanuatu: VU
Argentina: AR
Bolivia: BO
Brazil: BR
Chile: CL
Colombia: CO
Ecuador: EC
Falkland Islands: FK
French Guiana: GF
Guyana: GY
Mexico: MX
Paraguay: PY
Peru: PE
Suriname: SR
Uruguay: UY
Venezuela: VE

Using Ujeebu Extract with your own proxy

To use your custom proxy set the proxy_type parameter to custom then set custom_proxy parameter to your own proxy in the following format: scheme://host:port and set proxy credentials using custom_proxy_username and custom_proxy_password parameters

info

If you're using GET http verb and custom_proxy_password contains special chars it's better to url encode it before passing it.

curl -i \
-H 'ApiKey: <API Key>' \
-X GET \
https://api.ujeebu.com/scrape?url=url=https://ipinfo.io&response_type=raw&proxy_type=custom&custom_proxy=http://proxyhost:8889&custom_proxy_username=user&custom_proxy_password=pass

Credits

Credit cost per request:

Proxy Type	No JS	w/ JS	Geo Targeting	Metered
rotating	5	10	US	No
advanced	10	15	US	No
premium	12	17	US	No
residential(us)	30	35	US	No
residential	10	20	Multiple countries	+10 credits per MB after 1MB
custom	5	10	Custom	No

info

Consumed credits are returned in the Ujb-credits header

Article Preview API

Introduction

Extracts a preview of an article (article card). This is faster than the extract endpoint as it doesn't do any in-depth analysis of the content of the article, and instead mostly relies on its meta tags.

To use API, subscribe to a plan here and connect to :

GET https://api.ujeebu.com/card

Parameters

Name	Type	Description	Default Value
url REQUIRED	`string`	URL of article to be extracted.	-
js	`boolean`	indicates whether to execute JavaScript or not, set to 'auto' to let the extractor decide.	false
timeout	`number`	maximum number of seconds before request timeout.	60
js	`boolean`	indicates whether to execute JavaScript or not.	true
js_timeout	`number`	when `js` is enabled, indicates how many seconds the API should wait for the JS engine to render the supplied URL.	60
proxy_type	`string`	indicates type of proxy to use. Possible values: 'rotating', 'advanced', 'premium', 'residential', 'custom'.	rotating
proxy_country	`string`	country `ISO 3166-1 alpha-2` code to proxy from. Valid only when `premium` proxy type is chosen.	US
custom_proxy	`string`	URI for your custom proxy in the following format: `scheme://user:pass@host:port`. applicable and required only if `proxy_type=custom`	null
auto_proxy	`string`	enable a more advanced proxy by default when rotating proxy is not working. It will move to the next proxy option until it gets the content and will only stop when content is available or none of the options worked. Please note that you are billed only on the top option attempted.	false
session_id	`alphanumeric`	alphanumeric identifier with a length between 1 and 16 characters, used to route multiple requests from the same proxy instance. Sessions remain active for 30 minutes	null
UJB-headerName	`string`	indicates which headers to send to target URL. This can be useful when article is behind a paywall for example, and that you need to pass your authentication cookies.	null

Responses

Status	Meaning	Description	Schema
200	OK	successful operation	SuccessResponse
400	Bad Request	Invalid parameter value	APIResponseError

Schemas

Article Card Schema

{
  "url": "string",
  "lang": "string",
  "favicon": "string",
  "title": "string",
  "summary": "string",
  "author": "string",
  "date_published": "string",
  "date_modified": "string",
  "image": "string",
  "site_name": "string"
}

Properties

Name	Type	Description
url	string	the URL parameter.
lang	string	the language of the article.
favicon	string	Domain favicon.
title	string	the title of the article.
summary	string	the description of the article.
author	string	the author of the article.
date_published	string	the publish date of the article.
date_modified	string	the modified date of the article.
image	string	the main image of the article.
site_name	string	the site name of the article.

Error Response Schema

{
  "url": "string",
  "message": "string",
  "errors": ["string"]
}

Properties

Name	Type	Description
url	string	Given URL
message	string	Error message
errors	[string]	List of all errors

Code Example

This will get the meta info for article https://ujeebu.com/blog/how-to-extract-clean-text-from-html/


curl --location --request GET 'https://api.ujeebu.com/card?url=https://ujeebu.com/blog/how-to-extract-clean-text-from-html/&js=0' \
--header 'ApiKey: <API Key>'


var request = require('request');
var options = {
  'method': 'GET',
  'url': 'https://api.ujeebu.com/card?url=https://ujeebu.com/blog/how-to-extract-clean-text-from-html/&js=0',
  'headers': {
    'ApiKey': '<API Key>'
  }
};
request(options, function (error, response) {
  if (error) throw new Error(error);
  console.log(response.body);
});


import requests

url = "https://api.ujeebu.com/card?url=https://ujeebu.com/blog/how-to-extract-clean-text-from-html/&js=0"

payload={}
headers = {
  'ApiKey': '<API Key>'
}

response = requests.request("GET", url, headers=headers, data=payload)

print(response.text)


OkHttpClient client = new OkHttpClient().newBuilder()
  .build();
Request request = new Request.Builder()
  .url("https://api.ujeebu.com/card?url=https://ujeebu.com/blog/how-to-extract-clean-text-from-html/&js=0")
  .method("GET", null)
  .addHeader("ApiKey", "<API Key>")
  .build();
Response response = client.newCall(request).execute();


<?php

$curl = curl_init();

curl_setopt_array($curl, array(
  CURLOPT_URL => 'https://api.ujeebu.com/card?url=https://ujeebu.com/blog/how-to-extract-clean-text-from-html/&js=0',
  CURLOPT_RETURNTRANSFER => true,
  CURLOPT_ENCODING => '',
  CURLOPT_MAXREDIRS => 10,
  CURLOPT_TIMEOUT => 0,
  CURLOPT_FOLLOWLOCATION => true,
  CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
  CURLOPT_CUSTOMREQUEST => 'GET',
  CURLOPT_HTTPHEADER => array(
    'ApiKey: <API Key>'
  ),
));

$response = curl_exec($curl);

curl_close($curl);
echo $response;


package main

import (
  "fmt"
  "net/http"
  "io/ioutil"
)

func main() {

  url := "https://api.ujeebu.com/card?url=https://ujeebu.com/blog/how-to-extract-clean-text-from-html/&js=0"
  method := "GET"

  client := &http.Client {
  }
  req, err := http.NewRequest(method, url, nil)

  if err != nil {
    fmt.Println(err)
    return
  }
  req.Header.Add("ApiKey", "<API Key>")

  res, err := client.Do(req)
  if err != nil {
    fmt.Println(err)
    return
  }
  defer res.Body.Close()

  body, err := ioutil.ReadAll(res.Body)
  if err != nil {
    fmt.Println(err)
    return
  }
  fmt.Println(string(body))
}

The code above will generate the following response:

{
  "author": "Sam",
  "title": "Extracting clean data from blog and news articles",
  "summary": "Several open source tools allow the extraction of clean text from article HTML. We list the most popular ones below, and run a benchmark to see how they stack up against the Ujeebu API",
  "date_published": "2019-08-09 12:42:25",
  "date_modified": "2021-05-02 20:22:34",
  "favicon": ":///blog/favicon.png",
  "charset": "utf-8",
  "image": "https://ujeebu.com/blog/content/images/2021/05/ujb-blog-benchmark.png",
  "lang": "en",
  "keywords": [],
  "site_name": "Ujeebu blog",
  "time": 1.501387119293213
}

Usage endpoint

Introduction

To keep track of how much credit you're using programmatically, you can use the /account endpoint in your program.

Calls to this endpoint will not affect your calls per second, but you can only make 10 /account calls per minute.

To use the API:

GET https://api.ujeebu.com/account

Usage Endpoint Code Example

This will get the current usage of the account with the given ApiKey


curl --location --request GET 'https://api.ujeebu.com/account' \
--header 'ApiKey: <API Key>'


var request = require('request');
var options = {
  'method': 'GET',
  'url': 'https://api.ujeebu.com/account',
  'headers': {
    'ApiKey': '<API Key>'
  }
};
request(options, function (error, response) {
  if (error) throw new Error(error);
  console.log(response.body);
});


import requests

url = "https://api.ujeebu.com/account"

payload={}
headers = {
  'ApiKey': '<API Key>'
}

response = requests.request("GET", url, headers=headers, data=payload)

print(response.text)


OkHttpClient client = new OkHttpClient().newBuilder()
  .build();
Request request = new Request.Builder()
  .url("https://api.ujeebu.com/account")
  .method("GET", null)
  .addHeader("ApiKey", "<API Key>")
  .build();
Response response = client.newCall(request).execute();


<?php

$curl = curl_init();

curl_setopt_array($curl, array(
  CURLOPT_URL => 'https://api.ujeebu.com/account',
  CURLOPT_RETURNTRANSFER => true,
  CURLOPT_ENCODING => '',
  CURLOPT_MAXREDIRS => 10,
  CURLOPT_TIMEOUT => 0,
  CURLOPT_FOLLOWLOCATION => true,
  CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
  CURLOPT_CUSTOMREQUEST => 'GET',
  CURLOPT_HTTPHEADER => array(
    'ApiKey: <API Key>'
  ),
));

$response = curl_exec($curl);

curl_close($curl);
echo $response;


package main

import (
  "fmt"
  "net/http"
  "io/ioutil"
)

func main() {

  url := "https://api.ujeebu.com/account"
  method := "GET"

  client := &http.Client {
  }
  req, err := http.NewRequest(method, url, nil)

  if err != nil {
    fmt.Println(err)
    return
  }
  req.Header.Add("ApiKey", "<API Key>")

  res, err := client.Do(req)
  if err != nil {
    fmt.Println(err)
    return
  }
  defer res.Body.Close()

  body, err := ioutil.ReadAll(res.Body)
  if err != nil {
    fmt.Println(err)
    return
  }
  fmt.Println(string(body))
}

The code above will generate the following response:

{
    "balance": 0,
    "days_till_next_billing": 0,
    "next_billing_date": null,
    "plan": "TRIAL",
    "quota": "5000",
    "concurrent_requests": 1,
    "total_requests": 14,
    "used": 95,
    "used_percent": 1.9,
    "userid": "8155"
}

Introduction​

Parameters​

Responses​

Schemas​

Article Schema​

Properties​

Success Response example​

Error Response Schema​

Properties​

Response Codes​

Examples​

Stripping tags​

Passing custom headers​

Using Proxies​

tip

info

info

Using Ujeebu Extract with your own proxy​

info

Credits​

info

Article Preview API​

Introduction​

Parameters​

Responses​

Schemas​

Article Card Schema​

Properties​

Error Response Schema​

Properties​

Code Example​

Usage endpoint​

Introduction​

Usage Endpoint Code Example​

Introduction

Parameters

Responses

Schemas

Article Schema

Properties

Success Response example

Error Response Schema

Properties

Response Codes

Examples

Stripping tags

Passing custom headers

Using Proxies

Using Ujeebu Extract with your own proxy

Credits

Article Preview API

Introduction

Parameters

Responses

Schemas

Article Card Schema

Properties

Error Response Schema

Properties

Code Example

Usage endpoint

Introduction

Usage Endpoint Code Example