Welcome to Python-Ciur
Ciur is a scraper layer, currently in development.
Ciur is a library rather than a framework, because a library involves less black magic.
It moves all scraping-related code into a separate layer.
If you are annoyed by spaghetti code, SQL inside PHP, or inline CSS inside HTML, then XPath/CSS selector code inside a crawler will annoy you too.
Ciur gives code the taste of lasagna, mainly by enforcing encapsulation of the scraping layer.
It tries not to repeat that bad code.
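The separation described above can be illustrated without Ciur itself. The following is only a minimal sketch using the standard library's `xml.etree.ElementTree`, not Ciur's API or rule syntax: the names `RULES`, `extract` and `PAGE` are hypothetical, and the page is a hard-coded, well-formed stand-in for a downloaded document.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule table: field name -> XPath expression.
# It lives apart from the crawling code, so a markup change
# only touches this mapping.
RULES = {
    "name": ".//h1",
    "paragraph": ".//p",
}


def extract(document, rules):
    """Apply every rule to the parsed document and collect the matched text."""
    root = ET.fromstring(document)
    result = {}
    for field, xpath in rules.items():
        node = root.find(xpath)
        # A rule that matches nothing yields None instead of raising.
        result[field] = None if node is None else (node.text or "").strip()
    return result


# Stand-in for a fetched page (well-formed, so ElementTree can parse it).
PAGE = """<html><body><div>
<h1>Example Domain</h1>
<p>This domain is established to be used for illustrative examples.</p>
</div></body></html>"""

record = extract(PAGE, RULES)
print(record)
```

The crawler only ever calls `extract`; the selectors stay in one place and can be reviewed or replaced without reading the crawling logic.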
What does CIUR mean?
Ciur is Romanian for Sieve.
It fulfils the same purpose, in the sense of being a "device for separating wanted elements from unwanted material".
Python ciur API
>>> import ciur
>>> from ciur.shortcuts import pretty_parse_from_resources
>>> with ciur.open_file("example.org.ciur", __file__) as f:
...     print(pretty_parse_from_resources(
...         f,
...         "http://example.org"
...     ))
{
    "root": {
        "name": "Example Domain",
        "paragraph": "This domain is established to be used for illustrative examples in documents. You may use this\n domain in examples without prior coordination or asking for permission."
    }
}
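
The printed result above can be consumed like any other JSON text. The sketch below assumes that the value is a JSON string, as the output suggests, and works on a verbatim copy of that output instead of calling ciur; `OUTPUT`, `data` and `paragraph` are hypothetical names used only for illustration.

```python
import json

# Verbatim copy of the output shown above. The raw string keeps the
# backslash-n sequence as a JSON escape rather than a Python newline.
OUTPUT = r'''{
    "root": {
        "name": "Example Domain",
        "paragraph": "This domain is established to be used for illustrative examples in documents. You may use this\n domain in examples without prior coordination or asking for permission."
    }
}'''

data = json.loads(OUTPUT)

# Collapse the embedded newline and surrounding spaces into single spaces.
paragraph = " ".join(data["root"]["paragraph"].split())

print(data["root"]["name"])
print(paragraph)
```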
Samples of usage:
- Say Hello World in the ciur language with http://www.example.org
- Docker container + AWS Lambda + Ciur combination for ciur
- Worldwide exchange rate parsers based on Ciur -> parsing currency exchange rates worldwide (40 sources, 4 countries).
- https://bitbucket.org/ada/ciur.example.social -> parsing social networking sites (such as Facebook, LinkedIn, Xing ...) (not yet ready for open release)
For Developers:
- Local Python virtual environment for ciur
TODO:
demo on cloud9
build documentation on readthedocs
http://lxml.de/lxmlhtml.html#parsing-html
.cssselect(expr):
.base_url:
Ciur Documentation
If you can’t find the information you’re looking for, have a look at the index or try to find it using the search function.