Contributors: Adrien Barbaresi
Name: Adrien Barbaresi
Login: adbar
Description: Research scientist – natural language processing, web scraping and text analytics. Mostly with Python.
Website: adrien.barbaresi.eu
Location: Berlin
Twitter: adbarbaresi
Company: Berlin-Brandenburg Academy of Sciences (BBAW)
Repositories count: 37
Last synced at: 2024-06-11T15:59:25.206Z
Total stars: 3746
Followers: 360
Following: 281
OST Projects: 1
Languages: Python
Categories: Consumption
Sub Categories: Computation and Communication
Topics: cleaner, crawler, domain, preprocessing, rate-limiting, tld, uri, url-parsing, url-validation, webcrawling, article-extractor, corpus, corpus-builder, corpus-tools, html-to-markdown, html2text, news-aggregator, news-crawler, readability, rss-feed, scraping, tei, text-cleaning, text-extraction, text-preprocessing, web-scraping, detect-language, langid, language-detection, language-identification, language-recognition, whatlang, date-parser, datetime, entity-extraction, html-parsing, information-extraction, lxml, metadata-extraction, parsing, webscraping, lemmatiser, lemmatization, lemmatizer, low-resource-nlp, morphological-analysis, tokenization, tokenizer, wordlist
Contributed Projects
energyusage
A Python package that measures the environmental impact of computation.
Consumption - Computation and Communication - Last synced: 28 Apr 2025 - Ranking: 14.2
