System for managing access to web application resources based on user behavior analysis
Abstract
Web scraping is the process of extracting data from web pages by automating requests to websites. Its importance has grown as the Internet has developed. Evidence of this includes job vacancies at companies that need to develop protection tools against web scrapers, as well as articles about the malicious activity of web scrapers.
This article studies the behavior of web scrapers and highlights the characteristic features of these programs. A method for collecting and analyzing user behavior data to identify web scrapers is proposed.
A module for the Django web framework has been developed to detect web scrapers. The module collects and analyzes data about user behavior on a website. Web scrapers were also created to collect data and to test the module's operation.
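To make the idea of collecting and analyzing behavior data concrete, the following is a minimal, framework-independent sketch and not the module described in the article. The abstract does not list the features or thresholds used, so everything here is an assumption made for illustration: the names `BehaviorLog`, `record`, `features`, and `looks_automated`, the choice of features (request count, mean and spread of the gaps between requests, share of distinct pages), and the threshold values.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from statistics import mean, pstdev


@dataclass
class Session:
    """Request timestamps (in seconds) and paths seen for one client."""
    timestamps: list = field(default_factory=list)
    paths: list = field(default_factory=list)


class BehaviorLog:
    """Hypothetical collector: stores requests per client and derives features."""

    def __init__(self):
        self.sessions = defaultdict(Session)

    def record(self, client_id, path, timestamp):
        # In a Django project this could be called once per request,
        # for example from a middleware (an assumption, not the article's design).
        session = self.sessions[client_id]
        session.timestamps.append(timestamp)
        session.paths.append(path)

    def features(self, client_id):
        session = self.sessions[client_id]
        times = session.timestamps
        gaps = [later - earlier for earlier, later in zip(times, times[1:])]
        return {
            "requests": len(times),
            "mean_gap": mean(gaps) if gaps else None,
            "gap_stdev": pstdev(gaps) if gaps else None,
            "unique_path_ratio": (
                len(set(session.paths)) / len(session.paths)
                if session.paths else 0.0
            ),
        }

    def looks_automated(self, client_id, min_requests=10,
                        max_mean_gap=1.0, max_gap_stdev=0.2):
        # Illustrative heuristic: many requests arriving quickly and at
        # almost constant intervals. The thresholds are assumed values.
        f = self.features(client_id)
        if f["requests"] < max(min_requests, 2):
            return False  # too little data to judge
        return f["mean_gap"] <= max_mean_gap and f["gap_stdev"] <= max_gap_stdev


log = BehaviorLog()

# Simulated scraper: 20 requests, exactly 0.5 s apart, each to a new page.
for i in range(20):
    log.record("bot", f"/item/{i}", i * 0.5)

# Simulated human visitor: irregular pauses between page views.
for path, t in zip(["/", "/catalog", "/item/3", "/", "/item/7", "/cart",
                    "/item/7", "/catalog", "/item/1", "/cart", "/checkout"],
                   [0, 4, 9, 30, 31, 60, 62, 90, 130, 131, 200]):
    log.record("human", path, t)

print(log.looks_automated("bot"))
print(log.looks_automated("human"))
```

A fixed-threshold rule like this is only a starting point; the same feature dictionary could instead be passed to a trained classifier, which is closer in spirit to behavior analysis than hand-picked limits.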
ISSN: 2307-8162