Definition of Spidering and Web Crawlers

Spiders and Web Crawlers: What You Need to Know to Protect Website Data

Spiders are programs (or automated scripts) that "crawl" the Web looking for data. They move from URL to URL and can pull data, such as email addresses, out of web pages. Spiders are also used to feed the information found on websites to search engines.

Spiders, also called "web crawlers," search the Web, and not all of them are friendly in their intent.

Spammers Spider Websites to Collect Information

Google, Yahoo!, and other search engines are not the only ones interested in crawling websites; scammers and spammers are too.

Spammers use spiders and other automated tools to find email addresses on websites (a practice often called "harvesting") and then use those addresses to build spam lists.
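To see what a harvester would find, a site owner can scan their own pages for plain-text addresses. The following is a minimal sketch of such a self-audit; the function name, the sample page, and the deliberately simple pattern are all assumptions made for illustration, and the pattern is not a complete email grammar.

```python
import re

# Deliberately simple pattern: adequate for an audit, not a full email grammar.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_exposed_emails(html: str) -> list[str]:
    """Return the unique plain-text email addresses in a page, in first-seen order."""
    seen = []
    for address in EMAIL_PATTERN.findall(html):
        if address not in seen:
            seen.append(address)
    return seen


# Hypothetical page content, used only for illustration.
sample_page = """
<p>Contact sales@example.com or
<a href="mailto:info@example.com">info@example.com</a>.</p>
"""

print(find_exposed_emails(sample_page))
# ['sales@example.com', 'info@example.com']
```

Any address this prints is just as easy for a harvesting spider to collect.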

Search engines also use spiders to learn more about your website. Left unchecked, however, a website with no instructions (or "permissions") on how it may be crawled can present major information security risks. Spiders travel by following links, and they are very good at finding links to databases, program files, and other information you may not want them to reach.
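The link-following step can be sketched with Python's standard library. This is an illustration under stated assumptions, not how any particular crawler works: the `LinkCollector` class, the sample HTML, and the URLs are hypothetical. It only shows that every `href` on a page, including a forgotten link to a backup file, is visible to a spider.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the absolute URL of every <a href="..."> found on a page."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


# Hypothetical page content, used only for illustration.
sample_html = """
<a href="/about.html">About</a>
<a href="backup/db-dump.sql">old backup</a>
<a href="https://example.org/">External</a>
"""

collector = LinkCollector("https://example.com/index.html")
collector.feed(sample_html)
for link in collector.links:
    print(link)
```

A real spider would add each collected URL to a queue and repeat the process, which is how a single stray link can expose a file the owner never meant to publish.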

Webmasters can review their server logs to see which spiders and other robots have visited their sites. This tells them who is indexing their site, and how often.
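As a rough illustration of that kind of log review, the sketch below counts requests per crawler-like user agent. It assumes the server writes the common "combined" access log format, where the user agent is the last quoted field; the marker substrings, the sample lines, and the `ExampleHarvestBot` name are hypothetical. User agents can be spoofed, so the result is a hint rather than proof.

```python
import re
from collections import Counter

# In the combined log format, the user agent is the last double-quoted field.
USER_AGENT_AT_END = re.compile(r'"([^"]*)"\s*$')

# Illustrative substrings only; real crawlers use many other names.
BOT_MARKERS = ("bot", "crawler", "spider", "slurp")


def count_crawler_visits(log_lines):
    """Count requests per user agent for agents that look like crawlers."""
    visits = Counter()
    for line in log_lines:
        match = USER_AGENT_AT_END.search(line)
        if not match:
            continue  # Line is not in the expected format.
        agent = match.group(1)
        if any(marker in agent.lower() for marker in BOT_MARKERS):
            visits[agent] += 1
    return visits


# Hypothetical log lines, used only for illustration.
sample_log = [
    '203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.5 - - [10/Oct/2023:13:56:01 +0000] "GET /about.html HTTP/1.1" 200 734 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '198.51.100.7 - - [10/Oct/2023:14:02:11 +0000] "GET / HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
    '192.0.2.9 - - [10/Oct/2023:14:05:40 +0000] "GET /private/ HTTP/1.1" 403 199 "-" '
    '"ExampleHarvestBot/0.1"',
]

visits = count_crawler_visits(sample_log)
for agent, count in visits.most_common():
    print(count, agent)
```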

This information is useful because it lets webmasters fine-tune their SEO and update their robots.txt files to prohibit certain robots from crawling the site in the future.

Tips for Protecting Your Website From Unwanted Robot Crawlers

There is a fairly simple way to keep unwanted crawlers off your website. Even if you are not concerned about malicious spiders crawling your site (obfuscating email addresses will not protect you from most crawlers), you should still give search engines important instructions.

Every website should have a file named robots.txt in its root directory. This file lets you tell web crawlers, when they belong to a search engine, where you want them to look for pages to index (unless a specific page's meta data states that the page should not be indexed).

Just as you can tell welcome crawlers where you want them to browse, you can also tell them where they may not go, and you can even block specific crawlers from your entire website.
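Rules of this kind can be checked locally with Python's standard `urllib.robotparser` module before you publish them. The robots.txt content below is a hypothetical example: the `/private/` and `/backup/` paths and the `BadBot` name are made up for illustration and are not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; paths and the bot name are made up.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /private/
Disallow: /backup/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(agent: str, url: str) -> bool:
    """Report whether the parsed rules permit this agent to fetch the URL."""
    return parser.can_fetch(agent, url)


print(is_allowed("Googlebot", "https://example.com/index.html"))  # True
print(is_allowed("Googlebot", "https://example.com/private/x"))   # False
print(is_allowed("BadBot", "https://example.com/index.html"))     # False
```

The first group shuts one named crawler out of the whole site, while the second keeps every other crawler away from two directories. These rules are only requests: a crawler has to choose to honor them.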

Keep in mind that a well-constructed robots.txt file is of great value to search engines and can even be a key element in improving your website's performance, but some robot crawlers will still ignore your instructions. For this reason, it is important to keep all of your software, plugins, and apps up to date at all times.

Related Articles and Information

Because information harvesting had become so widely used for nefarious purposes (spam), legislation was passed in 2003 to make certain practices illegal. These consumer protection laws fall under the CAN-SPAM Act of 2003.

It is important to take the time to read up on the CAN-SPAM Act if your business engages in any mass mailing or information harvesting.

You can find out more about anti-spam laws, how to deal with spammers, and what you as a business owner may and may not do, by reading the following articles: