Lingvopoint - reduce profile info? | Thread poster: Deirdre Brophy (X)
| Dan Lucas United Kingdom Local time: 20:25 Member (2014) Japanese to English Automating OCR is not difficult | Nov 19, 2014 |
Triston Goodwin wrote:
They would have to come to each of these adapted profiles individually, take a screenshot, run it through a OCR and then upload the information to their site. And that's assuming that they are able to identify which profiles weren't scanned during the crawl.

It's probably easier than you think. For example, the PyTesser Python module uses the Tesseract OCR engine, so you could scrape a profile using something like Beautiful Soup and check to see if the profile contains any large images. If it does, pass the image to the OCR engine and parse the (text) result. Not perfect, but likely good enough. Otherwise parse the profile text as usual.

In theory - I say that because it is ProZ, not ourselves, who controls the site - we have two obvious choices. First, try to close the site completely, which would, whatever some members may think, have a chilling effect on site use. Second, just accept that scammers are a fact of life in any profession that delivers intangible services over the internet, and work round them.

Dan

| Automated OCR | Nov 19, 2014 |
Dan Lucas wrote:
Triston Goodwin wrote:
They would have to come to each of these adapted profiles individually, take a screenshot, run it through a OCR and then upload the information to their site. And that's assuming that they are able to identify which profiles weren't scanned during the crawl.
It's probably easier than you think. For example, the PyTesser Python module uses the Tesseract OCR engine, so you could scrape a profile using something like Beautiful Soup and check to see if the profile contains any large images. If it does, pass the image to the OCR engine and parse the (text) result. Not perfect, but likely good enough. Otherwise parse the profile text as usual.
In theory - I say that because it is ProZ, not ourselves, who controls the site - we have two obvious choices. First, try to close the site completely, which would, whatever some members may think, have a chilling effect on site use. Second, just accept that scammers are a fact of life in any profession that delivers intangible services over the internet, and work round them.
Dan

I think you're right. I personally lean more towards the second option.

I haven't seen this kind of automated OCR tool before. Using an image might still be effective at first, since it's not something we really see here on Proz. I know Google sure had a hard time with my profile when I used an image instead of text for my About Me a few months ago.

| Thayenga Germany Local time: 21:25 Member (2009) English to German + ... Additionally | Nov 20, 2014 |
Maija Cirule wrote:
As a preventive action, I have included in my "About me" text the following sentence: For business correspondence, I use ONLY the EMAIL address WITH THE DOMAIN NAME specified in my profile, no gmail, yahoo, hotmail, etc., therefore, any my business-related e-mails from free email addresses are INVALID. Besides, I have encrypted my CV (of course, it can be typed but cannot be copied or edited). And last but not the least: never ever include your e-mail address in your CV or elsewhere

My CVs are not publicly available, only upon request, and then they include no sensitive information. Address, Skype, location, email address, etc. will be provided upon first job assignment on my invoice. This might "scare off" a few possible customers, but if an agency or an end-client is serious and legitimate, they understand these precautions that protect both parties.

Additionally I have password-protected my PDF business brochures so they cannot be copied or printed - only typed if someone has the time. They also have my name in text fields/watermarks across the pages so that screenshots cannot be "marketed".

| DLyons Ireland Local time: 20:25 Spanish to English + ... Can be bypassed | Nov 20, 2014 |
Thayenga wrote:
Additionally I have password-protected my PDF business brochures so they cannot be copied or printed - only typed if someone has the time. They also have my name in text fields/watermarks across the pages so that screenshots cannot be "marketed".

It's not hard to get around password-protection on PDFs. But time is money to scammers, so usually they just ignore anything that takes extra effort and move on to someone else.
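For anyone curious what Dan Lucas's scrape-then-OCR idea earlier in the thread might look like, here is a minimal sketch, not a working scraper. It swaps Beautiful Soup for Python's built-in html.parser so it runs without extra packages, and it takes the OCR step as a plain function argument instead of calling PyTesser directly. The names `ProfileParser`, `extract_profile_text` and `ocr`, and the size thresholds, are hypothetical and chosen only for illustration. It also assumes that images declare their width and height in the HTML, which real pages may not do.

```python
from html.parser import HTMLParser
from typing import Callable, List

# Assumed thresholds (pixels) for what counts as a "large" image.
MIN_WIDTH = 300
MIN_HEIGHT = 100


class ProfileParser(HTMLParser):
    """Collect text fragments and the sources of large <img> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.text_parts: List[str] = []
        self.large_images: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        try:
            width = int(attributes.get("width", 0))
            height = int(attributes.get("height", 0))
        except (TypeError, ValueError):
            return  # missing or non-numeric size: treat as not large
        src = attributes.get("src")
        if src and width >= MIN_WIDTH and height >= MIN_HEIGHT:
            self.large_images.append(src)

    def handle_data(self, data):
        stripped = data.strip()
        if stripped:
            self.text_parts.append(stripped)


def extract_profile_text(html: str, ocr: Callable[[str], str]) -> str:
    """Return the page text, followed by OCR output for each large image.

    `ocr` is a hypothetical callable that takes an image source and
    returns the recognised text. In practice it might download the
    image and hand it to a Tesseract wrapper; that part is not shown.
    """
    parser = ProfileParser()
    parser.feed(html)
    parser.close()
    parts = list(parser.text_parts)
    for src in parser.large_images:
        parts.append(ocr(src))
    return "\n".join(parts)


if __name__ == "__main__":
    sample = (
        '<div><p>Translator, JA-EN</p>'
        '<img src="about.png" width="600" height="400"></div>'
    )
    # Stand-in for a real OCR engine, so the sketch runs on its own.
    print(extract_profile_text(sample, lambda src: "[text read from " + src + "]"))
```

The point is only that the "is there a large image?" check and the OCR call are a few lines each, which is why an image-only profile probably slows a determined scraper down rather than stopping it.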