Optimizing Flash for Search Engines

by Collette McNeill <collette(at)mlwebworks.com>

 Date:  Wed, 07 May 2003 05:12:33 -0700
 To:  hwg-techniques(at)mail.hwg.org


Hi list, I thought I'd forward a useful article.
http://www.searchenginewatch.com/searchday/article.php/2200921

Optimizing Flash for Search Engines

By Craig Fifield <craigfif(at)microsoft.com>, Guest Writer
May 6, 2003

A special report from the Search Engine Strategies conference in Boston, 
MA, March 4-6, 2003.

Macromedia Flash and other non-HTML formats can pose problems for search 
engines, unless you take appropriate steps to optimize the content.

"Search engines were originally built to index and serve HTML documents," 
said Tim Mayer, Vice President of Web Search at FAST. "Now the web has 
become more diverse in content types, knowing how to treat Flash and other 
types of content has become more important for search engines."

"These other content types present different challenges to the search 
engines," Mayer continued. "For example, Flash files generally contain too 
little text whereas PDF documents contain too much text. The technology to 
include differing content types and score them appropriately will become 
even more important as new areas in web search become more important -- 
such as real time data which will provide the challenge of lacking inbound 
links."

Participants in this panel discussed how crawlers interact with non-HTML 
content, and offered a number of workarounds and optimization tips.

Search Engines and Flash

Flash is the leading vector graphics technology for creating design-focused 
web sites. Over 98 percent of Internet users can view Flash content with 
the Flash player software already installed in their browsers. Over 490 
million people use the Flash player.

Gregory Markel, Founder/President of Infuse Creative, an entertainment and 
technology consulting company, discussed issues related to search engine 
visibility and Flash sites. "The good news is that FAST Search and Google 
can follow embedded links within the [Flash] files," he said.

FAST built its Flash indexing capabilities using Macromedia's Flash search 
engine software developer's kit (SDK). The SDK was designed to convert a 
Flash file's text and links into HTML for indexing.

"Not all search engine spiders have the ability to crawl or index Flash," 
Markel said. "As far as I am able to determine, Google has not included the 
Flash-SDK setup for indexing, like FAST. But Google can follow embedded links."

Markel warned that Macromedia's SDK solution is far from perfect. "All 
it does is it takes whatever [content] is there, and converts it to an HTML 
version. But the converted HTML doesn't include anything you actually need 
to do well in the search engines. No title tags, alt tags, body text, etc. 
SDK is a step in the right direction, but has a long way to go."

"One of the big problems with Flash content is that it's very hard to 
find," stated Tim Mayer. "We have a lot of Flash content in the FAST index, 
though I've rarely come across a Flash file, myself, in the main search 
results."

One of the reasons for the paucity of optimized Flash files is that the 
search engine industry hasn't adopted the SDK as a standard, explained Mayer. 
"The SEOs out there don't know that we're actually going to index their 
files," he said, "so they don't prepare them in an optimized way (for the 
SDK). This will change as more search engines adopt this."

Mayer recommended that webmasters restrict the text to what they want 
indexed. For example, making the "skip intro" a graphic file is better than 
making it a text link. "Keep what you want as indexable as text," said 
Mayer. "Make what you don't want as Flash graphics. We will also take out 
links from Flash sites and follow them."

Multimedia files and Search Engines

Few search engines provide search for audio and video file formats. 
Currently, AltaVista, FAST Search, and Singingfish support the following 
multimedia formats: Windows Media (Windows Media Encoder), RealMedia, MP3, 
and QuickTime.

Multimedia content can provide a better user experience in some industries.

"Multimedia content enables an immersive and emotive user experience beyond 
text-based content," said Ken Berkun, Founder and VP of Strategy at 
Singingfish. "A 30 second music clip is a strong advertisement for a CD."

When you create a multimedia file, you have the opportunity to give it 
metadata. "Give each file a title, copyright stream, author, description, 
and keywords," said Berkun. "Every single one of these medias have these 
fields."

"You would be surprised at how little that's done," continued Berkun. "The 
most popular titles in our database is the default - nothing. People spend 
thousands or even hundreds of thousands of dollars on the production and 
don't even take the time to add a title."
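
As a rough illustration of Berkun's checklist, the sketch below flags 
metadata fields that were left empty. It does not read any real media 
format: the MediaMetadata class and the missing_fields function are 
hypothetical names made up for this example, and the five fields are simply 
the ones Berkun lists.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class MediaMetadata:
    """Hypothetical container for the fields Berkun lists (not a real API)."""
    title: str = ""
    copyright: str = ""
    author: str = ""
    description: str = ""
    keywords: str = ""


def missing_fields(meta: MediaMetadata) -> List[str]:
    """Return the names of fields that are empty or whitespace only."""
    return [f.name for f in fields(meta) if not getattr(meta, f.name).strip()]


if __name__ == "__main__":
    # Made-up example values; only the title and author were filled in.
    clip = MediaMetadata(title="Example Song (30-second clip)",
                         author="Example Artist")
    print(missing_fields(clip))
```

A check like this could run before files are published, so that a clip 
never goes out with the default empty title.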

In addition to having metadata in the multimedia file, Berkun also advised 
having actual content on the HTML page that contains the file. Include 
accurate anchor text around multimedia files. And make sure that 
your entire web site is spiderable.

"Singingfish does spider HTML sites with multimedia," he said. "We handle 
frames pretty well, but we've had mixed results with JavaScript. We can't 
handle Flash, although we hope to get to it someday."

PDF Documents and Search Engines

Shari Thurow, Marketing Director of Grantastic Designs, a full-service 
search engine marketing and design firm, addressed PDF files and the search 
engines. "PDF stands for portable document format, which is a universal 
file format that preserves fonts, colors, graphic images, and formatting of 
any source document," she said.

Unlike Flash documents, PDF documents frequently appear within regular 
search results. Thurow stated that she typically finds PDF-formatted 
technical manuals, white papers, press kits, and spec sheets in the top 30 
search engine results pages in both Google and FAST Search. Because of 
this, Thurow commonly optimizes PDF documents for various industries.

Thurow also stressed the importance of using the Robots Exclusion Protocol 
on some PDF documents. "Search engines have made it clear that they do not 
want redundant content in their indices," she said. "Even having both a PDF 
and HTML version of the same content is redundant. For that reason, I place 
the Robots Exclusion Protocol on the PDF version. HTML format is better for 
the search engines, anyway, since HTML files tend to be smaller in file size."
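
One way to picture what Thurow describes is a robots.txt rule that excludes 
the PDF copy while leaving the HTML copy crawlable. The sketch below is only 
an assumption about how such a rule might look: the paths and the 
example.com host are made up, and the check uses Python's standard 
urllib.robotparser module rather than any search engine's own crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block the PDF copy, leave the HTML copy alone.
# The rule is a plain path prefix, which is all this parser matches on.
ROBOTS_TXT = """\
User-agent: *
Disallow: /docs/whitepaper.pdf
"""


def is_crawlable(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("http://www.example.com/docs/whitepaper.html",
                "http://www.example.com/docs/whitepaper.pdf"):
        print(url, is_crawlable(ROBOTS_TXT, url))
```

Listing each PDF path explicitly keeps the rule within the original 
prefix-matching protocol; wildcard patterns are an extension that not every 
crawler or parser honors.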

PDF documents can be submitted through the normal submit URL forms and the 
paid inclusion programs at the major search engines. 

HWG hwg-techniques mailing list archives, maintained by Webmasters @ IWA